Turning a chair into a table, or vice versa, might sound like somewhat of a magic trick. In this case, zero magic is involved, just plenty of complex geometry and machine learning.
Called LOGAN, the deep neural network, i.e., a machine of sorts, can learn to transform the shapes of two different objects, for example, a chair and a table, in a natural way, without seeing any paired transforms between the shapes. All the machine had seen was a bunch of tables and a bunch of chairs, and it could automatically translate shapes between the two unpaired domains. LOGAN can also automatically perform both content and style transfers between two different types of shapes without any changes to its network architecture.
The team of researchers behind LOGAN, from Simon Fraser University, Shenzhen University, and Tel Aviv University, are set to present their work at ACM SIGGRAPH Asia held Nov. 17 to 20 in Brisbane, Australia. SIGGRAPH Asia, now in its 12th year, attracts the most respected technical and creative people from around the world in computer graphics, animation, interactivity, gaming, and emerging technologies.
"Shape transform is one of the most fundamental and frequently encountered problems in computer graphics and geometric modeling," says senior coauthor of the work, Hao (Richard) Zhang, professor of computing science at Simon Fraser University. "What is new and emerging is to tie this important problem to deep learning--can a machine learn to transform shapes, particularly under the unsupervised or unpaired setting?"
In this work, the researchers turned to a well-known technique in machine learning, Generative Adversarial Network (GAN), for unpaired general-purpose shape transforms. Their network is trained on two sets of shapes, e.g., tables and chairs or different letters. There is neither a pairing between shapes in the two domains to guide shape translation nor any point-wise correspondence between any shapes. Once trained, the researchers' method takes a point-set shape from one domain, a table or a chair, and transforms into the other.
LOGAN overcomes a key challenge in shape transform techniques. Given two sets of shapes--chairs and tables--it is challenging for the network to learn which particular shape features should be preserved or altered to result in realistic transformation of the object, from chair to table and vice versa. The team's method learns the unique differences in features and can automatically determine which features should be kept or discarded in order to achieve the desired shape transform, and can do so without supervision.
Other techniques in computer vision for unpaired image-to-image translation have been developed and have been successful in translating style features, but most have not achieved shape translation. "In 2017, CycleGAN and DualGAN, two highly influential works from computer vision were developed for unpaired image-to-image style translation. LOGAN specifically produces realistic shape translations, both in style and content, for the first time,'' notes Zhang. Additionally, the researchers demonstrate that LOGAN can learn "style-preserving" content transfers. For instance, the network can automatically transform a letter 'R' into a 'P' of the same font style, or with respect to style translation, their method is able to translate a bold-face letter 'A' into an italicized 'A'.
To devise their method, the researchers train a neural network which encodes the two types of input shapes into a common latent space. In deep learning, the latent space is represented by the bottleneck layer where the network captures the features of the input data. LOGAN is not only trained to turn a chair code to a table code, but also trained to turn a table code to the same table code. The latter enables "feature preservation" and helps maintain certain table features during chair-to-table shape translations.
In ablation studies, the researchers demonstrate LOGAN's superior capabilities in unpaired shape transforms on a variety of examples over baselines and state-of-the-art approaches. Their study shows that LOGAN is able to learn what shape features to keep during transforms, and the results accurately resemble the desired object.
In future work, the team aims to fine tune LOGAN to work on all domain pairs to make it truly general-purpose. The current version of LOGAN also is not yet smart enough to understand the meanings of the shapes, and the researchers are working on making the network "smarter" to incorporate this information.