Subtle attribute interpolation
UTSGAN produces smooth yet evident subtle image edits along a single-attribute interpolation.
In this work, we revisit and approach Image2Image (I2I) translation from a novel consistency perspective. We particularly study the generalization of consistency in I2I, which is the key factor to endow a model with the ability to achieve satisfactory attribute-consistent results under arbitrarily complex attribute configurations. Specifically, in an I2I translation task, we expect consistent results that correspond to the input while exhibiting the specific desired attribute property (change). Existing GAN-based generative frameworks commonly focus on maintaining such consistency based on the final translated results, with commonly adopted strategies including image reconstruction, cycle-consistency, attribute-preserving constraints, or intermediate image-feature consistency constraints. However, with a traditional attribute-conditional formulation and these classic result-consistency constraints, a translation method is restricted to leveraging only the predefined conditions exemplified in the training set for consistency learning. This dilemma limits the model's generalization ability, leading to unsatisfactory results in challenging tasks where the expected attribute configuration is unseen/unknown during training. Yet controllably generating/designing data for unseen domains, or with unobserved attributes, is a highly demanded capability in real-world applications.
We overcome this dilemma by proposing a transition-aware I2I formulation and a generalized transition consistency strategy, which particularly highlights consistency regularization on unseen translations to gain model generalization ability without accessing massive training data.
(1). We explicitly parameterize the translation mapping with a transition variable \(\small{t \triangleq t(x,y)}\) and adopt a simple linear operator on the desirable semantic annotations of input-output data pairs, denoted as \(\small{S}\), to define the transition metric for an I2I translation task. In this way, each data mapping is characterized by the desired semantic change through the translation, i.e., \(\small{t(x,y) = S_{y} - S_{x}}\).
This results in a novel transition-aware I2I formulation. With this simple linear operator, we can explain the self-reconstruction, cycle-consistency, and attribute-preserving strategies (\(\small{S_{x}=0}\), i.e., intuitively discarding the attribute of the input image) adopted in existing works. More importantly, such an explicit transition parameterization enables us to easily instantiate unobserved translations with arbitrary attribute properties (changes) and allows for an essential consistency enforcement on the translation process itself, i.e., transition consistency. Transition consistency remains valid on unobserved translations/conditions, works at a higher granularity than result consistency, and does not require paired data to be generalized to the distribution level.
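To make the transition metric concrete, here is a minimal sketch of how such a linear transition can be computed from attribute annotations and how an unseen transition can be instantiated directly; the CelebA-style binary attribute vectors and the helper name `transition` are illustrative assumptions, not the paper's code.

```python
import torch

def transition(s_x: torch.Tensor, s_y: torch.Tensor) -> torch.Tensor:
    """Linear transition metric t(x, y) = S_y - S_x on semantic annotations."""
    return s_y - s_x

# Observed transition from an annotated training pair, e.g. CelebA-style
# binary attributes [Smiling, Young, Blond_Hair].
s_x = torch.tensor([0., 1., 0.])
s_y = torch.tensor([1., 1., 1.])
t_obs = transition(s_x, s_y)                            # -> [1., 0., 1.]

# Classic attribute-conditional strategies correspond to discarding S_x
# (S_x = 0), so the condition degenerates to the target annotation itself.
t_attr_only = transition(torch.zeros_like(s_x), s_y)    # -> S_y

# An unseen transition never annotated in the training set can be
# instantiated directly, e.g. a fractional attribute change for interpolation.
t_unseen = torch.tensor([0.5, -1.0, 0.0])
```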
(2). We propose UTSGAN, which introduces a generative transition mechanism that models a transition manifold with a stochastic encoder. In this way, we obtain triplets of three interrelated variables \(\small{(X, T, Y)}\), where triplet consistency can be naturally enforced through bidirectional interactive processes, including result consistency for transition-conditional generation and transition consistency for transition prediction. We coherently regularize transition consistency on both observed translations with predefined transitions in the training set and unobserved transitions sampled from the manifold, together with the traditional result consistency. With this coherent triplet consistency regularization, the virtuous collaboration between transition generation and transition-conditioned I2I translation contributes to a consistent data interpolation paradigm that benefits the model's generalization ability through a Self-Supervised Learning (SSL) mechanism supported by the annotated transitions of data pairs.
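As a rough illustration of how these triplet consistency terms could be wired together, the sketch below assumes a transition encoder/predictor `E`, a transition-conditional generator `G`, a transition sampler `t_prior` over the modeled manifold, and L1/L2 penalties; these names and loss choices are assumptions for exposition, not the exact objectives of the paper.

```python
import torch
import torch.nn.functional as F

def consistency_losses(E, G, x, y, t_obs, t_prior):
    """One illustrative consistency pass over a training triplet (x, t, y).
    E: stochastic transition encoder/predictor, t_hat = E(x, y).
    G: transition-conditional generator, y_hat = G(x, t).
    t_prior: sampler for transitions drawn from the modeled transition
             manifold (possibly unseen attribute configurations).
    """
    # Result consistency: generating with the annotated transition should
    # reproduce the paired target image.
    loss_result = F.l1_loss(G(x, t_obs), y)

    # Transition consistency on the observed pair: the transition recovered
    # from (x, y) should match its annotation.
    loss_trans = F.mse_loss(E(x, y), t_obs)

    # Transition consistency on an unobserved transition: no paired target
    # image exists, yet the transition recovered from (x, G(x, t)) must still
    # match the sampled t.
    t_new = t_prior(x.size(0))
    loss_unseen = F.mse_loss(E(x, G(x, t_new)), t_new)

    return loss_result + loss_trans + loss_unseen
```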
(3). We perform triplet distribution matching for UTSGAN by equipping it with a triplet discriminator that generalizes the coherent sample-level consistency regularization to the distribution level. In this way, UTSGAN enforces a comprehensive consistency regularization for I2I translation based on a second-order statement over \(\small{X}\) and \(\small{Y}\): for \(\small{\forall t \in T}\) and \(\small{\forall x \in X}\), \(\small{x \overset{t}{\mapsto}y}\) must hold, i.e., \(\small{X \overset{T}{\mapsto}Y}\). This gives UTSGAN superior model generalization in terms of attribute consistency. A simple model design of UTSGAN is depicted below.
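Complementing the model design, here is a minimal sketch of a triplet discriminator that scores \(\small{(x, t, y)}\) jointly, so that the sample-level consistency above can be matched at the distribution level; the architecture and the non-saturating adversarial loss are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TripletDiscriminator(nn.Module):
    """Scores whether (x, t, y) forms a consistent triplet, i.e. whether y is
    a plausible translation of x under transition t."""
    def __init__(self, img_channels=3, t_dim=8, feat_dim=64):
        super().__init__()
        self.img_net = nn.Sequential(                        # joint (x, y) encoder
            nn.Conv2d(img_channels * 2, feat_dim, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(feat_dim, feat_dim * 2, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(feat_dim * 2 + t_dim, 1)       # fuse the transition

    def forward(self, x, t, y):
        h = self.img_net(torch.cat([x, y], dim=1))           # image-pair features
        return self.head(torch.cat([h, t], dim=1))           # real/fake logit

def d_loss(D, x, t_obs, y, t_new, y_fake):
    real = D(x, t_obs, y)                 # annotated triplet from the data
    fake = D(x, t_new, y_fake.detach())   # generated triplet, possibly unseen t
    return F.softplus(-real).mean() + F.softplus(fake).mean()
```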
We adopt I2I translation, where a consistent translation can be easily demonstrated, to verify the effectiveness of our proposed paradigm. UTSGAN is capable of handling various I2I translation tasks, with the transition characterized by handcrafted aspects of attribute change. Here we present the challenging tasks of multiple-attribute face editing (complex attribute configurations), single-attribute interpolation, and multi-domain style generalization (unseen attribute/domain configurations). Please refer to our paper for more transition configurations and more results on attributed natural scene editing and image inpainting tasks.
@article{shi2023utsgan,
title={UTSGAN: Unseen Transition Suss GAN for Transition-Aware Image-to-image Translation},
author={Shi, Yaxin and Zhou, Xiaowei and Liu, Ping and Tsang, Ivor W},
journal={arXiv preprint arXiv:2304.11955},
year={2023}
}