[Paper Review] Cross-Domain and Disentangled Face Manipulation with 3D Guidance

업데이트: July 30, 2021

Paper : CDDFM3D: Cross-Domain and Disentangled Face Manipulation with 3D Guidance (2021): review, project, code
Paper : Few-Shot Adversarial Learning of Realistic Neural Talking Head Models (arxiv 2019 /Egor Zakharov, Aliaksandra Shysheya, Egor Burkov, Victor Lempitsky)
GAN-Zoos! (GAN 포스팅 모음집)

⭐️ Cross-Domain and Disentangled Face Manipulation with 3D Guidance

• We propose the first method to manipulate semantic attributes of out-of-domain faces with human face 3DMM

• Our reduced StyleSpace and disentangled attribute-latent mapping guarantee disentangled and precise controls of all facial attributes, achieving disentangled and high-quality face manipulation

• Our cross-domain adaptation approach bridges domain discrepancies and makes human face 3DMM applicable to outof-domain faces by enforcing a consistent latent space embedding.

Main Idea

1. Disentangled attribute-latent mapping

human face domain에서 3DMM parameter에 따라 latent vector도 변화할 수 있도록 disentangle한 attribute-latent mapping structure를 구성
related work
- (1) StyleRig : StyleGAN2의 latent code를 3DMM param에 따라 indivisible element로 mapping
- (2) PIE : hierarchical optimization으로 input image를 StyleGAN2 latent space ($\mathcal{W+}$)으로 embedding
Reduced StyleSpace (this model) : semantic 3DMM attribute에 따라 subspace로 decompose. 저자들은 이 방법이 StyleRig나 PIE보다 disentanglement하다고 주장.

2. Cross-Domain Adaptation

human face뿐만 아니라 모든 domain에서 가능하도록 adaptation
related work - toonify : latent-consistent finetuning → human face domain과 webtoon domain이 latent space를 공유하고 있기 때문에 latent inversion과 manipulation을 통해 다양한 attribute들을 manipulation할 수 있음.
In this model, human face stylegan generator를 사용해서 이미지를 manipulation한 후, latent inversion을 통해 얻어진 latent code를 다시 fine-tuned StyleGAN2에 넣어서 이미지를 editing

PipeLine

Attribute-Adaptive Latent Decomposition

(1) Reduced StyleSpace

StyleGAN2 : latent vector로부터 이미지 생성 $I = G(w)$
extended latent vector(1024-StyleGAN2) : $w \in \mathcal{W^+} \subsetneq \mathbb{R}^{18 \times 512}$
StyleSpace : 26 layer의 style parameters를 사용 (total-dim = 9088)
- 본 논문에서는 $\mathcal{W^+}$ 보다 dientangle한 $\mathcal{S} $를 사용
Reduced StyleSpace : 특정 attribute $P^i$에 관련되어있는 latent code $w^i \in \mathcal{S^i}$를 찾은 후 이를 조절

(2) Attribute-Adaptive Layer Selection

⭐️ Goal : 각 individual attribute와 관련되어있는 stylespace의 layer를 찾기

→ StyleRig network을 차용하여 attribute-to-latent encoding !! (in StyleSpace)

source code $w_{s}$를 target attribute parameter $\boldsymbol{P}{t}$에 따라 optimization → $w{g}$

\[w_{g}=\mathrm{S}\left(w_{s}, \boldsymbol{P}_{t}\right)\]

random attribute-latent pair $(w, P)$ 가 주어졌을 때, random하게 $P^i$를 변화시킨 후 pre-trained StyleRig Network를 통해 상응하게 변화하는 $\Delta{w}$를 찾기

Disentangled Face Manipulation

Mapping Structure between the latent space $\mathcal{S}$ and the attribute space $\mathcal{P}$

(1) Attribute Prediction Network

source image의 latent code $w_{s} \in S$ 가 주어지면 이 코드로 조작할 수 있는 3D facial attributes $P_s$를 찾아야함.
이를 관계를 찾아주는 network가 Attribute Prediction Network $P=T(w)$

(2) Latent Manipulation Network

source image와 관련있는 attribute $P_s$를 찾았으면, 이제 이를 target attribute $P_t$ manipulation 해야함.
attribute-to-latent translation : mapping function으로

\[\Delta w^{i}=\mathrm{F}^{i}\left(w_{s}^{i}, P_{t}^{i}-P_{s}^{i}\right)=\mathrm{D}^{i}\left(\mathrm{E}^{i}\left(w_{s}^{i}\right), P_{t}^{i}-P_{s}^{i}\right)\]

이후 linear 하게 update $w_{g}^{i}=w_{s}+\Delta w^{i}$

Result

User Manipulation InterFace

ex. webtoon image → latent code (by FineTuned StyleGAN2)
latent code → 3D parameters (by attribute prediction network)
user editing : 3D
edited 3D parm → latent space (by latent manipulation network)
edited latent code → final editing result (by FineTuned StyleGAN2)

Result Image

Twitter Facebook LinkedIn

Jihye Back

[Paper Review] Cross-Domain and Disentangled Face Manipulation with 3D Guidance

Main Idea

PipeLine

Attribute-Adaptive Latent Decomposition

(1) Reduced StyleSpace

(2) Attribute-Adaptive Layer Selection

Disentangled Face Manipulation

(1) Attribute Prediction Network

(2) Latent Manipulation Network

Result

User Manipulation InterFace

공유하기

댓글남기기

참고

[Paper Review] RAFT: Adapting Language Model to Domain Specific RAG 논문 리뷰

[4] RAG (Retriever Augumented Generation)

[3-2] A Survey of Large Language Models - Adaptation of LLMs

[3-1] A Survey of Large Language Models - 다양한 LLMs부터, Data, Architecture, Training 까지