[Paper Review] GANSketching: Sketch Your Own GAN 논문 리뷰

업데이트: February 10, 2022

Paper : Sketch Your Own GAN (ICCV 2021) (arxiv, project, code)
GAN-Zoos! (GAN 포스팅 모음집)

1. GAN Sketching

Generative model: input(user's hand-drawn sketch) → output(realistic image)

user가 그린 sketch의 shape과 pose를 그대로 반영하는 진짜 같은 이미지를 생성

sketch dataset은 매우 few-shot

large-scale의 실사 image dataset으로 generative를 학습시킨 다음, 이 pre-trained model과 user sketch가 잘 match되도록 fine-tuning method를 도입

cross-domain adversarial loss

regularization method

Sketch Editing

Sketch Generation

A Neural Representation of Sketch Drawings

Unpaired learning

Photo2Sketch

Photo-Sketching:Inferring Contour Drawings from Images

3. Method

⭐ Challenging ⭐

user-provided sketch data의 수가 매우 적음

input과 output data의 domain이 달라서 sketch data로부터 실사 이미지 생성이 어려움

3.1 Cross-Domain Adversarial Learning

large-scale training dataset: $\mathbf{x} \sim p_{\text {data }}(\mathbf{x})$
- pre-trained GAN: $G(\mathbf{z} ; \theta)$
  - low-dim z 에서 image x를 생성하는 Generator
few human sketch: $\mathbf{y} \sim p_{\text {data }}(\mathbf{y})$
Goal: 새로운 GAN model $G(\mathbf{z} ; \theta’)$를 학습
- 이 모델은 X의 distribution을 가지는 이미지를 생성하지만, output 이미지가 sketch (y)와 비슷함

Cross-Domain Image Translation Network

우리는 user가 제공한 sketch에 대응되는 ground truth이미지를 가지고 있지 않음 (이전 연구들이 어려움을 겪은 원인)
- 새로운 cross-domain image translation network 도입
\[F: \mathcal{X} \rightarrow \mathcal{Y}\]
- $F$: pre-trained Photo-sketch network 사용
  - Paper: Photo-Sketching:Inferring Contour Drawings from Images

Cross-Domain Adversarial Learning

\[\begin{aligned}\mathcal{L}_{\text {sketch }} &=\mathbb{E}_{\mathbf{y} \sim p_{\text {data }}(\mathbf{y})} \log \left(D_{Y}(\mathbf{y})\right) \\&+\mathbb{E}_{\mathbf{z} \sim p(\mathbf{z})} \log \left(1-D_{Y}(F(G(\mathbf{z})))\right)\end{aligned}\]

이 loss는 sketches $\mathcal{Y}$랑 match되는 generated image를 생성하도록 도와줌
PhotoSkech(F)의 output과 user sketch의 차이가 오히려 object의 shape을 더 잘 학습하는데 도움을 줌

3.2 Image Space Regularization

3.1의 $\mathcal{L}_{\text {sketch }}$만을 사용하면 생성시 image의 quality와 diversity가 현저히 떨어진다.

→ Generator가 original dataset의 분포를 잘 따르도록 loss를 추가

\[\begin{aligned}\mathcal{L}_{\text {image }} &=\mathbb{E}_{\mathbf{x} \sim p_{\text {data }}(\mathbf{x})} \log \left(D_{X}(\mathbf{x})\right) \\&+\mathbb{E}_{\mathbf{z} \sim p(\mathbf{z})} \log \left(1-D_{X}(G(\mathbf{z}))\right)\end{aligned}\]

Weight regularization as an alternative

다음 loss를 통해 weight regularization을 할 수도 있다.

\[\mathcal{L}_{\text {weight }}=\left\|\theta^{\prime}-\theta\right\|_{1}\]

이 regularization은 $\mathcal{L}_{\text {image }}$를 통한 image space regularization보다 성능이 약간 저하된다. 그러나 이를 적용하면 image quality와 shape matching의 balance를 맞추기 용이하다.

3.3 Optimization

Full objective

\[\mathcal{L}=\mathcal{L}_{\text {sketch }}+\lambda_{\text {image }} \mathcal{L}_{\text {image}}\]

$\lambda_{\text {image }} =0.7$ 로 두어 적절히 image를 regularization

minimax objective

\[\theta^{\prime}=\arg \min _{\theta^{\prime}} \max _{D_{X}, D_{Y}} \mathcal{L}\]

Generator $G(\mathbf{z} ; \theta’)$

Which layers to edit.

model이 overfitting되는 걸 막고, fine-tuning 속도를 높이기 위해 저자들은 StyleGAN2의 mapping network를 fine-tuning하였다.

→ MLP를 수정하면 target distribution을 더 잘 포착하기 때문에 few-shot 연구에 효과적이라고 보고

Pre-trained weight

pre-trained Photosketch network $F$ 사용 (training 동안 fix)
$D_X, D_Y$: original GAN의 pre-trained weight를 사용 → fine tuning

Data augmentation

Differentiable augmentation for data-efficient gan training
- few-shot으로 GAN을 학습하는 경우, Discriminator가 훈련 데이터를 기억하여 성능이 크게 저하되는 문제가 있음
- 이를 해결하기 위해 differentiable augmentation 논문은 다양하게 data를 augmentation!(cut-off, translation, color …)

저자들은 위 논문의 augmentation 기법을 sketch 훈련에 사용(특히 translation aug를 사용했다고 함)

이 기법은 한장 or 몇장 없는 hand-drawn sketch input를 aug하는데 많은 도움을 주었다고 report

4. Experiments

4.1 Evaluations

Datasets

Photosketch model로 실제 이미지(LSUN [69] horses, cats, and churches)를 sketch로 변환
user sketch input와 비슷한 shape과 pose을 가진 이미지를 hand-select (1에서 30장정도)
input sketch와 비슷한 이미지 2500장을 추가적으로 hand-select
- chamfer distance가 작은 이미지(비슷한 이미지들) 10,000장 중에서 골랐다고 함

Our method is given access only to the 30 designated sketches; the sets of 2,500 real images represent the real but unseen target distributions.

Baselines

Quantitative analysis

baseline보다 FID가 더 좋음
baseline은 sketch를 잘 반영 못함

Ablation study

Augmentation

aug를 한다고 반드시 성능이 좋아지지는 않음
그러나 human-created sketch에 대해서는 처음에 반드시 augmentation을 해야함

Comparing regularization methods

regularization을 하면 $\mathcal{L}_{\text {sketch}}$로만 학습을 했을 때보다 FID가 좋아짐
- 이미지 quality도 향상되며
- 더 다양한 이미지가 생성
$\mathcal{L}{\text {image}}$ 로 학습한 결과가 $\mathcal{L}{\text {sketch}}$ 로 학습한 것보다 좋음

D scratch

pre-trained Discriminator를 사용하지 않고 random으로 초기화해서 scratch부터 학습시킨 $D_Y$를 사용했더니 현저하게 성능이 안좋아졌다고 함

GAN fine-tuning 쪽 논문들 참고

W-shift

저자들은 mapping network에 약간의 bias를 추가하여 W space를 shift 해봤다고 한다. 그 결과 성능이 좋아지기는 하지만, 그냥 전체 mapping network를 fine-tuning하는게 더 효과적이라고 한다.

Fewer sketch samples

Quantitative analysis

1 or 5-shot으로 학습한 GAN Sketching 모델이 Original 모델보다 효과적
30-shot은 결과가 엄청남

Testing using real human sketches

Single sketch로 학습시킨 결과

Multiple sketch로 학습시킨 결과

single sketch로 학습시켰을 때 실패했던 case에 대해서 성공하기도 함

User sketch를 augmentation

data를 augmentation했더니 성능이 더 향상됨

StyleGAN2 FFHQ model을 customizing & 4 user sketch로 학습한 결과 (w/ aug)

4.2 Applications

Latent space edits

GANSpace에서 제시한 방법대로 latent editing을 할 수 있음

interpolation도 가능

Natural image editing with our models

(a) original model을 활용하여 real image를 latent space로 projection할 수 있음 (z)

다음 논문을 참고하여 image projection
- pix2latent: framework for inverting images into generative models

(b) sketch를 가지고 훈련된 standing cat customed model에 projected z를 feeding

(d) GANSpace 모델로 editing!

Interpolating between customized models

2가지 방식으로 interpolation이 가능

W-space에서 interpolation
- 같은 noise vector z 를 두 개의 다른 model에 넣어서 two latents를 얻음 $w_1, w_2$
- interpolation: $(1-\alpha) w_{1}+\alpha w_{2}$
model weight를 interpolation

$(1-\alpha) \theta_{1}+\alpha \theta_{2}$

5. Limitation

모든 sketch에 대해서 동작하는게 아님, failure case들이 있음

real-time으로 모델을 customizing 하는게 불가능함
- 현재 모델은 훈련동안 30K의 iteration이 필요
pre-trained model을 사용하지만 pre-trained model을 학습시키는데 사용한 training dataset도 필요하다.
- $\mathcal{L}_{\text {image }}$로 모델을 학습시킬 때, original training dataset이 필요했었음
pose나 shape을 변경할 수는 있지만, color나 texture는 control이 어려움

Twitter Facebook LinkedIn

Jihye Back

[Paper Review] GANSketching: Sketch Your Own GAN 논문 리뷰

1. GAN Sketching

3. Method

3.1 Cross-Domain Adversarial Learning

3.2 Image Space Regularization

3.3 Optimization

4. Experiments

4.1 Evaluations

4.2 Applications

5. Limitation

공유하기

댓글남기기

참고

[Paper Review] RAFT: Adapting Language Model to Domain Specific RAG 논문 리뷰

[4] RAG (Retriever Augumented Generation)

[3-2] A Survey of Large Language Models - Adaptation of LLMs

[3-1] A Survey of Large Language Models - 다양한 LLMs부터, Data, Architecture, Training 까지

Jihye Back

1. GAN Sketching

2. Related Work

3. Method

3.1 Cross-Domain Adversarial Learning

3.2 Image Space Regularization

3.3 Optimization

4. Experiments

4.1 Evaluations

4.2 Applications

5. Limitation

공유하기

댓글남기기

참고

[Paper Review] RAFT: Adapting Language Model to Domain Specific RAG 논문 리뷰

[4] RAG (Retriever Augumented Generation)

[3-2] A Survey of Large Language Models - Adaptation of LLMs

[3-1] A Survey of Large Language Models - 다양한 LLMs부터, Data, Architecture, Training 까지