[Paper Review] AnimeGAN: A Novel Lightweight GAN for Photo Animation 논문 리뷰

업데이트: March 16, 2022

Paper: AnimeGAN: A Novel Lightweight GAN for Photo Animation (2020): paper, project, code
GAN-Zoos! (GAN 포스팅 모음집)

AnimeGAN

light-weight GAN, Gram matrix

unsupervised learning (using unpaired data)

3 loss: (1) grayscale style loss (2) grayscale adv-loss (3) color recon loss (+) edge-promoting adv-loss

Architecture

Generator

symmetrical encoder-decoder network
mainly composed of the standard convolutions, the depthwise separable convolutions, the inverted residual blocks (IRBs), the upsampling and downsampling modules.
- last conv layer는 1x1 convolution kernels

Conv-Block, DSConv, IRB의 구조는 Fig 2 참고
Down-Conv
- max-pooling을 사용하여 downsampling을 하면 feature 정보가 손실되는 문제가 있음
- 저자들은 Down-Conv 를 사용하여 이 문제를 해결
Up-Conv
- feature map의 resolution을 키우는 upsampling 모듈
- 보통 Up-Conv로 stride 1/2 를 사용하는 경우가 많은데, 이렇게 하면 checkerboard artifacts가 많이 생겨서 이렇게 안했다고 함
8 consecutive & identical IRBs
- network의 중앙에 IRBs 모듈을 두어서 generator의 parameter를 많이 줄임
- pointwise convolution (w 512 kernels), depthwise conv (w 512 kernels), pointwise conv (w 256 kernels) 로 구성됨

Discriminator

Generator보다 간단함
모든 conv는 standard conv, 각 conv에다가 spectral normalization을 적용하여 훈련을 보다 안정하게 해줌

Loss

Unpaired training dataset

real photo domain: $S_{\text {data }}(p)={p_{i} \mid i=1, \cdots, N} \subset P$
animation domain: $S_{d a t a}(a)={a_{i} \mid i=1, \cdots, M} \subset A$
- gray scale: color animation image $a_i$ 에서 grayscale Gram matrix를 취해 gray scale image $x_i$ 를 만듦
  \[S_{\text {data }}(x)=\{x_{i} \mid i=1, \cdots, M\} \subset X\]
- edge를 제거한 이미지
  \[S_{\text {data }}(e)=\{e_{i} \mid i=1, \cdots, M\} \subset E\]
  - gray scale images, $S_{\text {data }}(y)$: color 정보가 포함되어있는 $S_{\text {data }}(e)$에서 color 정보를 제거하여 grayscale image를 만듦

Loss Function

\[L(G, D)=\omega_{a d v} L_{a d v}(G, D)+\omega_{c o n} L_{c o n}(G, D)+\omega_{g r a} L_{g r a}(G, D)+\omega_{c o l} L_{c o l}(G, D)\]

$\omega_{a d v}=300, \omega_{c o n}=1.5, \omega_{\text {rra }}=3 \text { and } \omega_{c o l}=10$
content loss
- 생성된 이미지가 input image의 content를 유지하도록 강제하는 loss
- pre-trained VGG19를 사용
\[L_{c o n}(G, D)=E_{p_{i} \sim S_{\text {data }}(p)}[\|V G G_{l}(p_{i})-V G G_{l}(G(p_{i}))\|_{1}]\]
- $G(p_{i})$ : 생성된 이미지
grayscale style loss
- 생성된 이미지의 texture나 line이 anime style이도록 강제하는 loss
\[\begin{array}{r}L_{\text {gra }}(G, D)=E_{p_{i} \sim S_{\text {data }}(p)}, E_{x_{i} \sim S_{\text {data }}(x)}[\| \operatorname{Gram}(\operatorname{VG} G_{l}(G(p_{i}))).-\operatorname{Gram}(\operatorname{VGG}(G_{l}(x_{i})) \|_{1}]\end{array}\]
color recon loss
- 생성된 이미지와 original image의 color가 유사하도록 강제하는 loss
- RGB 색공간 대신 YUV 색공간을 사용
  - luminance에 해당하는 Y channel에 대해서는 L1 loss를 사용하고
  - color에 해당하는 U, V channel에 대해서는 Huber loss를 사용하여 color에서 차이가 크면 좀 더 penalty를 줌
\[\begin{array}{r}L_{c o l}(G, D)=E_{p_{i} \sim S_{\text {data }}(p)}[\|Y(G(p_{i}))-Y(p_{i})\|_{1}+\|U(G(p_{i}))-U(p_{i})\|_{H}. \\.+\|V(G(p_{i}))-V(p_{i})\|_{H}]\end{array}\]

최종 Generator Loss

\[\begin{array}{r}L(G)=\omega_{a d v} E_{p_{i} \sim S_{\text {data }}(p)}[(G(p_{i})-1)^{2}]+\omega_{c o n} L_{c o n}(G, D) \\+\omega_{g r a} L_{g r a}(G, D)+\omega_{c o l} L_{c o l}(G, D)\end{array}\]

Discriminator Loss

\[\begin{array}{r}L(D)=\omega_{a d v}[E_{a_{i} \sim S_{d a t a}(a)}[(D(a_{i})-1)^{2}]+E_{p_{i} \sim S_{d a t a}(p)}[(D(G(p_{i})))^{2}]. \\.+E_{x_{i} \sim S_{\text {data }}(x)}[(D(x_{i}))^{2}]+0.1 E_{y_{j} \sim S_{\text {data }}(y)}[(D(y_{i}))^{2}]]\end{array}\]

$E_{x_{i} \sim S_{\text {data }}(x)}[(D(x_{i}))^{2}]$ : grayscale adv-loss
$E_{y_{j} \sim S_{\text {data }}(y)}[(D(y_{i}))^{2}]$ : edge-promoting adv-loss
- 생성된 이미지가 좀 더 clear한 이미지를 생성하도록 하는 loss
- 이미지가 너무 sharp 해질까봐 loss 앞에 0.1을 붙였다고 함

Training

GAN 모델은 highly nonlinear하고 random으로 initilization하기 때문에 local optima에 빠지기 쉬움

⇒ CartoonGAN처럼 generator를 먼저 pre-training 한 다음에 fine tuning하는 방법을 채택

Experiments

Dataset

real-world content data: 6656 photos
style images
- 1792 images from the movie “The Wind Rises” (Miyazaki Hayao style)
- 1650 images from the movie “Your Name” (Makoto Shinkai)
- 1553 images from the movie “Paprika” (Kon Satoshi)

Results

모델 사이즈가 훨씬 작음

Loss Function

\[L(G, D)=\omega_{a d v} L_{a d v}(G, D)+\omega_{c o n} L_{c o n}(G, D)+\omega_{g r a} L_{g r a}(G, D)+\omega_{c o l} L_{c o l}(G, D)\]

$\omega_{a d v}=300, \omega_{c o n}=1.5, \omega_{\text {rra }}=3 \text { and } \omega_{c o l}=10$
content loss가 너무 커버리면 생성된 이미지가 real image와 유사해지고, style loss가 너무 크면 생성된 이미지가 original photo의 content를 잃어버리게 됨
- 이 두 loss는 다른 loss들에 비해 커서 small weight를 갖도록 parameter를 조절

color recon loss는 생성된 이미지가 input image의 loss를 따라가도록 강제하는 loss이기 때문에 이 값이 클수록 두 이미지간의 색감은 비슷해짐
$\omega_{a d v}=300, \omega_{col}=10$ 일 때의 결과가 가장 좋음

A: grayscale adv loss없이 실험한 결과
B: edge promoting adv-loss의 이미지로 gray scale image $S_{\text {data }}(y)$가 아닌 blurred color edge images $S_{\text {data }}(e)$를 사용
C: AnimeGAN 결과
A와 C의 결과를 비교해보면, A는 색감에 대한 학습이 잘 안되고 있음

→ grayscale adv loss가 색감을 더 잘 학습하도록 도움
B와 C의 결과 → grayscale image로 edge-promote adv loss를 썼을 때 edge가 더 분명하게 학습이 될 뿐만 아니라 색깔도 더 잘 학습된다.

Main Contributions ⭐️

1) the novel grayscale style loss for transforming the anime style textures and lines; 2) the novel color reconstruction loss to preserve the color of the content images; 3) the novel grayscale adversarial loss for preventing the generated images from being displayed as the grayscale images; 4) a lightweight generator using depth- wise separable convolutions and inverted residual blocks to achieve faster transfer.

Twitter Facebook LinkedIn

Jihye Back

[Paper Review] AnimeGAN: A Novel Lightweight GAN for Photo Animation 논문 리뷰

Architecture

Loss

Training

Experiments

Dataset

Results

공유하기

댓글남기기

참고

[Paper Review] RAFT: Adapting Language Model to Domain Specific RAG 논문 리뷰

[4] RAG (Retriever Augumented Generation)

[3-2] A Survey of Large Language Models - Adaptation of LLMs

[3-1] A Survey of Large Language Models - 다양한 LLMs부터, Data, Architecture, Training 까지