[Paper Review] ACGAN: Conditional Image Synthesis with Auxiliary Classifier GANs 간단한 논문 리뷰

업데이트: April 17, 2021

✍🏻 이번 포스팅에서는 class에 따라 이미지를 합성하는 Auxiliary Classfier GANs, ACGAN model에 대해 살펴본다.

Paper : Conditional Image Synthesis With Auxiliary Classifier GANs (ICML 2017 / Augustus Odena, Christopher Olah, Jonathon Shlens)
GAN-Zoos! (GAN 포스팅 모음집)

ACGAN은 CGAN이후에 나온 논문으로, Discriminator가 real/fake를 판별할 뿐만 아니라 class prediction도 같이 학습을 한다.

[CGAN] Conditional Generative Adversarial Nets (2014) : Review

모델의 architecture는 매우 간단하며, 생성 이미지의 다양성을 측정하기 위한 metric 역시 추가적으로 제시한다.

현재에는 잘 사용하지 않는 모델이므로 간단하게 리뷰를 작성하였다😉

1. Introduction

Demonstrate an image synthesis model for all 1000 ImageNet classes at a 128x128 spatial resolution (or any spatial resolution - see Section 3).
Measure how much an image synthesis model actually uses its output resolution (Section 4.1).
Measure perceptual variability and ’collapsing’ behavior in a GAN with a fast, easy-to-compute metric (Section 4.2).
Highlight that a high number of classes is what makes ImageNet synthesis difficult for GANs and provide an explicit solution (Section 4.6).
Demonstrate experimentally that GANs that perform well perceptually are not those that memorize a small number of examples (Section 4.3).
Achieve state of the art on the Inception score metric when trained on CIFAR-10 without using any of the techniques from (Salimans et al., 2016) (Section 4.4).

2. AC-GANs (Auxiliary Classifier GAN)

(출처) Naver AI Lab 최윤제 연구원님 발표자료

ACGAN의

Generator는 noise를 sampling할 때 class label도 같이 sampling한다.

\[X_{\text {fake }}=G(c, z)\]

Discriminator : source와 class label의 확률분포를 바탕으로 학습을 한다.

\[P(S \mid X), P(C \mid X)=D(X)\]

Object Function

log likelihood of the correct source $L_s$

\[\begin{array}{r} L_{S}=E\left[\log P\left(S=\text { real } \mid X_{\text {real }}\right)\right]+E\left[\log P\left(S=\text { fake } \mid X_{\text {fake }}\right)\right] \end{array}\]

log likelihood of the correct class $L_c$

\[\begin{array}{r} L_{C}=E\left[\log P\left(C=c \mid X_{\text {real }}\right)\right]+E\left[\log P\left(C=c \mid X_{\text {fake }}\right)\right] \end{array}\]

D는 $L_s + L_c$를, G는 $L_c - L_s$를 최대화하는 방향으로 학습한다.

real/fake 쪽을 판별하는 부분은 G와 D가 적대적으로 학습을 하지만, class prediction 쪽은 adversarial 하지 않게 학습한다.

또한, 이전 연구들은 class의 수를 늘리면 quality가 줄어들었지만, ACGAN은 class별로 큰 데이터셋을 나눈 후 각 subset에 대해 G와 D를 학습할 수 있기 때문에 안정적이며 quality 역시 괜찮다.

1000개 class의 ImageNet를 학습시킬 때 10개단위로 크게 나눈 후 100개의 AC-GANs을 학습시켰다.

3. Results

본 논문은 ImageNET의 dataset을 가지고 ACGAN을 훈련시킨다.

Generating High Resolution Images Improves Discriminability

ACGAN은 64 x 64와 128 x 128의 resolution을 가진 이미지를 생성한다. 실험을 해보면 고해상도의 이미지는 저해상도의 sample을 단순히 resizing하는 것이 아니라, 실제 이미지의 특성을 더 잘 반영함을 확인할 수 있었다.

고해상도의 이미지를 생성할 수록 discriminability가 증가하였다.

Measuring the Diversity of Generated Images

✍🏻 본 논문에서는 image discriminability를 측정하기 위한 metric인 MS-SSIM metric을 제안하였다. (자세한 설명은 생략)

각 class내의 이미지의 개수가 적다면, 생성모델은 해당 클래스 내의 사진들을 암기할 수도 있다. mode collapse가 생길 수도 있는데, Inception score로는 이를 측정하지 못한다.

따라서 생성모델을 평가하려면 다양성을 측정하는 것이 중요하다. 본 논문에서는 MS-SSIM(0~1)의 method를 사용하여 모델의 다양성을 평가한다. 다양성이 높을수록 MS-SSIM score가 낮게 나온다.

ACGAN에서는 기존의 GAN과는 다르게 높은 해상도의 이미지를 생성할수록 다양한 이미지가 생성된다. (mode collapse에 빠질 확률이 줄어듦)

기존의 SoTA model보다 이미지가 잘 생성된다.

4. Discussion

class에 대한 정보를 제공하면 high-resolution 이미지의 discriminability 훈련이 더 잘므로, 더 다양한 이미지를 생성할 확률이 높다. 본 논문은 100개의 class를 가진 ImageNET dataset에 대해서도 고해상도(128 x 128)의 이미지를 합성할 수 있음을 보여주었다.

Twitter Facebook LinkedIn

Jihye Back

[Paper Review] ACGAN: Conditional Image Synthesis with Auxiliary Classifier GANs 간단한 논문 리뷰

1. Introduction

2. AC-GANs (Auxiliary Classifier GAN)

3. Results

Generating High Resolution Images Improves Discriminability

Measuring the Diversity of Generated Images

4. Discussion

공유하기

댓글남기기

참고

[Paper Review] RAFT: Adapting Language Model to Domain Specific RAG 논문 리뷰

[4] RAG (Retriever Augumented Generation)

[3-2] A Survey of Large Language Models - Adaptation of LLMs

[3-1] A Survey of Large Language Models - 다양한 LLMs부터, Data, Architecture, Training 까지