[Generative Model] Variational Auto-Encoder

업데이트: May 09, 2022

VAE 수식 정리 및 개념 정리

대충 이해한 내용들을 적어놓은 글이라.. 가독성이 떨어질 수 있습니다 😭

Scalable Generative model
- Amortized variational inference
  - 전통적인 variational inference: 기존에 학습했던 데이터들의 정보를 가지고 있는게 아니라, loss function 내에서 gradient를 어떤 방향으로만 업데이트하면 됐기 때문에 데이터가 어떤 것이든 상관이 없었음
  - stochastic gradient에 기반한 backpropagation을 이용한 모델이기 때문에 sequential 하게 학습이 가능하며, 기존에 학습된 모델을 추가 데이터셋으로 weight update가 가능
- Stochastic gradient descent (Back-Propagation)
  - 큰 데이터도 학습 가능
Inference based on probability graphical model
- Continuous latent space
- data reduction / data imputation에 강함
  - 기존의 auto-encoder는 단순히 raw data의 dimension을 그대로 넘겨주는 방식이어서 기존의 데이터와 약간 다른 형식의 데이터를 input으로 주면 문제가 생기나
  - VAE는 확률 자체를 학습하는 방법이기 때문에 noise가 있는 데이터가 들어와도 inference가 잘됨

`AE` VS `VAE`

Auto-Encoder

input $x$ 를 recon하도록 하는 neural net을 훈련하는 auto-encoder 구조
- bottleneck hidden-layer: latent space z, 굉장히 중요한 정보들을 갖고 있는 layer
architecture
- 출처: 서울대 딥러닝 강좌 (윤성로 교수님)
- stochastic encoder / decoder
  - encoder: x 라는 input이 conditioning되어있을 때 이를 바탕으로 latent code를 출력
- 보통 dimension이 큰 input image에서 중요한 feature들을 뽑아서 latent code z에 저장하기 때문에 undercomplete문제라고 볼 수 있음
- 만약 encoder layer (f)와 decoder layer (g)가 linear하고, loss가 mse loss라면, h의 latent space는 PCA의 space라고 볼 수 있음
training
- recon loss
  - mean square loss, cross entropy loss, …
  \[L_{A E}=\|x-y\|^{2}\]
applications
- dimensionality reduction
- feature learning
- generative model의 forefront
regularization
- AE에서는 encoder, decoder의 capacity가 너무나도 크다면, 입력 이미지를 copy하도록 모델이 학습될 수 있음 → 유의미한 data distribution을 학습하진 못하게 됨
- 따라서 AE에 대한 정규화는 필수적
  - 모델의 capacity를 제한하거나
  - 원하는 property를 보다 잘 학습시키도록 loss를 줄 수 있음
- 정규화를 잘하면..
  - noise된 Input이나 학습 때 보지 못한 input을 주어도 robust하게 값을 뽑아낼 수 있게 됨
  - sparse representation
  - representation의 미분값을 작게 할 수 있음

Various AE

Denoising AE(DAE)

input에 noise를 섞어준 AE → Decoder는 denoiser의 역할을 하게 됨
noise는 gaussian noise를 추가해줘도 되고, 입력의 일부를 dropout으로 날려줘도 됨

Sparse AE

reconstruction error
sparsity penalty
\[J(\phi, \boldsymbol{\theta})=\underbrace{L(\boldsymbol{x}, \tilde{\boldsymbol{x}})}_{\begin{array}{c}\text { reconstruction } \\\text { loss }\end{array}}+\underbrace{\Omega(\boldsymbol{h})}_{\begin{array}{c}\text { sparsity } \\\text { loss }\end{array}}\]
- hidden layer의 activation function의 결과값이 대부분 0이 나오도록 penalty를 걸어줌
- hidden layer에서 5% 정도만 활성화되도록 제약을 걸어줘서 오토인코더가 5%의 뉴런의 조합을 사용해서 input을 재구성할 수 있게 함
- 이렇게 되면 AE가 단순히 이미지를 copy & paste하는게 아니라, 각각의 뉴럴넷이 의미있는 정보를 학습할 수 있게 됨
- UFLDL Tutorial – 1. 오토인코더(Sparse Autoencoder) 1 – AutoEncoders & Sparsity

Variational Auto-Encoder

AE는 latent space가 discrete했다면, VAE는 continuous한 확률분포

Auto-Encoder가 단순히 이미지의 차원을 축소하기 위해 만든 것이라면,(Encoder를 학습하기 위해 Decoder를 활용)

Varational AE는 Decoder를 위해 개발된 것으로, input 이미지가 주어지면 이를 확률 분포로 모델링할 수 있도록 encoding을 한다. (다양한 이미지의 sampling이 가능)

출처: https://taeu.github.io/paper/deeplearning-paper-vae/

PipeLine

input image $x_i$ 를 encoder에 넣어 평균과 표준편차를 구하고, 이를 바탕으로 normal distribution을 만들어 z를 sampling
이후 이 z를 decoder에 넣어 output을 복원
Reparameterization Trick: back-prop 과정에서 미분이 가능하도록 trick을 적용

Loss Function

Reconstruction Error
- Cross Entropy를 Loss function으로 사용 (z를 decoder에 넣어 얻은 output이 bernoulli distribution을 따른다고 가정)
Regularization
- encoder를 통과하여 얻은 z의 확률분포 $q(z \mid x)$ 가 original distribution $p(z \mid x)$ 의 정규분포를 따르도록 KL-Divergence를 사용

VAE는 우리가 찾고자하는 data의 distribution을 찾는 것이 목표이다. 즉, $p_{model}(x;\theta)$를 최대화하여 우리가 원하는 distribution을 찾아야하는데, 실제 상황에서 이 확률분포는 구하기 매우 어렵다.

따라서 이를 직접적으로 구하지 않고, variational approximation을 통해 간접적으로 구하게 된다. (variational bound를 maximize하도록)

자세한 내용은 위 그림을 참고

참고할만한 개념

Bayes’ theorem
Variational Inference

만약에 $p(x)$ 라는 target distribution을 근사하고 싶은데 이 확률분포가 complexity가 높고 intractable하여 구하기가 어렵다면, 대신 이와 비슷하지만 simple한 tractable distribution $q(x)$를 찾는 것이 VI이다.
- 이때, 두 distribution이 얼마나 비슷한지는 KL-Divergence를 통해 계산하며 이 값을 minimize함으로써 원하는 distribution $q(x)$를 찾는다.
  \[q^{*}=\operatorname{argmin}_{q \in Q} K L(q \| p)\]
- q는 우리가 임의로 바꿀 수 있는 tractable한 확률분포이기 때문에, 직접적으로 p를 구하는 것보다 q와 p사이의 거리를 최소화하도록 식을 전개함으로써 p와 비슷한 q를 찾는게 편하다.
읽어보면 좋은 글

Optimization

Reparameterization Trick

back-propagation이 가능하도록 표준 정규분포에서 먼저 sampling을 한 후, 식을 약간 바꿔줌

Regularization

Reconstruction Error

Reference

Twitter Facebook LinkedIn

Jihye Back

[Generative Model] Variational Auto-Encoder

`AE` VS `VAE`

Auto-Encoder

Various AE

Variational Auto-Encoder

Optimization

공유하기

댓글남기기

참고

[Paper Review] RAFT: Adapting Language Model to Domain Specific RAG 논문 리뷰

[4] RAG (Retriever Augumented Generation)

[3-2] A Survey of Large Language Models - Adaptation of LLMs

[3-1] A Survey of Large Language Models - 다양한 LLMs부터, Data, Architecture, Training 까지

Jihye Back

AE VS VAE

Auto-Encoder

Various AE

Variational Auto-Encoder

Optimization

공유하기

댓글남기기

참고

[Paper Review] RAFT: Adapting Language Model to Domain Specific RAG 논문 리뷰

[4] RAG (Retriever Augumented Generation)

[3-2] A Survey of Large Language Models - Adaptation of LLMs

[3-1] A Survey of Large Language Models - 다양한 LLMs부터, Data, Architecture, Training 까지

`AE` VS `VAE`