Jihye Back

Jihye Back

꿈꾸는 꼬마 개발자 👩🏻‍💻 (Interested In GenAI)

[Paper Review] StyleGAN of All Trades: Image Manipulation with Only Pretrained StyleGAN 논문 리뷰

업데이트: March 09, 2022

Paper: StyleGAN of All Trades: Image Manipulation with Only Pretrained StyleGAN (arXiv 2021): arxiv, code, hugging face
GAN-Zoos! (GAN 포스팅 모음집)

Pre-trained StyleGAN 모델을 이용해서 이것저것 실험해본 report

StyleGAN에서 i 번째 layer의 intermediate feature를 $f_{i} \in \mathcal{R}^{B \times C \times H \times W}$ 라고 표현

Simple spatial operation

$f_{i}$ 의 spatial dimension을 조절해서 다양한 크기의 이미지를 생성해봤다고 함
이때, Fig 2처럼 다양하게 padding을 줘서 input image를 키워봤는데, padding 방식에 따라 이미지 manipulation이 가능해졌음
관련 논문
- BDInvert: GAN Inversion for Out-of-Range Images with Geometric Transformations
- StyleGAN3: Alias-Free Generative Adversarial Networks

Feature interpolation

StyleGAN 모델에서는 blending 방식을 통해 두 이미지 사이의 새로운 이미지를 생성할 수 있음
- 다만, 만약 blending을 하고자 하는 이미지가 너무 다르게 생겼다면 blending이 잘 안됨
저자들은 blending 말고 feature interpolation을 시도
\[f_{i}=(1-\alpha) f_{i}^{A}+\alpha f_{i}^{B}, \text { where } \alpha \in[0,1]^{B, C, H, W}\]

Generation from a single image

Single image내에서도 interpolation이 가능
어떤 feature layer에서 관련있는 patch를 뽑은 다음에 이를 다른 영역에 blending하는 방식으로 복붙
\[f_{i}=\operatorname{Shift}((1-\alpha)) f_{i}+\operatorname{Shift}\left(\alpha f_{i}\right)\]
SinGAN은 사진마다 모델을 training 시켰어야했는데, 저자들의 방식은 latent code를 다양하게 뽑으면 다양한 사진들을 만들어 낼 수 있어서 더 효용적이라고 주장

또, SinGAN은 좀더 유연하게 이미지를 생성할 수 있지만 Fig 4를 보면 현실적이지 않은 이미지를 생성하고 있음
- 건물모양이랑 구름 모두 이상 (이 문제는 singan 논문에서도 report하고 있음)
논문에서 말하길, 저자들의 방식은 이미지의 다른 부분을 가져와서 추가로 확장한 영역에 붙이는 거기 때문에 조금 더 현실적이라고 함

Improved GAN Inversion

out-of-domain에 있는 이미지는 inversion하기 어려움
Wulff 팀은 W+ space를 Gaussian distribution에서 모델링함으로써 GAN Inversion의 성능을 향상시킴
SoAT의 저자들은 Style space를 Gaussian distribution에서 모델링해서 inversion 성능을 높였다고 함
- 그랬더니 Wulff 팀의 결과보다 좋아졌다

Controllable I2I translation

https://github.com/HideUnderBush/UI2I_via_StyleGAN2 모델에서는 W+ space를 freeze한 다음에 stylegan network blending을 해서 이미지 blending
- 저자들: style space가 조금 더 disentangle하니까 이를 freeze해서 실험을 했다고 함 (결과 Fig 6(a))
Fig 6(b): blending 정도를 continuous하게 조절 가능
Fig 6(c): feature interpolation 가능
- 약간 StyleMapGAN과 비슷한 컨셉

Panorama Generation

Attributes Transfer

Pose가 다른 두 이미지에 대해서는 blending하기가 어려움
저자들은 이 문제를 해결하기 위해 pose alignment를 먼저 맞춘 다음에 blending 했다고 함

공유하기

Twitter Facebook LinkedIn

댓글남기기

참고

[Paper Review] RAFT: Adapting Language Model to Domain Specific RAG 논문 리뷰

June 13 2024

[4] RAG (Retriever Augumented Generation)

June 13 2024

[3-2] A Survey of Large Language Models - Adaptation of LLMs

June 13 2024

[3-1] A Survey of Large Language Models - 다양한 LLMs부터, Data, Architecture, Training 까지

June 13 2024