[Paper Review] CLIPDraw & StyleCLIPDraw 논문 리뷰

업데이트: February 21, 2022

CLIPDraw

Paper: CLIPDraw: Exploring Text-to-Drawing Synthesis through Language-Image Encoders (arxiv 2021): arxiv, code

별도의 훈련 없이 pre-trained CLIP language-image encoder를 사용하여 text로부터 이미지를 생성하는 모델

description과 generated drawing간의 smilarity가 최대화되도록 optimize

CLIP model
- image encoder와 text encoder, 이 2가지의 network로 구성되며
- 이들은 512 dim의 encoding space를 공유한다

Method
1. Initialize Curves: N개의 RGBA Bezier curves를 random으로 initialize
  - 각 curve들은 thickness, RGBA color vector와 함께 3~5개의 control point로 parameterize된다고 함
  - 처음에는 curve들이 random으로 initialize
  - 흰 배경, default color
  - optimization 동안 curve와 control point의 수는 고정, thickness와 color vector는 GD를 통해 optimize될 수 있음
2. Render Curves to Pixels: 매 iteration마다 curve를 pixel image로 rendering (by differentiable vector render)
3. Augment the Image: 2의 이미지를 D번 augmentation
  - random perspective shift
  - random crop-and-resize
4. Encoding Image: 3의 augmented image batch를 CLIP model로 encoding
5. Compute Loss: text encoding값과의 cosine similarity 계산
6. Backprop
7. 2,3,4,5,6 반복

Result

StyleCLIPDraw

Paper: StyleCLIPDraw: Coupling Content and Style in Text-to-Drawing Synthesis (NeurIPS workshop 2021): arxiv, code

CLIPDraw에 style loss를 추가해서 style control까지 가능하도록 한 모델

CLIPDraw의 결과물에 style-transfer를 하면 단순히 texture만 포함이 되지만

StyleCLIPDraw를 통해 이미지를 생성하면 style image의 texture와 shape이 반영된 이미지 생성이 가능하다.

Method
1. Initialize Curves & Render Curves to Pixels: Differentiable Renderer로 brush strokes(Bezier curve)를 raster image로 rendering
  - StyleCLIPDraw는 CLIPDraw와 마찬가지로 randomized Bezier curves를 initialize하여 시작한다.
2. Content Loss
  1. image를 augmentation
  2. CLIPDraw에서처럼 CLIP model을 활용하여 augmented image와 text를 embedding한 후 cosine distance로 loss를 계산한다.
3. StyleLoss
  - VGG-16 model로 raster image와 style image사이의 loss를 계산

Result

Twitter Facebook LinkedIn

Jihye Back

[Paper Review] CLIPDraw & StyleCLIPDraw 논문 리뷰

CLIPDraw

StyleCLIPDraw

공유하기

댓글남기기

참고

[Paper Review] RAFT: Adapting Language Model to Domain Specific RAG 논문 리뷰

[4] RAG (Retriever Augumented Generation)

[3-2] A Survey of Large Language Models - Adaptation of LLMs

[3-1] A Survey of Large Language Models - 다양한 LLMs부터, Data, Architecture, Training 까지