[Paper Review] Cali-Sketch: Stroke Calibration and Completion for High-Quality Face Image Generation from Poorly-Drawn Sketches 논문 리뷰

업데이트:

Cali-Sketch: poorly-drawn-sketch to photo-realistic-image generation

  • Stroke Calibration Network (SCN): abstract한 sketch를 detail하게 만들어주는 network
  • Image Synthesis Network (ISN): refined sketch로부터 이미지를 생성해주는 network

(1) Stroke Calibration Network (SCN)

  • inconsequent stroke를 수정하고 input sketch를 detail하게 수정해주는 network → refined sketch $R$
\[\mathbf{R}=G_{1}\left(\mathbf{S}, \mathbf{E}_{g t}\right)\]
  • $\mathbf{E}{g t}$ : global contours $\mathbf{E}{g c}$ + local details $\mathbf{E}_{ld}$

  • calibration loss $\mathcal{L}_{CL}$
    • global contour loss: inconsequent stroke를 수정
      • HED detection
    • local detail loss: 세부적인 detail을 추가
      • Canny edge map
    • 위 두 loss 모두 feature matching loss를 base로 하고 있음
    \[\mathcal{L}_{C L}=\mathbb{E}\left[\sum_{i=1}^{T} \frac{1}{N_{i}}\left\|D_{1}^{(i)}\left(\mathbf{E}_{g t}[j]\right)-D_{1}^{(i)}(\mathbf{R})\right\|_{1}\right]\]
  • GAN Loss

    \[\begin{aligned}\min _{D_{1}} L_{\mathrm{adv}, \mathrm{SCN}}\left(D_{1}\right)=& \frac{1}{2} \mathbb{E}_{\boldsymbol{x} \sim p(\boldsymbol{x})}\left[\left(D_{1}(\boldsymbol{x})-b\right)^{2}\right]+\\& \frac{1}{2} \mathbb{E}_{\boldsymbol{z} \sim p_{\boldsymbol{z}}(\boldsymbol{z})}\left[\left(D_{1}\left(G_{1}(\boldsymbol{z})\right)-a\right)^{2}\right] \\\min _{G_{1}} L_{\mathrm{adv}, \mathrm{SCN}}\left(G_{1}\right)=& \frac{1}{2} \mathbb{E}_{\boldsymbol{z} \sim p_{z}(\boldsymbol{z})}\left[\left(D_{1}\left(G_{1}(\boldsymbol{z})\right)-c\right)^{2}\right]\end{aligned}\]
    • where a, b, c denotes the labels for fake data and real data and the value that G wants D to believe for fake data respectively
  • Total Loss

    \[\min _{G_{1}} \max _{D_{1}} \mathcal{L}_{G_{1}}=\min _{G_{1}}\left(\max _{D_{1}}\left(\mathcal{L}_{a d v, S C N}\right)+\lambda \mathcal{L}_{C L}\right)\]

(2) Image Synthesis Network (ISN)

  • SCN을 통해 얻은 refined sketch $R$ 를 input으로 삼아 photo-realistic face photo $P$ 를 생성하는 network
\[\mathbf{P}=G_{2}\left(\mathbf{R}, \mathbf{I}_{g t}\right)\]
  • Reconstruction Loss

    \[\mathcal{L}_{\ell_{1}}=\mathbb{E}\left[\sum_{i} \frac{1}{N_{i}}\left\|\mathbf{I}_{g t}-\mathbf{P}\right\|_{1}\right]\]
  • Perceptual Loss
    • VGG-19 pre-trained network로 두 activated feature 사이에 거리를 계산
    \[\mathcal{L}_{\text {percep }}=\mathbb{E}\left[\sum_{i} \frac{1}{N_{i}}\left\|\phi_{i}\left(\mathbf{I}_{g t}\right)-\phi_{i}(\mathbf{P})\right\|_{1}\right]\]
  • Style Loss

    \[\mathcal{L}_{\text {style }}=\mathbb{E}_{j}\left[\left\|G_{j}^{\phi}\left(\mathbf{I}_{g t}\right)-G_{j}^{\phi}(\mathbf{P})\right\|_{1}\right]\]
  • GAN Loss
  • Total variation loss

    \[\mathcal{L}_{\mathrm{tv}}=\left\|\nabla_{x} \mathbf{P}-\nabla_{y} \mathbf{P}\right\|_{1}\]
  • Total Loss

    \[\mathcal{L}_{G_{2}}=\lambda_{1} \mathcal{L}_{\ell_{1}}+\lambda_{2} \mathcal{L}_{a d v, I S N}+\lambda_{3} \mathcal{L}_{\text {percep }}+\lambda_{4} \mathcal{L}_{\text {style }}+\mathcal{L}_{\mathrm{tv}}\]

Training


Result

댓글남기기