SQA-010

Main idea: 让每个 ODE trajectory 上面每个点都输出对应的数据点, 从而直接一步从噪声生成图片

这里我们使用的记号是 0 时刻是 data, $T$ 时刻是 noise, SDE $\text dx_t = -t \nabla \log p_t(x_t)\text dt$

$t$ 时间的分布等效于 $x+t\epsilon$

Parameterization

我们希望 $f(x_0, 0)=x_0$. 采用的方法是使用 $f(x, t) = c_{\text{skip}}(t)x+c_{\text{out}}F_{\theta}(x, t)$ 然后让 0 时候手动等于 $x_0$

Sampling

除了一步生成, 也可以多步生成, 每次加点噪声. Schedule 自己调.

Distill from a diffusion model $s_{\phi}(x, t)$

先找出离散的时间点 $0=t_1, \cdots, t_N=T$

方法是和 EDM 一样的方法 $\rho=7$

image not found

其中 $\Phi(x, t, \phi) = -ts_{\phi}(x, t)$ 是走一步 ODE (这里对应 euler, 也可以用别的 ODE solver)

Directly train a consistency model

image not found

其中利用了 $\nabla \log p_t(x_t) = \mathbb E\left[\frac{x_t-x}{t^2}\mid x_t\right]$

文中还提到训练的时候 $N$ 应该逐渐加大 (intuition: $N$ 小的时候, consistency training loss more bias but less variance)

Details

对于 CD, 使用 LPIPS (Learned Perceptual Image Patch Similarity) 来计算相似度, 以及训练时候使用 Heun 18 steps 最佳.

对于 CD, 当 $N$ 充分大的时候, 再增加也表现差不多 (一般取到 80 差不多)

对于 CT with continuous time training, helpful if 把模型初始化成 EDM model….. 唐 (猜测是 large variance of loss)

discrete time 随机初始化可以 work

parameterization: 使用 $c_{\text{skip}}(t) = \frac{\sigma_{\text{data}}^2}{t^2+\sigma_{\text{data}}^2}, \quad c_{\text{out}} = \frac{\sigma_{\text{data}}t}{\sqrt{t^2+\sigma_{\text{data}}}^2}$

$N$ & EMA schedule: 有深意

image not found

To do LPIPS, rescale image to 224x224 with bilinear.

Zero-shot image editing

image not found

whether we design $A$ as an invertible matrix, and $\Omega$ as a mask.

具体见 Appendix D