Abstract: Diffusion models trained on large-scale text-image datasets have demonstrated a strong capability of con-trollable high-quality image generation from arbitrary text prompts. However, the ...