Abstract
Flow Matching (FM) has recently emerged as a principled and efficient alternative to diffusion models. Standard FM encourages the learned velocity field to follow a target direction; however, it may accumulate errors along the trajectory and drive samples off the data manifold, leading to perceptual degradation, especially in lightweight or low-step configurations.
To enhance stability and generalization, we extend FM into a balanced attract–repel scheme that provides explicit guidance on both “where to go” and “where not to go.” Formally, we propose Velocity Contrastive Regularization (VeCoR), a complementary training scheme for flow-based generative modeling that augments the standard FM objective with contrastive, two-sided supervision. VeCoR not only aligns the predicted velocity with a stable reference direction (positive supervision) but also pushes it away from inconsistent, off-manifold directions (negative supervision). This contrastive formulation turns FM from a purely attractive, one-sided objective into a two-sided training signal, regularizing trajectory evolution and improving perceptual fidelity across datasets and backbones. On ImageNet-1K 256×256, VeCoR yields 22% and 35% relative FID reductions on the SiT-XL/2 and REPA-SiT-XL/2 backbones, respectively, and a further 32% relative FID reduction on MS-COCO text-to-image generation, demonstrating consistent improvements in stability, convergence, and image quality, particularly in low-step and lightweight settings.
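To make the attract–repel idea concrete, here is a minimal sketch of what a two-sided velocity loss could look like. It is an illustration under stated assumptions and not the exact VeCoR objective: the straight-line path, the squared-distance form of both terms, the weight `lam`, and the hand-picked negative direction `v_neg` are all hypothetical choices made for this example.

```python
def fm_target(x0, x1):
    """Target velocity for a straight-line path x_t = (1 - t) * x0 + t * x1 (assumed path)."""
    return [b - a for a, b in zip(x0, x1)]


def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def attract_repel_loss(v_pred, v_pos, v_neg, lam=0.05):
    """Hypothetical two-sided loss: pull v_pred toward v_pos, push it away from v_neg.

    lam is an assumed weight on the repulsive term; lam=0 recovers plain FM regression.
    """
    attract = sq_dist(v_pred, v_pos)  # positive supervision ("where to go")
    repel = sq_dist(v_pred, v_neg)    # negative supervision ("where not to go")
    return attract - lam * repel


x0 = [0.0, 0.0]            # noise sample
x1 = [1.0, 0.0]            # data sample
v_pos = fm_target(x0, x1)  # reference direction
v_neg = [0.0, 1.0]         # illustrative inconsistent direction (assumed, not from the paper)

loss_good = attract_repel_loss(v_pos, v_pos, v_neg)  # prediction matches the reference
loss_bad = attract_repel_loss(v_neg, v_pos, v_neg)   # prediction matches the negative
```

In this toy setup, a prediction that matches the reference direction scores lower than one that matches the negative direction, which is the behavior a two-sided signal is meant to encourage. With this particular form, keeping `lam` below 1 leaves the loss a convex quadratic in `v_pred`, so the repulsive term cannot dominate.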
VeCoR: Making SiT More Stable and Realistic
(a) Color/contrast: VeCoR yields a more saturated, uniform sky and wolf hues closer to the ground truth.
(b) Geometric consistency: SiT bends the boat and distorts the lamp shade, while VeCoR produces a level hull and a lamp shade closer to the true shape.
(c) Deblurring/sharpening: previously soft boundaries become crisp.
(d) Artifact removal: SiT hallucinates extraneous structures (e.g., a mechanical arm near the spire; a protrusion above the bird’s beak), whereas VeCoR removes them, restoring clean, plausible shapes and textures.
ImageNet-1K results compared with REPA.
BibTeX
@misc{hong2025vecorvelocitycontrastive,
title={VeCoR - Velocity Contrastive Regularization for Flow Matching},
author={Zong-Wei Hong and Jing-lun Li and Lin-Ze Li and Shen Zhang and Yao Tang},
year={2025},
eprint={2511.18942},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2511.18942},
}