VeCoR — Velocity Contrastive Regularization for Flow Matching

Zong-Wei Hong, Jing-lun Li, Lin-ze Li, Shen Zhang, Yao Tang

JIIOV Technology
arXiv

Supervision and trajectory behavior.

Left, standard FM (SFM): only positive supervision → off-manifold drift.

Right, VeCoR: contrastive negative supervision → more stable outputs.

Abstract

Flow Matching (FM) has recently emerged as a principled and efficient alternative to diffusion models. Standard FM encourages the learned velocity field to follow a target direction; however, errors can accumulate along the trajectory and drive samples off the data manifold, degrading perceptual quality, especially in lightweight or low-step configurations.

To improve stability and generalization, we extend FM into a balanced attract–repel scheme that provides explicit guidance on both "where to go" and "where not to go." Formally, we propose Velocity Contrastive Regularization (VeCoR), a complementary training scheme for flow-based generative modeling that augments the standard FM objective with contrastive, two-sided supervision. VeCoR not only aligns the predicted velocity with a stable reference direction (positive supervision) but also pushes it away from inconsistent, off-manifold directions (negative supervision). This contrastive formulation turns FM from a purely attractive, one-sided objective into a two-sided training signal, regularizing trajectory evolution and improving perceptual fidelity across datasets and backbones. On ImageNet-1K 256×256, VeCoR yields 22% and 35% relative FID reductions on the SiT-XL/2 and REPA-SiT-XL/2 backbones, respectively, and achieves a further 32% relative FID reduction on MS-COCO text-to-image generation, demonstrating consistent improvements in stability, convergence, and image quality, particularly in low-step and lightweight settings.
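The abstract describes the attract–repel idea only at a high level. As a rough illustration of what a two-sided velocity objective can look like, here is a minimal NumPy sketch, not the paper's formulation or code. It assumes a linear interpolation path x_t = (1 - t) x0 + t x1 with target velocity x1 - x0 (which end of the path is noise varies between codebases), builds hypothetical negative directions by pairing each sample with another sample's target velocity in the batch, and bounds the repulsive term with a hinge. The names `fm_pair`, `contrastive_velocity_loss`, `lam`, and `margin` are illustrative assumptions.

```python
import numpy as np


def fm_pair(x0, x1, t):
    """Linear-path interpolant x_t and its target velocity (assumed convention).

    x0: noise samples, x1: data samples, both shape (B, D); t: shape (B,) in [0, 1].
    """
    t = t[:, None]  # broadcast each sample's time over the feature dimension
    x_t = (1.0 - t) * x0 + t * x1
    u_pos = x1 - x0  # positive ("where to go") reference direction
    return x_t, u_pos


def contrastive_velocity_loss(v_pred, u_pos, u_neg, lam=0.5, margin=1.0):
    """Attract v_pred to u_pos and repel it from u_neg (hypothetical hinge form).

    Returns (total, attract, repel); each term is averaged over the batch.
    """
    attract = np.mean(np.sum((v_pred - u_pos) ** 2, axis=1))
    d_neg = np.sum((v_pred - u_neg) ** 2, axis=1)
    # Hinge: penalize only predictions closer than `margin` to the negative direction,
    # so the repulsive term stays bounded.
    repel = np.mean(np.maximum(0.0, margin - d_neg))
    return attract + lam * repel, attract, repel


# Toy usage with random stand-in data (not a real model or dataset).
rng = np.random.default_rng(0)
B, D = 8, 4
x1 = rng.normal(size=(B, D))  # stand-in "data" batch
x0 = rng.normal(size=(B, D))  # Gaussian noise batch
t = rng.uniform(size=B)
x_t, u_pos = fm_pair(x0, x1, t)
# Hypothetical negatives: each sample is paired with another sample's target velocity.
u_neg = np.roll(u_pos, shift=1, axis=0)
# In real training this would be v_pred = model(x_t, t); a noisy copy of u_pos stands in.
v_pred = u_pos + 0.1 * rng.normal(size=(B, D))
total, attract, repel = contrastive_velocity_loss(v_pred, u_pos, u_neg)
print(f"total={total:.4f} attract={attract:.4f} repel={repel:.4f}")
```

The hinge is a design choice for the sketch: a plain negative squared distance would let the loss decrease without limit as the predicted velocity grows. How VeCoR actually constructs its negative directions and weights the two terms should be taken from the paper rather than from this example.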

VeCoR: Making SiT More Stable and Realistic

(a) Color/contrast: VeCoR yields a more saturated, uniform sky and wolf hues closer to the ground truth.

(b) Geometric consistency: SiT bends the boat and distorts the lamp shade, while VeCoR produces a level hull and a lamp shade closer to the true shape.

(c) Deblurring/sharpening: boundaries that SiT renders soft become crisp with VeCoR.

(d) Artifact removal: SiT hallucinates extraneous structures (e.g., a mechanical arm near the spire; a protrusion above the bird’s beak), whereas VeCoR removes them, restoring clean, plausible shapes and textures.

Framework of the proposed method, VeCoR.

ImageNet-1K results compared to REPA.

Text-to-image results compared to REPA.

BibTeX

@misc{hong2025vecorvelocitycontrastive,
      title={VeCoR - Velocity Contrastive Regularization for Flow Matching}, 
      author={Zong-Wei Hong and Jing-lun Li and Lin-Ze Li and Shen Zhang and Yao Tang},
      year={2025},
      eprint={2511.18942},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2511.18942}, 
}

More Works from Our Lab

RDPN6D: Residual-based Dense Point-wise Network for 6Dof Object Pose Estimation Based on RGB-D Images

Improving Facial Landmark Detection Accuracy and Efficiency with Knowledge Distillation

Domain-Generalized Face Anti-Spoofing with Unknown Attacks