AI 2.0 (GenAI):
Currently, I build and deploy RLHF systems for multimodal video generation at Kling, covering preference data pipelines, reward modeling, and scalable, stable reinforcement learning.
AI 1.0 (Perception & Recognition):
Previously, I led an algorithm team at Megvii and Jiiov, focusing on visual perception systems, including face, hand, finger, and human-centric 2D & 3D understanding and recognition, deployed on over a billion mobile devices for everyday, real-world use. In parallel, I have extensive research experience, with publications at top-tier AI and vision conferences; see my Google Scholar.
I am looking for self-motivated interns for multimodal understanding and generation research! Drop me an email if you are interested.
News
- [2025.12] The Kling-Omni Technical Report is released. [Link]
- [2025.12] Kling Launch Week is here, featuring Omni, Sound, Avatar, and more! [YouTube]
- [2025.10] GRPO-Guard for tackling over-optimization in flow matching is released. [Project][Paper]
- [2025.09] Three papers have been accepted by NeurIPS 2025!
- [2025.09] VR-Thinker, a reward model for video generation, is released. [Project] [Paper]
- [2025.06] EvoSearch for video test-time scaling is released. [Project][Paper]
- [2025.05] We release Flow-GRPO, the first method to integrate online policy gradient reinforcement learning (RL) into flow matching models. [Project][Paper]
- [2025.05] One paper has been accepted by ICML 2025!
- [2025.04] Kling 2.0 is released! [Link]
- [2025.01] Video-Align is released. We develop a systematic pipeline that leverages human feedback to improve video generation. [Project][Paper]
About me
I am Jiajun Liang. I was born in Zhongshan, Guangdong Province, the hometown of Sun Yat-sen, and I now live in Beijing.
I received my M.S. degree from Tsinghua University in 2017 and my B.E. degree from Huazhong University of Science and Technology in 2014.
My research interests lie in multimodal understanding and generation, with a particular focus on data curation pipelines and reinforcement-learning-based post-training.
Working Experience
- Kuaishou Kling, Multimodal Video Generation (2024 – Present)
- Megvii, Algorithm Director (2016 – 2024)
Selected Publications
▲ Video Generation / RLHF
Flow-GRPO: Training Flow Matching Models via Online RL
J. Liu, G. Liu, Jiajun Liang, Y. Li, J. Liu, X. Wang, P. Wan, D. Zhang, W. Ouyang
GRPO-Guard: Mitigating Implicit Over-Optimization in Flow Matching via Regulated Clipping
J. Wang, Jiajun Liang, J. Liu, H. Liu, G. Liu, J. Zheng, W. Pang, A. Ma, Z. Xie, X. Wang
Improving Video Generation with Human Feedback
J. Liu, G. Liu, Jiajun Liang, Z. Yuan, X. Liu, M. Zheng, X. Wu, Q. Wang, M. Xia, X. Wang
Scaling Image and Video Generation via Test-Time Evolutionary Search
H. He, Jiajun Liang, X. Wang, P. Wan, D. Zhang, K. Gai, L. Pan
VR-Thinker: Boosting Video Reward Models through Thinking-with-Image Reasoning
Q. Wang, J. Liu, Jiajun Liang, Y. Jiang, Y. Zhang, J. Chen, Y. Zheng, X. Wang, P. Wan, X. Yue
▲ Diffusion Models
LEDiT: Your Length-Extrapolatable Diffusion Transformer without Positional Encoding
S. Zhang, S. Liang, Y. Tan, Z. Chen, L. Li, G. Wu, Y. Chen, S. Li, Z. Zhao, C. Chen, Jiajun Liang, Y. Tang
MegActor-Sigma: Unlocking Flexible Mixed-Modal Control in Portrait Animation with Diffusion Transformer
S. Yang, H. Li, J. Wu, M. Jing, L. Li, R. Ji, Jiajun Liang, H. Fan, J. Wang
MegActor: Harnessing Diffusion Models for High-Fidelity Human Animation
S. Yang, H. Li, J. Wu, M. Jing, L. Li, R. Ji, Jiajun Liang, H. Fan
HiDiffusion: Unlocking High-Resolution Creativity and Efficiency in Low-Resolution Trained Diffusion Models
S. Zhang, Z. Chen, Z. Zhao, Y. Tang, Y. Chen, W. Cao, Jiajun Liang
▲ Knowledge Distillation/Efficient AI
Efficient One-Pass Self-Distillation with Zipf’s Label Smoothing
Jiajun Liang, L. Li, Z. Bing, B. Zhao, Y. Tang, B. Lin, H. Fan
Decoupled Knowledge Distillation
B. Zhao, Q. Cui, R. Song, Y. Qiu, Jiajun Liang
Joint Token Pruning and Squeezing Towards More Aggressive Compression of Vision Transformers
S. Wei, T. Ye, S. Zhang, Y. Tang, Jiajun Liang
Cumulative Spatial Knowledge Distillation for Vision Transformers
B. Zhao, R. Song, Jiajun Liang
DOT: A Distillation-Oriented Trainer
B. Zhao, Q. Cui, R. Song, Jiajun Liang
Asymmetric Decision-Making in Online Knowledge Distillation
Z. Chen, B. Zhao, Y. Ge, Y. Chen, R. Song, Jiajun Liang
▲ Vision / Recognition
EAST: An Efficient and Accurate Scene Text Detector
X. Zhou, C. Yao, H. Wen, Y. Wang, S. Zhou, W. He, Jiajun Liang
Implicit Identity Leakage: The Stumbling Block to Improving Deepfake Detection Generalization
S. Dong, J. Wang, R. Ji, Jiajun Liang, H. Fan, Z. Ge
A Simple Baseline for Efficient Hand Mesh Reconstruction
Z. Zhou, S. Zhou, Z. Lv, M. Zou, Y. Tang, Jiajun Liang