Yiping Lu

Assistant Professor of Industrial Engineering and Management Sciences

Contact

2145 Sheridan Road
Tech M237
Evanston, IL 60208-3109

Email Yiping Lu

Website

Yiping Lu's Website


Departments

Industrial Engineering and Management Sciences



Download CV

Education

Ph.D., Applied and Computational Mathematics, Stanford University

B.S., Computational Mathematics, Peking University


Research Interests

A detailed research statement is available at https://2prime.github.io/files/rs.pdf


My research focuses on scaling laws in machine learning—understanding when, why, and how machine-learning systems improve predictably as we scale resources, including data, model size, optimization effort, and inference-time computation.

In large language models, scaling has become a remarkably reliable principle: performance improves smoothly as we increase compute and data, often following simple power laws. This predictability fundamentally changes how we design learning systems, making it possible to forecast performance and allocate resources optimally. However, this kind of reliable scaling behavior is far from universal. In many settings—especially when models interact with complex structure, constraints, or long-horizon dynamics—scaling breaks down: optimization becomes unstable, hyperparameters stop transferring across model sizes, and accuracy plateaus despite increased compute.
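For concreteness, the empirical laws in this regime are often summarized by a simple parametric form (an illustrative, standard parameterization, not a result taken from the publications below), with L the test loss, C the training compute, L_\infty an irreducible error floor, and A, \alpha > 0 fitted constants:

    L(C) \approx L_\infty + A\,C^{-\alpha}

Under such a fit, each doubling of compute multiplies the reducible part of the loss by 2^{-\alpha}, which is exactly what makes performance forecasting and principled resource allocation possible.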

My research aims to build a general theory and algorithmic framework for scalable learning, in which increased resources provably and reliably lead to better performance. Rather than treating scaling as an empirical phenomenon, I study it as a principled question of optimization geometry, statistical complexity, and resource allocation.


1. Why Do Some Models Scale—and Others Don’t?

A central question in my work is understanding what fundamentally limits scaling. I study how approximation error, optimization difficulty, and statistical uncertainty interact as models grow wider, deeper, or are trained with more data. This includes identifying error floors, characterizing regimes where scaling laws hold, and explaining why naive scaling often fails. The goal is to move beyond ad-hoc heuristics and develop predictive scaling theories that apply across model classes.
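A useful reference point for this interaction (a textbook-style identity, stated here only for orientation) splits the excess risk of the trained model \hat f over the best possible predictor f^* into three terms:

    R(\hat f) - R(f^*)
      = \underbrace{R(\hat f) - R(\hat f_n)}_{\text{optimization}}
      + \underbrace{R(\hat f_n) - R(f_{\mathcal F})}_{\text{statistical}}
      + \underbrace{R(f_{\mathcal F}) - R(f^*)}_{\text{approximation}}

where \hat f_n is the empirical risk minimizer and f_{\mathcal F} is the best predictor in the model class \mathcal F. An error floor appears, and scaling stalls, as soon as one of these terms stops shrinking with additional width, depth, or data.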


2. Scaling-Aware Optimization and Geometry

As model size increases, the geometry of the loss landscape changes in ways that strongly affect optimization. Learning rates and optimizer hyperparameters that work well at small scale often fail at large scale. I study optimization from a geometric perspective, viewing modern optimizers as instances of steepest descent under different norms. This leads to new scaling-aware optimization methods whose convergence behavior and hyperparameter choices remain stable as width and depth increase. Ultimately, I aim to design optimizers whose performance scales smoothly with model size, rather than deteriorating.
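In its most stripped-down form (a standard formulation of this viewpoint, with the norm left abstract), the steepest-descent picture reads:

    \Delta w = \arg\min_{\|\Delta\| \le \eta} \, \langle \nabla f(w), \Delta \rangle

Choosing the Euclidean norm recovers ordinary gradient descent; the \ell_\infty norm gives sign descent, which is closely related to Adam-style updates; and an operator (spectral) norm on each weight matrix yields orthogonalized, Muon-style updates. Which norm is appropriate, and how the radius \eta should scale with width and depth, is precisely where the geometry enters.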


3. Inference-Time Scaling: Trading Compute for Accuracy

Beyond training-time scaling, I am particularly interested in inference-time scaling—improving model performance by allocating more computation after training, without changing model parameters. Inspired by ideas from Monte Carlo simulation, control, and sequential decision-making, my work develops methods that use additional inference-time compute to detect, correct, and reduce model error on the fly. This establishes inference-time computation as a first-class scaling axis, alongside data and model size, and provides a principled way to trade compute for reliability.
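The flavor of this idea can be seen in a toy control-variate sketch (a generic illustration in Python, not an implementation of any method from the publications below; the expensive target f and the cheap surrogate g are hypothetical stand-ins): a small budget of expensive evaluations estimates the surrogate's error, while a large budget of cheap surrogate calls pins down the rest.

import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # "Expensive" quantity of interest (hypothetical stand-in for a costly
    # simulation or model evaluation); we want mu = E[f(X)] for X ~ Uniform(0, pi).
    return np.exp(np.sin(x))

def g(x):
    # "Cheap" learned surrogate that only roughly tracks f.
    s = np.sin(x)
    return 1.0 + s + 0.5 * s**2

n_expensive = 500        # small budget of expensive f-evaluations
n_cheap = 500_000        # large budget of cheap surrogate evaluations

x_exp = rng.uniform(0.0, np.pi, n_expensive)
x_cheap = rng.uniform(0.0, np.pi, n_cheap)

# Baseline: plain Monte Carlo using only the expensive samples.
plain_estimate = f(x_exp).mean()

# Control-variate estimator: mu = E[g(X)] + E[f(X) - g(X)].
# The first term is nailed down with many cheap surrogate calls; the second,
# the surrogate's error, is estimated from the few expensive calls and has
# low variance whenever g stays close to f.
corrected_estimate = g(x_cheap).mean() + (f(x_exp) - g(x_exp)).mean()

print(f"plain Monte Carlo     : {plain_estimate:.5f}")
print(f"surrogate + correction: {corrected_estimate:.5f}")

The corrected estimator stays unbiased no matter how rough the surrogate is; a better surrogate simply shrinks the variance, which is the sense in which additional inference-time compute buys reliability.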


Selected Publications

1. A Simple Unbiased Guidance for Diffusion Models via Sequential Monte Carlo on Path Measures, to be submitted to ICML 2026

2. On the Width Scaling of Neural Optimizers Under Matrix Operator Norms I: Making Operators Play Nice Together, to be submitted to Mathematics of Operations Research

3. Ruihan Xu, Yiping Lu, What is a Sketch-and-Precondition Derivation for Low-Rank Approximation? Inverse Power Error or Inverse Power Estimation?, arXiv:2502.07993

4. Zexi Fan, Yan Sun, Shihao Yang, and Yiping Lu*. Physics-Informed Inference Time Scaling via Simulation-Calibrated Scientific Machine Learning, arXiv:2504.16172

5. Zihan Wang, Kangrui Wang, Qineng Wang, Pingyue Zhang, Linjie Li, Zhengyuan Yang, Kefan Yu, Minh Nhat Nguyen, Licheng Liu, Eli Gottlieb, Monica Lam, Yiping Lu, Kyunghyun Cho, Jiajun Wu, Li Fei-Fei, Lijuan Wang, Yejin Choi, Manling Li, RAGEN: Training Agents by Reinforcing Reasoning, arXiv:2504.20073

6. Yiping Lu, Daozhe Lin, Qiang Du, Which Space can be Embedded into a Lp-Type Reproducing Kernel Banach Space? A Characterization via Metric Entropy, arXiv:2410.11116

7. Yiping Lu, Jiajin Li, Lexing Ying, Jose Blanchet. Synthetic Principal Component Design: Fast Experiment Design with Synthetic Control, 40th Conference on Uncertainty in Artificial Intelligence, 2024 (Oral)

8. Yihang Chen, Fanghui Liu, Yiping Lu, Grigorios Chrysos, Volkan Cevher. Generalization Guarantees of Deep ResNets in the Mean-Field Regime, International Conference on Learning Representations (ICLR) 2024 (Spotlight)

9. Jose Blanchet, Haoxuan Chen, Yiping Lu, Lexing Ying. When can Regression-Adjusted Control Variate Help? Rare Events, Sobolev Embedding and Minimax Optimality, Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS 2023). (alphabetical order)

10. Yiping Lu*, Wenlong Ji*, Zach Izzo*, Lexing Ying, Importance Tempering: Group Robustness for Overparameterized Models, arXiv:2209.08745

11. Jikai Jin, Yiping Lu, Jose Blanchet, Lexing Ying. Minimax Optimal Kernel Operator Learning via Multilevel Training, Eleventh International Conference on Learning Representations (ICLR 2023). (Spotlight)

12. Yiping Lu, Haoxuan Chen, Jianfeng Lu, Lexing Ying, Jose Blanchet. Machine Learning For Elliptic PDEs: Fast Rate Generalization Bound, Neural Scaling Law and Minimax Optimality, International Conference on Learning Representations (ICLR), 2022

13. Wenlong Ji, Yiping Lu, Yiliang Zhang, Zhun Deng, Weijie J. Su. How Gradient Descent Separates Data with Neural Collapse: A Layer-Peeled Perspective, International Conference on Learning Representations (ICLR), 2022

14. Yiping Lu*, Zhuohan Li*, Di He, Zhiqing Sun, Bin Dong, Tao Qin, Liwei Wang, Tie-Yan Liu. Understanding and Improving Transformer Architecture From the View of Multi-particle Dynamic System. ICLR 2020 Workshop on Integration of Deep Neural Models and Differential Equations. (alphabetical order)

15. Yiping Lu, Chao Ma, Yulong Lu, Jianfeng Lu, Lexing Ying. A Mean-field Analysis of Deep ResNet and Beyond: Towards Provable Optimization Via Overparameterization From Depth. Thirty-seventh International Conference on Machine Learning (ICML), 2020.

16. Zichao Long*, Yiping Lu*, Xianzhong Ma*, Bin Dong. PDE-Net: Learning PDEs From Data, Thirty-fifth International Conference on Machine Learning (ICML), 2018. (*equal contribution)

17. Yiping Lu*, Aoxiao Zhong, Quanzheng Li, Bin Dong. Beyond Finite Layer Neural Networks: Bridging Deep Architectures and Numerical Differential Equations, Thirty-fifth International Conference on Machine Learning (ICML), 2018.

18. Xiaoshuai Zhang*, Yiping Lu*, Jiaying Liu, Bin Dong. Dynamically Unfolding Recurrent Restorer: A Moving Endpoint Control Method for Image Restoration, International Conference on Learning Representations (ICLR), 2019. (*equal contribution)

19. Dinghuai Zhang*, Tianyuan Zhang*, Yiping Lu*, Zhanxing Zhu, Bin Dong. You Only Propagate Once: Painless Adversarial Training Using Maximal Principle. 33rd Annual Conference on Neural Information Processing Systems (NeurIPS), 2019. (*equal contribution)