📚️ Third-year Ph.D. Student in Information Sciences @UIUC.
🔬 My advisor is Prof. Jiaqi Ma.
👨🏽‍💻 Machine Learning Engineer @Intel, 2020-2023.
🏫 M.S. in Information @UMich, 2019-2021.
🏫 B.S. in Electrical and Computer Engineering @SJTU, 2016-2020.
👨🏽‍💻 I also interned @Microsoft in 2025 and @Intel in 2020.
🏡 Now I live in Urbana, Illinois.
💡 I am interested in Data-Centric AI and AI toolkit development.
A Survey of Data Attribution: Methods, Applications, and Evaluation in the Era of Generative AI.
Junwei Deng*, Yuzheng Hu*, Pingbang Hu*, Ting-Wei Li*, Shixuan Liu*, Jiachen T. Wang, Dan Ley, Qirun Dai, Benhao Huang, Jin Huang, Cathy Jiao, Hoang Anh Just, Yijun Pan, Jingyan Shen, Yiwen Tu, Weiyi Wang, Xinhe Wang, Shichang Zhang, Shiyuan Zhang, Ruoxi Jia, Himabindu Lakkaraju, Hao Peng, Weijing Tang, Chenyan Xiong, Jieyu Zhao, Hanghang Tong, Han Zhao, Jiaqi W. Ma
Preprint.
[Preprint]
Exploring Training Data Attribution under Limited Access Constraints.
Shiyuan Zhang*, Junwei Deng*, Juhan Bae, Jiaqi W. Ma
Preprint.
[ArXiv]
Efficient Ensembles Improve Training Data Attribution.
Junwei Deng*, Ting-Wei Li*, Shichang Zhang, Jiaqi W. Ma
Preprint.
[ArXiv]
Computational Copyright: Towards A Royalty Model for Music Generative AI.
Junwei Deng, Shiyuan Zhang, Jiaqi W. Ma
DPFM Workshop at ICLR 2024, Best Paper Award.
[ArXiv]
Selected Publications
Taming Hyperparameter Sensitivity in Data Attribution: Practical Selection Without Costly Retraining.
Weiyi Wang, Junwei Deng, Yuzheng Hu, Shiyuan Zhang, Xirui Jiang, Runting Zhang, Han Zhao, Jiaqi W. Ma
NeurIPS 2025.
[ArXiv]
A Versatile Influence Function for Data Attribution with Non-Decomposable Loss.
Junwei Deng, Weijing Tang, Jiaqi W. Ma
ICML 2025.
[ArXiv]
Adversarial Attacks on Data Attribution.
Xinhe Wang, Pingbang Hu, Junwei Deng, Jiaqi W. Ma
ICLR 2025.
[ArXiv]
dattri: A Library for Efficient Data Attribution.
Junwei Deng*, Ting-Wei Li*, Shiyuan Zhang, Shixuan Liu, Yijun Pan, Hao Huang, Xinhe Wang, Pingbang Hu, Xingjian Zhang, Jiaqi W. Ma
NeurIPS 2024 (Datasets and Benchmarks Track, Spotlight).
[ArXiv] [Project Page]
BigDL 2.0: Seamless Scaling of AI Pipelines from Laptops to Distributed Cluster.
Jason Dai, Ding Ding, Dongjie Shi, Shengsheng Huang, Jiao Wang, Xin Qiu, Kai Huang, Guoqiong Song, Yang Wang, Qiyuan Gong, Jiaming Song, Shan Yu, Le Zheng, Yina Chen, Junwei Deng, Ge Song
CVPR 2022 (Demo Track).
[ArXiv] [Project Page]
Awards
National Scholarship (2018-2019)
Outstanding Student Scholarship (2017, 2018, 2019)
Dean's List (2017, 2018, 2019)
Explorer Scholarship (2020)
Shanghai Outstanding College Graduate (2020)
Talks & Blogs
From Ray to Chronos: Build end-to-end AI use cases using BigDL on top of Ray. [Blog]
BigDL 2.0: Accelerate the process of building large-scale AI applications on Spark. [Recording (in Chinese)]