Reflections on My 2025 Fall Application Season
This post comes rather late — mostly because I procrastinated, partly because I needed distance to understand it all. Looking back, the 2025 application season was a strangely emotional yet enlightening ride.

Background and Outcome

I work on video world models, with previous experience in LLMs, agents, and interpretability. I did my undergrad in Computer Science (IEEE Honor Class) at Shanghai Jiao Tong University and applied to both PhD and Master’s programs. ...
How to Write a Good Letter of Recommendation?
When requesting a recommendation letter from professors, it’s increasingly common to be asked for a timeline, a summary of your work, or even a draft of the letter itself. Professors are often extremely busy—you might be surprised by the number of emails they handle daily—so such a request is generally reasonable (though some professors still dedicate time to writing the letter entirely themselves). In fact, this can be seen as a good opportunity: it gives you more control over the content of your recommendation. As a result, it’s important for students to learn how to write a recommendation letter, even if only as a draft. ...
Paper Reading: Cheating Popular LLM Benchmarks
Anti-cheating has long been a critical consideration when designing the rules for leaderboards, but this remains unexplored in the context of LLM benchmarks (Zheng et al., 2024, “Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates”).

Introduction

There are many well-known LLM benchmarks, such as AlpacaEval 2.0 (Dubois et al., 2024), Arena-Hard-Auto (Li et al., 2024), and MT-Bench (Zheng et al., 2023). They are widely used in the research community to evaluate the performance of LLMs. ...