Zhilin Yang


I am a co-founder of Recurrent AI and an assistant professor at Tsinghua University.

The ultimate goal of all my work, spanning both research and business, is to maximize the value of artificial intelligence. In particular, my team and I are working toward general cognitive intelligence with natural language as a key interface: human users interact with a machine through natural language to perform a wide range of cognitive tasks. We not only formulate and solve general research problems but also build products and solutions. I believe that breaking the barriers between research, engineering, and business is a crucial path toward this goal.

In 2019, I obtained my PhD from the School of Computer Science at Carnegie Mellon University, advised by Ruslan Salakhutdinov and William W. Cohen. Prior to that, in 2015, I received my bachelor's degree from Tsinghua University, advised by Jie Tang. I have also worked at Meta AI with Jason Weston and at Google Brain with Quoc V. Le.

I am a Forbes Asia 30 Under 30 awardee, an Nvidia Fellow, a Siebel Scholar, and a BAAI Young Scientist. My projects received the Nvidia Pioneering Research Award, the Facebook ParlAI Research Award, and the Yunfa Award at the World AI Conference.

Prospective undergrad/graduate interns: Our group at Tsinghua University is actively recruiting undergraduate and graduate research interns. Feel free to email me with inquiries.

Prospective PhDs: For PhD admissions, I usually take top undergraduate students from the IIIS summer camp held each June/July. If you are an undergraduate interested in doing a PhD with me, please apply to the summer camp in your third year. Statistically, undergraduates who have worked with me for at least four months have a higher chance of being accepted as my PhD students.

Prospective master's: Unfortunately, I am not taking master's students.

Reference letters: I write reference letters for students who have worked with me for at least four months.

Reach out to me at A@B, where A=zhiliny and B=tsinghua.edu.cn.

[Google Scholar] [GitHub]



We released CodeGeeX, a multilingual code generation model.

TLM (NLP from scratch without pretraining) accepted at ICML 2022.
[PDF] [Code]

GLM (general language models) accepted at ACL 2022.
[PDF] [Code]

P-Tuning v2 accepted at ACL 2022.
[PDF] [Code]

FlipDA (effective and robust data augmentation) accepted at ACL 2022.
[PDF] [Code]

FewNLU (how to benchmark few-shot learning systems) accepted at ACL 2022.


NLP From Scratch Without Large-Scale Pretraining: A Simple and Efficient Framework

Xingcheng Yao, Yanan Zheng, Xiaocong Yang, Zhilin Yang
ICML 2022
[PDF] [Code]

GLM: General Language Model Pretraining with Autoregressive Blank Infilling

Zhengxiao Du, Yujie Qian, Xiao Liu, Ming Ding, Jiezhong Qiu, Zhilin Yang, Jie Tang
ACL 2022
[PDF] [Code]

P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks

Xiao Liu, Kaixuan Ji, Yicheng Fu, Weng Lam Tam, Zhengxiao Du, Zhilin Yang, Jie Tang
ACL 2022
[PDF] [Code]

FewNLU: Benchmarking State-of-the-Art Methods for Few-Shot Natural Language Understanding

Yanan Zheng, Jing Zhou, Yujie Qian, Ming Ding, Chonghua Liao, Jian Li, Ruslan Salakhutdinov, Jie Tang, Sebastian Ruder, Zhilin Yang
ACL 2022
[PDF] [Site]

FlipDA: Effective and Robust Data Augmentation for Few-Shot Learning

Jing Zhou, Yanan Zheng, Jie Tang, Jian Li, Zhilin Yang
ACL 2022
[PDF] [Code]

Controllable Generation from Pre-trained Language Models via Inverse Prompting

Xu Zou, Da Yin, Qingyang Zhong, Ming Ding, Hongxia Yang, Zhilin Yang, Jie Tang
KDD 2021
[PDF] [Code]

Distribution Matching for Rationalization

Yongfeng Huang, Yujun Chen, Yulun Du, Zhilin Yang
AAAI 2021
[PDF] [Code]

XLNet: Generalized Autoregressive Pretraining for Language Understanding

Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le
NeurIPS 2019 (*: equal contribution)
Oral, acceptance rate 0.5%

Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov
ACL 2019 (*: equal contribution)

Mixtape: Breaking the Softmax Bottleneck Efficiently

Zhilin Yang, Thang Luong, Ruslan Salakhutdinov, Quoc V. Le
NeurIPS 2019

HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering

Zhilin Yang*, Peng Qi*, Saizheng Zhang*, Yoshua Bengio, William W. Cohen, Ruslan Salakhutdinov, Christopher D. Manning
EMNLP 2018 (*: equal contribution)

GLoMo: Unsupervised Learning of Transferable Relational Graphs

Zhilin Yang*, Jake Zhao*, Bhuwan Dhingra, Kaiming He, William W. Cohen, Ruslan Salakhutdinov, Yann LeCun
NeurIPS 2018 (*: equal contribution)
[PDF] [Code]

Neural Cross-Lingual Named Entity Recognition with Minimal Resources

Jiateng Xie, Zhilin Yang, Graham Neubig, Noah A. Smith, Jaime Carbonell
EMNLP 2018

Neural Models for Reasoning over Multiple Mentions using Coreference

Bhuwan Dhingra, Qiao Jin, Zhilin Yang, William W. Cohen, Ruslan Salakhutdinov
NAACL 2018, short paper

Breaking the Softmax Bottleneck: A High-Rank RNN Language Model

Zhilin Yang*, Zihang Dai*, Ruslan Salakhutdinov, William W. Cohen
ICLR 2018 (*: equal contribution)
Oral, acceptance rate 2%
[PDF] [Code]

Mastering the Dungeon: Grounded Language Learning by Mechanical Turker Descent

Zhilin Yang, Saizheng Zhang, Jack Urbanek, Will Feng, Alexander H. Miller, Arthur Szlam, Douwe Kiela, Jason Weston
ICLR 2018

Good Semi-supervised Learning that Requires a Bad GAN

Zihang Dai*, Zhilin Yang*, Fan Yang, William W. Cohen, Ruslan Salakhutdinov
NIPS 2017 (*: equal contribution)
[PDF] [Code]

Differentiable Learning of Logical Rules for Knowledge Base Reasoning

Fan Yang, Zhilin Yang, William W. Cohen
NIPS 2017
[PDF] [Code]

Linguistic Knowledge as Memory for Recurrent Neural Networks

Bhuwan Dhingra, Zhilin Yang, William W. Cohen, Ruslan Salakhutdinov
Preprint 2017

Semi-Supervised QA with Generative Domain-Adaptive Nets

Zhilin Yang, Junjie Hu, Ruslan Salakhutdinov, William W. Cohen
ACL 2017
[PDF] [Data]

Gated-Attention Readers for Text Comprehension

Bhuwan Dhingra, Hanxiao Liu, Zhilin Yang, William W. Cohen, Ruslan Salakhutdinov
ACL 2017
[PDF] [Code]

Transfer Learning for Sequence Tagging with Hierarchical Recurrent Networks

Zhilin Yang, Ruslan Salakhutdinov, William W. Cohen
ICLR 2017
[PDF] [Code]

Words or Characters? Fine-grained Gating for Reading Comprehension

Zhilin Yang, Bhuwan Dhingra, Ye Yuan, Junjie Hu, William W. Cohen, Ruslan Salakhutdinov
ICLR 2017
[PDF] [Code]

Review Networks for Caption Generation

Zhilin Yang, Ye Yuan, Yuexin Wu, Ruslan Salakhutdinov, William W. Cohen
NIPS 2016

Multi-Task Cross-Lingual Sequence Tagging from Scratch

Zhilin Yang, Ruslan Salakhutdinov, William Cohen
Preprint 2016

Revisiting Semi-Supervised Learning with Graph Embeddings

Zhilin Yang, William Cohen, Ruslan Salakhutdinov
ICML 2016

Multi-Modal Bayesian Embeddings for Learning Social Knowledge Graphs

Zhilin Yang, Jie Tang, William Cohen
IJCAI 2016
Our model is deployed on AMiner for extracting research interests.

Collaborative Embedding Features and Diversified Ensemble for E-Commerce Repeat Buyer Prediction

Zhanpeng Fang*, Zhilin Yang*, Yutao Zhang
IJCAI Workshop 2015 (*: equal contribution)
Invited paper, competition winners.
[PDF] [Slides]

COSNET: Connecting Heterogeneous Social Networks with Local and Global Consistency

Yutao Zhang, Jie Tang, Zhilin Yang, Jian Pei, Philip Yu
KDD 2015
Oral presentation, acceptance rate 19%.
[PDF] [Slides]

Active Learning for Streaming Networked Data

Zhilin Yang, Jie Tang, Yutao Zhang
CIKM 2014
Full paper, acceptance rate 21%.

Active Learning for Networked Data Based on Non-Progressive Diffusion Model

Zhilin Yang, Jie Tang, Bin Xu, Chunxiao Xing
WSDM 2014
Full-length oral presentation (5%), acceptance rate 18%.

SAE: Social Analytic Engine for Large Networks

Yang Yang, Jianfei Wang, Yutao Zhang, Wei Chen, Jing Zhang, Honglei Zhuang, Zhilin Yang, Bo Ma, Zhanpeng Fang, Sen Wu, Xiaoxiao Li, Debing Liu, Jie Tang
KDD Demo 2013