Zhilin Yang


I am a fourth- and final-year PhD student at the School of Computer Science, Carnegie Mellon University.

I work on deep learning and natural language understanding.
My advisor is Ruslan Salakhutdinov. I have also worked with William W. Cohen.

I have interned at Facebook AI Research with Jason Weston, and at Google Brain with Quoc V. Le.
I am an Nvidia Fellow and a Siebel Scholar. I also received the Nvidia Pioneering Research Award and Facebook ParlAI Research Award.

We hold state-of-the-art results on all six major language modeling datasets (One Billion Word, WikiText-103, WikiText-2, Penn Treebank, enwik8, and text8) at the same time (as of Jan 2019)!
Check out our papers: Transformer-XL and Mixture of Softmaxes.

Prior to coming to CMU, I was an undergrad at Tsinghua University, advised by Jie Tang.

You can contact me through A@B, where A=zhiliny and B=cs.cmu.edu.

[Google Scholar] [GitHub]



Transformer-XL updated and released, along with code and pretrained models. SOTA on enwik8, text8, One Billion Word, and WikiText-103.

HotpotQA dataset released, along with code, blog posts, and a website.

Paper accepted at NIPS 2018.

Two papers accepted at EMNLP 2018.

Starting my internship at Google Brain, mentored by Quoc V. Le.

Paper on unsupervised relational graph learning released.

One oral (acceptance rate 2%) and one poster at ICLR 2018.

Paper on Mechanical Turker Descent released.

Paper and code on language modeling released. SOTA on Penn Treebank and WikiText-2.
[PDF] [Code]

Code on GAN-based semi-supervised learning released. SOTA on MNIST, SVHN, and CIFAR-10 with standard architectures.

Two papers accepted at NIPS 2017.

Code and data on Transfer Learning released.

05–08/2017
Internship at Facebook AI Research, mentored by Jason Weston.

Dataset on Semi-Supervised QA released.

Paper on Semi-Supervised Learning with Bad GANs released. SOTA on MNIST, SVHN, and CIFAR-10 with standard architectures.

Paper on Differentiable Rule Learning updated. SOTA on Wordnet, Freebase, and WikiMovies.

Paper on Semi-Supervised QA with Generative Domain-Adaptive Nets accepted at ACL 2017.

Paper on Gated-Attention Readers accepted at ACL 2017.

Two papers accepted at ICLR 2017.


Carnegie Mellon University

PhD student, Computer Science, 2015–present

Tsinghua University

BE, Computer Science, 2011–2015

Major GPA 96.8/100, overall GPA 94.2/100; ranked No. 1 among 124 students in the Computer Science Department.

1st place, Best Undergraduate Thesis, Computer Science Department

Advisor: Professor Jie Tang


Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov
Preprint 2019 (*: equal contribution)

HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering

Zhilin Yang*, Peng Qi*, Saizheng Zhang*, Yoshua Bengio, William W. Cohen, Ruslan Salakhutdinov, Christopher D. Manning
EMNLP 2018 (*: equal contribution)

GLoMo: Unsupervised Learning of Transferable Relational Graphs

Zhilin Yang*, Jake Zhao*, Bhuwan Dhingra, Kaiming He, William W. Cohen, Ruslan Salakhutdinov, Yann LeCun
NIPS 2018 (*: equal contribution)
[PDF] [Code]

Neural Cross-Lingual Named Entity Recognition with Minimal Resources

Jiateng Xie, Zhilin Yang, Graham Neubig, Noah A. Smith, Jaime Carbonell
EMNLP 2018

Neural Models for Reasoning over Multiple Mentions using Coreference

Bhuwan Dhingra, Qiao Jin, Zhilin Yang, William W. Cohen, Ruslan Salakhutdinov
NAACL 2018, short paper

Breaking the Softmax Bottleneck: A High-Rank RNN Language Model

Zhilin Yang*, Zihang Dai*, Ruslan Salakhutdinov, William W. Cohen
ICLR 2018 (*: equal contribution)
Oral, acceptance rate 2%
[PDF] [Code]

Mastering the Dungeon: Grounded Language Learning by Mechanical Turker Descent

Zhilin Yang, Saizheng Zhang, Jack Urbanek, Will Feng, Alexander H. Miller, Arthur Szlam, Douwe Kiela, Jason Weston
ICLR 2018

Good Semi-supervised Learning that Requires a Bad GAN

Zihang Dai*, Zhilin Yang*, Fan Yang, William W. Cohen, Ruslan Salakhutdinov
NIPS 2017 (*: equal contribution)
[PDF] [Code]

Differentiable Learning of Logical Rules for Knowledge Base Reasoning

Fan Yang, Zhilin Yang, William W. Cohen
NIPS 2017
[PDF] [Code]

Linguistic Knowledge as Memory for Recurrent Neural Networks

Bhuwan Dhingra, Zhilin Yang, William W. Cohen, Ruslan Salakhutdinov
Preprint 2017

Semi-Supervised QA with Generative Domain-Adaptive Nets

Zhilin Yang, Junjie Hu, Ruslan Salakhutdinov, William W. Cohen
ACL 2017
[PDF] [Data]

Gated-Attention Readers for Text Comprehension

Bhuwan Dhingra, Hanxiao Liu, Zhilin Yang, William W. Cohen, Ruslan Salakhutdinov
ACL 2017
[PDF] [Code]

Transfer Learning for Sequence Tagging with Hierarchical Recurrent Networks

Zhilin Yang, Ruslan Salakhutdinov, William W. Cohen
ICLR 2017
[PDF] [Code]

Words or Characters? Fine-grained Gating for Reading Comprehension

Zhilin Yang, Bhuwan Dhingra, Ye Yuan, Junjie Hu, William W. Cohen, Ruslan Salakhutdinov
ICLR 2017
[PDF] [Code]

Review Networks for Caption Generation

Zhilin Yang, Ye Yuan, Yuexin Wu, Ruslan Salakhutdinov, William W. Cohen
NIPS 2016

Multi-Task Cross-Lingual Sequence Tagging from Scratch

Zhilin Yang, Ruslan Salakhutdinov, William Cohen
Preprint 2016

Revisiting Semi-Supervised Learning with Graph Embeddings

Zhilin Yang, William Cohen, Ruslan Salakhutdinov
ICML 2016

Multi-Modal Bayesian Embeddings for Learning Social Knowledge Graphs

Zhilin Yang, Jie Tang, William Cohen
IJCAI 2016
Our model is deployed on AMiner for extracting research interests.

Collaborative Embedding Features and Diversified Ensemble for E-Commerce Repeat Buyer Prediction

Zhanpeng Fang*, Zhilin Yang*, Yutao Zhang
IJCAI Workshop 2015 (*: equal contribution)
Invited paper, competition winners.
[PDF] [Slides]

COSNET: Connecting Heterogeneous Social Networks with Local and Global Consistency

Yutao Zhang, Jie Tang, Zhilin Yang, Jian Pei, Philip Yu
KDD 2015
Oral presentation, acceptance rate 19%.
[PDF] [Slides]

Active Learning for Streaming Networked Data

Zhilin Yang, Jie Tang, Yutao Zhang
CIKM 2014
Full paper, acceptance rate 21%.

Active Learning for Networked Data Based on Non-Progressive Diffusion Model

Zhilin Yang, Jie Tang, Bin Xu, Chunxiao Xing
WSDM 2014
Full-length oral presentation (5%), acceptance rate 18%.

SAE: Social Analytic Engine for Large Networks

Yang Yang, Jianfei Wang, Yutao Zhang, Wei Chen, Jing Zhang, Honglei Zhuang, Zhilin Yang, Bo Ma, Zhanpeng Fang, Sen Wu, Xiaoxiao Li, Debing Liu, Jie Tang
KDD Demo 2013