Hi~ I am Tianyu Gao, a senior-year undergraduate student in the Department of Computer Science and Technology at Tsinghua University. I am a member of the Tsinghua Natural Language Processing and Computational Social Science Lab (THUNLP), advised by Professor Zhiyuan Liu.

Here is [my CV], and you can find my projects on [my github] and [THUNLP github].

Research


My research interests lie within the intersection of natural language processing and machine learning. More specifically, my research interests include:

  • Training NLP models with fewer annotations. Language annotations are expensive to gather, so it is valuable to develop models and algorithms that learn more efficiently from limited supervision.
    • Humans can grasp new knowledge from only a handful of examples, and machines should be able to as well. Few-shot learning aims to guide models to learn new tasks with limited data.
    • There are huge amounts of unlabeled data on the Internet, which we can exploit with unsupervised / semi-supervised training, such as pre-training language models or bootstrapping from a few annotated seeds.
    • Existing structured information can serve as external knowledge for NLP models, such as knowledge graphs in distantly supervised relation extraction.
    • I explore the above aspects mainly in information extraction, an important area of NLP.

Publications


KEPLER: A Unified Model for Knowledge Embedding and Pre-trained Language Representation
Xiaozhi Wang, Tianyu Gao, Zhaocheng Zhu, Zhiyuan Liu, Juanzi Li, Jian Tang.
arXiv preprint

  • In this paper, we propose a unified framework for knowledge embedding and pre-trained language representation (sketched below).
    • We combine the objectives of knowledge graph embedding (like TransE) and pre-trained language representation (like BERT), smoothly fusing language semantics and syntax from large-scale corpora with world knowledge from structured graphs.
    • Our KEPLER framework not only benefits knowledge-driven NLP tasks (e.g., relation extraction and entity typing), but also improves link prediction in knowledge graphs.
    • Compared with similar methods, ours not only achieves the best results but also eliminates the fusion overhead of previous models.
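
A rough sketch of the joint objective in my own notation (see the paper for the exact formulation): the language model encodes each entity's textual description to obtain its embedding, so one encoder serves both losses and no separate fusion module is needed.

```latex
% Sketch of KEPLER's joint training signal (my notation):
% \mathbf{h}, \mathbf{t} are the encoder's embeddings of the head and
% tail entity descriptions; \mathbf{r} is a learned relation embedding.
\mathcal{L} = \mathcal{L}_{\mathrm{MLM}} + \mathcal{L}_{\mathrm{KE}},
\qquad
d(h, r, t) = \lVert \mathbf{h} + \mathbf{r} - \mathbf{t} \rVert
```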

Neural Snowball for Few-Shot Relation Learning
Tianyu Gao, Xu Han, Ruobing Xie, Zhiyuan Liu, Fen Lin, Leyu Lin, Maosong Sun.
To appear in proc. of AAAI 2020

  • A new bootstrapping method for relation extraction that combines the classic snowball process with neural networks (see the sketch below).
    • Given a new relation and a handful of examples, neural snowball collects reliable training instances from unlabeled corpora and iteratively trains new relation classifiers.
    • Relational Siamese Networks (RSNs) let us exploit the better generalization ability of neural networks and pick high-confidence instances for the new relation by comparing candidates with the given seeds.
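
A minimal sketch of the snowball loop under my own simplifications: `encode` and `rsn_score` below are random-feature / cosine stand-ins rather than the paper's trained encoder and RSN, and the per-iteration classifier retraining is only noted in a comment.

```python
# Minimal sketch of the snowball loop; stand-in components, not the
# paper's trained encoder and Relational Siamese Network.
import numpy as np

rng = np.random.default_rng(0)

def encode(sentence):
    # Stand-in sentence encoder (the paper uses a trained neural encoder).
    return rng.standard_normal(16)

def rsn_score(x, y):
    # Stand-in for the RSN: how likely two instances express the same relation.
    return float(x @ y) / (np.linalg.norm(x) * np.linalg.norm(y) + 1e-8)

def neural_snowball(seeds, unlabeled, rounds=3, threshold=0.8):
    support = [encode(s) for s in seeds]       # handful of given examples
    pool = {s: encode(s) for s in unlabeled}   # unlabeled corpus
    for _ in range(rounds):
        # Pick high-confidence candidates by comparing with the support set.
        picked = [s for s, v in pool.items()
                  if max(rsn_score(v, u) for u in support) > threshold]
        if not picked:
            break
        # Enlarge the support set; the paper also retrains a relation
        # classifier on the enlarged set at every iteration.
        support += [pool.pop(s) for s in picked]
    return support
```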

FewRel 2.0: Towards More Challenging Few-Shot Relation Classification
Tianyu Gao, Xu Han, Hao Zhu, Zhiyuan Liu, Peng Li, Maosong Sun, Jie Zhou.
In proc. of EMNLP 2019

  • Investigate two challenges in few-shot learning:
    • Can few-shot models adapt to a new domain with only a handful of instances?
    • Can they detect none-of-the-above (NOTA) queries? (A toy illustration follows this list.)
  • Propose baselines to tackle the above two challenges.
  • Release an upgraded version of FewRel, the largest annotated relation classification dataset.
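
For illustration only, the crudest way to frame NOTA detection is a similarity threshold over class prototypes. This toy heuristic is mine, not the BERT-PAIR baseline proposed in the paper:

```python
# Toy NOTA heuristic (mine, not the paper's baseline): abstain when the
# query is not similar enough to any of the N class prototypes.
import numpy as np

def classify_with_nota(proto_sims, threshold=0.5):
    """proto_sims: (N,) similarities of one query to N class prototypes.
    Returns a class index in [0, N), or -1 for none-of-the-above."""
    best = int(np.argmax(proto_sims))
    return best if proto_sims[best] >= threshold else -1
```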

OpenNRE: An Open and Extensible Toolkit for Neural Relation Extraction
Xu Han, Tianyu Gao, Yuan Yao, Deming Ye, Zhiyuan Liu, Maosong Sun.
In proc. of EMNLP 2019

  • An open-source toolkit for relation extraction (RE), covering sentence-level RE, bag-level RE (distantly supervised RE) and few-shot RE.
  • Unified framework, modular design, easy to use and highly extensible.
  • Also includes an online demo system! (A usage sketch follows.)
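
A usage sketch adapted from the toolkit's README; the pretrained model name and exact API may differ across versions:

```python
# Sentence-level RE with a model pretrained on Wiki80; adapted from the
# OpenNRE README (model name and API may vary by version).
import opennre

model = opennre.get_model('wiki80_cnn_softmax')

# 'h'/'t' give the character spans of the head and tail entities.
result = model.infer({
    'text': 'He was the son of Máel Dúin mac Máele Fithrich, and grandson '
            'of the high king Áed Uaridnach (died 612).',
    'h': {'pos': (18, 46)},
    't': {'pos': (78, 91)},
})
print(result)  # a (relation, confidence) pair
```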

Hybrid Attention-Based Prototypical Networks for Noisy Few-Shot Relation Classification
Tianyu Gao*, Xu Han*, Zhiyuan Liu and Maosong Sun. (* indicates equal contribution)
In proc. of AAAI 2019

  • Tackle the problem of few-shot relation classification in noisy scenarios.
  • Add instance-level and feature-level attention to Prototypical Networks (sketched below).
  • Achieve large improvements in accuracy and convergence speed over baseline models.
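
A minimal sketch of the two attention mechanisms over precomputed embeddings; this is my simplification, whereas the paper learns both attention modules jointly with the encoder:

```python
# Sketch of hybrid attention over precomputed embeddings (my
# simplification; the real model trains these modules with the encoder).
import numpy as np

def hybrid_prototype_distance(support, query, feat_weights):
    """support: (K, D) embeddings of one class's support instances;
    query: (D,) query embedding; feat_weights: (D,) feature-level
    attention for this class. Returns the query-to-prototype distance."""
    # Instance-level attention: support instances more similar to the
    # query get larger weights, so noisy instances contribute less.
    scores = support @ query                 # (K,)
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()
    prototype = alpha @ support              # (D,)
    # Feature-level attention: down-weight uninformative dimensions in
    # the squared-Euclidean distance; a query is assigned to the class
    # with the smallest such distance.
    return float(np.sum(feat_weights * (query - prototype) ** 2))
```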

Projects


OpenNRE: An Open-Source Package for Neural Relation Extraction link

  • An open-source toolkit for relation extraction (RE), covering sentence-level RE, bag-level RE (distantly supervised RE) and few-shot RE.
  • Unified framework, modular design, easy to use and highly extensible.

NREPapers: A Paper List for Neural Relation Extraction link

  • A curated list of important papers on neural relation extraction, continuously updated.

FewRel: FewRel Dataset, Toolkits and Baseline Models link

  • FewRel is a large-scale few-shot relation extraction dataset.
  • I helped implement the toolkits and several baseline models, and I led the development of the second version of the dataset (FewRel 2.0).

Experiences


Tsinghua Natural Language Processing and Computational Social Science Lab. As undergraduate student researcher. Jan. 2018 - Present

  • Directed by Prof. Zhiyuan Liu.
  • Research on natural language processing and machine learning.

Montreal Institute for Learning Algorithms (Mila). As Research Intern. Jul. 2019 - Sept. 2019

  • Directed by Prof. Jian Tang.
  • Research on knowledge graph embedding and pre-training language models.

WeChat, Tencent. Fellow of Xiniuniao Talent Program. May 2019 - Present

  • Research on natural language processing and machine learning.

Momenta. As intern. May 2017 - May 2018

  • Research on semantic segmentation.