About
Dr. Tao Ge is a Principal Science Lead at Microsoft in Redmond, where he leads research and development of state-of-the-art large language models (LLMs), spanning synthetic data creation, mid- and post-training of OpenAI models (GPT-4/5 and o3/o4-mini), and agentic approaches powering Microsoft products (e.g., Office/Copilot). Prior to his current role, Tao was a Principal Researcher at Tencent AI Lab (Seattle) and Microsoft Research Asia (MSRA). He earned his Ph.D. in Computer Science from Peking University.
Tao has published more than 60 papers at top AI/ML conferences. Two of his best-known and most widely adopted technical innovations are:
- Speculative Decoding: Tao pioneered the seminal study of Speculative Decoding beginning in 2021 (initially referred to as Aggressive Decoding). He was the first to introduce a separate drafter model to achieve lossless acceleration of Transformer decoding (first made public in March 2022), and he was also the first to coin the term “Speculative Decoding” for this speculative execution paradigm (made public in September 2022). His research was subsequently followed by papers on Speculative Decoding/Sampling for LLMs from Google (first made public in November 2022) and DeepMind (made public in February 2023), sparking a surge of interest and adoption since mid-2023. Today, Speculative Decoding is an industry standard for LLM inference acceleration, supported in major open-source frameworks (e.g., vLLM, PyTorch, ONNX) and widely integrated into production-scale deployments.
- Persona-Driven Synthetic Data Creation: Tao proposed persona-driven synthetic data creation, a novel paradigm for scaling the generation of high-quality synthetic training data. This innovation has been widely recognized and adopted as a core synthetic data methodology in the development of leading LLMs, including (but not limited to):
 
Publications (*: equal contribution; ✉: corresponding author)
Tech Report
- DocReward: A Document Reward Model for Structuring and Stylizing
  Junpeng Liu, Yuzhong Zhao, Bowen Cao, Jiayu Ding, Yilin Jia, Tengchao Lv, Yupan Huang, Shaohan Huang, Nan Yang, Li Dong, Lei Cui, Tao Ge, Xun Wang, Huitian Jiao, Sun Mao, FNU Kartik, Si-Qing Chen, Wai Lam, Furu Wei
- Scaling Synthetic Data Creation with 1,000,000,000 Personas (a novel persona-driven synthetic data creation paradigm)
  Tao Ge, Xin Chan, Xiaoyang Wang, Dian Yu, Haitao Mi, Dong Yu
- Inference with Reference: Lossless Acceleration of Large Language Models (the innovation used in OpenAI’s Predicted Output)
  Nan Yang, Tao Ge, Liang Wang, Binxing Jiao, Daxin Jiang, Linjun Yang, Rangan Majumder, Furu Wei
- Lossless Acceleration for Seq2seq Generation with Aggressive Decoding (an earlier tech report on Speculative Decoding)
  Tao Ge, Heming Xia, Xin Sun, Si-Qing Chen, Furu Wei
- Reaching Human-level Performance in Automatic Grammatical Error Correction (the milestone of human-level GEC)
  Tao Ge, Furu Wei, Ming Zhou
 
Peer-reviewed
- Improving LLM General Preference Alignment via Optimistic Online Mirror Descent
  Yuheng Zhang, Dian Yu, Tao Ge, Linfeng Song, Zhichen Zeng, Haitao Mi, Nan Jiang, Dong Yu
- Router-Tuning: A Simple and Effective Approach for Dynamic Depth
  Shwai He, Tao Ge, Guoheng Sun, Bowei Tian, Xiaoyang Wang, Dong Yu
- Low-Bit Quantization Favors Undertrained LLMs
  Xu Ouyang, Tao Ge✉, Thomas Hartvigsen, Zhisong Zhang, Haitao Mi, Dong Yu
- 
  Guangyue Peng, Tao Ge✉, Wen Luo, Wei Li, Houfeng Wang
- 
  Yadong Zhang, Shaoguang Mao, Tao Ge, Xun Wang, Yan Xia, Man Lan, Furu Wei
- ALYMPICS: Language Agents Meet Game Theory
  Shaoguang Mao, Yuzhe Cai, Yan Xia, Wenshan Wu, Xun Wang, Fengyi Wang, Tao Ge, Furu Wei
- Overview of the NLPCC 2024 Shared Task: Chinese Essay Discourse Logic Evaluation and Integration
  Yuhao Zhou, Hongyi Wu, Xinshu Shen, Man Lan, Yuanbin Wu, Xiaopeng Bai, Shaoguang Mao, Tao Ge, Yan Xia
- Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning
  Zhihan Zhang, Tao Ge, Zhenwen Liang, Wenhao Yu, Dian Yu, Mengzhao Jia, Dong Yu, Meng Jiang
- LLM as a Mastermind: A Survey of Strategic Reasoning with Large Language Models
  Yadong Zhang, Shaoguang Mao, Tao Ge, Xun Wang, Adrian de Wynter, Yan Xia, Wenshan Wu, Ting Song, Man Lan, Furu Wei
- xRAG: Extreme Context Compression for Retrieval-augmented Generation with One Token
  Xin Cheng, Xun Wang, Xingxing Zhang, Tao Ge, Si-Qing Chen, Furu Wei, Huishuai Zhang, Dongyan Zhao
- 
  Heming Xia, Zhe Yang, Qingxiu Dong, Peiyi Wang, Yongqi Li, Tao Ge, Tianyu Liu, Wenjie Li, Zhifang Sui
- SCALE: Synergized Collaboration of Asymmetric Language Translation Engines
  Xin Cheng, Xun Wang, Tao Ge, Si-Qing Chen, Furu Wei, Dongyan Zhao, Rui Yan
- Refining Corpora from a Model Calibration Perspective for Chinese Spelling Correction
  Dingyao Yu, Yang An, Wei Ye, Xiongfeng Xiao, Shaoguang Mao, Tao Ge, Shikun Zhang
- Low-code LLM: Visual Programming over LLMs
  Yuzhe Cai, Shaoguang Mao, Wenshan Wu, Zehua Wang, Yaobo Liang, Tao Ge, Chenfei Wu, Wang You, Ting Song, Yan Xia, Jonathan Tien, Nan Duan, Furu Wei
- 
  Zhenhailong Wang, Shaoguang Mao, Wenshan Wu, Tao Ge✉, Furu Wei, Heng Ji
- In-context Autoencoder for Context Compression in a Large Language Model
  Tao Ge✉, Jing Hu, Lei Wang, Xun Wang, Si-Qing Chen, Furu Wei
- Speculative Decoding: Exploiting Speculative Execution for Accelerating Seq2seq Generation
  Heming Xia*, Tao Ge✉*, Peiyi Wang, Si-Qing Chen, Furu Wei, Zhifang Sui
- Extensible Prompts for Language Models on Zero-shot Language Style Customization
  Tao Ge✉, Jing Hu, Li Dong, Shaoguang Mao, Yan Xia, Xun Wang, Si-Qing Chen, Furu Wei
- Smart Word Suggestions for Writing Assistance
  Chenshuo Wang, Shaoguang Mao, Tao Ge, Wenshan Wu, Xun Wang, Yan Xia, Jonathan Tien, Dongyan Zhao
- 
  Yuzhe Cai, Shaoguang Mao, Chenshuo Wang, Tao Ge, Wenshan Wu, Yan Xia, Chanjin Zheng, Qiang Guan
- Overview of the NLPCC 2023 Shared Task: Chinese Essay Discourse Coherence Evaluation
  Hongyi Wu, Xinshu Shen, Man Lan, Xiaopeng Bai, Yuanbin Wu, Aimin Zhou, Shaoguang Mao, Tao Ge, Yan Xia
- Overview of CCL23-Eval Task 8: Chinese Essay Fluency Evaluation (CEFE) Task
  Xinshu Shen, Hongyi Wu, Xiaopeng Bai, Yuanbin Wu, Aimin Zhou, Shaoguang Mao, Tao Ge, Yan Xia
- EdgeFormer: A Parameter-Efficient Transformer for On-Device Seq2seq Generation
  Tao Ge, Si-Qing Chen, Furu Wei
- Plug and Play Knowledge Distillation for kNN-LM with External Logits
  Xuyang Jin, Tao Ge✉, Furu Wei
- 
  Xin Sun, Tao Ge✉, Shuming Ma, Jingjing Li, Furu Wei, Houfeng Wang
- Text Revision by On-the-Fly Representation Optimization
  Jingjing Li, Zichao Li, Tao Ge, Irwin King, Michael Lyu
- Beyond Preserved Accuracy: Evaluating Loyalty and Robustness of BERT Compression
  Canwen Xu, Wangchunshu Zhou, Tao Ge✉, Ke Xu, Julian McAuley, Furu Wei
- Improving Sequence-to-Sequence Pre-training via Sequence Span Rewriting
  Wangchunshu Zhou, Tao Ge✉, Canwen Xu, Ke Xu, Furu Wei
- Instantaneous Grammatical Error Correction with Shallow Aggressive Decoding
  Xin Sun*, Tao Ge✉*, Furu Wei, Houfeng Wang
- Blow the Dog Whistle: A Dataset for Cant Creation, Understanding and Decryption in Chinese
  Canwen Xu*, Wangchunshu Zhou*, Tao Ge✉, Ke Xu, Julian McAuley, Furu Wei
- BERT Loses Patience: Fast and Robust Inference with Early Exit
  Wangchunshu Zhou, Canwen Xu, Tao Ge✉, Ke Xu, Julian McAuley, Furu Wei
- UnihanLM: Coarse-to-Fine Chinese-Japanese Language Model Pretraining with the Unihan Database
  Canwen Xu, Tao Ge, Chenliang Li, Furu Wei
- 
  Mengyun Chen*, Tao Ge✉*, Xingxing Zhang, Furu Wei, Ming Zhou
- BERT-of-Theseus: Compressing BERT by Progressive Module Replacing
  Canwen Xu*, Wangchunshu Zhou*, Tao Ge✉, Ke Xu, Julian McAuley, Furu Wei, Ming Zhou
- Pseudo-Bidirectional Decoding for Local Sequence Transduction
  Wangchunshu Zhou, Tao Ge, Chang Mu, Ke Xu, Furu Wei, Ming Zhou
- Improving Grammatical Error Correction with Machine Translation Pairs
  Wangchunshu Zhou, Tao Ge, Ke Xu, Furu Wei, Ming Zhou
- Scheduled DropHead: A Regularization Method for Transformer Models
  Wangchunshu Zhou, Tao Ge, Furu Wei, Ming Zhou, Ke Xu
- Parallel Data Augmentation for Formality Style Transfer
  Yi Zhang, Tao Ge, Xu Sun
- Self-Adversarial Learning with Comparative Discrimination for Text Generation
  Wangchunshu Zhou, Tao Ge, Ke Xu, Furu Wei, Ming Zhou
- Fact-aware Sentence Split and Rephrase with Permutation Invariant Training
  Yinuo Guo, Tao Ge, Furu Wei
- BERT-based Lexical Substitution
  Wangchunshu Zhou, Tao Ge, Ke Xu, Furu Wei, Ming Zhou
- Automatic Grammatical Error Correction for Sequence-to-sequence Text Generation: An Empirical Study
  Tao Ge, Xingxing Zhang, Furu Wei, Ming Zhou
- 
  Tao Ge, Qing Dou, Heng Ji, Lei Cui, Baobao Chang, Zhifang Sui, Furu Wei, Ming Zhou
- Fluency Boost Learning and Inference for Neural Grammatical Error Correction
  Tao Ge, Furu Wei, Ming Zhou
- EventWiki: A Knowledge Base of Major Events
  Tao Ge, Lei Cui, Baobao Chang, Zhifang Sui, Furu Wei, Ming Zhou
- SeRI: A Dataset for Sub-event Relation Inference from an Encyclopedia
  Tao Ge, Lei Cui, Baobao Chang, Zhifang Sui, Furu Wei, Ming Zhou
- Event Detection with Burst Information Networks
  Tao Ge, Lei Cui, Baobao Chang, Zhifang Sui, Ming Zhou
- News Stream Summarization using Burst Information Networks
  Tao Ge, Lei Cui, Heng Ji, Baobao Chang, Sujian Li, Ming Zhou, Zhifang Sui
- Discovering Concept-level Event Associations from a Text Stream
  Tao Ge, Lei Cui, Heng Ji, Baobao Chang, Zhifang Sui
- Towards Time-aware Knowledge Graph Completion
  Tingsong Jiang, Tianyu Liu, Tao Ge, Lei Sha, Baobao Chang, Sujian Li, Zhifang Sui
- Encoding Temporal Information for Time-aware Link Prediction
  Tingsong Jiang, Tianyu Liu, Tao Ge, Lei Sha, Sujian Li, Baobao Chang, Zhifang Sui
- One Tense per Scene: Predicting Tense in Chinese Conversations
  Tao Ge, Heng Ji, Baobao Chang, Zhifang Sui
- Bring you to the past: Automatic Generation of Topically Relevant Event Chronicles
  Tao Ge, Wenzhe Pei, Heng Ji, Sujian Li, Baobao Chang, Zhifang Sui
- An Effective Neural Network Model for Graph-based Dependency Parsing
  Wenzhe Pei, Tao Ge, Baobao Chang
- Exploiting Task-Oriented Resources to Learn Word Embeddings for Clinical Abbreviation Expansion
  Yue Liu, Tao Ge, Kusum S Mathews, Heng Ji, Deborah McGuinness
- Max-Margin Tensor Neural Network for Chinese Word Segmentation
  Wenzhe Pei, Tao Ge, Baobao Chang
- A Semi-Supervised Method for Opinion Target Extraction
  Tao Ge, Wenjie Li, Zhifang Sui
- The CIPS-SIGHAN CLP 2014 Chinese Word Segmentation Bake-off
  Huiming Duan, Zhifang Sui, Tao Ge
- 
  Tao Ge, Zhifang Sui, Baobao Chang
- Event-Based Time Label Propagation for Automatic Dating of News Articles
  Tao Ge, Baobao Chang, Sujian Li, Zhifang Sui