[GitHub] https://github.com/createmomo (You can also find the WeChat Public Account here)
Most blog posts have both English and Chinese versions, with only a few exceptions written solely in Chinese.
Collections (Post Series)
- [2025-2026, Planning…]
- Language Model Learning & Practice (Student-Oriented Series) (TBD)
- Current Trends in Goal-Guided Conversational AI Models (TBD)
- Using Language Models in Specific Domains (TBD)
- [2023-2025] Past of Goal-Guided Conversational AI Models
- 1 Introduction
- 2 Target-Guided Open-Domain Conversation (ACL2019)
- 3 Proactive Human Machine Conversation with Explicit Conversation Goal (ACL2019)
- 4 Towards Conversational Recommendation Over Multi-Type Dialogs (ACL2020)
- 5 Knowledge Graph Grounded Goal Planning for Open-Domain Conversation Generation (AAAI2020)
- 6 Towards Topic Guided Conversational Recommender System (arXiv 2020)
- 7 Towards Effective Automatic Debt Collection with Persona Awareness (EMNLP2023)
- 8 Reinforcement Learning of Cooperative Persuasive Dialogue Policies Using Framing (COLING 2014)
- 9 Dialogue Scenario Collection of Persuasive Dialogue with Emotional Expressions via Crowdsourcing (LREC 2018)
- 10 Persuasion for Good: Towards a Personalized Persuasive Dialogue System for Social Good (ACL2019)
- 11 Quick Review
- 12 OTTers: One-turn Topic Transitions for Open-Domain Dialogue (ACL&IJCNLP 2021)
- 13 Towards a Universal NLG for Dialogue Systems and Simulators with Future Bridging (2021)
- 14 SalesBot: Transitioning from Chit-Chat to Task-Oriented Dialogues (ACL 2022)
- [2024-2025] Use AI to Detect AI-Generated Text
- 1 Introduction
- 2 Testbed1~2
- 3 Testbed3~4
- 4 Testbed5~8
- 5 Key Experimental Details
- 6 Results, ChatGPT&Human
- 7 Results, Testbed1
- 8 Results, Testbed2~4
- 9 Results, Testbed5
- 10 Results, Testbed6 & Tips for Setting Classification Thresholds
- 11 Results, Testbed7
- 12 Results, Testbed8, Attempt to Evade Being Detected by AI
- 13 Some Interesting Analyses
- 14 Ghostbuster
- 15 Ghostbuster (Results)
- 16 Ghostbuster (Make the Detector More Robust)
- 17 Ghostbuster (Again How to Avoid Being Detected by AI)
- [2023-2024] Using Language Models in Specific Domains
- 1 Introduction
- 2 Domain-specific Training Data
- [Any Domain] Use Unlabelled Text to Improve Instruction Following Language Models (Notes 1, 2, 3, 4, 5)
- [Medical/Health] ChatDoctor (Notes 1, 2, 3; Slides 1, 2, 3)
- [Medical/Health] MedicalGPT-zh (Notes, Slides)
- [Medical/Health] MING (Notes, Slides)
- [Medical/Health] SoulChat (Notes, Slides)
- [Mobile Interaction] Tiny Models, Mighty Powers - ReALM (1, 2, 3, 4)
- 3 Automatic Model Evaluation
- [Any Domain] Evaluating Language Models with Language Models (1 Introduction)
- [Any Domain] Evaluating Language Models with Language Models (2 PandaLM)
- [Any Domain] Evaluating Language Models with Language Models (3 Shepherd, 1, 2, 3, 4)
- [Medical/Health] Comparing ChatDoctor and ChatGPT3.5 using BERT-Score
- [2024] Tiny Models, Mighty Powers - ReALM (1, 2, 3, 4)
- [2023-2024] Use Unlabelled Text to Improve Instruction Following Language Models
- [2023] Evaluating Language Models with Language Models
- [2022-2024] Conference “Interesting”s
- Interesting · LREC-COLING&NAACL2024
- Interesting · ACL2023
- Interesting · EACL2023
- Interesting · EMNLP2023
- Interesting · ACL2022 (Findings)
- Interesting · ACL2022 (Short Papers)
- Interesting · ACL2022 (Long Papers)
- Interesting · NeurIPS2022
- Interesting · EMNLP2022 (Findings&CL Papers)
- Interesting · EMNLP2022 (Main Conference)
- [2023] Chinese Natural Language Understanding (NLU) in Dialogue Systems
- [2021] Fantastic Trees (Decision Trees, Random Forest, AdaBoost, Gradient Boosting DT, XGBoost)
- [2020] Improving Your English Communication Skills (Writing Emails, Speaking English and Building ePortfolio)
- [2017] CRF Layer on the Top of BiLSTM
- CRF Layer on the Top of BiLSTM - 1 Outline and Introduction
- CRF Layer on the Top of BiLSTM - 2 CRF Layer (Emission and Transition Score)
- CRF Layer on the Top of BiLSTM - 3 CRF Loss Function
- CRF Layer on the Top of BiLSTM - 4 Real Path Score
- CRF Layer on the Top of BiLSTM - 5 The Total Score of All the Paths
- CRF Layer on the Top of BiLSTM - 6 Infer the Labels for a New Sentence
- CRF Layer on the Top of BiLSTM - 7 Chainer Implementation Warm Up
- CRF Layer on the Top of BiLSTM - 8 Demo Code
Notes (Single Post)
- [2023-2025] Open-Source Language Model Pocket
- [2023] Using ColossalAI SFT in Kaggle or Colab (in Chinese)
- [2023] General Understanding of Decoding Strategies Commonly Used in Text Generation
- [2022] Understand Gradient Checkpoint in PyTorch
- [2022] Baidu World Conference 2022: AI Applications (in Chinese, Notes)
- [2021] GPT Understands, Too
- [2021] Super Git Revision Notes
- [2021] Baidu World Conference 2021: AI Applications (in Chinese, Notes)
- [2019] Probabilistic Graphical Models Revision Notes
- [2018] Super Machine Learning Revision Notes
Paper Explained
- [2021] Few-Shot Text Classification with Distributional Signatures (ICLR 2020) Part1
- [2021] Few-Shot Text Classification with Distributional Signatures (ICLR 2020) Part2
- [2021] Few-Shot Text Classification with Distributional Signatures (ICLR 2020) Part3
Detailed Links:
* Fantastic Trees (Decision Trees, Random Forest, AdaBoost, Gradient Boosting DT, XGBoost)
* Probabilistic Graphical Models Revision Notes
- Representations
- Bayesian Network (directed graph)
- Markov Network (undirected graph)
- Inference
- Learning
* Super Machine Learning Revision Notes
- Activation Functions
- Gradient Descent
- Computation Graph
- Backpropagation
- Gradients for L2 Regularization (weight decay)
- Vanishing/Exploding Gradients
- Mini-Batch Gradient Descent
- Stochastic Gradient Descent
- Choosing Mini-Batch Size
- Gradient Descent with Momentum (usually faster than SGD)
- Gradient Descent with RMSprop
- Adam (put Momentum and RMSprop together)
- Learning Rate Decay Methods
- Batch Normalization
- Parameters
- Regularization
- Models
- Logistic Regression
- Multi-Class Classification (Softmax Regression)
- Transfer Learning
- Multi-task Learning
- Convolutional Neural Network (CNN)
- Sequence Models
- Transformer (Attention Is All You Need)
- Bidirectional Encoder Representations from Transformers (BERT)
- Practical Tips
* [2017] CRF Layer on the Top of BiLSTM (BiLSTM-CRF)
- CRF Layer on the Top of BiLSTM - 1 Outline and Introduction
- CRF Layer on the Top of BiLSTM - 2 CRF Layer (Emission and Transition Score)
- CRF Layer on the Top of BiLSTM - 3 CRF Loss Function
- CRF Layer on the Top of BiLSTM - 4 Real Path Score
- CRF Layer on the Top of BiLSTM - 5 The Total Score of All the Paths
- CRF Layer on the Top of BiLSTM - 6 Infer the Labels for a New Sentence
- CRF Layer on the Top of BiLSTM - 7 Chainer Implementation Warm Up
- CRF Layer on the Top of BiLSTM - 8 Demo Code