[GitHub] https://github.com/createmomo (You can also find the WeChat Public Account here)
Starting from April 1, 2026, most posts will be available in English, German, and Chinese. Most posts published through March 2026 are available in both English and Chinese; a few are written in Chinese only.
Collections (Post Series)
- [2026] AI Agents
- [2024-2025] Use AI to Detect AI-Generated Text
- 1 Introduction
- 2 Testbed1~2
- 3 Testbed3~4
- 4 Testbed5~8
- 5 Key Experimental Details
- 6 Results, ChatGPT & Human
- 7 Results, Testbed1
- 8 Results, Testbed2~4
- 9 Results, Testbed5
- 10 Results, Testbed6 & Tips for Setting Classification Thresholds
- 11 Results, Testbed7
- 12 Results, Testbed8, Attempt to Evade Being Detected by AI
- 13 Some interesting analyses
- 14 Ghostbuster
- 15 Ghostbuster (Results)
- 16 Ghostbuster (Make the Detector More Robust)
- 17 Ghostbuster (Again How to Avoid Being Detected by AI)
- [2023-2024] Using Language Models in Specific Domains
- 1 Introduction
- 2 Domain-specific Training Data
- [Any Domain] Use Unlabelled Text to Improve Instruction Following Language Models (Notes 1, 2, 3, 4, 5)
- [Medical/Health] ChatDoctor (Notes 1, 2, 3; Slides 1, 2, 3)
- [Medical/Health] MedicalGPT-zh (Notes, Slides)
- [Medical/Health] MING (Notes, Slides)
- [Medical/Health] SoulChat (Notes, Slides)
- [Mobile Interaction] Tiny Models, Mighty Powers - ReALM (1, 2, 3, 4)
- 3 Automatic Model Evaluation
- [Any Domain] Evaluating Language Models with Language Models (1 Introduction)
- [Any Domain] Evaluating Language Models with Language Models (2 PandaLM)
- [Any Domain] Evaluating Language Models with Language Models (3 Shepherd, 1, 2, 3, 4)
- [Medical/Health] Comparing ChatDoctor and ChatGPT 3.5 using BERTScore
- [2024] Tiny Models, Mighty Powers - ReALM (1, 2, 3, 4)
- [2023-2024] Use Unlabelled Text to Improve Instruction Following Language Models
- [2023] Evaluating Language Models with Language Models
- [2023-2025] Past of Goal-Guided Conversational AI Models
- 1 Introduction
- 2 Target-Guided Open-Domain Conversation (ACL2019)
- 3 Proactive Human Machine Conversation with Explicit Conversation Goal (ACL2019)
- 4 Towards Conversational Recommendation Over Multi-Type Dialogs (ACL2020)
- 5 Knowledge Graph Grounded Goal Planning for Open-Domain Conversation Generation (AAAI2020)
- 6 Towards Topic Guided Conversational Recommender System (arXiv 2020)
- 7 Towards Effective Automatic Debt Collection with Persona Awareness (EMNLP2023)
- 8 Reinforcement Learning of Cooperative Persuasive Dialogue Policies Using Framing (COLING 2014)
- 9 Dialogue Scenario Collection of Persuasive Dialogue with Emotional Expressions via Crowdsourcing (LREC 2018)
- 10 Persuasion for Good: Towards a Personalized Persuasive Dialogue System for Social Good (ACL2019)
- 11 Quick Review
- 12 OTTers: One-turn Topic Transitions for Open-Domain Dialogue (ACL-IJCNLP 2021)
- 13 Towards a Universal NLG for Dialogue Systems and Simulators with Future Bridging (2021)
- 14 SalesBot: Transitioning from Chit-Chat to Task-Oriented Dialogues (ACL, 2022)
- [2022-2024] Conference “Interesting”s
- Interesting · LREC-COLING&NAACL2024
- Interesting · ACL2023
- Interesting · EACL2023
- Interesting · EMNLP2023
- Interesting · ACL2022 (Findings)
- Interesting · ACL2022 (Short Papers)
- Interesting · ACL2022 (Long Papers)
- Interesting · NeurIPS2022
- Interesting · EMNLP2022 (Findings&CL Papers)
- Interesting · EMNLP2022 (Main Conference)
- [2023] Chinese Natural Language Understanding, NLU, in Dialogue Systems
- [2021] Fantastic Trees (Decision Trees, Random Forest, AdaBoost, Gradient Boosting DT, XGBoost)
- [2020] Improving Your English Communication Skills (Writing Emails, Speaking English and Building ePortfolio)
- [2017] CRF Layer on the Top of BiLSTM
- CRF Layer on the Top of BiLSTM - 1 Outline and Introduction
- CRF Layer on the Top of BiLSTM - 2 CRF Layer (Emission and Transition Score)
- CRF Layer on the Top of BiLSTM - 3 CRF Loss Function
- CRF Layer on the Top of BiLSTM - 4 Real Path Score
- CRF Layer on the Top of BiLSTM - 5 The Total Score of All the Paths
- CRF Layer on the Top of BiLSTM - 6 Infer the Labels for a New Sentence
- CRF Layer on the Top of BiLSTM - 7 Chainer Implementation Warm Up
- CRF Layer on the Top of BiLSTM - 8 Demo Code

Notes (Single Post)
- [2023-2025] Open-Source Language Model Pocket
- [2023] Using ColossalAI SFT in Kaggle or Colab (in Chinese)
- [2023] General Understanding of Decoding Strategies Commonly Used in Text Generation
- [2022] Understand Gradient Checkpoint in PyTorch
- [2022] Baidu World Conference 2022: AI Applications (in Chinese, Notes)
- [2021] GPT Understands, Too
- [2021] Super Git Revision Notes
- [2021] Baidu World Conference 2021: AI Applications (in Chinese, Notes)
- [2019] Probabilistic Graphical Models Revision Notes
- [2018] Super Machine Learning Revision Notes
Paper Explained
- [2021] Few-Shot Text Classification with Distributional Signatures (ICLR 2020) Part1
- [2021] Few-Shot Text Classification with Distributional Signatures (ICLR 2020) Part2
- [2021] Few-Shot Text Classification with Distributional Signatures (ICLR 2020) Part3
Detailed Links:
* Fantastic Trees (Decision Trees, Random Forest, AdaBoost, Gradient Boosting DT, XGBoost)
* Probabilistic Graphical Models Revision Notes
- Representations
- Bayesian Network (directed graph)
- Markov Network (undirected graph)
- Inference
- Learning

* Super Machine Learning Revision Notes
- Activation Functions
- Gradient Descent
- Computation Graph
- Backpropagation
- Gradients for L2 Regularization (weight decay)
- Vanishing/Exploding Gradients
- Mini-Batch Gradient Descent
- Stochastic Gradient Descent
- Choosing Mini-Batch Size
- Gradient Descent with Momentum (usually faster than plain SGD)
- Gradient Descent with RMSprop
- Adam (put Momentum and RMSprop together)
- Learning Rate Decay Methods
- Batch Normalization
- Parameters
- Regularization
- Models
- Logistic Regression
- Multi-Class Classification (Softmax Regression)
- Transfer Learning
- Multi-task Learning
- Convolutional Neural Network (CNN)
- Sequence Models
- Transformer (Attention Is All You Need)
- Bidirectional Encoder Representations from Transformers (BERT)
- Practical Tips

* [2017] CRF Layer on the Top of BiLSTM (BiLSTM-CRF)
- CRF Layer on the Top of BiLSTM - 1 Outline and Introduction
- CRF Layer on the Top of BiLSTM - 2 CRF Layer (Emission and Transition Score)
- CRF Layer on the Top of BiLSTM - 3 CRF Loss Function
- CRF Layer on the Top of BiLSTM - 4 Real Path Score
- CRF Layer on the Top of BiLSTM - 5 The Total Score of All the Paths
- CRF Layer on the Top of BiLSTM - 6 Infer the Labels for a New Sentence
- CRF Layer on the Top of BiLSTM - 7 Chainer Implementation Warm Up
- CRF Layer on the Top of BiLSTM - 8 Demo Code
