[GitHub] https://github.com/createmomo (You can also find the WeChat Public Account here)

**Notes**

- [2023-, ing…] Open-Source Language Model Pocket
- [2023] Using ColossalAI SFT in Kaggle or Colab (in Chinese)
- [2023] General Understanding of Decoding Strategies Commonly Used in Text Generation
- [2022] Understand Gradient Checkpoint in PyTorch
- [2022] Baidu World Conference 2022: AI Applications (in Chinese, Notes)
- [2021] Super Git Revision Notes
- [2021] Baidu World Conference 2021: AI Applications (in Chinese, Notes)
- [2019] Probabilistic Graphical Models Revision Notes
- [2018] Super Machine Learning Revision Notes

**Collections**

- [2023-, ing…] Past of Goal-Guided Conversational AI Models
  - 1 Introduction
  - 2 Target-Guided Open-Domain Conversation (ACL2019)
  - 3 Proactive Human Machine Conversation with Explicit Conversation Goal (ACL2019)
  - 4 Towards Conversational Recommendation Over Multi-Type Dialogs (ACL2020)
  - 5 Knowledge Graph Grounded Goal Planning for Open-Domain Conversation Generation (AAAI2020)
  - 6 Towards Topic Guided Conversational Recommender System (arXiv 2020)
  - 7 Towards Effective Automatic Debt Collection with Persona Awareness (EMNLP2023)
  - 8 Reinforcement Learning of Cooperative Persuasive Dialogue Policies Using Framing (COLING 2014)
  - 9 Dialogue Scenario Collection of Persuasive Dialogue with Emotional Expressions via Crowdsourcing (LREC 2018)
  - 10 Persuasion for Good: Towards a Personalized Persuasive Dialogue System for Social Good (ACL2019)

- [2023-, ing…] Using Language Models in Specific Domains
  - 1 Introduction
  - 2 Domain-specific Training Data
  - 3 Automatic Model Evaluation
    - [Any Domain] Evaluating Language Models with Language Models (1 Introduction)
    - [Any Domain] Evaluating Language Models with Language Models (2 PandaLM)
    - [Any Domain] Evaluating Language Models with Language Models (3 Shepherd, 1,2,3,4)
    - [Medical/Health] Comparing ChatDoctor and ChatGPT3.5 using BERT-Score

- [2023-, ing…] Use Unlabelled Text to Improve Instruction Following Language Models
- [2023] Evaluating Language Models with Language Models
- [2023] Chinese Natural Language Understanding (NLU) in Dialogue Systems
- [2022-2023] Conference “Interesting”s
- [2021] Fantastic Trees (Decision Trees, Random Forest, AdaBoost, Gradient Boosting DT, XGBoost)
- [2020] Improving Your English Communication Skills (Writing Emails, Speaking English and Building ePortfolio)
- [2017] CRF Layer on the Top of BiLSTM
  - CRF Layer on the Top of BiLSTM - 1 Outline and Introduction
  - CRF Layer on the Top of BiLSTM - 2 CRF Layer (Emission and Transition Score)
  - CRF Layer on the Top of BiLSTM - 3 CRF Loss Function
  - CRF Layer on the Top of BiLSTM - 4 Real Path Score
  - CRF Layer on the Top of BiLSTM - 5 The Total Score of All the Paths
  - CRF Layer on the Top of BiLSTM - 6 Infer the Labels for a New Sentence
  - CRF Layer on the Top of BiLSTM - 7 Chainer Implementation Warm Up
  - CRF Layer on the Top of BiLSTM - 8 Demo Code

**Paper Explained**

- [2021] GPT Understands, Too
- [2021] Few-Shot Text Classification with Distributional Signatures (ICLR 2020) Part1
- [2021] Few-Shot Text Classification with Distributional Signatures (ICLR 2020) Part2
- [2021] Few-Shot Text Classification with Distributional Signatures (ICLR 2020) Part3

**Detailed Links:**

- **Fantastic Trees** (Decision Trees, Random Forest, AdaBoost, Gradient Boosting DT, XGBoost)
- **Probabilistic Graphical Models Revision Notes**

**Representations**

- **Bayesian Network (directed graph)**
- **Markov Network (undirected graph)**

**Inference**

**Learning**

- **Super Machine Learning Revision Notes**

**Activation Functions**

**Gradient Descent**

- Computation Graph
- Backpropagation
- Gradients for L2 Regularization (weight decay)
- Vanishing/Exploding Gradients
- Mini-Batch Gradient Descent
- Stochastic Gradient Descent
- Choosing Mini-Batch Size
- Gradient Descent with Momentum (usually faster than plain SGD)
- Gradient Descent with RMSprop
- Adam (put Momentum and RMSprop together)
- Learning Rate Decay Methods
- Batch Normalization

**Parameters**

**Regularization**

**Models**

- Logistic Regression
- Multi-Class Classification (Softmax Regression)
- Transfer Learning
- Multi-task Learning
- Convolutional Neural Network (CNN)
- Sequence Models
- Transformer (Attention Is All You Need)
- Bidirectional Encoder Representations from Transformers (BERT)

**Practical Tips**
