[GitHub] https://github.com/createmomo (You can also find the WeChat Public Account here)

**Notes**

- [2023, ongoing] Open-Source Language Model Pocket
- [2023] General Understanding of Decoding Strategies Commonly Used in Text Generation
- [2022] Understand Gradient Checkpoint in Pytorch
- [2022] Baidu World Conference 2022: AI Applications (in Chinese, Notes)
- [2021] Super Git Revision Notes
- [2021] Baidu World Conference 2021: AI Applications (in Chinese, Notes)
- [2019] Probabilistic Graphical Models Revision Notes
- [2018] Super Machine Learning Revision Notes

**Collections**

- [2023, ongoing] Using Language Models in Specific Domains
- [2023] Chinese Natural Language Understanding, NLU, in Dialogue Systems
- [2022] Conference “Interesting”s
- [2021] Fantastic Trees (Decision Trees, Random Forest, Adaboost, Gradient Boosting DT, XGBoost)
- [2020] Improving Your English Communication Skills (Writing Emails, Speaking English and Building ePortfolio)
- [2017] CRF Layer on the Top of BiLSTM
- CRF Layer on the Top of BiLSTM - 1 Outline and Introduction
- CRF Layer on the Top of BiLSTM - 2 CRF Layer (Emission and Transition Score)
- CRF Layer on the Top of BiLSTM - 3 CRF Loss Function
- CRF Layer on the Top of BiLSTM - 4 Real Path Score
- CRF Layer on the Top of BiLSTM - 5 The Total Score of All the Paths
- CRF Layer on the Top of BiLSTM - 6 Infer the Labels for a New Sentence
- CRF Layer on the Top of BiLSTM - 7 Chainer Implementation Warm Up
- CRF Layer on the Top of BiLSTM - 8 Demo Code

**Paper Explained**

- [2021] GPT Understands, Too
- [2021] Few-Shot Text Classification with Distributional Signatures (ICLR 2020) Part1
- [2021] Few-Shot Text Classification with Distributional Signatures (ICLR 2020) Part2
- [2021] Few-Shot Text Classification with Distributional Signatures (ICLR 2020) Part3

**Detailed Links**

**Fantastic Trees** (Decision Trees, Random Forest, Adaboost, Gradient Boosting DT, XGBoost)

**Probabilistic Graphical Models Revision Notes**

**Representations**

- Bayesian Network (directed graph)
- Markov Network (undirected graph)

**Inference**

**Learning**

**Super Machine Learning Revision Notes**

**Activation Functions**

**Gradient Descent**

- Computation Graph
- Backpropagation
- Gradients for L2 Regularization (weight decay)
- Vanishing/Exploding Gradients
- Mini-Batch Gradient Descent
- Stochastic Gradient Descent
- Choosing Mini-Batch Size
- Gradient Descent with Momentum (almost always faster than SGD)
- Gradient Descent with RMSprop
- Adam (combines Momentum and RMSprop)
- Learning Rate Decay Methods
- Batch Normalization
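As the list above notes, Adam combines Momentum (a moving average of gradients) with RMSprop (a moving average of squared gradients). A minimal NumPy sketch of a single Adam update, with the commonly used default hyperparameters (the function name and signature here are illustrative, not from the notes):

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: first moment (Momentum) + second moment (RMSprop)."""
    m = beta1 * m + (1 - beta1) * grad        # Momentum-style moving average
    v = beta2 * v + (1 - beta2) * grad ** 2   # RMSprop-style squared average
    m_hat = m / (1 - beta1 ** t)              # bias correction (t starts at 1)
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
```

Iterating this step on a loss gradient drives the parameter toward a minimum while adapting the effective step size per parameter.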

**Parameters****Regularization****Models**- Logistic Regression
- Multi-Class Classification (Softmax Regression)
- Transfer Learning
- Multi-task Learning
- Convolutional Neural Network (CNN)
- Sequence Models
- Transformer (Attention Is All You Need)
- Bidirectional Encoder Representations from Transformers (BERT)

**Practical Tips**

**[2017] CRF Layer on the Top of BiLSTM (BiLSTM-CRF)**

- CRF Layer on the Top of BiLSTM - 1 Outline and Introduction
- CRF Layer on the Top of BiLSTM - 2 CRF Layer (Emission and Transition Score)
- CRF Layer on the Top of BiLSTM - 3 CRF Loss Function
- CRF Layer on the Top of BiLSTM - 4 Real Path Score
- CRF Layer on the Top of BiLSTM - 5 The Total Score of All the Paths
- CRF Layer on the Top of BiLSTM - 6 Infer the Labels for a New Sentence
- CRF Layer on the Top of BiLSTM - 7 Chainer Implementation Warm Up
- CRF Layer on the Top of BiLSTM - 8 Demo Code
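Part 5 of the series above computes the total score of all label paths, which appears in the denominator of the CRF loss. A minimal NumPy sketch of that computation via the forward algorithm in log space (array shapes and function names are illustrative assumptions; the series' demo code uses Chainer instead):

```python
import numpy as np

def log_sum_exp(x):
    """Numerically stable log(sum(exp(x)))."""
    c = np.max(x)
    return c + np.log(np.sum(np.exp(x - c)))

def crf_total_score(emissions, transitions):
    """Log of the total score over all label paths.

    emissions:   (seq_len, n_labels) emission scores (e.g. from a BiLSTM)
    transitions: (n_labels, n_labels), transitions[i, j] = score of label i -> j
    """
    alpha = emissions[0]  # log scores of all length-1 paths
    for t in range(1, len(emissions)):
        # For each next label j, sum over all previous labels i in log space.
        alpha = np.array([
            log_sum_exp(alpha + transitions[:, j] + emissions[t, j])
            for j in range(emissions.shape[1])
        ])
    return log_sum_exp(alpha)
```

The recursion runs in O(seq_len × n_labels²), whereas naively enumerating every path would be exponential in the sequence length.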