Table of Contents

[GitHub] (You can also find the WeChat Official Account here)



Paper Explained

Detailed Links:
* Fantastic Trees (Decision Trees, Random Forest, Adaboost, Gradient Boosting DT, XGBoost)
* Probabilistic Graphical Models Revision Notes

* Super Machine Learning Revision Notes


[2017] CRF Layer on the Top of BiLSTM (BiLSTM-CRF)

The dog needs to find the best path to get his favorite bone toy and return home the way he came.

Main Points of Interesting Papers

This page lists notes on interesting papers from different research topics in Natural Language Processing (NLP). Each note briefly describes the main points of one paper, so I hope it helps you quickly get the ideas behind them. (Please feel free to correct me if you find any mistakes.)

Read More

Super Machine Learning Revision Notes

[Last Updated: 06/01/2019]

This article aims to summarise:

  • basic concepts in machine learning (e.g. gradient descent, back propagation etc.)
  • different algorithms and various popular models
  • some practical tips and examples learned from my own practice and from online courses such as Deep Learning AI.

If you are a student studying machine learning, I hope this article helps you shorten your revision time and brings you useful inspiration. If you are not a student, I hope it is helpful whenever you cannot recall some model or algorithm.

Moreover, you can also treat it as a "Quick Check Guide". Please feel free to use Ctrl+F to search for any keywords that interest you.

Any comments and suggestions are most welcome!

Read More

My Life

CRF Layer on the Top of BiLSTM - 8

3.4 Demo

In this section, we will make up two fake sentences, which have only 2 words and 1 word respectively. Moreover, we will randomly generate their true answers. Finally, we will show how to train the CRF layer using Chainer v2.0. All the code, including the CRF layer, is available on GitHub.
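As a rough illustration of what the demo trains, here is a minimal sketch of the CRF loss (negative log-likelihood) for one short sentence. This uses plain NumPy rather than Chainer, and the emission/transition scores and true labels below are made up for illustration:

```python
import numpy as np

def crf_neg_log_likelihood(emissions, transitions, labels):
    """CRF loss = log(total score of all paths) - score of the true path.

    emissions:   (seq_len, n_labels) per-word label scores (from the BiLSTM)
    transitions: (n_labels, n_labels) scores; transitions[i, j] is the score
                 of moving from label i to label j
    labels:      the true label sequence
    """
    seq_len, n_labels = emissions.shape
    # Score of the true path: its emission scores plus transition scores.
    true_score = emissions[0, labels[0]]
    for t in range(1, seq_len):
        true_score += transitions[labels[t - 1], labels[t]] + emissions[t, labels[t]]
    # Log of the total score over all possible paths (forward algorithm).
    alpha = emissions[0].copy()  # log-scores of partial paths ending at each label
    for t in range(1, seq_len):
        # alpha[i] + transitions[i, j] for every (previous i, current j) pair
        alpha = emissions[t] + np.logaddexp.reduce(alpha[:, None] + transitions, axis=0)
    log_total = np.logaddexp.reduce(alpha)
    return log_total - true_score

# A fake 2-word sentence with 2 possible labels and a randomly chosen true answer.
emissions = np.array([[0.5, 1.0], [0.8, 0.2]])
transitions = np.array([[0.3, -0.1], [0.4, 0.2]])
loss = crf_neg_log_likelihood(emissions, transitions, labels=[1, 0])
```

Training then just means minimizing this loss with gradient descent; the loss is always positive, because the total over all paths is strictly larger than the true path's score alone.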

Read More

CRF Layer on the Top of BiLSTM - 6

2.6 Infer the labels for a new sentence

In the previous sections, we learned the structure of the BiLSTM-CRF model and the details of the CRF loss function. You can implement your own BiLSTM-CRF model in various open-source frameworks (Keras, Chainer, TensorFlow, etc.). One of the greatest advantages is that backpropagation through your model is computed automatically by these frameworks, so you do not need to implement it yourself to train your model (i.e. compute the gradients and update the parameters). Moreover, some frameworks have already implemented the CRF layer, so combining a CRF layer with your own model can be as easy as adding about one line of code.
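The automatic backpropagation mentioned above can be shown with a tiny sketch (PyTorch here purely as one example framework; the toy scalar function is made up and stands in for the real CRF loss):

```python
import torch

# A toy "loss" with a single trainable parameter x: loss = (x - 5)^2.
x = torch.tensor(2.0, requires_grad=True)
loss = (x - 5.0) ** 2

# One call computes the gradient by backpropagation -- no manual derivation.
loss.backward()

print(x.grad)  # d/dx (x - 5)^2 at x = 2 is 2 * (2 - 5) = -6
```

Exactly the same mechanism applies when `loss` is the CRF negative log-likelihood of a whole BiLSTM-CRF model: call `backward()` and the framework fills in every parameter's gradient.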

In this section, we will explore how to infer the labels for a new sentence at test time, once our model is ready.
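The inference step this section describes is Viterbi decoding: pick the label path with the highest total score. A minimal, framework-agnostic sketch in plain NumPy (the emission and transition scores below are made up):

```python
import numpy as np

def viterbi_decode(emissions, transitions):
    """Return the highest-scoring label sequence for one sentence.

    emissions:   (seq_len, n_labels) per-word label scores (from the BiLSTM)
    transitions: (n_labels, n_labels) label-to-label transition scores
    """
    seq_len, n_labels = emissions.shape
    score = emissions[0].copy()        # best path score ending at each label so far
    backpointers = []
    for t in range(1, seq_len):
        # candidate[i, j] = best score ending in label i, then moving i -> j
        candidate = score[:, None] + transitions
        backpointers.append(np.argmax(candidate, axis=0))
        score = emissions[t] + np.max(candidate, axis=0)
    # Walk the backpointers from the best final label to recover the path.
    best = [int(np.argmax(score))]
    for bp in reversed(backpointers):
        best.append(int(bp[best[-1]]))
    return best[::-1]

# A fake 3-word sentence with 2 possible labels.
emissions = np.array([[1.0, 0.2], [0.3, 0.9], [0.8, 0.1]])
transitions = np.array([[0.5, -0.2], [-0.3, 0.4]])
path = viterbi_decode(emissions, transitions)
```

Unlike the forward algorithm used in the loss (which sums over all paths), decoding replaces the sum with a max and keeps backpointers, so it runs in O(seq_len × n_labels²) time.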

Read More