CRF Layer on the Top of BiLSTM - 4

2.4 Real path score

In section 2.3, we supposed that every possible path has a score $ P_{i} $ and that there are $ N $ possible paths in total. The total score of all the paths is $ P_{total} = P_1 + P_2 + \dots + P_N = e^{S_1} + e^{S_2} + \dots + e^{S_N} $, where $ e $ is the mathematical constant (Euler's number).

Obviously, one of these possible paths must be the real one. For example, the real path of the sentence in section 1.2 is “START B-Person I-Person O B-Organization O END”. The others are incorrect, such as “START B-Person B-Organization O I-Person I-Person B-Person”. $ e^{S_i} $ is the score of the $ i^{th} $ path.

During training, the CRF loss function needs only two scores: the score of the real path and the total score of all the possible paths. As training proceeds, the proportion of the real path score among the scores of all the possible paths gradually increases.
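As a small numeric illustration of this proportion, here is a sketch with three made-up path scores. The values in `S` and the choice of which path is the real one are assumptions for illustration only and do not come from a trained model. Taking the negative log of the proportion is one common way to turn it into a loss to minimise.

```python
import math

# Hypothetical scores S_1..S_3 for three candidate paths (made-up numbers).
S = [2.0, 1.0, 0.5]
real_index = 0  # assumption: the first path is the real one

# The score of each path is e^{S_i}; the total score is the sum over all paths.
path_scores = [math.exp(s) for s in S]
total_score = sum(path_scores)

# Proportion of the real path score among all path scores.
# Training should push this value towards 1.
proportion = path_scores[real_index] / total_score

# One common way to express this as a loss: the negative log of the proportion.
loss = -math.log(proportion)

print(proportion, loss)
```

If the real path score grows relative to the others, `proportion` moves towards 1 and `loss` moves towards 0.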

The calculation of a real path score, $e^{S_i}$, is very straightforward.

Here we focus on the calculation of $ S_i $.

Take the real path we used before, “START B-Person I-Person O B-Organization O END”, as an example:

  • We have a sentence with 5 words, $w_1, w_2, w_3, w_4, w_5$
  • We add two extra words that denote the start and the end of the sentence, $w_0, w_6$
  • $S_i$ consists of 2 parts: $S_i = EmissionScore + TransitionScore $ (the emission and transition scores are explained in sections 2.1 and 2.2)

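Emission Score:
$EmissionScore = x_{0,START} + x_{1,B-Person} + x_{2,I-Person} + $
$x_{3,O} + x_{4,B-Organization} + x_{5,O} + x_{6,END}$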

  • $ x_{index,label} $ is the score if the $index^{th}$ word is labelled by $ label $

  • The scores $ x_{1,B-Person} $, $ x_{2,I-Person} $, $ x_{3,O} $, $ x_{4,B-Organization} $ and $ x_{5,O} $ come from the previous BiLSTM output.

  • As for $ x_{0,START} $ and $ x_{6,END} $, we can simply set them to zero.

Transition Score:
$TransitionScore = t_{START->B-Person} + t_{B-Person->I-Person} + $
$t_{I-Person->O} + t_{O->B-Organization} + t_{B-Organization->O} + t_{O->END}$

  • $t_{label1->label2}$ is the transition score from $label1$ to $label2$
  • These scores come from the CRF Layer. In other words, the transition scores are the parameters of the CRF Layer.
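Putting the two parts together, the calculation of $S_i$ for the real path can be sketched in Python as follows. All numeric values are made up for illustration. In a real model the emission scores would come from the BiLSTM output and the transition scores would be learned CRF parameters. The names `emission`, `transition` and `S_real` are hypothetical.

```python
import math

# The real path, including the added START and END labels for w_0 and w_6.
labels = ["START", "B-Person", "I-Person", "O", "B-Organization", "O", "END"]

# Hypothetical emission scores x_{index,label} for the labels on the real path.
# Positions 0 and 6 are the START/END words, whose emission scores are set to zero.
emission = [0.0, 1.5, 0.9, 0.1, 1.2, 0.3, 0.0]

# Hypothetical transition scores t_{label1->label2} (made-up numbers).
transition = {
    ("START", "B-Person"): 0.8,
    ("B-Person", "I-Person"): 0.9,
    ("I-Person", "O"): 0.5,
    ("O", "B-Organization"): 0.4,
    ("B-Organization", "O"): 0.6,
    ("O", "END"): 0.7,
}

# EmissionScore: sum of the emission scores along the path.
emission_score = sum(emission)

# TransitionScore: sum of the transition scores between consecutive labels.
transition_score = sum(transition[(a, b)] for a, b in zip(labels, labels[1:]))

# S_i = EmissionScore + TransitionScore, and the path score is e^{S_i}.
S_real = emission_score + transition_score
path_score = math.exp(S_real)

print(S_real, path_score)
```

Only the scores that lie on the real path are needed here, which is why this part of the loss is cheap to compute.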

To sum up, we can now calculate $S_i$ as well as the path score $e^{S_i}$. The next step is to calculate the total score of all the possible paths.


2.5 The total score of all the possible paths

This section shows how to calculate the total score of all the possible paths of a sentence, using a step-by-step toy example.

This is one of the most important parts of the series, and also a somewhat difficult one. But do not worry: the toy example given in this section explains the details as simply as possible.
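As a reference point, the definition $ P_{total} = e^{S_1} + e^{S_2} + \dots + e^{S_N} $ can be checked directly by brute force on a very small case. The sketch below assumes a hypothetical 3-word sentence with only two labels, uses made-up scores, and leaves out the START and END labels for brevity. It enumerates every path, so it is only practical for tiny inputs: the number of paths is the number of labels raised to the power of the sentence length.

```python
import itertools
import math

labels = ["B-Person", "O"]  # a tiny hypothetical label set
num_words = 3

# Hypothetical emission scores: emission[i][label] for the i-th word.
emission = [
    {"B-Person": 1.0, "O": 0.2},
    {"B-Person": 0.1, "O": 0.9},
    {"B-Person": 0.4, "O": 0.6},
]

# Hypothetical transition scores between labels (START/END omitted).
transition = {
    ("B-Person", "B-Person"): 0.1,
    ("B-Person", "O"): 0.8,
    ("O", "B-Person"): 0.5,
    ("O", "O"): 0.6,
}

def path_S(path):
    """Return S_i = EmissionScore + TransitionScore for one label path."""
    e = sum(emission[i][label] for i, label in enumerate(path))
    t = sum(transition[(a, b)] for a, b in zip(path, path[1:]))
    return e + t

# Enumerate all N = 2^3 = 8 possible paths and sum e^{S_i} over them.
all_paths = list(itertools.product(labels, repeat=num_words))
total_score = sum(math.exp(path_S(p)) for p in all_paths)

print(len(all_paths), total_score)
```

Because the number of paths grows exponentially with sentence length, this enumeration is useful only for verifying a more efficient calculation on small examples.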





