
Evaluation metrics for Named Entity Recognition systems

Here we briefly introduce some common evaluation metrics for NER tasks, which consider both the extracted entity boundaries and the entity types.

Scenarios that NER systems predict

A. Exact match

  • 1) Surface string and entity type match (both entity boundary and type are correct)
  • 2) The system hypothesizes an entity (it predicts an entity that does not exist in the gold standard)
  • 3) The system misses an entity (the entity exists in the gold standard but is not predicted by the NER system)

B. Partial match (overlapping)

  • 4) Wrong entity type (entity boundary is correct, but the type disagrees)
  • 5) Wrong boundaries (entity boundaries overlap, type is correct)
  • 6) Wrong boundaries and wrong entity type (boundaries overlap and the type disagrees)

Evaluation metrics

1. CoNLL-2003 (Conference on Computational Natural Language Learning): an entity counts as correct only if both its boundary and its type exactly match the gold annotation; precision, recall, and F1 are computed over these exact matches.

2. Automatic Content Extraction (ACE)

3. SemEval-2013: defines four ways of scoring each prediction

  • Strict: exact match (both entity boundary and type are correct)
  • Exact boundary matching: the predicted entity boundary is correct, regardless of the entity type
  • Partial boundary matching: the entity boundaries overlap, regardless of the entity type
  • Type matching: the predicted entity type is correct and there is some overlap between the system-tagged entity and the gold annotation

| Scenario | Gold Entity Type | Gold Entity Boundary (Surface String) | Predicted Entity Type | Predicted Entity Boundary (Surface String) | Type | Partial | Exact | Strict |
|---|---|---|---|---|---|---|---|---|
| III | MUSIC_NAME | 告白气球 | (none) | (none) | MIS | MIS | MIS | MIS |
| II | (none) | (none) | MUSIC_NAME | 年轮 | SPU | SPU | SPU | SPU |
| V | MUSIC_NAME | 告白气球 | MUSIC_NAME | 一首告白气球 | COR | PAR | INC | INC |
| IV | MUSIC_NAME | 告白气球 | SINGER | 告白气球 | INC | COR | COR | INC |
| I | MUSIC_NAME | 告白气球 | MUSIC_NAME | 告白气球 | COR | COR | COR | COR |
| VI | MUSIC_NAME | 告白气球 | SINGER | 一首告白气球 | INC | PAR | INC | INC |
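
To make the tagging in the table concrete, here is a minimal Python sketch (an illustration under assumed names, not code from the original post) that tags a single aligned gold/predicted pair with COR, INC, PAR, MIS, or SPU under the four schemas. The Entity tuple, the character offsets, and the classify helper are assumptions for this example; it also assumes gold and predicted mentions have already been aligned, so an unmatched gold mention is passed as pred=None (MIS) and an unmatched prediction as gold=None (SPU).

```python
from collections import namedtuple

# A gold or predicted mention: entity type plus character span [start, end).
Entity = namedtuple("Entity", ["etype", "start", "end"])

SCHEMAS = ("strict", "exact", "partial", "type")

def classify(gold, pred):
    """Tag one aligned gold/predicted pair with COR / INC / PAR / MIS / SPU
    under each of the four SemEval'13 schemas."""
    if pred is None:                         # gold entity with no prediction -> missed
        return {s: "MIS" for s in SCHEMAS}
    if gold is None:                         # prediction with no gold entity -> spurious
        return {s: "SPU" for s in SCHEMAS}

    same_type = gold.etype == pred.etype
    same_span = (gold.start, gold.end) == (pred.start, pred.end)
    overlap = pred.start < gold.end and gold.start < pred.end

    return {
        "strict":  "COR" if same_type and same_span else "INC",
        "exact":   "COR" if same_span else "INC",
        "partial": "COR" if same_span else ("PAR" if overlap else "INC"),
        "type":    "COR" if same_type and overlap else "INC",
    }

# Scenario V from the table: gold "告白气球", predicted "一首告白气球", same type.
gold = Entity("MUSIC_NAME", start=2, end=6)
pred = Entity("MUSIC_NAME", start=0, end=6)
print(classify(gold, pred))
# {'strict': 'INC', 'exact': 'INC', 'partial': 'PAR', 'type': 'COR'}
```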

Number of gold standard annotations (Possible):

Possible = COR + INC + PAR + MIS

Number of predicted entities (Actual):

Actual = COR + INC + PAR + SPU

Exact match (i.e. Strict, Exact):

Precision = COR / Actual
Recall = COR / Possible

Partial match (i.e. Partial, Type):

Precision = (COR + 0.5 × PAR) / Actual
Recall = (COR + 0.5 × PAR) / Possible

F-measure

F1 = 2 × Precision × Recall / (Precision + Recall)

| Measure | Type | Partial | Exact | Strict |
|---|---|---|---|---|
| Correct (COR) | 2 | 2 | 2 | 1 |
| Incorrect (INC) | 2 | 0 | 2 | 3 |
| Partial (PAR) | 0 | 2 | 0 | 0 |
| Missed (MIS) | 1 | 1 | 1 | 1 |
| Spurious (SPU) | 1 | 1 | 1 | 1 |
| Precision | 0.4 | 0.6 | 0.4 | 0.2 |
| Recall | 0.4 | 0.6 | 0.4 | 0.2 |
| F1 score | 0.4 | 0.6 | 0.4 | 0.2 |
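
As a quick sanity check of the formulas above, here is a small Python sketch (illustrative, not the author's implementation) that recomputes precision, recall, and F1 from the COR/INC/PAR/MIS/SPU counts in the table; the score helper and the counts dictionary are assumptions made for this example.

```python
def score(cor, inc, par, mis, spu):
    """Precision / recall / F1 from SemEval'13-style counts.
    Possible = gold annotations, Actual = system predictions; partial
    matches contribute 0.5 via `par`, which is 0 under Exact/Strict."""
    possible = cor + inc + par + mis          # number of gold standard annotations
    actual = cor + inc + par + spu            # number of predicted entities
    precision = (cor + 0.5 * par) / actual if actual else 0.0
    recall = (cor + 0.5 * par) / possible if possible else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Counts taken from the table above, as (COR, INC, PAR, MIS, SPU):
counts = {
    "type":    (2, 2, 0, 1, 1),
    "partial": (2, 0, 2, 1, 1),
    "exact":   (2, 2, 0, 1, 1),
    "strict":  (1, 3, 0, 1, 1),
}
for schema, c in counts.items():
    p, r, f = score(*c)
    print(f"{schema:7s}  P={p:.1f}  R={r:.1f}  F1={f:.1f}")
# type     P=0.4  R=0.4  F1=0.4
# partial  P=0.6  R=0.6  F1=0.6
# exact    P=0.4  R=0.4  F1=0.4
# strict   P=0.2  R=0.2  F1=0.2
```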

My implementation


References
