
Evaluation Metrics of Named Entity Recognition

Here we briefly introduce some common evaluation metrics for NER tasks, considering both entity boundaries and entity types.

Scenarios that NER systems predict

Exact Match

  • 1) Surface string and type match (both entity boundary and type are correct)
  • 2) System hypothesizes an entity (predicts an entity that does not exist in the ground truth)
  • 3) System misses an entity (the entity exists in the ground truth, but is not predicted by the NER system)

Partial Match (Overlapping)

  • 4) Wrong entity type (correct entity boundary, but the type disagrees)
  • 5) Wrong boundaries (the boundaries overlap)
  • 6) Wrong boundaries and wrong entity type

Evaluation Metrics

CoNLL-2003 (Conference on Computational Natural Language Learning): counts an entity as correct only when both its boundary and its type exactly match the gold annotation, and reports precision, recall, and F1 over these exact matches.

Automatic Content Extraction (ACE): uses a weighted entity-value score that assigns different costs to type and boundary errors.

Message Understanding Conference (MUC)

  • Considers both the entity boundary and the entity type
  • Correct (COR): the prediction matches the gold annotation
  • Incorrect (INC): the prediction does not match the gold annotation
  • Partial (PAR): the predicted entity boundary overlaps with the gold annotation, but they are not identical
  • Missing (MIS): a gold annotation boundary is not identified (the gold labels contain the entity, but the predictions do not)
  • Spurious (SPU): a predicted entity boundary does not exist in the gold annotation (the predictions contain the entity, but the gold labels do not)
  • See MUC-5 EVALUATION METRICS
  • Implementation in Python (a toy sketch of the five categories also follows this list)
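
To make the five categories concrete, here is a minimal sketch, assuming entities are represented as (type, (start, end)) tuples; the helper muc_label is my own illustration of the definitions above, not the official MUC scorer:

```python
def muc_label(gold, pred):
    """Label one gold/prediction pair with a MUC category.

    `gold` and `pred` are (type, (start, end)) tuples; None means the
    entity is absent on that side.
    """
    if gold is None:
        return "SPU"                      # predicted, but not in gold
    if pred is None:
        return "MIS"                      # in gold, but never predicted
    (g_type, (g_s, g_e)), (p_type, (p_s, p_e)) = gold, pred
    if (g_s, g_e) == (p_s, p_e):          # identical boundaries
        return "COR" if g_type == p_type else "INC"
    if g_s < p_e and p_s < g_e:           # boundaries overlap
        return "PAR"
    return "INC"                          # disjoint spans: no match

print(muc_label(("MUSIC_NAME", (2, 6)), ("MUSIC_NAME", (0, 6))))  # PAR
print(muc_label(None, ("MUSIC_NAME", (0, 2))))                    # SPU
```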

SemEval-2013

  • Strict: exact match (both entity boundary and type are correct)
  • Exact boundary matching: the predicted entity boundary is correct, regardless of the entity type
  • Partial boundary matching: the entity boundaries overlap, regardless of the entity type
  • Type matching: the predicted entity type is correct, and some overlap between the system-tagged entity and the gold annotation is required (all four modes are sketched in code after the scenario table below)

The scenario table below pairs each gold annotation with the NER system prediction and the resulting category under each matching mode (告白气球 and 年轮 are song titles; 一首 means "a [song]"):

| Scenario | Gold entity type | Gold surface string | Predicted entity type | Predicted surface string | Type | Partial | Exact | Strict |
|----------|------------------|---------------------|-----------------------|--------------------------|------|---------|-------|--------|
| I        | MUSIC_NAME       | 告白气球            | MUSIC_NAME            | 告白气球                 | COR  | COR     | COR   | COR    |
| II       | –                | –                   | MUSIC_NAME            | 年轮                     | SPU  | SPU     | SPU   | SPU    |
| III      | MUSIC_NAME       | 告白气球            | –                     | –                        | MIS  | MIS     | MIS   | MIS    |
| IV       | MUSIC_NAME       | 告白气球            | SINGER                | 告白气球                 | INC  | COR     | COR   | INC    |
| V        | MUSIC_NAME       | 告白气球            | MUSIC_NAME            | 一首告白气球             | COR  | PAR     | INC   | INC    |
| VI       | MUSIC_NAME       | 告白气球            | SINGER                | 一首告白气球             | INC  | PAR     | INC   | INC    |
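
These judgments can be reproduced mechanically. Below is a minimal sketch under the same assumptions as before (token-level (start, end) spans; the helper name semeval_modes is hypothetical) that scores one gold/prediction pair under all four modes:

```python
def semeval_modes(gold, pred):
    """Score one gold/prediction pair under the four SemEval-2013 modes."""
    (g_type, g_span), (p_type, p_span) = gold, pred
    exact = g_span == p_span                                   # identical boundaries
    overlap = g_span[0] < p_span[1] and p_span[0] < g_span[1]  # any shared tokens
    same_type = g_type == p_type
    return {
        "Type":    "COR" if same_type and overlap else "INC",
        "Partial": "COR" if exact else ("PAR" if overlap else "INC"),
        "Exact":   "COR" if exact else "INC",
        "Strict":  "COR" if exact and same_type else "INC",
    }

# Scenario V: gold 告白气球 vs. predicted 一首告白气球 (same type, wider span)
print(semeval_modes(("MUSIC_NAME", (2, 6)), ("MUSIC_NAME", (0, 6))))
# {'Type': 'COR', 'Partial': 'PAR', 'Exact': 'INC', 'Strict': 'INC'}
```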

Number of gold-standard annotations:

$$\text{POSSIBLE} = \text{COR} + \text{INC} + \text{PAR} + \text{MIS} = TP + FN$$

Number of predicted entities:

$$\text{ACTUAL} = \text{COR} + \text{INC} + \text{PAR} + \text{SPU} = TP + FP$$

Exact match (i.e. Strict, Exact):

$$\text{Precision} = \frac{\text{COR}}{\text{ACTUAL}} \qquad \text{Recall} = \frac{\text{COR}}{\text{POSSIBLE}}$$

Partial match (i.e. Partial, Type):

$$\text{Precision} = \frac{\text{COR} + 0.5\,\text{PAR}}{\text{ACTUAL}} \qquad \text{Recall} = \frac{\text{COR} + 0.5\,\text{PAR}}{\text{POSSIBLE}}$$

F-measure:

$$F_1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
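
As a sanity check, the sketch below (a toy helper, not a published scorer) plugs category counts into these formulas; the counts used here are the Partial-mode tallies for the six scenarios above, and the full results appear in the following table:

```python
def prf(counts, partial_credit=False):
    """Precision / recall / F1 from MUC-style category counts."""
    cor, inc, par = counts["COR"], counts["INC"], counts["PAR"]
    possible = cor + inc + par + counts["MIS"]    # all gold annotations
    actual = cor + inc + par + counts["SPU"]      # all predictions
    score = cor + (0.5 * par if partial_credit else 0.0)
    p, r = score / actual, score / possible
    f1 = 2 * p * r / (p + r) if p + r > 0 else 0.0
    return p, r, f1

# Partial-mode counts for scenarios I-VI: COR=2, PAR=2, MIS=1, SPU=1
print(prf({"COR": 2, "INC": 0, "PAR": 2, "MIS": 1, "SPU": 1}, partial_credit=True))
# -> (0.6, 0.6, 0.6), up to floating-point rounding
```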

Results for the six scenarios under each matching mode:

| Measure         | Type | Partial | Exact | Strict |
|-----------------|------|---------|-------|--------|
| Correct (COR)   | 2    | 2       | 2     | 1      |
| Incorrect (INC) | 2    | 0       | 2     | 3      |
| Partial (PAR)   | 0    | 2       | 0     | 0      |
| Missing (MIS)   | 1    | 1       | 1     | 1      |
| Spurious (SPU)  | 1    | 1       | 1     | 1      |
| Precision       | 0.4  | 0.6     | 0.4   | 0.2    |
| Recall          | 0.4  | 0.6     | 0.4   | 0.2    |
| F1 score        | 0.4  | 0.6     | 0.4   | 0.2    |

The eval4ner library on PyPI implements these metrics; install it with `pip install -U eval4ner`.
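
Below is a usage sketch based on my reading of the eval4ner README; the entity format ((type, surface string) tuples grouped per sentence) and the evaluate_all call are assumptions to verify against the library's documentation:

```python
import eval4ner.muc as muc

# Hypothetical toy data mirroring scenario V above.
texts = ["来 一首 告白气球"]                        # input sentences
ground_truths = [[("MUSIC_NAME", "告白气球")]]      # gold entities per sentence
predictions = [[("MUSIC_NAME", "一首 告白气球")]]   # predicted entities per sentence

# Prints COR/INC/PAR/MIS/SPU counts plus precision / recall / F1
# under the strict, exact, partial, and type matching modes.
muc.evaluate_all(predictions, ground_truths, texts, verbose=True)
```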

For attribution in academic contexts, please cite this work as:

@misc{chai2021NER-eval,
  author       = {Chai, Yekun},
  title        = {{Evaluation Metrics of Named Entity Recognition}},
  year         = {2021},
  howpublished = {\url{https://cyk1337.github.io/notes/2018/11/21/NLP/NER/NER-Evaluation-Metrics/}},
}

References