A summary of automatic evaluation metrics for natural language generation (NLG) applications.
Human evaluation considers aspects such as adequacy, fidelity, and fluency, but it is expensive and time-consuming.
- Adequacy: Does the output convey the same meaning as the input sentence? Is part of the message lost, added, or distorted?
- Fluency: Is the output good, fluent English? This involves both grammatical correctness and idiomatic word choice.
A reliable automatic evaluation metric therefore holds great promise for NLG applications such as machine translation, text summarization, image captioning, dialogue generation, and poetry/story generation.