Technology

ROUGE

ROUGE (Recall-Oriented Understudy for Gisting Evaluation) is a widely used family of NLP metrics for automatically evaluating generated text, chiefly summaries and, to a lesser extent, machine translations, by measuring its word overlap with human-written references.

ROUGE is a core evaluation tool for natural language processing (NLP) generation tasks. It is a family of recall-oriented metrics that score machine-generated text by how much it overlaps with one or more human-written reference texts. Modern implementations usually report precision, recall, and F1 for each variant. The main variants are ROUGE-N, which counts overlapping n-grams (ROUGE-1 for unigrams, ROUGE-2 for bigrams), and ROUGE-L, which uses the Longest Common Subsequence (LCS) to reward words that appear in the same order without requiring them to be adjacent, so it captures sentence-level structure. ROUGE-1 and ROUGE-L scores are reported to reach roughly 0.6–0.8 Kendall tau-b correlation with human judgments. That makes them a practical automated proxy for content coverage, although the correlation varies with the task and dataset.
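The ROUGE-N and ROUGE-L definitions above can be illustrated with a minimal from-scratch sketch. It assumes simple lowercase whitespace tokenization with no stemming or punctuation handling. The function names (`rouge_n`, `rouge_l`, `lcs_length`, and the helpers) are hypothetical and are not the API of the google-research `rouge_score` package, whose tokenizer and options differ, so its scores may not match this sketch exactly.

```python
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Simplified tokenizer (an assumption): lowercase and split on whitespace.
    # Real implementations also strip punctuation and may apply stemming.
    return text.lower().split()


def ngrams(tokens: list[str], n: int) -> Counter:
    # Multiset of all contiguous n-grams in the token list.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def prf(overlap: int, cand_total: int, ref_total: int) -> dict[str, float]:
    # Precision is relative to the candidate, recall to the reference;
    # F1 is their harmonic mean.
    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def rouge_n(candidate: str, reference: str, n: int) -> dict[str, float]:
    cand = ngrams(tokenize(candidate), n)
    ref = ngrams(tokenize(reference), n)
    # Clipped overlap: Counter intersection keeps the minimum count of each
    # n-gram, so a repeated n-gram is matched at most as often as in both texts.
    overlap = sum((cand & ref).values())
    return prf(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a: list[str], b: list[str]) -> int:
    # Dynamic program for the Longest Common Subsequence: matched words must
    # appear in the same order but need not be adjacent. Keeps one row at a time.
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate: str, reference: str) -> dict[str, float]:
    # ROUGE-L here uses a plain F1 (beta = 1) over the LCS length.
    cand, ref = tokenize(candidate), tokenize(reference)
    return prf(lcs_length(cand, ref), len(cand), len(ref))


reference = "the cat sat on the mat"
candidate = "the cat lay on the mat"
print("ROUGE-1", round(rouge_n(candidate, reference, 1)["f1"], 3))  # ROUGE-1 0.833
print("ROUGE-2", round(rouge_n(candidate, reference, 2)["f1"], 3))  # ROUGE-2 0.6
print("ROUGE-L", round(rouge_l(candidate, reference)["f1"], 3))     # ROUGE-L 0.833
```

In this example a single substituted word ("lay" for "sat") costs one of six unigrams but two of five bigrams. This shows why ROUGE-2 is stricter than ROUGE-1. ROUGE-L still matches "the cat ... on the mat" as an in-order subsequence.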

https://github.com/google-research/google-research/tree/master/rouge

1 project · 1 city

Related technologies

LangChain (437) · LangGraph (62) · NLTK (4) · Python (613)

Recent Talks & Demos


Members-Only

Wittgenlab: LLM Evaluation Framework

Bogotá

LangChain · Python