---
title: "sacRebleu"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{sacRebleu}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

This package provides metrics to evaluate generated text. At present, only the BLEU (bilingual evaluation understudy) score, introduced by [Papineni et al., 2002](https://aclanthology.org/P02-1040/),
is available. The library is implemented in 'R' and 'C++'. The metrics operate on pre-tokenized input, so lists of tokenized sequences are evaluated.
This package is inspired by the ['NLTK'](https://www.nltk.org/) and ['sacrebleu'](https://github.com/mjpost/sacrebleu) implementations for 'Python'.

# BLEU Score

The BLEU score is a metric used to evaluate the quality of machine-generated texts by comparing them to
reference texts. It is calculated based on the precision of n-grams, which are contiguous sequences of n items, typically words.
Mathematically, BLEU can be expressed as follows:

\[
BLEU = \text{BP} \times \exp\left(\sum_{n=1}^{N} \frac{1}{N} \log \text{precision}_n\right)
\]

Where:

- \(\text{BP}\) is the brevity penalty, which penalizes candidate texts that are shorter than the reference texts.
It is defined as \(\exp(1 - \frac{\text{reference length}}{\text{candidate length}})\) if the candidate is shorter than the reference, and as \(1\) otherwise.
- \(N\) is the maximum n-gram order considered in the calculation.
- \(\text{precision}_n\) is the precision of n-grams, calculated as the ratio of the number of
n-grams in the candidate text that appear in any of the reference texts to the total number of n-grams in the candidate text.

\(\text{precision}_n\) is defined as follows:

\[
\text{precision}_n = \frac{\sum_{c \in \text{Cand}} \text{ngram}_{\text{clip}}(c)}{\sum_{c \in \text{Cand}} \text{ngram}(c)}
\]

Where \(\text{ngram}_{\text{clip}}(c)\) is the clipped n-gram count of candidate \(c\): the count of each n-gram is capped at the maximum number of times it occurs in any single reference, so matches cannot exceed the reference counts. \(\text{ngram}(c)\) is the total number of n-grams in candidate \(c\). This procedure is repeated for every order from 1 to \(N\), as the sketch below illustrates for unigrams.
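
To make the clipped count concrete, the following minimal sketch computes the clipped unigram precision in base R. The helper `clipped_unigram_precision` is hypothetical and not part of this package; the degenerate candidate follows the spirit of the example in Papineni et al., 2002:

```{r}
# Hypothetical helper, only for illustration: clipped unigram
# precision of one candidate against several references.
clipped_unigram_precision <- function(candidate, references) {
  cand_counts <- table(candidate)
  clipped <- 0
  for (token in names(cand_counts)) {
    # Cap the token count at its maximum count in any single reference.
    max_ref <- max(vapply(references, function(r) sum(r == token), integer(1)))
    clipped <- clipped + min(cand_counts[[token]], max_ref)
  }
  clipped / length(candidate)
}

# A degenerate candidate that repeats a single token:
candidate <- c("the", "the", "the", "the")
references <- list(c("the", "cat", "is", "on", "the", "mat"))

# Unclipped precision would be 4/4 = 1; clipping caps "the" at 2
# (its count in the reference), yielding 2/4 = 0.5.
clipped_unigram_precision(candidate, references)
```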

In summary, the BLEU score condenses the quality of a candidate text into a single number, with higher scores indicating better quality.

# Smoothing

This package provides two smoothing techniques from [Chen & Cherry, 2014](https://aclanthology.org/W14-3346/): `floor` and `add-k`.

## `floor`

Each n-gram precision is a ratio of clipped n-gram counts to total n-gram counts. In some cases, the count of matching n-grams of a given order may be zero, which makes the geometric mean over all orders collapse to zero. To address this issue, a small value (denoted as \(\epsilon\)) is added to the numerator of the precision calculation when the count is zero.
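
As a sketch of how this could look in practice, assuming `bleu_corpus_ids()` exposes the smoothing method and its \(\epsilon\) value through `smoothing` and `epsilon` arguments (check `?bleu_corpus_ids` for the exact interface):

```{r, eval = FALSE}
# Hypothetical call; 'smoothing' and 'epsilon' are assumptions about
# the interface, see ?bleu_corpus_ids.
cand <- list(c(1, 2, 3))
refs <- list(list(c(1, 2, 4)))
bleu_corpus_ids(refs, cand, smoothing = "floor", epsilon = 0.1)
```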

## `add-k`

Similar in motivation to the `floor` method, the `add-k` smoothing technique adds a positive integer \(k\) to both the numerator and the denominator of the precision calculation for each n-gram order from 1 to \(N\).
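
Analogously for `add-k`, again assuming `smoothing` and `k` arguments (see `?bleu_corpus_ids`):

```{r, eval = FALSE}
# Hypothetical call; 'smoothing' and 'k' are assumptions about
# the interface, see ?bleu_corpus_ids.
cand <- list(c(1, 2, 3))
refs <- list(list(c(1, 2, 4)))
bleu_corpus_ids(refs, cand, smoothing = "add-k", k = 1)
```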

# Example

```{r}
library(sacRebleu)
# Two candidate sentences, each a vector of integer token ids.
cand_corpus <- list(c(1, 2, 3), c(1, 2))
# One list of reference sentences per candidate.
ref_corpus <- list(list(c(1, 2, 3), c(2, 3, 4)), list(c(1, 2, 6), c(781, 21, 9), c(7, 3)))
bleu_corpus_ids_standard <- bleu_corpus_ids(ref_corpus, cand_corpus)
bleu_corpus_ids_standard
```

Here, the text is already tokenized and represented as integer token ids in the 'cand_corpus' and 'ref_corpus' lists. For tokenization, the ['tok'](https://cran.r-project.org/package=tok) package is recommended.
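
A minimal sketch of obtaining such integer ids from raw text with 'tok' (assuming a pretrained tokenizer can be downloaded; the model name is only an illustrative choice):

```{r, eval = FALSE}
library(tok)
library(sacRebleu)
# Downloads a pretrained tokenizer; requires an internet connection.
tokenizer <- tokenizer$from_pretrained("bert-base-uncased")
cand <- list(tokenizer$encode("The cat sat on the mat.")$ids)
refs <- list(list(tokenizer$encode("A cat was sitting on the mat.")$ids))
bleu_corpus_ids(refs, cand)
```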