rflashtext can be used to find and replace words in a given text with only one pass over the document.
It is an R implementation of the FlashText algorithm, inspired by the Python library flashtext.
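The idea behind this kind of lookup can be sketched in base R: store the keys in a trie, then walk the trie while scanning the text. This is only a simplified illustration, not the package's implementation. The helper names `new_node`, `build_trie` and `replace_keys_sketch` are hypothetical, the word-boundary rule (letters, digits and underscore count as word characters) is an assumption, and unlike a true single-pass scan this version may revisit characters after a failed partial match.

```r
# Each trie node is an environment, so nodes can be modified in place.
new_node <- function() new.env(hash = TRUE, parent = emptyenv())

# Build a character-level trie; the "_word_" entry marks the end of a key.
build_trie <- function(keys, words) {
  root <- new_node()
  for (i in seq_along(keys)) {
    node <- root
    for (ch in strsplit(keys[i], "")[[1]]) {
      if (is.null(node[[ch]])) node[[ch]] <- new_node()
      node <- node[[ch]]
    }
    node[["_word_"]] <- words[i]
  }
  root
}

# Scan the sentence left to right and replace keys that stand as whole words.
replace_keys_sketch <- function(sentence, trie) {
  chars <- strsplit(sentence, "")[[1]]
  n <- length(chars)
  is_word_char <- grepl("[[:alnum:]_]", chars)  # assumed boundary rule
  out <- character(0)
  i <- 1
  while (i <= n) {
    at_word_start <- is_word_char[i] && (i == 1 || !is_word_char[i - 1])
    match_end <- 0
    match_word <- NULL
    if (at_word_start) {
      node <- trie
      j <- i
      while (j <= n && !is.null(node[[chars[j]]])) {
        node <- node[[chars[j]]]
        # Accept a key only when it ends at a word boundary; keep the longest.
        if (!is.null(node[["_word_"]]) && (j == n || !is_word_char[j + 1])) {
          match_end <- j
          match_word <- node[["_word_"]]
        }
        j <- j + 1
      }
    }
    if (match_end > 0) {
      out <- c(out, match_word)  # emit the replacement, skip past the key
      i <- match_end + 1
    } else {
      out <- c(out, chars[i])    # no key here, copy the character
      i <- i + 1
    }
  }
  paste(out, collapse = "")
}

trie <- build_trie(c("NY", "LA"), c("New York", "Los Angeles"))
replace_keys_sketch("I live in LA and I like NY", trie)
```

Because keys are looked up character by character in the trie, the cost of the scan depends mainly on the length of the text rather than on the number of keys, which is the property the FlashText approach relies on.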
You can install the released version of rflashtext from CRAN with:
install.packages("rflashtext")
And the development version from GitHub with:
install.packages("devtools")
devtools::install_github("AbrJA/rflashtext")
This is a basic example which shows you how to use the API:
library(rflashtext)
processor <- KeywordProcessor$new(keys = c("NY", "LA"), words = c("New York", "Los Angeles"))
processor$show_trie()
#> [1] "{\"L\":{\"A\":{\"_word_\":\"Los Angeles\"}},\"N\":{\"Y\":{\"_word_\":\"New York\"}}}"
processor$add_keys_words(keys = c("TX", "CA"), words = c("Texas", "California"))
processor$show_trie()
#> [1] "{\"C\":{\"A\":{\"_word_\":\"California\"}},\"L\":{\"A\":{\"_word_\":\"Los Angeles\"}},\"N\":{\"Y\":{\"_word_\":\"New York\"}},\"T\":{\"X\":{\"_word_\":\"Texas\"}}}"
words_found <- processor$find_keys(sentences = c("I live in LA and I like NY", "Have you been in TX?"))
words_found
#> [[1]]
#> [[1]]$word
#> [1] "Los Angeles" "New York"
#>
#> [[1]]$start
#> [1] 11 25
#>
#> [[1]]$end
#> [1] 12 26
#>
#>
#> [[2]]
#> [[2]]$word
#> [1] "Texas"
#>
#> [[2]]$start
#> [1] 18
#>
#> [[2]]$end
#> [1] 19
data.table::rbindlist(words_found)
#>           word start end
#> 1: Los Angeles    11  12
#> 2:    New York    25  26
#> 3:       Texas    18  19
processor$replace_keys(sentences = c("I live in LA and I like NY", "Have you been in TX?"))
#> [1] "I live in Los Angeles and I like New York"
#> [2] "Have you been in Texas?"
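Judging from the output above, `start` and `end` are 1-based, inclusive positions of the matched key in the original sentence, not of the replacement word. The combined table also no longer shows which sentence each match came from. The following base R sketch keeps that information. It uses a hand-copied literal of the `find_keys()` result so it runs without rflashtext installed, and the column names `sentence` and `key` are chosen here for illustration.

```r
sentences <- c("I live in LA and I like NY", "Have you been in TX?")

# Literal copy of the find_keys() result printed above (an assumption made
# so that the sketch is self-contained).
words_found <- list(
  list(word = c("Los Angeles", "New York"), start = c(11L, 25L), end = c(12L, 26L)),
  list(word = "Texas", start = 18L, end = 19L)
)

# One data frame per sentence, tagged with the sentence index, then stacked.
matches <- do.call(rbind, lapply(seq_along(words_found), function(i) {
  found <- words_found[[i]]
  data.frame(
    sentence = i,
    # Recover the matched key from the original sentence via start/end.
    key = substring(sentences[i], found$start, found$end),
    word = found$word,
    start = found$start,
    end = found$end,
    stringsAsFactors = FALSE
  )
}))
matches
```

With data.table, the `idcol` argument of `rbindlist()` offers a shorter way to record the sentence index.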
To see more details about the performance of the algorithm, click here.