Keyword in context for Portuguese texts — pt

pt_kwic is a useful function that extracts the words placed before and after a keyword. It is similar to kwic from the quanteda package, with two important differences: it is dedicated to Portuguese texts, matching in a way that ignores diacritics and is case insensitive; and it can return each word in a separate column (set unite = FALSE).
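As an illustration of what diacritic- and case-insensitive matching means, here is a minimal base R sketch. It is not the package's implementation: normalize_pt is a hypothetical helper introduced only for this example, and it assumes the session runs in a UTF-8 locale.

```r
# Hypothetical helper (not part of the documented package): lower-case the
# text, then map common Portuguese accented letters to their plain forms.
normalize_pt <- function(x) {
  chartr("áàâãéêíóôõúüç", "aaaaeeiooouuc", tolower(x))
}

texts <- c("O Direito da força", "luta pela Justiça", "sem correspondência")

# Matching on the normalized text treats "Direito" and "direito",
# or "Justiça" and "justica", as the same word.
has_direito <- grepl("direito", normalize_pt(texts))
has_justica <- grepl("justica", normalize_pt(texts))
```

Normalizing both the text and the keyword before matching is one common way to get this behavior; the function itself may do it differently.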

pt_kwic(
  string,
  id_decision = NULL,
  keyword = NULL,
  before = 5,
  after = 5,
  unite = TRUE
)

Arguments

string	A character vector of texts in which to search for the keyword.
id_decision	A vector of decision ids. If omitted, it defaults to text1, text2, text3, ...
keyword	A regular expression or a vector of regular expressions.
before	Number of words before the keyword. Default is 5.
after	Number of words after the keyword. Default is 5.
unite	If FALSE, places each preceding and following word in a separate column. Default is TRUE, which keeps them united.
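To make the before and after arguments concrete, the following base R sketch builds a word window around each match in a single text. It is only an assumption about the general keyword-in-context idea, not the package's code: kwic_window and its column names pre and post are hypothetical, and it does not handle diacritics.

```r
# Hypothetical helper (illustration only): return the words surrounding each
# occurrence of `keyword` in one text, split on whitespace.
kwic_window <- function(text, keyword, before = 5, after = 5) {
  words <- strsplit(text, "\\s+")[[1]]
  hits <- grep(keyword, words, ignore.case = TRUE)

  # Collapse the words at the given positions into one string.
  join <- function(idx) paste(words[idx], collapse = " ")

  data.frame(
    # Up to `before` words preceding the match, fewer near the start.
    pre = vapply(hits, function(i) {
      join(seq(max(1, i - before), length.out = min(before, i - 1)))
    }, character(1)),
    keyword = words[hits],
    # Up to `after` words following the match, fewer near the end.
    post = vapply(hits, function(i) {
      join(seq(i + 1, length.out = min(after, length(words) - i)))
    }, character(1)),
    stringsAsFactors = FALSE
  )
}

res <- kwic_window("A força do direito deve superar o direito da força.",
                   "direito", before = 2, after = 2)
```

Here res has one row per match, with the two preceding words in pre and the two following words in post.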

Value

A tbl with id_decision, the keyword location (start and end), the keyword, the preceding words, and the following words.

Examples

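# Two texts to search; the line breaks inside the strings are kept as written.
string <- c("A força do direito deve 
superar o direito da força.",
"Teu dever é lutar pelo Direito,
 mas se um dia encontrares o Direito
 em conflito com a Justiça,
luta pela Justiça.")
id_decision <- c("rui_barbosa", "eduardo_couture")
keyword <- "direito"
# Uses the defaults: 5 words before and 5 words after each match.
df <- pt_kwic(string, id_decision, keyword)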