pt_kwic extracts the words placed before and after a keyword. It is similar to kwic from the quanteda package, with two important differences: it is designed to work only with Portuguese texts, matching in a way that ignores diacritics and letter case; and it can return each word in a separate column (see the unite argument).
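To make "ignores diacritics and is case insensitive" concrete, here is a minimal base R sketch of that kind of matching. The helper normalize_pt is hypothetical and is not part of the package; how pt_kwic normalizes text internally is not documented here, so treat this only as an illustration of the idea.

```r
# Hypothetical helper, NOT pt_kwic's implementation: strip common Portuguese
# diacritics and fold case so "Força", "forca" and "FORÇA" compare equal.
normalize_pt <- function(x) {
  x <- chartr(
    "áàâãéêíóôõúüçÁÀÂÃÉÊÍÓÔÕÚÜÇ",
    "aaaaeeiooouucaaaaeeiooouuc",
    x
  )
  tolower(x)
}

texts <- c("A força do direito", "O DIREITO DA FORCA", "Justiça")

# Search for the unaccented, lower-case form in the normalized texts.
hits <- grepl("forca", normalize_pt(texts), fixed = TRUE)
```

Under this sketch, a keyword typed without accents still finds the accented form in the text, which is the behavior the description attributes to pt_kwic.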

pt_kwic(
  string,
  id_decision = NULL,
  keyword = NULL,
  before = 5,
  after = 5,
  unite = TRUE
)

Arguments

string

a vector of texts from which to search for the keyword.

id_decision

a vector of decision ids. If omitted, it defaults to text1, text2, text3, ...

keyword

a regular expression or a vector of regular expressions to search for.

before

Number of words before the keyword. Default is 5.

after

Number of words after the keyword. Default is 5.

unite

if FALSE, places each preceding and following word in a separate column. Default is TRUE.

Value

a tbl with the id_decision, the keyword location (start and end), the keyword, the preceding words, and the following words.
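As a rough picture of what "preceding" and "following" words mean for given before and after values, the sketch below builds a context window with base R. The function context_window and its naive whitespace tokenizer are assumptions made for illustration; it returns a plain list rather than a tbl and is not how pt_kwic is implemented.

```r
# Hypothetical helper for illustration only; NOT pt_kwic's implementation.
context_window <- function(text, keyword, before = 5, after = 5) {
  words <- strsplit(text, "\\s+")[[1]]              # naive whitespace tokenizer
  hits <- grep(keyword, words, ignore.case = TRUE)  # positions of matching words
  lapply(hits, function(i) {
    list(
      pre = utils::tail(words[seq_len(i - 1)], before),  # up to `before` words
      keyword = words[i],
      post = utils::head(words[-seq_len(i)], after)      # up to `after` words
    )
  })
}

res <- context_window(
  "A força do direito deve superar o direito da força.",
  "direito",
  before = 2,
  after = 2
)
```

Each element of res corresponds to one occurrence of the keyword, which loosely mirrors one row of the returned tbl.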

Examples

string <- c(
  "A força do direito deve superar o direito da força.",
  "Teu dever é lutar pelo Direito, mas se um dia encontrares o Direito em conflito com a Justiça, luta pela Justiça."
)
id_decision <- c("rui_barbosa", "eduardo_couture")
keyword <- "direito"
df <- pt_kwic(string, id_decision, keyword)