Besides downloading lawsuits (see the Downloading Lawsuits article), esaj
also allows the user to download the results of a query on lawsuits. This kind of query is very useful for finding out what lawsuits contain certain words, were filed in a given period, were filed in a given court, etc.
To accomplish this task two functions were made available: download_cjpg()
(for queries on first degree lawsuits) and download_cjsg()
(for queries on second degree lawsuits). There are also two auxiliary functions that can help the user to evaluate how long this kind of download might take: peek_cjpg()
and peek_cjsg()
. Finally, there are 3 functions to explore which arguments to use when calling the functions above: cjpg_table()
, cjsg_table()
, and browse_table()
.
As of the day of this writing, all aforementioned functions only work with São Paulo’s Justice Court (TJSP).
To run a query on first degree lawsuits and download the results, simply write the query as a string and supply the path to the directory where to download all resulting files. The example below finds all lawsuits containing the word “recurso”.
download_cjpg("recurso", "~/Desktop/")
#> [1] "/Users/user/Desktop/search.html"
#> [2] "/Users/user/Desktop/page1.html"
As you can see, download_cj*g()
return the paths to the downloaded files. The first file always refers to the search page, containing the filled fields and the first results of the query; the subsequent files contain pages of results (as many pages as requested by the user).
By default download_cj*g()
download only one page of results, but we can also specify page indexes to start and stop downloading.
download_cjpg("recurso", "~/Desktop/", min_page = 3, max_page = 6)
#> [1] "/Users/user/Desktop/search.html"
#> [2] "/Users/user/Desktop/page3.html"
#> [3] "/Users/user/Desktop/page4.html"
#> [4] "/Users/user/Desktop/page5.html"
#> [5] "/Users/user/Desktop/page6.html"
To help you filter the results even more, many other arguments are also made available. download_cjpg()
and download_cjsg()
have slightly different ones, so make sure to run ?download_cjpg
or ?download_cjsg
to learn more about them.
# Repeted arguments
query <- "recurso"
path <- "~/Desktop/"
# First degree query
download_cjpg(query, path, classes = c("8727", "8742"))
download_cjpg(query, path, subjects = "3372")
download_cjpg(query, path, courts = "2-1")
download_cjpg(query, path, date_start = "2016-01-01", date_end = "2016-12-31")
# Second degree query
download_cjsg(query, path, classes = c("1231", "1232"))
download_cjsg(query, path, subjects = "0")
download_cjsg(query, path, courts = "0-56")
download_cjsg(query, path, trial_start = "2009-01-01", trial_end = "2009-12-31")
download_cjsg(query, path, registration_start = "1998-01-01",
registration_end = "1998-12-31")
The output of the calls above are omitted because they are exactly the same. Despite only classes
appearing as a character vector, subjects
and courts
also support this kind of input.
An argument not mentioned above is cores
. This is a very useful tool for parallelizing the execution of download_cj*g()
and the value passed on to it corresponds to the number of processing cores used when downloading the files.
download_cjpg("recurso", "~/Desktop/", max_page = 10, cores = 4)
As you might have guessed, the functions above can take a non-trivial amount of time to finish executing (specially when there are many pages to download). To help the user estimate how much time calls to download_cj*g()
would take, peek_cj*g()
were made available.
These functions take the same arguments as download_cj*g()
(ignoring path
and cores
) and calculate approximatelly how much time it would take to download all pages of results.
peek_cjsg(query, path, classes = c("1231", "1232"))
#> There are 49 pages to download
#> This should take around 25 seconds
path
is ignored because the files won’t actually be downloaded. cores
is ignored because it’s very hard to guess how much time is saved by parallelizing the process, so the estimate only takes one thread into account.
Since download_cj*g()
have very complex arguments, there are 3 other functions that help the user know which values should be passed on to classes
, subjects
, and courts
.
cj*g_table()
take only one argument, type
, which should be one of "classes"
, "subjects"
, and "courts"
. They will return tibbles with all possible values for the corresponding arguments in download_cj*g()
.
dplyr::glimpse(cjpg_table("classes"))
#> Observations: 757
#> Variables: 12
#> $ name0 <chr> "CLASSES ANTIGAS DO SAJ", "PROCESSO CÍVEL E DO TRABALHO"...
#> $ id0 <chr> "768", "8500", "8500", "8500", "8500", "8500", "8500", "...
#> $ name1 <chr> NA, "Processo de Conhecimento", "Processo de Conheciment...
#> $ id1 <chr> NA, "8716", "8716", "8716", "8716", "8716", "8716", "871...
#> $ name2 <chr> NA, "Procedimento de Cumprimento de Sentença/Decisão", "...
#> $ id2 <chr> NA, "8591", "8591", "8591", "8591", "8591", "8586", "858...
#> $ name3 <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "Pro...
#> $ id3 <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "850...
#> $ name4 <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "Pro...
#> $ id4 <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "850...
#> $ name5 <chr> "CLASSES ANTIGAS DO SAJ", "Cumprimento Provisório de Dec...
#> $ id5 <chr> "768", "8816", "8817", "8592", "8593", "4052", "4062", "...
Note how the table above has multiple levels, marked 0 through 5. This is because the class structure is a tree, so higher levels represent classes closer to the leaves. In the example above, the valid values for the classes
argument in download_cjpg()
are every single number in the id*
columns.
cj*g_table("subjects")
have very similar structures to cj*g_table("classes")
. cj*g_table("courts")
have only one level.
To browse classes
and subjects
tables, one can also use the browse_table()
function. This method uses a list of regex to filter the tibbles returned by cj*g_table("classes")
and cj*g_table("subjects")
more easily than dplyr::select()
.
table <- cjpg_table("classes")
browse_table(table, list(c("ADM", "CRIMINAL"), "", "", "", "", "Recurso"))
#> # A tibble: 6 x 12
#> name0 id0 name1 id1 name2 id2 name3
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 PROCESSO CRIMINAL 8644 Recursos 8711 <NA> <NA> <NA>
#> 2 PROCESSO CRIMINAL 8644 Recursos 8711 <NA> <NA> <NA>
#> 3 PROCESSO CRIMINAL 8644 Recursos 8711 <NA> <NA> <NA>
#> 4 PROCESSO CRIMINAL 8644 Recursos 8711 <NA> <NA> <NA>
#> 5 PROCEDIMENTOS ADMINISTRATIVOS 8726 <NA> <NA> <NA> <NA> <NA>
#> 6 PROCEDIMENTOS ADMINISTRATIVOS 8726 <NA> <NA> <NA> <NA> <NA>
#> # ... with 5 more variables: id3 <chr>, name4 <chr>, id4 <chr>,
#> # name5 <chr>, id5 <chr>
For the matching to work properly, patterns
should be a list of at most 6 character vectors, each one containing either one or a vector of regular expressions to be applied from left to right on columns name0
to name5
.
Note that vectors are ORed and different elements are ANDed. In the example above, we got back the rows where name0
contains “ADM” or “CRIMINAL” and where name5
contains “Recurso”.