Working with data frames

Basic principles for working with data frames in base R, the Tidyverse, and data.table.

R.Andres Castaneda
2023-01-31

Set up

Attach important packages. For a comprehensive comparison see this blog.
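
A minimal sketch of the attach step. The package list is an assumption inferred from the functions called later in this post (`read_csv()`, `as_tibble()`, `filter()`, `slice()`, `fread()`, `as.data.table()`); attaching `tidyverse` as a whole would presumably work as well.

```r
# Assumed package set, inferred from the functions used in this post
library(readr)       # read_csv()
library(tibble)      # as_tibble()
library(dplyr)       # filter(), slice()
library(data.table)  # fread(), as.data.table()
```

Note that attaching dplyr masks `stats::filter()`, so `filter()` refers to the dplyr verb in the examples that follow.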

Load the data. You can go here to take a look at some fake data in different formats.

link_data <- "https://github.com/PovcalNet-Team/Rtraining/raw/main/data/ago_2018.csv"

df <- read.csv(link_data) # base
tb <- read_csv(link_data)  # tidyverse 
dt <- fread(link_data) # data.table

We could also have done this:

df <- read.csv(link_data) # base
tb <- as_tibble(df) # as.tibble() is deprecated in favor of as_tibble()
dt <- as.data.table(tb)

Basic operations

Filter rows

Keep rows using indices

filter <- 3:4 # row indices to keep

Base R

df[filter,]
   area   welfare    weight
3 rural 213.44287 398.89557
4 rural 423.68354 538.45697
df[filter] # Fails: without the comma, base R selects columns, and df has no column 4
Error in `[.data.frame`(df, filter): undefined columns selected
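
To see why the comma matters in base R, here is a small sketch with a made-up data frame. The object `toy` and its values are illustrative only and do not come from ago_2018.csv.

```r
# Hypothetical toy data with the same three column names as the training data
toy <- data.frame(
  area    = c("rural", "rural", "urban", "urban"),
  welfare = c(10, 20, 30, 40),
  weight  = c(1, 2, 3, 4)
)

rows <- toy[2:3, ]  # with the comma: rows 2 and 3, all columns
cols <- toy[2:3]    # without the comma: columns 2 and 3 (welfare, weight)

# Asking for a column position that does not exist raises an error;
# tryCatch() captures the message instead of stopping the script
msg <- tryCatch(toy[3:4], error = function(e) conditionMessage(e))
```

With a single index, a data frame behaves like a list of columns, so `toy[3:4]` looks for a fourth column that is not there.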

Tidyverse

tb[filter,]
# A tibble: 2 x 3
  area  welfare weight
  <chr>   <dbl>  <dbl>
1 rural    213.   399.
2 rural    424.   538.
slice(tb, filter) # same
# A tibble: 2 x 3
  area  welfare weight
  <chr>   <dbl>  <dbl>
1 rural    213.   399.
2 rural    424.   538.

data.table

dt[filter,]
    area   welfare    weight
1: rural 213.44287 398.89557
2: rural 423.68354 538.45697
# Without the comma, data.table still selects rows; a data.frame would select columns.
dt[filter] # same result
    area   welfare    weight
1: rural 213.44287 398.89557
2: rural 423.68354 538.45697

Keep rows using logical expressions

Base R

x <- df[df$area == "urban",]
x[1:3,] 
     area   welfare    weight
222 urban 665.10899 641.75787
223 urban 489.98341 112.15016
224 urban  75.45557 317.06293
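
One caveat of the base R form is how it treats missing values in the condition. The sketch below uses made-up data (the object `toy` is hypothetical, and nothing here implies that ago_2018.csv contains missing values) to compare three base R ways of filtering.

```r
# Hypothetical toy data with one missing value in the filter column
toy <- data.frame(
  area    = c("urban", NA, "rural", "urban"),
  welfare = c(10, 20, 30, 40)
)

# A logical index containing NA returns an all-NA row for that position
with_na <- toy[toy$area == "urban", ]

# which() keeps only the positions where the condition is TRUE
via_which <- toy[which(toy$area == "urban"), ]

# subset() also drops rows where the condition is NA
via_subset <- subset(toy, area == "urban")
```

Here `with_na` has three rows (one of them all `NA`), while `via_which` and `via_subset` each have the two urban rows.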

Tidyverse

tb |> 
  filter(area == "urban") |> 
  slice(1:3)
# A tibble: 3 x 3
  area  welfare weight
  <chr>   <dbl>  <dbl>
1 urban   665.    642.
2 urban   490.    112.
3 urban    75.5   317.

data.table

# data.table syntax: columns are referenced directly, with no need for $
 
dt[area == "urban"
   ][1:3]
    area   welfare    weight
1: urban 665.10899 641.75787
2: urban 489.98341 112.15016
3: urban  75.45557 317.06293
# Tidyverse syntax works with data.table
dt |> 
  filter(area == "urban") |> 
  slice(1:3)
    area   welfare    weight
1: urban 665.10899 641.75787
2: urban 489.98341 112.15016
3: urban  75.45557 317.06293
# but data.table syntax does not work with a tibble
tb[area == "urban"][1:3]
Error in `[.tbl_df`(tb, area == "urban"): object 'area' not found
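
The error arises because a tibble, like a base data frame, does not look up `area` among its own columns inside `[`. A minimal sketch with made-up data (the object `toy_tb` is hypothetical), assuming the tibble package is installed, of the explicit form that a tibble does accept:

```r
library(tibble)

# Hypothetical toy tibble, not the training data
toy_tb <- tibble(
  area    = c("rural", "urban", "urban"),
  welfare = c(10, 20, 30)
)

# Base R style: name the column explicitly with $ and keep the comma
urban <- toy_tb[toy_tb$area == "urban", ]
```

The result is still a tibble, so tidyverse verbs can be applied to it afterwards.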

Citation

For attribution, please cite this work as

Castaneda (2023, Jan. 31). R Training for GPID Team: Working with data frames. Retrieved from https://povcalnet-team.github.io/Rtraining/posts/working-with-data-frames/

BibTeX citation

@misc{castaneda2023working,
  author = {Castaneda, R.Andres},
  title = {R Training for GPID Team: Working with data frames},
  url = {https://povcalnet-team.github.io/Rtraining/posts/working-with-data-frames/},
  year = {2023}
}