Basic principles for working with data frames in base R, the tidyverse, and data.table.
Attach the important packages. For a comprehensive comparison, see this blog post.
Load data. You can go here to take a look at some fake data in different formats.
We could also have done this:
df <- read.csv(link_data) # base
tb <- as_tibble(df) # as.tibble() is deprecated in favor of as_tibble()
dt <- as.data.table(tb)
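The contents of `link_data` are not shown here, so the conversion step can be tried on a small stand-in. The sketch below is only an illustration: the six rows are made up, and only the column names (`area`, `welfare`, `weight`) follow the output printed in this post.

```r
library(tibble)
library(data.table)

# Hypothetical stand-in for the data read from link_data (values are made up)
df <- data.frame(
  area    = c("rural", "rural", "urban", "urban", "urban", "urban"),
  welfare = c(213.4, 423.7, 665.1, 490.0, 75.5, 310.2),
  weight  = c(398.9, 538.5, 641.8, 112.2, 317.1, 250.0)
)

tb <- as_tibble(df)      # tidyverse representation
dt <- as.data.table(df)  # data.table representation (as.data.table(tb) also works)

# All three objects are still data frames; they differ in the extra classes
class(df)
class(tb)
class(dt)
```

Because `tb` and `dt` both inherit from `data.frame`, most base R functions accept them, but the `[` method differs for each class, which is what the rest of this post explores.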
filter <- c(3:4)
df[filter,]
area welfare weight
3 rural 213.44287 398.89557
4 rural 423.68354 538.45697
df[filter] # This does not work
Error in `[.data.frame`(df, filter): undefined columns selected
tb[filter,]
# A tibble: 2 x 3
area welfare weight
<chr> <dbl> <dbl>
1 rural 213. 399.
2 rural 424. 538.
slice(tb, filter) # same
# A tibble: 2 x 3
area welfare weight
<chr> <dbl> <dbl>
1 rural 213. 399.
2 rural 424. 538.
dt[filter,]
area welfare weight
1: rural 213.44287 398.89557
2: rural 423.68354 538.45697
# This works in data.table; with a data.frame it does not.
dt[filter] # same.
area welfare weight
1: rural 213.44287 398.89557
2: rural 423.68354 538.45697
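The reason `df[filter]` fails while `dt[filter]` works is that a single index means different things in the two classes. A data.frame treats it like a list index and selects columns, whereas data.table treats it as a row index. The sketch below illustrates this on a small made-up data set (hypothetical values, same column names as above). The name `idx` is used here only for illustration.

```r
library(data.table)

# Hypothetical data with the same three columns as in the post
df <- data.frame(
  area    = c("rural", "rural", "urban", "urban"),
  welfare = c(213.4, 423.7, 665.1, 490.0),
  weight  = c(398.9, 538.5, 641.8, 112.2)
)
dt <- as.data.table(df)

idx <- 2:3

# data.frame: a single index selects COLUMNS 2 and 3
cols <- df[idx]
names(cols)  # "welfare" "weight"

# data.table: a single index selects ROWS 2 and 3
rows <- dt[idx]
nrow(rows)   # 2

# With only three columns, asking a data.frame for column 4 is an error,
# which is why df[3:4] fails with "undefined columns selected"
res <- tryCatch(df[3:4], error = function(e) conditionMessage(e))
res
```

To stay safe across all three classes, writing the comma explicitly (`x[idx, ]`) always means "rows".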
x <- df[df$area == "urban",]
x[1:3,]
area welfare weight
222 urban 665.10899 641.75787
223 urban 489.98341 112.15016
224 urban 75.45557 317.06293
# data.table way: no need for $
dt[area == "urban"][1:3]
area welfare weight
1: urban 665.10899 641.75787
2: urban 489.98341 112.15016
3: urban 75.45557 317.06293
# but data.table syntax does not work with tibbles
tb[area == "urban"][1:3]
Error in `[.tbl_df`(tb, area == "urban"): object 'area' not found
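A tibble does not evaluate bare column names inside `[`, so the condition has to reference the column explicitly or go through a dplyr verb. The following is a minimal sketch of both options, assuming made-up data with the same columns as in this post. The object names `urban_base` and `urban_dplyr` are hypothetical.

```r
library(dplyr)

# Hypothetical data with the same three columns as in the post
tb <- tibble(
  area    = c("rural", "rural", "urban", "urban", "urban", "urban"),
  welfare = c(213.4, 423.7, 665.1, 490.0, 75.5, 310.2),
  weight  = c(398.9, 538.5, 641.8, 112.2, 317.1, 250.0)
)

# Base-style subsetting works on a tibble, but needs tb$ and the comma
urban_base <- tb[tb$area == "urban", ][1:3, ]

# dplyr way: filter() evaluates `area` inside the data, slice() picks rows
urban_dplyr <- tb %>%
  filter(area == "urban") %>%
  slice(1:3)

urban_dplyr
```

Both approaches return the first three urban rows. The dplyr version is the closest tidyverse counterpart to `dt[area == "urban"][1:3]`.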
For attribution, please cite this work as
Castaneda (2023, Jan. 31). R Training for GPID Team: Working with data frames. Retrieved from https://povcalnet-team.github.io/Rtraining/posts/working-with-data-frames/
BibTeX citation
@misc{castaneda2023working,
  author = {Castaneda, R.Andres},
  title = {R Training for GPID Team: Working with data frames},
  url = {https://povcalnet-team.github.io/Rtraining/posts/working-with-data-frames/},
  year = {2023}
}