Understand the difference between the magrittr pipe (%>%) and the native pipe (|>), and when to use each. Understand when it is better to use the pipe rather than regular base R syntax.
There is an important difference between loading and attaching a package.
Loading makes all the components of a package available in memory (code, data, and any DLLs; it also registers S3 and S4 methods). However, those components are not in the search path, which is the equivalent of the ado path in Stata. This is why we need to call the functions of a loaded package with ::. If the package has not been loaded yet, using :: loads it.
Attaching loads the package and also puts it in the search path. You do it with library() or require(). Once a package is attached, you don't need to use ::, but you still can.
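The difference can be checked from the console. The following is a minimal sketch that uses the tools package only because it ships with R and is not attached by default; the file name is made up for illustration.

```r
# Calling a function with :: loads the namespace but does not attach it
tools::file_ext("survey.csv")

loaded_before   <- "tools" %in% loadedNamespaces()   # TRUE: the namespace is in memory
attached_before <- "package:tools" %in% search()     # FALSE in a fresh session

# library() attaches the package, so it now appears in the search path
library(tools)
attached_after <- "package:tools" %in% search()      # TRUE

# Once attached, the function can be called without ::
ext <- file_ext("survey.csv")
```

loadedNamespaces() lists what is in memory, while search() lists what R looks through when you type a bare function name.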
Read more in the R Packages book.
Attaching has the advantage of not having to type ::, but when you attach too many packages, it becomes difficult to know which function comes from which package.
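A common case of this ambiguity is filter(), which exists in both stats and dplyr. The sketch below assumes dplyr is installed and uses the built-in mtcars data; it only illustrates how attaching changes which function a bare name resolves to.

```r
# stats is attached by default in every R session
find("filter")                         # in a fresh session: only "package:stats"

library(dplyr)                         # the startup message lists the masked functions

find("filter")                         # dplyr now comes first in the search path
environmentName(environment(filter))   # the bare name now resolves to dplyr

# :: removes the ambiguity regardless of what is attached
moving_avg <- stats::filter(1:5, rep(1 / 3, 3))
four_cyl   <- dplyr::filter(mtcars, cyl == 4)
```

With ::, a reader does not need to know the order in which the packages were attached to tell which filter() is being called.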
I attach a package when I need it all the time, for example tidyverse or data.table. However, try to always use ::, because the code is clearer and the speed penalty is minimal.
# with explicit namespaces
flights |>
  dplyr::filter(dest == "IAH") |>
  dplyr::mutate(speed = distance / air_time * 60) |>
  dplyr::select(year:day, dep_time, carrier, flight, speed) |>
  dplyr::arrange(dplyr::desc(speed))

# the same pipeline with dplyr attached
flights |>
  filter(dest == "IAH") |>
  mutate(speed = distance / air_time * 60) |>
  select(year:day, dep_time, carrier, flight, speed) |>
  arrange(desc(speed))
If you are developing a package, you should not attach other packages with library() inside it. Refer to their functions with :: (or import them through the NAMESPACE file).
# load libraries that we will need
library(nycflights13) # data or use library(help = "datasets")
library(tidyverse)
library(data.table)
First, we need to talk about frames. Let's start with Stata.
At the most basic level, the pipe is a syntax transformation in which you separate the first argument from the function: x |> f(y) is just another way of writing f(x, y).
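A small base R illustration of that transformation, assuming R 4.1 or later (when the native pipe was introduced). The vector x is made up for the example.

```r
x <- c(1, 4, 9, 16)

# Nested call: read from the inside out
nested <- round(mean(sqrt(x)), 1)

# Piped call: read from top to bottom
piped <- x |>
  sqrt() |>
  mean() |>
  round(1)

identical(nested, piped)

# quote() shows the call R sees after parsing: the pipe is already gone
quote(x |> sqrt() |> mean())
```

Because the rewrite happens when the code is parsed, the piped and the nested versions are the same call by the time R evaluates them.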
By itself, that is not very useful. You see the real power of the pipe when you work with data frames.
# so you go from this
flights1 <- filter(flights, dest == "IAH")
flights2 <- mutate(flights1, speed = distance / air_time * 60)
flights3 <- select(flights2, year:day, dep_time, carrier, flight, speed)
arrange(flights3, desc(speed))
# or this
arrange(
  select(
    mutate(
      filter(flights, dest == "IAH"),
      speed = distance / air_time * 60
    ),
    year:day, dep_time, carrier, flight, speed
  ),
  desc(speed)
)
# to this
flights |>
  filter(dest == "IAH") |>
  mutate(speed = distance / air_time * 60) |>
  select(year:day, dep_time, carrier, flight, speed) |>
  arrange(desc(speed))
# A tibble: 7,198 x 7
    year month   day dep_time carrier flight speed
   <int> <int> <int>    <int> <chr>    <int> <dbl>
 1  2013     7     9      707 UA         226  522.
 2  2013     8    27     1850 UA        1128  521.
 3  2013     8    28      902 UA        1711  519.
 4  2013     8    28     2122 UA        1022  519.
 5  2013     6    11     1628 UA        1178  515.
 6  2013     8    27     1017 UA         333  515.
 7  2013     8    27     1205 UA        1421  515.
 8  2013     8    27     1758 UA         302  515.
 9  2013     9    27      521 UA         252  515.
10  2013     8    28      625 UA         559  515.
# i 7,188 more rows
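On the difference between the two pipes: one practical distinction is the placeholder used when the piped object is not the first argument. The sketch below assumes magrittr is installed and R 4.2 or later (the version that added the _ placeholder); the model on mtcars is only an illustration.

```r
library(magrittr)

# Native pipe: the placeholder is _ and must be passed to a named argument
fit_native <- mtcars |> lm(mpg ~ wt, data = _)

# magrittr pipe: the placeholder is . and can go in any argument position
fit_magrittr <- mtcars %>% lm(mpg ~ wt, data = .)

all.equal(coef(fit_native), coef(fit_magrittr))

# magrittr also accepts a bare function name on the right-hand side;
# the native pipe requires the parentheses: c(1, 4, 9) |> sqrt()
roots <- c(1, 4, 9) %>% sqrt
```

The native pipe needs no extra package, which makes it a reasonable default when the code only has to run on recent versions of R.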
For attribution, please cite this work as
Castaneda (2023, May 9). R Training for GPID Team: How to use the Pipe. Retrieved from https://povcalnet-team.github.io/Rtraining/posts/pipe/
BibTeX citation
@misc{castaneda2023how,
  author = {Castaneda, R.Andres},
  title = {R Training for GPID Team: How to use the Pipe},
  url = {https://povcalnet-team.github.io/Rtraining/posts/pipe/},
  year = {2023}
}