How to use the Pipe

Understand the difference between the magrittr pipe (%>%) and the native pipe (|>), and when to use each. Understand when the pipe is a better choice than regular base R syntax.

R.Andres Castaneda
2023-05-09

Packages

Loading and attaching

There is an important difference between loading and attaching a package.

Read more in the R Packages book.
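As a rough illustration of the distinction, the sketch below loads a package and then attaches it, checking the search path at each step. The choice of the tools package is only an assumption for the example: it ships with R and is normally not attached in a fresh session.

```r
# Loading: the package's code is available in the session,
# but its functions are not on the search path.
loadNamespace("tools")
is_loaded <- "tools" %in% loadedNamespaces()

# A loaded package can always be reached with ::
ext <- tools::file_ext("flights.csv")

# Attaching: library() also puts the package on the search path,
# so its functions can be called without ::
library(tools)
is_attached <- "package:tools" %in% search()
```

Calling a function with :: loads the package if needed but never attaches it, which is why code written with :: leaves the search path unchanged.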

When to attach

Attaching a package lets you call its functions without ::, but when you attach many packages it becomes difficult to know which function comes from which package.
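One way to check where a bare function name resolves to is to inspect its environment or ask find(). The sketch below uses filter as the example because several packages define a function with that name. What the bare name resolves to depends on what is attached in your session, so only the qualified call has a guaranteed answer.

```r
# With ::, the origin is explicit and never changes.
qualified_home <- environmentName(environment(stats::filter))  # "stats"

# Without ::, R uses the first match on the search path,
# so the answer depends on which packages are attached.
bare_home <- environmentName(environment(filter))

# find() lists every attached package that defines the name.
found_in <- find("filter")
```

If dplyr is attached, the bare name would normally resolve to dplyr's filter instead, which is the kind of ambiguity that :: removes.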

I attach a package when I need it all the time, for example tidyverse or data.table. Even so, try to use :: whenever you can, because the code is clearer and the speed penalty is minimal.

flights |> 
  dplyr::filter(dest == "IAH") |> 
  dplyr::mutate(speed = distance / air_time * 60) |> 
  dplyr::select(year:day, dep_time, carrier, flight, speed) |> 
  dplyr::arrange(dplyr::desc(speed))


flights |> 
  filter(dest == "IAH") |> 
  mutate(speed = distance / air_time * 60) |> 
  select(year:day, dep_time, carrier, flight, speed) |> 
  arrange(desc(speed))

If you are developing a package, you should not attach other packages with library(). Call their functions with :: instead (or import them explicitly in the NAMESPACE file).
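A minimal sketch of what package-style code tends to look like: there is no library() call, and every function that comes from another package is qualified with ::. The function name summarise_speed is hypothetical and only serves the example.

```r
# Hypothetical helper written the way package code is usually written:
# functions from other packages are always called with ::
summarise_speed <- function(x) {
  c(
    mean   = mean(x, na.rm = TRUE),          # base is always available
    median = stats::median(x, na.rm = TRUE), # qualified call into stats
    sd     = stats::sd(x, na.rm = TRUE)
  )
}

speed_summary <- summarise_speed(c(10, 20, 30, NA))
```

Because nothing is attached, the function behaves the same no matter which packages the user has loaded with library().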

Main idea of the pipe

# load libraries that we will need

library(nycflights13) # data or use library(help = "datasets")
library(tidyverse)
library(data.table)

First we need to talk about frames. Let’s start with Stata.

At the most basic level, the pipe is a syntax transformation that moves the first argument out of the function call: x |> f() means the same as f(x).

x <- 1:10

# from this
mean(x)
[1] 5.5
# to this 

x |> mean()
[1] 5.5
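To see that the native pipe really is only a rewrite of the syntax, you can ask R for the parsed expression with quote(). This is a small sketch that assumes R 4.1 or later, where |> is available.

```r
x <- 1:10

# The parser turns the piped form into an ordinary nested call.
piped  <- quote(x |> mean())
nested <- quote(mean(x))
same_call <- identical(piped, nested)

# Extra arguments stay in place; the left-hand side becomes the first argument.
piped_with_arg <- deparse(quote(x |> mean(na.rm = TRUE)))

# Since the calls are the same, so are the results.
same_value <- identical(x |> mean(), mean(x))
```

Here piped_with_arg holds the text "mean(x, na.rm = TRUE)", which shows where the left-hand side ends up.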

By itself, that is not very useful. You see the real power of the pipe when you work with data frames.

# so you go from this
flights1 <- filter(flights, dest == "IAH")
flights2 <- mutate(flights1, speed = distance / air_time * 60)
flights3 <- select(flights2, year:day, dep_time, carrier, flight, speed)
arrange(flights3, desc(speed))


# or this
arrange(
  select(
    mutate(
      filter(flights, dest == "IAH"),
      speed = distance / air_time * 60
    ),
    year:day, dep_time, carrier, flight, speed
  ),
  desc(speed)
)


# to this


flights |> 
  filter(dest == "IAH") |> 
  mutate(speed = distance / air_time * 60) |> 
  select(year:day, dep_time, carrier, flight, speed) |> 
  arrange(desc(speed))
# A tibble: 7,198 x 7
    year month   day dep_time carrier flight speed
   <int> <int> <int>    <int> <chr>    <int> <dbl>
 1  2013     7     9      707 UA         226  522.
 2  2013     8    27     1850 UA        1128  521.
 3  2013     8    28      902 UA        1711  519.
 4  2013     8    28     2122 UA        1022  519.
 5  2013     6    11     1628 UA        1178  515.
 6  2013     8    27     1017 UA         333  515.
 7  2013     8    27     1205 UA        1421  515.
 8  2013     8    27     1758 UA         302  515.
 9  2013     9    27      521 UA         252  515.
10  2013     8    28      625 UA         559  515.
# i 7,188 more rows
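The same pattern works with base R functions, as long as the data is the first argument. The sketch below is an illustration on the built-in mtcars data rather than flights, so it runs without any extra packages. The helper sort_desc and the column power_to_weight are hypothetical names introduced for the example.

```r
# Hypothetical helper: sort a data frame by one column, largest first.
sort_desc <- function(data, column) {
  data[order(data[[column]], decreasing = TRUE), , drop = FALSE]
}

result <- mtcars |>
  subset(cyl == 4) |>                              # keep four-cylinder cars
  transform(power_to_weight = hp / wt) |>          # add a derived column
  subset(select = c(mpg, hp, power_to_weight)) |>  # keep a few columns
  sort_desc("mpg")                                 # highest mpg first
```

Each step reads top to bottom in the order it runs, with no intermediate objects and no nesting, which is the main reason to prefer the pipe for multi-step data work.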

sources

Citation

For attribution, please cite this work as

Castaneda (2023, May 9). R Training for GPID Team: How to use the Pipe. Retrieved from https://povcalnet-team.github.io/Rtraining/posts/pipe/

BibTeX citation

@misc{castaneda2023how,
  author = {Castaneda, R.Andres},
  title = {R Training for GPID Team: How to use the Pipe},
  url = {https://povcalnet-team.github.io/Rtraining/posts/pipe/},
  year = {2023}
}