R Training for GPID Team: How to use the Pipe

R.Andres Castaneda

Packages

loading and attaching

There is an important difference between loading and attaching a package.

Loading makes all the components of a package available in memory (code, data, and any DLLs) and registers its S3 and S4 methods. However, those components are not in the search path, which is equivalent to the ado path in Stata. This is why we need to call the functions of a loaded package with ::. If the package has not been loaded, using :: loads it.
Attaching loads the package and also puts it in the search path. You do it using library() or require(). When a package is attached, you don’t need to use ::, but you still can.
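
A minimal sketch of the difference. It uses the tools package only because it ships with R and is normally not attached at startup; that choice and the file name are illustrative assumptions.

```r
# Is 'tools' on the search path before we start? (normally FALSE)
attached_before <- "package:tools" %in% search()

# Calling a function with :: loads the namespace if needed...
ext <- tools::file_ext("survey.dta")
loaded_now <- "tools" %in% loadedNamespaces()

# ...but it does not attach the package, so the search path is unchanged
attached_still <- "package:tools" %in% search()

# library() attaches it: now file_ext() can be called without ::
library(tools)
attached_after <- "package:tools" %in% search()

c(loaded = loaded_now,
  attached_by_colons = attached_still,
  attached_by_library = attached_after)
```

search() plays the role of listing the ado path: only what appears there can be called without a prefix.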

When to attach

Attaching saves you from typing ::, but when you attach many packages, it becomes difficult to know which function comes from which package.
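
When you are unsure where a function comes from, base R can tell you. The sketch below assumes a fresh session with only the default packages attached. With dplyr attached, the first call would also list package:dplyr, because dplyr::filter() masks stats::filter().

```r
# Which attached packages define something called 'filter'?
where_filter <- utils::find("filter")

# Which namespace does the function you would actually get belong to?
owner <- environmentName(environment(filter))

where_filter
owner
```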

I attach a package only when I need it all the time, for example tidyverse or data.table. However, try to always use ::, because the code is clearer and the penalty in speed is minimal.

flights |> 
  dplyr::filter(dest == "IAH") |> 
  dplyr::mutate(speed = distance / air_time * 60) |> 
  dplyr::select(year:day, dep_time, carrier, flight, speed) |> 
  dplyr::arrange(dplyr::desc(speed))


flights |> 
  filter(dest == "IAH") |> 
  mutate(speed = distance / air_time * 60) |> 
  select(year:day, dep_time, carrier, flight, speed) |> 
  arrange(desc(speed))

If you are developing a package, you should not attach other packages with library() in its code. Always use :: instead.
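
As an illustration, a function inside a package could look something like the sketch below. The name speed_summary() is hypothetical and does not come from any real package; the point is only that median() is called as stats::median() instead of relying on an attached package.

```r
# Hypothetical package function: no library() calls, dependencies via ::
speed_summary <- function(distance, air_time) {
  speed <- distance / air_time * 60
  c(
    mean   = mean(speed, na.rm = TRUE),          # base is always available
    median = stats::median(speed, na.rm = TRUE)  # explicit namespace
  )
}

speed_summary(distance = c(100, 200), air_time = c(60, 60))
```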

Main idea of the pipe

# load libraries that we will need

library(nycflights13) # data or use library(help = "datasets")
library(tidyverse)
library(data.table)

We need to talk first about frames. Let’s go to Stata first.

At the most basic level, the pipe is a syntax transformation: the first argument of a function moves out of the call and is written to the left of |>.

x <- 1:10

# from this
mean(x)

[1] 5.5

# to this 

x |> mean()

[1] 5.5
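
One way to see that the pipe is only a rewrite of the call is to ask R for the parsed expression. This sketch assumes R 4.1 or later, where the native pipe |> is available.

```r
x <- 1:10

# The parser turns the piped form into the ordinary call
piped_call <- quote(x |> mean())
piped_text <- deparse(piped_call)
piped_text

# Extra arguments stay inside the parentheses
first_three <- x |> head(n = 3)   # same as head(x, n = 3)
first_three
```

The left-hand side always lands in the first argument position, which is why the parentheses after the function name are required.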

By itself, this is not very useful. You see the real power of the pipe when you work with data frames.

# so you go from this
flights1 <- filter(flights, dest == "IAH")
flights2 <- mutate(flights1, speed = distance / air_time * 60)
flights3 <- select(flights2, year:day, dep_time, carrier, flight, speed)
arrange(flights3, desc(speed))


# or this
arrange(
  select(
    mutate(
      filter(flights, dest == "IAH"),
      speed = distance / air_time * 60
    ),
    year:day, dep_time, carrier, flight, speed
  ),
  desc(speed)
)


# to this

flights |> 
  filter(dest == "IAH") |> 
  mutate(speed = distance / air_time * 60) |> 
  select(year:day, dep_time, carrier, flight, speed) |> 
  arrange(desc(speed))

# A tibble: 7,198 x 7
    year month   day dep_time carrier flight speed
   <int> <int> <int>    <int> <chr>    <int> <dbl>
 1  2013     7     9      707 UA         226  522.
 2  2013     8    27     1850 UA        1128  521.
 3  2013     8    28      902 UA        1711  519.
 4  2013     8    28     2122 UA        1022  519.
 5  2013     6    11     1628 UA        1178  515.
 6  2013     8    27     1017 UA         333  515.
 7  2013     8    27     1205 UA        1421  515.
 8  2013     8    27     1758 UA         302  515.
 9  2013     9    27      521 UA         252  515.
10  2013     8    28      625 UA         559  515.
# i 7,188 more rows
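
The pipeline works because every dplyr verb takes the data frame as its first argument. When the data has to go somewhere else, the native pipe offers the _ placeholder. The sketch below assumes R 4.2 or later and uses the built-in mtcars data so that it runs without extra packages; the model itself is only an illustration.

```r
# lm() takes the formula first, so pass the data by name with `_`
fit_piped <- mtcars |> lm(mpg ~ wt, data = _)

# Equivalent call without the pipe
fit_plain <- lm(mpg ~ wt, data = mtcars)

all.equal(coef(fit_piped), coef(fit_plain))
```

The placeholder must be used with a named argument, here data = _.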

sources

This post on Stack Overflow.
This blog post by Isabella Velásquez. It is outdated because it is based on R 4.1, but it is still useful.
