Subsetting

Select elements from an R object. All the information in this chapter comes from the Subsetting chapter of Advanced R by Hadley Wickham.

R.Andres Castaneda
2023-01-24

Atomic vectors

We have two vectors, x and y.

x <- c(1, 3, 2, 4, -10)
y <- c("c", "a", "d",  "b")
length(x)
[1] 5
length(y)
[1] 4

Use [ to select any number of elements from a vector.

Using numbers

Positive numbers

Return the elements in specified positions.

x[c(3, 1)]
[1] 2 1
# order(x) returns the positions that sort `x`: a vector of the same length as `x`
order(x)
[1] 5 1 3 2 4
order(y)
[1] 2 4 1 3
x
[1]   1   3   2   4 -10
y
[1] "c" "a" "d" "b"
x[order(x)]
[1] -10   1   2   3   4
x[order(y)]
[1] 3 4 1 2
y[order(x)]
[1] NA  "c" "d" "a" "b"
# Duplicate indices will duplicate values
y
[1] "c" "a" "d" "b"
y[c(1, 1)]
[1] "c" "c"
# Real numbers are silently truncated to integers
y
[1] "c" "a" "d" "b"
y[c(2.1, 2.9)]
[1] "a" "a"
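As a small sketch of how positive indices and order() work together, the snippet below uses a throwaway vector `v` (an illustrative name, not one of the vectors above). It checks that subsetting by order() gives the same result as sort(), and that fractional indices are truncated rather than rounded.

```r
# `v` is an illustrative vector, separate from `x` and `y`
v <- c(10, 30, 20)

# Subsetting by order() gives the same result as sort()
sorted_v <- v[order(v)]
identical(sorted_v, sort(v))
#> [1] TRUE

# decreasing = TRUE reverses the ordering
desc_v <- v[order(v, decreasing = TRUE)]
desc_v
#> [1] 30 20 10

# Fractional indices are truncated toward zero, not rounded:
# 1.9 becomes 1 and 2.9 becomes 2
trunc_v <- v[c(1.9, 2.9)]
trunc_v
#> [1] 10 30
```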

Negative numbers

Exclude elements at the specified positions:

# both elements are negative
x[-c(3, 1)]
[1]   3   4 -10

You can't mix positive and negative integers in a single subset:

x[c(-1, 2)]
Error in x[c(-1, 2)]: only 0's may be mixed with negative subscripts

But you can combine two separate subsets:

c(x[-1], x[2])
[1]   3   2   4 -10   3
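Two further cases follow from the error message above, which says that only zeros may be mixed with negative subscripts. The sketch below assumes a vector `v` with the same values as `x` (the name is illustrative) and shows a zero mixed with a negative index, plus a common way to drop the last element.

```r
# Same values as `x`, under an illustrative name
v <- c(1, 3, 2, 4, -10)

# Drop the last element, whatever the length of the vector
no_last <- v[-length(v)]
no_last
#> [1] 1 3 2 4

# Zeros are ignored, so they can be mixed with negative indices
no_first <- v[c(-1, 0)]
no_first
#> [1]   3   2   4 -10
```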

Logical vectors

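Logical vectors select the elements whose corresponding value is TRUE, and they are recycled when shorter than the vector being subset. For subsetting, c(1, 0) is different from c(TRUE, FALSE): numbers are read as positions, not as truth values.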

# For subsetting
y
[1] "c" "a" "d" "b"
y[c(FALSE, TRUE, TRUE, FALSE)]
[1] "a" "d"
y[c(0, 1, 1, 0)]
[1] "c" "c"
# Length zero
y[0]
character(0)
# Return original vector
y[]
[1] "c" "a" "d" "b"
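# Logical evaluation: in `if`, unlike in subsetting, numbers are coerced to logical (0 is FALSE, any other number is TRUE)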

if (TRUE) {
  print("works")
}
[1] "works"
if (1) {
  print("works")
}
[1] "works"
if (FALSE) {
  # Never runs: the condition is FALSE
  print("does not work")
}
if (0) {
  # Never runs: 0 is coerced to FALSE
  print("does not work")
}

# Recycling of logical vectors for subsetting
x
[1]   1   3   2   4 -10
x[c(TRUE, FALSE)]
[1]   1   2 -10
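In practice, logical index vectors usually come from a comparison rather than being typed by hand. The following is a minimal sketch, assuming a vector `v` with the same values as `x` (the names `v`, `keep`, `big`, `pos` and `with_na` are illustrative). It also shows what a missing value in the index does.

```r
# Same values as `x`, under an illustrative name
v <- c(1, 3, 2, 4, -10)

# A comparison returns a logical vector of the same length as `v`
keep <- v > 2
keep
#> [1] FALSE  TRUE FALSE  TRUE FALSE

# Subsetting with it keeps the elements where the condition is TRUE
big <- v[keep]
big
#> [1] 3 4

# which() converts the logical vector into positions
pos <- which(keep)
pos
#> [1] 2 4

# A missing value in the index yields a missing value in the output
with_na <- v[c(TRUE, NA, FALSE, FALSE, FALSE)]
with_na
#> [1]  1 NA
```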

Using names

Sometimes the elements of a vector are named. This is different from the vector itself having a name. The variables in a data frame are named vectors, but you can also name the individual elements of a variable. This is similar to, but not the same as, having a factor vector. Factors are integer vectors with class factor and a levels attribute. Named elements in a vector can be of any type, each with a corresponding name. Whether a factor behaves like a factor depends on the function that reads it. Named vectors behave according to their own type, regardless of their names. Class and names are both attributes, but of different kinds.

# using setNames()
nombres <- c("Serapio", "Trimegisto", "Amalasunta", "Metafrasto", "Brunilda")
xm <- setNames(object = x, 
               nm = nombres)
xm
   Serapio Trimegisto Amalasunta Metafrasto   Brunilda 
         1          3          2          4        -10 
# using names()
x
[1]   1   3   2   4 -10
names(x) <- nombres
x
   Serapio Trimegisto Amalasunta Metafrasto   Brunilda 
         1          3          2          4        -10 
# Print or use the names
names(x)
[1] "Serapio"    "Trimegisto" "Amalasunta" "Metafrasto" "Brunilda"  
nm <- names(x)
nm
[1] "Serapio"    "Trimegisto" "Amalasunta" "Metafrasto" "Brunilda"  
# Remove names
names(x) <- NULL
x
[1]   1   3   2   4 -10
# Getting one element
xm["Amalasunta"]
Amalasunta 
         2 
# getting an element from a vector without names

x["Amalasunta"]
[1] NA
# Some names exist and others do not
xm[c("Andres", "Amalasunta")]
      <NA> Amalasunta 
        NA          2 
# repeated names
xm[c("Trimegisto", "Trimegisto")]
Trimegisto Trimegisto 
         3          3 
# Names are matched exactly.
xm[c("Tri", "Trimegisto")]
      <NA> Trimegisto 
        NA          3 
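The difference between the names attribute and the class attribute described at the start of this section can be inspected directly. The sketch below uses two small illustrative objects, `named_v` and `fct`, which are not part of the examples above.

```r
named_v <- c(a = 1, b = 2)     # illustrative named numeric vector
fct <- factor(c("a", "b"))     # illustrative factor

# Names live in the `names` attribute; the type is still double
type_named <- typeof(named_v)
type_named
#> [1] "double"
attr_named <- names(attributes(named_v))
attr_named
#> [1] "names"

# A factor is an integer vector with `levels` and `class` attributes
type_fct <- typeof(fct)
type_fct
#> [1] "integer"
attr_fct <- names(attributes(fct))
attr_fct
#> [1] "levels" "class"

# Removing the names leaves the values untouched
bare_v <- unname(named_v)
bare_v
#> [1] 1 2
```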

Subsetting with factors

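It is just a bad idea. Factors are not treated specially when subsetting, so R uses the underlying integer codes rather than the labels, which is rarely what you expect.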

fnames <- factor(nombres)
fnames
[1] Serapio    Trimegisto Amalasunta Metafrasto Brunilda  
Levels: Amalasunta Brunilda Metafrasto Serapio Trimegisto
as.numeric(fnames)
[1] 4 5 1 3 2

# factor levels are not names: subsetting by character finds no match
f2 <- fnames[c("Metafrasto", "Serapio")]
f2
[1] <NA> <NA>
Levels: Amalasunta Brunilda Metafrasto Serapio Trimegisto
f3 <- fnames[c(3, 5)]

xm[c(3, 5)]
Amalasunta   Brunilda 
         2        -10 
xm[f3]
   Serapio Trimegisto 
         1          3 
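# The integer codes of f3 are 1 and 2, which is why xm[f3] returns the first two elements
as.numeric(f3)
[1] 1 2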
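If a factor has to be used as an index, one common workaround is to convert it to character first, so that elements are matched by label instead of by integer code. The following is a minimal sketch; `vals`, `f`, `by_code` and `by_label` are illustrative names and not part of the examples above.

```r
# Illustrative named vector and factor
vals <- c(Serapio = 1, Trimegisto = 3, Amalasunta = 2)
f <- factor(c("Amalasunta", "Trimegisto"))

# Subsetting with the factor uses its integer codes (1 and 2),
# so it returns the first two elements of `vals`
by_code <- vals[f]
names(by_code)
#> [1] "Serapio"    "Trimegisto"

# Converting to character first matches on the labels instead
by_label <- vals[as.character(f)]
names(by_label)
#> [1] "Amalasunta" "Trimegisto"
```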

Exercises

Citation

For attribution, please cite this work as

Castaneda (2023, Jan. 24). R Training for GPID Team: Subsetting. Retrieved from https://povcalnet-team.github.io/Rtraining/posts/subsetting/

BibTeX citation

@misc{castaneda2023subsetting,
  author = {Castaneda, R.Andres},
  title = {R Training for GPID Team: Subsetting},
  url = {https://povcalnet-team.github.io/Rtraining/posts/subsetting/},
  year = {2023}
}