class: right middle

## Preliminary notions of R

R.Andrés Castañeda

---

## Syntax

There are three main syntaxes:

- base
- tidyverse
- data.table

---
class: middle

.f3[.red[Note: ]The rest of the training will be based on ]

- [Advanced R](https://adv-r.hadley.nz/) by Hadley Wickham
- [R for Data Science](https://r4ds.had.co.nz/) by Hadley Wickham & Garrett Grolemund

---

## Objects and names

(Almost) everything in R is an object.

--

In R, you create objects, which you can bind to a name 'x'.

--

.red[You do NOT create an object named 'x'.]

--

```r
x <- c(1, 2, 3)
y <- x
```

--

![](https://github.com/hadley/adv-r/raw/master/diagrams/name-value/binding-2.png)

---

## Syntactic Names

--

- A syntactic name must consist of letters, digits, . and \_ but can’t begin with \_ or a digit.

--

- You can't use any of the reserved words like `TRUE`, `NULL`, `if`, and `function` (see the complete list in `?Reserved`).

--

- A name that doesn't follow these rules is a **non-syntactic** name; if you try to use one, you’ll get an error.

--

- It's possible to override these rules and use any name, i.e., any sequence of characters, by surrounding it with backticks.

```r
`1+1` <- 2
`1+1`
```

```
## [1] 2
```

---

## Vectors and functions

Put a little simplistically, in R:

--

Vectors contain the information; the data.

--

Functions are instructions of what to DO with the data.

---

## Vectors

A vector is a collection of elements.

--

There are two (well, three) kinds of vectors.

![](https://github.com/hadley/adv-r/raw/master/diagrams/vectors/summary-tree.png)

--

**Atomic vectors**: all elements must have the same type.

**Lists**: elements can have different types.
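---

## Vectors: atomic vs. list

The distinction can be checked directly in the console. This is only a minimal sketch; the names `atomic_vec`, `list_vec`, and `element_types` are made up for the illustration.

```r
# Illustrative names, not taken from the slides
atomic_vec <- c(1, 2.5, 4)         # all elements share one type
list_vec   <- list(1L, "a", TRUE)  # each element keeps its own type

typeof(atomic_vec)  # "double"
typeof(list_vec)    # "list"

# Ask each element of the list for its own type
element_types <- vapply(list_vec, typeof, character(1))
element_types       # "integer" "character" "logical"
```

`vapply()` is used here instead of `sapply()` so that the result is guaranteed to be a character vector.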
---

## Atomic vectors - `c()`

.pull-left[
All elements of an atomic vector must be of the same type. There are four common types: logical, integer, double, and character .light-blue[(six in reality, but you won't use the other two: raw and complex)]
]

.pull-right[
![](https://github.com/hadley/adv-r/raw/master/diagrams/vectors/summary-tree-atomic.png)
]

---

## Examples of atomic vectors

.pull-left[

```r
lgl_var <- c(TRUE, FALSE)
int_var <- c(1L, 6L, 10L)
dbl_var <- c(1, 2.5, 4.5)
chr_var <- c("these are", "some strings")
x <- matrix(1:6, nrow = 2, ncol = 3)
```
]

.pull-right[
<iframe src="https://rrmaximiliano.shinyapps.io/learnr-app/?showcase=0" width="100%" height="400px" data-external="1"></iframe>
]

---

## Class of vectors (S3 object system)

.pull-left[
A class is an attribute of a vector that tells functions how to deal with that vector.
]

.pull-right[
<img src="https://github.com/hadley/adv-r/raw/master/diagrams/vectors/summary-tree-s3-1.png" width="450" />
]

---

## List vectors - `list()`

In lists, elements can be of any type.

.pull-left[

```r
l1 <- list(
  1:3,
  "a",
  c(TRUE, FALSE, TRUE),
  c(2.3, 5.9)
)
```
]

.pull-right[

```r
typeof(l1)
```

```
## [1] "list"
```

```r
str(l1)
```

```
## List of 4
##  $ : int [1:3] 1 2 3
##  $ : chr "a"
##  $ : logi [1:3] TRUE FALSE TRUE
##  $ : num [1:2] 2.3 5.9
```
]

---

## `data.frame`, a special kind of list

(...finally!) Data frames are the kind of rectangular tables that you use in Stata.

.pull-left[

```r
df1 <- data.frame(
  x = 1:3,
  y = c("a", "b", "c")
)
typeof(df1)
```

```
## [1] "list"
```
]

.pull-right[

```r
class(df1)
```

```
## [1] "data.frame"
```

```r
attributes(df1)
```

```
## $names
## [1] "x" "y"
## 
## $class
## [1] "data.frame"
## 
## $row.names
## [1] 1 2 3
```
]

---

## Better data.frames! `tibble` and `data.table`

Base R `data.frame`s are a great idea, but a little old. By trying to do more, they end up doing less and frustrating users.

--

You should move to `tibble`s and/or `data.table`s.
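
--

One way to read the "trying to do more" remark, sketched with base R only: a `data.frame` silently simplifies a one-column subset to a plain vector, and `$` accepts abbreviated column names. The object `df_demo` and its columns are hypothetical, made up for this illustration.

```r
# Hypothetical data frame, used only for this illustration
df_demo <- data.frame(id = 1:3, value = c(10, 20, 30))

# Selecting a single column with [ , ] drops the data.frame structure
one_col <- df_demo[, "value"]
is.data.frame(one_col)  # FALSE

# Asking explicitly with drop = FALSE keeps the data.frame
kept <- df_demo[, "value", drop = FALSE]
is.data.frame(kept)     # TRUE

# `$` partially matches: "val" is resolved to the column "value"
partial <- df_demo$val
```

Whether this behavior counts as helpful or surprising is a judgment call; the sketch only shows that it happens.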
--

```r
library(tibble)
library(data.table)

# Bad code on purpose
tb <- tibble::tibble(x = 1:3, y = letters[1:3])
dt <- data.table::data.table(x = 1:3, y = letters[1:3])
```

---

## Better data.frames! `tibble` and `data.table`

```r
class(tb)
```

```
## [1] "tbl_df"     "tbl"        "data.frame"
```

```r
class(dt)
```

```
## [1] "data.table" "data.frame"
```

---
class: center, middle

## Hands on!

---

I. What is the difference between `c(list(1:3), list(9))` and `list(list(1:3), list(9))`?

II. Is there any difference between these three ways of assignment?

```r
assign('a', c(10, 150, 30, 45, 20.3))
a <- c(10, 150, 30, 45, 20.3)
a = c(10, 150, 30, 45, 20.3)
```

III. The `dim()` function gets the dimensions of an object. Why is `dim(a)` equal to `NULL`?

IV. Why does `is.character(c('blue',10,'green',20))` return `TRUE`?

---

R comes with a variety of datasets to work with (similar to `sysuse auto` in Stata). You can see all the available datasets by typing `library(help = "datasets")`.

V. In a `tibble` bound to the name `tb`, add a column named `x` with the vector `rivers` from the internal `datasets`.

---
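
## Exploring a built-in dataset

Before using a built-in dataset, it can help to look at what kind of object it is. The following is a minimal sketch using only base R; the names `rivers_type` and `rivers_len` are made up for the illustration.

```r
# `rivers` is available without loading anything extra,
# because the `datasets` package is attached by default
rivers_type <- typeof(rivers)  # storage type of the vector
rivers_len  <- length(rivers)  # number of elements

str(rivers)      # compact overview of the structure
summary(rivers)  # basic descriptive statistics
head(rivers)     # first few values
```

The same calls work for any other object listed by `library(help = "datasets")`.

---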