#### Tibbles

In [None]:
library(tidyverse)

In [None]:
# Coercing data frames to tibbles

as_tibble(iris)

In [None]:
# Creating a tibble

tibble(
  x = 1:5,
  y = 1,
  z = x^2+y
)

In [None]:
tb <- tibble(
  `:)` = "smile", 
  ` ` = "space",
  `2000` = "number"
)

In [None]:
tb

In [None]:
tb <- tibble(
  `:)` = "happy", 
  `:(` = "sad",
  `:*` = "kiss"
)

In [None]:
tb

In [None]:
# tribble() (transposed tibble): the data is entered row by row

tribble(
  ~x, ~y, ~z,
  #--|--|----
  "a", 2, 3.6,
  "b", 1, 8.5
)

In [None]:
tribble(
  ~agua, ~cafe, ~cha,
  #--|--|----
  80, 2, 3.6,
  70.7, 1, 8.5
)

Print

In [None]:
tibble(
  a = lubridate::now() + runif(1e3) * 86400,
  b = lubridate::today() + runif(1e3) * 30,
  c = 1:1e3,
  d = runif(1e3),
  e = sample(letters, 1e3, replace = TRUE)
)


In [None]:
nycflights13::flights %>% 
  print(n = 10, width = Inf)

Subsetting

In [None]:
df <- tibble(
  x = runif(5),
  y = rnorm(5)
)

# Extract by name
df$x
#> [1] 0.73296674 0.23436542 0.66035540 0.03285612 0.46049161
df[["x"]]
#> [1] 0.73296674 0.23436542 0.66035540 0.03285612 0.46049161

# Extract by position
df[[1]]
#> [1] 0.73296674 0.23436542 0.66035540 0.03285612 0.46049161

In [None]:
df %>% .$x
#> [1] 0.73296674 0.23436542 0.66035540 0.03285612 0.46049161
df %>% .[["x"]]
#> [1] 0.73296674 0.23436542 0.66035540 0.03285612 0.46049161

How can you tell if an object is a tibble? (Hint: try printing mtcars, which is a regular data frame).

In [None]:
print(mtcars)

In [None]:
m <- as_tibble(mtcars)

In [None]:
print(m)
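
Besides comparing the printed output, the class can be checked directly. A minimal sketch, assuming the tibble package is installed; `m` is the same illustrative name used in the cell above.

```r
library(tibble)

# mtcars ships with base R as a plain data frame
class(mtcars)
is_tibble(mtcars)

# The converted copy carries the extra tbl_df and tbl classes
m <- as_tibble(mtcars)
class(m)
is_tibble(m)
```

When printed, a tibble also shows its dimensions and column types and only the first rows, while print(mtcars) prints every row.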

Compare and contrast the following operations on a data.frame and equivalent tibble. What is different? Why might the default data frame behaviours cause you frustration?

In [None]:
df <- data.frame(abc = 1, xyz = "a")
df$x
df[, "xyz"]
df[, c("abc", "xyz")]

In [None]:
tb <- tibble(abc = 1, xyz = "a")

tb[, "xyz"]
tb[, c("abc", "xyz")]
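
The two differences that matter here can be checked side by side. This is a sketch rather than a full answer; `stringsAsFactors = FALSE` is an addition of mine so that the result does not depend on the R version, and `partial` is a hypothetical helper name.

```r
library(tibble)

df <- data.frame(abc = 1, xyz = "a", stringsAsFactors = FALSE)
tb <- tibble(abc = 1, xyz = "a")

# `$` on a data frame does partial matching: "x" silently finds "xyz"
df$x

# A tibble needs the full name; this returns NULL and raises a warning
partial <- suppressWarnings(tb$x)
partial

# Selecting one column with `[` drops a data frame down to a vector...
is.data.frame(df[, "xyz"])

# ...while a tibble always stays a tibble
is.data.frame(tb[, "xyz"])
```

Partial matching can pick up the wrong column without any message, and the changing return type of `[` means code that works with two columns can break with one.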

If you have the name of a variable stored in an object, e.g. var <- "mpg", how can you extract the reference variable from a tibble?

In [None]:
var <- "mpg"

In [None]:
var
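
The cells above only store the name. One way to finish the exercise, sketched with mtcars converted to a tibble (`mt` is a hypothetical name), is to use `[[`, which evaluates `var` instead of treating it as a literal column name.

```r
library(tibble)

var <- "mpg"
mt <- as_tibble(mtcars)

# `[[` looks up the column whose name is stored in var
mt[[var]]
```

Writing mt$var would not work, because `$` looks for a column literally called "var".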

Practice referring to non-syntactic names in the following data frame

In [None]:
annoying <- tibble(
  `1` = 1:10,
  `2` = `1` * 2 + rnorm(length(`1`))
)

In [None]:
annoying$`1`

In [None]:
ggplot(data = annoying, mapping = aes(x = `1`, y = `2`)) +
  geom_point()

In [None]:
annoying$`3` = annoying$`2` / annoying$`1`

In [None]:
annoying
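
The same exercise also asks for the columns to be renamed. A possible sketch with dplyr::rename(), rebuilding `annoying` so the block runs on its own; `renamed` is a hypothetical name.

```r
library(tibble)
library(dplyr)

annoying <- tibble(
  `1` = 1:10,
  `2` = `1` * 2 + rnorm(length(`1`))
)
annoying[["3"]] <- annoying[["2"]] / annoying[["1"]]

# Non-syntactic names need backticks on the right-hand side
renamed <- rename(annoying, one = `1`, two = `2`, three = `3`)
renamed
```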

What does tibble::enframe() do? When might you use it?

In [None]:
?tibble::enframe()

In [None]:
tb <- tibble(
  1
)
tibble::enframe(tb)
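
enframe() is easier to see on a named vector than on a tibble. A small sketch, where `scores` is a made-up vector: each element becomes one row, with the names in one column and the values in another.

```r
library(tibble)

scores <- c(a = 1, b = 2, c = 3)

# Named vector -> two-column tibble (name, value)
ef <- enframe(scores)
ef

# deframe() goes the other way
deframe(ef)
```

This is handy when a function returns a named vector and the next step expects a data frame.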

What option controls how many additional column names are printed at the footer of a tibble?

In [None]:
# Not sure. The tibble documentation lists the option tibble.max_extra_cols for this.

#### Data import

In [None]:
tb <- read_csv("a,b,c
1,2,3
4,5,6")

In [None]:
tb

In [None]:
orelhao <- read_csv("TUP.csv")

In [None]:
orelhao

##### What function would you use to read a file where fields were separated with “|”?

In [None]:
read_delim("um.csv", "|")
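
Since "um.csv" may not exist on another machine, the same call can be tried on inline text. A sketch with made-up data; `piped` is a hypothetical name.

```r
library(readr)

# Inline data with "|" as the field delimiter
piped <- suppressMessages(
  read_delim("a|b|c\n1|2|3\n4|5|6", delim = "|")
)
piped
```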

##### Apart from file, skip, and comment, what other arguments do read_csv() and read_tsv() have in common?

read_csv(file, col_names = TRUE, col_types = NULL,
  locale = default_locale(), na = c("", "NA"), quoted_na = TRUE,
  quote = "\"", comment = "", trim_ws = TRUE, skip = 0,
  n_max = Inf, guess_max = min(1000, n_max),
  progress = show_progress(), skip_empty_rows = TRUE)

read_tsv(file, col_names = TRUE, col_types = NULL,
  locale = default_locale(), na = c("", "NA"), quoted_na = TRUE,
  quote = "\"", comment = "", trim_ws = TRUE, skip = 0,
  n_max = Inf, guess_max = min(1000, n_max),
  progress = show_progress(), skip_empty_rows = TRUE)
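
Instead of comparing the two signatures by eye, the shared arguments can be computed from the functions themselves. A sketch; the exact list depends on the installed readr version, and `shared` is a hypothetical name.

```r
library(readr)

# Argument names that both functions accept
shared <- intersect(names(formals(read_csv)), names(formals(read_tsv)))

# Leave out the three the question already mentions
setdiff(shared, c("file", "skip", "comment"))
```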

##### What are the most important arguments to read_fwf()?

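col_positions, which tells read_fwf() where each column starts and ends. It is usually built with fwf_widths() or fwf_positions().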
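
A small sketch of col_positions in use, assuming a made-up file with a 5-character name field followed by a 2-character age field (`fixed` and `people` are hypothetical names).

```r
library(readr)

fixed <- "John 25\nMary 31\n"

# fwf_widths() takes the column widths and, optionally, the column names
people <- suppressMessages(
  read_fwf(fixed, col_positions = fwf_widths(c(5, 2), c("name", "age")))
)
people
```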

##### Sometimes strings in a CSV file contain commas. To prevent them from causing problems they need to be surrounded by a quoting character, like " or '. By default, read_csv() assumes that the quoting character will be ". What argument to read_csv() do you need to specify to read the following text into a data frame?

In [None]:
read_csv("x,y\n1,'a,b'")

In [None]:
read_csv("x,y\n1,'a,b'", quote = "'")

Identify what is wrong with each of the following inline CSV files. What happens when you run the code?

In [None]:
read_csv("a,b\n1,2,3\n4,5,6")

In [None]:
read_csv("a,b,c\n1,2,3\n4,5,6")

In [None]:
read_csv("a,b,c\n1,2\n1,2,3,4")

In [None]:
read_csv("a,b,c\n1,2\n1,2,'3,4'", quote = "'")

In [None]:
read_csv("a,b\n\"1")

In [None]:
read_csv("a,b\n\1,2")

In [None]:
read_csv("a,b\n1,2\na,b")

In [None]:
read_csv2("a;b\n1;3")

#### Parsing a vector

In [None]:
str(parse_logical(c("TRUE", "FALSE", "NA")))
#>  logi [1:3] TRUE FALSE NA
str(parse_integer(c("1", "2", "3")))
#>  int [1:3] 1 2 3
str(parse_date(c("2010-01-01", "1979-10-14")))
#>  Date[1:2], format: "2010-01-01" "1979-10-14"

In [None]:
x <- parse_integer(c("123", "345", "abc", "123.45"))

In [None]:
x

In [None]:
problems(x)

In [None]:
parse_double("1.23")
#> [1] 1.23
parse_double("1,23", locale = locale(decimal_mark = ","))
#> [1] 1.23

In [None]:
x <- "4,56"

In [None]:
y <- parse_double(x, locale = locale(decimal_mark = ","))

In [None]:
x

In [None]:
y

In [None]:
x == y

In [None]:
salario <- "R$3289.56"

In [None]:
parse_number(salario)

In [None]:
desconto <- "20%"

In [None]:
parse_number(desconto)

In [None]:
cpf <- "455.678.779.32"

In [None]:
parse_number(cpf, locale = locale(grouping_mark = "."))

In [None]:
charToRaw("Vinícius")

In [None]:
x1 <- "El Ni\xf1o was particularly bad this year"
x2 <- "\x82\xb1\x82\xf1\x82\xc9\x82\xbf\x82\xcd"

x1
#> [1] "El Ni\xf1o was particularly bad this year"
x2
#> [1] "\x82\xb1\x82\xf1\x82\u0242\xbf\x82\xcd"

In [None]:
parse_character(x1, locale = locale(encoding = "Latin1"))
#> [1] "El Niño was particularly bad this year"
parse_character(x2, locale = locale(encoding = "Shift-JIS"))
#> [1] "こんにちは"

In [None]:
guess_encoding(charToRaw(x1))

In [None]:
parse_datetime("20200215")

In [None]:
parse_datetime("20201009T2010")

In [None]:
hoje <- "2020/02/15"

In [None]:
parse_date("01/02/15", "%m/%d/%y")
#> [1] "2015-01-02"
parse_date("01/02/15", "%d/%m/%y")
#> [1] "2015-02-01"
parse_date("01/02/15", "%y/%m/%d")
#> [1] "2001-02-15"

In [None]:
parse_date("1 janvier 2015", "%d %B %Y", locale = locale("fr"))
#> [1] "2015-01-01"

What are the most important arguments to locale()?

locale(date_names = "en", date_format = "%AD", time_format = "%AT",
  decimal_mark = ".", grouping_mark = ",", tz = "UTC",
  encoding = "UTF-8", asciify = FALSE)

What happens if you try and set decimal_mark and grouping_mark to the same character? What happens to the default value of grouping_mark when you set decimal_mark to “,”? What happens to the default value of decimal_mark when you set the grouping_mark to “.”?

In [None]:
parse_datetime("2020,12,13", locale = locale(decimal_mark = ",", grouping_mark = ","))

In [None]:
parse_number("20.201,213", locale = locale(decimal_mark = ","))

In [None]:
parse_number("2.020,1213", locale = locale(grouping_mark = "."))
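
The three questions can also be answered by inspecting the locale object itself. A sketch; `same_mark` is a hypothetical name, and tryCatch() is only there so the error message can be shown without stopping the notebook.

```r
library(readr)

# Setting decimal_mark = "," switches the default grouping mark to "."
locale(decimal_mark = ",")$grouping_mark

# Setting grouping_mark = "." switches the default decimal mark to ","
locale(grouping_mark = ".")$decimal_mark

# Using the same character for both is an error
same_mark <- tryCatch(
  locale(decimal_mark = ",", grouping_mark = ","),
  error = function(e) conditionMessage(e)
)
same_mark
```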

I didn’t discuss the date_format and time_format options to locale(). What do they do? Construct an example that shows when they might be useful.

In [None]:
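# They set the default date and time formats that parse_date() and parse_time() use when no format string is given.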
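
A possible example for date_format and time_format, assuming dates written as day/month/year and times written like "20h10". The locale name `br_dates` and the two formats are my own choices, not something the exercise prescribes.

```r
library(readr)

# Default formats stored in the locale, so no format argument is needed later
br_dates <- locale(date_format = "%d/%m/%Y", time_format = "%Hh%M")

parse_date("15/02/2020", locale = br_dates)
parse_time("20h10", locale = br_dates)
```

This is useful when every date in a file follows the same non-ISO convention, because the format only has to be stated once.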

If you live outside the US, create a new locale object that encapsulates the settings for the types of file you read most commonly.

In [None]:
parse_number("336.280.938-30",
            locale = locale(grouping_mark="."))
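
The exercise asks for a reusable locale object rather than a one-off call. A sketch for Brazilian files; the name `br`, the time zone, and the date format are assumptions about what such files usually look like.

```r
library(readr)

br <- locale(
  date_names = "pt",
  date_format = "%d/%m/%Y",
  decimal_mark = ",",
  grouping_mark = ".",
  tz = "America/Sao_Paulo",
  encoding = "UTF-8"
)

parse_number("R$ 1.234,56", locale = br)
parse_date("15/02/2020", locale = br)
```

The same object can be passed to read_csv2() or read_delim() through their locale argument.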

What’s the difference between read_csv() and read_csv2()?

In [None]:
# read_csv() uses "," as the field separator; read_csv2() uses ";" as the separator and "," as the decimal mark

What are the most common encodings used in Europe? What are the most common encodings used in Asia? Do some googling to find out.

In [None]:
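# UTF-8 is the most common today. Legacy files in Western Europe are often Latin1 (ISO-8859-1) or Windows-1252; in East Asia, Shift-JIS, GBK, Big5 or EUC-KR.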

Generate the correct format string to parse each of the following dates and times

In [None]:
d1 <- "January 1, 2010"
d2 <- "2015-Mar-07"
d3 <- "06-Jun-2017"
d4 <- c("August 19 (2015)", "July 1 (2015)")
d5 <- "12/30/14" # Dec 30, 2014
t1 <- "1705"
t2 <- "11:15:10.12 PM"

In [None]:
parse_date(d1, "%B %d, %Y")

In [None]:
parse_date(d2, "%Y-%b-%d")

In [None]:
parse_date(d3, "%d-%b-%Y")
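
The cells above cover d1 to d3. Possible format strings for the remaining values, with the variables redefined so the block runs on its own:

```r
library(readr)

d4 <- c("August 19 (2015)", "July 1 (2015)")
d5 <- "12/30/14" # Dec 30, 2014
t1 <- "1705"
t2 <- "11:15:10.12 PM"

# Literal characters such as the parentheses go into the format as-is
parse_date(d4, "%B %d (%Y)")

# Two-digit year, month first
parse_date(d5, "%m/%d/%y")

# Hours and minutes with no separator
parse_time(t1, "%H%M")

# %OS reads fractional seconds, %p reads the AM/PM marker
parse_time(t2, "%H:%M:%OS %p")
```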