### Libraries

In [None]:
library(nycflights13)
library(tidyverse)

In [None]:
flights

### Filter

In [None]:
filter(flights, month == 1, day == 1)

In [None]:
filter(flights, month == 11, day == 11)

In [None]:
nov11 <- filter(flights, month == 11, day == 11)

In [None]:
(jan1 <- filter(flights, month == 1, day == 1)) # assigns to the variable and prints its contents

In [None]:
(dec25 <- filter(flights, month == 12, day == 25))

##### Comparisons

In [None]:
filter(flights, month = 1) # raises an error: `=` is not a comparison, use `==`

In [None]:
sqrt(2) ^ 2 == 2

1 / 49 * 49 == 1


In [None]:
near(sqrt(2) ^ 2,  2)
#> [1] TRUE
near(1 / 49 * 49, 1)
#> [1] TRUE
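
Both exact comparisons above return FALSE because these values are stored with finite floating-point precision. The base-R sketch below illustrates the idea behind a tolerance-based comparison. `approx_equal` and its default tolerance are hypothetical names chosen for illustration, not dplyr's implementation of `near()`.

```r
# Hypothetical helper: treats two numbers as equal when they differ by less than `tol`.
approx_equal <- function(x, y, tol = 1e-8) {
  abs(x - y) < tol
}

exact  <- sqrt(2) ^ 2 == 2              # exact comparison on floating-point values
approx <- approx_equal(sqrt(2) ^ 2, 2)  # comparison within a small tolerance
print(c(exact = exact, approx = approx))
```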

##### Logical operators

In [None]:
filter(flights, month == 11 | month == 12)

In [None]:
filter(flights, month == 11 & day == 12)

In [None]:
filter(flights, month == 11,  day != 1)

In [None]:
nov_dec <- filter(flights, month %in% c(11, 12))

In [None]:
nov_dec

In [None]:
(apr_jun <- filter(flights, month %in% c(4, 6) & day %in% c(20, 30)))

In [None]:
filter(flights, !(arr_delay > 120 | dep_delay > 120))
filter(flights, arr_delay <= 120, dep_delay <= 120)
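
The two calls above should select the same rows by De Morgan's law: `!(a | b)` is equivalent to `!a & !b`. A small base-R check on made-up delay values, including missing ones:

```r
# Made-up delays in minutes, with a missing value in each vector.
arr_delay <- c(10, 130, 50, NA, 200)
dep_delay <- c(5, 20, 150, 30, NA)

lhs <- !(arr_delay > 120 | dep_delay > 120)
rhs <- arr_delay <= 120 & dep_delay <= 120

# The two logical vectors agree element by element, NA included.
print(identical(lhs, rhs))
```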

##### Missing values

In [None]:
NA > 5

In [None]:
10 == NA

In [None]:
NA / 2

In [None]:
NA == NA

In [None]:
x <- NA * 3
is.na(x)

In [None]:
df <- tibble(x = c(1, NA, 3))
filter(df, x > 1)

In [None]:
filter(df, is.na(x) | x > 1) # ask for the NA rows explicitly to keep them

Had an arrival delay of two or more hours

In [None]:
filter(flights, arr_delay >= 120) # arr_delay is in minutes

Flew to Houston (IAH or HOU)

In [None]:
filter(flights, dest %in% c('IAH', 'HOU'))

Were operated by United, American, or Delta

In [None]:
filter(flights, carrier %in% c('UA', 'AA', 'DL'))

In [None]:
airlines

Departed in summer (July, August, and September)

In [None]:
filter(flights, month %in% c(7, 8, 9))

Arrived more than two hours late, but didn’t leave late

In [None]:
filter(flights, dep_delay <= 0, arr_delay > 120)

Were delayed by at least an hour, but made up over 30 minutes in flight

In [None]:
filter(flights, dep_delay >= 60, dep_delay - arr_delay > 30)

Departed between midnight and 6am (inclusive)

In [None]:
filter(flights, dep_time <= 600 | dep_time == 2400) # dep_time is HHMM; midnight is recorded as 2400

Another useful dplyr filtering helper is between(). What does it do? Can you use it to simplify the code needed to answer the previous challenges?

In [None]:
filter(flights, between(dep_time, 0, 600) | dep_time == 2400)
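
As a rough illustration of what `between()` does, the sketch below compares it with the explicit pair of comparisons on a made-up vector. It assumes dplyr is installed.

```r
library(dplyr)

x <- c(0, 3, 6, 7, NA)  # made-up values

# between(x, left, right) is a shortcut for x >= left & x <= right (both ends inclusive).
shortcut <- between(x, 0, 6)
explicit <- x >= 0 & x <= 6

print(identical(shortcut, explicit))
```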

How many flights have a missing dep_time? What other variables are missing? What might these rows represent?

In [None]:
(df_na <- filter(flights, is.na(dep_time)))

In [None]:
count(df_na)

Why is NA ^ 0 not missing? Why is NA | TRUE not missing? Why is FALSE & NA not missing? Can you figure out the general rule? (NA * 0 is a tricky counterexample!)

In [None]:
NA ^ 0

In [None]:
NA | TRUE

In [None]:
FALSE & NA

In [None]:
NA * 0
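
One way to read these results: NA means "the answer depends on an unknown value". When every possible value would give the same answer, R can return that answer. The base-R sketch below walks through the four cases and shows why `NA * 0` is the exception: not every number times zero is zero.

```r
# Any number to the power 0 is 1, so the unknown value does not matter.
print(NA ^ 0)
# TRUE or anything is TRUE; FALSE and anything is FALSE.
print(NA | TRUE)
print(FALSE & NA)
# Inf * 0 is NaN rather than 0, so the result of NA * 0 is not determined.
print(Inf * 0)
print(NA * 0)
```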

### Arrange

In [None]:
arrange(flights, year, month, day)

In [None]:
arrange(flights, desc(dep_delay))

In [None]:
arrange(flights, arr_delay)

In [None]:
arrange(flights, distance)

In [None]:
df <- tibble(x = c(5, NA, 7, 9, 32, NA))
arrange(df, x)

In [None]:
arrange(df, desc(x))

How could you use arrange() to sort all missing values to the start? (Hint: use is.na()).

In [None]:
arrange(df, desc(is.na(x)))
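
A possible extension, sketched on a small made-up tibble: adding `x` as a second sort key keeps the missing values first and also orders the remaining rows. `df_demo` is a hypothetical name, and the sketch assumes dplyr is installed.

```r
library(dplyr)

df_demo <- tibble(x = c(5, NA, 7, 9, 32, NA))  # made-up values

# is.na(x) is TRUE for missing rows; desc() sorts TRUE before FALSE.
# The second key, x, orders the non-missing rows in ascending order.
sorted <- arrange(df_demo, desc(is.na(x)), x)
print(sorted)
```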

Sort flights to find the most delayed flights. Find the flights that left earliest.

In [None]:
arrange(flights, desc(dep_delay))

In [None]:
arrange(flights, dep_delay)

Sort flights to find the fastest (highest speed) flights.

In [None]:
arrange(flights, desc(distance / air_time)) # speed = distance / time, highest first

Which flights travelled the farthest? Which travelled the shortest?

In [None]:
arrange(flights, distance)

In [None]:
arrange(flights, desc(distance))

### Select

In [None]:
select(flights, year, month, day)

In [None]:
select(flights, dep_delay, arr_delay, distance)

In [None]:
select(flights, year:day)

In [None]:
select(flights, hour:time_hour)

In [None]:
select(flights, -(year:day))

In [None]:
select(flights, -(flight:dest))

In [None]:
rename(flights, tail_num = tailnum)

In [None]:
select(flights, time_hour, air_time, everything())

Brainstorm as many ways as possible to select dep_time, dep_delay, arr_time, and arr_delay from flights.

In [None]:
select(flights, dep_time, dep_delay, arr_time, arr_delay)

What happens if you include the name of a variable multiple times in a select() call?

In [None]:
select(flights, dep_time, dep_time, dep_time)

What does the any_of() function do? Why might it be helpful in conjunction with this vector?

In [None]:
vars <- c("year", "month", "day", "dep_delay", "arr_delay")

In [None]:
select(flights, any_of(vars))

In [None]:
select(flights, all_of(vars))
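
The difference between the two helpers shows up when the character vector names a column that does not exist. In this sketch the tibble and the column name `not_a_column` are made up: `any_of()` silently skips the missing name, while `all_of()` raises an error. It assumes dplyr is installed.

```r
library(dplyr)

demo <- tibble(year = 2013, month = 1, day = 1)  # made-up data
wanted <- c("year", "day", "not_a_column")       # the last name is deliberately absent

# any_of() keeps whichever of the names exist.
kept <- names(select(demo, any_of(wanted)))
print(kept)

# all_of() is strict: a missing name is an error, captured here with try().
strict_failed <- inherits(try(select(demo, all_of(wanted)), silent = TRUE), "try-error")
print(strict_failed)
```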

Does the result of running the following code surprise you? How do the select helpers deal with case by default? How can you change that default?

In [None]:
select(flights, contains("TIME"))
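
By default the select helpers ignore case, which is why `contains("TIME")` matches lowercase column names. The sketch below uses a made-up one-row tibble and assumes dplyr is installed; passing `ignore.case = FALSE` switches to case-sensitive matching.

```r
library(dplyr)

demo <- tibble(dep_time = 517, arr_time = 830, carrier = "UA")  # made-up row

# Default: case-insensitive, so "TIME" matches dep_time and arr_time.
default_match <- names(select(demo, contains("TIME")))
# Case-sensitive: no column name contains uppercase "TIME".
strict_match <- names(select(demo, contains("TIME", ignore.case = FALSE)))

print(default_match)
print(strict_match)
```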

### Mutate

In [None]:
flights_sml <- select(flights, 
  year:day, 
  ends_with("delay"), 
  distance, 
  air_time
)
mutate(flights_sml,
  gain = dep_delay - arr_delay,
  speed = distance / air_time * 60
)

In [None]:
mutate(flights_sml,
  gain = dep_delay - arr_delay,
  hours = air_time / 60,
  gain_per_hour = gain / hours
)

In [None]:
transmute(flights,
  gain = dep_delay - arr_delay,
  hours = air_time / 60,
  gain_per_hour = gain / hours
)

In [None]:
transmute(flights,
  dep_time,
  hour = dep_time %/% 100,
  minute = dep_time %% 100
)

In [None]:
transmute(flights,
  dep_time,
  logar = log2(dep_time)
)

Currently dep_time and sched_dep_time are convenient to look at, but hard to compute with because they’re not really continuous numbers. Convert them to a more convenient representation of number of minutes since midnight.

In [None]:
(dep1 <- select(flights, dep_time, sched_dep_time))

In [None]:
flights

In [None]:
transmute(flights,
  dep_time,
  # HHMM -> minutes since midnight; %% 1440 maps 2400 (midnight) to 0
  min_since_midnight = (dep_time %/% 100 * 60 + dep_time %% 100) %% 1440
)

In [None]:
transmute(flights,
  sched_dep_time,
  # HHMM -> minutes since midnight; %% 1440 maps 2400 (midnight) to 0
  min_since_midnight = (sched_dep_time %/% 100 * 60 + sched_dep_time %% 100) %% 1440
)

Compare air_time with arr_time - dep_time. What do you expect to see? What do you see? What do you need to do to fix it?

In [None]:
transmute(flights,
         air_time,
         dep_time,
         dif = arr_time - dep_time)

Compare dep_time, sched_dep_time, and dep_delay. How would you expect those three numbers to be related?

In [None]:
transmute(flights,
      dep_time, 
      sched_dep_time, 
      dep_delay)
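
One would expect `dep_delay` to equal `dep_time` minus `sched_dep_time` once both are converted to minutes since midnight. The base-R sketch below checks that idea on made-up values. `to_minutes` and `delay_minutes` are hypothetical helpers, and the wrap-around step assumes no flight leaves more than 12 hours early or late.

```r
# Hypothetical helper: HHMM clock value -> minutes since midnight (2400 maps to 0).
to_minutes <- function(hhmm) {
  (hhmm %/% 100 * 60 + hhmm %% 100) %% 1440
}

# Hypothetical helper: difference in minutes, wrapped into [-720, 720),
# so a flight scheduled at 23:59 that leaves at 00:05 counts as 6 minutes late.
delay_minutes <- function(dep, sched) {
  (to_minutes(dep) - to_minutes(sched) + 720) %% 1440 - 720
}

dep_time       <- c(517, 533, 5)     # made-up values
sched_dep_time <- c(515, 529, 2359)  # made-up values

print(delay_minutes(dep_time, sched_dep_time))
```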

In [None]:
min_rank(flights$dep_delay) # min_rank() expects a vector, so pass the column itself