# Section 2: Vectors, Sorting

This notebook introduces vectors, the basic data structure of R, and operations on them such as sorting.

---
## Vectors

Vectors are the most basic unit for storing data in R. Complex datasets can usually be broken down into components that are vectors.

For example, in a data frame such as `murders`, each column is a vector:

In [1]:
library(dslabs)
data(murders)
head(murders)

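| state | abb | region | population | total |
|---|---|---|---|---|
| Alabama | AL | South | 4779736 | 135 |
| Alaska | AK | West | 710231 | 19 |
| Arizona | AZ | West | 6392017 | 232 |
| Arkansas | AR | South | 2915918 | 93 |
| California | CA | West | 37253956 | 1257 |
| Colorado | CO | West | 5029196 | 65 |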
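As a quick check of this claim that does not depend on the `dslabs` package, the sketch below builds a small hypothetical data frame `toy` by hand from the first rows shown above and inspects its columns with base R's `is.vector` and `class`.

```r
# Hypothetical data frame `toy`, copied by hand from the first rows of murders
toy <- data.frame(
  state = c("Alabama", "Alaska", "Arizona"),
  abb = c("AL", "AK", "AZ"),
  population = c(4779736, 710231, 6392017),
  stringsAsFactors = FALSE
)

# Extracting a column with $ gives back a plain vector
pop <- toy$population
is.vector(pop)      # TRUE
class(pop)          # "numeric"
class(toy$state)    # "character"
length(pop)         # 3, one entry per row
```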


In this sub-section, we look at this important data structure in more detail.

### Creation
The first thing to demonstrate is how to create vectors. We can do that with the function _c_, which stands for _concatenate_. In the example below, we define an object named `codes` containing three elements (380, 124, and 818):

In [2]:
codes <- c(380, 124, 818)

We can also create character vectors, such as the `country` object we define below:

In [3]:
country <- c('italy', 'canada', 'egypt')
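To confirm what _c_ produced in the two cells above, a minimal sketch using the base R functions `length` and `class`, neither of which appears in the original cells:

```r
codes <- c(380, 124, 818)
country <- c('italy', 'canada', 'egypt')

length(codes)    # 3 entries
class(codes)     # "numeric"
class(country)   # "character"
```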

Sometimes it's useful to name the entries of a vector. For example, when defining a vector of country codes, we can use names to link each code to its country. Below, we redefine the `codes` object and associate a country with each code:

In [4]:
codes <- c(italy=380, canada=124, egypt=818)

The object `codes` is still numeric, though:

In [6]:
codes
class(codes)

Observe that if we put the names in quotes, the object is identical to the one we defined without quotes:

In [7]:
codes <- c('italy'=380, 'canada'=124, 'egypt'=818)
codes
class(codes)

We can also use the _names_ function to assign names to the entries of a vector. The code cell below produces exactly the same object as the previous ones, but using the _names_ function:

In [8]:
codes <- c(380, 124, 818)
country <- c('italy', 'canada', 'egypt')
names(codes) <- country
codes
class(codes)
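To check that the three ways of naming entries really give the same object, here is a small sketch using base R's `identical`; the variable names `a`, `b`, and `d` are only for illustration.

```r
# Names given without quotes, with quotes, and through names()
a <- c(italy = 380, canada = 124, egypt = 818)
b <- c('italy' = 380, 'canada' = 124, 'egypt' = 818)
d <- c(380, 124, 818)
names(d) <- c('italy', 'canada', 'egypt')

identical(a, b)   # TRUE
identical(a, d)   # TRUE
names(a)          # "italy" "canada" "egypt"
```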

Another useful function for creating vectors is the function _seq_, which stands for _sequence_. Its first argument defines the start and the second argument defines the end of the sequence. The third argument of this function is the increment, and has a default value of 1. The next code cell shows a few examples of how to create vectors with this function:

In [9]:
seq(1, 10)
seq(1, 10, 2)
seq(10, 1, -1)
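The same sequences can be written with the named arguments `from`, `to`, and `by` of base R's `seq`, which makes the role of each number explicit. The sketch also uses `length.out`, another `seq` argument not covered above, which fixes the number of entries instead of the step size.

```r
# Odd numbers from 1 to 10: the step of 2 never lands on 10
odds <- seq(from = 1, to = 10, by = 2)
odds          # 1 3 5 7 9

# A negative step counts down
countdown <- seq(from = 10, to = 1, by = -1)
countdown     # 10 9 8 7 6 5 4 3 2 1

# Five evenly spaced values between 0 and 1
steps <- seq(from = 0, to = 1, length.out = 5)
steps         # 0.00 0.25 0.50 0.75 1.00
```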

Note that if we want consecutive integers, we can simply use the shorthand _start:end_, as shown below:

In [10]:
1:10
15:20
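A short sketch comparing the colon shorthand with _seq_: the values match, and `:` produces an integer vector. Both endpoints are included, so `15:20` has six entries.

```r
x <- 1:10
all(x == seq(1, 10))   # TRUE: same values as seq
class(x)               # "integer"
length(15:20)          # 6: 15, 16, 17, 18, 19, 20
```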

### Subsetting

Subsetting lets us access specific parts of a vector. To access elements of a vector, we use square brackets. In the code cell below, we access the second element of `codes`:

In [11]:
codes[2]

We can get more than one entry by using a multi-entry vector as an index. For example, to access the first and the third elements of `codes`, we can type:

In [12]:
codes[c(1, 3)]
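Indexing with a vector keeps the entry names, and the order of the index determines the order of the result. A small sketch, redefining the named `codes` vector so the block runs on its own:

```r
codes <- c(italy = 380, canada = 124, egypt = 818)

codes[c(1, 3)]   # italy = 380, egypt = 818
codes[c(3, 1)]   # egypt = 818, italy = 380: order follows the index
codes[2:3]       # canada = 124, egypt = 818
```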

We can also use sequences to access elements, as shown in the next code cell:

In [13]:
codes[1:2]

If the elements have names, we can also access the entries using these names. Below, we access the entry named "canada":

In [14]:
codes['canada']

We can also use a vector of names as the index to access several elements at once:

In [15]:
codes[c('italy', 'egypt')]
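Indexing by name and indexing by position reach the same entries. A minimal sketch that checks this with base R's `identical`:

```r
codes <- c(italy = 380, canada = 124, egypt = 818)

# "canada" is the second entry, so both forms return the same named value
identical(codes['canada'], codes[2])                    # TRUE
identical(codes[c('italy', 'egypt')], codes[c(1, 3)])   # TRUE
```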

### Coercion

In general, coercion is an attempt by R to be flexible with data types. When an entry does not match the expected type, R tries to guess what we meant before throwing an error. But this can also lead to confusion. Failing to understand coercion can drive programmers crazy when coding in R, since R behaves quite differently from most other languages in this respect.

We said earlier that all elements of a vector must be of the same type. So if we try to combine, for example, numbers and characters, we might expect an error. In the code cell below, we try exactly that:

In [16]:
x <- c(1, 'canada', 3)

We get neither an error nor even a warning. In the code cell below, we print `x` to see what it looks like, along with its class:

In [18]:
x
class(x)

Even though 1 and 3 were numbers when we typed them, R converted them into character strings. In other words, _R coerced the data into character_: because we put a character string in the middle, it guessed that we meant the 1 and the 3 to be character strings as well.
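One practical consequence is that the coerced entries no longer behave like numbers. The sketch below uses base R's `is.numeric` and `tryCatch` (not used elsewhere in this notebook) to show that arithmetic on the former number now fails.

```r
x <- c(1, 'canada', 3)

x[1] == "1"        # TRUE: the number 1 is now the string "1"
is.numeric(x[1])   # FALSE

# Adding 1 to a string is an error; tryCatch captures the error message
result <- tryCatch(x[1] + 1, error = function(e) conditionMessage(e))
result             # an error message instead of the number 2
```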

R offers functions to force a specific coercion. For example, we can turn numbers into characters with the function *as.character*:

In [19]:
x <- 1:10
y <- as.character(x)
y

We can turn character or other data types into numeric variables with the function `as.numeric`:

In [20]:
as.numeric(y)

The function *as.numeric* is quite useful in practice, since many datasets store numbers in a form that makes them appear to be character strings.
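A sketch of that situation, using the first three population figures from the `murders` table above stored as strings, as they might arrive from a text file; the name `pop_text` is hypothetical.

```r
# Hypothetical: numbers read in as character strings
pop_text <- c("4779736", "710231", "6392017")
class(pop_text)   # "character", so sum(pop_text) would fail

# Convert to numeric before doing arithmetic
pop <- as.numeric(pop_text)
class(pop)        # "numeric"
sum(pop)          # 11881984
```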

Continuing the discussion, we now talk about missing values, which are very common in practice. In R, we have a special value for missing data: `NA`, which stands for _not available_. We can get `NA` from coercion, for example, when R fails to coerce something. Here's an example:

In [21]:
x <- c('1', 'b', '3')
as.numeric(x)

“NAs introduced by coercion”

In the code cell above, R converts '1' and '3' into numeric values, but it doesn't know what to do with 'b', so it returns `NA` for that entry and issues a warning.
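To find out which entries failed to convert, one option is base R's `is.na`, which flags the `NA` values; the sketch also wraps the conversion in `suppressWarnings` so the expected warning is not printed. Both functions are additions not used in the original cells.

```r
x <- c('1', 'b', '3')

# Convert, silencing the "NAs introduced by coercion" warning
y <- suppressWarnings(as.numeric(x))
y              # 1 NA 3

is.na(y)       # FALSE TRUE FALSE
x[is.na(y)]    # "b": the entry R could not convert
```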

As data scientists, we will encounter `NA`s often, since they are used to represent missing data. Missing data is a very common problem in real-life datasets, so we need to understand what `NA` means and be ready to see many of them.

---
## Sorting


