# Chapter 1: R Objects

# Requirements for this note

Install R together with RStudio or the Jupyter (IPython) Notebook.
# What you will learn
* How a binary operator like "+" works
* How to make a vector
* How to use a binary operator over two objects
* How to make a matrix
* How to make a data frame
* How to enter missing values
* How to convert a matrix to a data frame
* How to rename the rows and columns of a data frame
* How to open the R help documentation
* How to run the examples from a help page
* How to repeat values in a vector using the rep() function





# What is an IDE

An Integrated Development Environment (IDE) is an editor that provides comprehensive facilities for software development, often including syntax highlighting and markup support. R itself runs from the command line, so it is convenient to install an IDE alongside the language. The most well-known multi-platform IDE for R, which runs under Mac OS X, Windows, and Linux, is RStudio (https://www.rstudio.com/). To learn R and Python in a consistent environment, we suggest installing Jupyter (http://jupyter.org/), a browser-based IDE designed to handle Julia, Python, and R. It also supports several other scripting languages. For details on other kernels, see the IPython list of supported languages (https://github.com/ipython/ipython/wiki/IPython-kernels-for-other-languages).

First, install R from CRAN (http://r-project.org).

To benefit from the Jupyter Notebook, it is simplest to install it through Anaconda. Anaconda installs locally and does not require administrator rights. The graphical installer for Windows, Mac, or Linux is at (http://continuum.io/downloads). We suggest downloading and installing Anaconda with Python 2.7; Anaconda installs Python automatically. To make Jupyter work with R, you have to add IRkernel and several other R packages; for instructions, see IRkernel (https://irkernel.github.io/installation/).



# Basic Cell Properties
The type of each cell in a Jupyter notebook is selected from a drop-down menu with the options "Code", "Markdown", "Raw NBConvert", and "Heading".
Write descriptions in "Markdown", a lightweight markup language in which LaTeX formulas can also be used. When you write code in a cell, make sure the cell type is set to "Code", and press Shift+Enter to run it.



# Writing  R  Commands
In this note you will learn some basic R commands that are mostly required for data handling and R programming. There are many introductory R books. We suggest the CRAN manual "An Introduction to R", available [here](https://cran.r-project.org/manuals.html).

# Operators 
We start with the operator "+", which returns the sum of two values.


In [1]:
print("Hello World!")
print(2+3)
2+3

[1] "Hello World!"
[1] 5


The command `<-` or `=` assigns values to variables from right to left. You can also assign from left to right with `->`, although this is not common.
It is recommended to use `<-` for assignment in R and to reserve `=` for naming the arguments of a function (as we will see later).
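
The assignment forms described above can be sketched as follows. The variable names `a`, `b`, `d`, and `s` are illustrative only and do not come from the rest of this note.

```R
# Three ways to assign a value; <- is the recommended form.
a <- 10        # right-to-left assignment
20 -> b        # left-to-right assignment (rarely used)
d = 30         # also works at the top level, but <- is preferred

# Inside a function call, = names an argument; it does not create a variable.
s <- seq(from = 1, to = 5)

print(a + b + d)   # 60
print(s)           # 1 2 3 4 5
```

After running this cell, `from` and `to` do not exist as variables in the workspace, which is why `=` is best kept for function arguments.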

To improve your R style, read [Google's R Style Guide](https://google.github.io/styleguide/Rguide.xml) and try to read other people's code. Pay attention to the examples in the R documentation, although many of them are tricky to understand.

If x and y are two values, the other arithmetic operators are listed below.

```R
x <- 4; y <- 5  # example values
+ x      # unary plus, leaves x unchanged
- x      # negation
x - y    # subtraction
x + y    # addition
x * y    # multiplication
x / y    # division
x ^ y    # power
x %% y   # modulo: the remainder of an integer division
x %/% y  # integer division
```

In [2]:
x <- 4
y <- 5
x+y
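
As a small sketch of how integer division and the modulo operator fit together, the example below uses two illustrative values, `a` and `b`, which are not part of the original note.

```R
a <- 17
b <- 5

q <- a %/% b   # integer division: how many times b fits into a
r <- a %% b    # modulo: what is left over

print(c(q, r))          # 3 2
print(b * q + r == a)   # TRUE: quotient and remainder rebuild a
```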

# Exercise 
Subtract 4 from 10.
# Exercise 
How many different values can you store in one byte?
# Exercise 
Find the remainder of 10 divided by 3.


R is an advanced programming language. It has various data types that facilitate the use of operators and functions. A vector is a one-dimensional array.

In [3]:
x <- c(2, 3)
y <- c(4, 5)
x + y 


`x + y` adds the two vectors element by element.
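
The sketch below shows element-wise arithmetic on the same two vectors. It also shows what R does when the vectors differ in length (the shorter one is recycled), which is an addition to the text above and worth checking on your own data.

```R
x <- c(2, 3)
y <- c(4, 5)

print(x + y)   # 6 8   (2+4, 3+5)
print(x * y)   # 8 15  (2*4, 3*5)
print(x + 1)   # 3 4   (the single value 1 is reused for every element)

# When lengths differ, the shorter vector is recycled.
print(c(1, 2, 3, 4) + c(10, 20))   # 11 22 13 24
```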

It is possible to define a logical (boolean) vector as well. The reserved words TRUE and FALSE encode logical values. In arithmetic, TRUE is treated as 1 and FALSE as 0, so any logical vector can also be regarded as a numeric (integer) vector. T and F are shorthand for TRUE and FALSE, but they are ordinary variables that can be overwritten, so the full names are safer. Do not use quotes, otherwise the value becomes a character!




In [4]:
z <- c(TRUE, FALSE, TRUE)
z
w <- c(T, F, F)
w
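
To illustrate how logical values behave as numbers, here is a minimal sketch that reuses the vector `z` from the cell above.

```R
z <- c(TRUE, FALSE, TRUE)

print(class(z))        # "logical"
print(as.integer(z))   # 1 0 1
print(sum(z))          # 2: counts the TRUE values

# With quotes the value is text, not a logical value.
print(class("TRUE"))   # "character"
```

Summing a logical vector is a common way to count how many elements satisfy a condition.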

This is how you define a vector of characters. 

In [5]:
y <- c("a", 'bc', 3) 
y

There are some points to highlight about vectors. Single quotes (') and double quotes (") make no difference. A number like 3 becomes '3' when it is combined with characters. Vectors, matrices, and data frames are different 'classes' with different properties. A vector is **NOT** a matrix with one row!
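
The points above can be checked directly. In this sketch the names `v`, `num`, and `row_mat` are illustrative.

```R
# Mixing characters and numbers turns every element into a character.
v <- c("a", 'bc', 3)
print(v)          # "a"  "bc" "3"
print(class(v))   # "character"

# A plain vector has no dimensions, so it is not a matrix.
num <- c(2, 3)
print(is.matrix(num))   # FALSE
print(dim(num))         # NULL

# A one-row matrix has to be created explicitly.
row_mat <- matrix(num, nrow = 1)
print(dim(row_mat))     # 1 2
```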

In [6]:
length(x) # returns the length of a vector


A matrix is a two-dimensional array. R is case-sensitive, so `mat` and `Mat` below are two different objects. NA (Not Available) is used to mark missing values.


In [7]:
mat <- matrix(0, nrow=3, ncol=5)
Mat <- matrix(NA, nrow=3, ncol=10)
mat 
mat+1


     [,1] [,2] [,3] [,4] [,5]
[1,]    0    0    0    0    0
[2,]    0    0    0    0    0
[3,]    0    0    0    0    0


     [,1] [,2] [,3] [,4] [,5]
[1,]    1    1    1    1    1
[2,]    1    1    1    1    1
[3,]    1    1    1    1    1


A matrix can be transformed back into a vector.

The following command transforms a matrix into a vector of length $nrow\times ncol$

In [8]:
as.vector(mat)
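
Because `mat` contains only zeros, the order in which `as.vector()` reads the matrix is not visible. The sketch below uses an illustrative matrix `m` with distinct values to show that R reads a matrix column by column.

```R
m <- matrix(1:6, nrow = 2, ncol = 3)   # filled column by column
print(m)

v <- as.vector(m)
print(v)                                 # 1 2 3 4 5 6
print(length(v) == nrow(m) * ncol(m))    # TRUE

# Transposing first gives the values row by row instead.
print(as.vector(t(m)))                   # 1 3 5 2 4 6
```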

In [9]:
Mat
Mat+1

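     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
[2,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
[3,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA


     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
[2,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
[3,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA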
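
`Mat+1` is still NA everywhere because any arithmetic with a missing value gives a missing value. A minimal sketch on an illustrative vector `v` shows how to detect missing values and how to skip them in a sum.

```R
v <- c(1, NA, 3)

print(v + 1)                  # 2 NA 4
print(is.na(v))               # FALSE TRUE FALSE
print(sum(v))                 # NA: one missing value spoils the sum
print(sum(v, na.rm = TRUE))   # 4: missing values are removed first
```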


# Data frame type
The data frame is the most common data format in R. It looks like a matrix, but its rows and columns always carry names and each column may hold a different data type.

In [10]:
mydata <- as.data.frame(mat)
mat
mydata


     [,1] [,2] [,3] [,4] [,5]
[1,]    0    0    0    0    0
[2,]    0    0    0    0    0
[3,]    0    0    0    0    0


  V1 V2 V3 V4 V5
1  0  0  0  0  0
2  0  0  0  0  0
3  0  0  0  0  0
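
The conversion can be checked step by step. The sketch below uses illustrative objects `m`, `df`, and `people`. It also builds a data frame directly with `data.frame()`, where `stringsAsFactors = FALSE` is set explicitly so the result is the same across R versions.

```R
m <- matrix(1:6, nrow = 2)
df <- as.data.frame(m)

print(is.matrix(m))       # TRUE
print(is.data.frame(df))  # TRUE
print(colnames(df))       # "V1" "V2" "V3" (default names)
print(rownames(df))       # "1" "2"

# A data frame can also be built column by column, with mixed types.
people <- data.frame(id = 1:2, name = c("a", "b"), stringsAsFactors = FALSE)
print(class(people$id))    # "integer"
print(class(people$name))  # "character"
```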


You can access the first row

In [11]:
mydata[1,] 

  V1 V2 V3 V4 V5
1  0  0  0  0  0


You can access a column by its position, here the second column

In [12]:
mydata[,2]  

Call the first column using its name 

In [13]:
mydata$V1  

`length()` returns the size of a vector, whereas `dim()` returns the number of rows and columns of a matrix or data frame

In [14]:
dim(mydata)  
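
As a short sketch of these size functions, the example below rebuilds a data frame of the same shape under the illustrative name `df`. Note that `length()` on a data frame counts its columns, not its cells.

```R
df <- as.data.frame(matrix(0, nrow = 3, ncol = 5))

print(dim(df))      # 3 5  (rows, columns)
print(nrow(df))     # 3
print(ncol(df))     # 5
print(length(df))   # 5: the number of columns
```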

In [15]:
colnames(mydata) <- c("x", "y", "z", "w", "u")  

In [16]:
mydata

  x y z w u
1 0 0 0 0 0
2 0 0 0 0 0
3 0 0 0 0 0


The column names of a data frame were changed above with `colnames()`.

You can change the row names in the same way with `rownames()`.


In [17]:
rownames(mydata) <- c("a", "b", "c")  

In [18]:
mydata

  x y z w u
a 0 0 0 0 0
b 0 0 0 0 0
c 0 0 0 0 0
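
Once rows and columns have names, the names can be used for indexing. The sketch below uses a small illustrative data frame `df` with distinct values so that the selected cells are easy to follow.

```R
df <- as.data.frame(matrix(1:6, nrow = 2))
colnames(df) <- c("x", "y", "z")
rownames(df) <- c("a", "b")
print(df)

print(df["b", "y"])   # 4: row "b", column "y"
print(df$z)           # 5 6

# A single name can be replaced by indexing the vector of names.
colnames(df)[2] <- "w"
print(colnames(df))   # "x" "w" "z"
```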


# Sequence generation
If you need to generate a vector of numbers, `seq` is a useful tool.

1) `seq(1,100, by=10)` defines a vector of values from a minimum to a maximum with a fixed step

2) `seq(1,100, length=12)` creates a sequence from a minimum to a maximum with a specific vector length (`length` is matched to the full argument name `length.out`)

3) `1:10` gives the same values as `seq(1,10, by=1)`

In [19]:
seq(from=1,to=100, by=10)  
seq(1,100, length=12) 
1:10
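
The three forms can be compared in a short sketch. The names `a` and `b` are illustrative, and the full argument name `length.out` is used instead of the abbreviation `length`.

```R
a <- seq(from = 1, to = 100, by = 10)
print(a)           # 1 11 21 31 41 51 61 71 81 91

b <- seq(1, 100, length.out = 12)
print(length(b))   # 12
print(b[2] - b[1]) # 9: the step is chosen so that the sequence ends at 100

print(all(1:10 == seq(1, 10, by = 1)))   # TRUE
```

With `by`, the sequence stops at the last value that does not exceed `to`, so 100 itself is not reached in the first call.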

# Immediate help
If you need help with a function, use `?` followed by the function name.

In [20]:
? seq

seq {base}, R Documentation

* `...`: arguments passed to or from methods.
* `from, to`: the starting and (maximal) end values of the sequence. Of length 1 unless just from is supplied as an unnamed argument.
* `by`: number: increment of the sequence.
* `length.out`: desired length of the sequence. A non-negative number, which for seq and seq.int will be rounded up if fractional.
* `along.with`: take the length from the length of this argument.


# R Examples

The examples in the R documentation are a good source of learning.
The "See Also" section of a help page is always a good hint of what is related to the function you are interested in.


In [21]:
example(seq)


seq> seq(0, 1, length.out = 11)
 [1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0

seq> seq(stats::rnorm(20)) # effectively 'along'
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20

seq> seq(1, 9, by = 2)     # matches 'end'
[1] 1 3 5 7 9

seq> seq(1, 9, by = pi)    # stays below 'end'
[1] 1.000000 4.141593 7.283185

seq> seq(1, 6, by = 3)
[1] 1 4

seq> seq(1.575, 5.125, by = 0.05)
 [1] 1.575 1.625 1.675 1.725 1.775 1.825 1.875 1.925 1.975 2.025 2.075 2.125
[13] 2.175 2.225 2.275 2.325 2.375 2.425 2.475 2.525 2.575 2.625 2.675 2.725
[25] 2.775 2.825 2.875 2.925 2.975 3.025 3.075 3.125 3.175 3.225 3.275 3.325
[37] 3.375 3.425 3.475 3.525 3.575 3.625 3.675 3.725 3.775 3.825 3.875 3.925
[49] 3.975 4.025 4.075 4.125 4.175 4.225 4.275 4.325 4.375 4.425 4.475 4.525
[61] 4.575 4.625 4.675 4.725 4.775 4.825 4.875 4.925 4.975 5.025 5.075 5.125

seq> seq(17) # same as 1:17, or even better seq_len(17)
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17


The `rep()` function repeats the elements of a vector. It is useful for building matrices.

In [22]:
rep(1:10,1:10)

In [23]:
rep(1:10, each=10)

In [24]:
rep(1:10, times=10)
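
The three calls above produce long outputs, so the difference between them is easier to see on a shorter vector. The sketch below uses `1:3` and an illustrative matrix `m`.

```R
print(rep(1:3, times = 2))   # 1 2 3 1 2 3  (the whole vector is repeated)
print(rep(1:3, each = 2))    # 1 1 2 2 3 3  (each element is repeated)
print(rep(1:3, 1:3))         # 1 2 2 3 3 3  (element i is repeated i times)

# rep() combined with matrix(): every column holds one repeated value.
m <- matrix(rep(1:3, each = 2), nrow = 2)
print(m)
```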

# Exercise 
Repeat the letters a, b, c each 5 times.

