![title](img/DSI.png)

# Introduction to R for Data Science

Achmad Wildan Al Aziz \| _Data Scientist_ \| _Lead of Education DSI East Java_

## R and RStudio

<img src="img/rlogo1.jpg" width="100">

**R** is a language and environment for statistical computing and graphics. It is available from CRAN at [https://cran.r-project.org](https://cran.r-project.org)


<img src="img/rstudiologo.jpg" width="100">

**RStudio** lets the user run R in a more user-friendly environment. It is open source and free to download at http://www.rstudio.com/


## Why Use R?

* **Data analysis software**: R is data analysis software. Data scientists use it for statistical analysis, predictive modeling and visualization.
* **Statistical analysis environment**: R provides a complete environment for statistical analysis, and statistical methods are easy to implement in it. Much new research in statistical analysis and modeling is done in R, so new techniques often become available in R first.
* **Open source**: R is open source, so its code can be inspected and it can be integrated with other applications.
* **Community support**: R is supported by a rapidly growing community of leading statisticians and data scientists from around the world.


<img src="img/useR.png" width="700">

<img src="img/company.png" width="700">

## Table of Contents

[Basic Mathematical Operation](#BasicMathematicalOperation)
* [Arithmetic Operation](#AritmathicOperation)
* [Assignment Variable](#AssignmentVariable)
* [Mathematical Function](#MathematicalFunction)

[Data and Variable](#DataandVariable)
* [Main Structure](#MainStructure)
* [Class](#Class)
* [Vector](#Vector)
* [Matrix](#Matrix)
* [Dataframe](#Dataframe)

[Read and Write Data](#ReadandWriteData)

[Conditional Statement](#ConditionalStatement)

[Logical Function](#LogicalFunction)

[Looping](#Looping)
* [for](#for)
* [while](#while)
* [repeat](#repeat)

[Function](#Function)


<a id='BasicMathematicalOperation'></a>
## Basic Mathematical Operation

<a id='AritmathicOperation'></a>
### Arithmetic Operation

In [1]:
5+6+3

In [2]:
5+6-3

In [3]:
(7+5)/2

In [4]:
2^3

In [5]:
2^(2*3)

In [6]:
5 %/% 2 #integer division 

In [7]:
5 %% 2 #modulo division  
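
The two operators above are linked by a simple identity, which the following sketch checks. The values `x` and `y` are arbitrary illustrations, not part of the original notebook.

```r
# Integer division and modulo satisfy:
#   x == (x %/% y) * y + (x %% y)
x <- 17
y <- 5
quotient  <- x %/% y              # 3
remainder <- x %% y               # 2
quotient * y + remainder == x     # TRUE

# With a negative dividend, %/% rounds toward negative infinity,
# so the remainder takes the sign of the divisor.
(-17) %/% 5                       # -4
(-17) %% 5                        # 3
```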

<a id='AssignmentVariable'></a>
###  Assignment Variable

In [8]:
a <- 2
a

In [9]:
b = 2
b

In [10]:
d = e = f = 3
d
e
f

In [11]:
pi

* Names are case sensitive.
* `pi` is a built-in constant, but the name can still be reassigned as a variable.
* `print(x)` prints the contents of `x`.
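
A short sketch of these three points. The names `value` and `Value` are hypothetical and chosen only to show case sensitivity.

```r
# Names are case sensitive: `value` and `Value` are two different objects.
value <- 10
Value <- 20
value + Value      # 30

# `pi` is a built-in constant, but an assignment can mask it.
pi <- 3            # creates a variable that hides the built-in constant
print(pi)          # [1] 3
rm(pi)             # remove the masking variable
print(pi)          # [1] 3.141593 (the built-in constant is visible again)
```

Masking a built-in name is legal but easy to forget, so it is usually safer to pick a different variable name.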

<a id='MathematicalFunction'></a>
### Mathematical Function

Function | Meaning
---|---
log(x)|natural logarithm (base e) of x
exp(x)|exponential of x, e^x (e ≈ 2.71828)
log(x,n)|logarithm of x to base n
log10(x)|logarithm of x to base 10
sqrt(x)|square root of x
factorial(x)|x!
choose(n,x)|binomial coefficient n!/(x! (n – x)!)
gamma(x)|gamma function Γ(x); equals (x – 1)! for positive integer x
lgamma(x)|natural log of the absolute value of gamma(x)
floor(x)|greatest integer ≤ x
ceiling(x)|smallest integer ≥ x
trunc(x)|integer part of x, rounding toward 0: trunc(1.5) = 1, trunc(-1.5) = -1; behaves like floor for positive values and like ceiling for negative values
round(x, digits=0)|round x to the given number of decimal places (to an integer by default)
signif(x, digits=6)|round x to the given number of significant digits (six by default)
runif(n)|generates n random numbers between 0 and 1 from a uniform distribution
cos(x)|cosine of x in radians
sin(x)|sine of x in radians
tan(x)|tangent of x in radians
acos(x), asin(x), atan(x)|inverse trigonometric functions of real or complex numbers
acosh(x), asinh(x), atanh(x)|inverse hyperbolic functions of real or complex numbers
abs(x)|absolute value of x
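
The rounding functions in the table are easy to confuse, so here is a small comparison on a hypothetical test vector, followed by a few of the other functions.

```r
x <- c(-1.5, -0.4, 0.4, 1.5, 2.5)

floor(x)     # -2 -1  0  1  2   (round down)
ceiling(x)   # -1  0  1  2  3   (round up)
trunc(x)     # -1  0  0  1  2   (round toward zero)
round(x)     # -2  0  0  2  2   (exact halves go to the even integer)

signif(123456.789, digits = 3)   # 123000
exp(1)                           # 2.718282
log(100, 10)                     # 2, same as log10(100)
choose(5, 2)                     # 10
gamma(5)                         # 24, same as factorial(4)
```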


<a id='DataandVariable'></a>
## Data and Variable

<a id='MainStructure'></a>
### Main Structure

* **Vector** one-dimensional array of length m (a single data type)
* **Matrix** two-dimensional array of size m × n (a single data type)
* **Dataframe** like a matrix, but it can hold more than one data type
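
As a rough illustration, the three structures can be built and inspected as follows. The object names and values are hypothetical.

```r
v  <- c(1.5, 2, 3.5)                        # vector: one dimension, one type
m  <- matrix(1:6, nrow = 2)                 # matrix: 2 x 3, one type
df <- data.frame(id = 1:3,                  # data frame: columns may differ in type
                 name = c("a", "b", "c"),
                 stringsAsFactors = FALSE)

length(v)            # 3
dim(m)               # 2 3
dim(df)              # 3 2
sapply(df, class)    # id: "integer", name: "character"

# Mixing types in a vector coerces every element to a single type.
c(1, "a", TRUE)      # "1" "a" "TRUE"
```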


<a id='Class'></a>
### Class

* **character** vector of strings
* **numeric** vector of real numbers
* **integer** vector of signed integers
* **logical** vector of boolean values (TRUE or FALSE)
* **complex** vector of complex numbers
* **list** vector of R objects
* **factor** categorical observations drawn from a predefined set of labels (levels)
* **NA** marker for a missing (not available) value
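
The class of any object can be checked with `class()`. The sketch below uses made-up values, and the `ages` vector is hypothetical.

```r
class("hello")                  # "character"
class(3.14)                     # "numeric"
class(2L)                       # "integer" (the L suffix requests an integer)
class(TRUE)                     # "logical"
class(1 + 2i)                   # "complex"
class(list(1, "a"))             # "list"
class(factor(c("m", "f")))      # "factor"

# NA marks a missing value inside a vector of any of these classes.
ages <- c(10, NA, 15)
is.na(ages)                     # FALSE  TRUE FALSE
mean(ages)                      # NA
mean(ages, na.rm = TRUE)        # 12.5
```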


<a id='Vector'></a>
#### Vector

In [12]:
a = 1:3
a

In [13]:
b = 2:4 
b

In [14]:
c(a,b)

In [15]:
c(1 ,1:3)

In [16]:
array(1 ,4)

In [17]:
seq(1 ,3)

In [18]:
seq(1,3, by=0.5)

In [19]:
seq(1,3, length.out = 4) 

In [20]:
rep(1:4 ,2) 

In [21]:
rep(1:4, each = 2)

In [22]:
rep(c(7 ,9 ,3), 1:3)

In [23]:
a = c(2 ,3 ,1 ,4)
length(a)

In [24]:
rev(a)

In [25]:
a[1:2]

In [26]:
a[-1]

In [27]:
a[a < 3]

In [28]:
which(a == 3)

In [29]:
a>1

In [30]:
letters[1:3]

In [31]:
LETTERS[1:3]

In [32]:
month.abb[1:6]

In [33]:
month.name[1:12]
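
Cells 27 to 29 rely on logical indexing. The following sketch spells out how a comparison, a filter and `which()` relate, reusing the vector `a` from cell 23.

```r
a <- c(2, 3, 1, 4)

a > 1               # TRUE  TRUE FALSE  TRUE  (element-wise comparison)
a[a > 1]            # 2 3 4                   (keep elements where TRUE)
which(a > 1)        # 1 2 4                   (positions where TRUE)
sum(a > 1)          # 3                       (TRUE counts as 1)
a[a > 1 & a < 4]    # 2 3                     (combine conditions with &)
```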

<a id='Matrix'></a>
#### Matrix

In [34]:
matrix (1:12 , nrow =3)

| [,1] | [,2] | [,3] | [,4] |
|---|---|---|---|
| 1 | 4 | 7 | 10 |
| 2 | 5 | 8 | 11 |
| 3 | 6 | 9 | 12 |


In [35]:
matrix (1:12 , nrow =3, byrow = TRUE)

| [,1] | [,2] | [,3] | [,4] |
|---|---|---|---|
| 1 | 2 | 3 | 4 |
| 5 | 6 | 7 | 8 |
| 9 | 10 | 11 | 12 |


In [36]:
matrix (2, nrow =2, ncol =2)

| [,1] | [,2] |
|---|---|
| 2 | 2 |
| 2 | 2 |


In [37]:
matrix (1:12 , 3 ,4)

| [,1] | [,2] | [,3] | [,4] |
|---|---|---|---|
| 1 | 4 | 7 | 10 |
| 2 | 5 | 8 | 11 |
| 3 | 6 | 9 | 12 |


In [38]:
#Concatenation
x = 1:5
y = 4:8
rbind (x,y)


| | [,1] | [,2] | [,3] | [,4] | [,5] |
|---|---|---|---|---|---|
| x | 1 | 2 | 3 | 4 | 5 |
| y | 4 | 5 | 6 | 7 | 8 |


In [39]:
cbind (x,y)

| x | y |
|---|---|
| 1 | 4 |
| 2 | 5 |
| 3 | 6 |
| 4 | 7 |
| 5 | 8 |


In [40]:
x <- matrix (1:10 , 2, 5)
x

| [,1] | [,2] | [,3] | [,4] | [,5] |
|---|---|---|---|---|
| 1 | 3 | 5 | 7 | 9 |
| 2 | 4 | 6 | 8 | 10 |


In [41]:
dim(x) # size of matrix x

In [42]:
col(x) # column indices of ALL elements

| [,1] | [,2] | [,3] | [,4] | [,5] |
|---|---|---|---|---|
| 1 | 2 | 3 | 4 | 5 |
| 1 | 2 | 3 | 4 | 5 |


In [43]:
row(x) # row indices of ALL elements

| [,1] | [,2] | [,3] | [,4] | [,5] |
|---|---|---|---|---|
| 1 | 1 | 1 | 1 | 1 |
| 2 | 2 | 2 | 2 | 2 |


In [44]:
x[2,5] # extract the element in the 2nd row and 5th column

In [45]:
x1 = c(2,5)
x2 = c(4,7)
x=cbind (x1,x2)
x

| x1 | x2 |
|---|---|
| 2 | 4 |
| 5 | 7 |


In [46]:
t(x) #matrix transpose

| | [,1] | [,2] |
|---|---|---|
| x1 | 2 | 5 |
| x2 | 4 | 7 |


In [47]:
solve(x) #inverse matrix

| | [,1] | [,2] |
|---|---|---|
| x1 | -1.1666667 | 0.6666667 |
| x2 | 0.8333333 | -0.3333333 |


In [48]:
det(x) #determinant

In [49]:
diag(x) #diagonal elements

In [50]:
y1 = c(3,6)
y2 = c(1,4)
y=cbind (y1,y2)
x*y


| x1 | x2 |
|---|---|
| 6 | 4 |
| 30 | 28 |


In [51]:
x%*%y

| y1 | y2 |
|---|---|
| 30 | 18 |
| 57 | 33 |
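
Cells 50 and 51 differ only in the operator: `*` multiplies element by element, while `%*%` is the matrix product. The sketch below also checks that `solve(x)` really is the inverse of the matrix from cell 45. The right-hand side `b` is a hypothetical example.

```r
x <- cbind(x1 = c(2, 5), x2 = c(4, 7))
y <- cbind(y1 = c(3, 6), y2 = c(1, 4))

x * y        # element-wise product:  6  4 / 30 28
x %*% y      # matrix product:       30 18 / 57 33

# The inverse times the original matrix gives the identity matrix
# (rounded here to hide floating-point noise).
x_inv <- solve(x)
round(x_inv %*% x, 10)

det(x)       # -6, non-zero, so the inverse exists

# solve(A, b) solves the linear system A %*% z = b directly.
b <- c(6, 12)
z <- solve(x, b)
z            # 1 1
```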


<a id='Dataframe'></a>
#### Dataframe

In [52]:
Age <- c(10 ,20 ,15 ,43 ,76 ,41 ,25 ,46)
Sex <- factor (c("m","f","m","f","m","f","m","f"))
Sibblings <- c(2 ,5 ,8 ,3 ,6 ,1 ,5 ,6)
myframe <- data.frame(Age, Sex, Sibblings)
myframe

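| Age | Sex | Sibblings |
|---|---|---|
| `<dbl>` | `<fct>` | `<dbl>` |
| 10 | m | 2 |
| 20 | f | 5 |
| 15 | m | 8 |
| 43 | f | 3 |
| 76 | m | 6 |
| 41 | f | 1 |
| 25 | m | 5 |
| 46 | f | 6 |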


In [53]:
myframe[1,]

| Age | Sex | Sibblings |
|---|---|---|
| `<dbl>` | `<fct>` | `<dbl>` |
| 10 | m | 2 |


In [54]:
myframe[,1]

In [55]:
myframe["Age"]

| Age |
|---|
| `<dbl>` |
| 10 |
| 20 |
| 15 |
| 43 |
| 76 |
| 41 |
| 25 |
| 46 |


In [56]:
myframe$Age

In [57]:
myframe[3,3] <- 2 #change the value in row 3, column 3

In [58]:
myframe[,-2] #get all columns except column 2

| Age | Sibblings |
|---|---|
| `<dbl>` | `<dbl>` |
| 10 | 2 |
| 20 | 5 |
| 15 | 2 |
| 43 | 3 |
| 76 | 6 |
| 41 | 1 |
| 25 | 5 |
| 46 | 6 |


In [59]:
subset(myframe,myframe$Age>30)

| | Age | Sex | Sibblings |
|---|---|---|---|
| | `<dbl>` | `<fct>` | `<dbl>` |
| 4 | 43 | f | 3 |
| 5 | 76 | m | 6 |
| 6 | 41 | f | 1 |
| 8 | 46 | f | 6 |


In [60]:
mean(subset(myframe$Age, myframe$Sex=='m'))

In [61]:
myframe[(myframe$Sex=='m') & (myframe$Age>30),]

| | Age | Sex | Sibblings |
|---|---|---|---|
| | `<dbl>` | `<fct>` | `<dbl>` |
| 5 | 76 | m | 6 |


In [62]:
myframe <- cbind(myframe, "Income(USD)"=c(1700, 2100, 2300, 2050,
                                         2800, 1450, 3400, 2000))

In [63]:
myframe

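| Age | Sex | Sibblings | Income(USD) |
|---|---|---|---|
| `<dbl>` | `<fct>` | `<dbl>` | `<dbl>` |
| 10 | m | 2 | 1700 |
| 20 | f | 5 | 2100 |
| 15 | m | 2 | 2300 |
| 43 | f | 3 | 2050 |
| 76 | m | 6 | 2800 |
| 41 | f | 1 | 1450 |
| 25 | m | 5 | 3400 |
| 46 | f | 6 | 2000 |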


In [64]:
myframe[order(myframe$Age),]

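| | Age | Sex | Sibblings | Income(USD) |
|---|---|---|---|---|
| | `<dbl>` | `<fct>` | `<dbl>` | `<dbl>` |
| 1 | 10 | m | 2 | 1700 |
| 3 | 15 | m | 2 | 2300 |
| 2 | 20 | f | 5 | 2100 |
| 7 | 25 | m | 5 | 3400 |
| 6 | 41 | f | 1 | 1450 |
| 4 | 43 | f | 3 | 2050 |
| 8 | 46 | f | 6 | 2000 |
| 5 | 76 | m | 6 | 2800 |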


In [65]:
myframe[order(myframe$Sex, myframe$Age),]

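| | Age | Sex | Sibblings | Income(USD) |
|---|---|---|---|---|
| | `<dbl>` | `<fct>` | `<dbl>` | `<dbl>` |
| 2 | 20 | f | 5 | 2100 |
| 6 | 41 | f | 1 | 1450 |
| 4 | 43 | f | 3 | 2050 |
| 8 | 46 | f | 6 | 2000 |
| 1 | 10 | m | 2 | 1700 |
| 3 | 15 | m | 2 | 2300 |
| 7 | 25 | m | 5 | 3400 |
| 5 | 76 | m | 6 | 2800 |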
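
Cell 60 computes the mean age for one group. As a possible extension, base R's `tapply()` and `aggregate()` compute the same summary for every group at once. This is a sketch that rebuilds the data frame from cell 52 (with the edit from cell 57), not a step from the original notebook.

```r
myframe <- data.frame(
  Age       = c(10, 20, 15, 43, 76, 41, 25, 46),
  Sex       = factor(c("m", "f", "m", "f", "m", "f", "m", "f")),
  Sibblings = c(2, 5, 2, 3, 6, 1, 5, 6)
)

# Mean age per level of Sex, returned as a named array.
mean_age <- tapply(myframe$Age, myframe$Sex, mean)
mean_age                                          # f: 37.5, m: 31.5

# The same summary as a data frame, using the formula interface.
age_by_sex <- aggregate(Age ~ Sex, data = myframe, FUN = mean)
age_by_sex

# Number of rows per group.
table(myframe$Sex)                                # f: 4, m: 4
```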


<a id='ReadandWriteData'></a>
## Read and Write Data

### Read

<img src="img/rnw.png" width="700">

### Import Data in RStudio

<img src="img/import.png" width="700">

### Write Data

<img src="img/write.png" width="700">
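
For readers who prefer code to the RStudio dialogs, here is a minimal round trip using base R's `write.csv()` and `read.csv()`. The `scores` data frame and the temporary file path are hypothetical, chosen so the example runs anywhere.

```r
# Hypothetical data to save.
scores <- data.frame(name  = c("Ani", "Budi", "Citra"),
                     score = c(80, 95, 72),
                     stringsAsFactors = FALSE)

# Write to a temporary CSV file; row.names = FALSE omits the row-number column.
path <- tempfile(fileext = ".csv")
write.csv(scores, path, row.names = FALSE)

# Read the file back into a new data frame.
scores_back <- read.csv(path, stringsAsFactors = FALSE)
scores_back

# Clean up the temporary file.
unlink(path)
```

In real use, `path` would be a file name such as `"scores.csv"` relative to the working directory reported by `getwd()`.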

<a id='ConditionalStatement'></a>
## Conditional Statement

In [66]:
#simple if
x <- 1
if (x==2){ print ("x=2") }

In [67]:
#simple if
x <- 2
if (x==2){ print ("x=2") }


[1] "x=2"


In [68]:
x <- 1
if (x==2) {print ("x = 2")} else {print ("x != 2")}

[1] "x != 2"
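
The same pattern extends to more than two branches with `else if`. The sketch below uses hypothetical thresholds and also shows `ifelse()`, the vectorized counterpart for whole vectors.

```r
x <- 7

# Branches are tested in order; the first TRUE condition wins.
if (x < 5) {
  label <- "small"
} else if (x < 10) {
  label <- "medium"
} else {
  label <- "large"
}
label                                            # "medium"

# if () expects a single TRUE/FALSE; ifelse() works element by element.
sizes <- ifelse(c(3, 7, 12) < 5, "small", "not small")
sizes                                            # "small" "not small" "not small"
```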


<a id='LogicalFunction'></a>
## Logical Function

Symbol | Meaning
----| ----
`<`|less than
`<=`|less than or equal to
`>`|greater than
`>=`|greater than or equal to
`!=`|not equal to
`==`|equal to
`!`|logical NOT (unary)
`&`|logical AND (element-wise, for vectors)
`\|`|logical OR (element-wise, for vectors)
`&&`|logical AND (single values only)
`\|\|`|logical OR (single values only)
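
A brief sketch of the difference between the element-wise and single-value forms. The vector `a` and the scalar `x` are hypothetical.

```r
a <- c(1, 5, 9)

in_range <- a > 2 & a < 8      # element-wise AND
in_range                       # FALSE  TRUE FALSE
a < 2 | a > 8                  # element-wise OR:  TRUE FALSE  TRUE
!(a > 2)                       # NOT:              TRUE FALSE FALSE

# && and || take single values, which is what if () expects.
x <- 5
if (x > 2 && x < 8) {
  print("x is in range")       # [1] "x is in range"
}
```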

<a id='Looping'></a>
## Looping

<a id='for'></a>
### for

In [69]:
for (i in 1:4)
    {print(i)}


[1] 1
[1] 2
[1] 3
[1] 4


In [70]:
for (i in letters[1:4])
    {print(i)}

[1] "a"
[1] "b"
[1] "c"
[1] "d"


<a id='while'></a>
### while

In [71]:
i <- 0
while (i<4) {
  i <- i+1
  print(i)
}

[1] 1
[1] 2
[1] 3
[1] 4


<a id='repeat'></a>
### repeat

In [72]:
i <- 0
repeat {
  i <- i+1
  print (i)
  if (i==4) break
}

[1] 1
[1] 2
[1] 3
[1] 4
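
To compare the three loop forms on a task with a result, the sketch below sums the integers from 1 to `n` with each of them. The bound `n` is a hypothetical choice.

```r
n <- 10

# for: iterate over a known sequence.
total_for <- 0
for (i in 1:n) {
  total_for <- total_for + i
}

# while: test the condition before each pass.
total_while <- 0
i <- 1
while (i <= n) {
  total_while <- total_while + i
  i <- i + 1
}

# repeat: loop until an explicit break.
total_repeat <- 0
i <- 1
repeat {
  total_repeat <- total_repeat + i
  i <- i + 1
  if (i > n) break
}

c(total_for, total_while, total_repeat)   # 55 55 55
sum(1:n)                                  # 55, the vectorized equivalent
```

For simple arithmetic like this, the vectorized `sum()` is usually the shorter and faster choice; explicit loops are more useful when each step depends on the previous one.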


<a id='Function'></a>
## Function

In [73]:
myfun <- function(x){
  a=x^2/pi
  return(a)
}

In [74]:
myfun(2)

In [75]:
myfun5 <- function(x, a) {
  r1 <- a * sin(x)
  r2 <- a * cos(x)
  return(list(r1, r2))
}

In [76]:
myfun5 (2,4)

In [77]:
fahrenheit_to_celsius <- function(temp_F) {
  temp_C <- (temp_F - 32) * 5 / 9
  return(temp_C)
}

In [78]:
fahrenheit_to_celsius(180)
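
Two further patterns that build on the functions above, offered as a sketch: an inverse conversion to check the formula, and a function with a default argument that returns a named list. The names `celsius_to_fahrenheit` and `scaled_sin_cos` are hypothetical and do not appear in the original notebook.

```r
fahrenheit_to_celsius <- function(temp_F) {
  (temp_F - 32) * 5 / 9          # the last evaluated value is returned
}

# Hypothetical inverse of the conversion above.
celsius_to_fahrenheit <- function(temp_C) {
  temp_C * 9 / 5 + 32
}

# Functions built from arithmetic operators work on whole vectors.
fahrenheit_to_celsius(c(32, 212))                  # 0 100
celsius_to_fahrenheit(fahrenheit_to_celsius(180))  # 180, up to floating-point rounding

# A default argument (a = 1) and a named list as the return value.
scaled_sin_cos <- function(x, a = 1) {
  list(sine = a * sin(x), cosine = a * cos(x))
}
res <- scaled_sin_cos(pi / 2, a = 4)
res$sine                                           # 4
```

Naming the list elements lets the caller write `res$sine` instead of `res[[1]]`, which is easier to read than the positional access needed for `myfun5`.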