# Python vs. R: Some Main Differences

This notebook summarises some of the main differences between R and Python. It is meant to help you avoid some of the potential pitfalls if you are coming from an R programming background.

In [None]:
# first set this so that jupyter notebook prints all output from a cell, 
# not just the most recent one
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

### In Python, indexing starts at 0, so the first element of a list is selected by the 0-th index.

In [None]:
lst = ["A", "B", 3.45]
lst[0]

**R**:
```R
lst <- list("A","B", 3)

lst[[1]]  # [[ ]] extracts the element itself; lst[1] would return a one-element list

Output: "A"
```

### Unlike R, the ending index is excluded in Python.

In [None]:
lst[0:1]

In [None]:
lst[0:2]
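
As a small sketch of the half-open rule (the names `letters`, `first_two` and `rest` are illustrative and not part of the original notebook): a slice `seq[start:stop]` returns `stop - start` elements, and adjacent slices join back together without overlap.

```python
# Illustrative list; the names here are assumptions made for this sketch
letters = ["a", "b", "c", "d", "e"]

first_two = letters[0:2]   # elements at index 0 and 1; index 2 is excluded
rest = letters[2:5]        # elements at index 2, 3 and 4

print(first_two)                     # ['a', 'b']
print(len(letters[1:4]))             # 3, i.e. stop - start
print(first_two + rest == letters)   # True: the two slices do not overlap
```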

### In R, you use {} to define scope. In Python, there are no curly braces for this, and you use indentation to define scope.

**R**:
```R
printString <- function(x,y) {

print("Hello!")  # indenting this line is not necessary

}
```

In Python, the line that introduces an indented block must end with a colon. We suggest that you use 4 spaces for indentation. Tabs also work, as long as you do not mix tabs and spaces within the same block.

In [None]:
def printInput(name):
    if type(name) is str:
        print("String: Hello " + name + '!')
    elif type(name) is int or type(name) is float:
        print("Numeric: Hello " + str(name) + '!')
    else:
        print("We don't greet strangers!")

printInput("world")
printInput(123)
printInput(1.45)
printInput(None)


### In Python, variables are passed as references to functions. In R, they are passed as values.

If you modify a mutable input (such as a list) in place inside a Python function, the change is also visible to the caller. This can result in hard-to-find bugs if you don't pay attention. On the other hand, avoiding the copy makes Python function calls faster and more memory efficient.

In [None]:
x = [1, 2, 3]

In [None]:
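def lst(x):
    y = x         # y is a second reference to the same list, not a copy
    y.append(4)   # append modifies the list in place and returns None
    return y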

In [None]:
lst(x)

In [None]:
# Variable x globally changed too!
x

The issue is that assigning the list `x` to the variable `y` does not create a copy of `x`. Rather, `y` becomes a second reference to the same list. A change made through either name (`x` or `y`) is therefore visible through both, because they point to the same object in memory. To resolve this issue, you need to explicitly tell Python to create a copy of `x` and call it `y`. This way, the two variables are independent of each other.

In [None]:
x = [1, 2, 3]
def lst(x):
    y = list(x)  # create a copy of x and not a reference
    y.append(4)  # change the copy
    return y

In [None]:
lst(x)

In [None]:
# Variable x is still [1, 2, 3]
x

### Assignment is not always what you think.

In [None]:
a1 = [1,1]
a2 = [1,1]

In [None]:
# This does not copy the list: both a1 and b point to the
# same location in the computer memory
b = a1

In [None]:
b[0] = 'boo!'

In [None]:
print(a1)

If you want a real copy, use one of the following:

In [None]:
c = list(a2)
# OR
import copy
c = copy.copy(a2)
# Both of the above are shallow copies. If you want a deep copy
# (that is, also make copies of lists within a list):
# c = copy.deepcopy(a2)
c
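
The difference between a shallow and a deep copy only shows up with nested lists. Here is a minimal sketch, assuming a hypothetical nested list called `nested` that is not part of the original notebook:

```python
import copy

nested = [[1, 2], [3, 4]]        # hypothetical nested list for this sketch

shallow = copy.copy(nested)      # new outer list, but the same inner lists
deep = copy.deepcopy(nested)     # new outer list and new inner lists

nested[0].append(99)             # modify an inner list in place

print(shallow[0])                # [1, 2, 99]: the inner list is shared
print(deep[0])                   # [1, 2]: the inner list was copied
print(shallow[0] is nested[0])   # True
print(deep[0] is nested[0])      # False
```

For flat lists of numbers or strings, a shallow copy is usually enough.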

How to check if two variables point to the same address in the memory:

In [None]:
b is a1

In [None]:
c is a2

Tricky! How to check if two variables have the same value:

In [None]:
c == a2

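Python shares objects like this for memory efficiency. Assigning a new value with `=` only rebinds the name, though, so base types such as numbers and strings behave as you would expect: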

In [None]:
a = 1
b = a
b = 'boo!'
print(a)
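
To make the distinction concrete, here is a small sketch contrasting rebinding a name with modifying a shared list in place. The names `nums` and `alias` are illustrative only.

```python
nums = [1, 2, 3]      # illustrative names, not from the original notebook
alias = nums          # alias and nums refer to the same list object

alias = [7, 8, 9]     # rebinding: alias now refers to a different list
print(nums)           # [1, 2, 3], unchanged

alias = nums          # point alias back at the original list
alias[0] = "boo!"     # in-place modification of the shared list
print(nums)           # ['boo!', 2, 3], changed through the alias
```

Only the second kind of operation leaks through to the other name. Numbers and strings cannot be modified in place, which is why they never show this behaviour.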

### In R, to perform exponentiation you can use either ^ (caret symbol) or ** (double asterisk). In Python, you can only use ** because in Python, ^ is bitwise XOR.

So, here is $2^3$ (notice how you can embed LaTeX code inside a notebook):

**R**:
```R
Input: 2**3

Output: 8

Input: 2^3

Output: 8
```

In [None]:
2**3

In [None]:
2^3
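
The result of `2^3` in Python may look surprising at first. A short sketch of why, using illustrative variable names: XOR compares the binary digits of the two numbers and keeps only the bits that differ.

```python
# 2 is 0b10 and 3 is 0b11; only the lowest bit differs
xor_result = 2 ^ 3
print(bin(2), bin(3), bin(xor_result))  # 0b10 0b11 0b1
print(xor_result)                       # 1

power_result = 2 ** 3
print(power_result)                     # 8
print(pow(2, 3))                        # 8, the built-in equivalent of **
```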

### In R, you can usually use dot when naming variables and functions. In Python, you use dot to access methods and attributes of classes and objects. In Python, you cannot use dot when naming anything.

**R**:
```R
my.integer.variable <- 5
```

In [None]:
a = [1,2,3]
print(a)

a.append(4)
print(a)
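
As a sketch of what happens if you carry the R naming habit over: Python reads a dotted name on the left of `=` as attribute assignment on an object called `my`. Since no such object exists here, the assignment fails. The variable names below are illustrative.

```python
# The usual Python convention is to separate words with underscores
my_integer_variable = 5

try:
    # Interpreted as: set attribute `variable` on `my.integer`
    my.integer.variable = 5
except NameError as err:
    message = str(err)

print(message)  # name 'my' is not defined
```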

### In R, by default, reshaping of data happens column-wise. The default behaviour in Python (NumPy) is to reshape row-wise. This can cause subtle bugs that are hard to catch.

**R**:
```R
matrix(0:9, nrow=2, ncol=5)
     [,1] [,2] [,3] [,4] [,5]
[1,]    0    2    4    6    8
[2,]    1    3    5    7    9
```

In [None]:
import numpy as np
np.arange(10).reshape(2, 5)

However, you can make NumPy reshape column-wise by setting the `order` parameter to 'F' in the call to `reshape`.

In [None]:
import numpy as np
np.arange(10).reshape(2, 5, order='F')
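
One way to convince yourself that the two orders are related, sketched here with illustrative variable names: filling a 2x5 array column-wise gives the same result as filling a 5x2 array row-wise and transposing it.

```python
import numpy as np

values = np.arange(10)

row_wise = values.reshape(2, 5)              # default order='C': fills rows first
col_wise = values.reshape(2, 5, order='F')   # fills columns first, like R's matrix()

print(row_wise)
print(col_wise)

# Column-wise 2x5 equals row-wise 5x2, transposed
print(np.array_equal(col_wise, values.reshape(5, 2).T))  # True
```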

***

MATH2319 - Machine Learning @ RMIT University