# 2.3 Language Basics

1. [Indentation](#indentation)
2. [Object Orientation](#object)
3. [Strong Typing](#strong)
4. [Attributes and Methods](#attributes)
5. [Duck Typing](#duck)
6. [Imports](#imports)
7. [Binary Operators](#binary)
8. [Scalar Types](#scalar)
9. [Control Flow](#control)

<a name="indentation"></a>
## Indentation not braces

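Python uses indentation (whitespace) rather than braces to structure code, so there are no `{}` around blocks as there are in R. A colon starts a block, and every line indented to the same level belongs to it.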

for x in array:
    if x < pivot:
        less.append(x)
    else:
        greater.append(x)

<a name="object"></a>
## Object Orientation

Everything is an object. Every number, string, data structure, function, class, module, etc.

Not entirely sure the consequences of this and if it's very different from R. Keep in mind going forward.

Almost every object in Python has attached functions. These are called methods and they have access to the object's internal contents. (Similar to R S4?)

Example:

obj.some_method(x, y, z)

One difference relative to R is assigning objects to different variables. Consider the code below. We create a list of integers and assign it to a. Then we assign a to b. In R, we would now have two objects (a and b) that can be modified independently. In Python, however, a and b are just two names that refer to the same object, so a change made through a is also visible through b.

In [None]:
a = [1, 2, 3]
a
b = a
b
a.append(4)
b

[1, 2, 3, 4]
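If an independent copy is what you want (the R-like behavior), one option is to copy the list explicitly. A minimal sketch, assuming a shallow copy is enough; `a.copy()` and `list(a)` both create a new list with the same elements:

```python
a = [1, 2, 3]
b = a          # b is another name for the same list
c = a.copy()   # c is a new list with the same elements (shallow copy)

a.append(4)

print(b)  # [1, 2, 3, 4]  (changed along with a)
print(c)  # [1, 2, 3]     (unaffected)
```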

<a name="strong"></a>
## Strong Typing

Similar to R, each object has its own type, and types are not implicitly converted. Trying to add "5" to 5 will result in an error. Basically the only time implicit conversion is allowed is when mixing integers and floats.

In [None]:
a = 4.5
b = 2
print(f"a is {type(a)}, b is {type(b)}")
a / b

a is <class 'float'>, b is <class 'int'>


2.25
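The "5" plus 5 case mentioned above can be sketched as follows. The variable names are just for illustration; the point is that the conversion has to be written out explicitly:

```python
# Mixing a string and an integer raises TypeError instead of converting silently
try:
    result = "5" + 5
except TypeError as err:
    result = None
    print(f"TypeError: {err}")

# Explicit conversion makes the intent clear
as_number = int("5") + 5   # 10
as_text = "5" + str(5)     # "55"
print(as_number, as_text)
```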

### Checking Types

`class(object)` is how we do this in R. In Python, `type()` is the general equivalent, while `isinstance()` checks an object against a specific type.

`isinstance()` is more like `is.character()` in R. Below, we check if a is an integer. We can also pass a tuple of types to check whether a is any one of them.

In [None]:
a = 5; b = 4.5
isinstance(a, int) # check if integer


True

In [None]:
isinstance(a, (int, float)) # check if integer or float

True

In [None]:
isinstance(b, int) # check if integer

False

In [None]:
isinstance(b, (int, float)) # check if integer or float

True
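The built-in `type()` returns an object's exact class, which is probably the closest thing to R's `class()`. A small sketch comparing it with `isinstance()`; the main difference is that `isinstance()` also accepts subclasses:

```python
a = 5
b = 4.5

print(type(a))            # <class 'int'>
print(type(b).__name__)   # float

# bool is a subclass of int, so isinstance() says yes but type() does not match
print(isinstance(True, int))   # True
print(type(True) is int)       # False
```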

<a name="attributes"></a>
## Attributes and Methods

Attributes - python objects stored "inside" an object

Methods - functions associated with an object that can have access to the object's internal data.

Syntax: <obj.attribute_name> / <obj.method_name>

Can also view via getattr (Need to look into this further if I want to use it)

In [None]:
?getattr

Docstring:
getattr(object, name[, default]) -> value

Get a named attribute from an object; getattr(x, 'y') is equivalent to x.y.
When a default argument is given, it is returned when the attribute doesn't
exist; without it, an exception is raised in that case.
Type:      builtin_function_or_method

In [None]:
a = "foo"
getattr(a, "split")

<function str.split(sep=None, maxsplit=-1)>
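A short sketch of the two behaviors described in the docstring: looking up a method by name and then calling it, and supplying a default for an attribute that doesn't exist. The attribute name `not_a_real_attribute` is made up for the example:

```python
a = "foo"

# Look up a method by name, then call it
upper = getattr(a, "upper")
print(upper())  # FOO

# A missing attribute raises AttributeError unless a default is supplied
missing = getattr(a, "not_a_real_attribute", None)
print(missing)  # None
```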

<a name="duck"></a>
## Duck typing
Sometimes it doesn't matter what the actual type is, just what it can do. For example, checking if something is iterable.  

Things are iterable if they have an `__iter__` method. This can be found using tab-completion. For example, type `a.<tab>` to list all the methods and you will see `__iter__` as one of the results.

Here is a function from the book that determines if an object is iterable

In [None]:
def isiterable(obj):
    """
    Check if object is iterable

    Returns
    -------
    True if iterable, False if not.
    """
    try:
        iter(obj)
        return True
    except TypeError: # Will have to look into except and all this stuff down the line.
        return False
    
?isiterable

Signature: isiterable(obj)
Docstring:
Check if object is iterable

Returns
-------
True if iterable, False if not.
File:      /var/folders/yg/vgv20k0j7gvg8wyndq5wp9lsszdzc8/T/ipykernel_21691/2390336540.py
Type:      function

In [None]:
isiterable("a string")

True

In [None]:
isiterable([1, 2, 3])

True

In [None]:
isiterable(5)

False
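One possible use for a check like this is a function that accepts any kind of sequence and converts it to a list first. A minimal sketch; `as_list` is a hypothetical helper name, not something from the notes above:

```python
def isiterable(obj):
    """Return True if obj can be iterated over, False otherwise."""
    try:
        iter(obj)
        return True
    except TypeError:
        return False


def as_list(x):
    # Hypothetical helper: convert any non-list iterable to a list,
    # and leave everything else untouched.
    if not isinstance(x, list) and isiterable(x):
        return list(x)
    return x


print(as_list((1, 2, 3)))  # [1, 2, 3]
print(as_list("abc"))      # ['a', 'b', 'c']
print(as_list(5))          # 5
```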

<a name="imports"></a>
## Imports!

This is one of the things I feel like I was having trouble understanding, but doesn't seem too complicated.  

The gist is that as long as they are in the same directory, a python script can access code from another python script using import.  
For example, I can create a python script called myModule.py and I can put whatever I want in it - variable assignments, custom functions, etc.  
Then, I can import that in a different script and use all of those functions/values.  

Still not sure how this relates to the `if __name__ == "__main__"` idiom, but I think that will come.  

The code described below can be found in this repository in the folder myExamples.

Example myModule.py:

```
PI = 3.14159

def f(x):
    return x + 2

def g(a, b):
    return a + b
```

Example import script: myScript.py:

```
import myModule
result = myModule.f(5)
```

The value of result will be 7. Note the syntax here: <obj.method>. In this case, the object is the module and the method is the function that I wrote.

Alternatively, can import only aspects of a module. Below, I will just import the function g and the value PI:

```
from myModule import g, PI
result = g(5, PI)
```

The value of result will be 8.14159.

Another import convention is using `as` to rename things. You can rename the module itself to something shorter (myModule to mm or something) and you can also rename functions from the module to something else.

```
import myModule as mm
from myModule import PI as pi, g as gf

r1 = mm.f(pi)
r2 = gf(6, pi)
r3 = mm.g(6, pi)
```

r1 will be 5.14159; r2 will be 9.14159; and r3 will be 9.14159 as well.
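On the "if name = main" question raised earlier: Python sets the variable `__name__` to `"__main__"` when a file is run directly, and to the module's name when the file is imported. A minimal sketch of how a file like myModule.py could use this; the `main` function is a hypothetical addition, not part of the example files:

```python
PI = 3.14159


def f(x):
    return x + 2


def main():
    # Hypothetical demo code that should only run when the file is
    # executed directly (python myModule.py), not when it is imported.
    print(f(PI))


if __name__ == "__main__":
    main()
```

With this guard in place, `import myModule` would still make `PI` and `f` available, but would not print anything.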

<a name="binary"></a>
# Binary Operators

Pretty standard.  
There are a few I'm not used to from R, however.

<img src="./myImages/table2.1_operators.png" width = 600>  

Note the difference between `is`/`is not` and `==`/`!=`:

The two `is` operators are used to determine if two variables reference (i.e. point to) the same object, while the `==` operators determine if the **value** of two variables is the same.  

Below, we make a list a, then assign it to a second name b, then **copy** it to c (because the list() fxn creates a copy).  

`is` and `is not` are commonly used to check if a variable is `None`

In [None]:
a = [1, 2, 3]
b = a
c = list(a)

In [None]:
a is b

True

In [None]:
a is not c

True

In [None]:
a == c

True

## Mutable and Immutable Objects

Recommended to favor immutability...

### Mutable

lists  
dictionaries  
NumPy arrays  
most user-defined types (e.g. annData)  

### Immutable

strings
tuples
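A quick sketch of what mutable versus immutable means in practice, using made-up example values. Replacing an element works for a list, while the same operation on a tuple raises a `TypeError`:

```python
a_list = ["foo", 2, [4, 5]]
a_list[2] = (3, 4)        # fine: lists are mutable
print(a_list)             # ['foo', 2, (3, 4)]

a_tuple = (3, 5, (4, 5))
try:
    a_tuple[1] = "four"   # not allowed: tuples are immutable
except TypeError as err:
    print(f"TypeError: {err}")

print(a_tuple)            # (3, 5, (4, 5))  (unchanged)
```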

<a name="scalar"></a>
# Scalar Types

Handle numeric data, strings, boolean values, and dates/times.  

These are "scalar" because they're *single value* in the sense that there's only one instance. There's only one number, one True/False, one string, etc.  

Compare that to a list, which is made up of multiple scalar values.  

<img src="./myImages/table2.2_scalars.png" width = 600>

## Strings

Strings and tuples are immutable objects. Can't actually change their values, have to assign them to new objects.  

In [None]:
a = "this is a string"
b = a.replace("string", "longer string")
print(a)
print(b)

# Slicing - note that this grabs the first 3 characters, starting at the 0th index.
print(a[:3])

# Split a string into a list
c = list(a)
print(c)
c[2] = "a"
c[3] = "t"
print(c)

# Avoid annoying escape characters  by preceding a string with r:
s = r"this\is\a\string\with\no\special\characters"
print(s)
s

this is a string
this is a longer string
thi
['t', 'h', 'i', 's', ' ', 'i', 's', ' ', 'a', ' ', 's', 't', 'r', 'i', 'n', 'g']
['t', 'h', 'a', 't', ' ', 'i', 's', ' ', 'a', ' ', 's', 't', 'r', 'i', 'n', 'g']
this\is\a\string\with\no\special\characters


'this\\is\\a\\string\\with\\no\\special\\characters'

## String templating

String objects have a `format` method that can be used to create templates of strings (Think this is kind of like sprintf in R)

General format is to define each argument within brackets, specifying its index and type (i.e {#:type})

In the code below:

- `{0:.2f}`: 0 means it's the first argument. .2f means it's a float with 2 decimals
- `{1:s}`: 1 means it's the second argument. s means it's a string
- `{2:d}`: 2 means it's the third argument. d means it's an integer.

So you assign that to an object and then call the `format` method. Integers can be forced to be floats (i.e. if I provide 80 to the first argument, it will return 80.00). Can't force integers/floats to strings though.

In [None]:
myTemplate = "{0:.2f} {1:s} are worth US${2:d}"
myTemplate.format(88.46, "Argentine Pesos", 1)

'88.46 Argentine Pesos are worth US$1'

In [None]:
myTemplate.format(88, "Argentine Pesos", 1)
# myTemplate.format(88, 75, 1) # this will throw an error because 75 is an int, but my template requires it to be a string

'88.00 Argentine Pesos are worth US$1'
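The conversion rules described above can be checked with a small sketch. It reuses the same template (named `my_template` here) and catches the error so that all three cases run in one go:

```python
my_template = "{0:.2f} {1:s} are worth US${2:d}"

# An int passed to a float field is formatted as a float
print(my_template.format(80, "Argentine Pesos", 1))  # 80.00 Argentine Pesos are worth US$1

# An int passed to a string field raises ValueError
try:
    my_template.format(88, 75, 1)
    failed = False
except ValueError as err:
    failed = True
    print(f"ValueError: {err}")

# Converting explicitly with str() works
print(my_template.format(88, str(75), 1))  # 88.00 75 are worth US$1
```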

## Formatted string literals (f strings)

This is a different way to format string templates and seems to be more similar to sprintf in some respects.  
Basically you start your string with f (e.g. f"<string text here>"). And then within the string you can place any sort of python expression you want and it will be evaluated.  
These expressions are usually variables (b/c otherwise you could just write a normal string).  
In the example below, the first two expressions are just literally calling those variables, while the third one is also performing an operation (division) on the variables that are called.  
You can also use the same colon followed by formatting info to specify the output. Below, we use `:.2f` to round the output to two decimal places.

In [None]:
amount = 10
rate = 88.46
currency = "Pesos"
r1 = f"{amount} {currency} is worth US${amount / rate}"
print(r1)
r2 = f"{amount} {currency} is worth US${amount / rate:.2f}"
print(r2)

10 Pesos is worth US$0.11304544426859599
10 Pesos is worth US$0.11


## Booleans

The boolean values are `True` and `False`, `True == 1` and `False == 0`.  

Combine them with `and` and `or` in your conditional statements.  

Reverse them with `not` (same as `!` in R).  
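A short sketch of these operators. The last two lines rely on `True == 1` and `False == 0`, which means booleans can be summed much like logical vectors in R:

```python
print(True and False)   # False
print(True or False)    # True
print(not True)         # False

# Booleans behave like 1 and 0 in arithmetic
print(True + True)                # 2
print(sum([True, False, True]))   # 2
```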

## Type Casting

The scalar types `str`, `bool`, `int`, and `float` are also functions that can be used to cast values to those types

In [None]:
s = "3.14159"
fval = float(s)
ival = int(fval)
bval = bool(fval)

print(type(s))
print(s)
print(type(fval))
print(fval)
print(type(ival))
print(ival)
print(type(bval))
print(bval)
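A few casting edge cases that may be worth keeping in mind, sketched with made-up values: `int()` truncates rather than rounds, `bool()` treats zero and empty strings as `False`, and `int()` will not parse a string that looks like a float.

```python
print(int(3.99))     # 3   (truncates toward zero, does not round)
print(int(-3.99))    # -3
print(bool(0.0))     # False
print(bool(""))      # False (empty string)
print(bool("0"))     # True  (any non-empty string)

# int() refuses a string containing a decimal point
try:
    int("3.14159")
except ValueError as err:
    print(f"ValueError: {err}")

# Going through float() first works
print(int(float("3.14159")))  # 3
```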

## None

`None` is Python's null value (similar to R's `NULL`)!  

Common `None` usage is to use it as a default function argument:

```python
def add_and_maybe_multiply(a, b, c=None):
    result = a + b

    if c is not None:
        result = result * c

    return result
```
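A usage sketch for the function above, with made-up arguments. The last call shows why the check is `c is not None` rather than just `if c:`, since a value of 0 is still applied:

```python
def add_and_maybe_multiply(a, b, c=None):
    result = a + b

    if c is not None:
        result = result * c

    return result


print(add_and_maybe_multiply(5, 6))       # 11
print(add_and_maybe_multiply(5, 6, 10))   # 110
print(add_and_maybe_multiply(5, 6, 0))    # 0
```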

## Dates and Times

Must import the datetime module in order to have access to these.  

`datetime` is essentially `date` with option to add time. It's the most commonly used.

In [None]:
from datetime import datetime, date, time

?datetime

Init signature: datetime(self, /, *args, **kwargs)
Docstring:     
datetime(year, month, day[, hour[, minute[, second[, microsecond[,tzinfo]]]]])

The year, month and day arguments are required. tzinfo may be None, or an
instance of a tzinfo subclass. The remaining arguments may be ints.
File:           ~/miniconda3/envs/pydata-book/lib/python3.10/datetime.py
Type:           type
Subclasses:     

In [None]:
?date

Init signature: date(self, /, *args, **kwargs)
Docstring:      date(year, month, day) --> date object
File:           ~/miniconda3/envs/pydata-book/lib/python3.10/datetime.py
Type:           type
Subclasses:     datetime

In [None]:
?time

Init signature: time(self, /, *args, **kwargs)
Docstring:     
time([hour[, minute[, second[, microsecond[, tzinfo]]]]]) --> a time object

All arguments are optional. tzinfo may be None, or an instance of
a tzinfo subclass. The remaining arguments may be ints.
File:           ~/miniconda3/envs/pydata-book/lib/python3.10/datetime.py
Type:           type
Subclasses:     

In [None]:
exDateTime = datetime(2011, 10, 29, 20, 30, 21)
print(f"Day is: {exDateTime.day}")
print(f"Second is: {exDateTime.second}")
print(f"Print only the date using date() method: {exDateTime.date()}")
print(f"Print only the time using time() method: {exDateTime.time()}")

Day is: 29
Second is: 21
Print only the date using date() method: 2011-10-29
Print only the time using time() method: 20:30:21


`strftime` is another common method used in datetime. This allows you to customize the date output as a string:

In [None]:
print(exDateTime.strftime("%Y"))
print(exDateTime.strftime("%Y-%m-%d"))
print(exDateTime.strftime("%Y %H:%M"))

2011
2011-10-29
2011 20:30


The reverse can be done with `strptime` - turn a string into a datetime

In [None]:
s="20091031"
print(s)
print(type(s))
print(datetime.strptime(s, "%Y%m%d"))
print(type(datetime.strptime(s, "%Y%m%d")))

20091031
<class 'str'>
2009-10-31 00:00:00
<class 'datetime.datetime'>
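`strftime` and `strptime` are inverses of each other as long as the same format string is used for both. A small round-trip sketch; the format string here is just one choice that keeps every field down to the second:

```python
from datetime import datetime

dt = datetime(2011, 10, 29, 20, 30, 21)
fmt = "%Y-%m-%d %H:%M:%S"

text = dt.strftime(fmt)               # datetime -> string
print(text)                           # 2011-10-29 20:30:21

parsed = datetime.strptime(text, fmt) # string -> datetime
print(parsed == dt)                   # True
```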


Here are all of the different format options:

<img src="./myImages/table11.2_datetimeFormats.png" width = 600>

<a name="control"></a>
# Control Flow

## if, elif, and else

Notice the difference from R's `else if`  

Also note that no () are needed to surround the conditional statement being evaluated.  

This is the same as R's `||`: if you have multiple conditionals combined with `or`, they're checked in order and the checking stops as soon as one is True (short-circuit evaluation).
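One way to see that checking stops early is to record which conditions actually get evaluated. A minimal sketch; `check` is a hypothetical helper that logs its label before returning a value:

```python
calls = []


def check(label, value):
    # Hypothetical helper: remember that this condition was evaluated
    calls.append(label)
    return value


# With `or`, evaluation stops at the first True
calls = []
result = check("first", True) or check("second", True)
print(result, calls)  # True ['first']

# With `and`, evaluation stops at the first False
calls = []
result = check("first", False) and check("second", True)
print(result, calls)  # False ['first']
```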

### In R:
```R
x = -5
if (x < 0) {
    print("Negative")
} else if (x == 0) {
    print("Zero")
} else if (x > 0 & x < 5) {
    print("Positive and smaller than 5")
} else {
    print("Positive and larger than or equal to 5")
}
```

### In Python:
```python
if x < 0:
    print("Negative")
elif x == 0:
    print("Zero")
elif 0 < x < 5:
    print("Positive and smaller than 5")
else:
    print("Positive and larger than or equal to 5")
```

Test it:

In [None]:
x = 10
if x < 0:
    print("Negative")
elif x == 0:
    print("Zero")
elif 0 < x < 5:
    print("Positive and smaller than 5")
else:
    print("Positive and larger than or equal to 5")

Positive and larger than or equal to 5


## For loops

Essentially the same as in R: `for value in collection:`

I don't really use that in R, however; I use `for i in...`, which in python would look like: `for i in range(<value>)`

`continue` in python is what `next` is in R  

`break` is the same  

Can do something extra with tuples and lists that we'll see more later: if each element of `iterator` is itself a sequence of three items (e.g. a tuple), `for a, b, c in iterator:` unpacks it into 3 variables (a, b, and c) inside the loop.
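The loop features above can be sketched with some made-up data: unpacking three values per element, skipping items with `continue`, and stopping early with `break`.

```python
# Each element is a tuple of three items, unpacked into a, b, c
rows = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
sums = []
for a, b, c in rows:
    sums.append(a + b + c)
print(sums)  # [6, 15, 24]

# continue skips the None values (like next in R)
total = 0
for value in [1, 2, None, 4, None, 5]:
    if value is None:
        continue
    total += value
print(total)  # 12

# break stops the loop entirely
seen = []
for i in range(10):
    if i == 3:
        break
    seen.append(i)
print(seen)  # [0, 1, 2]
```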

## Pass

`pass` is the "no-op" statement in python and is used as a placeholder.  

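It is not the same as `continue` (R's `next`): `pass` does nothing at all and execution carries on with the next statement, whereas `continue` jumps to the next loop iteration. `pass` is needed because Python requires every indented block to contain at least one statement.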

In [None]:
x = 0
if x < 0:
    print("negative!")
elif x == 0:
    # TODO: put something smart here
    pass
else:
    print("positive!")
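To see how `pass` differs from `continue`, here is a small side-by-side sketch with made-up values. With `pass` the rest of the loop body still runs; with `continue` it is skipped for that iteration:

```python
with_pass = []
for x in [-1, 0, 1]:
    if x == 0:
        pass          # does nothing; execution falls through to the append
    with_pass.append(x)

with_continue = []
for x in [-1, 0, 1]:
    if x == 0:
        continue      # skips the rest of this iteration
    with_continue.append(x)

print(with_pass)      # [-1, 0, 1]
print(with_continue)  # [-1, 1]
```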

## Range

start is inclusive, stop is exclusive! 

In [None]:
?range

Init signature: range(self, /, *args, **kwargs)
Docstring:     
range(stop) -> range object
range(start, stop[, step]) -> range object

Return an object that produces a sequence of integers from start (inclusive)
to stop (exclusive) by step.  range(i, j) produces i, i+1, i+2, ..., j-1.
start defaults to 0, and stop is omitted!  range(4) produces 0, 1, 2, 3.
These are exactly the valid indices for a list of 4 elements.
When step is given, it specifies the increment (or decrement).
Type:           type
Subclasses:     

In [None]:
print(range(10))
print(range(0,10))
print(type(range(0,10)))
print(list(range(10)))
print(type(list(range(10))))
print(list(range(0, 20, 2))) # count 0 to 20 by 2

As mentioned above in the for loop section, `range()` is often used to iterate through a sequence by index:

In [None]:
# Make a list
myList = [1, 2, 3, 4]

# Iterate
for i in range(len(myList)):
    print(f"element {i}: {myList[i]}")
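An alternative to `range(len(...))` that may read more cleanly is the built-in `enumerate()`, which yields the index and the element together. A minimal sketch using the same list (named `my_list` here):

```python
my_list = [1, 2, 3, 4]

# enumerate() yields (index, element) pairs, so no manual indexing is needed
lines = []
for i, value in enumerate(my_list):
    lines.append(f"element {i}: {value}")
    print(lines[-1])
```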