#Data Analysis with Python
Estimated time needed: 60 minutes

##Objectives
After completing this lab you will be able to:

*   Define tuples, lists, dictionaries, and sets
*   Create reusable Python functions
*   Understand the mechanics of Python file objects and how to interact with the local hard drive

##Table of Contents
1.   Data Structures and Sequences
        * Tuple
        * List
        * Dictionary
        * Set
        * Built-In Sequence Functions
        * List, Set, and Dictionary Comprehensions
        * Functions
        * Files and the Operating System
        * Conclusion

#Built-In Data Structures, Functions, and Files

##1.   Data Structures and Sequences

####Tuple
A tuple is a fixed-length, immutable sequence of Python objects which, once assigned, cannot be changed. The easiest way to create one is with a comma-separated sequence of values wrapped in parentheses:

In [None]:
tup = (4, 5, 6) #or tup = 4, 5, 6

tup

(4, 5, 6)

You can convert any sequence or iterator to a tuple by invoking tuple:

In [None]:
tuple([4, 0, 2])

(4, 0, 2)

In [None]:
tup = tuple('string')

tup

('s', 't', 'r', 'i', 'n', 'g')

Elements can be accessed with square brackets [] as with most other sequence types. As in C, C++, Java, and many other languages, sequences are 0-indexed in Python:

In [None]:
tup[0]

's'

When you're defining tuples within more complicated expressions, it’s often necessary to enclose the values in parentheses, as in this example of creating a tuple of tuples:

In [None]:
nested_tup = (4, 5, 6), (7, 8)

nested_tup

((4, 5, 6), (7, 8))

In [None]:
nested_tup[0]

(4, 5, 6)

In [None]:
nested_tup[1]

(7, 8)

While the objects stored in a tuple may be mutable themselves, once the tuple is created it's not possible to change which object is stored in each slot.

If an object inside a tuple is mutable, such as a list, you can modify it in place:

In [None]:
tup = tuple(['foo', [1, 2], True])

tup[1].append(3)
tup

('foo', [1, 2, 3], True)

You can concatenate tuples using the **+** operator to produce longer tuples:

In [None]:
(4, None, 'foo') + (6, 0) + ('bar',)

(4, None, 'foo', 6, 0, 'bar')

Multiplying a tuple by an integer, as with lists, has the effect of concatenating that many copies of the tuple:

In [None]:
('foo', 'bar') * 4

('foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar')

Note that the objects themselves are not copied, only the references to them.
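
A minimal sketch of what copying references (rather than objects) means in practice. The names `inner` and `repeated` are illustrative only and do not appear elsewhere in this lab:

```python
# Build a tuple that repeats one mutable list three times
inner = [1, 2]
repeated = (inner,) * 3

# Every slot points at the same list object, so one append shows up everywhere
inner.append(3)
print(repeated)

# Identity check: each element is the very same object as `inner`
same_object = all(item is inner for item in repeated)
print(same_object)
```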

#####Unpacking tuples

If you try to assign to a tuple-like expression of variables, Python will attempt to unpack the value on the right-hand side of the equals sign:



In [None]:
tup = (4,5,6)

a, b, c = tup
b

5

A common use of variable unpacking is iterating over sequences of tuples or lists:

In [None]:
seq = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]

for a, b, c in seq:
    print(f'a={a}, b={b}, c={c}')

a=1, b=2, c=3
a=4, b=5, c=6
a=7, b=8, c=9


There are some situations where you may want to "pluck" a few elements from the beginning of a tuple. There is a special syntax that can do this, **\*rest**, which is also used in function signatures to capture an arbitrarily long list of positional arguments:

In [None]:
values = 1, 2, 3, 4, 5

a, b, *rest = values
rest

[3, 4, 5]

This rest bit is sometimes something you want to discard; there is nothing special about the rest name. As a matter of convention, many Python programmers will use the underscore (_) for unwanted variables:

In [None]:
values = 1, 2, 3, 4, 5

a, b, *_ = values
_

[3, 4, 5]
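
The use of the star syntax in a function signature, mentioned earlier in this section, could be sketched as follows. `total_after_first` is a hypothetical helper written for illustration, not part of the lab:

```python
# Hypothetical helper: the first argument is named, the rest are captured
def total_after_first(first, *rest):
    # `rest` arrives as a tuple holding every extra positional argument
    return first, sum(rest)

head, remainder_sum = total_after_first(1, 2, 3, 4, 5)
print(head, remainder_sum)
```

Note that inside a function the captured arguments form a tuple, whereas unpacking in an assignment produces a list.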

#####Tuple methods


Since the size and contents of a tuple cannot be modified, it is very light on instance methods. A particularly useful one (also available on lists) is **count**, which counts the number of occurrences of a value:

In [None]:
a = (1, 2, 2, 2, 3, 4, 2)

a.count(2)

4

####List

In contrast with tuples, lists are variable length and their contents can be modified in place. Lists are **mutable**. You can define them using square brackets [] or using the list type function:

In [None]:
a_list = [2, 3, 7, None]

tup = ("foo", "bar", "baz")

b_list = list(tup)
b_list

['foo', 'bar', 'baz']

In [None]:
b_list[1] = "peekaboo"
b_list

['foo', 'peekaboo', 'baz']

#####Adding and removing elements

Elements can be appended to the end of the list with the **append** method:

In [None]:
b_list.append('dwarf')
b_list

['foo', 'peekaboo', 'baz', 'dwarf']

Using **insert** you can insert an element at a specific location in the list:

In [None]:
b_list.insert(1, 'red')
b_list

['foo', 'red', 'peekaboo', 'baz', 'dwarf']



---

Inserting elements into a sequence is **computationally expensive** compared to appending because it requires shifting internal references. If you need to insert elements at both the beginning and end of a sequence, consider using [collections.deque](https://www.geeksforgeeks.org/deque-in-python/), a double-ended queue optimized for this purpose found in the Python Standard Library.

---
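
As a rough illustration of the double-ended queue mentioned in the note, the sketch below inserts and removes at both ends. The variable names are assumptions made for this example:

```python
from collections import deque

# Start with a small double-ended queue
items = deque(["b", "c"])

# Both ends support fast insertion
items.appendleft("a")
items.append("d")

# Both ends support fast removal as well
first = items.popleft()
last = items.pop()

print(first, last, list(items))
```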







The inverse operation to insert is **pop**, which removes and returns an element at a particular index:



In [None]:
b_list.pop(2)
b_list

['foo', 'red', 'baz', 'dwarf']

Elements can be removed by value with **remove**, which locates the first such value and removes it from the list:

In [None]:
b_list.remove('red')
b_list

['foo', 'baz', 'dwarf']

Check if a list contains a value using the **in** keyword:

In [None]:
'dwarf' in b_list

True

The keyword **not** can be used to negate **in**:

In [None]:
'dwarf' not in b_list

False



---
Checking whether a **list** contains a value is a lot **slower than** doing so with **dictionaries and sets** (to be introduced shortly), as Python makes a linear scan across the values of the list, whereas it can check the others (based on hash tables) in constant time.


---
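
One way to see the difference described in the note is to time the same membership test against a list and a set. This is only a sketch: the container size and repeat count are arbitrary choices, and the timings will vary by machine:

```python
import timeit

values_list = list(range(100_000))
values_set = set(values_list)
target = 99_999  # worst case for the list: the last element

# Both containers give the same answer
in_list = target in values_list
in_set = target in values_set

# Time 100 lookups in each container; exact numbers depend on the machine
list_time = timeit.timeit(lambda: target in values_list, number=100)
set_time = timeit.timeit(lambda: target in values_set, number=100)
print(f"list: {list_time:.5f}s, set: {set_time:.5f}s")
```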





#####Concatenating and combining lists

Similar to tuples, adding two lists together with **+** concatenates them:

In [None]:
[4, None, "foo"] + [7, 8, (2, 3)]

[4, None, 'foo', 7, 8, (2, 3)]

If you have a list **already defined**, you can append multiple elements to it using the **extend** method:

In [None]:
x = [4, None, "foo"]

x.extend([7, 8, (2, 3)])
x

[4, None, 'foo', 7, 8, (2, 3)]

Note that list concatenation by addition is a **comparatively expensive operation** since a new list must be created and the objects copied over. Using extend to append elements to an existing list, especially if you are building up a large list, is usually preferable. Thus:

```
everything = []
for chunk in list_of_lists:
    everything.extend(chunk)
```

is **faster** than the concatenative alternative:

```
everything = []
for chunk in list_of_lists:
    everything = everything + chunk
```

#####Sorting

You can sort a list in place (without creating a new object) by calling its **sort** method:

In [None]:
a = [7, 2, 5, 1, 3]

a.sort()
a

[1, 2, 3, 5, 7]

**sort** has a few options that will occasionally come in handy. One is the ability to pass a secondary sort key—that is, a function that produces a value to use to **sort the objects**. For example, we could sort a collection of strings by their lengths:

In [None]:
b = ["saw", "small", "He", "foxes", "six"]

b.sort(key=len)
b

['He', 'saw', 'six', 'small', 'foxes']

#####Slicing

You can select sections of most sequence types by using slice notation, which in its basic form consists of **start:stop** passed to the indexing operator **[]**:

In [None]:
seq = [7, 2, 3, 7, 5, 6, 0, 1]

seq[1:5]

[2, 3, 7, 5]

While the element at the **start** index is **included**, the **stop** index is **not included**, so that the number of elements in the result is stop - start.

Either the start or stop can be **omitted**, in which case they default to the start of the sequence and the end of the sequence, respectively:

In [None]:
seq[:5]

[7, 2, 3, 7, 5]

In [None]:
seq[3:]

[7, 5, 6, 0, 1]

***Negative indices*** slice the sequence relative to the end:

In [None]:
seq[-4:]

[5, 6, 0, 1]

In [None]:
seq[-6:-2]

[3, 7, 5, 6]

A **step** can also be used after a second colon to, say, take every fourth element:

In [None]:
seq[::4]

[7, 5]

A clever use of this is to pass **-1**, which has the useful effect of reversing a list or tuple:

In [None]:
seq[::-1]

[1, 0, 6, 5, 7, 3, 2, 7]

####Dictionary

A dictionary stores a collection of **key-value pairs**, where key and value are Python objects. Each key is associated with a value so that a value can be conveniently retrieved, inserted, modified, or deleted given a particular key.

One approach for creating a dictionary is to use curly braces **{}** and colons to separate keys and values:

In [None]:
empty_dict = {}

d1 = {"a": "some value", "b": [1, 2, 3, 4]}
d1

{'a': 'some value', 'b': [1, 2, 3, 4]}

You can access, insert, or set elements using the same syntax as for accessing elements of a list or tuple:

In [None]:
d1[7] = "an integer"
d1

{'a': 'some value', 'b': [1, 2, 3, 4], 7: 'an integer'}

You can delete values using either the **del** keyword or the **pop** method (which simultaneously returns the value and deletes the key):

In [None]:
del d1[7]
d1

{'a': 'some value', 'b': [1, 2, 3, 4]}
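
The **pop** method is not shown above, so here is a short sketch of it. The dictionary `d` and the key `"dummy"` are made up for this example:

```python
# Illustrative dictionary; the "dummy" key exists only to be removed
d = {"a": "some value", "b": [1, 2, 3, 4], "dummy": "another value"}

# pop removes the key and hands back its value in one step
removed = d.pop("dummy")
print(removed)
print(d)
```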

The **keys** and **values** methods give you iterators of the dictionary's keys and values, respectively. The order of the keys depends on the order of their insertion, and these methods output the keys and values in the same respective order:

In [None]:
list(d1.keys())

['a', 'b']

In [None]:
list(d1.values())

['some value', [1, 2, 3, 4]]

If you need to **iterate** over both the keys and values, you can use the items method to iterate over the keys and values as 2-tuples:

In [None]:
list(d1.items())

[('a', 'some value'), ('b', [1, 2, 3, 4])]

You can **merge** one dictionary into another using the **update** method:

In [None]:
d1.update({"b": "foo", "c": 12})
d1

{'a': 'some value', 'b': 'foo', 'c': 12}

#####Creating dictionaries from sequences

It's common to end up with two sequences that you want to pair up element-wise in a dictionary. As a first cut, you might write code like this:



```
mapping = {}
for key, value in zip(key_list, value_list):
    mapping[key] = value
```



Since a dictionary is essentially a collection of 2-tuples, the dict function accepts any iterable of 2-tuples, such as a zip object:

In [None]:
tuples = zip(range(5), reversed(range(5)))
tuples

<zip at 0x7f61fe0f1c80>

In [None]:
mapping = dict(tuples)
mapping

{0: 4, 1: 3, 2: 2, 3: 1, 4: 0}

#####Default values

It’s common to have logic like:

```
if key in some_dict:
    value = some_dict[key]
else:
    value = default_value
```



Thus, the dictionary methods get and pop can take a default value to be returned, so that the above if-else block can be written simply as:

```
value = some_dict.get(key, default_value)
```
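
A small sketch of how **get** and **pop** behave with and without a default. The dictionary contents and the missing key `"z"` are assumptions chosen for illustration:

```python
# Illustrative dictionary; "z" is deliberately absent
some_dict = {"a": 1, "b": 2}

# get returns the default without modifying the dictionary
missing = some_dict.get("z", 0)

# get with no default returns None for a missing key
missing_none = some_dict.get("z")

# pop with a default avoids a KeyError when the key is absent
popped = some_dict.pop("z", -1)

print(missing, missing_none, popped, some_dict)
```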



With setting values, it may be that the values in a dictionary are another kind of collection, like a **list**.

For example, you could imagine categorizing a list of words by their first letters as a dictionary of lists:

In [None]:
words = ["apple", "bat", "bar", "atom", "book"]

by_letter = {}

for word in words:
    first_letter = word[0]
    if first_letter not in by_letter:
        by_letter[first_letter] = [word]
    else:
        by_letter[first_letter].append(word)

by_letter

{'a': ['apple', 'atom'], 'b': ['bat', 'bar', 'book']}

The **setdefault** dictionary method can be used to simplify this workflow. The preceding for loop can be rewritten as:

```
by_letter = {}

for word in words:
     letter = word[0]
     by_letter.setdefault(letter, []).append(word)
```



The built-in collections module has a useful class, **defaultdict**, which makes this even easier. To create one, you pass a type or function for generating the default value for each slot in the dictionary:


```
from collections import defaultdict

by_letter = defaultdict(list)

for word in words:
     by_letter[word[0]].append(word)
```



#####Valid dictionary key types

While the values of a dictionary can be any Python object, the keys generally have to be **immutable** objects such as numbers, strings, or tuples whose elements are all immutable too. The technical term here is *hashability*. You can check whether an object is hashable (and so can be used as a key) with the **hash()** function:

```
hash("string")
3634226001988967898

hash((1, 2, (2, 3)))
-9209053662355515447

hash((1, 2, [2, 3])) # raises TypeError: unhashable type: 'list'

```
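
The check above can be wrapped in a small function so that the failing case does not stop the program. `is_hashable` is a hypothetical helper written for this sketch, not a built-in:

```python
# Hypothetical helper (not a built-in): report whether hash() accepts obj
def is_hashable(obj):
    """Return True if obj can be used as a dictionary key."""
    try:
        hash(obj)
    except TypeError:
        # Raised for unhashable objects such as lists
        return False
    return True

print(is_hashable("string"))
print(is_hashable((1, 2, (2, 3))))
print(is_hashable((1, 2, [2, 3])))
```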



To use a **list as a key**, one option is to convert it to a tuple, which can be hashed as long as its elements also can be:

In [None]:
d = {}

d[tuple([1, 2, 3])] = 5
d

{(1, 2, 3): 5}

####Set

A set is an **unordered** collection of unique elements. A set can be created in two ways: via the set function or via a set literal with curly braces:

In [None]:
set([2, 2, 2, 1, 3, 3])

{1, 2, 3}

In [None]:
{2, 2, 2, 1, 3, 3}

{1, 2, 3}

Sets support mathematical set operations like union, intersection, difference, and symmetric difference. Consider these two example sets:

In [None]:
a = {1, 2, 3, 4, 5}

b = {3, 4, 5, 6, 7, 8}

The **union** of these two sets is the set of distinct elements occurring in either set. This can be computed with either the union method or the **|** binary operator:

In [None]:
a.union(b) # or  a | b

{1, 2, 3, 4, 5, 6, 7, 8}

The **intersection** contains the elements occurring in both sets. The **&** operator or the intersection method can be used:

In [None]:
a.intersection(b) # or a & b

{3, 4, 5}

#####Commonly used set methods

######`a.add(x)`

Add element x to set a

######`a.clear()`

Reset set a to an empty state, discarding all of its elements


######`a.remove(x)`

Remove element x from set a

######`a.pop()`

Remove an arbitrary element from set a, raising KeyError if the set is empty

######`a.union(b)`

All of the unique elements in a and b; equivalent to `a | b`

######`a.update(b)`

Set the contents of a to be the union of the elements in a and b; equivalent to `a |= b`

######`a.intersection(b)`

All of the elements in both a and b; equivalent to `a & b`

######`a.intersection_update(b)`

Set the contents of a to be the intersection of the elements in a and b; equivalent to `a &= b`

######`a.difference(b)`

The elements in a that are not in b; equivalent to `a - b`

######`a.difference_update(b)`

Set a to the elements in a that are not in b; equivalent to `a -= b`

######`a.symmetric_difference(b)`

All of the elements in either a or b but not both; equivalent to `a ^ b`

######`a.symmetric_difference_update(b)`

Set a to contain the elements in either a or b but not both; equivalent to `a ^= b`

######`a.issubset(b)`

True if the elements of a are all contained in b; equivalent to `a <= b`

######`a.issuperset(b)`

True if the elements of b are all contained in a; equivalent to `a >= b`

######`a.isdisjoint(b)`

True if a and b have no elements in common
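
A short sketch exercising a few of the methods listed above, reusing the two example sets from earlier in this section. The extra names (`c`, `difference`, and so on) are illustrative:

```python
a = {1, 2, 3, 4, 5}
b = {3, 4, 5, 6, 7, 8}

difference = a - b   # elements in a but not in b
symmetric = a ^ b    # elements in exactly one of the two sets

# In-place variant: c is modified rather than a new set being created
c = a.copy()
c |= b

subset_check = {1, 2, 3}.issubset(a)
disjoint_check = a.isdisjoint({10, 11})

print(difference, symmetric, c, subset_check, disjoint_check)
```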

###Built-In Sequence Functions

####enumerate

It’s common when iterating over a sequence to want to keep track of the index of the current item. A do-it-yourself approach would look like:

```
index = 0
for value in collection:
   # do something with value
   index += 1
```



Since this is so common, Python has a built-in function, enumerate, which returns a sequence of (i, value) tuples:

```
for index, value in enumerate(collection):
   # do something with value
```



In summary, **enumerate()** is a useful function when we need to iterate over a sequence and keep track of the index or position of the elements. It helps to simplify the code and make it more readable, especially when we need to perform some operation on the elements based on their position or index.
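
As a concrete case, the following sketch uses enumerate to build a dictionary mapping each value to its position. The list contents are arbitrary and assumed to be unique:

```python
# Arbitrary example data; values are assumed to be unique
some_list = ["foo", "bar", "baz"]
mapping = {}

# enumerate yields (index, value) pairs, so no manual counter is needed
for index, value in enumerate(some_list):
    mapping[value] = index

print(mapping)
```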

####sorted

The sorted function returns a new sorted list from the elements of any sequence:

In [None]:
sorted([7, 1, 2, 6, 0, 3, 2])

[0, 1, 2, 2, 3, 6, 7]

In [None]:
sorted("horse race")

[' ', 'a', 'c', 'e', 'e', 'h', 'o', 'r', 'r', 's']

####zip

zip “pairs” up the elements of a number of lists, tuples, or other sequences, producing an iterator of tuples that you can materialize with list:

In [None]:
seq1 = ["foo", "bar", "baz"]
seq2 = ["one", "two", "three"]

zipped = zip(seq1, seq2)

list(zipped)

[('foo', 'one'), ('bar', 'two'), ('baz', 'three')]

zip can take an arbitrary number of sequences, and the number of elements it produces is determined by the **shortest** sequence:

In [None]:
seq3 = [False, True]

list(zip(seq1, seq2, seq3))

[('foo', 'one', False), ('bar', 'two', True)]

####reversed

reversed iterates over the elements of a sequence in reverse order:

In [None]:
list(reversed(range(10)))

[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

##2.   Functions

Functions are the primary and most important method of code organization and reuse in Python. As a rule of thumb, if you anticipate needing to repeat the same or very similar code more than once, it may be worth writing a reusable function. Functions can also help make your code more readable by giving a name to a group of Python statements.

Functions are declared with the **def** keyword. A function contains a block of code with an optional use of the **return** keyword:

```
def my_function(x, y):
    return x + y
```



When a line with return is reached, the value or expression after return is sent to the context where the function was called, for example:

In [None]:
def my_function(x, y):
    return x + y

my_function(1,2)

3

In [None]:
result = my_function(1, 2)

result

3

###Namespaces, Scope, and Local Functions

Functions can access variables both inside and outside the function. Variables inside a function are assigned to the **local** namespace by default, which is created when the function is called and destroyed after the function completes:

```
def func():
    a = []
    for i in range(5):
        a.append(i)
```



When **func()** is called, the empty list a is created, five elements are appended, and then a is destroyed when the function exits. Suppose instead we had declared a as follows:

In [None]:
a = []

def func():
    for i in range(5):
        a.append(i)

In [None]:
func()

a

[0, 1, 2, 3, 4]

In [None]:
func()

a

[0, 1, 2, 3, 4, 0, 1, 2, 3, 4]

Assigning variables outside of the function's scope is possible, but those variables must be declared explicitly using either the **global** or nonlocal keywords:

In [None]:
a = None

def bind_a_variable():
    global a
    a = []

bind_a_variable()

print(a)

[]
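
The **nonlocal** keyword plays the same role for variables of an enclosing function rather than module-level ones. A minimal sketch, where `make_counter` and `increment` are hypothetical names chosen for this example:

```python
# Hypothetical example: a closure that keeps a running count
def make_counter():
    count = 0

    def increment():
        # nonlocal rebinds `count` in the enclosing function's namespace
        nonlocal count
        count += 1
        return count

    return increment

counter = make_counter()
first = counter()
second = counter()
print(first, second)
```

Without the nonlocal declaration, `count += 1` would be treated as an assignment to a new local variable and fail with an UnboundLocalError.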


###Returning Multiple Values

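Python makes it possible to return multiple values from a function with simple syntax. The function actually returns a single tuple, which is then unpacked into the result variables: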

In [None]:
def f():
    a = 5
    b = 6
    c = 7
    return a, b, c

a, b, c = f()
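
To see what the function hands back before any unpacking, you can capture the result in a single variable. The name `return_value` is just an illustrative choice:

```python
def f():
    a = 5
    b = 6
    c = 7
    return a, b, c

# Without unpacking, the result is one tuple holding all three values
return_value = f()
print(return_value)
print(type(return_value).__name__)
```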

###Functions Are Objects

Since Python functions are objects, many constructs can be easily expressed that are difficult to do in other languages. Suppose we were doing some data cleaning and needed to apply a bunch of transformations to the following list of strings:

```
states = ["   Alabama ", "Georgia!", "Georgia", "georgia", "FlOrIda", "south   carolina##", "West virginia?"]
```



Anyone who has ever worked with user-submitted survey data has seen messy results like these. Lots of things need to happen to make this list of strings uniform and ready for analysis: stripping whitespace, removing punctuation symbols, and standardizing proper capitalization. One way to do this is to use built-in string methods along with the **re** standard library module for regular expressions:

In [None]:
import re

def clean_strings(strings):
    result = []
    for value in strings:
        value = value.strip()
        value = re.sub("[!#?]", "", value)
        value = value.title()
        result.append(value)
    return result

In [None]:
states = ["   Alabama ", "Georgia!", "Georgia", "georgia", "FlOrIda", "south   carolina##", "West virginia?"]

clean_strings(states)

['Alabama',
 'Georgia',
 'Georgia',
 'Georgia',
 'Florida',
 'South   Carolina',
 'West Virginia']
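
Because functions are objects, an alternative is to keep the cleaning steps in a list and apply them in turn. The sketch below is one possible arrangement; `remove_punctuation`, `clean_ops`, and `clean_strings_with_ops` are names assumed for this example:

```python
import re

# Hypothetical helper using the same character class as clean_strings above
def remove_punctuation(value):
    return re.sub("[!#?]", "", value)

# Functions are stored in a list like any other object
clean_ops = [str.strip, remove_punctuation, str.title]

def clean_strings_with_ops(strings, ops):
    result = []
    for value in strings:
        # Apply each operation in order to the current string
        for func in ops:
            value = func(value)
        result.append(value)
    return result

states = ["   Alabama ", "Georgia!", "Georgia", "georgia", "FlOrIda",
          "south   carolina##", "West virginia?"]
cleaned = clean_strings_with_ops(states, clean_ops)
print(cleaned)
```

With this layout, adding or reordering a cleaning step only means editing the list, not the loop.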

###Anonymous (Lambda) Functions

Python has support for so-called anonymous or lambda functions, which are a way of writing functions consisting of a single expression, the result of which is the return value. They are defined with the lambda keyword, which has no meaning other than “we are declaring an anonymous function”:

```
def short_function(x):
    return x * 2

equiv_anon = lambda x: x * 2

```
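
Lambda functions are often passed directly as arguments. As a sketch, the following sorts a list of strings by the number of distinct letters in each one; the list contents are arbitrary:

```python
# Arbitrary example strings
strings = ["foo", "card", "bar", "aaaa", "abab"]

# The lambda is the sort key: the count of distinct letters per string
strings.sort(key=lambda x: len(set(x)))
print(strings)
```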


###Generators

Many objects in Python support **iteration**, such as over objects in a list or lines in a file. This is accomplished by means of the iterator protocol, a generic way to make objects iterable. For example, iterating over a dictionary yields the dictionary keys:

In [None]:
some_dict = {"a": 1, "b": 2, "c": 3}

for key in some_dict:
    print(key)

a
b
c


When you write for key in some_dict, the Python interpreter first attempts to create an iterator out of some_dict:

In [None]:
dict_iterator = iter(some_dict)

dict_iterator

<dict_keyiterator at 0x7fe97b055d60>

An iterator is any object that yields elements to the Python interpreter when used in a context like a for loop. Most functions that expect a list or list-like object will also accept any iterable object.

This includes built-in functions such as min, max, and sum, as well as type constructors like list and tuple.

In [None]:
list(dict_iterator)

['a', 'b', 'c']

A **generator** is a convenient way, similar to writing a normal function, to construct a new iterable object. 

Whereas normal functions execute and return a single result at a time, generators can return a sequence of multiple values by pausing and resuming execution each time the generator is used.

To create a generator, use the **yield** keyword instead of return in a function:

In [None]:
def squares(n=10):
    print(f"Generating squares from 1 to {n ** 2}")
    for i in range(1, n + 1):
        yield i ** 2

When you actually call the generator, no code is immediately executed:

In [None]:
gen = squares()

gen

<generator object squares at 0x7fe97b0429e0>

It is not until you request elements from the generator that it begins executing its code:

In [None]:
for x in gen:
    print(x, end=" ")

Generating squares from 1 to 100
1 4 9 16 25 36 49 64 81 100 

The standard library **itertools** module has a collection of generators for many common data algorithms. For example, groupby takes any sequence and a function, grouping consecutive elements in the sequence by return value of the function. Here’s an example:

In [None]:
import itertools

def first_letter(x):
    return x[0]

names = ["Alan", "Adam", "Wes", "Will", "Albert", "Steven"]

for letter, group in itertools.groupby(names, first_letter):
    print(letter, list(group)) # group is an iterator over one run of names

A ['Alan', 'Adam']
W ['Wes', 'Will']
A ['Albert']
S ['Steven']


###Error Handling

**SyntaxError:** indicates a syntax error in the code.

**NameError:** indicates a variable or function name is not defined.

**TypeError:** indicates an operation or function is applied to an object of inappropriate type.

**IndexError:** indicates an index is out of range.

**KeyError:** indicates a dictionary key is not found.

**ValueError:** indicates an operation or function received an argument of the correct type but with an inappropriate value.

**ImportError:** indicates a module cannot be imported.

**AttributeError:** indicates an attribute referenced does not exist.

**AssertionError:** indicates an assertion error in a program. An assertion is a statement used to check whether a given logical expression is true or false. If the assertion is true, then nothing happens and the program continues to run normally. If the assertion is false, then the program raises an AssertionError exception with an optional error message. 

**ZeroDivisionError:** indicates that division by zero occurred.
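
Exceptions like these can be handled with a try/except block. The sketch below catches two of the types listed above; `attempt_float` is a hypothetical helper written for illustration:

```python
# Hypothetical helper: fall back to the input if conversion fails
def attempt_float(x):
    try:
        return float(x)
    except (TypeError, ValueError):
        # ValueError: a string that is not a number
        # TypeError: an object float() cannot convert at all, such as a tuple
        return x

print(attempt_float("1.2345"))
print(attempt_float("something"))
print(attempt_float((1, 2)))
```

Catching only the exception types you expect lets unrelated errors surface instead of being silently swallowed.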

#NumPy Basics: Arrays and Vectorized Computation

NumPy is designed for efficient numerical computations on large arrays of data in Python. There are a number of reasons for this:
- NumPy internally stores data in a contiguous block of memory, independent of other built-in Python objects.
- NumPy's library of algorithms written in the C language can operate on this memory without any type checking or other overhead.
- NumPy arrays use much less memory than built-in Python sequences.
- NumPy operations perform complex computations on entire arrays without the need for Python for loops, which can be slow for large sequences.
- NumPy is faster than regular Python code because its C-based algorithms avoid overhead present with regular interpreted Python code.


Operations on a NumPy array of one million integers are typically much faster than on an equivalent Python list, and the array uses less memory:

In [3]:
import numpy as np

my_arr = np.arange(1_000_000)

my_list = list(range(1_000_000))
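
One rough way to compare the two is to time the same operation on each. This is only a sketch: the repeat count is arbitrary, and the measured times depend on the machine and NumPy version:

```python
import timeit
import numpy as np

my_arr = np.arange(1_000_000)
my_list = list(range(1_000_000))

# Double every element: vectorized for the array, a comprehension for the list
arr_time = timeit.timeit(lambda: my_arr * 2, number=5)
list_time = timeit.timeit(lambda: [x * 2 for x in my_list], number=5)
print(f"array: {arr_time:.4f}s, list: {list_time:.4f}s")

# Both approaches compute the same values
doubled_arr = my_arr * 2
doubled_list = [x * 2 for x in my_list]
```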

##The NumPy ndarray: A Multidimensional Array Object

One of the key features of NumPy is its N-dimensional array object, or ndarray, which is a fast, flexible container for large datasets in Python.

Arrays enable you to perform mathematical operations on whole blocks of data using similar syntax to the equivalent operations between scalar elements.

To give you a flavor of how NumPy enables batch computations with similar syntax to scalar values on built-in Python objects, I first import NumPy and create a small array:

In [4]:
import numpy as np

data = np.array([[1.5, -0.1, 3], [0, -3, 6.5]])
data

array([[ 1.5, -0.1,  3. ],
       [ 0. , -3. ,  6.5]])

In [7]:
data * 10

array([[ 15.,  -1.,  30.],
       [  0., -30.,  65.]])

In [8]:
data + data

array([[ 3. , -0.2,  6. ],
       [ 0. , -6. , 13. ]])

An ndarray is a generic multidimensional container for homogeneous data; that is, all of the elements must be the same type.

Every array has a shape, a tuple indicating the size of each dimension, and a dtype, an object describing the data type of the array:

In [9]:
data.shape

(2, 3)

In [10]:
data.dtype

dtype('float64')

###Creating ndarrays

The easiest way to create an array is to use the array function.

In [11]:
data1 = [6, 7.5, 8, 0, 1]

arr1 = np.array(data1)
arr1

array([6. , 7.5, 8. , 0. , 1. ])

###Arithmetic with NumPy Arrays

Arrays are important because they enable you to express batch operations on data without writing any for loops. NumPy users call this vectorization. Any arithmetic operations between equal-size arrays apply the operation element-wise:

In [12]:
arr = np.array([[1., 2., 3.], [4., 5., 6.]])

In [13]:
arr * arr

array([[ 1.,  4.,  9.],
       [16., 25., 36.]])

In [14]:
arr - arr

array([[0., 0., 0.],
       [0., 0., 0.]])

Comparisons between arrays of the same size yield Boolean arrays:

In [15]:
arr2 = np.array([[0., 4., 1.], [7., 2., 12.]])

In [16]:
arr2 > arr

array([[False,  True, False],
       [ True, False,  True]])

###Basic Indexing and Slicing

In [18]:
arr = np.arange(10)
arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [19]:
arr[5]

5

In [20]:
arr[5:8]

array([5, 6, 7])

In [22]:
arr[5:8] = 12
arr

array([ 0,  1,  2,  3,  4, 12, 12, 12,  8,  9])

###Transposing Arrays and Swapping Axes

Transposing is a special form of reshaping that similarly returns a view on the underlying data without copying anything. Arrays have the **transpose method** and the special **T** attribute:

In [23]:
arr = np.arange(15).reshape((3, 5))
arr

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [24]:
arr.T

array([[ 0,  5, 10],
       [ 1,  6, 11],
       [ 2,  7, 12],
       [ 3,  8, 13],
       [ 4,  9, 14]])
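
That the transpose is a view rather than a copy can be checked directly. A minimal sketch, in which the value 100 is an arbitrary marker written through the view:

```python
import numpy as np

arr = np.arange(15).reshape((3, 5))
transposed = arr.T

# The transpose shares its memory buffer with the original array
shares = np.shares_memory(arr, transposed)

# Writing through the view is therefore visible in the original:
# transposed[0, 1] and arr[1, 0] are the same element
transposed[0, 1] = 100
print(shares, arr[1, 0])
```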

Simple transposing with **.T** is a **special case of swapping** axes. ndarray has the method swapaxes, which takes a pair of axis numbers and switches the indicated axes to rearrange the data:

In [26]:
arr.swapaxes(0, 1)

array([[ 0,  5, 10],
       [ 1,  6, 11],
       [ 2,  7, 12],
       [ 3,  8, 13],
       [ 4,  9, 14]])

##Pseudorandom Number Generation

The numpy.random module supplements the built-in Python random module with functions for efficiently generating whole arrays of sample values from many kinds of probability distributions.

For example, you can get a 4 × 4 array of samples from the standard normal distribution using numpy.random.standard_normal:

In [27]:
samples = np.random.standard_normal(size=(4, 4))
samples

array([[ 0.5566637 ,  0.03978011, -0.21191455, -0.03618346],
       [ 0.45320844, -0.71033148,  1.33866475,  0.51339232],
       [-0.66080802,  0.02602539, -0.43002497,  1.37105736],
       [ 0.93554776, -0.98467951,  1.1779419 , -0.47921325]])

####NumPy random number generator methods

| Method             | Description                                                                                     |
|--------------------|-------------------------------------------------------------------------------------------------|
| permutation        | Return a random permutation of a sequence, or return a permuted range                           |
| shuffle            | Randomly permute a sequence in place                                                            |
| uniform            | Draw samples from a uniform distribution                                                        |
| integers           | Draw random integers from a given low-to-high range                                             |
| standard_normal    | Draw samples from a normal distribution with mean 0 and standard deviation 1                    |
| binomial           | Draw samples from a binomial distribution                                                       |
| normal             | Draw samples from a normal (Gaussian) distribution                                              |
| beta               | Draw samples from a beta distribution                                                           |
| chisquare          | Draw samples from a chi-square distribution                                                     |
| gamma              | Draw samples from a gamma distribution                                                          |
| random             | Draw samples from a uniform distribution over the interval [0, 1)                               |
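
A few of these methods can be tried on an explicit generator object created with `numpy.random.default_rng`. This is a sketch only: the seed value and array sizes are arbitrary assumptions:

```python
import numpy as np

# Seeded generator so the draws are reproducible; the seed value is arbitrary
rng = np.random.default_rng(seed=12345)

normal_samples = rng.standard_normal(size=(2, 3))
dice_rolls = rng.integers(1, 7, size=10)         # upper bound is exclusive
shuffled = rng.permutation(5)                    # permuted range 0..4
uniform_samples = rng.uniform(0.0, 1.0, size=4)  # values in [0, 1)

print(normal_samples.shape, dice_rolls, shuffled, uniform_samples)
```

Using a seeded generator keeps the random state local to `rng` instead of relying on NumPy's global random state.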



##Universal Functions: Fast Element-Wise Array Functions

####Some unary universal functions


| Function  | Description                                                                                      |
|-----------|--------------------------------------------------------------------------------------------------|
| abs, fabs | Compute the absolute value element-wise for integer, floating-point, or complex values           |
| sqrt      | Compute the square root of each element (equivalent to arr ** 0.5)                               |
| square    | Compute the square of each element (equivalent to arr ** 2)                                      |
| exp       | Compute the exponential e^x of each element                                                      |
| log       | Natural logarithm (base e)                                                                       |
| log10     | Log base 10                                                                                      |
| log2      | Log base 2                                                                                       |
| log1p     | Log(1 + x)                                                                                       |
| sign      | Compute the sign of each element: 1 (positive), 0 (zero), or –1 (negative)                       |
| ceil      | Compute the ceiling of each element (i.e., the smallest integer greater than or equal to that number) |
| floor     | Compute the floor of each element (i.e., the largest integer less than or equal to each element) |
| rint      | Round elements to the nearest integer, preserving the dtype                                      |
| modf      | Return fractional and integral parts of array as separate arrays                                 |
| isnan     | Return Boolean array indicating whether each value is NaN (Not a Number)                         |
| isfinite  | Return Boolean array indicating whether each element is finite (non-inf, non-NaN)                |
| isinf     | Return Boolean array indicating whether each element is infinite                                 |
| cos       | Regular trigonometric functions                                                                  |
| cosh      | Hyperbolic trigonometric functions                                                               |
| sin       | Regular trigonometric functions                                                                  |
| sinh      | Hyperbolic trigonometric functions                                                               |
| tan       | Regular trigonometric functions                                                                  |
| tanh      | Hyperbolic trigonometric functions                                                               |
| arccos    | Inverse trigonometric functions                                                                  |
| arccosh   | Inverse hyperbolic trigonometric functions                                                       |
| arcsin    | Inverse trigonometric functions                                                                  |
| arcsinh   | Inverse hyperbolic trigonometric functions                                                       |
| arctan    | Inverse trigonometric functions                                                                  |
| arctanh   | Inverse hyperbolic trigonometric functions                                                       |
| logical_not | Compute truth value of not x element-wise (equivalent to ~arr for Boolean arrays)              |
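
As a minimal sketch of how a few of the unary functions in the table behave, the example below applies them to a small hand-made array. The array values and variable names are illustrative assumptions, not data from this lab.

```python
import numpy as np

# Illustrative input values (chosen for this sketch only)
arr = np.array([-2.5, -0.5, 0.0, 1.5, 4.0])

absolute = np.abs(arr)     # element-wise absolute value
signs = np.sign(arr)       # -1, 0, or 1 for each element
ceilings = np.ceil(arr)    # smallest integer >= each element
floors = np.floor(arr)     # largest integer <= each element

# modf returns two arrays: the fractional parts first, then the integral parts
frac_part, int_part = np.modf(arr)

# isnan and isfinite flag special floating-point values
special = np.array([1.0, np.nan, np.inf])
nan_mask = np.isnan(special)        # True only for the NaN entry
finite_mask = np.isfinite(special)  # True only for the ordinary number

print(absolute, signs, ceilings, floors)
print(frac_part, int_part)
print(nan_mask, finite_mask)
```

Each call returns a new array of the same shape as its input; `modf` is one of the few ufuncs that returns more than one array.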



####Some binary universal functions

| Function             | Description                                                                                                 |
|----------------------|-------------------------------------------------------------------------------------------------------------|
| add                  | Add corresponding elements in arrays                                                                        |
| subtract             | Subtract elements in second array from first array                                                          |
| multiply             | Multiply array elements                                                                                     |
| divide, floor_divide | Divide or floor divide (floor_divide rounds the quotient down, discarding the remainder)                    |
| power                | Raise elements in first array to powers indicated in second array                                           |
| maximum, fmax        | Element-wise maximum; fmax ignores NaN                                                                      |
| minimum, fmin        | Element-wise minimum; fmin ignores NaN                                                                      |
| mod                  | Element-wise modulus (remainder of division)                                                                |
| copysign             | Copy sign of values in second argument to values in first argument                                          |
| greater, greater_equal, less, less_equal, equal, not_equal | Perform element-wise comparison, yielding Boolean array (equivalent to infix operators >, >=, <, <=, ==, !=) |
| logical_and          | Compute element-wise truth value of AND (&) logical operation                                               |
| logical_or           | Compute element-wise truth value of OR (\|) logical operation                                               |
| logical_xor          | Compute element-wise truth value of XOR (^) logical operation                                               |
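
The following sketch illustrates a few of the binary functions from the table, including the difference between `maximum` and `fmax` when NaN is present. The input arrays are made up for illustration.

```python
import numpy as np

# Illustrative inputs containing NaN (assumed values for this sketch)
x = np.array([1.0, np.nan, 3.0, -4.0])
y = np.array([2.0, 2.0, np.nan, 1.0])

larger = np.maximum(x, y)       # NaN propagates when either input is NaN
larger_skipnan = np.fmax(x, y)  # NaN is ignored when the other value is a number

a = np.array([7, -7, 9])
b = np.array([2, 2, 4])

quotient = np.floor_divide(a, b)  # same result as a // b
remainder = np.mod(a, b)          # same result as a % b
is_greater = np.greater(a, b)     # same result as a > b

print(larger, larger_skipnan)
print(quotient, remainder, is_greater)
```

Note that `floor_divide` rounds toward negative infinity, so `-7 // 2` gives `-4` rather than `-3`.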


##Mathematical and Statistical Methods

A set of mathematical functions that compute statistics about an entire array or about the data along an axis are accessible as methods of the array class. You can use aggregations (sometimes called reductions) like **sum, mean, and std (standard deviation)** either by calling the array instance method or using the top-level NumPy function. When you use the NumPy function, like numpy.sum, you have to pass the array you want to aggregate as the first argument.

Here I generate some normally distributed random data and compute some aggregate statistics:

In [33]:
rng = np.random.default_rng(seed=12345)

arr = rng.standard_normal((5, 4))
arr

array([[-1.42382504,  1.26372846, -0.87066174, -0.25917323],
       [-0.07534331, -0.74088465, -1.3677927 ,  0.6488928 ],
       [ 0.36105811, -1.95286306,  2.34740965,  0.96849691],
       [-0.75938718,  0.90219827, -0.46695317, -0.06068952],
       [ 0.78884434, -1.25666813,  0.57585751,  1.39897899]])

In [34]:
arr.mean()

0.0010611661248891013

In [35]:
arr.sum()

0.021223322497782027
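
To make the two calling styles and the "along an axis" behavior concrete, here is a short sketch that reuses the same seeded generator. It relies on the optional `axis` argument that `sum` and `mean` accept; the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(seed=12345)
arr = rng.standard_normal((5, 4))

# The instance method and the top-level function compute the same aggregate
total_method = arr.sum()
total_function = np.sum(arr)

# axis=0 aggregates down the rows, giving one value per column
col_sums = arr.sum(axis=0)    # shape (4,)

# axis=1 aggregates across the columns, giving one value per row
row_means = arr.mean(axis=1)  # shape (5,)

print(total_method, total_function)
print(col_sums.shape, row_means.shape)
```

Passing `axis` reduces the array by one dimension, so a two-dimensional input yields a one-dimensional result.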

####Basic array statistical methods

| Method         | Description                                                                          |
|----------------|--------------------------------------------------------------------------------------|
| sum            | Sum of all the elements in the array or along an axis; zero-length arrays have sum 0 |
| mean           | Arithmetic mean; invalid (returns NaN) on zero-length arrays                         |
| std, var       | Standard deviation and variance, respectively                                        |
| min, max       | Minimum and maximum                                                                  |
| argmin, argmax | Indices of minimum and maximum elements, respectively                                |
| cumsum         | Cumulative sum of elements starting from 0                                           |
| cumprod        | Cumulative product of elements starting from 1                                       |
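
Unlike `sum` or `mean`, the cumulative methods do not collapse the array to a single value; they return the intermediate results. A minimal sketch on a small integer array (values chosen only for illustration):

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6]])

# Without an axis, the array is flattened first
running_total = arr.cumsum()         # [1, 3, 6, 10, 15, 21]

# With an axis, the result keeps the input shape
running_by_col = arr.cumsum(axis=0)  # [[1, 2, 3], [5, 7, 9]]
running_prod = arr.cumprod(axis=1)   # [[1, 2, 6], [4, 20, 120]]

# argmin and argmax return positions in the flattened array by default
smallest_at = arr.argmin()           # 0
largest_at = arr.argmax()            # 5

print(running_total)
print(running_by_col)
print(running_prod)
print(smallest_at, largest_at)
```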


#Getting Started with pandas