# NOT COMPLETE

### Technical Note on Series Construction
The command **`pd.Series([1, 3, 3, 7])`** looks like a call to a function named **Series** in the pandas module. All previous code in these notebooks that ended in parentheses was calling a function. Technically, **`pd.Series`** is not a function but a class. The expression **`pd.Series([1, 3, 3, 7])`** creates an **instance** of the **Series** class by passing a four-element list to the Series constructor.

For the practical purposes of this notebook, there is no difference between calling a class constructor and calling a function that returns an object. For example, **`np.arange`** is technically a function, so **`np.arange(100)`** returns a NumPy ndarray, while **`pd.Series([1, 3, 3, 7])`** calls the Series constructor and creates a pandas Series object.

By convention, class names are capitalized, like **`Series`**, while function names, like **`arange`**, are not.

In [1]:
import numpy as np
import pandas as pd

In [2]:
type(np.arange), type(pd.Series)

(builtin_function_or_method, type)
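
The same distinction can be checked a little more directly. The sketch below is only an illustration: it assumes the standard-library `inspect` module, which is not otherwise used in this notebook, and the variable names are made up for the example.

```python
import inspect

import numpy as np
import pandas as pd

# pd.Series is a class; np.arange is callable but is not a class
series_is_class = inspect.isclass(pd.Series)
arange_is_class = inspect.isclass(np.arange)

# Calling the class runs its constructor and returns an instance of it
s = pd.Series([1, 3, 3, 7])
# Calling the function simply returns an object, here an ndarray
arr = np.arange(100)

print(series_is_class, arange_is_class)
print(isinstance(s, pd.Series), isinstance(arr, np.ndarray))
```

Either way, the parentheses mean "call this object", which is why the two look identical in use.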

### Construct a series with a given index
A Series can be constructed with a specific user-defined index. The **data** and **index** arguments can be used to explicitly define each.

In [3]:
# construct series by explicitly defining the values and the index
s = pd.Series(data=[1,3,3,77], index=['a', 'b', 'c', 'd'])
s

a     1
b     3
c     3
d    77
dtype: int64

In [4]:
# create a series with an index of different data types
# almost never done in practice but shows what is possible
s = pd.Series(data=[1,3,3,77], index=[{'a', 'b'}, ('b','c'), range(10), 'd'])
s

{a, b}                             1
(b, c)                             3
(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)     3
d                                 77
dtype: int64

In [5]:
s.index

Index([{'a', 'b'}, ('b', 'c'), (0, 1, 2, 3, 4, 5, 6, 7, 8, 9), 'd'], dtype='object')
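
For comparison, here is a minimal sketch of what happens when no index is passed, next to the labeled version from above. The names `s_default` and `s_labeled` are hypothetical and used only for this illustration.

```python
import pandas as pd

# No index given: pandas supplies a default integer index starting at 0
s_default = pd.Series([1, 3, 3, 77])

# Index given explicitly: the labels are used as provided
s_labeled = pd.Series(data=[1, 3, 3, 77], index=['a', 'b', 'c', 'd'])

print(s_default.index)
print(s_labeled.index)

# A value can be looked up by its label
print(s_labeled['d'])
```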

### Using NumPy to create a Series with random values
A common way to create a Series for practice is to fill it with NumPy random values. NumPy has an excellent random module that provides more functionality than the built-in Python random module for creating all sorts of random numbers from different distributions.

Below, a Series of length 100 is created with random numbers between 0 and 1 using the `np.random.rand` function. The index starts at 10.
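
One possible way to write this is sketched here. The variable `n` and the use of `range` to build the index are assumptions made for the example; any sequence of 100 labels starting at 10 would work.

```python
import numpy as np
import pandas as pd

n = 100  # length of the Series

# np.random.rand(n) returns n floats drawn uniformly from [0, 1)
# range(10, 10 + n) gives the index labels 10, 11, ..., 109
s = pd.Series(np.random.rand(n), index=range(10, 10 + n))

print(s.head())
```

Because the values are random, the numbers printed will differ on each run.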

### Manually construct a DataFrame
It is quite rare that you will actually need to construct a DataFrame manually.  Most frequently you will be reading external flat files, getting data from the web or reading from relational databases. Nonetheless, it does occur.

Just as **`pd.Series`** is technically a constructor, so is **`pd.DataFrame`**. One way to construct a DataFrame is to pass it a dictionary with strings as the **keys** and lists as the **values**.

In [6]:
# Let's import our packages.
import pandas as pd
import numpy as np

In [7]:
# create dataframe from a dictionary of lists. The keys are the column names NOT the indices
df = pd.DataFrame({'name':['Ted', 'Ned', 'Jed'], 'Phone':['Samsung', 'Samsung', 'IOS'], 'Favorite Number':[99, 7, 4]})
df

   Favorite Number    Phone  name
0               99  Samsung   Ted
1                7  Samsung   Ned
2                4      IOS   Jed


### Closely Examine Output
It is clear that the DataFrame has three columns and three rows. What's not as obvious is that there is an index for the rows with the labels 0, 1 and 2. The column names (which are also stored in an Index object) are Favorite Number, Phone and name.
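
Both sets of labels can be inspected directly through the `index` and `columns` attributes. The sketch below rebuilds the same DataFrame so that it runs on its own.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Ted', 'Ned', 'Jed'],
                   'Phone': ['Samsung', 'Samsung', 'IOS'],
                   'Favorite Number': [99, 7, 4]})

# Row labels: a default integer index, since none was supplied
print(df.index)

# Column labels: also held in an Index object
print(df.columns)
```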

### Why did the column order get changed?
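If you looked closely, the order of the columns in the output above does not match the order in which they were written in the dictionary. When this output was produced, dictionaries were unordered, so the order of the keys in the dictionary was not guaranteed to match the order of the columns in the DataFrame produced from it. (Since Python 3.7, dictionaries preserve insertion order, and recent versions of pandas keep the columns in that order.)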

It is possible to specify the order explicitly in the constructor with the **`columns`** parameter.

In [8]:
# Let's fix the column order 
df = pd.DataFrame({'name':['Ted', 'Ned', 'Jed'], 'Phone':['Samsung', 'Samsung', 'IOS'], 'Favorite Number':[99, 7, 4]},
                 columns=['name', 'Phone', 'Favorite Number'])
df

   name    Phone  Favorite Number
0   Ted  Samsung               99
1   Ned  Samsung                7
2   Jed      IOS                4
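
The column order can also be changed after the DataFrame has been built, by selecting the columns with a list in the desired order. This is a small sketch of that alternative, not a replacement for the `columns` parameter; the name `df_reordered` is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Ted', 'Ned', 'Jed'],
                   'Phone': ['Samsung', 'Samsung', 'IOS'],
                   'Favorite Number': [99, 7, 4]})

# Selecting with a list of column names returns a new DataFrame
# whose columns appear in the order given in the list
df_reordered = df[['Favorite Number', 'name', 'Phone']]

print(df_reordered)
```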


### Creating a DataFrame from NumPy
An easy way to create a DataFrame is from a 2-d NumPy array. Below, a 10x5 NumPy array is used, with both column labels and a row index provided.

In [9]:
df = pd.DataFrame(np.random.rand(10,5), columns=['one', 'two', 'three', 'four', 'five'], index=list('abcdefghij'))
df

        one       two     three      four      five
a  0.205787  0.000864  0.301004  0.111961  0.078666
b  0.840670  0.003686  0.726316  0.691093  0.511413
c  0.151676  0.217502  0.621104  0.002138  0.408129
d  0.492448  0.967101  0.600174  0.790895  0.641191
e  0.813088  0.869537  0.917518  0.807937  0.156791
f  0.656279  0.672472  0.408419  0.241512  0.677590
g  0.055616  0.588628  0.073936  0.122838  0.333524
h  0.550952  0.279748  0.612938  0.695909  0.930667
i  0.708171  0.352467  0.936233  0.150488  0.723554
j  0.719625  0.544120  0.087835  0.960436  0.286158
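
Since the values are random, they change on every run. If repeatable numbers are wanted, one option is a seeded generator, sketched below. This assumes NumPy's `default_rng` interface rather than the `np.random.rand` call used above, and the seed value 0 is arbitrary.

```python
import numpy as np
import pandas as pd

# A seeded generator produces the same numbers on every run
rng = np.random.default_rng(0)
values = rng.random((10, 5))  # 10 rows, 5 columns, floats in [0, 1)

df = pd.DataFrame(values,
                  columns=['one', 'two', 'three', 'four', 'five'],
                  index=list('abcdefghij'))

# shape is (number of rows, number of columns)
print(df.shape)
```

The number of column labels must match the second dimension of the array, and the number of index labels must match the first.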
