# DATAFRAME
---

A DataFrame is a 2-dimensional data structure of rows and columns, similar to a spreadsheet.

## Creating DataFrame from Lists

You can build a DataFrame by adding columns to it one at a time. Each column is created from a list of values.

In [93]:
import pandas as pd

In [94]:
# creating an empty DataFrame
company = pd.DataFrame()

# creating data collections as lists
company_months = ['January','February','March', 'April', 'May', 'June', "July"]
company_income = [23500,19700,31150, 234564, 843759, 934565, 398703]

# adding columns to DataFrame
company['Month'] = company_months
company['Income'] = company_income

# displaying DataFrame contents
company

Unnamed: 0,Month,Income
0,January,23500
1,February,19700
2,March,31150
3,April,234564
4,May,843759
5,June,934565
6,July,398703
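
Every list assigned as a column has to contain as many items as the DataFrame has rows. The sketch below illustrates this with a hypothetical `demo` DataFrame and shortened data; the names `demo` and `length_error` are assumptions made only for this example.

```python
import pandas as pd

# hypothetical DataFrame used only for this illustration
demo = pd.DataFrame()
demo['Month'] = ['January', 'February', 'March']

# a list of matching length is accepted as a new column
demo['Income'] = [23500, 19700, 31150]

# a list of a different length is rejected with a ValueError
length_error = None
try:
    demo['Tax'] = [1200, 2350]
except ValueError as error:
    length_error = error

print(demo.shape)
print(type(length_error).__name__)
```

Checking the lengths of the lists before adding them avoids this error.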


### Tasks

Complete the data by adding the remaining months of the first half of the year and the income earned in these months. Then, display the data again.

In [95]:
company

Unnamed: 0,Month,Income
0,January,23500
1,February,19700
2,March,31150
3,April,234564
4,May,843759
5,June,934565
6,July,398703


Display descriptive statistics for income earned.

In [96]:
company['Income'].describe()

count         7.000000
mean     355134.428571
std      391235.812866
min       19700.000000
25%       27325.000000
50%      234564.000000
75%      621231.000000
max      934565.000000
Name: Income, dtype: float64

Display the total company income for the months of the second quarter.

In [97]:
company[3:6].sum()

Month     AprilMayJune
Income         2012888
dtype: object
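
Summing the whole slice also concatenates the month names, as the output above shows. One possible way to sum only the income is sketched below. The data is rebuilt so the block runs on its own, and the names `second_quarter` and `q2_total` are assumptions made for this example.

```python
import pandas as pd

# rebuilding the DataFrame from the lists used earlier
company = pd.DataFrame()
company['Month'] = ['January', 'February', 'March', 'April', 'May', 'June', 'July']
company['Income'] = [23500, 19700, 31150, 234564, 843759, 934565, 398703]

# selecting the Income column first, then rows 3 to 5 (April, May, June) by position
second_quarter = company['Income'].iloc[3:6]

# summing only the numeric values
q2_total = second_quarter.sum()
print(q2_total)
```

Selecting the column before summing keeps the result a single number instead of a mixed Series.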

Display descriptive statistics for the income earned in the months of the second quarter.

In [98]:
company[3:6].describe()

Unnamed: 0,Income
count,3.0
mean,670962.666667
std,380649.812308
min,234564.0
25%,539161.5
50%,843759.0
75%,889162.0
max,934565.0


## Creating DataFrame from 2D List

Instead of adding each column separately, you can create a DataFrame from a two-dimensional (2D) list, where each inner list is one row. Note that you then need to name the columns yourself.

In [99]:
# creating data collection as 2D list
company_data = [
    ['January',23500],
    ['February',19700],
    ['March',31150]
    ]

# creating DataFrame with column names
company = pd.DataFrame(data=company_data, columns=['Month','Income'])

# displaying DataFrame contents
company

Unnamed: 0,Month,Income
0,January,23500
1,February,19700
2,March,31150
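
To see why the column names matter, the sketch below creates the same DataFrame without the `columns` argument. The names `unnamed` and `default_labels` are assumptions made for this example.

```python
import pandas as pd

# same 2D list as above
company_data = [
    ['January', 23500],
    ['February', 19700],
    ['March', 31150],
]

# without the columns argument, pandas labels the columns with integers
unnamed = pd.DataFrame(data=company_data)
default_labels = list(unnamed.columns)
print(default_labels)

# column names can also be assigned after the DataFrame is created
unnamed.columns = ['Month', 'Income']
print(list(unnamed.columns))
```

Passing `columns` at creation time, as in the cell above, is usually the shorter route.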


### Tasks

The table below lists the university's students. 

StudentID | Name        | Surname      | Age | Program
----------|-------------|--------------|-----|-----------
902311    | Peter       | Red          | 21  | Accounting   
915027    | Sofia       | White        | 19  | Computer Science
900004    | Jack        | Grey         | 24  | Accounting
994031    | Mark        | Brown        | 22  | Engineering         

Create a DataFrame using a 2D list. Then, display the contents of the DataFrame.

In [100]:
# creating data collection as 2D list
uni_students = [
    [902311,'Peter', 'Red', 21, 'Accounting'],
    [915027, 'Sofia', 'White', 19, 'Computer Science'],
    [900004,'Jack', 'Grey', 24, 'Accounting'],
    [994031,'Mark', 'Brown', 22, 'Engineering']
    ]

# creating DataFrame with column names
uni_students = pd.DataFrame(data=uni_students, columns=['StudentID','Name', 'Surname', 'Age', 'Program'])
uni_students

Unnamed: 0,StudentID,Name,Surname,Age,Program
0,902311,Peter,Red,21,Accounting
1,915027,Sofia,White,19,Computer Science
2,900004,Jack,Grey,24,Accounting
3,994031,Mark,Brown,22,Engineering


Calculate and display the average age of students.

In [101]:
# mean() divides by the actual number of rows, so no hard-coded count is needed
uni_students['Age'].mean()

21.5

## Creating DataFrame from Dictionary

A dictionary stores key-value pairs, with each key separated from its value by a colon. Each pair becomes one column of the DataFrame: the key is the column name and the value is the column's data (a list). Below is an example of creating a DataFrame from a dictionary.

In [102]:
# creating data collection as a dictionary
company_data = {
    'Month':['January','February','March'],
    'Income':[23500,19700,31150],
    'Tax':[1200, 2350, 995]
    }

# creating DataFrame
company = pd.DataFrame(data=company_data)

# displaying DataFrame contents
company

Unnamed: 0,Month,Income,Tax
0,January,23500,1200
1,February,19700,2350
2,March,31150,995


### Tasks

Complete the DataFrame by adding a 'Tax' column along with the following values: 1200, 2350, 995. Then, display DataFrame contents.
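
If the DataFrame had been created without the 'Tax' column, the task could be approached as in the sketch below. It assumes a starting dictionary that holds only 'Month' and 'Income'.

```python
import pandas as pd

# assumed starting point: a DataFrame without the 'Tax' column
company = pd.DataFrame(data={
    'Month': ['January', 'February', 'March'],
    'Income': [23500, 19700, 31150],
})

# adding the new column from a list with one value per row
company['Tax'] = [1200, 2350, 995]

# displaying DataFrame contents
print(company)
```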

Display descriptive statistics for the company income and tax.

In [103]:
company.describe()

Unnamed: 0,Income,Tax
count,3.0,3.0
mean,24783.333333,1515.0
std,5831.880772,730.359501
min,19700.0,995.0
25%,21600.0,1097.5
50%,23500.0,1200.0
75%,27325.0,1775.0
max,31150.0,2350.0


## Creating DataFrame from File

Creating a DataFrame from the data in a CSV file takes a single call to the pandas read_csv() function, which reads the file and uses its first row as the column names by default.

In [104]:
sales = pd.read_csv('product_sales.csv')
sales

Unnamed: 0,SaleRep,Region,Orders,TotalSales
0,Felice Lunck,West,218,44489
1,Doralynn Pesak,West,233,61035
2,Madelle Martland,East,264,62603
3,Yasmin Myhan,South,110,59377
4,Marmaduke Webbe,East,188,78771
5,Christiano Vero,East,265,68506
6,Cecelia Jealous,West,93,53634
7,Isaak Housiaux,East,189,62455
8,Derril Howland,East,385,73460
9,Judon Allom,West,230,51067
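
`read_csv()` also accepts a file-like object, so the function can be tried without the file itself. The sketch below assumes a comma-separated layout with a header row and reuses the first three rows shown above. The actual contents of product_sales.csv are not reproduced here, and the names `csv_text` and `sample` are assumptions made for this example.

```python
import io
import pandas as pd

# assumed CSV text: a header row plus the first three rows shown above
csv_text = """SaleRep,Region,Orders,TotalSales
Felice Lunck,West,218,44489
Doralynn Pesak,West,233,61035
Madelle Martland,East,264,62603
"""

# read_csv takes a file path or any file-like object
sample = pd.read_csv(io.StringIO(csv_text))
print(sample)
```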


### Tasks

For sales data, calculate and display the average number of orders.

In [105]:
# mean() divides by the actual number of rows (10)
sales['Orders'].mean()

217.5

For sales data, calculate and display the total sales value.

In [106]:
sales['TotalSales'].sum()

615397