## **Variables**

* Variables are used to store data of different types in Python.

In [1]:
# Variable declaration
order_init = 20
print('Initial order:', order_init)

# updating the variable
order_upd = order_init + 10
print('Updated order:', order_upd)

Initial order: 20
Updated order: 30


* In the above cell, 'order_init' is a variable that stores the whole number 20.
* Both 'order_init' and 'order_upd' are of integer (int) type.
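
The type of a variable can be checked with the built-in `type()` function. The sketch below reuses the names from the cell above; `customer_name` is a hypothetical variable added only to show a string for contrast.

```python
# Same variables as in the cell above
order_init = 20
order_upd = order_init + 10

# Hypothetical variable, added only to contrast an int with a string
customer_name = "Asha"

print(type(order_init))     # <class 'int'>
print(type(order_upd))      # <class 'int'>
print(type(customer_name))  # <class 'str'>
```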

## **Operators**

* Operators are special symbols that perform an operation, such as a calculation or a comparison, on one or more values or variables.

#### **Arithmetic operators**
* These include addition ("+"), subtraction ("-"), multiplication ("*") and division ("/").

In [2]:
# Multiplication of variables.
quantity = 30
unit_price = 150.50

total_amnt = quantity*unit_price

print('Total Order Amount:', total_amnt)

Total Order Amount: 4515.0
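
The other arithmetic operators work the same way. The sketch below reuses `quantity` and `unit_price` from the cell above; floor division and remainder are shown as extras and are not covered elsewhere in this notebook.

```python
quantity = 30
unit_price = 150.50

print(quantity + 5)           # addition: 35
print(quantity - 5)           # subtraction: 25
print(quantity * unit_price)  # multiplication: 4515.0
print(quantity / 4)           # division always returns a float: 7.5
print(quantity // 4)          # floor division drops the fractional part: 7
print(quantity % 4)           # remainder after division: 2
```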


#### **Comparison operators**

* These operators are used to compare two quantities.
* Examples include "==" (equal to) and "!=" (not equal to); each comparison evaluates to True or False.

In [3]:
# Taking the user input
product = input('Enter the product: ')

# Comparison operator (==) used in the if/elif conditions
if product == 'phone':
    print('You will receive a 20% discount on this product!!')
elif product == 'chair':
    print('You will receive a 10% discount on this product!!')
else:
    print('Sorry! There is no discount on this product.')

Enter the product: chair
You will receive a 10% discount on this product!!
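
A comparison can also be evaluated on its own, without an `if` statement. In this minimal sketch, `min_bulk_quantity` is a hypothetical threshold chosen only for illustration.

```python
quantity = 30
min_bulk_quantity = 25  # hypothetical threshold, not part of the original data

print(quantity == min_bulk_quantity)  # False
print(quantity != min_bulk_quantity)  # True
print(quantity > min_bulk_quantity)   # True
print(quantity <= min_bulk_quantity)  # False
```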


#### **Assignment operators**

* These are used to assign a value to a variable.
* Examples of assignment operators are "=" and "+=".

In [4]:
# Here order_id is a variable that is assigned the value "WOWORD001".
order_id = "WOWORD001"

print("Order ID:", order_id)

Order ID: WOWORD001
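
The "+=" operator updates a variable in place by combining an arithmetic operation with assignment. The variable `order_total` below is hypothetical and used only to illustrate the idea.

```python
order_total = 100    # hypothetical starting value

order_total += 50    # same as: order_total = order_total + 50
print(order_total)   # 150

order_total -= 30    # same as: order_total = order_total - 30
print(order_total)   # 120
```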


## **Data structures**

* Data structures are ways to store and organize data so that it is easy to process and operate on.
* Some examples are lists, tuples, dictionaries and sets.
* Let us cover them one by one.

#### **Lists**

* A list is a mutable data structure used to store data.

In [5]:
# Lists
order_id_list = ["WOWORD001", "WOWORD002", "WOWORD003", "WOWORD004"]

# let's print the list we just created
print('Order IDs:', order_id_list)

Order IDs: ['WOWORD001', 'WOWORD002', 'WOWORD003', 'WOWORD004']


In [6]:
# The "append" method of list.
order_id_list.append("WOWORD005")
print('Order IDs:', order_id_list)

Order IDs: ['WOWORD001', 'WOWORD002', 'WOWORD003', 'WOWORD004', 'WOWORD005']


* The list of order IDs now contains five entries.
* Lists for the other attributes can be created in the same way.
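
Because lists are mutable, their elements can also be replaced or removed after creation. A minimal sketch, using a shortened version of the list above; "WOWORD000" is a made-up ID used only for the example.

```python
order_id_list = ["WOWORD001", "WOWORD002", "WOWORD003"]

# Replace an element by its index (indexing starts at 0)
order_id_list[0] = "WOWORD000"

# Remove an element by its value
order_id_list.remove("WOWORD002")

print(order_id_list)       # ['WOWORD000', 'WOWORD003']
print(len(order_id_list))  # 2
```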

#### **Tuples**

* A tuple is an immutable data structure used in Python to store data; once created, its elements cannot be changed.

In [7]:
sample_tuple=(23,45,56,12)
sample_tuple

(23, 45, 56, 12)
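
Immutability can be seen by trying to change an element. The sketch below reuses `sample_tuple` and catches the error that Python raises.

```python
sample_tuple = (23, 45, 56, 12)

# Reading an element by index works just as it does for lists
print(sample_tuple[0])  # 23

# Assigning to an element raises a TypeError because tuples are immutable
try:
    sample_tuple[0] = 99
except TypeError as err:
    print('Tuples cannot be changed:', err)
```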

#### **Dictionaries**

* A dictionary is a data structure that stores data as key-value pairs.
* It is created by using curly braces {}.

In [8]:
sample_dict = {"a":23, "b":45}
sample_dict

{'a': 23, 'b': 45}
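
Values in a dictionary are looked up, added and updated through their keys. A short sketch based on `sample_dict`; the keys "c" and "z" are introduced here only for illustration.

```python
sample_dict = {"a": 23, "b": 45}

print(sample_dict["a"])         # look up a value by its key: 23

sample_dict["c"] = 56           # add a new key-value pair
sample_dict["a"] = 30           # update the value of an existing key
print(sample_dict)              # {'a': 30, 'b': 45, 'c': 56}

# get() returns a default instead of raising an error for a missing key
print(sample_dict.get("z", 0))  # 0
```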

#### **Sets**

* A set is an unordered, mutable and iterable collection that holds only unique entries.

In [9]:
# Creating the sample set
set_a = {3,4,5,6,5}
# While calling the set_a it will only show the unique values
set_a

{3, 4, 5, 6}
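
Sets also support membership tests and operations such as union and intersection. In this sketch `set_b` is a hypothetical second set, and `sorted()` is used only to print the results in a predictable order.

```python
set_a = {3, 4, 5, 6, 5}
set_b = {5, 6, 7}  # hypothetical second set

set_a.add(10)                 # sets are mutable, so elements can be added
print(4 in set_a)             # membership test: True

print(sorted(set_a & set_b))  # intersection: [5, 6]
print(sorted(set_a | set_b))  # union: [3, 4, 5, 6, 7, 10]
```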

## **Loops**

* A loop is a sequence of instructions that is repeated until a certain condition is reached.
* The "for" and "while" loops are the two loop statements in Python.

In [10]:
# Let us cover "for" loop here.
superstore_data = ["Office Supplies", "Others"]
for i in superstore_data:
    print('Product Ordered:', i)
    if i == 'Office Supplies':
        print('Sorry! The product is currently not being shipped.\n')
    else:
        print('Your order will be shipped at the earliest.\n')

Product Ordered: Office Supplies
Sorry! The product is currently not being shipped.

Product Ordered: Others
Your order will be shipped at the earliest.
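
A "while" loop repeats as long as its condition stays true. The sketch below uses a hypothetical counter, `pending_orders`, to show the pattern.

```python
pending_orders = 3  # hypothetical number of orders waiting to be shipped

# The loop body runs until the condition becomes False
while pending_orders > 0:
    print('Orders left to ship:', pending_orders)
    pending_orders -= 1  # without this update the loop would never end

print('All orders shipped.')
```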



## **Context**

**WOW Superstore sells different products like office supplies, furniture and appliances. For each order that is placed, the superstore keeps a record of various attributes related to the product, like state, city, category, subcategory and quantity.**

## **Statistics**

* We will begin with measures of central tendency in the stats section.
* After this, measures of dispersion and correlation will be covered.

In [11]:
# pandas is a third-party library; install it first if it is missing (for example: pip install pandas)
import pandas as pd

In [None]:
# loading the data file into a pandas dataframe
data = pd.read_csv('WOW.csv')

**Let's take a quick look at the data.**

In [None]:
# viewing the first 5 rows of the data
data.head()

In [None]:
# viewing the last 5 rows of the data
data.tail()

#### **Mean**
* Let us start with the mean of the Sales column.
* It is the average value of all records in Sales.
* The mean Sales value is 192.3, as shown in the output of the cell below.

In [None]:
# Calculate mean of Sales
data.Sales.mean()

#### **Median**
* Median is the value that lies in the middle of the records arranged in either increasing or decreasing order. 
* Let us find out the median of the Sales.

In [None]:
# Calculate median of Sales
data.Sales.median()

#### **Mode**
* Mode is the most frequent value in the given set of records.
* Let us find the mode of the Sales column.

In [None]:
# Calculate mode of Sales. mode() returns a Series because several values can tie for most frequent.
data.Sales.mode()
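
The three measures can be checked by hand on a small sample. The sketch below uses Python's built-in `statistics` module and a hypothetical list of sales values, not the WOW data.

```python
import statistics

# Hypothetical sales values, used only for illustration
sales = [120.0, 80.0, 200.0, 80.0, 150.0]

print(statistics.mean(sales))    # sum of values / number of values: 126.0
print(statistics.median(sales))  # middle value of the sorted list: 120.0
print(statistics.mode(sales))    # most frequent value: 80.0
```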

#### **Range**
* It is one of the measures that shows the dispersion of the data points.
* The range of a set of numbers is the difference between the maximum and the minimum values in the set.
* A higher range indicates higher dispersion overall, although it is sensitive to extreme values.

In [None]:
sales_range = data.Sales.max() - data.Sales.min()  # avoid the name `range`, which would shadow Python's built-in
sales_range
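
As a quick check of the definition, the range can be computed on a small hypothetical list with the built-in `max()` and `min()` functions.

```python
# Hypothetical sales values, used only for illustration
sales = [120.0, 80.0, 200.0, 80.0, 150.0]

# Range = maximum value - minimum value
sales_range = max(sales) - min(sales)
print(sales_range)  # 120.0
```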

In [None]:
import numpy as np

#### **Variance**
* Variance is a dispersion measure in statistics that shows the spread of the records around a fixed value. In general, the mean is taken as that value, and the variance is the average of the squared deviations from it.
* Let us find the variance of the Sales column in the data.

In [None]:
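# np.var computes the population variance (ddof=0); data.Sales.var() gives the sample variance (ddof=1)
np.var(data.Sales)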

#### **Standard Deviation**
* Standard deviation also shows the dispersion in the data.
* It is the square root of the variance, so it is expressed in the same units as the data points.
* Let us find the standard deviation of the Sales column in the data.

In [None]:
np.std(data.Sales)
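
Variance and standard deviation come in a population form (divide by n) and a sample form (divide by n - 1). The sketch below shows both with the built-in `statistics` module on a hypothetical list of sales values.

```python
import statistics

# Hypothetical sales values, used only for illustration
sales = [120.0, 80.0, 200.0, 80.0, 150.0]

# Population variance: mean of the squared deviations from the mean
print(statistics.pvariance(sales))  # 2064.0

# Sample variance: sum of squared deviations divided by (n - 1)
print(statistics.variance(sales))   # 2580.0

# Standard deviation is the square root of the corresponding variance
pop_std = statistics.pstdev(sales)
sample_std = statistics.stdev(sales)
print(pop_std, sample_std)
```

The sample form is always the larger of the two, and the gap narrows as the number of records grows.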

#### **Interquartile Range**
* Interquartile range is the difference between the third quartile and the first quartile of the given records.
* It is one of the measures of dispersion of the records. Let us compute it for Sales.

In [None]:
# Note: NumPy 1.22+ uses `method`; older versions call this argument `interpolation`
# First quartile (Q1)
Q1 = np.percentile(data.Sales, 25, method='midpoint')

# Third quartile (Q3)
Q3 = np.percentile(data.Sales, 75, method='midpoint')

# Interquartile range (IQR)
IQR = Q3 - Q1

print(IQR)
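
The same idea can be tried on a small hypothetical sample with the built-in `statistics.quantiles` function (Python 3.8+). The `inclusive` method is an assumption made for this sketch; it does not always give the same quartiles as the `midpoint` rule used above.

```python
import statistics

# Hypothetical sales values, used only for illustration
sales = [120.0, 80.0, 200.0, 80.0, 150.0]

# n=4 splits the data into quartiles and returns the three cut points
q1, q2, q3 = statistics.quantiles(sales, n=4, method='inclusive')

iqr = q3 - q1
print(q1, q3)  # 80.0 150.0
print(iqr)     # 70.0
```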

In [None]:
import matplotlib.pyplot as plt

In [None]:
data.head()

In [None]:
data.columns

## **Boxplot**

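* A boxplot is a visual representation of the five-number summary of a given set of data points.
* It includes the minimum, the first quartile, the median, the third quartile and the maximum value in the data.
* By default, matplotlib draws the whiskers only as far as the most extreme points within 1.5 times the interquartile range of the box and plots anything beyond them as individual outlier points.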

In [None]:
# A sample boxplot is shown here
plt.figure(figsize=(10,6))
plt.boxplot(data.Quantity)
plt.show()

#### **Correlation**
* Correlation is a measure of the strength and direction of the linear relationship between two features.
* Let us find it for all the numeric features in the current data.

In [None]:
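# numeric_only=True (pandas 1.5+) skips non-numeric columns, which would otherwise raise an error in pandas 2.x
data.corr(numeric_only=True)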

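* The table above shows the pairwise correlation coefficients of the numeric variables.
* Each coefficient lies between -1 and 1: values close to 1 indicate a strong positive relationship, values close to -1 indicate a strong negative relationship, and values near 0 indicate little or no linear relationship.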
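
To make the coefficient concrete, here is a minimal sketch that computes the Pearson correlation by hand. The helper `pearson` and the three small lists are hypothetical and are not taken from the WOW data.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of the products of paired deviations from the means
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

# Hypothetical values, used only for illustration
quantity = [1, 2, 3, 4, 5]
sales = [10.0, 20.0, 30.0, 40.0, 50.0]     # rises with quantity
discount = [50.0, 40.0, 30.0, 20.0, 10.0]  # falls as quantity rises

print(pearson(quantity, sales))     # close to 1: strong positive relationship
print(pearson(quantity, discount))  # close to -1: strong negative relationship
```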

* One of the most useful pandas methods for statistical description is the describe method.
* Let us apply it to the data; it summarizes every numeric column, including Sales.

In [None]:
data.describe()

* As an output, we get, for each numeric variable, the count of entries, the mean, the standard deviation, the minimum, the first, second and third quartiles (25%, 50% and 75%) and the maximum.