# Data Basics

## What Is Data?


Data are raw facts or observations, recorded as numbers, text, images, or other formats. They are collected and analyzed to gain insights and support informed decisions in areas such as science, business, and technology.

In [1]:
# Importing the NumPy and pandas libraries with the aliases 'np' and 'pd'
import numpy as np  
import pandas as pd
# Importing the pyplot module from the Matplotlib library with the alias 'plt'
import matplotlib.pyplot as plt  
# Jupyter Notebook magic command to display Matplotlib plots inline
%matplotlib inline  
# Importing the Seaborn library with the alias 'sns'
import seaborn as sns  

In [2]:
# Load the CSV data and display the first 8 rows

df = pd.read_csv('Dataset/ST.csv')
df.head(8)

Unnamed: 0,Customer ID,Age,Gender,Item Purchased,Category,Purchase Amount (USD),Location,Size,Color,Season,Review Rating,Subscription Status,Shipping Type,Discount Applied,Promo Code Used,Previous Purchases,Payment Method,Frequency of Purchases
0,1,55,Male,Blouse,Clothing,53,Kentucky,L,Gray,Winter,3.1,Yes,Express,Yes,Yes,14,Venmo,Fortnightly
1,2,19,Male,Sweater,Clothing,64,Maine,L,Maroon,Winter,3.1,Yes,Express,Yes,Yes,2,Cash,Fortnightly
2,3,50,Male,Jeans,Clothing,73,Massachusetts,S,Maroon,Spring,3.1,Yes,Free Shipping,Yes,Yes,23,Credit Card,Weekly
3,4,21,Male,Sandals,Footwear,90,Rhode Island,M,Maroon,Spring,3.5,Yes,Next Day Air,Yes,Yes,49,PayPal,Weekly
4,5,45,Male,Blouse,Clothing,49,Oregon,M,Turquoise,Spring,2.7,Yes,Free Shipping,Yes,Yes,31,PayPal,Annually
5,6,46,Male,Sneakers,Footwear,20,Wyoming,M,White,Summer,2.9,Yes,Standard,Yes,Yes,14,Venmo,Weekly
6,7,63,Male,Shirt,Clothing,85,Montana,M,Gray,Fall,3.2,Yes,Free Shipping,Yes,Yes,49,Cash,Quarterly
7,8,27,Male,Shorts,Clothing,34,Louisiana,L,Charcoal,Winter,3.2,Yes,Free Shipping,Yes,Yes,19,Credit Card,Weekly


## Data Variable

A "data variable" is a container that holds information. It can hold a single value, such as a number or a string, or a collection of values, such as a list, tuple, dictionary, or a more complex structure like a DataFrame or NumPy array used in data analysis. The variables in a dataset fall into two broad types: categorical and numeric.
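
As a minimal sketch of the containers mentioned above, the cell below stores a single value, a list, a dictionary, a NumPy array, and a small DataFrame. The variable names are illustrative only, and the values are copied from the first rows of the ST.csv preview.

```python
import numpy as np
import pandas as pd

# Single values: one number and one string
age = 55
item = "Blouse"

# Collections of values
sizes = ["S", "M", "L"]  # list
first_purchase = {"Item Purchased": "Blouse", "Purchase Amount (USD)": 53}  # dictionary

# More complex structures used in data analysis
amounts = np.array([53, 64, 73])  # NumPy array
sample = pd.DataFrame(
    {"Age": [55, 19, 50], "Item Purchased": ["Blouse", "Sweater", "Jeans"]}
)  # DataFrame

print(type(age).__name__, type(sizes).__name__,
      type(amounts).__name__, type(sample).__name__)
# prints: int list ndarray DataFrame
```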

### Dataset : ST.csv (Shopping Trends)

[Data Source](https://www.kaggle.com/datasets/iamsouravbanerjee/customer-shopping-trends-dataset)

#### Categorical :
- Nominal : Nominal scales label values without assigning any quantitative value or order; the values serve purely as names.

Example from the dataset : Location, Category, Gender, and other data

- Ordinal : Ordinal scales have a meaningful order, but the differences between values are not precisely quantified.

Example from the dataset : Size
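
The difference between the two scales can be sketched with `pd.Categorical`. This is an illustration rather than part of the original analysis: the small frame copies a few values from the preview above, and the ordering `S < M < L` is an assumption (the full file may contain other sizes, which would become missing values unless added to the list).

```python
import pandas as pd

# Values copied from the first rows of ST.csv shown above
sample = pd.DataFrame({
    "Location": ["Kentucky", "Maine", "Massachusetts", "Rhode Island"],
    "Size": ["L", "L", "S", "M"],
})

# Nominal: labels only, no order between categories
sample["Location"] = pd.Categorical(sample["Location"])

# Ordinal: categories with an explicit order (assumed here: S < M < L)
size_order = ["S", "M", "L"]
sample["Size"] = pd.Categorical(sample["Size"], categories=size_order, ordered=True)

print(sample["Location"].cat.ordered)  # False
print(sample["Size"].cat.ordered)      # True

# Order-based operations are only meaningful for the ordinal column
print(sample["Size"].min(), sample["Size"].max())  # S L
sorted_sizes = sample.sort_values("Size")["Size"].tolist()
print(sorted_sizes)  # ['S', 'M', 'L', 'L']
```

Sorting follows the declared category order, not alphabetical order, which is why `S` comes before `M` and `L`.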

### Dataset : Flight.csv (Airplane Passenger Flight History)

[Data Source](https://www.kaggle.com/datasets/sandhiyakumar/airline-passenger-satisfication-data)

In [3]:
# Load the CSV data and display the first 8 rows

df = pd.read_csv('Dataset/Flight.csv')
df.head(8)

Unnamed: 0.1,Unnamed: 0,id,Gender,Customer Type,Age,Type of Travel,Class,Flight Distance,Inflight wifi service,Departure/Arrival time convenient,...,Inflight entertainment,On-board service,Leg room service,Baggage handling,Checkin service,Inflight service,Cleanliness,Departure Delay in Minutes,Arrival Delay in Minutes,satisfaction
0,0,19556,Female,Loyal Customer,52,Business travel,Eco,160,5,4,...,5,5,5,5,2,5,5,50,44.0,satisfied
1,1,90035,Female,Loyal Customer,36,Business travel,Business,2863,1,1,...,4,4,4,4,3,4,5,0,0.0,satisfied
2,2,12360,Male,disloyal Customer,20,Business travel,Eco,192,2,0,...,2,4,1,3,2,2,2,0,0.0,neutral or dissatisfied
3,3,77959,Male,Loyal Customer,44,Business travel,Business,3377,0,0,...,1,1,1,1,3,1,4,0,6.0,satisfied
4,4,36875,Female,Loyal Customer,49,Business travel,Eco,1182,2,3,...,2,2,2,2,4,2,4,0,20.0,satisfied
5,5,39177,Male,Loyal Customer,16,Business travel,Eco,311,3,3,...,5,4,3,1,1,2,5,0,0.0,satisfied
6,6,79433,Female,Loyal Customer,77,Business travel,Business,3987,5,5,...,5,5,5,5,4,5,3,0,0.0,satisfied
7,7,97286,Female,Loyal Customer,43,Business travel,Business,2556,2,2,...,4,4,4,4,5,4,3,77,65.0,satisfied


#### Numeric :
- Discrete : Discrete data are countable: their possible values can be enumerated. The set of possible values may be finite or countably infinite (for example 0, 1, 2, and so on).

Example from the dataset : Flight Distance (recorded as whole numbers in this dataset)

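- Continuous : Continuous data, on the other hand, are measurements that cannot be counted; they can take any value within an interval on the real number line. A quantity can be continuous in principle even when a dataset records it in whole units, as with age in whole years.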

Example from the dataset : Age
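
One rough way to inspect how a numeric column is recorded is to check whether every stored value is a whole number. The sketch below uses values copied from the Flight.csv preview; the helper `holds_only_whole_numbers` is a hypothetical name introduced for this example. The check describes how values were stored, not whether the underlying quantity is discrete or continuous.

```python
import pandas as pd

# Values copied from the first rows of Flight.csv shown above
sample = pd.DataFrame({
    "Flight Distance": [160, 2863, 192, 3377],
    "Arrival Delay in Minutes": [44.0, 0.0, 0.0, 6.0],
})

def holds_only_whole_numbers(series: pd.Series) -> bool:
    """Return True if every non-missing value has no fractional part."""
    values = series.dropna()
    return bool((values % 1 == 0).all())

is_whole = {col: holds_only_whole_numbers(sample[col]) for col in sample.columns}
print(is_whole)
# prints: {'Flight Distance': True, 'Arrival Delay in Minutes': True}
```

Note that `Arrival Delay in Minutes` is stored as floats yet contains only whole numbers in these rows, so the dtype alone does not tell you whether values are fractional.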

# Data Categories
"Data categories" are classifications of data based on their characteristics, attributes, or properties. They help organize and interpret information, and they guide the methods used for data collection, storage, and analysis. The two primary categories are quantitative data and qualitative data.

- Quantitative data : Quantitative data is information expressed in numerical form. It can be measured or counted.

Example from the data : Age

In [4]:
df['Age'].describe()

count    25976.000000
mean        39.620958
std         15.135685
min          7.000000
25%         27.000000
50%         40.000000
75%         51.000000
max         85.000000
Name: Age, dtype: float64

- Qualitative data : Qualitative data is information expressed in descriptive form, such as labels or text. It describes characteristics or qualities that are not measured numerically.

Example from the data : satisfaction

In [5]:
df['satisfaction'].unique()

array(['satisfied', 'neutral or dissatisfied'], dtype=object)
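
A first-pass split of a DataFrame into quantitative and qualitative columns can be sketched with `select_dtypes`. The small frame below copies a few values from the Flight.csv preview. Treating "numeric dtype" as "quantitative" is a simplifying assumption: identifier columns such as `id` and the 0 to 5 service ratings are stored as numbers but are not quantitative measurements in the usual sense.

```python
import pandas as pd

# Values copied from the first rows of Flight.csv shown above
sample = pd.DataFrame({
    "Gender": ["Female", "Female", "Male"],
    "Age": [52, 36, 20],
    "Flight Distance": [160, 2863, 192],
    "satisfaction": ["satisfied", "satisfied", "neutral or dissatisfied"],
})

# Numeric columns are treated as quantitative, everything else as qualitative
quantitative_cols = sample.select_dtypes(include="number").columns.tolist()
qualitative_cols = sample.select_dtypes(exclude="number").columns.tolist()

print(quantitative_cols)  # ['Age', 'Flight Distance']
print(qualitative_cols)   # ['Gender', 'satisfaction']

# Quantitative columns are summarized with statistics,
# qualitative columns with frequency counts
summary = sample[quantitative_cols].describe()
counts = sample["satisfaction"].value_counts()
print(counts["satisfied"])  # 2
```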

# Metadata
Metadata is descriptive or structural data that provides information about other data, offering context, meaning, and organization. It includes attributes like title, author, creation date, file format, and keywords. Essentially, metadata is crucial for efficiently managing, understanding, and retrieving data.

Example : 

Source : 
[Data Source](https://opendata.jabarprov.go.id/id/dataset/jumlah-peminatan-kelas-pelatihan-candradimuka-jabar-coding-camp-berdasarkan-batch-dan-mode-kelas-di-jawa-barat)

![](Image/Metadata.png)
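
Metadata can also be derived from a dataset itself. The sketch below builds a small, hypothetical metadata record for a DataFrame; the dictionary keys and the title are assumptions made for illustration and do not follow the schema of the portal shown in the image.

```python
import pandas as pd

# Values copied from the first rows of Flight.csv shown above
sample = pd.DataFrame({
    "Gender": ["Female", "Female", "Male"],
    "Age": [52, 36, 20],
    "satisfaction": ["satisfied", "satisfied", "neutral or dissatisfied"],
})

# Data about the data: a title, the size, and the type of each column
metadata = {
    "title": "Flight.csv sample",  # hypothetical title for this example
    "n_rows": len(sample),
    "n_columns": sample.shape[1],
    "columns": {col: str(dtype) for col, dtype in sample.dtypes.items()},
}

for key, value in metadata.items():
    print(f"{key}: {value}")
```

None of these fields contain the observations themselves; they only describe them, which is what distinguishes metadata from data.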

# Big Data
Big data refers to extremely large and complex datasets that cannot be easily managed, processed, or analyzed using traditional data processing applications or techniques.

Example : 

- Social Media Analytics : Social media companies like Facebook, Twitter, or Instagram collect billions of user-generated records every day, including status updates, comments, images, and more.
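
When a file is too large to load at once, one common workaround on a single machine is to process it in chunks. The sketch below is only an illustration of that idea using the `chunksize` parameter of `pd.read_csv`: it reads a tiny in-memory CSV (values copied from the Flight.csv preview) two rows at a time. Real big data workloads typically rely on distributed storage and processing tools, which are outside the scope of this notebook.

```python
import io
import pandas as pd

# A tiny in-memory CSV standing in for a file too large to load at once
csv_text = (
    "Age,satisfaction\n"
    "52,satisfied\n"
    "36,satisfied\n"
    "20,neutral or dissatisfied\n"
    "44,satisfied\n"
)
source = io.StringIO(csv_text)

total_rows = 0
age_sum = 0

# With chunksize set, read_csv yields DataFrames of at most 2 rows each,
# so only one chunk is held in memory at a time
for chunk in pd.read_csv(source, chunksize=2):
    total_rows += len(chunk)
    age_sum += chunk["Age"].sum()

mean_age = age_sum / total_rows
print(total_rows, mean_age)  # 4 38.0
```

Running totals such as a sum and a count can be combined across chunks, which is why the mean can be computed without ever holding the full dataset in memory.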