<img src=https://i.ibb.co/6gCsHd6/1200px-Pandas-logo-svg.png width="700" height="200">

## <p style="background-color:#FDFEFE; font-family:newtimeroman; color:#060108; font-size:200%; text-align:center; border-radius:10px 10px;">Data Analysis with Python</p>

## <p style="background-color:#FDFEFE; font-family:newtimeroman; color:#4d77cf; font-size:200%; text-align:center; border-radius:10px 10px;">Playing with Pandas Series & DataFrames</p>

<a id="toc"></a>

## <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:175%; text-align:center; border-radius:10px 10px;">Content</p>

* [OVERVIEW](#0)
* [IMPORTING LIBRARIES NEEDED IN THIS NOTEBOOK](#1)
* [CREATING NUMPY ARRAYS](#2)
* [WORKING WITH SERIES DATA STRUCTURE](#3)
* [CREATING A PANDAS DATAFRAME](#4)
* [WORKING WITH DATAFRAMES](#5)
* [INDEXING, SLICING & SELECTION](#6)
* [THE END OF THE LAB-01 SESSION](#7)

## <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:175%; text-align:center; border-radius:10px 10px;">Overview</p>

<a id="0"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

## What is Pandas in Python?

[**Pandas**](http://pandas.pydata.org/) is one of the most widely used Python libraries. It provides fast, flexible, and expressive data structures designed to make working with "relational" or "labeled" data both easy and intuitive. It aims to be the fundamental high-level building block for practical, real-world data analysis in Python. Additionally, it has the broader goal of becoming the most powerful and flexible open-source data analysis and manipulation tool available in any language. It is already well on its way towards this goal.

Data held in Pandas is commonly passed on to the statistical analysis tools in SciPy, the plotting functions of Matplotlib, and the machine learning algorithms in Scikit-learn.

Its popularity has surged in recent years, coinciding with the rise of fields such as data science and machine learning. Here’s a popularity comparison over time against STATA, SAS, and [dplyr](https://dplyr.tidyverse.org/), courtesy of Stack Overflow Trends:

<img src="https://i.ibb.co/crf3ksp/pandas-vs-rest.png" style="">

## Core Components of Pandas Data Structure

Organizing data in a particular way is known as a data structure. **``Pandas``** has **two core data structures**, and all operations are based on these two objects:

  - [**Series :**](https://pandas.pydata.org/docs/reference/api/pandas.Series.html) A one-dimensional labeled array that can hold data of any type (integers, floats, strings, Python objects, etc.).
  - [**DataFrame :**](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html) A two-dimensional data structure, like a 2-dimensional array or a table with rows and columns, whose columns can have different types.
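
As a minimal sketch of the two structures described above (the names `prices` and `table` and all values are made up for illustration):

```python
import pandas as pd

# A Series: one-dimensional, labelled values (labels are illustrative)
prices = pd.Series([1.5, 2.0, 3.25], index=["apple", "pear", "kiwi"])

# A DataFrame: a table whose columns may have different dtypes
table = pd.DataFrame({"fruit": ["apple", "pear", "kiwi"],
                      "price": [1.5, 2.0, 3.25],
                      "stock": [10, 0, 4]})

print(prices["pear"])   # label-based access on a Series
print(table.shape)      # (number of rows, number of columns)
print(table.dtypes)     # one dtype per column
```

Each column of `table` is itself a Series, which is why most operations behave the same way on both objects.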

## Main Features

Just as [**NumPy**](http://www.numpy.org/) provides the basic array data type plus core array operations, **``Pandas``**:

1. defines fundamental structures for working with data and  
1. endows them with methods that facilitate operations such as  
  
  - reading in data  
  - adjusting indices  
  - working with dates and time series  
  - sorting, grouping, re-ordering and general data munging <sup><a href=#mung id=mung-link>[1]</a></sup>  
  - dealing with missing values, etc., etc.  
  
Here are just a few of the things that pandas does well:

  - Easy handling of [missing data](https://pandas.pydata.org/pandas-docs/stable/user_guide/missing_data.html) (represented as **``NaN``**) in floating point as well as non-floating point data
  - Size mutability: columns can be [inserted and deleted](https://pandas.pydata.org/pandas-docs/stable/user_guide/dsintro.html) from DataFrame and higher dimensional objects
  - Automatic and explicit [data alignment](https://pandas.pydata.org/pandas-docs/stable/user_guide/dsintro.html): objects can be explicitly aligned to a set of labels, or the user can simply ignore the labels and let **``Series``**, **``DataFrame``**, etc. automatically align the data for you in computations
  - Powerful, flexible [group by](https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html) functionality to perform split-apply-combine operations on data sets, for both aggregating and transforming data
  - Make it [easy to convert](https://pandas.pydata.org/pandas-docs/stable/user_guide/dsintro.html) ragged, differently-indexed data in other Python and NumPy data structures into DataFrame objects
  - Intelligent label-based [slicing](https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html), [fancy indexing](https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html), and [subsetting](https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html) of large data sets
  - Intuitive [merging](https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html) and [joining](https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html) datasets
  - Flexible [reshaping](https://pandas.pydata.org/pandas-docs/stable/user_guide/reshaping.html) and [pivoting](https://pandas.pydata.org/pandas-docs/stable/user_guide/reshaping.html) of datasets
  - [Hierarchical labeling](https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html) of axes (possible to have multiple labels per tick)
  - Robust IO tools for loading data from [flat files](https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html) (CSV and delimited), [Excel files](https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html), [databases](https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html), and saving/loading data from the ultrafast [HDF5 format](https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html)
  - [Time series](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html)-specific functionality: date range generation and frequency conversion, moving window statistics, moving window linear regressions, date shifting and lagging, etc.
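
A few of the features listed above can be sketched on toy data. The Series, the `sales` table, and their values are hypothetical and only serve to show label alignment, missing-data handling, and a group-by aggregation:

```python
import pandas as pd

# Automatic alignment: values are matched by label, not by position
s1 = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])
s2 = pd.Series([10.0, 20.0, 30.0], index=["b", "c", "d"])
total = s1 + s2                      # labels found in only one Series become NaN

# Missing data: count the gaps, then fill them
n_missing = total.isna().sum()
filled = total.fillna(0.0)

# Group by: split-apply-combine on a small table
sales = pd.DataFrame({"region": ["EU", "EU", "JP", "JP"],
                      "units": [3, 5, 2, 8]})
units_per_region = sales.groupby("region")["units"].sum()

print(total)
print(units_per_region)
```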

More sophisticated statistical functionality is left to other packages, such as [statsmodels](http://www.statsmodels.org/) and [scikit-learn](http://scikit-learn.org/), which interoperate with pandas data structures.

This session will provide a basic introduction to Pandas. Throughout the session, we will assume that the following imports have taken place.

## <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:175%; text-align:center; border-radius:10px 10px;">Importing Libraries Needed in This Notebook</p>

<a id="1"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

Once you've installed NumPy & Pandas, you can import them as libraries:

In [1]:
import numpy as np
import pandas as pd

# pd.options.display.float_format = '{:20,.2f}'.format  # Suppressing scientific notation in pandas

In [2]:
df = pd.DataFrame(np.random.random(5)**10, columns=["random"])
df

Unnamed: 0,random
0,1.679932e-05
1,5.448758e-06
2,0.0007888373
3,6.6881e-13
4,0.06096921


In [3]:
print(7.187458e-07 == 0.0000007187458)

True


In [4]:
# Solution-1 --> round()function

print(df.round(2))

   random
0    0.00
1    0.00
2    0.00
3    0.00
4    0.06


In [5]:
# For numpy arrays

arr = np.random.random(5)**10
print(arr)
print("---"*10)
print(np.round(arr,3))

[4.55139908e-02 5.50226246e-05 1.06939103e-07 9.55188196e-11
 2.59118416e-04]
------------------------------
[0.046 0.    0.    0.    0.   ]


In [6]:
# Solution-2

pd.options.display.float_format = '{:20,.4f}'.format
df

Unnamed: 0,random
0,0.0
1,0.0
2,0.0008
3,0.0
4,0.061


In [6]:
# For numpy arrays

print(arr)
print("---"*10)
np.set_printoptions(suppress=True)
print(arr)
print("---"*10)
np.set_printoptions(precision=4)
print(arr)

[1.22162358e-09 1.49971158e-02 5.36457930e-05 1.27556586e-03
 5.02661524e-03]
------------------------------



[0.         0.01499712 0.00005365 0.00127557 0.00502662]
------------------------------



[0.     0.015  0.0001 0.0013 0.005 ]


In [3]:
np.set_printoptions(precision=8, suppress=False)  # restore NumPy's default print options (passing None leaves a setting unchanged)

In [7]:
pd.options.display.float_format = '{:20,.2f}'.format  # Suppressing scientific notation in pandas

## <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:175%; text-align:center; border-radius:10px 10px;">Creating Numpy Arrays & Series</p>

<a id="2"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

**``NumPy``**, which stands for Numerical Python, is a library built around multidimensional array objects. The **``NumPy array``** underpins most of the data science stack in Python. Arrays can be multidimensional, and all elements of an array must share the same type (for example, all integers or all floats). Python lists can stand in for arrays, but they do not deliver the performance needed when computing on large sets of numerical data.

**Advantages of using an Array**
* Arrays can handle very large datasets efficiently
* Memory-efficient storage and computation
* Faster calculations and analysis than lists
* Diverse functionality: many Python packages build on arrays to make trend modeling, statistics, and visualization easier.

**You can create a basic Array by calling** **``numpy.array()``**.

**``numpy.array(object, dtype=None, *, copy=True, order='K', subok=False, ndmin=0, like=None)``**
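
To make the signature above concrete, here is a small sketch of how the `dtype` and `ndmin` parameters affect the result. The variable names are illustrative only:

```python
import numpy as np

ints = np.array([0, 1, 2, 3, 4])              # dtype inferred as an integer type
mixed = np.array([1, 2.5, 3])                 # integers are upcast so all elements are float64
as_float = np.array([1, 2, 3], dtype=float)   # dtype requested explicitly
row = np.array([1, 2, 3], ndmin=2)            # force at least two dimensions

print(ints.dtype, mixed.dtype, as_float.dtype, row.shape)
```

The `mixed` case shows the homogeneity rule in action: NumPy picks one common type for every element rather than storing each value with its own type.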

**``Pandas Series``** is a **one-dimensional** data structure. It can hold data of many types, including **``objects``**, **``floats``**, **``strings``** and **``integers``**. You can create a Series by calling **``pandas.Series()``**; a **``list``**, **``numpy array``**, or **``dict``** can be turned into a **``Pandas Series``**. You should use the simplest data structure that meets your needs [Source](https://pythonbasics.org/pandas-series/). The **``axis labels``** are collectively called the **``index``**. **``Labels``** need not be unique but must be of a [**hashable type**](https://stackoverflow.com/questions/14535730/what-does-hashable-mean-in-python#:~:text=In%20Python%2C%20any%20immutable%20object,sets%20to%20track%20unique%20values.). The object supports both integer- and label-based indexing and provides a host of methods for performing operations involving the index [Source](https://www.geeksforgeeks.org/creating-a-pandas-series/).

**``Series``** and **``DataFrame``** are two important data types defined by Pandas.

You can think of a Series as a “column” of data, such as a collection of observations on a single variable.

A DataFrame is an object for storing related columns of data or combination of Series.

**``A Series``** is a **one-dimensional data structure** that is **homogeneous**: all of its values share a single dtype, and each value is implicitly labelled by an index. For example, we can have a Series of integers, real numbers, characters, strings, dictionaries, etc. (values of mixed Python types are stored under the generic ``object`` dtype). We can conveniently manipulate a Series with operations such as adding, deleting, ordering, joining, filtering, vectorized operations, statistical analysis, plotting, etc.

**``A Series``** is very **similar to a NumPy array** (in fact it is built on top of the NumPy array object). **What differentiates** the NumPy array from a Series, is that a Series can **have axis labels**, meaning it can be indexed by a label, instead of just a number location. It also doesn’t need to hold numeric data, it can hold any arbitrary Python Object [Source](http://www.datasciencelovers.com/python-for-data-science/pandas-series/).

So the important points to remember about a Pandas Series are:

- Homogeneous data
- Size Immutable
- Values of Data Mutable

**You can create a Series by calling** **``pandas.Series()``**. A **``list``**, **``numpy array``**, **``dict``** can be turned into a Pandas Series.

**``pd.Series(data=None, index=None, dtype=None, name=None, copy=False, fastpath=False)``**
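
As a short sketch of the `data`, `index`, and `name` parameters in the signature above, the following builds a Series from a list, a NumPy array, and a dict. All names and values are illustrative:

```python
import numpy as np
import pandas as pd

from_list = pd.Series([10, 20, 30])                              # default index 0, 1, 2
from_array = pd.Series(np.array([1.5, 2.5]), index=["x", "y"])   # explicit labels
from_dict = pd.Series({"a": 1, "b": 2, "c": 3}, name="counts")   # dict keys become the index

print(from_dict["b"])      # label-based access
print(from_dict.iloc[1])   # position-based access to the same element
```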

**Create a basic Numpy Array named "my_array" using the following list.**

[0, 1, 2, 3, 4]

In [8]:
# YOUR CODE IS HERE
my_array = np.array([0, 1, 2, 3, 4])  # np.arange(5) would give the same result
my_array

array([0, 1, 2, 3, 4])

**Create an empty NumPy array named "empty_array" of shape (3, 4) with an integer dtype.**

In [9]:
# YOUR CODE IS HERE
empty_array = np.zeros((3,4), dtype=int)
empty_array

array([[0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0]])

In [10]:
empty_array = np.empty((3,4), dtype=int)  # np.empty leaves the values uninitialized, so the contents are arbitrary
empty_array

array([[0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0]])

**Create a full NumPy array named "full_array" of shape (3, 3), filled with the floating-point value 99.99.**

In [11]:
# YOUR CODE IS HERE

full_array = np.full((3,3), 99.99)
full_array

array([[99.99, 99.99, 99.99],
       [99.99, 99.99, 99.99],
       [99.99, 99.99, 99.99]])

In [12]:
np.ones((3,3))*99.99  # 2nd way

array([[99.99, 99.99, 99.99],
       [99.99, 99.99, 99.99],
       [99.99, 99.99, 99.99]])

In [27]:
full_array = np.full((3,3), fill_value = 99.99, dtype = float)
full_array

array([[99.99, 99.99, 99.99],
       [99.99, 99.99, 99.99],
       [99.99, 99.99, 99.99]])

In [32]:
full_array = np.full(fill_value = 99.99, shape=(3,3), dtype = float)
full_array

array([[99.99, 99.99, 99.99],
       [99.99, 99.99, 99.99],
       [99.99, 99.99, 99.99]])

**Create a Pandas Series named "my_series" using the following list.**

['a', 'b', 'c', 'd', 'e']

In [35]:
# YOUR CODE IS HERE

my_series = pd.Series("a b c d e".split())
my_series


0    a
1    b
2    c
3    d
4    e
dtype: object

One of the essential pieces of NumPy is the ability to perform quick element-wise operations, both with basic arithmetic (addition, subtraction, multiplication, etc.) and with more sophisticated operations (trigonometric functions, exponential and logarithmic functions, etc.). 

**``Pandas Series``** **are built on top of** **``NumPy arrays``** **and support many** **``similar operations:``**

In [36]:
# Let's see the difference between a list, an array, and a Series

my_list = [0,1,2,3,4,5]
my_list

[0, 1, 2, 3, 4, 5]

In [37]:
arr = np.array(my_list)
arr

array([0, 1, 2, 3, 4, 5])

In [39]:
ser= pd.Series(my_list, index = "a b c d e f".split() )
ser

a    0
b    1
c    2
d    3
e    4
f    5
dtype: int64

In [42]:
print(my_list *3)
print(arr *3)
print(ser *3)

[0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5]
[ 0  3  6  9 12 15]
a     0
b     3
c     6
d     9
e    12
f    15
dtype: int64


In mathematics, element-wise operations are operations applied to the individual elements of a matrix. Any arithmetic operation on arrays is applied element-wise. **[NumPy Basics: Arrays and Vectorized Computation](https://www.oreilly.com/library/view/python-for-data/9781449323592/ch04.html)** & **[Numerical Operations on Arrays](https://scipy-lectures.org/intro/numpy/operations.html)**

**Create a random Pandas Series of float numbers and name this Series as "Daily Returns".**

In [43]:
# YOUR CODE IS HERE

daily_returns = pd.Series(np.random.rand(4), name = "Daily Returns")
daily_returns

0                   0.73
1                   0.83
2                   0.53
3                   0.97
Name: Daily Returns, dtype: float64

In [44]:
daily_returns > 0.5

0    True
1    True
2    True
3    True
Name: Daily Returns, dtype: bool

In [45]:
daily_returns[daily_returns > 0.5]

0                   0.73
1                   0.83
2                   0.53
3                   0.97
Name: Daily Returns, dtype: float64

In [46]:
daily_returns + 5

0                   5.73
1                   5.83
2                   5.53
3                   5.97
Name: Daily Returns, dtype: float64

In [47]:
daily_returns / 5

0                   0.15
1                   0.17
2                   0.11
3                   0.19
Name: Daily Returns, dtype: float64

In [48]:
daily_returns.std()

0.18375538248317655

In [49]:
# We can also compute it with np.std
np.std(daily_returns)  # the result differs because ddof differs: sample (pandas, ddof=1) vs population (NumPy, ddof=0)

0.15913682931255693
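
The gap between the two results above comes from the default `ddof` (delta degrees of freedom). A minimal sketch on illustrative data, showing that the two libraries agree once `ddof` is passed explicitly:

```python
import numpy as np
import pandas as pd

values = pd.Series([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])   # illustrative data

sample_std = values.std()                     # pandas default: ddof=1 (divide by n - 1)
population_std = np.std(values.to_numpy())    # NumPy default: ddof=0 (divide by n)

# Passing ddof explicitly removes the discrepancy
same_population = np.isclose(values.std(ddof=0), population_std)
same_sample = np.isclose(np.std(values.to_numpy(), ddof=1), sample_std)

print(sample_std, population_std, same_population, same_sample)
```

For small samples the difference is noticeable; as the number of observations grows, the two estimates converge.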

In [51]:
print(daily_returns.mean())
np.mean(daily_returns)

0.7630461307305667


0.7630461307305667

In [52]:
print(daily_returns.median())
np.median(daily_returns)

0.7783658610406564


0.7783658610406564

## <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:175%; text-align:center; border-radius:10px 10px;">Working with Series Data Structure</p>

<a id="3"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

**SOME COMMON ATTRIBUTES** [Official Pandas API Document](https://pandas.pydata.org/docs/reference/api/pandas.Series.html)<br>

**Series.index**	Defines the index of the Series.<br>
**Series.values**   Returns Series as ndarray or ndarray-like depending on the dtype.<br>
**Series.shape**	It returns a tuple of shape of the data.<br>
**Series.dtype**	It returns the data type of the data.<br>
**Series.size**	It returns the size of the data.<br>
**Series.empty**    It returns True if the Series object is empty, otherwise False.<br>
**Series.hasnans**    It returns True if there are any NaN values, otherwise False.<br>
**Series.nbytes**	It returns the number of bytes in the data.<br>
**Series.ndim**	It returns the number of dimensions in the data.<br>

In [13]:
import pandas as pd
games = pd.read_csv("vgsalesGlobale.csv")
games

Unnamed: 0,Rank,Name,Platform,Year,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales
0,1,Wii Sports,Wii,2006.00,Sports,Nintendo,41.49,29.02,3.77,8.46,82.74
1,2,Super Mario Bros.,NES,1985.00,Platform,Nintendo,29.08,3.58,6.81,0.77,40.24
2,3,Mario Kart Wii,Wii,2008.00,Racing,Nintendo,15.85,12.88,3.79,3.31,35.82
3,4,Wii Sports Resort,Wii,2009.00,Sports,Nintendo,15.75,11.01,3.28,2.96,33.00
4,5,Pokemon Red/Pokemon Blue,GB,1996.00,Role-Playing,Nintendo,11.27,8.89,10.22,1.00,31.37
...,...,...,...,...,...,...,...,...,...,...,...
16593,16596,Woody Woodpecker in Crazy Castle 5,GBA,2002.00,Platform,Kemco,0.01,0.00,0.00,0.00,0.01
16594,16597,Men in Black II: Alien Escape,GC,2003.00,Shooter,Infogrames,0.01,0.00,0.00,0.00,0.01
16595,16598,SCORE International Baja 1000: The Official Game,PS2,2008.00,Racing,Activision,0.00,0.00,0.00,0.00,0.01
16596,16599,Know How 2,DS,2010.00,Puzzle,7G//AMES,0.00,0.01,0.00,0.00,0.01


In [56]:
type(games)

pandas.core.frame.DataFrame

In [58]:
print(type(games.Genre))


<class 'pandas.core.series.Series'>


In [59]:
games.Genre

0              Sports
1            Platform
2              Racing
3              Sports
4        Role-Playing
             ...     
16593        Platform
16594         Shooter
16595          Racing
16596          Puzzle
16597        Platform
Name: Genre, Length: 16598, dtype: object

In [60]:
games["Genre"]

0              Sports
1            Platform
2              Racing
3              Sports
4        Role-Playing
             ...     
16593        Platform
16594         Shooter
16595          Racing
16596          Puzzle
16597        Platform
Name: Genre, Length: 16598, dtype: object

In [61]:
games.index

RangeIndex(start=0, stop=16598, step=1)

In [62]:
games.Genre.index

RangeIndex(start=0, stop=16598, step=1)

In [63]:
games.values

array([[1, 'Wii Sports', 'Wii', ..., 3.77, 8.46, 82.74],
       [2, 'Super Mario Bros.', 'NES', ..., 6.81, 0.77, 40.24],
       [3, 'Mario Kart Wii', 'Wii', ..., 3.79, 3.31, 35.82],
       ...,
       [16598, 'SCORE International Baja 1000: The Official Game', 'PS2',
        ..., 0.0, 0.0, 0.01],
       [16599, 'Know How 2', 'DS', ..., 0.0, 0.0, 0.01],
       [16600, 'Spirits & Spells', 'GBA', ..., 0.0, 0.0, 0.01]],
      dtype=object)

In [64]:
games.Genre.values

array(['Sports', 'Platform', 'Racing', ..., 'Racing', 'Puzzle',
       'Platform'], dtype=object)

In [65]:
games.shape

(16598, 11)

In [66]:
games.Genre.shape

(16598,)

In [67]:
len(games)

16598

In [68]:
len(games.Genre)

16598

In [70]:
games.dtypes
# For a DataFrame we write dtypes rather than dtype.

Rank              int64
Name             object
Platform         object
Year            float64
Genre            object
Publisher        object
NA_Sales        float64
EU_Sales        float64
JP_Sales        float64
Other_Sales     float64
Global_Sales    float64
dtype: object

In [73]:
games.Genre.dtypes

dtype('O')

In [74]:
games.size  # row x columns = total data

182578

In [76]:
games.Genre.size

16598

In [77]:
games.empty
# True if Series/DataFrame is entirely empty (no items), meaning any of the axes are of length 0.

False

In [78]:
games.Genre.empty

False

In [81]:
games.Name.hasnans

False

In [14]:
# For a DataFrame, use isnull() rather than hasnans
games.isnull().any()
# Without any() we would get about 16 thousand rows, which is impossible to inspect; any() summarizes each column.
# sum() gives the number of missing values per column

Rank            False
Name            False
Platform        False
Year             True
Genre           False
Publisher        True
NA_Sales        False
EU_Sales        False
JP_Sales        False
Other_Sales     False
Global_Sales    False
dtype: bool

In [84]:
games.isnull().sum()

Rank              0
Name              0
Platform          0
Year            271
Genre             0
Publisher        58
NA_Sales          0
EU_Sales          0
JP_Sales          0
Other_Sales       0
Global_Sales      0
dtype: int64

In [85]:
games.Genre.nbytes

132784

In [15]:
games.memory_usage()  # memory_usage for a DataFrame

Index              128
Rank            132784
Name            132784
Platform        132784
Year            132784
Genre           132784
Publisher       132784
NA_Sales        132784
EU_Sales        132784
JP_Sales        132784
Other_Sales     132784
Global_Sales    132784
dtype: int64

In [88]:
games.Genre.ndim

1

In [89]:
games.ndim

2

[**head(n=5)**](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.head.html) function returns the first n rows for the object based on position. It is useful for quickly testing if your object has the right type of data in it.

For negative values of n, this function returns all rows except the last n rows, equivalent to df[:-n].
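
A quick sketch of both cases on a small, made-up frame (`df_demo` is a hypothetical name used only here):

```python
import pandas as pd

df_demo = pd.DataFrame({"x": range(6)})   # six rows, labelled 0..5

first_two = df_demo.head(2)               # the first two rows
all_but_last_two = df_demo.head(-2)       # everything except the last two rows

print(first_two)
print(all_but_last_two.equals(df_demo[:-2]))   # head(-n) matches the slice df[:-n]
```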

In [54]:
games.head()

Unnamed: 0,Rank,Name,Platform,Year,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales
0,1,Wii Sports,Wii,2006.0,Sports,Nintendo,41.49,29.02,3.77,8.46,82.74
1,2,Super Mario Bros.,NES,1985.0,Platform,Nintendo,29.08,3.58,6.81,0.77,40.24
2,3,Mario Kart Wii,Wii,2008.0,Racing,Nintendo,15.85,12.88,3.79,3.31,35.82
3,4,Wii Sports Resort,Wii,2009.0,Sports,Nintendo,15.75,11.01,3.28,2.96,33.0
4,5,Pokemon Red/Pokemon Blue,GB,1996.0,Role-Playing,Nintendo,11.27,8.89,10.22,1.0,31.37


**[tail()](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.tail.html)** function returns last n rows from the object based on position. It is useful for quickly verifying data, for example, after sorting or appending rows.

In [55]:
games.tail()

Unnamed: 0,Rank,Name,Platform,Year,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales
16593,16596,Woody Woodpecker in Crazy Castle 5,GBA,2002.0,Platform,Kemco,0.01,0.0,0.0,0.0,0.01
16594,16597,Men in Black II: Alien Escape,GC,2003.0,Shooter,Infogrames,0.01,0.0,0.0,0.0,0.01
16595,16598,SCORE International Baja 1000: The Official Game,PS2,2008.0,Racing,Activision,0.0,0.0,0.0,0.0,0.01
16596,16599,Know How 2,DS,2010.0,Puzzle,7G//AMES,0.0,0.01,0.0,0.0,0.01
16597,16600,Spirits & Spells,GBA,2003.0,Platform,Wanadoo,0.01,0.0,0.0,0.0,0.01


In [90]:
games.sample(10)

Unnamed: 0,Rank,Name,Platform,Year,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales
9715,9717,Colony Wars III: Red Sun,PS,2000.0,Simulation,Psygnosis,0.07,0.05,0.0,0.01,0.12
7850,7852,Jampack: Summer 2003 (RP-T),PS2,2003.0,Misc,Sony Computer Entertainment,0.09,0.07,0.0,0.02,0.19
13155,13157,Guild 01,3DS,2012.0,Action,Level 5,0.0,0.0,0.05,0.0,0.05
6853,6855,Grand Slam Tennis 2,PS3,2012.0,Sports,Electronic Arts,0.09,0.11,0.0,0.04,0.24
1291,1293,Metal Gear Rising: Revengeance,PS3,2013.0,Action,Konami Digital Entertainment,0.45,0.4,0.44,0.18,1.47
15796,15799,NHL 16,X360,2015.0,Sports,Electronic Arts,0.0,0.02,0.0,0.0,0.02
11740,11742,Myst,PS,1995.0,Adventure,Psygnosis,0.0,0.0,0.07,0.0,0.07
2653,2655,James Bond 007: Nightfire,GC,2002.0,Shooter,Electronic Arts,0.6,0.16,0.0,0.02,0.78
9046,9048,ArmA II,PC,2009.0,Shooter,505 Games,0.0,0.12,0.0,0.03,0.14
11318,11320,Bomberman Jetters,GC,2002.0,Puzzle,Hudson Soft,0.06,0.02,0.0,0.0,0.08


In [91]:
games[20:30]

Unnamed: 0,Rank,Name,Platform,Year,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales
20,21,Pokemon Diamond/Pokemon Pearl,DS,2006.0,Role-Playing,Nintendo,6.42,4.52,6.04,1.37,18.36
21,22,Super Mario Land,GB,1989.0,Platform,Nintendo,10.83,2.71,4.18,0.42,18.14
22,23,Super Mario Bros. 3,NES,1988.0,Platform,Nintendo,9.54,3.44,3.84,0.46,17.28
23,24,Grand Theft Auto V,X360,2013.0,Action,Take-Two Interactive,9.63,5.31,0.06,1.38,16.38
24,25,Grand Theft Auto: Vice City,PS2,2002.0,Action,Take-Two Interactive,8.41,5.49,0.47,1.78,16.15
25,26,Pokemon Ruby/Pokemon Sapphire,GBA,2002.0,Role-Playing,Nintendo,6.06,3.9,5.38,0.5,15.85
26,27,Pokemon Black/Pokemon White,DS,2010.0,Role-Playing,Nintendo,5.57,3.28,5.65,0.82,15.32
27,28,Brain Age 2: More Training in Minutes a Day,DS,2005.0,Puzzle,Nintendo,3.44,5.36,5.32,1.18,15.3
28,29,Gran Turismo 3: A-Spec,PS2,2001.0,Racing,Sony Computer Entertainment,6.85,5.09,1.87,1.16,14.98
29,30,Call of Duty: Modern Warfare 3,X360,2011.0,Shooter,Activision,9.03,4.28,0.13,1.32,14.76


[**dtypes**](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.dtypes.html) attribute returns a Series with the data type of each column. The result’s index is the original DataFrame’s columns. Columns with **mixed data types** are stored with the **``object dtype``**. See the [User Guide](https://pandas.pydata.org/docs/user_guide/basics.html#basics-dtypes) for more.

In [16]:
df = pd.DataFrame({'float': [1.0, 2.2],
                   'int': [1, 2],
                   'datetime': pd.date_range('12/1/2018', periods=2, freq='D'),
                   'string': ['foo', 2]})

print(df)
print("*"*50)
print(df.dtypes)

                 float  int   datetime string
0                 1.00    1 2018-12-01    foo
1                 2.20    2 2018-12-02      2
**************************************************
float              float64
int                  int64
datetime    datetime64[ns]
string              object
dtype: object


In [17]:
df["string"]

0    foo
1      2
Name: string, dtype: object

In [18]:
print(df["string"][0])
type(df["string"][0])

foo


str

In [96]:
df["string"][1]

2

In [19]:
type(df["string"][1])

int

[**value_counts()**](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.value_counts.html) returns a Series containing counts of unique values.

The resulting object will be in descending order so that the first element is the most frequently-occurring element. Excludes NA values by default.

In [154]:
games["Name"].value_counts()

Need for Speed: Most Wanted                12
Ratatouille                                 9
FIFA 14                                     9
LEGO Marvel Super Heroes                    9
Madden NFL 07                               9
                                           ..
Ar tonelico Qoga: Knell of Ar Ciel          1
Galaga: Destination Earth                   1
Nintendo Presents: Crossword Collection     1
TrackMania: Build to Race                   1
Know How 2                                  1
Name: Name, Length: 11493, dtype: int64

With normalize set to True, returns the relative frequency by dividing all values by the sum of values.

In [4]:
games["Publisher"].value_counts(normalize=True)

Electronic Arts                 0.081681
Activision                      0.058948
Namco Bandai Games              0.056348
Ubisoft                         0.055683
Konami Digital Entertainment    0.050302
                                  ...   
Warp                            0.000060
New                             0.000060
Elite                           0.000060
Evolution Games                 0.000060
UIG Entertainment               0.000060
Name: Publisher, Length: 578, dtype: float64

About 8% of the games in the dataset were published by Electronic Arts.

The **``bins``** parameter of value_counts() is probably its most underutilized one. It lets value_counts() bin continuous data into discrete intervals. This option works only with numerical data and is similar to the **``pd.cut()``** function.

Using value_counts() in a plain way sometimes doesn’t convey much information, because the output contains a separate entry for every distinct value of the feature. Instead, let’s group the values of ``EU_Sales`` into 4 bins.

Binning makes the idea being conveyed easy to grasp. With four equal-width bins, the first bin ends at 7.255, and we can easily see that most games have EU sales at or below that value, while the upper bins are almost empty.
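
Since the binning cell itself is not shown, here is a hedged sketch of the idea on a small, hypothetical Series (`eu_sales_demo` is made-up data, not the `vgsalesGlobale.csv` column). It assumes equal-width bins, which is what an integer `bins` argument produces:

```python
import pandas as pd

# Hypothetical, right-skewed sales figures (not the real dataset)
eu_sales_demo = pd.Series([0.0, 0.1, 0.2, 0.4, 0.5, 1.0, 2.0, 8.0])

# Four equal-width bins spanning the minimum to the maximum value
binned = eu_sales_demo.value_counts(bins=4, sort=False)
print(binned)

# pd.cut builds the same kind of equal-width intervals explicitly
cut_counts = pd.cut(eu_sales_demo, bins=4).value_counts(sort=False)
print(cut_counts)
```

Because the bins run from the smallest to the largest value, the last bin always contains at least the maximum; on skewed data like this, nearly all observations land in the first bin.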

**Let us make some statistical operations with Series.**

In [7]:
import numpy as np

In [20]:
# create a df from an array of the numbers 0-19, with columns a, b, c, d
df7 = pd.DataFrame(np.arange(20).reshape(5,4), dtype = "float", columns = "a b c d".split())
df7

Unnamed: 0,a,b,c,d
0,0.0,1.0,2.0,3.0
1,4.0,5.0,6.0,7.0
2,8.0,9.0,10.0,11.0
3,12.0,13.0,14.0,15.0
4,16.0,17.0,18.0,19.0


In [21]:
df7.iat[1,1]

5.0

In [23]:
df7[["b","d"]].value_counts(normalize=True)

b      d    
 1.00   3.00                   0.20
 5.00   7.00                   0.20
 9.00  11.00                   0.20
13.00  15.00                   0.20
17.00  19.00                   0.20
dtype: float64

In [120]:
ser = df7["b"].astype("int64")  # take column "b" from df7; df has no "b" column at this point
print(ser)
ser.value_counts(bins=3)

0     1
1     5
2     9
3    13
4    17
Name: b, dtype: int64


(0.983, 6.333]     2
(11.667, 17.0]     2
(6.333, 11.667]    1
Name: b, dtype: int64

## <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:175%; text-align:center; border-radius:10px 10px;">Creating a Pandas DataFrame</p>

<a id="4"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

Pandas DataFrame is a 2-dimensional labeled data structure with columns of potentially different types. It is generally the most commonly used pandas object. Pandas DataFrame can be created in multiple ways:

  - Creating Pandas DataFrame from lists of lists.
  - Creating DataFrame from dict of ndarrays/lists.
  - Creating DataFrame from list of dicts.
  - Creating DataFrame using zip() function.
  - Creating DataFrame from dict of Series.
  
[Source](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html), [Source](https://www.javatpoint.com/how-to-create-a-dataframes-in-python), [Source](https://towardsdatascience.com/15-ways-to-create-a-pandas-dataframe-754ecc082c17)
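
The five approaches listed above can be sketched side by side. The names and scores below are a shortened, illustrative subset, and each variant is meant to produce the same two-column table:

```python
import pandas as pd

names = ["Bill", "Tom", "Kate"]   # illustrative values
scores = [90, 80, 65]

# 1. From a list of lists (column names supplied separately)
df_rows = pd.DataFrame([["Bill", 90], ["Tom", 80], ["Kate", 65]],
                       columns=["name", "score"])

# 2. From a dict of lists (NumPy arrays work the same way)
df_dict = pd.DataFrame({"name": names, "score": scores})

# 3. From a list of dicts (one dict per row)
df_records = pd.DataFrame([{"name": n, "score": s} for n, s in zip(names, scores)])

# 4. From zip(): pair the columns up row by row
df_zip = pd.DataFrame(list(zip(names, scores)), columns=["name", "score"])

# 5. From a dict of Series (aligned on their indexes)
df_series = pd.DataFrame({"name": pd.Series(names), "score": pd.Series(scores)})

print(df_dict)
```

Which form to use mostly depends on how the raw data arrives: row-oriented sources suit options 1, 3, and 4, while column-oriented sources suit options 2 and 5.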

**Let's remember how to create a DataFrame in Pandas:**

In [24]:
data = {"name":["Bill", "Tom", "Tim", "John", "Alex", "Vanessa", "Kate"],
        "score":[90, 80, 85, 75, 95, 60, 65],
        "sport":["Wrestling", "Football", "Skiing", "Swimming", "Tennis", "Karete", "Surfing"],
        "sex":["M", "M", "M", "M", "F", "F", "F"]}
data

{'name': ['Bill', 'Tom', 'Tim', 'John', 'Alex', 'Vanessa', 'Kate'],
 'score': [90, 80, 85, 75, 95, 60, 65],
 'sport': ['Wrestling',
  'Football',
  'Skiing',
  'Swimming',
  'Tennis',
  'Karete',
  'Surfing'],
 'sex': ['M', 'M', 'M', 'M', 'F', 'F', 'F']}

As seen, we have created a Dictionary and assigned it to an object named "data".

**pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=None)** [Official Pandas API](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html)

In [25]:
df = pd.DataFrame(data)
df

Unnamed: 0,name,score,sport,sex
0,Bill,90,Wrestling,M
1,Tom,80,Football,M
2,Tim,85,Skiing,M
3,John,75,Swimming,M
4,Alex,95,Tennis,F
5,Vanessa,60,Karete,F
6,Kate,65,Surfing,F


## <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:175%; text-align:center; border-radius:10px 10px;">Working with DataFrames</p>

<a id="5"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

While a `Series` is a single column of data, a `DataFrame` is several columns, one for each variable.

In essence, a `DataFrame` in pandas is analogous to a (highly optimized) Excel spreadsheet.

The two main data structures in pandas both have at least one axis. A **Series** has **one axis**, the index. A **DataFrame** has **two axes**, the index and the columns. It’s useful to note here that in all the DataFrame functions that can be applied to either rows or columns, an axis of 0 refers to the index, an axis of 1 refers to the columns.

Thus, a DataFrame is a powerful tool for representing and analyzing data that are naturally organized into rows and columns, often with descriptive indexes for individual rows and individual columns.
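To make the axis convention concrete, here is a small sketch using `sum()`. The frame `df_demo` and its values are hypothetical and serve only to show which direction each axis collapses.

```python
import pandas as pd

# Hypothetical numeric data for illustration
df_demo = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# axis=0: operate along the index, giving one result per column
col_totals = df_demo.sum(axis=0)

# axis=1: operate along the columns, giving one result per row
row_totals = df_demo.sum(axis=1)

print(col_totals)
print(row_totals)
```

A handy way to remember this: the axis you pass is the one that gets collapsed, so `axis=0` leaves one value per column and `axis=1` leaves one value per row.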

Let’s look at an example that reads data from the CSV file named `test_lab.csv`. 

In [26]:
df = pd.read_csv('test_lab.csv')
print(f"\033[1mThe type of test_lab.csv is\033[0m {type(df)}")
df

The type of test_lab.csv is <class 'pandas.core.frame.DataFrame'>


Unnamed: 0,country,country isocode,year,POP,XRAT,tcgdp,cc,cg
0,Argentina,ARG,2000,37335.65,1.0,295072.22,75.72,5.58
1,Australia,AUS,2000,19053.19,1.72,541804.65,67.76,6.72
2,India,IND,2000,1006300.3,44.94,1728144.37,64.58,14.07
3,Israel,ISR,2000,6114.57,4.08,129253.89,64.44,10.27
4,Malawi,MWI,2000,11801.5,59.54,5026.22,74.71,11.66
5,South Africa,ZAF,2000,45064.1,6.94,227242.37,72.72,5.73
6,United States,USA,2000,282171.96,1.0,9898700.0,72.35,6.03
7,Uruguay,URY,2000,3219.79,12.1,25255.96,78.98,5.11


[**info()**](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.info.html) prints a concise summary of a DataFrame. This method prints information about a DataFrame including the index range, the number of columns, column labels, column data types, the number of non-null values in each column, and memory usage.

In [27]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8 entries, 0 to 7
Data columns (total 8 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   country          8 non-null      object 
 1   country isocode  8 non-null      object 
 2   year             8 non-null      int64  
 3   POP              8 non-null      float64
 4   XRAT             8 non-null      float64
 5   tcgdp            8 non-null      float64
 6   cc               8 non-null      float64
 7   cg               8 non-null      float64
dtypes: float64(5), int64(1), object(2)
memory usage: 640.0+ bytes


## <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:175%; text-align:center; border-radius:10px 10px;">Indexing, Slicing & Selection</p>

<a id="6"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

As stated and shown in the examples above, a Series is very similar to a NumPy array. [**What differentiates the NumPy array from a Series**](https://www.educba.com/pandas-vs-numpy/) is that **a Series can have axis labels**, meaning it can be indexed by a label instead of just a numeric position. In other words, the essential difference is **the presence of the index**: while the **``NumPy array``** has an implicitly defined integer index used to access the values, the **``pandas Series``** has an explicitly defined index (labels) associated with the values. Moreover, a Series does NOT need to hold numeric data; it can hold any arbitrary Python object [Source](https://rpubs.com/pjozefek/659184).

So, the key to using a Series is understanding its index. Pandas uses these index labels or positions to allow fast lookup of information.
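The difference between an implicit and an explicit index can be seen in a short sketch. The values and the labels `"ARG"`, `"AUS"`, `"ISR"` are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical values and labels for illustration
arr = np.array([37335.65, 19053.19, 6114.57])
pop = pd.Series(arr, index=["ARG", "AUS", "ISR"])

print(arr[1])        # NumPy array: access by implicit integer position
print(pop["AUS"])    # Series: access by explicit label
print(pop.iloc[1])   # Series: positional access is still available

# A Series can also hold arbitrary Python objects
mixed = pd.Series(["text", 3.14, [1, 2]], index=["a", "b", "c"])
print(mixed.dtype)
```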

The axis labeling information in pandas objects serves many purposes:

- Identifies data (i.e. provides metadata) using known indicators, important for analysis, visualization, and interactive console display.
- Enables automatic and explicit data alignment.
- Allows intuitive getting and setting of subsets of the data set.
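As a brief illustration of the alignment point in the list above, the sketch below adds two Series whose labels only partly overlap. The Series `s1` and `s2` are hypothetical.

```python
import pandas as pd

# Hypothetical Series with partially overlapping labels
s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([10, 20, 30], index=["b", "c", "d"])

# Automatic alignment: values are matched by label, not by position;
# labels present in only one Series produce NaN
total = s1 + s2
print(total)

# Explicit alignment: reindex to a chosen set of labels
realigned = s1.reindex(["c", "a", "z"])
print(realigned)
```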

In this part of our session, we will focus on the final point: namely, how to slice, dice, and generally get and set subsets of pandas objects. The primary focus will be on Series and DataFrame as they have received more development attention in this area. For more information, please visit [pandas-docs.github.io](https://pandas-docs.github.io/pandas-docs-travis/user_guide/indexing.html)

The most robust and consistent way of slicing ranges along arbitrary axes is described in the [Selection by Position](https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#indexing-integer) section detailing the .iloc method. First, let us look at the semantics of slicing using the **[ ]** operator.

[Some More Examples](https://sparkbyexamples.com/pandas/how-to-slice-columns-in-pandas-dataframe/#:~:text=By%20using%20pandas.,columns%2C%20the%20syntax%20is%20df.)

In [165]:
df[["country", "POP"]]

Unnamed: 0,country,POP
0,Argentina,37335.65
1,Australia,19053.19
2,India,1006300.3
3,Israel,6114.57
4,Malawi,11801.5
5,South Africa,45064.1
6,United States,282171.96
7,Uruguay,3219.79


### [.loc[ ]](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.loc.html) → allows us to select data using **labels** (names) of rows (index) & columns

### [.iloc[ ]](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.iloc.html) → allows us to select data using **integer positions** of rows (index) & columns. It follows classic zero-based Python indexing logic

Let us first remember our df: 

In [197]:
df

Unnamed: 0,country,country isocode,year,POP,XRAT,tcgdp,cc,cg
0,Argentina,ARG,2000,37335.65,1.0,295072.22,75.72,5.58
1,Australia,AUS,2000,19053.19,1.72,541804.65,67.76,6.72
2,India,IND,2000,1006300.3,44.94,1728144.37,64.58,14.07
3,Israel,ISR,2000,6114.57,4.08,129253.89,64.44,10.27
4,Malawi,MWI,2000,11801.5,59.54,5026.22,74.71,11.66
5,South Africa,ZAF,2000,45064.1,6.94,227242.37,72.72,5.73
6,United States,USA,2000,282171.96,1.0,9898700.0,72.35,6.03
7,Uruguay,URY,2000,3219.79,12.1,25255.96,78.98,5.11


In [225]:
df.loc[:,["year","cc"]] # row,column

Unnamed: 0,year,cc
0,2000,75.72
1,2000,67.76
2,2000,64.58
3,2000,64.44
4,2000,74.71
5,2000,72.72
6,2000,72.35
7,2000,78.98


### **``QUESTION:``** **What happened? Why was South Africa included when** **``loc``** **was used?**

The first thing to remember is that a **``.iloc[ ]``** slice is **exclusive** of its stop position, while a **``.loc[ ]``** slice is **inclusive** of its stop label.
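The comparison behind this question can be sketched as follows, assuming it was a row slice such as `2:5` applied with both indexers. The frame `countries` is a hypothetical stand-in for the notebook's `df` and holds only the country column.

```python
import pandas as pd

# Hypothetical stand-in for the notebook's df (country column only)
countries = pd.DataFrame({"country": ["Argentina", "Australia", "India", "Israel",
                                      "Malawi", "South Africa", "United States",
                                      "Uruguay"]})

by_label = countries.loc[2:5]      # labels 2, 3, 4, 5: stop label is included
by_position = countries.iloc[2:5]  # positions 2, 3, 4: stop position is excluded

print(by_label["country"].tolist())
print(by_position["country"].tolist())
```

With the default integer index, labels and positions happen to coincide, which is why the two results differ only in the last row (South Africa, at label 5).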

## 1) **``Using Pandas.DataFrame.loc[]``** (By label)


**1.1 – Slicing Columns by Names or Labels**

By using **``pandas.DataFrame.loc[ ]``** you can slice columns by names or labels. To slice the columns, the syntax is **``df.loc[:, start:stop:step]``**, where start is the name of the first column to take, stop is the name of the last column to take (it is included in the result), and step is the number of columns to advance after each extraction.

**1.2 – Slicing DataFrame Columns by Labels**

To select DataFrame columns by labels or names, provide the labels you want as a list. Here we use a list of labels instead of the start:stop:step approach.
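A minimal sketch of list-based selection with `loc[]` follows. The frame `demo` is a hypothetical two-row stand-in that borrows a few column names from the notebook's `df`.

```python
import pandas as pd

# Hypothetical two-row stand-in for the notebook's df
demo = pd.DataFrame({"country": ["Argentina", "Australia"],
                     "year": [2000, 2000],
                     "POP": [37335.65, 19053.19],
                     "XRAT": [1.0, 1.72]})

# Columns come back in the order given in the list,
# so a list can also be used to reorder columns
subset = demo.loc[:, ["country", "XRAT"]]
reordered = demo.loc[:, ["XRAT", "country"]]

print(subset)
print(reordered)
```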

**1.3 – Slicing DataFrame Columns by Range**

When you want to slice a DataFrame by a range of columns, provide start and stop column names.

  - By not providing a start column, loc[] selects from the beginning.
  - By not providing a stop column, loc[] selects all columns from the start label to the end.
  - Providing both start and stop selects all columns from start through stop, both included.

In [13]:
# Slicing all columns from "country" through "POP" (both included)

df.loc[:, "country":"POP"]

Unnamed: 0,country,country isocode,year,POP
0,Argentina,ARG,2000,37335.653
1,Australia,AUS,2000,19053.186
2,India,IND,2000,1006300.297
3,Israel,ISR,2000,6114.57
4,Malawi,MWI,2000,11801.505
5,South Africa,ZAF,2000,45064.098
6,United States,USA,2000,282171.957
7,Uruguay,URY,2000,3219.793


In [14]:
# Slicing from the 'country isocode' column to the last column

df.loc[:, "country isocode":]

Unnamed: 0,country isocode,year,POP,XRAT,tcgdp,cc,cg
0,ARG,2000,37335.653,0.9995,295072.2,75.716805,5.578804
1,AUS,2000,19053.186,1.72483,541804.7,67.759026,6.720098
2,IND,2000,1006300.297,44.9416,1728144.0,64.575551,14.072206
3,ISR,2000,6114.57,4.07733,129253.9,64.436451,10.266688
4,MWI,2000,11801.505,59.543808,5026.222,74.707624,11.658954
5,ZAF,2000,45064.098,6.93983,227242.4,72.71871,5.726546
6,USA,2000,282171.957,1.0,9898700.0,72.347054,6.032454
7,URY,2000,3219.793,12.099592,25255.96,78.97874,5.108068


In [15]:
# Slicing from the first column through the 'XRAT' column (included)

df.loc[:,:"XRAT"]

Unnamed: 0,country,country isocode,year,POP,XRAT
0,Argentina,ARG,2000,37335.653,0.9995
1,Australia,AUS,2000,19053.186,1.72483
2,India,IND,2000,1006300.297,44.9416
3,Israel,ISR,2000,6114.57,4.07733
4,Malawi,MWI,2000,11801.505,59.543808
5,South Africa,ZAF,2000,45064.098,6.93983
6,United States,USA,2000,282171.957,1.0
7,Uruguay,URY,2000,3219.793,12.099592


**1.4 – Slicing Certain Selective Columns in pandas**

Sometimes you may want to select certain columns that are not next to each other in a pandas DataFrame. You can do this by passing the selected column names/labels as a list.

**1.5 – Selecting Every Alternate Column**

Using **``loc[ ]``**, you can also slice columns by selecting every other column from pandas DataFrame.
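Selecting every other column with `loc[]` can be sketched with a step of 2. The frame `demo` below is a hypothetical one-row stand-in that reuses the first five column names of the notebook's `df`.

```python
import pandas as pd

# Hypothetical one-row stand-in with the first five columns of the notebook's df
demo = pd.DataFrame({"country": ["Argentina"],
                     "country isocode": ["ARG"],
                     "year": [2000],
                     "POP": [37335.65],
                     "XRAT": [1.0]})

# Every other column, starting from the first one
alternate = demo.loc[:, ::2]

# Every other column within a labeled range (stop label is inside the range)
alternate_range = demo.loc[:, "country isocode":"XRAT":2]

print(list(alternate.columns))
print(list(alternate_range.columns))
```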

## 2) **``Using Pandas.DataFrame.iloc[]``** (By position)

By using **``pandas.DataFrame.iloc[ ]``** you can slice a DataFrame by column **position/index**. Always remember that positions start from 0. You can use **``pandas.DataFrame.iloc[ ]``** with the syntax **``[:, start:stop:step]``**, where **start** indicates the position of the first column to take, **stop** indicates the position at which to stop (this column is excluded), and **step** indicates the number of positions to advance after each extraction. Or, use the syntax **``[:, [indices]]``**, with indices as a list of column positions to take.
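The list-of-positions form described above can be sketched as follows. The frame `demo` is a hypothetical one-row stand-in that reuses the first five column names of the notebook's `df`.

```python
import pandas as pd

# Hypothetical one-row stand-in with the first five columns of the notebook's df
demo = pd.DataFrame({"country": ["Argentina"],
                     "country isocode": ["ARG"],
                     "year": [2000],
                     "POP": [37335.65],
                     "XRAT": [1.0]})

# A list of positions picks exactly those columns, in the given order
picked = demo.iloc[:, [1, 3, 4]]

# A start:stop slice excludes the stop position (columns 1 and 2 only)
ranged = demo.iloc[:, 1:3]

print(list(picked.columns))
print(list(ranged.columns))
```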

**2.1 – Slicing Columns by Index Position**

In [19]:
# Let us first remember our df

df

Unnamed: 0,country,country isocode,year,POP,XRAT,tcgdp,cc,cg
0,Argentina,ARG,2000,37335.65,1.0,295072.22,75.72,5.58
1,Australia,AUS,2000,19053.19,1.72,541804.65,67.76,6.72
2,India,IND,2000,1006300.3,44.94,1728144.37,64.58,14.07
3,Israel,ISR,2000,6114.57,4.08,129253.89,64.44,10.27
4,Malawi,MWI,2000,11801.5,59.54,5026.22,74.71,11.66
5,South Africa,ZAF,2000,45064.1,6.94,227242.37,72.72,5.73
6,United States,USA,2000,282171.96,1.0,9898700.0,72.35,6.03
7,Uruguay,URY,2000,3219.79,12.1,25255.96,78.98,5.11


We are going to select columns by their index positions and retrieve slices of the DataFrame. The example below takes every second column from position 0 up to (but not including) position 5, which retrieves the "country", "year" and "XRAT" columns.

In [238]:
# Slicing by selected column position
df.iloc[:, 0:5:2]


Unnamed: 0,country,year,XRAT
0,Argentina,2000,1.0
1,Australia,2000,1.72
2,India,2000,44.94
3,Israel,2000,4.08
4,Malawi,2000,59.54
5,South Africa,2000,6.94
6,United States,2000,1.0
7,Uruguay,2000,12.1


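**2.2 – Slicing by Position Range**

Like slicing by labels, you can also slice a DataFrame by a range of positions. Note that a single slice passed to **``iloc[ ]``** selects rows; to slice columns by a position range, place the slice after the comma, as in **``df.iloc[:, 1:4]``**.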
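The cells that follow slice rows. For completeness, here is a sketch of the same kind of position range applied to columns, and to rows and columns together. The frame `demo` is a hypothetical three-row stand-in for the notebook's `df`.

```python
import pandas as pd

# Hypothetical three-row stand-in for the notebook's df
demo = pd.DataFrame({"country": ["Argentina", "Australia", "India"],
                     "country isocode": ["ARG", "AUS", "IND"],
                     "year": [2000, 2000, 2000],
                     "POP": [37335.65, 19053.19, 1006300.3],
                     "XRAT": [1.0, 1.72, 44.94]})

# Column positions 1, 2, 3 (position 4 is excluded), all rows
col_range = demo.iloc[:, 1:4]

# Row positions 1 and 2 combined with column positions 1, 2, 3
block = demo.iloc[1:3, 1:4]

print(col_range)
print(block)
```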

In [239]:
# Slicing rows from position 1 (inclusive) to position 4 (exclusive)

df.iloc[1:4]

Unnamed: 0,country,country isocode,year,POP,XRAT,tcgdp,cc,cg
1,Australia,AUS,2000,19053.19,1.72,541804.65,67.76,6.72
2,India,IND,2000,1006300.3,44.94,1728144.37,64.58,14.07
3,Israel,ISR,2000,6114.57,4.08,129253.89,64.44,10.27


In [240]:
# Slicing rows from position 3 (inclusive) to the end

df.iloc[3:]

Unnamed: 0,country,country isocode,year,POP,XRAT,tcgdp,cc,cg
3,Israel,ISR,2000,6114.57,4.08,129253.89,64.44,10.27
4,Malawi,MWI,2000,11801.5,59.54,5026.22,74.71,11.66
5,South Africa,ZAF,2000,45064.1,6.94,227242.37,72.72,5.73
6,United States,USA,2000,282171.96,1.0,9898700.0,72.35,6.03
7,Uruguay,URY,2000,3219.79,12.1,25255.96,78.98,5.11


In [16]:
# Slicing rows from the beginning to position 2 (exclusive)
df.iloc[:2]


Unnamed: 0,country,country isocode,year,POP,XRAT,tcgdp,cc,cg
0,Argentina,ARG,2000,37335.653,0.9995,295072.21869,75.716805,5.578804
1,Australia,AUS,2000,19053.186,1.72483,541804.6521,67.759026,6.720098


To get the **last column**, use **``df.iloc[:, -1:]``**; to get just the **first column**, use **``df.iloc[:, :1]``**.

In [250]:
df.iloc[:,:1]

Unnamed: 0,country
0,Argentina
1,Australia
2,India
3,Israel
4,Malawi
5,South Africa
6,United States
7,Uruguay


## BONUS

In [20]:
df

Unnamed: 0,country,country isocode,year,POP,XRAT,tcgdp,cc,cg
0,Argentina,ARG,2000,37335.65,1.0,295072.22,75.72,5.58
1,Australia,AUS,2000,19053.19,1.72,541804.65,67.76,6.72
2,India,IND,2000,1006300.3,44.94,1728144.37,64.58,14.07
3,Israel,ISR,2000,6114.57,4.08,129253.89,64.44,10.27
4,Malawi,MWI,2000,11801.5,59.54,5026.22,74.71,11.66
5,South Africa,ZAF,2000,45064.1,6.94,227242.37,72.72,5.73
6,United States,USA,2000,282171.96,1.0,9898700.0,72.35,6.03
7,Uruguay,URY,2000,3219.79,12.1,25255.96,78.98,5.11


**[Pandas loc vs. iloc vs. at vs. iat?](https://stackoverflow.com/questions/28757389/pandas-loc-vs-iloc-vs-at-vs-iat)**

**``loc``** : works on labels (index and column names)<br>
**``iloc``** : works on integer positions<br>
**``at``** : gets a scalar value by label. It's a very fast loc<br>
**``iat``** : gets a scalar value by position. It's a very fast iloc<br>

**``at``** and **``iat``** are meant to access a scalar, that is, a single element in the DataFrame, while **``loc``** and **``iloc``** are meant to access several elements at the same time, potentially to perform vectorized operations.

In [242]:
# By position

df.iat[1,1]

'AUS'

In [241]:
# By label
games.at[21, "Year"]


1989.0

**``iat[]``** and **``at[]``** return only a single value (they work with scalars only), so they are very fast, while **``iloc[]``** and **``loc[]``** can return multiple rows and columns. [Source](https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html)
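The scalar accessors can be compared on one small frame. This is a minimal sketch; `demo` is a hypothetical two-row stand-in for the notebook's `df`, and the updated year value is made up for the example.

```python
import pandas as pd

# Hypothetical two-row stand-in for the notebook's df
demo = pd.DataFrame({"country": ["Argentina", "Australia"],
                     "country isocode": ["ARG", "AUS"],
                     "year": [2000, 2000]})

by_label = demo.at[1, "country isocode"]   # row label 1, column label
by_position = demo.iat[1, 1]               # row position 1, column position 1
print(by_label, by_position)

# at and iat can also set a single value in place
demo.at[1, "year"] = 2001
print(demo.loc[1, "year"])

# loc and iloc can return more than one element at a time
print(demo.loc[:, ["country", "year"]])
```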