In [1]:
# importing pandas
import pandas as pd

# csv file location
url = 'https://dq-content.s3.amazonaws.com/291/f500.csv'

# making data frame from csv file
data = pd.read_csv(url, index_col = 'company')

# Indexing and Selecting Data

Indexing in Pandas means selecting particular rows and columns of data from a DataFrame.

Indexing can mean selecting all of the rows and some of the columns, some of the rows and all of the columns, or some of the rows and some of the columns.

Indexing is also known as subset selection.

Some common methods and attributes for viewing and selecting data in a Pandas DataFrame include:

* `head(n=5)` : Displays the first n rows of a DataFrame.
* `tail(n=5)` : Displays the last n rows of a DataFrame.
* `loc` : Accesses a group of rows and columns by labels or a boolean array.
* `iloc` : Accesses a group of rows and columns by zero-based integer position.
* `columns` : Lists the column labels of the DataFrame.
* `['A']` : Selects the column named `A` from the DataFrame.

**Example**



In [None]:
# selecting data
data_selection = data.head()

data_selection

company,rank,revenues,revenue_change,profits,assets,profit_change,ceo,industry,sector,previous_rank,country,hq_location,website,years_on_global_500_list,employees,total_stockholder_equity
Walmart,1,485873,0.8,13643.0,198825,-7.2,C. Douglas McMillon,General Merchandisers,Retailing,1,USA,"Bentonville, AR",http://www.walmart.com,23,2300000,77798
State Grid,2,315199,-4.4,9571.3,489838,-6.2,Kou Wei,Utilities,Energy,2,China,"Beijing, China",http://www.sgcc.com.cn,17,926067,209456
Sinopec Group,3,267518,-9.1,1257.9,310726,-65.0,Wang Yupu,Petroleum Refining,Energy,4,China,"Beijing, China",http://www.sinopec.com,19,713288,106523
China National Petroleum,4,262573,-12.3,1867.5,585619,-73.7,Zhang Jianhua,Petroleum Refining,Energy,3,China,"Beijing, China",http://www.cnpc.com.cn,17,1512048,301893
Toyota Motor,5,254694,7.7,16899.3,437575,-12.3,Akio Toyoda,Motor Vehicles and Parts,Motor Vehicles & Parts,8,Japan,"Toyota, Japan",http://www.toyota-global.com,23,364445,157210


In [None]:
# selecting data
data_selection = data.tail()

data_selection

company,rank,revenues,revenue_change,profits,assets,profit_change,ceo,industry,sector,previous_rank,country,hq_location,website,years_on_global_500_list,employees,total_stockholder_equity
Teva Pharmaceutical Industries,496,21903,11.5,329.0,92890,-79.3,Yitzhak Peterburg,Pharmaceuticals,Health Care,0,Israel,"Petach Tikva, Israel",http://www.tevapharm.com,1,56960,33337
New China Life Insurance,497,21796,-13.3,743.9,100609,-45.6,Wan Feng,"Insurance: Life, Health (stock)",Financials,427,China,"Beijing, China",http://www.newchinalife.com,2,54378,8507
Wm. Morrison Supermarkets,498,21741,-11.3,406.4,11630,20.4,David T. Potts,Food and Drug Stores,Food & Drug Stores,437,Britain,"Bradford, Britain",http://www.morrisons.com,13,77210,5111
TUI,499,21655,-5.5,1151.7,16247,195.5,Friedrich Joussen,Travel Services,Business Services,467,Germany,"Hanover, Germany",http://www.tuigroup.com,23,66779,3006
AutoNation,500,21609,3.6,430.5,10060,-2.7,Michael J. Jackson,Specialty Retailers,Retailing,0,USA,"Fort Lauderdale, FL",http://www.autonation.com,12,26000,2310
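
The `columns` attribute from the list above is not shown in the cells so far. The following is a minimal sketch of it alongside `head` and `tail`. It assumes a small hand-built DataFrame named `sample` (a hypothetical stand-in, not the full dataset) that reuses a few values from the output above.

```python
import pandas as pd

# Hypothetical stand-in for the f500 data: five rows and three columns
# copied from the head() output shown above.
sample = pd.DataFrame(
    {
        "company": ["Walmart", "State Grid", "Sinopec Group",
                    "China National Petroleum", "Toyota Motor"],
        "rank": [1, 2, 3, 4, 5],
        "revenues": [485873, 315199, 267518, 262573, 254694],
        "country": ["USA", "China", "China", "China", "Japan"],
    }
).set_index("company")

# columns lists the column labels; the index column is not among them
column_labels = sample.columns.tolist()
print(column_labels)

# head(n) and tail(n) accept the number of rows to return
first_two = sample.head(2)
last_two = sample.tail(2)
print(first_two)
print(last_two)
```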


# Indexing a DataFrame using indexing operator `[ ]`

The indexing operator refers to the square brackets `[ ]` following an object.

To select a single column, we put the name of the column between the brackets.

Note: `company` was set as the index when the file was loaded, so it is no longer a regular column and cannot be selected this way. Use the name of any other column.
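
To illustrate the note above, here is a small sketch of what happens when the index name is used with the indexing operator. It assumes a hypothetical three-row DataFrame named `sample`, built by hand from values in the dataset, rather than the full file.

```python
import pandas as pd

# Hypothetical stand-in for the f500 data, indexed by company
sample = pd.DataFrame(
    {
        "company": ["Walmart", "State Grid", "Sinopec Group"],
        "rank": [1, 2, 3],
        "country": ["USA", "China", "China"],
    }
).set_index("company")

# The index is not a column, so the indexing operator cannot find it
try:
    sample["company"]
    lookup_failed = False
except KeyError:
    lookup_failed = True
print(lookup_failed)

# The company names are still available through the index attribute
company_names = sample.index.tolist()
print(company_names)
```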

**Example**

In [None]:
# selecting columns by indexing operator
data_selection = data['rank']

data_selection

company,rank
Walmart,1
State Grid,2
Sinopec Group,3
China National Petroleum,4
Toyota Motor,5
...,...
Teva Pharmaceutical Industries,496
New China Life Insurance,497
Wm. Morrison Supermarkets,498
TUI,499


Note that the instruction (`data['rank']`) will return a `Series` and not a `DataFrame`.

In [None]:
print(type(data_selection))

<class 'pandas.core.series.Series'>


To get a `DataFrame` instead, we can pass a list containing the single column name to the `[]` operator, which results in double brackets. The syntax is as follows:

In [None]:
data_selection = data[['rank']]

print(type(data_selection))

<class 'pandas.core.frame.DataFrame'>
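
The difference between the two forms also shows up in the shape of the result. The sketch below assumes a hypothetical three-row DataFrame named `sample` built by hand; the names `as_series` and `as_frame` are illustrative only.

```python
import pandas as pd

# Hypothetical stand-in for the f500 data, indexed by company
sample = pd.DataFrame(
    {
        "company": ["Walmart", "State Grid", "Sinopec Group"],
        "rank": [1, 2, 3],
        "country": ["USA", "China", "China"],
    }
).set_index("company")

as_series = sample["rank"]    # a column name: one-dimensional Series
as_frame = sample[["rank"]]   # a list of names: two-dimensional DataFrame

print(as_series.shape)  # (3,)
print(as_frame.shape)   # (3, 1)
```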


## Select multiple columns

In [None]:
# selecting columns by indexing operator
data_selection = data[['rank', 'country']]

data_selection

company,rank,country
Walmart,1,USA
State Grid,2,China
Sinopec Group,3,China
China National Petroleum,4,China
Toyota Motor,5,Japan
...,...,...
Teva Pharmaceutical Industries,496,Israel
New China Life Insurance,497,China
Wm. Morrison Supermarkets,498,Britain
TUI,499,Germany


# Indexing a DataFrame using indexer attributes

Pandas provides some special indexer attributes that explicitly expose certain indexing schemes.

These are not methods, but attributes that expose a particular slicing interface to the data in the `DataFrame` (or `Series`).

## The **`loc`** attribute

The **`loc`** attribute selects data by the label of the rows and columns.

This attribute allows indexing and slicing that always references the explicit index.

Unlike the indexing operator, which selects columns, `loc` takes row labels first and, optionally, column labels second.

It can select subsets of rows or columns.

It can also simultaneously select subsets of rows and columns.
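
Slicing with `loc` uses labels, and a label slice includes both its start and its end label. The following sketch shows this on a hypothetical five-row DataFrame named `sample`, built by hand from values in the dataset.

```python
import pandas as pd

# Hypothetical stand-in for the f500 data, indexed by company
sample = pd.DataFrame(
    {
        "company": ["Walmart", "State Grid", "Sinopec Group",
                    "China National Petroleum", "Toyota Motor"],
        "rank": [1, 2, 3, 4, 5],
        "revenues": [485873, 315199, 267518, 262573, 254694],
        "country": ["USA", "China", "China", "China", "Japan"],
    }
).set_index("company")

# Row slice by label: both endpoints are included in the result
row_slice = sample.loc["State Grid":"China National Petroleum"]
print(row_slice)

# Column slice by label, keeping all rows
column_slice = sample.loc[:, "rank":"revenues"]
print(column_slice)
```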

**Example:** select a single row with label Toyota Motor:

In [None]:
# select a single row
data_selection = data.loc['Toyota Motor']

# show the result (a single row is returned as a Series)
data_selection

,Toyota Motor
rank,5
revenues,254694
revenue_change,7.7
profits,16899.3
assets,437575
profit_change,-12.3
ceo,Akio Toyoda
industry,Motor Vehicles and Parts
sector,Motor Vehicles & Parts
previous_rank,8


**Example:** select rows with labels Toyota Motor and Volkswagen:

In [2]:
# select rows with labels Toyota Motor and Volkswagen
data_selection = data.loc[['Toyota Motor', 'Volkswagen']]

# show dataframe
data_selection

company,rank,revenues,revenue_change,profits,assets,profit_change,ceo,industry,sector,previous_rank,country,hq_location,website,years_on_global_500_list,employees,total_stockholder_equity
Toyota Motor,5,254694,7.7,16899.3,437575,-12.3,Akio Toyoda,Motor Vehicles and Parts,Motor Vehicles & Parts,8,Japan,"Toyota, Japan",http://www.toyota-global.com,23,364445,157210
Volkswagen,6,240264,1.5,5937.3,432116,,Matthias Muller,Motor Vehicles and Parts,Motor Vehicles & Parts,7,Germany,"Wolfsburg, Germany",http://www.volkswagen.com,23,626715,97753


**Example:** select rows with labels Toyota Motor and Volkswagen, and column with label revenues:

In [3]:
# select rows with labels Toyota Motor and Volkswagen, and column with label revenues
data_selection = data.loc[['Toyota Motor', 'Volkswagen'], 'revenues']

# show the result (a single column is returned as a Series)
data_selection

company,revenues
Toyota Motor,254694
Volkswagen,240264


**Example:** select the columns revenues and profits for the company Toyota Motor:

In [None]:
# select columns revenues and profits for the company Toyota Motor
data_selection = data.loc['Toyota Motor', ['revenues', 'profits']]

# show the result (a single row is returned as a Series)
data_selection

,Toyota Motor
revenues,254694.0
profits,16899.3


**Example:** select the columns revenues and profits for the companies Toyota Motor and Volkswagen:

In [None]:
# select columns revenues and profits for the companies Toyota Motor and Volkswagen
row_labels = ['Toyota Motor', 'Volkswagen']
columns_labels = ['revenues', 'profits']
data_selection = data.loc[row_labels, columns_labels]

# show dataframe
data_selection

company,revenues,profits
Toyota Motor,254694,16899.3
Volkswagen,240264,5937.3
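
As noted in the list at the start of this section, `loc` also accepts a boolean array. The sketch below is one way this could look. It assumes a hypothetical five-row DataFrame named `sample` built by hand, and the names `is_china` and `china_revenues` are illustrative only.

```python
import pandas as pd

# Hypothetical stand-in for the f500 data, indexed by company
sample = pd.DataFrame(
    {
        "company": ["Walmart", "State Grid", "Sinopec Group",
                    "China National Petroleum", "Toyota Motor"],
        "rank": [1, 2, 3, 4, 5],
        "revenues": [485873, 315199, 267518, 262573, 254694],
        "country": ["USA", "China", "China", "China", "Japan"],
    }
).set_index("company")

# Boolean Series: True for rows whose country is China
is_china = sample["country"] == "China"

# Keep the rows where the mask is True, and only the revenues column
china_revenues = sample.loc[is_china, ["revenues"]]
print(china_revenues)
```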


## The **`iloc`** attribute

The **`iloc`** attribute allows indexing and slicing that always references the implicit Python-style index, that is, the zero-based integer position.

This attribute can select rows, columns, or both from a DataFrame by position.

To select a row using `iloc`, we specify the position of this row, counting from 0.

In [None]:
# selecting the first row with the iloc attribute
data_selection = data.iloc[0]

data_selection

,Walmart
rank,1
revenues,485873
revenue_change,0.8
profits,13643.0
assets,198825
profit_change,-7.2
ceo,C. Douglas McMillon
industry,General Merchandisers
sector,Retailing
previous_rank,1
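
Beyond a single row, `iloc` also supports slices, negative positions, and row and column positions together. The following is a minimal sketch on a hypothetical five-row DataFrame named `sample`, built by hand from values in the dataset. Note that, unlike label slices with `loc`, a position slice excludes its end.

```python
import pandas as pd

# Hypothetical stand-in for the f500 data, indexed by company
sample = pd.DataFrame(
    {
        "company": ["Walmart", "State Grid", "Sinopec Group",
                    "China National Petroleum", "Toyota Motor"],
        "rank": [1, 2, 3, 4, 5],
        "revenues": [485873, 315199, 267518, 262573, 254694],
        "country": ["USA", "China", "China", "China", "Japan"],
    }
).set_index("company")

# Position slice: rows 0 and 1 (the end position 2 is excluded)
first_two = sample.iloc[0:2]

# Negative positions count from the end, as in Python lists
last_row = sample.iloc[-1]

# Rows and columns together: rows 0 and 4, columns 0 and 1
corner_block = sample.iloc[[0, 4], [0, 1]]

# A single cell: row 0, column 1 (revenues of the first company)
first_revenue = sample.iloc[0, 1]

print(first_two)
print(last_row)
print(corner_block)
print(first_revenue)
```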
