# 02. Setting a meaningful index

### Objectives
* Extract the components of a DataFrame and verify their type
* Know that the default index is a **`RangeIndex`**
* Select values from the index like a list
* Understand what makes a meaningful index
* Use the `index_col` parameter of `read_csv` to set an index on read
* Use the `set_index` method to set an index after read

# Extracting the components of a DataFrame - The Index, Columns, and Data
The DataFrame consists of three components: the index, the columns, and the data. Each component can be extracted and assigned to its own variable.

Let's read in a small dataset to show how this is done. Notice that when we read in the data, we choose the first column to be the index with the **`index_col`** parameter. More on this later.

In [1]:
import pandas as pd

In [32]:
pd.read_csv?  # or press Shift+Tab twice to show the documentation

Signature: pd.read_csv(filepath_or_buffer, sep=',', delimiter=None, header='infer', names=None, index_col=None, usecols=None, squeeze=False, prefix=None, mangle_dupe_cols=True, dtype=None, engine=None, converters=None, true_values=None, false_values=None, skipinitialspace=False, skiprows=No...

In [2]:
df = pd.read_csv('../data/sample_data.csv', index_col=0)  # index_col=0 uses the first column as the index
df

,state,color,food,age,height,score
Jane,NY,blue,Steak,30,165,4.6
Niko,TX,green,Lamb,2,70,8.3
Aaron,FL,red,Mango,12,120,9.0
Penelope,AL,white,Apple,4,80,3.3
Dean,AK,gray,Cheese,32,180,1.8
Christina,TX,black,Melon,33,172,9.5
Cornelia,TX,red,Beans,69,150,2.2


### Use the attributes `index`, `columns`, and `values`
The index, columns, and data are separate objects, available through the `index`, `columns`, and `values` attributes. Let's assign each one to its own variable.

In [3]:
index = df.index
columns = df.columns
data = df.values

### View these objects
Let's output each of these objects:

In [4]:
index

Index(['Jane', 'Niko', 'Aaron', 'Penelope', 'Dean', 'Christina', 'Cornelia'], dtype='object')

In [5]:
columns

Index(['state', 'color', 'food', 'age', 'height', 'score'], dtype='object')

In [6]:
data

array([['NY', 'blue', 'Steak', 30, 165, 4.6],
       ['TX', 'green', 'Lamb', 2, 70, 8.3],
       ['FL', 'red', 'Mango', 12, 120, 9.0],
       ['AL', 'white', 'Apple', 4, 80, 3.3],
       ['AK', 'gray', 'Cheese', 32, 180, 1.8],
       ['TX', 'black', 'Melon', 33, 172, 9.5],
       ['TX', 'red', 'Beans', 69, 150, 2.2]], dtype=object)

### What are these objects?
The output of these objects looks correct, but we don't yet know the exact type of each one. Let's find out:

In [7]:
type(index)

pandas.core.indexes.base.Index

In [8]:
type(columns)

pandas.core.indexes.base.Index

In [9]:
type(data)

numpy.ndarray

### Pandas `Index` Type
Pandas has a special type of object called an **`Index`**. It is similar to a list or a one-dimensional array: you can think of it as a sequence of labels for either the rows or the columns. You will rarely work with this object directly, so there is no need to learn more about it for now.

Notice that both the index and the columns are of the same type, `Index`.
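
As a quick check outside the notebook, the sketch below builds a tiny made-up DataFrame in memory (it does not read `sample_data.csv`) and confirms that both components are `Index` objects that behave much like a list: they have a length, support positional selection, and support `in` membership tests on their labels.

```python
import pandas as pd

# A tiny, made-up DataFrame used only for illustration (not the sample_data.csv file)
df_small = pd.DataFrame(
    {"state": ["NY", "TX", "FL"], "age": [30, 2, 12]},
    index=["Jane", "Niko", "Aaron"],
)

row_labels = df_small.index
col_labels = df_small.columns

# Both components are pandas Index objects
print(type(row_labels).__name__, type(col_labels).__name__)  # Index Index

# Like a list, an Index has a length and supports positional selection
print(len(row_labels))  # 3
print(row_labels[0])    # Jane

# `in` tests membership among the labels
print("Niko" in row_labels)  # True
print("age" in col_labels)   # True
```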

### NumPy's `ndarray`
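The data is returned as a NumPy **`ndarray`** (short for n-dimensional array). NumPy arrays do the bulk of the work in Pandas. Because the columns of this DataFrame mix strings and numbers, `values` has to combine them into a single array with `dtype=object`.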
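
A minimal sketch, again with a made-up two-row DataFrame rather than the sample file, shows how the dtype of the array returned by `values` depends on the columns it combines. It also calls `to_numpy()`, which the pandas documentation recommends over `values` in newer versions; both return the same array here.

```python
import numpy as np
import pandas as pd

# Made-up data mixing strings and numbers, similar in spirit to sample_data.csv
df_small = pd.DataFrame(
    {"state": ["NY", "TX"], "age": [30, 2], "score": [4.6, 8.3]},
    index=["Jane", "Niko"],
)

mixed = df_small.values                      # strings + numbers -> one object array
numeric = df_small[["age", "score"]].values  # int + float -> upcast to float64

print(type(mixed).__name__, mixed.dtype)  # ndarray object
print(numeric.dtype)                      # float64
print(mixed.shape)                        # (2, 3)

# to_numpy() returns an equivalent array
same = df_small.to_numpy()
print(np.array_equal(same, mixed))        # True
```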

### Operating with DataFrame as a Whole
You will rarely need to work with these components directly; most of the time you will work with the DataFrame as a whole.

# Extracting the components of a Series - The Index and Data
Similarly, we can extract the two components of a Series: the index and the data.

Let's first select a single column as a Series:

In [33]:
color = df['color']  # square brackets with a column name select that column
# unlike a list, which only accepts integer positions, the DataFrame is selected here by label
color  # a single column is a Series

Jane          blue
Niko         green
Aaron          red
Penelope     white
Dean          gray
Christina    black
Cornelia       red
Name: color, dtype: object

In [11]:
color.index

Index(['Jane', 'Niko', 'Aaron', 'Penelope', 'Dean', 'Christina', 'Cornelia'], dtype='object')

In [12]:
color.values

array(['blue', 'green', 'red', 'white', 'gray', 'black', 'red'],
      dtype=object)
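
The sketch below, with made-up data standing in for the sample file, checks that the selected Series keeps the DataFrame's row labels as its index and, having only one dimension, has no `columns` attribute. The column name becomes the Series `name`.

```python
import pandas as pd

# Made-up data standing in for sample_data.csv
df_small = pd.DataFrame(
    {"color": ["blue", "green", "red"], "age": [30, 2, 12]},
    index=["Jane", "Niko", "Aaron"],
)

color = df_small["color"]  # a single column comes back as a Series

# The Series reuses the DataFrame's row labels as its index
print(color.index.equals(df_small.index))  # True
print(list(color.values))                  # ['blue', 'green', 'red']

# A Series has an index and values, but no columns
print(hasattr(color, "columns"))           # False
print(color.name)                          # color
```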

# More on the index
The index is an important (and sometimes confusing) part of both the Series and the DataFrame. It provides a label for each row. In the notebook display, the index labels appear in **bold**; they are NOT a column of data but a separate component of the DataFrame.

# The default index
If you don't specify an index when reading in a DataFrame, Pandas creates one for you: the integers from 0 to n-1, where n is the number of rows. An index always exists, even if it just appears to be the row number.
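
To see the default index without the movie file, the sketch below reads a small made-up CSV from an in-memory string with `io.StringIO`; the three titles and years are assumptions for illustration only.

```python
import io

import pandas as pd

# A made-up three-row CSV held in memory instead of ../data/movie.csv
csv_text = "title,year\nAvatar,2009\nSpectre,2015\nJohn Carter,2012\n"

movies = pd.read_csv(io.StringIO(csv_text))  # no index_col, so pandas adds a default index

print(type(movies.index).__name__)  # RangeIndex
print(list(movies.index))           # [0, 1, 2]
print("title" in movies.columns)    # True: title is still an ordinary column
```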

Let's read in the movie dataset without setting an index.

In [13]:
movie = pd.read_csv('../data/movie.csv')
movie.head()
# the bold integers are index labels, not data; a more descriptive column could be used as the index instead

,title,year,color,content_rating,duration,director_name,director_fb,actor1,actor1_fb,actor2,...,actor3_fb,gross,genres,num_reviews,num_voted_users,plot_keywords,language,country,budget,imdb_score
0,Avatar,2009.0,Color,PG-13,178.0,James Cameron,0.0,CCH Pounder,1000.0,Joel David Moore,...,855.0,760505847.0,Action|Adventure|Fantasy|Sci-Fi,723.0,886204,avatar|future|marine|native|paraplegic,English,USA,237000000.0,7.9
1,Pirates of the Caribbean: At World's End,2007.0,Color,PG-13,169.0,Gore Verbinski,563.0,Johnny Depp,40000.0,Orlando Bloom,...,1000.0,309404152.0,Action|Adventure|Fantasy,302.0,471220,goddess|marriage ceremony|marriage proposal|pi...,English,USA,300000000.0,7.1
2,Spectre,2015.0,Color,PG-13,148.0,Sam Mendes,0.0,Christoph Waltz,11000.0,Rory Kinnear,...,161.0,200074175.0,Action|Adventure|Thriller,602.0,275868,bomb|espionage|sequel|spy|terrorist,English,UK,245000000.0,6.8
3,The Dark Knight Rises,2012.0,Color,PG-13,164.0,Christopher Nolan,22000.0,Tom Hardy,27000.0,Christian Bale,...,23000.0,448130642.0,Action|Thriller,813.0,1144337,deception|imprisonment|lawlessness|police offi...,English,USA,250000000.0,8.5
4,Star Wars: Episode VII - The Force Awakens,,,,,Doug Walker,131.0,Doug Walker,131.0,Rob Walker,...,,,Documentary,,8,,,,,7.1


## Notice the integers in the index
These integers are the default index labels for each of the rows. Let's examine the underlying index object.

In [14]:
idx = movie.index
idx

RangeIndex(start=0, stop=4916, step=1)

In [15]:
type(idx)

pandas.core.indexes.range.RangeIndex

### A RangeIndex
Pandas has many different types of index objects. A **`RangeIndex`** is similar to a Python **`range`** object: only the start, stop, and step are stored, and the individual integers are computed when requested rather than kept in memory.
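
A small sketch of this behavior builds a `RangeIndex` with the same bounds as the movie index. Comparing it against a materialized `Index` with `memory_usage()` illustrates the savings; exact byte counts vary by platform and pandas version, so only the comparison is shown.

```python
import pandas as pd

# Same bounds as the movie index: 0 up to (but not including) 4916
idx = pd.RangeIndex(start=0, stop=4916, step=1)

# Only start, stop and step are kept; each integer is computed on demand
print(idx.start, idx.stop, idx.step)  # 0 4916 1
print(len(idx))                       # 4916
print(idx[5])                         # 5

# Copying the labels into a plain Index stores every integer explicitly
materialized = pd.Index(list(idx))
print(idx.memory_usage() < materialized.memory_usage())  # True
```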

# Select a value from the index
The index is a complex object of its own and has many features. We will not cover it in depth because you will rarely work with it directly. That said, the minimum we should know about an index is how to select values from it. We use **integer location**, just as if it were a Python list, to make selections.

Let's select a single value from it.

In [16]:
idx[5]

5

## A NumPy array underlies the index
To get the underlying NumPy array, use the `values` attribute. This is similar to how we get the underlying data from a Pandas DataFrame.

In [17]:
idx.values

array([   0,    1,    2, ..., 4913, 4914, 4915])

If you haven't assigned the index to a variable, you can retrieve the array directly from the DataFrame by chaining the attributes together like this:

In [18]:
movie.index.values

array([   0,    1,    2, ..., 4913, 4914, 4915])

# Setting an index on read
Pandas allows us to use one of the columns as the index when reading in the data.

### Setting an index when reading in the data with `read_csv`
The **`read_csv`** function has dozens of parameters that allow us to read a wide variety of CSV files. The **`index_col`** parameter selects a particular column as the index. We can pass either the integer location of the column or its name.
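
The sketch below reads the same made-up, in-memory CSV twice, once with the column name and once with its integer location, to show that both forms of `index_col` produce the same DataFrame. The three-column CSV is an assumption standing in for `movie.csv`.

```python
import io

import pandas as pd

# A made-up CSV in memory; the real movie.csv has many more columns
csv_text = "title,year,imdb_score\nAvatar,2009,7.9\nSpectre,2015,6.8\n"

by_name = pd.read_csv(io.StringIO(csv_text), index_col="title")
by_position = pd.read_csv(io.StringIO(csv_text), index_col=0)  # 0 = first column

print(by_name.equals(by_position))  # True
print(by_name.index.name)           # title
print(list(by_name.columns))        # ['year', 'imdb_score']
```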

### Re-read the movie dataset with the movie title as the index
The movie dataset has a column named **`title`**. Let's read in the data again, this time with it as the index.

In [19]:
movie = pd.read_csv('../data/movie.csv', index_col='title')
movie.head()

title,year,color,content_rating,duration,director_name,director_fb,actor1,actor1_fb,actor2,actor2_fb,...,actor3_fb,gross,genres,num_reviews,num_voted_users,plot_keywords,language,country,budget,imdb_score
Avatar,2009.0,Color,PG-13,178.0,James Cameron,0.0,CCH Pounder,1000.0,Joel David Moore,936.0,...,855.0,760505847.0,Action|Adventure|Fantasy|Sci-Fi,723.0,886204,avatar|future|marine|native|paraplegic,English,USA,237000000.0,7.9
Pirates of the Caribbean: At World's End,2007.0,Color,PG-13,169.0,Gore Verbinski,563.0,Johnny Depp,40000.0,Orlando Bloom,5000.0,...,1000.0,309404152.0,Action|Adventure|Fantasy,302.0,471220,goddess|marriage ceremony|marriage proposal|pi...,English,USA,300000000.0,7.1
Spectre,2015.0,Color,PG-13,148.0,Sam Mendes,0.0,Christoph Waltz,11000.0,Rory Kinnear,393.0,...,161.0,200074175.0,Action|Adventure|Thriller,602.0,275868,bomb|espionage|sequel|spy|terrorist,English,UK,245000000.0,6.8
The Dark Knight Rises,2012.0,Color,PG-13,164.0,Christopher Nolan,22000.0,Tom Hardy,27000.0,Christian Bale,23000.0,...,23000.0,448130642.0,Action|Thriller,813.0,1144337,deception|imprisonment|lawlessness|police offi...,English,USA,250000000.0,8.5
Star Wars: Episode VII - The Force Awakens,,,,,Doug Walker,131.0,Doug Walker,131.0,Rob Walker,12.0,...,,,Documentary,,8,,,,,7.1


Notice that now the titles of each movie serve as the label for each row. Also notice that the word **title** appears directly above the index. This is a bit confusing - **title** is NOT a column name, but rather the **name of the index**.
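
A sketch with made-up data: the index name is stored in the `name` attribute of the index, it is not found among the columns, and `rename_axis` returns a copy with a different index name while leaving the original alone.

```python
import pandas as pd

# Made-up data whose index is named "title", like the movie DataFrame above
df_small = pd.DataFrame(
    {"year": [2009, 2015]},
    index=pd.Index(["Avatar", "Spectre"], name="title"),
)

print(df_small.index.name)          # title
print("title" in df_small.columns)  # False: the name belongs to the index, not a column

# rename_axis returns a copy with a different index name
renamed = df_small.rename_axis("movie_title")
print(renamed.index.name)           # movie_title
print(df_small.index.name)          # title (original unchanged)
```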

## Extract the new index and output its type
We now have a 'plain' **`Index`** object.

In [20]:
idx2 = movie.index
idx2

Index(['Avatar', 'Pirates of the Caribbean: At World's End', 'Spectre',
       'The Dark Knight Rises', 'Star Wars: Episode VII - The Force Awakens',
       'John Carter', 'Spider-Man 3', 'Tangled', 'Avengers: Age of Ultron',
       'Harry Potter and the Half-Blood Prince',
       ...
       'Primer', 'Cavite', 'El Mariachi', 'The Mongol King', 'Newlyweds',
       'Signed Sealed Delivered', 'The Following', 'A Plague So Pleasant',
       'Shanghai Calling', 'My Date with Drew'],
      dtype='object', name='title', length=4916)

In [21]:
type(idx2)

pandas.core.indexes.base.Index

## Selecting values from this index
Just as we did with our **`RangeIndex`**, we use the selection operator, the square brackets, to select a single index value.

In [22]:
idx2[105]

'Poseidon'

### Selection with slice notation
As with Python lists, you can select a range of values using slice notation. Remember the three components - **start:stop:step**.

In [23]:
idx2[100:120:4]

Index(['The Fast and the Furious', 'The Sorcerer's Apprentice', 'Warcraft',
       'Transformers', 'Hancock'],
      dtype='object', name='title')

### Selection with a list of integers
You can select multiple individual values with a list of integers. 

In [24]:
nums = [1000, 453, 713, 2999]
idx2[nums]

Index(['The Life Aquatic with Steve Zissou', 'Daredevil', 'Daddy Day Care',
       'The Ladies Man'],
      dtype='object', name='title')

# Choosing a good index
First, it's **never** necessary to choose an index for your DataFrames. You can complete all of your analysis with just the default **`RangeIndex`**. Setting a column as the index can, however, help identify the rows, as we did with the movie titles above.

I suggest choosing columns that are both **unique** and **descriptive**. Pandas does not enforce uniqueness, but a unique index guarantees that each label identifies exactly one row.
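
To see that Pandas allows duplicate labels, the sketch below uses made-up rows that repeat a title. The `is_unique` attribute and the `duplicated()` method of the index reveal the repeat; the same `is_unique` check could be run on the real `movie.index`.

```python
import pandas as pd

# Made-up rows in which "King Kong" appears twice
rows = pd.DataFrame(
    {"title": ["King Kong", "Avatar", "King Kong"], "year": [1933, 2009, 2005]}
)

by_title = rows.set_index("title")
print(by_title.index.is_unique)     # False: pandas accepted the duplicate label
print(by_title.index.duplicated())  # flags the second "King Kong" as a repeat

# In this toy data the year happens to be unique, so it identifies each row
by_year = rows.set_index("year")
print(by_year.index.is_unique)      # True
```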

## Setting the index after read with the `set_index` method
It is possible to set the index after reading the data with the **`set_index`** method. Pass it the name of the column you would like to use as the index. 

In [25]:
movie = pd.read_csv('../data/movie.csv')  # read in data without setting index
movie = movie.set_index('title')
movie.head()

title,year,color,content_rating,duration,director_name,director_fb,actor1,actor1_fb,actor2,actor2_fb,...,actor3_fb,gross,genres,num_reviews,num_voted_users,plot_keywords,language,country,budget,imdb_score
Avatar,2009.0,Color,PG-13,178.0,James Cameron,0.0,CCH Pounder,1000.0,Joel David Moore,936.0,...,855.0,760505847.0,Action|Adventure|Fantasy|Sci-Fi,723.0,886204,avatar|future|marine|native|paraplegic,English,USA,237000000.0,7.9
Pirates of the Caribbean: At World's End,2007.0,Color,PG-13,169.0,Gore Verbinski,563.0,Johnny Depp,40000.0,Orlando Bloom,5000.0,...,1000.0,309404152.0,Action|Adventure|Fantasy,302.0,471220,goddess|marriage ceremony|marriage proposal|pi...,English,USA,300000000.0,7.1
Spectre,2015.0,Color,PG-13,148.0,Sam Mendes,0.0,Christoph Waltz,11000.0,Rory Kinnear,393.0,...,161.0,200074175.0,Action|Adventure|Thriller,602.0,275868,bomb|espionage|sequel|spy|terrorist,English,UK,245000000.0,6.8
The Dark Knight Rises,2012.0,Color,PG-13,164.0,Christopher Nolan,22000.0,Tom Hardy,27000.0,Christian Bale,23000.0,...,23000.0,448130642.0,Action|Thriller,813.0,1144337,deception|imprisonment|lawlessness|police offi...,English,USA,250000000.0,8.5
Star Wars: Episode VII - The Force Awakens,,,,,Doug Walker,131.0,Doug Walker,131.0,Rob Walker,12.0,...,,,Documentary,,8,,,,,7.1


## Reassigned `movie` variable

Notice above that we reassigned the variable name `movie` to the result of `set_index`. This is because `set_index` returns a new DataFrame and leaves the original DataFrame unchanged. We say the operation **does NOT happen in-place**.
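
A minimal sketch of this behavior with made-up data: after calling `set_index`, the original DataFrame still has its default `RangeIndex` and its `title` column, while the returned DataFrame has the titles as its index.

```python
import pandas as pd

# Made-up data standing in for the movie dataset
rows = pd.DataFrame({"title": ["Avatar", "Spectre"], "year": [2009, 2015]})

indexed = rows.set_index("title")  # returns a new DataFrame

print(type(rows.index).__name__)  # RangeIndex: the original is untouched
print(list(rows.columns))         # ['title', 'year']
print(list(indexed.index))        # ['Avatar', 'Spectre']
print(list(indexed.columns))      # ['year']: title moved into the index
```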

# Changing Display Options
Pandas gives you the ability to change how the output on your screen is displayed. For instance, the default number of columns displayed for a DataFrame is 20, meaning that if your DataFrame has more than 20 columns then only the first and last 10 columns will be shown on the screen.

The display options are all accessed as `pd.options.display.<option_name>`, where **`<option_name>`** is one of the following:

In [26]:
dir(pd.options.display)

['chop_threshold',
 'colheader_justify',
 'column_space',
 'date_dayfirst',
 'date_yearfirst',
 'encoding',
 'expand_frame_repr',
 'float_format',
 'height',
 'html',
 'large_repr',
 'latex',
 'line_width',
 'max_categories',
 'max_columns',
 'max_colwidth',
 'max_info_columns',
 'max_info_rows',
 'max_rows',
 'max_seq_items',
 'memory_usage',
 'multi_sparse',
 'notebook_repr_html',
 'pprint_nest_depth',
 'precision',
 'show_dimensions',
 'unicode',
 'width']

In [35]:
# Change a display option
pd.options.display.max_columns = 100
# press Tab after `pd.options.display.` to list all options; max_columns is the one most commonly changed

## Getting current option values
Let's retrieve the current values of some of the attributes above using dot notation.

In [27]:
pd.options.display.max_columns

20

In [28]:
pd.options.display.max_rows

60

In [29]:
pd.options.display.max_colwidth

50

## Use an assignment statement to set a new display value
To set a new option value, assign to it as you would to any other variable. Let's change the maximum number of columns to 40 so that we can see every column in the movie dataset.
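
Besides attribute access, pandas provides the functions `pd.get_option`, `pd.set_option`, `pd.option_context`, and `pd.reset_option`, which take the option's full dotted name. The sketch below shows that they read and write the same setting as `pd.options.display.max_columns`, and that `option_context` changes it only temporarily.

```python
import pandas as pd

# Attribute access and the function-based API read and write the same setting
pd.options.display.max_columns = 40
via_function = pd.get_option("display.max_columns")

pd.set_option("display.max_columns", 25)
via_attribute = pd.options.display.max_columns

# option_context changes the setting only inside the with-block
with pd.option_context("display.max_columns", 5):
    inside_block = pd.get_option("display.max_columns")
after_block = pd.get_option("display.max_columns")

print(via_function, via_attribute, inside_block, after_block)  # 40 25 5 25

# Put the option back to its default value
pd.reset_option("display.max_columns")
```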

In [36]:
pd.options.display.max_columns = 40

In [37]:
movie.head()

title,year,color,content_rating,duration,director_name,director_fb,actor1,actor1_fb,actor2,actor2_fb,actor3,actor3_fb,gross,genres,num_reviews,num_voted_users,plot_keywords,language,country,budget,imdb_score
Avatar,2009.0,Color,PG-13,178.0,James Cameron,0.0,CCH Pounder,1000.0,Joel David Moore,936.0,Wes Studi,855.0,760505847.0,Action|Adventure|Fantasy|Sci-Fi,723.0,886204,avatar|future|marine|native|paraplegic,English,USA,237000000.0,7.9
Pirates of the Caribbean: At World's End,2007.0,Color,PG-13,169.0,Gore Verbinski,563.0,Johnny Depp,40000.0,Orlando Bloom,5000.0,Jack Davenport,1000.0,309404152.0,Action|Adventure|Fantasy,302.0,471220,goddess|marriage ceremony|marriage proposal|pi...,English,USA,300000000.0,7.1
Spectre,2015.0,Color,PG-13,148.0,Sam Mendes,0.0,Christoph Waltz,11000.0,Rory Kinnear,393.0,Stephanie Sigman,161.0,200074175.0,Action|Adventure|Thriller,602.0,275868,bomb|espionage|sequel|spy|terrorist,English,UK,245000000.0,6.8
The Dark Knight Rises,2012.0,Color,PG-13,164.0,Christopher Nolan,22000.0,Tom Hardy,27000.0,Christian Bale,23000.0,Joseph Gordon-Levitt,23000.0,448130642.0,Action|Thriller,813.0,1144337,deception|imprisonment|lawlessness|police offi...,English,USA,250000000.0,8.5
Star Wars: Episode VII - The Force Awakens,,,,,Doug Walker,131.0,Doug Walker,131.0,Rob Walker,12.0,,,,Documentary,,8,,,,,7.1


# Practice the following on your own
* Read in the movie dataset and set the index to be something other than movie title. Are there any other good columns to use as an index?
* Can you use `set_index` to set the index and keep the column as part of the data?
* Practice making selections from the index object
* Use an integer instead of the column name for **`index_col`** when reading in the data using **`read_csv`**. What does it do?
* Retrieve the values of many more display options and change some of them. Look up the possible values in the documentation. You can use `pd.reset_option('all')` to reset all options to their default values.

In [44]:
# your code here
# rename two columns; rename returns a new DataFrame and leaves `movie` unchanged
newmovie = movie.rename(columns={'year': 'new_year', 'color': 'new_color'})

In [45]:
newmovie

title,new_year,new_color,content_rating,duration,director_name,director_fb,actor1,actor1_fb,actor2,actor2_fb,actor3,actor3_fb,gross,genres,num_reviews,num_voted_users,plot_keywords,language,country,budget,imdb_score
Avatar,2009.0,Color,PG-13,178.0,James Cameron,0.0,CCH Pounder,1000.0,Joel David Moore,936.0,Wes Studi,855.0,760505847.0,Action|Adventure|Fantasy|Sci-Fi,723.0,886204,avatar|future|marine|native|paraplegic,English,USA,237000000.0,7.9
Pirates of the Caribbean: At World's End,2007.0,Color,PG-13,169.0,Gore Verbinski,563.0,Johnny Depp,40000.0,Orlando Bloom,5000.0,Jack Davenport,1000.0,309404152.0,Action|Adventure|Fantasy,302.0,471220,goddess|marriage ceremony|marriage proposal|pi...,English,USA,300000000.0,7.1
Spectre,2015.0,Color,PG-13,148.0,Sam Mendes,0.0,Christoph Waltz,11000.0,Rory Kinnear,393.0,Stephanie Sigman,161.0,200074175.0,Action|Adventure|Thriller,602.0,275868,bomb|espionage|sequel|spy|terrorist,English,UK,245000000.0,6.8
The Dark Knight Rises,2012.0,Color,PG-13,164.0,Christopher Nolan,22000.0,Tom Hardy,27000.0,Christian Bale,23000.0,Joseph Gordon-Levitt,23000.0,448130642.0,Action|Thriller,813.0,1144337,deception|imprisonment|lawlessness|police offi...,English,USA,250000000.0,8.5
Star Wars: Episode VII - The Force Awakens,,,,,Doug Walker,131.0,Doug Walker,131.0,Rob Walker,12.0,,,,Documentary,,8,,,,,7.1
John Carter,2012.0,Color,PG-13,132.0,Andrew Stanton,475.0,Daryl Sabara,640.0,Samantha Morton,632.0,Polly Walker,530.0,73058679.0,Action|Adventure|Sci-Fi,462.0,212204,alien|american civil war|male nipple|mars|prin...,English,USA,263700000.0,6.6
Spider-Man 3,2007.0,Color,PG-13,156.0,Sam Raimi,0.0,J.K. Simmons,24000.0,James Franco,11000.0,Kirsten Dunst,4000.0,336530303.0,Action|Adventure|Romance,392.0,383056,sandman|spider man|symbiote|venom|villain,English,USA,258000000.0,6.2
Tangled,2010.0,Color,PG,100.0,Nathan Greno,15.0,Brad Garrett,799.0,Donna Murphy,553.0,M.C. Gainey,284.0,200807262.0,Adventure|Animation|Comedy|Family|Fantasy|Musi...,324.0,294810,17th century|based on fairy tale|disney|flower...,English,USA,260000000.0,7.8
Avengers: Age of Ultron,2015.0,Color,PG-13,141.0,Joss Whedon,0.0,Chris Hemsworth,26000.0,Robert Downey Jr.,21000.0,Scarlett Johansson,19000.0,458991599.0,Action|Adventure|Sci-Fi,635.0,462669,artificial intelligence|based on comic book|ca...,English,USA,250000000.0,7.5
Harry Potter and the Half-Blood Prince,2009.0,Color,PG,153.0,David Yates,282.0,Alan Rickman,25000.0,Daniel Radcliffe,11000.0,Rupert Grint,10000.0,301956980.0,Adventure|Family|Fantasy|Mystery,375.0,321795,blood|book|love|potion|professor,English,UK,250000000.0,7.5
