## 5. How to sort and iterate values in DataFrame and Series?

Sorting refers to arranging data in an ordered format, either ascending or descending. Iterating means accessing each value of a Series, or each row of a DataFrame, one after another with the help of a loop.

In [1]:
import pandas as pd

We will use a movie dataset that contains top-rated movies from the Internet Movie Database (IMDB). Each row represents a movie, and the columns represent its features.



In [2]:
movies = pd.read_csv("http://bit.ly/imdbratings")
movies.head()

|   | star_rating | title | content_rating | genre | duration | actors_list |
|---|---|---|---|---|---|---|
| 0 | 9.3 | The Shawshank Redemption | R | Crime | 142 | [u'Tim Robbins', u'Morgan Freeman', u'Bob Gunt... |
| 1 | 9.2 | The Godfather | R | Crime | 175 | [u'Marlon Brando', u'Al Pacino', u'James Caan'] |
| 2 | 9.1 | The Godfather: Part II | R | Crime | 200 | [u'Al Pacino', u'Robert De Niro', u'Robert Duv... |
| 3 | 9.0 | The Dark Knight | PG-13 | Action | 152 | [u'Christian Bale', u'Heath Ledger', u'Aaron E... |
| 4 | 8.9 | Pulp Fiction | R | Crime | 154 | [u'John Travolta', u'Uma Thurman', u'Samuel L.... |


### 5.1. Sorting values in a Series and DataFrame

#### 5.1.1. Sorting values in Series

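If we want to sort the values of the “title” Series, we use “.sort_values( )” as a Series method. By default, strings are sorted in ascending order by comparing them character by character, so titles that start with punctuation such as “(” or with a digit come before titles that start with a letter, while a title starting with “[” comes after “Z”. We can sort in descending order by passing “ascending=False” as a parameter. Also note that the underlying data has not changed: “.sort_values( )” returns a new sorted Series and only modifies the original if we pass “inplace=True”.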
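A minimal sketch of this behaviour, assuming a small hand-made Series (`titles`, a hypothetical name) that holds a few titles from the dataset, so it runs without downloading the CSV:

```python
import pandas as pd

# Hypothetical stand-in for movies.title: a few titles from the dataset
titles = pd.Series(["Zulu", "[Rec]", "12 Angry Men", "(500) Days of Summer", "Argo"])

# sort_values() returns a NEW Series; index labels travel with their values
ascending = titles.sort_values()
print(ascending.tolist())
# ['(500) Days of Summer', '12 Angry Men', 'Argo', 'Zulu', '[Rec]']
print(ascending.index.tolist())
# [3, 2, 4, 0, 1]

# Descending order reverses the comparison
descending = titles.sort_values(ascending=False)
print(descending.tolist())
# ['[Rec]', 'Zulu', 'Argo', '12 Angry Men', '(500) Days of Summer']

# The original Series is unchanged...
print(titles.tolist())
# ['Zulu', '[Rec]', '12 Angry Men', '(500) Days of Summer', 'Argo']

# ...unless we sort in place, in which case the method returns None
result = titles.sort_values(inplace=True)
print(result, titles.tolist())
# None ['(500) Days of Summer', '12 Angry Men', 'Argo', 'Zulu', '[Rec]']
```

Because the index labels travel with the values, the sorted Series can still be matched back to the original rows, just as in the outputs in this section.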

In [3]:
movies.title.sort_values()
# or, equivalently: movies["title"].sort_values()

542     (500) Days of Summer
5               12 Angry Men
201         12 Years a Slave
698                127 Hours
110    2001: A Space Odyssey
               ...          
955         Zero Dark Thirty
677                   Zodiac
615               Zombieland
526                     Zulu
864                    [Rec]
Name: title, Length: 979, dtype: object

In [4]:
movies.title.sort_values(ascending=False)

864                    [Rec]
526                     Zulu
615               Zombieland
677                   Zodiac
955         Zero Dark Thirty
               ...          
110    2001: A Space Odyssey
698                127 Hours
201         12 Years a Slave
5               12 Angry Men
542     (500) Days of Summer
Name: title, Length: 979, dtype: object

#### 5.1.2. Sorting values in DataFrame

If we want to sort the “movies” DataFrame by one of its columns, we use “.sort_values( )” as a DataFrame method and pass it the column name. In short, if we only want the sorted Series, we use “.sort_values( )” as a Series method, but if we want to see the other columns alongside the sorted one, we use “.sort_values( )” as a DataFrame method, which reorders entire rows.
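To see the difference between the two methods side by side, here is a sketch on a hypothetical four-row `movies_small` DataFrame, built from values that appear in the outputs of this section:

```python
import pandas as pd

# Hypothetical mini version of the movies DataFrame
movies_small = pd.DataFrame({
    "title": ["The Shawshank Redemption", "Hamlet", "12 Angry Men", "Zulu"],
    "star_rating": [9.3, 7.8, 8.9, 7.8],
    "duration": [142, 242, 96, 138],
})

# Series method: only the sorted column comes back
print(movies_small["title"].sort_values())

# DataFrame method: whole rows are reordered by the chosen column
by_title = movies_small.sort_values("title")
print(by_title)

# Longest movies first
by_duration = movies_small.sort_values("duration", ascending=False)
print(by_duration[["title", "duration"]])
```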

In [5]:
movies.sort_values("title").head()

|   | star_rating | title | content_rating | genre | duration | actors_list |
|---|---|---|---|---|---|---|
| 542 | 7.8 | (500) Days of Summer | PG-13 | Comedy | 95 | [u'Zooey Deschanel', u'Joseph Gordon-Levitt', ... |
| 5 | 8.9 | 12 Angry Men | NOT RATED | Drama | 96 | [u'Henry Fonda', u'Lee J. Cobb', u'Martin Bals... |
| 201 | 8.1 | 12 Years a Slave | R | Biography | 134 | [u'Chiwetel Ejiofor', u'Michael Kenneth Willia... |
| 698 | 7.6 | 127 Hours | R | Adventure | 94 | [u'James Franco', u'Amber Tamblyn', u'Kate Mara'] |
| 110 | 8.3 | 2001: A Space Odyssey | G | Mystery | 160 | [u'Keir Dullea', u'Gary Lockwood', u'William S... |


In [6]:
movies.sort_values("duration", ascending=False).head()

|   | star_rating | title | content_rating | genre | duration | actors_list |
|---|---|---|---|---|---|---|
| 476 | 7.8 | Hamlet | PG-13 | Drama | 242 | [u'Kenneth Branagh', u'Julie Christie', u'Dere... |
| 157 | 8.2 | Gone with the Wind | G | Drama | 238 | [u'Clark Gable', u'Vivien Leigh', u'Thomas Mit... |
| 78 | 8.4 | Once Upon a Time in America | R | Crime | 229 | [u'Robert De Niro', u'James Woods', u'Elizabet... |
| 142 | 8.3 | Lagaan: Once Upon a Time in India | PG | Adventure | 224 | [u'Aamir Khan', u'Gracy Singh', u'Rachel Shell... |
| 445 | 7.9 | The Ten Commandments | APPROVED | Adventure | 220 | [u'Charlton Heston', u'Yul Brynner', u'Anne Ba... |


We can also sort a DataFrame by two columns by passing a list of column names to “.sort_values( )”. pandas first sorts by the first column and then, among rows with the same value in that column, sorts by the second. Missing values (NaN) are placed at the end by default, which is why the movies without a content rating appear last.
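A small sketch of a two-column sort, assuming a hypothetical `df` with a few rows taken from the output below. It also shows that `ascending` accepts one boolean per column:

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows (ratings and durations from the output below)
df = pd.DataFrame({
    "title": ["True Grit", "The Killing", "Midnight Cowboy", "The Jungle Book", "A Clockwork Orange"],
    "content_rating": [np.nan, "APPROVED", "X", "APPROVED", "X"],
    "duration": [128, 85, 113, 78, 136],
})

# Sort by content_rating first; ties are broken by duration.
# Missing ratings (NaN) go to the end by default (na_position="last").
result = df.sort_values(["content_rating", "duration"])
print(result)

# One direction per column: rating A-Z, then longest movie first
mixed = df.sort_values(["content_rating", "duration"], ascending=[True, False])
print(mixed)
```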



In [7]:
movies.sort_values(["content_rating", "duration"])

|   | star_rating | title | content_rating | genre | duration | actors_list |
|---|---|---|---|---|---|---|
| 713 | 7.6 | The Jungle Book | APPROVED | Animation | 78 | [u'Phil Harris', u'Sebastian Cabot', u'Louis P... |
| 513 | 7.8 | Invasion of the Body Snatchers | APPROVED | Horror | 80 | [u'Kevin McCarthy', u'Dana Wynter', u'Larry Ga... |
| 272 | 8.1 | The Killing | APPROVED | Crime | 85 | [u'Sterling Hayden', u'Coleen Gray', u'Vince E... |
| 703 | 7.6 | Dracula | APPROVED | Horror | 85 | [u'Bela Lugosi', u'Helen Chandler', u'David Ma... |
| 612 | 7.7 | A Hard Day's Night | APPROVED | Comedy | 87 | [u'John Lennon', u'Paul McCartney', u'George H... |
| ... | ... | ... | ... | ... | ... | ... |
| 387 | 8.0 | Midnight Cowboy | X | Drama | 113 | [u'Dustin Hoffman', u'Jon Voight', u'Sylvia Mi... |
| 86 | 8.4 | A Clockwork Orange | X | Crime | 136 | [u'Malcolm McDowell', u'Patrick Magee', u'Mich... |
| 187 | 8.2 | Butch Cassidy and the Sundance Kid | NaN | Biography | 110 | [u'Paul Newman', u'Robert Redford', u'Katharin... |
| 936 | 7.4 | True Grit | NaN | Adventure | 128 | [u'John Wayne', u'Kim Darby', u'Glen Campbell'] |


#### 5.1.3. Sorting index in Series

The Series method “value_counts( )” returns the number of times each unique value appears in the Series, ordered from most to least frequent. The unique values become the index of the result, so we can follow it with “sort_index( )” to sort by those values instead, i.e. alphabetically by genre. Although we are sorting movie genres here, sorting the index of a Series is most useful with time-series data.
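As a sketch, assuming a hypothetical `genre` Series holding the first eight genres printed later in this section:

```python
import pandas as pd

# Hypothetical slice of movies.genre (the first eight values)
genre = pd.Series(["Crime", "Crime", "Crime", "Action", "Crime", "Drama", "Western", "Adventure"])

# Unique values become the index; counts are ordered from most to least frequent
counts = genre.value_counts()
print(counts)

# Reorder by the index labels (genre names, A-Z) instead of by count
by_name = counts.sort_index()
print(by_name)
```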

In [8]:
movies.genre.value_counts().head()

Drama        278
Comedy       156
Action       136
Crime        124
Biography     77
Name: genre, dtype: int64

In [9]:
movies.genre.value_counts().sort_index()

Action       136
Adventure     75
Animation     62
Biography     77
Comedy       156
Crime        124
Drama        278
Family         2
Fantasy        1
Film-Noir      3
History        1
Horror        29
Mystery       16
Sci-Fi         5
Thriller       5
Western        9
Name: genre, dtype: int64

#### 5.1.4. Sorting index in DataFrame

When we set one of the DataFrame's columns as the index with “set_index( )”, we can sort the rows by that index using “sort_index( )” as a DataFrame method. Here we sort by movie title, but I have found it most useful with time-series data. Also note that the underlying data has not changed: both “set_index( )” and “sort_index( )” return a new DataFrame unless we pass “inplace=True” or assign the result back.
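A minimal sketch, assuming a hypothetical three-row `movies_small` DataFrame, showing that neither call touches the original until the result is assigned back:

```python
import pandas as pd

# Hypothetical mini DataFrame with a few rows from the movies data
movies_small = pd.DataFrame({
    "title": ["Zulu", "The Godfather", "(500) Days of Summer"],
    "star_rating": [7.8, 9.2, 7.8],
})

# set_index() and sort_index() each return a new DataFrame
by_title = movies_small.set_index("title").sort_index()
print(by_title)

# 'title' is still an ordinary column in the original
print(movies_small.columns.tolist())
# ['title', 'star_rating']

# To keep the result, assign it back
movies_small = movies_small.set_index("title").sort_index()
print(movies_small.index.tolist())
# ['(500) Days of Summer', 'The Godfather', 'Zulu']
```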



In [10]:
movies.set_index("title").sort_index()

| title | star_rating | content_rating | genre | duration | actors_list |
|---|---|---|---|---|---|
| (500) Days of Summer | 7.8 | PG-13 | Comedy | 95 | [u'Zooey Deschanel', u'Joseph Gordon-Levitt', ... |
| 12 Angry Men | 8.9 | NOT RATED | Drama | 96 | [u'Henry Fonda', u'Lee J. Cobb', u'Martin Bals... |
| 12 Years a Slave | 8.1 | R | Biography | 134 | [u'Chiwetel Ejiofor', u'Michael Kenneth Willia... |
| 127 Hours | 7.6 | R | Adventure | 94 | [u'James Franco', u'Amber Tamblyn', u'Kate Mara'] |
| 2001: A Space Odyssey | 8.3 | G | Mystery | 160 | [u'Keir Dullea', u'Gary Lockwood', u'William S... |
| ... | ... | ... | ... | ... | ... |
| Zero Dark Thirty | 7.4 | R | Drama | 157 | [u'Jessica Chastain', u'Joel Edgerton', u'Chri... |
| Zodiac | 7.7 | R | Crime | 157 | [u'Jake Gyllenhaal', u'Robert Downey Jr.', u'M... |
| Zombieland | 7.7 | R | Comedy | 88 | [u'Jesse Eisenberg', u'Emma Stone', u'Woody Ha... |
| Zulu | 7.8 | UNRATED | Drama | 138 | [u'Stanley Baker', u'Jack Hawkins', u'Ulla Jac... |


### 5.2. Iterating through values in a Series and DataFrame

#### 5.2.1. Iterating through values in Series

We can iterate through the values of a Series just as we iterate through the values of a list in Python.
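A loop over a Series yields only the values. If the index labels are also needed, `Series.items()` yields `(label, value)` pairs. A sketch on a hypothetical four-value `genre` Series:

```python
import pandas as pd

# Hypothetical short genre Series
genre = pd.Series(["Crime", "Crime", "Action", "Drama"])

# Plain loop: yields the values, just like a list
for gen in genre:
    print(gen)

# .items() yields (index label, value) pairs
for label, gen in genre.items():
    print(label, gen)
```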

In [11]:
for gen in movies.genre:
    print(gen)

Crime
Crime
Crime
Action
Crime
Drama
Western
Adventure
Biography
Drama
Adventure
Action
Action
Drama
Adventure
Adventure
Drama
Drama
Biography
Action
Action
Crime
Drama
Crime
Drama
Comedy
Western
Drama
Crime
Comedy
Animation
Biography
Drama
Drama
Crime
Comedy
Action
Action
Mystery
Horror
Crime
Drama
Biography
Action
Action
Action
Mystery
Drama
Comedy
Crime
Drama
Drama
Comedy
Drama
Adventure
Animation
Drama
Horror
Drama
Western
Comedy
Animation
Horror
Crime
Animation
Crime
Comedy
Drama
Adventure
Animation
Comedy
Adventure
Drama
Drama
Drama
Action
Mystery
Drama
Crime
Crime
Action
Animation
Action
Drama
Drama
Adventure
Crime
Drama
Comedy
Drama
Crime
Drama
Crime
Comedy
Comedy
Drama
Action
Comedy
Crime
Biography
Action
Adventure
Drama
Comedy
Drama
Film-Noir
Comedy
Western
Drama
Comedy
Mystery
Comedy
Crime
Action
Adventure
Crime
Drama
Animation
Action
Western
Adventure
Drama
Crime
Action
Biography
Biography
Animation
Drama
Adventure
Action
Drama
Animation
Drama
Adventure
Drama
Action
Drama
...

#### 5.2.2. Iterating through values in DataFrame

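Iterating through a DataFrame is a bit different and works somewhat like Python's “enumerate”: the DataFrame method “iterrows( )” yields an (index, row) pair for each row, where each row is a Series, so we can access its columns as attributes such as “row.title”.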
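One difference from `enumerate` is worth seeing in code: `iterrows()` yields the index *label*, which is not the row position once the DataFrame has been sorted. The sketch below, on a hypothetical three-row `movies_small`, also shows `itertuples()`, which the pandas documentation describes as generally faster than `iterrows()` and as preserving dtypes:

```python
import pandas as pd

# Hypothetical mini DataFrame; sorting it shuffles the index labels
movies_small = pd.DataFrame({
    "title": ["The Shawshank Redemption", "The Godfather", "Pulp Fiction"],
    "star_rating": [9.3, 9.2, 8.9],
    "duration": [142, 175, 154],
}).sort_values("duration", ascending=False)

# iterrows() yields (index label, row-as-Series) pairs.
# Unlike enumerate(), the first item is the index LABEL, not a 0-based position.
for index, row in movies_small.iterrows():
    print(index, row["title"], row["star_rating"])

# itertuples() yields namedtuples; the label is available as row.Index
for row in movies_small.itertuples():
    print(row.Index, row.title, row.duration)
```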



In [12]:
for index, row in movies.iterrows():
    print(index, row.title, row.star_rating)

0 The Shawshank Redemption 9.3
1 The Godfather 9.2
2 The Godfather: Part II 9.1
3 The Dark Knight 9.0
4 Pulp Fiction 8.9
5 12 Angry Men 8.9
6 The Good, the Bad and the Ugly 8.9
7 The Lord of the Rings: The Return of the King 8.9
8 Schindler's List 8.9
9 Fight Club 8.9
10 The Lord of the Rings: The Fellowship of the Ring 8.8
11 Inception 8.8
12 Star Wars: Episode V - The Empire Strikes Back 8.8
13 Forrest Gump 8.8
14 The Lord of the Rings: The Two Towers 8.8
15 Interstellar 8.7
16 One Flew Over the Cuckoo's Nest 8.7
17 Seven Samurai 8.7
18 Goodfellas 8.7
19 Star Wars 8.7
20 The Matrix 8.7
21 City of God 8.7
22 It's a Wonderful Life 8.7
23 The Usual Suspects 8.7
24 Se7en 8.7
25 Life Is Beautiful 8.6
26 Once Upon a Time in the West 8.6
27 The Silence of the Lambs 8.6
28 Leon: The Professional 8.6
29 City Lights 8.6
30 Spirited Away 8.6
31 The Intouchables 8.6
32 Casablanca 8.6
33 Whiplash 8.6
34 American History X 8.6
35 Modern Times 8.6
36 Saving Private Ryan 8.6
...
510 Moonrise Kingdom 7.8
511 Rebel Without a Cause 7.8
512 Fantastic Mr. Fox 7.8
513 Invasion of the Body Snatchers 7.8
514 October Sky 7.8
515 Dirty Harry 7.8
516 Ghostbusters 7.8
517 Captain America: The Winter Soldier 7.8
518 Wreck-It Ralph 7.8
519 The Hangover 7.8
520 Back to the Future Part II 7.8
521 Belle de Jour 7.8
522 O Brother, Where Art Thou? 7.8
523 Repulsion 7.8
524 Airplane! 7.8
525 Pride & Prejudice 7.8
526 Zulu 7.8
527 Night on Earth 7.8
528 From Here to Eternity 7.8
529 Apocalypto 7.8
530 Atonement 7.8
531 The Dirty Dozen 7.8
532 X-Men: First Class 7.8
533 Run Lola Run 7.8
534 The Longest Day 7.8
535 Zelig 7.8
536 The Last Emperor 7.8
537 The Goonies 7.8
538 The White Ribbon 7.8
539 The Fugitive 7.8
540 The Color Purple 7.8
541 South Park: Bigger Longer & Uncut 7.8
542 (500) Days of Summer 7.8
543 Lost in Translation 7.8
544 Argo 7.8
545 Blazing Saddles 7.8
546 Breakfast at Tiffany's 7.8
547 Finding Neverland 7.8
548 The Experiment 7.8
549 Lucky Number Slevin 7.8
...
957 National Lampoon's Vacation 7.4
958 My Sister's Keeper 7.4
959 Deconstructing Harry 7.4
960 The Way Way Back 7.4
961 Capote 7.4
962 Driving Miss Daisy 7.4
963 La Femme Nikita 7.4
964 Lincoln 7.4
965 Limitless 7.4
966 The Simpsons Movie 7.4
967 The Rock 7.4
968 The English Patient 7.4
969 Law Abiding Citizen 7.4
970 Wonder Boys 7.4
971 Death at a Funeral 7.4
972 Blue Valentine 7.4
973 The Cider House Rules 7.4
974 Tootsie 7.4
975 Back to the Future Part III 7.4
976 Master and Commander: The Far Side of the World 7.4
977 Poltergeist 7.4
978 Wall Street 7.4
