# Lesson 3: Sorting and Ranking Data

# Lesson Introduction

Hey there! Today, we're diving into sorting and ranking data. These techniques help us organize our data and spot patterns in it, making it easier to analyze and draw conclusions. By the end of this lesson, you'll know how to sort data by a column and how to rank values, including tied values, using Pandas. We'll be working with the Titanic dataset, the same one we've used in previous lessons.

Sorting and ranking might sound a bit technical, but think of it like sorting your favorite toy collection by size or ranking your friends by age. It’s all about making data neat and meaningful!

## Sorting Data

Sorting data means arranging it in a specific order, like alphabetizing words in a dictionary or listing numbers from smallest to largest. Let's start with some basics. Here's how to sort data by a single column. Suppose we want to sort passengers by how much they paid for their tickets (fare).

```python
import seaborn as sns

# Load the Titanic dataset
titanic = sns.load_dataset('titanic')

# Sort by fare in descending order
titanic_sorted = titanic.sort_values(by='fare', ascending=False)
print(titanic_sorted[['fare', 'class']].head())
```

**Output:**

```
         fare  class
258  512.3292  First
737  512.3292  First
679  512.3292  First
88   263.0000  First
27   263.0000  First
```

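Here, our data is sorted by fare in descending order, so the most expensive tickets come first. The `by` argument of the `sort_values` method names the column to sort on, and `ascending=False` reverses the default ascending order. Several passengers paid identical fares, and the default sorting algorithm does not guarantee the relative order of tied rows, so the row labels in your output may differ.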

Imagine you're a librarian organizing books. Sorting helps you find books faster. Similarly, sorting data helps analysts focus on key information quickly, like the highest sales or the oldest customers.
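To make the `by` and `ascending` arguments concrete, here is a minimal sketch on a tiny made-up table. The `tickets` DataFrame and its values are hypothetical, not rows from the Titanic dataset. `kind='stable'` is an optional `sort_values` argument that keeps tied rows in their original order.

```python
import pandas as pd

# Hypothetical mini-dataset (not taken from the Titanic data)
tickets = pd.DataFrame({
    'passenger': ['P1', 'P2', 'P3', 'P4'],
    'fare': [7.25, 71.28, 8.05, 71.28],
})

# Cheapest first (ascending=True is the default)
cheapest_first = tickets.sort_values(by='fare')

# Most expensive first; kind='stable' keeps tied rows in their original order
priciest_first = tickets.sort_values(by='fare', ascending=False, kind='stable')

print(cheapest_first)
print(priciest_first)
```

Neither call changes `tickets` itself: `sort_values` returns a new, sorted DataFrame.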

## Ranking Data

Ranking data means assigning a rank (like 1st, 2nd, 3rd) to items in your data based on their values. Let's use a simple dataset to make this clearer. Below is a small dataset of students and their scores.

```python
import pandas as pd

# Sample dataset
data = {
    'student': ['Alice', 'Bob', 'Charlie', 'David'],
    'score': [88, 92, 85, 92]
}
students = pd.DataFrame(data)
print(students)
```

**Output:**

```
   student  score
0    Alice     88
1      Bob     92
2  Charlie     85
3    David     92
```

Now, let's see how to rank students by their scores.

```python
# Rank students by their score
students['score_rank'] = students['score'].rank(method='average', ascending=True)
print(students)
```

**Output:**

```
   student  score  score_rank
0    Alice     88         2.0
1      Bob     92         3.5
2  Charlie     85         1.0
3    David     92         3.5
```

This table shows how the students' scores are ranked. Because we passed `ascending=True`, the lowest score gets rank 1, so Charlie is ranked 1. Bob and David are tied: they both scored 92. We specified the tie-handling method `average`, so as Bob and David share ranks 3 and 4, each receives the average of those ranks, 3.5.
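Often you want the highest score to be ranked 1 instead. A minimal sketch of that, assuming the same four students (the column name `rank_high_first` is just an illustrative choice):

```python
import pandas as pd

students = pd.DataFrame({
    'student': ['Alice', 'Bob', 'Charlie', 'David'],
    'score': [88, 92, 85, 92],
})

# ascending=False gives rank 1 to the highest score instead of the lowest
students['rank_high_first'] = students['score'].rank(method='average', ascending=False)
print(students)
```

Bob and David now share ranks 1 and 2, so each gets 1.5, Alice gets 3.0, and Charlie gets 4.0.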

### Methods of Ranking

There are different methods of handling ties in ranking:

- **average**: Tied values each receive the average of the ranks they would occupy. This is the default.
- **min**: Tied values all receive the smallest of those ranks.
- **max**: Tied values all receive the largest of those ranks.
- **first**: Tied values are ranked in the order they appear in the data.
- **dense**: Like min, but the rank of the next group is always just one more than the previous group, so no ranks are skipped.

For example:

```python
# Rank by score, using the 'min' method
students['score_rank_min'] = students['score'].rank(method='min')
print(students)
```

**Output:**

```
   student  score  score_rank  score_rank_min
0    Alice     88         2.0             2.0
1      Bob     92         3.5             3.0
2  Charlie     85         1.0             1.0
3    David     92         3.5             3.0
```

In this case, Bob and David, sharing ranks 3 and 4, were assigned the minimum rank – 3.
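To compare all five tie-handling methods side by side, here is a small sketch that applies each one to the same scores. The `rank_<method>` column names are illustrative, not part of the lesson's dataset.

```python
import pandas as pd

students = pd.DataFrame({
    'student': ['Alice', 'Bob', 'Charlie', 'David'],
    'score': [88, 92, 85, 92],
})

# Apply each tie-handling method to the same column of scores
for method in ['average', 'min', 'max', 'first', 'dense']:
    students[f'rank_{method}'] = students['score'].rank(method=method)

print(students)
```

For the tied pair, Bob and David get 3.5 with `average`, 3 with `min`, 4 with `max`, 3 and 4 with `first`, and 3 with `dense`.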

## Lesson Summary

In this lesson, we learned the importance of sorting and ranking data to organize and extract meaningful insights from it. We covered how to:

- Sort data by a single column using `.sort_values()`.
- Rank data using `.rank()` and handle ties with different ranking methods, such as `average` and `min`.

Now that you've grasped the theory, it's time to put your skills to the test! In the upcoming practice session, you'll use what you've learned to sort and rank data in various ways. This hands-on practice will help solidify your understanding and make you more comfortable with these essential data manipulation techniques.

Happy coding!

## Organizing a Library by Publication Year

In the given code, you can see how to organize books in a library by their publication year. Sorting helps us find books faster. Let's see the sorted list!

```py
import pandas as pd

# Create a small dataset of books with their publication year and genre
data = {
    'book': ['Book A', 'Book B', 'Book C', 'Book D'],
    'year': [1999, 2010, 2005, 2015],
    'genre': ['Fiction', 'Science', 'Fiction', 'Science']
}
library = pd.DataFrame(data)

# Sort the books by publication year in ascending order
library_sorted = library.sort_values(by='year', ascending=True)
print(library_sorted[['book', 'year']])

```

Your code snippet demonstrates how to sort a list of books by their publication year using Pandas. Here's a quick explanation of what it does:

1. **Importing Pandas**: You start by importing the Pandas library, which is essential for data manipulation in Python.

2. **Creating a DataFrame**: You create a DataFrame named `library` from a dictionary `data`. This dictionary contains three keys: 'book', 'year', and 'genre', each associated with a list of values representing the attributes of four books.

3. **Sorting the DataFrame**: You use the `sort_values()` method to sort the DataFrame by the 'year' column in ascending order. The method returns a new, sorted DataFrame and leaves `library` unchanged. It works on numeric, text, and date/time columns alike.

4. **Printing the Sorted DataFrame**: Finally, you print the sorted DataFrame but only display the 'book' and 'year' columns. This shows the books in chronological order of publication.
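The steps above can be checked with a short sketch. It also shows one detail worth knowing: sorted rows keep their original index labels. Renumbering them with `reset_index(drop=True)` is an optional extra, and the name `library_renumbered` is only illustrative.

```python
import pandas as pd

library = pd.DataFrame({
    'book': ['Book A', 'Book B', 'Book C', 'Book D'],
    'year': [1999, 2010, 2005, 2015],
    'genre': ['Fiction', 'Science', 'Fiction', 'Science'],
})

library_sorted = library.sort_values(by='year', ascending=True)

# sort_values keeps each row's original index label, so labels appear out of order
print(library_sorted.index.tolist())

# Optional: renumber the rows 0..n-1 after sorting
library_renumbered = library_sorted.reset_index(drop=True)
print(library_renumbered[['book', 'year']])
```

The sorted order is Book A (1999), Book C (2005), Book B (2010), Book D (2015).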


## Sorting Books in Descending Order

Now, Space Explorer, let's flip things around! Modify the starter code to sort the books by their publication year in descending order. This means you need to change one parameter in the `sort_values` method.

Let's code!

```py
import pandas as pd

# Sample dataset of books
data = {
    'book_title': ['Book A', 'Book B', 'Book C', 'Book D'],
    'publish_year': [2001, 1999, 2010, 1999]
}
books = pd.DataFrame(data)

# Sort books by their publish year in ascending order
books_sorted = books.sort_values(by='publish_year')
print(books_sorted)


```

To sort the books by their publication year in descending order, you simply need to adjust the `ascending` parameter in the `sort_values` method to `False`. Here's how you can modify your code:

```python
import pandas as pd

# Sample dataset of books
data = {
    'book_title': ['Book A', 'Book B', 'Book C', 'Book D'],
    'publish_year': [2001, 1999, 2010, 1999]
}
books = pd.DataFrame(data)

# Sort books by their publish year in descending order
books_sorted = books.sort_values(by='publish_year', ascending=False)
print(books_sorted)
```

With `ascending=False`, `books_sorted` lists the books from the most recently published to the oldest.
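Two books in this dataset share the year 1999, and a single-column sort does not say which of them comes first. One way to make the order predictable, sketched here as an optional extension of the exercise, is to pass a second column as a tie-breaker:

```python
import pandas as pd

books = pd.DataFrame({
    'book_title': ['Book A', 'Book B', 'Book C', 'Book D'],
    'publish_year': [2001, 1999, 2010, 1999],
})

# Newest first; books from the same year are ordered alphabetically by title
books_sorted = books.sort_values(
    by=['publish_year', 'book_title'],
    ascending=[False, True],
)
print(books_sorted)
```

Each entry in the `ascending` list applies to the column at the same position in `by`.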

## Ranking Library Books Based on Ratings

Fill in the blanks to rank books by their ratings using the dense method in descending order. May the cosmos guide you!

```py
import pandas as pd

# Sample data representing a library with books and their ratings
data = {
    'book': ['Book A', 'Book B', 'Book C', 'Book D'],
    'rating': [4.7, 4.5, 4.9, 4.5]
}
library = pd.DataFrame(data)

# TODO: Rank books by their rating using an appropriate ranking method
print(library)

```

To rank the books by their ratings using the "dense" method in descending order, you'll need to use the `rank()` method from Pandas. Here's how to fill in the blanks in your code to accomplish this:

```python
import pandas as pd

# Sample data representing a library with books and their ratings
data = {
    'book': ['Book A', 'Book B', 'Book C', 'Book D'],
    'rating': [4.7, 4.5, 4.9, 4.5]
}
library = pd.DataFrame(data)

# Rank books by their rating using the 'dense' method in descending order
library['rating_rank'] = library['rating'].rank(method='dense', ascending=False)

print(library)
```

### Explanation:
- **`rank(method='dense', ascending=False)`**: This line ranks the books based on their ratings. The `dense` method means that the ranks are consecutive, and no rank is skipped even if there are ties. Setting `ascending=False` ensures that higher ratings get lower rank numbers (i.e., the highest rating gets rank 1).
- **Output**: The DataFrame now includes a new column `rating_rank`. Book C (4.9) gets rank 1.0, Book A (4.7) gets rank 2.0, and Book B and Book D (both 4.5) share rank 3.0.

This approach effectively prioritizes books with higher ratings and handles ties by assigning them the same rank, with the next rank number being incremented without gaps.
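In the exercise data the tie falls at the bottom, so `dense` and `min` would produce the same ranks. The difference shows up when a tie is followed by another value. Here is a small sketch with made-up ratings (the `ratings` table is hypothetical):

```python
import pandas as pd

# Hypothetical ratings with a tie at the top
ratings = pd.DataFrame({
    'book': ['Book X', 'Book Y', 'Book Z'],
    'rating': [4.9, 4.9, 4.5],
})

ratings['rank_min'] = ratings['rating'].rank(method='min', ascending=False)
ratings['rank_dense'] = ratings['rating'].rank(method='dense', ascending=False)
print(ratings)
```

With `min`, Book Z is ranked 3 because ranks 1 and 2 are both used up by the tie. With `dense`, it is ranked 2.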

## Sorting Library Books by Pages

Keep going, Stellar Navigator! Sort the books in the library by the number of pages. Fill in the missing pieces to make the code work.

```py
import pandas as pd

# Library books data
data = {
    'title': ['Zorba the Greek', '1984', 'The Catcher in the Rye', 'To Kill a Mockingbird'],
    'pages': [330, 328, 277, 281]
}
books = pd.DataFrame(data)

# TODO: Sort the books by the number of pages in ascending order
print(books_sorted)
```

You have initialized the DataFrame `books`, but you haven't yet sorted it by the number of pages or defined `books_sorted`, so the `print` call would raise a `NameError`. Let's correct that by adding the sorting operation before printing. Here's the corrected code:

```python
import pandas as pd

# Library books data
data = {
    'title': ['Zorba the Greek', '1984', 'The Catcher in the Rye', 'To Kill a Mockingbird'],
    'pages': [330, 328, 277, 281]
}
books = pd.DataFrame(data)

# Sort the books by the number of pages in ascending order
books_sorted = books.sort_values(by='pages', ascending=True)

# Print the sorted DataFrame
print(books_sorted)
```

This code will now correctly display the books sorted by the number of pages in ascending order, from the fewest to the most pages.