import numpy as np
import pandas as pd

Great! Based on the structure of your `netflix.csv` file, here are some **lab-style Pandas questions** that comprehensively cover important topics:

---

### ✅ **Pandas Questions on Netflix Dataset**

#### 1. **Basic Exploration**

* Q1. Load the dataset and show the first 5 rows.
* Q2. Print the column names and their data types.
* Q3. How many rows and columns are present in the dataset?
* Q4. Show summary statistics for numeric columns.

#### 2. **Data Selection & Filtering**

* Q5. Display only the titles of all TV Shows.
* Q6. Show all movies released in 2020.
* Q7. List all records where the rating is `'TV-MA'`.

#### 3. **Handling Missing Values**

* Q8. Check for missing values in each column.
* Q9. Fill all missing `director` names with `"Not Available"`.
* Q10. Drop all rows where the `country` is missing.

#### 4. **Sorting & Grouping**

* Q11. Sort the dataset by `release_year` in descending order.
* Q12. Group by `type` and count how many entries of each type exist.
* Q13. Find the most common `rating` using value counts.

#### 5. **Column Operations**

* Q14. Create a new column called `content_category` where:

  * If `type` is "Movie", it's "Film"
  * If `type` is "TV Show", it's "Series"
* Q15. Extract the year from `date_added` into a new column called `added_year`.

#### 6. **Text Matching & String Operations**

* Q16. Display all rows where `title` contains the word `"Love"`.
* Q17. Count how many shows include `"International"` in `listed_in`.

#### 7. **File Export**

* Q18. Save the modified dataframe as `"netflix_cleaned.csv"` using `to_csv()`.

---

Would you like me to now answer these with actual **Python codes** using Pandas?


## q1

In [105]:
import pandas as pd
import numpy as np
data = pd.read_csv('netflix_titles.csv')
data.head(5)

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
0,s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,"September 25, 2021",2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmm..."
1,s2,TV Show,Blood & Water,,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...",South Africa,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town t..."
2,s3,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi...",,"September 24, 2021",2021,TV-MA,1 Season,"Crime TV Shows, International TV Shows, TV Act...",To protect his family from a powerful drug lor...
3,s4,TV Show,Jailbirds New Orleans,,,,"September 24, 2021",2021,TV-MA,1 Season,"Docuseries, Reality TV","Feuds, flirtations and toilet talk go down amo..."
4,s5,TV Show,Kota Factory,,"Mayur More, Jitendra Kumar, Ranjan Raj, Alam K...",India,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, Romantic TV Shows, TV ...",In a city of coaching centers known to train I...


In [33]:
columns = data.shape[1] # to print the no. of columns 
print (columns)
data.dtypes

12


show_id         object
type            object
title           object
director        object
cast            object
country         object
date_added      object
release_year     int64
rating          object
duration        object
listed_in       object
description     object
dtype: object

In [34]:
data.describe()

Unnamed: 0,release_year
count,8807.0
mean,2014.180198
std,8.819312
min,1925.0
25%,2013.0
50%,2017.0
75%,2019.0
max,2021.0


## q2

In [35]:
dispaly = data[
(data["type"]== "TV Show") &
(data["title"])
]
print (dispaly[["type", "title"]])

         type                  title
1     TV Show          Blood & Water
2     TV Show              Ganglands
3     TV Show  Jailbirds New Orleans
4     TV Show           Kota Factory
5     TV Show          Midnight Mass
...       ...                    ...
8795  TV Show        Yu-Gi-Oh! Arc-V
8796  TV Show             Yunus Emre
8797  TV Show              Zak Storm
8800  TV Show     Zindagi Gulzar Hai
8803  TV Show            Zombie Dumb

[2675 rows x 2 columns]


Show all movies released in 2020.

In [36]:
movies = data[
(data['release_year'] == 2020) &
(data['type'] == 'Movie') 
]
print (movies[["release_year",'title' ]])


      release_year                                              title
0             2020                               Dick Johnson Is Dead
16            2020  Europe's Most Dangerous Man: Otto Skorzeny in ...
78            2020                                     Tughlaq Durbar
84            2020                               Omo Ghetto: the Saga
103           2020                                     Shadow Parties
...            ...                                                ...
3046          2020                      All the Freckles in the World
3060          2020                                      Ghost Stories
5972          2020                                   #cats_the_mewvie
7594          2020                 Norm of the North: Family Vacation
8099          2020                                        Straight Up

[517 rows x 2 columns]


In [37]:
ratings = data[
(data['rating']== 'TV-MA') 
]

print (ratings[['title', 'type']])

                       title     type
1              Blood & Water  TV Show
2                  Ganglands  TV Show
3      Jailbirds New Orleans  TV Show
4               Kota Factory  TV Show
5              Midnight Mass  TV Show
...                      ...      ...
8762         Wrong Side Raju    Movie
8769  Y.M.I.: Yeh Mera India    Movie
8788            You Carry Me    Movie
8798                Zed Plus    Movie
8801                 Zinzana    Movie

[3207 rows x 2 columns]


In [38]:
data.isnull().sum()

show_id            0
type               0
title              0
director        2634
cast             825
country          831
date_added        10
release_year       0
rating             4
duration           3
listed_in          0
description        0
dtype: int64

In [39]:
data["director"].fillna("Not available")

0       Kirsten Johnson
1         Not available
2       Julien Leclercq
3         Not available
4         Not available
             ...       
8802      David Fincher
8803      Not available
8804    Ruben Fleischer
8805       Peter Hewitt
8806        Mozez Singh
Name: director, Length: 8807, dtype: object

In [40]:
data["director"]

0       Kirsten Johnson
1                   NaN
2       Julien Leclercq
3                   NaN
4                   NaN
             ...       
8802      David Fincher
8803                NaN
8804    Ruben Fleischer
8805       Peter Hewitt
8806        Mozez Singh
Name: director, Length: 8807, dtype: object

Drop all rows where the country is missing.

In [41]:
data["country"].dropna()

0                                           United States
1                                            South Africa
4                                                   India
7       United States, Ghana, Burkina Faso, United Kin...
8                                          United Kingdom
                              ...                        
8801                         United Arab Emirates, Jordan
8802                                        United States
8804                                        United States
8805                                        United States
8806                                                India
Name: country, Length: 7976, dtype: object

Sort the dataset by release_year in descending order.

In [42]:
data["release_year"].sort_values(ascending= False)

1       2021
2       2021
3       2021
31      2021
30      2021
        ... 
8739    1943
8660    1943
7790    1942
8205    1942
4250    1925
Name: release_year, Length: 8807, dtype: int64

In [43]:
data = data.sort_values(by="release_year", ascending=False)
print (data)

     show_id     type                                          title  \
1         s2  TV Show                                  Blood & Water   
2         s3  TV Show                                      Ganglands   
3         s4  TV Show                          Jailbirds New Orleans   
31       s32  TV Show                             Chicago Party Aunt   
30       s31    Movie                                Ankahi Kahaniya   
...      ...      ...                                            ...   
8739   s8740    Movie             Why We Fight: The Battle of Russia   
8660   s8661    Movie  Undercover: How to Operate Behind Enemy Lines   
7790   s7791    Movie                                 Prelude to War   
8205   s8206    Movie                           The Battle of Midway   
4250   s4251  TV Show              Pioneers: First Women Filmmakers*   

                                               director  \
1                                                   NaN   
2                

 Group by type and count how many entries of each type exist.

In [44]:
type_count = data.groupby("type")["type"].size()
type_count

type
Movie      6132
TV Show    2675
Name: type, dtype: int64

Find the most common rating using value counts.

In [45]:
common = data["rating"].value_counts().head()
common

rating
TV-MA    3207
TV-14    2160
TV-PG     863
R         799
PG-13     490
Name: count, dtype: int64

Create a new column called content_category where:

If type is "Movie", it's "Film"
If type is "TV Show", it's "Series"

In [46]:
data["content_category"] = data["type"].apply(
    lambda x : "it is a movie" if x == "Movie" else "it is a series"
)

data[["type", "content_category"]]

Unnamed: 0,type,content_category
1,TV Show,it is a series
2,TV Show,it is a series
3,TV Show,it is a series
31,TV Show,it is a series
30,Movie,it is a movie
...,...,...
8739,Movie,it is a movie
8660,Movie,it is a movie
7790,Movie,it is a movie
8205,Movie,it is a movie


In [47]:
print (data['content_category'])

1       it is a series
2       it is a series
3       it is a series
31      it is a series
30       it is a movie
             ...      
8739     it is a movie
8660     it is a movie
7790     it is a movie
8205     it is a movie
4250    it is a series
Name: content_category, Length: 8807, dtype: object


Extract the year from date_added into a new column called added_year.

In [48]:
data["show_love"] = data["title"].apply(
    lambda x : "yes it contains Love" if "Love" in x else "not contains"
) 

data[["show_love", 'title']]

Unnamed: 0,show_love,title
1,not contains,Blood & Water
2,not contains,Ganglands
3,not contains,Jailbirds New Orleans
31,not contains,Chicago Party Aunt
30,not contains,Ankahi Kahaniya
...,...,...
8739,not contains,Why We Fight: The Battle of Russia
8660,not contains,Undercover: How to Operate Behind Enemy Lines
7790,not contains,Prelude to War
8205,not contains,The Battle of Midway


In [76]:
import pandas as pd

peta = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Score': [92, 88, 88, 70]
}
df = pd.DataFrame(peta)

df['Rank'] = df['Score'].rank(ascending=False)  # higher score = higher rank



In [77]:
import pandas as pd

data = pd.read_csv("netflix_titles.csv")  # or the correct path


In [78]:
data["type"].value_counts()

type
Movie      6132
TV Show    2675
Name: count, dtype: int64

In [79]:
data.columns

Index(['show_id', 'type', 'title', 'director', 'cast', 'country', 'date_added',
       'release_year', 'rating', 'duration', 'listed_in', 'description'],
      dtype='object')

In [80]:
common = data["rating"].value_counts().head(1)

In [81]:
common

rating
TV-MA    3207
Name: count, dtype: int64

In [102]:
data["sortu"] = data["title"].apply(
    lambda x : "Yes this contains love" if "Love" in x else "it doesnt contain love"
)


In [103]:
data[['title', 'sortu']]

Unnamed: 0,title,sortu
0,Dick Johnson Is Dead,it doesnt contain love
1,Blood & Water,it doesnt contain love
2,Ganglands,it doesnt contain love
3,Jailbirds New Orleans,it doesnt contain love
4,Kota Factory,it doesnt contain love
...,...,...
8802,Zodiac,it doesnt contain love
8803,Zombie Dumb,it doesnt contain love
8804,Zombieland,it doesnt contain love
8805,Zoom,it doesnt contain love


In [104]:
data[['show_id' , 'type']]

Unnamed: 0,show_id,type
0,s1,Movie
1,s2,TV Show
2,s3,TV Show
3,s4,TV Show
4,s5,TV Show
...,...,...
8802,s8803,Movie
8803,s8804,TV Show
8804,s8805,Movie
8805,s8806,Movie


In [87]:
print(type(peta))


<class 'dict'>


In [88]:
op = pd.DataFrame(peta)

In [90]:
op.to_excel("netflix_titles.xlsx", sheet_name="Ohh yeah", index=False)


In [91]:
pd.DataFrame(data)

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,sortu
0,s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,"September 25, 2021",2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmm...",it doesnt contain live
1,s2,TV Show,Blood & Water,,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...",South Africa,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town t...",it doesnt contain live
2,s3,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi...",,"September 24, 2021",2021,TV-MA,1 Season,"Crime TV Shows, International TV Shows, TV Act...",To protect his family from a powerful drug lor...,it doesnt contain live
3,s4,TV Show,Jailbirds New Orleans,,,,"September 24, 2021",2021,TV-MA,1 Season,"Docuseries, Reality TV","Feuds, flirtations and toilet talk go down amo...",it doesnt contain live
4,s5,TV Show,Kota Factory,,"Mayur More, Jitendra Kumar, Ranjan Raj, Alam K...",India,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, Romantic TV Shows, TV ...",In a city of coaching centers known to train I...,it doesnt contain live
...,...,...,...,...,...,...,...,...,...,...,...,...,...
8802,s8803,Movie,Zodiac,David Fincher,"Mark Ruffalo, Jake Gyllenhaal, Robert Downey J...",United States,"November 20, 2019",2007,R,158 min,"Cult Movies, Dramas, Thrillers","A political cartoonist, a crime reporter and a...",it doesnt contain live
8803,s8804,TV Show,Zombie Dumb,,,,"July 1, 2019",2018,TV-Y7,2 Seasons,"Kids' TV, Korean TV Shows, TV Comedies","While living alone in a spooky town, a young g...",it doesnt contain live
8804,s8805,Movie,Zombieland,Ruben Fleischer,"Jesse Eisenberg, Woody Harrelson, Emma Stone, ...",United States,"November 1, 2019",2009,R,88 min,"Comedies, Horror Movies",Looking to survive in a world taken over by zo...,it doesnt contain live
8805,s8806,Movie,Zoom,Peter Hewitt,"Tim Allen, Courteney Cox, Chevy Chase, Kate Ma...",United States,"January 11, 2020",2006,PG,88 min,"Children & Family Movies, Comedies","Dragged from civilian life, a former superhero...",it doesnt contain live


In [95]:
data.to_excel("netflix_titles.xlsx", sheet_name = "Data ki mkc", index = False)

In [98]:
data.index

RangeIndex(start=0, stop=8807, step=1)

In [106]:
data.groupby('type').size()

type
Movie      6132
TV Show    2675
dtype: int64