## **Let's Use LeetCode**

If you're on your journey to mastering data manipulation, especially using **Pandas**, I've got something exciting for you. One of the best ways to sharpen your skills is by solving real-world problems, and **LeetCode** has introduced an excellent **Introduction to Pandas Study Plan** that I highly recommend you dive into. You can check it out [here](https://leetcode.com/studyplan/introduction-to-pandas/).

Why should you spend time solving these problems?

### 1. **Strengthen Your Pandas Skills**
Pandas is the go-to library for data manipulation in Python, and it's essential for anyone working in data science. This study plan will help you build a solid foundation in Pandas, from basic data manipulation to more advanced operations like grouping, merging, and reshaping data. You’ll face challenges that mimic real-world tasks, making it a practical learning experience.

### 2. **Learn By Doing**
Reading tutorials and watching videos is great, but nothing beats hands-on problem-solving. As you solve each problem, you'll become more confident in applying Pandas functions and writing efficient code. The interactive coding platform on LeetCode also provides instant feedback, which helps in refining your approach and learning better practices.

### 3. **Prepare for Interviews**
Pandas-related questions are common in data science and data analyst interviews, so working through these problems doubles as practice for the data challenges interviewers tend to ask. It's a double win: you're learning and gearing up for interviews at the same time!

### 4. **Build a Problem-Solving Mindset**
LeetCode is known for its vast collection of problems that not only focus on technical skills but also encourage a strong problem-solving mindset. As you progress, you’ll notice an improvement in your ability to break down a problem, figure out the best approach, and implement a solution efficiently — all key skills for a successful data science career.

---

### **I’ve Got Solutions to Share!**
To give you a head start, I’ve already solved a few problems from this study plan, and I’m excited to share my solutions with you. I’ve worked through some interesting challenges and would be happy to discuss the thought process and approaches I used. If you're stuck on a problem, feel free to reach out or compare your solution with mine — it’s a great way to learn from different perspectives.

---

### **Take Action!**
So, if you want to strengthen your Pandas knowledge while boosting your problem-solving skills, head over to the [LeetCode Pandas Study Plan](https://leetcode.com/studyplan/introduction-to-pandas/) and start solving. Whether you’re a beginner or someone looking to refresh your skills, this is a fantastic resource.

And remember, it’s all about consistency. Even if you solve just one problem a day, that’s progress. Stay patient, stay curious, and before you know it, you’ll be much more confident in handling data using Pandas.


Q1 : https://leetcode.com/problems/create-a-dataframe-from-list/description/?envType=study-plan-v2&envId=introduction-to-pandas&lang=pythondata

In [None]:
from typing import List  # needed for the List[...] type hints below

import pandas as pd

def createDataframe(student_data: List[List[int]]) -> pd.DataFrame:
    return pd.DataFrame(student_data, columns=['student_id', 'age'])

Q2 : https://leetcode.com/problems/get-the-size-of-a-dataframe/?envType=study-plan-v2&envId=introduction-to-pandas&lang=pythondata

In [None]:
from typing import List  # needed for the List[int] return annotation

import pandas as pd

def getDataframeSize(players: pd.DataFrame) -> List[int]:
    row_num, column_num = players.shape
    return [row_num, column_num]

Q3: https://leetcode.com/problems/display-the-first-three-rows/?envType=study-plan-v2&envId=introduction-to-pandas&lang=pythondata

In [None]:
import pandas as pd

def selectFirstRows(employees: pd.DataFrame) -> pd.DataFrame:
    return employees.head(3)

Q4 : https://leetcode.com/problems/select-data/?envType=study-plan-v2&envId=introduction-to-pandas&lang=pythondata

In [None]:
import pandas as pd

def selectData(students: pd.DataFrame) -> pd.DataFrame:
    return students.loc[students['student_id']==101, ['name', 'age']]

Q5 : https://leetcode.com/problems/create-a-new-column/?envType=study-plan-v2&envId=introduction-to-pandas&lang=pythondata

In [None]:
import pandas as pd

def createBonusColumn(employees: pd.DataFrame) -> pd.DataFrame:
    employees['bonus'] = employees['salary']*2
    return employees

Q6 : https://leetcode.com/problems/drop-duplicate-rows/?envType=study-plan-v2&envId=introduction-to-pandas&lang=pythondata

In [None]:
import pandas as pd

def dropDuplicateEmails(customers: pd.DataFrame) -> pd.DataFrame:
    return customers.drop_duplicates(subset = 'email')

Q7 : https://leetcode.com/problems/drop-missing-data/?envType=study-plan-v2&envId=introduction-to-pandas&lang=pythondata

In [None]:
import pandas as pd

def dropMissingData(students: pd.DataFrame) -> pd.DataFrame:
    # Passing a list keeps this working on pandas versions that do not accept a bare label
    return students.dropna(subset=["name"])

Q8 : https://leetcode.com/problems/modify-columns/?envType=study-plan-v2&envId=introduction-to-pandas&lang=pythondata

In [None]:
import pandas as pd

def modifySalaryColumn(employees: pd.DataFrame) -> pd.DataFrame:
    employees['salary']= employees['salary']*2
    return employees

Q9 : https://leetcode.com/problems/rename-columns/?envType=study-plan-v2&envId=introduction-to-pandas&lang=pythondata

In [None]:
import pandas as pd

def renameColumns(students: pd.DataFrame) -> pd.DataFrame:
    return students.rename(columns={'id': 'student_id', 'first':'first_name', 'last':'last_name', 'age':'age_in_years'})
    

Q10 : https://leetcode.com/problems/change-data-type/?envType=study-plan-v2&envId=introduction-to-pandas&lang=pythondata

In [None]:
import pandas as pd

def changeDatatype(students: pd.DataFrame) -> pd.DataFrame:
    # A Series takes the target dtype directly; the dict form is meant for DataFrame.astype
    students['grade'] = students['grade'].astype(int)
    return students

Q11 : https://leetcode.com/problems/fill-missing-data/?envType=study-plan-v2&envId=introduction-to-pandas&lang=pythondata

In [None]:
import pandas as pd

def fillMissingValues(products: pd.DataFrame) -> pd.DataFrame:
    products['quantity'] = products['quantity'].fillna(0)
    return products

Q12 : https://leetcode.com/problems/reshape-data-concatenate/?envType=study-plan-v2&envId=introduction-to-pandas&lang=pythondata

In [None]:
import pandas as pd

def concatenateTables(df1: pd.DataFrame, df2: pd.DataFrame) -> pd.DataFrame:
    return pd.concat([df1,df2])

Q13 : https://leetcode.com/problems/reshape-data-pivot/?envType=study-plan-v2&envId=introduction-to-pandas&lang=pythondata

In [None]:
import pandas as pd

def pivotTable(weather: pd.DataFrame) -> pd.DataFrame:
    pWeather = weather.pivot(index='month', columns='city', values='temperature')
    return pWeather

Q14: https://leetcode.com/problems/reshape-data-melt/?envType=study-plan-v2&envId=introduction-to-pandas&lang=pythondata

In [None]:
import pandas as pd

def meltTable(report: pd.DataFrame) -> pd.DataFrame:
    meltdf = pd.melt(report, id_vars=['product'], var_name = 'quarter', value_name ='sales')
    return meltdf
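
To see what the melt step does, here is a small sketch on made-up data. The product names and sales figures are hypothetical, chosen only for illustration; the column names follow the solution above.

```python
import pandas as pd

# Hypothetical wide-format report: one column per quarter
report = pd.DataFrame({
    'product': ['Umbrella', 'SleepingBag'],
    'quarter_1': [417, 800],
    'quarter_2': [224, 936],
})

# Same call as in the solution: every non-id column becomes a (quarter, sales) row
melted = pd.melt(report, id_vars=['product'], var_name='quarter', value_name='sales')
print(melted)
```

Each product now appears once per quarter, so two products and two quarter columns give four rows.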

Q15 : https://leetcode.com/problems/method-chaining/?envType=study-plan-v2&envId=introduction-to-pandas&lang=pythondata

In [None]:
import pandas as pd

def findHeavyAnimals(animals: pd.DataFrame) -> pd.DataFrame:
    heavy_animals = animals[animals['weight'] > 100].sort_values(by='weight', ascending=False)
    return pd.DataFrame(heavy_animals['name'])

### **30-Day Challenge**

Q : https://leetcode.com/problems/big-countries/submissions/1395277988/?envType=study-plan-v2&envId=30-days-of-pandas&lang=pythondata

In [None]:
import pandas as pd

def big_countries(world: pd.DataFrame) -> pd.DataFrame:
    return world[(world.area >=3000000) | (world.population >= 25000000)][['name', 'population', 'area' ]]

Q : https://leetcode.com/problems/recyclable-and-low-fat-products/?envType=study-plan-v2&envId=30-days-of-pandas&lang=pythondata

In [None]:
import pandas as pd

def find_products(products: pd.DataFrame) -> pd.DataFrame:
    return products[(products.low_fats == 'Y') & (products.recyclable == 'Y')][['product_id']]

Q : https://leetcode.com/problems/customers-who-never-order/description/?envType=study-plan-v2&envId=30-days-of-pandas&lang=pythondata

In [None]:
import pandas as pd

def find_customers(customers: pd.DataFrame, orders: pd.DataFrame) -> pd.DataFrame:
    cust = customers[~customers['id'].isin(orders['customerId'])][['name']]
    return cust.rename(columns = {"name":'Customers'})

Q : https://leetcode.com/problems/article-views-i/submissions/1395339239/?envType=study-plan-v2&envId=30-days-of-pandas&lang=pythondata

In [None]:
import pandas as pd

def article_views(views: pd.DataFrame) -> pd.DataFrame:
    sameAuthor = views[(views['author_id']==views['viewer_id'])].sort_values(by='viewer_id', ascending=True)[['viewer_id']].drop_duplicates()
    return sameAuthor.rename(columns={'viewer_id': 'id'})

Q: https://leetcode.com/problems/invalid-tweets/?envType=study-plan-v2&envId=30-days-of-pandas&lang=pythondata

In [None]:
import pandas as pd

def invalid_tweets(tweets: pd.DataFrame) -> pd.DataFrame:
    return tweets[tweets['content'].str.len() >15][['tweet_id']]

Q : https://leetcode.com/problems/calculate-special-bonus/description/?envType=study-plan-v2&envId=30-days-of-pandas&lang=pythondata

In [None]:
import pandas as pd

def calculate_special_bonus(employees: pd.DataFrame) -> pd.DataFrame:
    # Apply the bonus condition to each row
    employees['bonus'] = employees.apply(
        lambda row: row['salary'] if row['employee_id'] % 2 == 1 and not row['name'].startswith('M') else 0, axis=1
    )
    
    # Select only the employee_id and bonus columns
    result = employees[['employee_id', 'bonus']].sort_values(by='employee_id')
    
    return result
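
The row-wise `apply` above is easy to read, but the same condition can also be expressed with column-wise boolean masks. The following is a sketch of that alternative, not the accepted submission; the function name `calculate_special_bonus_vectorized` and the sample rows are hypothetical.

```python
import pandas as pd

def calculate_special_bonus_vectorized(employees: pd.DataFrame) -> pd.DataFrame:
    # Boolean masks computed on whole columns instead of row by row
    odd_id = employees['employee_id'] % 2 == 1
    not_m = ~employees['name'].str.startswith('M')
    # Keep the salary where both conditions hold, otherwise use 0
    bonus = employees['salary'].where(odd_id & not_m, 0)
    return (
        employees[['employee_id']]
        .assign(bonus=bonus)
        .sort_values(by='employee_id')
    )

# Hypothetical sample data, for illustration only
sample = pd.DataFrame({
    'employee_id': [3, 2, 7, 8],
    'name': ['Arnold', 'Meir', 'Michael', 'Juan'],
    'salary': [108, 3769, 4611, 6100],
})
bonus_result = calculate_special_bonus_vectorized(sample)
print(bonus_result)
```

Unlike the `apply` version, this variant does not add a `bonus` column to the input DataFrame; it builds the result from a copy of the `employee_id` column.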

Q : https://leetcode.com/problems/fix-names-in-a-table/description/?envType=study-plan-v2&envId=30-days-of-pandas&lang=pythondata

In [None]:
import pandas as pd

def fix_names(users: pd.DataFrame) -> pd.DataFrame:
    users['name'] = users['name'].str.capitalize()
    return users.sort_values('user_id')

Q : https://leetcode.com/problems/find-users-with-valid-e-mails/description/?envType=study-plan-v2&envId=30-days-of-pandas&lang=pythondata

In [None]:
import pandas as pd

def valid_emails(users: pd.DataFrame) -> pd.DataFrame:
    return users[users['mail'].str.match(r"^[a-zA-Z][a-zA-Z0-9_.-]*\@leetcode\.com$")]
    

Q : https://leetcode.com/problems/patients-with-a-condition/description/?envType=study-plan-v2&envId=30-days-of-pandas&lang=pythondata

In [None]:
import pandas as pd

def find_patients(patients: pd.DataFrame) -> pd.DataFrame:
    return patients[patients['conditions'].str.contains(r"\bDIAB1")]
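
A quick way to check what `\bDIAB1` matches is to run it on a few hand-written strings. The rows below are hypothetical and only cover the cases that matter: the code at the start, the code after a space, and the code embedded inside a longer token.

```python
import pandas as pd

# Hypothetical sample rows, for illustration only
sample_patients = pd.DataFrame({
    'patient_id': [1, 2, 3, 4],
    'conditions': ['YFEV COUGH', 'DIAB100 MYOP', 'ACNE DIAB100', 'SADIAB100'],
})

# \b requires DIAB1 to start at a word boundary, so 'SADIAB100' is rejected
mask = sample_patients['conditions'].str.contains(r"\bDIAB1")
print(sample_patients[mask])
```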

### Question:

You are given a dataset with two columns: `file_name` and `content`. Each row represents a file with its name and the textual content within that file. You need to determine how many files contain at least one occurrence of the words **"bull"** and **"bear"** as **standalone words**.

### Task:

1. Write a solution to count the number of files that have at least one standalone occurrence of the word **"bull"** and at least one standalone occurrence of the word **"bear"**, respectively. 
   
   - A word is considered standalone if it has spaces or punctuation marks on both sides, or it appears at the beginning or end of the content without being part of another word. For example, the word "bull" in "bull market" should be counted, but not in "bullish". Similarly, count "bear" as a standalone word, but not "bearish".
   
2. The output should return a table with two rows: 
   - One row for the word **"bull"**, showing how many files contain the word as a standalone occurrence.
   - One row for the word **"bear"**, showing how many files contain the word as a standalone occurrence.
   
3. The table should have the following columns:
   - `word`: Either **"bull"** or **"bear"**.
   - `occurrences`: The number of files that contain the word at least once as a standalone occurrence.

4. Implement the solution in Python using **Pandas** and **regular expressions**.

### Example:

Given the following dataset:

| file_name | content                                                                |
|-----------|-------------------------------------------------------------------------|
| file1     | The bull market is strong, but beware of the bear market                |
| file2     | Investors see a bear approaching but no bull in sight                   |
| file3     | bull and bear are often used to describe market trends                  |
| file4     | The bullfight was exciting, but no mention of bear                      |
| file5     | A bullish trend may turn bearish at any moment                          |

### Output:

The output should be a DataFrame like this:

| word  | occurrences |
|-------|-------------|
| bull  | 3           |
| bear  | 4           |

### Additional Clarification:
- **"bull"** and **"bear"** should be counted only when they appear as standalone words, not part of other words like "bullish" or "bearish".
- Case sensitivity should be ignored (i.e., "Bull" and "bull" should both be considered valid occurrences).
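
The standalone-word rule maps naturally onto the regex word boundary `\b`, and the case rule onto `re.IGNORECASE`. Here is a minimal sketch using Python's standard `re` module; the test phrases are made up for illustration.

```python
import re

# \b matches between a word character and a non-word character (or the string edge)
pattern = re.compile(r"\bbull\b", flags=re.IGNORECASE)

# Hypothetical phrases covering standalone and embedded uses
phrases = ["bull market", "Bull run", "bullish trend", "The bullfight", "no bull."]
matches = {phrase: bool(pattern.search(phrase)) for phrase in phrases}

for phrase, found in matches.items():
    print(f"{phrase!r}: {found}")
```

"bullish" and "bullfight" are rejected because the character after "bull" is still a word character, while trailing punctuation such as a period counts as a boundary.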

### Constraints:
- You are required to use **Pandas** for data manipulation and regular expressions for pattern matching.
- Implement the solution in Python, returning the counts of occurrences of the words **"bull"** and **"bear"** in the specified format.

In [10]:
import pandas as pd

# Step 1: Create the DataFrame
data = {
    'file_name': ['file1', 'file2', 'file3', 'file4', 'file5'],
    'content': [
        'The bull market is strong, but beware of the bear market',
        'Investors see a bear approaching but no bull in sight',
        'bull and bear are often used to describe market trends',
        'The bullfight was exciting, but no mention of bear',
        'A bullish trend may turn bearish at any moment'
    ]
}

df = pd.DataFrame(data)

# Step 2: Define function to find standalone occurrences of 'bull' and 'bear'
def count_occurrences(df):
    # Use regular expressions with word boundaries to count standalone 'bull' and 'bear'
    bull_count = df['content'].str.contains(r'\bbull\b', case=False).sum()
    bear_count = df['content'].str.contains(r'\bbear\b', case=False).sum()
    
    # Return the counts as a DataFrame
    result_df = pd.DataFrame({
        'word': ['bull', 'bear'],
        'occurrences': [bull_count, bear_count]
    })
    
    return result_df

# Step 3: Get the result
result = count_occurrences(df)
print(result)


   word  occurrences
0  bull            3
1  bear            4
