# DataFrame Operations

* 1. value_counts()

The value_counts() function in pandas counts the occurrences of each unique value in a Series. It returns a Series of counts, sorted in descending order by default, and excludes missing values unless dropna=False is passed.

In [1]:
import pandas as pd

# Create a DataFrame
data = {'A': ['foo', 'bar', 'foo', 'baz', 'bar', 'foo']}
df = pd.DataFrame(data)

# Count occurrences of unique values in column 'A'
value_counts = df['A'].value_counts()

print("Value Counts:\n", value_counts)


Value Counts:
 A
foo    3
bar    2
baz    1
Name: count, dtype: int64
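
Beyond raw counts, value_counts() accepts `normalize=True` to return relative frequencies and `dropna=False` to include missing values. The following is a minimal sketch; the extra `None` entry is an assumption added here for illustration and is not part of the original data.

```python
import pandas as pd

# Sample column with one missing value (added for illustration)
s = pd.Series(['foo', 'bar', 'foo', 'baz', 'bar', 'foo', None])

# Relative frequencies; the missing value is excluded by default
proportions = s.value_counts(normalize=True)

# Raw counts, with the missing value counted as its own entry
counts_with_na = s.value_counts(dropna=False)

print(proportions)
print(counts_with_na)
```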


* 2. apply()

The apply() function in pandas applies a function along an axis of a DataFrame. With the default axis=0 the function receives each column as a Series; with axis=1 it receives each row. It is a flexible tool for transforming data, especially when combined with lambda functions or custom functions.

In [2]:
import pandas as pd

# Create a DataFrame
data = {'A': [1, 2, 3, 4, 5],
        'B': [6, 7, 8, 9, 10]}

df = pd.DataFrame(data)

# Define a function to double the values
def double(x):
    return x * 2

# Apply the function to each column of the DataFrame (axis=0 by default)
result = df.apply(double)

print("Result after applying function:\n", result)


Result after applying function:
     A   B
0   2  12
1   4  14
2   6  16
3   8  18
4  10  20
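
The doubling example gives the same result whether it runs per column or per element, so it does not show what the axis argument changes. A small sketch of the difference, using the same sample data (the lambda functions are illustrative choices):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5],
                   'B': [6, 7, 8, 9, 10]})

# axis=0 (default): the function receives each column as a Series
column_ranges = df.apply(lambda col: col.max() - col.min())

# axis=1: the function receives each row as a Series
row_sums = df.apply(lambda row: row['A'] + row['B'], axis=1)

print("Range per column:\n", column_ranges)
print("Sum per row:\n", row_sums)
```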


* 3. unique()
   
The unique() function returns an array of the unique values in a Series, in order of first appearance.

In [3]:
import pandas as pd

# Create a DataFrame
data = {'A': ['foo', 'bar', 'foo', 'baz', 'bar', 'foo']}
df = pd.DataFrame(data)

# Get unique values in column 'A'
unique_values = df['A'].unique()

print("Unique Values:", unique_values)


Unique Values: ['foo' 'bar' 'baz']


* 4. nunique()
     
The nunique() function returns the number of unique values in a Series, excluding missing values by default.

In [4]:
import pandas as pd

# Create a DataFrame
data = {'A': ['foo', 'bar', 'foo', 'baz', 'bar', 'foo']}
df = pd.DataFrame(data)

# Get the number of unique values in column 'A'
num_unique_values = df['A'].nunique()

print("Number of Unique Values:", num_unique_values)


Number of Unique Values: 3
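
unique() and nunique() treat missing values differently, which can make their results disagree. A short sketch, assuming a column that contains one NaN (the data here is illustrative):

```python
import numpy as np
import pandas as pd

s = pd.Series(['foo', 'bar', np.nan, 'foo'])

n_default = s.nunique()               # missing values are ignored
n_with_na = s.nunique(dropna=False)   # the missing value counts as one entry
n_from_unique = len(s.unique())       # unique() keeps the missing value

print(n_default, n_with_na, n_from_unique)
```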


* 5. describe()
     
The describe() function provides descriptive statistics for numeric columns in the DataFrame, such as count, mean, std, min, max, and quartiles.

In [5]:
import pandas as pd

# Create a DataFrame
data = {'A': [1, 2, 3, 4, 5]}
df = pd.DataFrame(data)

# Get descriptive statistics for numeric columns
description = df.describe()

print("Descriptive Statistics:\n", description)

Descriptive Statistics:
               A
count  5.000000
mean   3.000000
std    1.581139
min    1.000000
25%    2.000000
50%    3.000000
75%    4.000000
max    5.000000
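
By default describe() skips non-numeric columns when a DataFrame contains any numeric ones. Passing `include='all'` summarizes every column. The sketch below adds a hypothetical `label` column to show this:

```python
import pandas as pd

# 'label' is a hypothetical text column added for illustration
df = pd.DataFrame({'A': [1, 2, 3, 4, 5],
                   'label': ['x', 'y', 'x', 'x', 'y']})

numeric_only = df.describe()           # only column 'A'
summary = df.describe(include='all')   # adds unique/top/freq for 'label'

print(numeric_only)
print(summary)
```

Statistics that do not apply to a column (for example `mean` for text) appear as NaN in the combined summary.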


* 6. idxmax() and idxmin()
     
The idxmax() and idxmin() functions return the index label of the maximum and minimum values in a Series, respectively.

In [6]:
import pandas as pd

# Create a DataFrame
data = {'A': [10, 20, 30, 40, 50]}
df = pd.DataFrame(data)

# Get the index label of the maximum and minimum values in column 'A'
max_index = df['A'].idxmax()
min_index = df['A'].idxmin()

print("Index label of maximum value:", max_index)
print("Index label of minimum value:", min_index)

Index label of maximum value: 4
Index label of minimum value: 0
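
With the default integer index, the returned label looks like a position. The distinction is clearer with a labeled index. A minimal sketch; the names and scores are hypothetical:

```python
import pandas as pd

# Hypothetical scores indexed by name
scores = pd.Series([72, 95, 88], index=['alice', 'bob', 'carol'])

best = scores.idxmax()    # label of the largest value
worst = scores.idxmin()   # label of the smallest value

# The label can be passed straight to .loc to fetch the value
print(best, scores.loc[best])
print(worst, scores.loc[worst])
```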


* 7. applymap()

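The applymap() function applies a function to every element of the DataFrame. Since pandas 2.1.0 it is deprecated in favor of DataFrame.map(), and calling it emits a FutureWarning.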

In [7]:
import pandas as pd

# Create a DataFrame
data = {'A': [1, 2, 3], 'B': [4, 5, 6]}
df = pd.DataFrame(data)

# Define a function to double the values
def double(x):
    return x * 2

# Apply the function to every element of the DataFrame
result = df.applymap(double)

print("Result after applying function:\n", result)


Result after applying function:
    A   B
0  2   8
1  4  10
2  6  12
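
On pandas 2.1.0 and later, the same element-wise operation can be written with DataFrame.map(). The sketch below checks for that method and falls back to applymap() on older versions; the `hasattr` check is one possible approach, not the only one.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

def double(x):
    return x * 2

# DataFrame.map is available from pandas 2.1.0; older versions only have applymap
if hasattr(pd.DataFrame, 'map'):
    result = df.map(double)
else:
    result = df.applymap(double)

print("Result after applying function:\n", result)
```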



* 8. map()
  
The map() function substitutes each value in a Series with another value, either looked up in a dictionary or Series or computed by a function.

In [8]:
import pandas as pd

# Create a DataFrame
data = {'A': ['foo', 'bar', 'baz']}
df = pd.DataFrame(data)

# Define a mapping dictionary
mapping = {'foo': 1, 'bar': 2, 'baz': 3}

# Map values in column 'A' using the mapping dictionary
df['A'] = df['A'].map(mapping)

print("DataFrame after mapping:\n", df)

DataFrame after mapping:
    A
0  1
1  2
2  3
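
When the mapping is a dictionary, values without an entry become NaN instead of raising an error. The sketch below shows that behavior and one way to handle it; the `'qux'` value and the `-1` placeholder are assumptions for illustration.

```python
import pandas as pd

s = pd.Series(['foo', 'bar', 'qux'])
mapping = {'foo': 1, 'bar': 2}   # no entry for 'qux'

mapped = s.map(mapping)          # 'qux' becomes NaN, so the dtype is float

# Replace unmapped values with a placeholder and restore integer dtype
mapped_filled = mapped.fillna(-1).astype(int)

# map() also accepts a function
upper = s.map(str.upper)

print(mapped)
print(mapped_filled)
print(upper)
```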


* 9. groupby()
  
The groupby() function splits a DataFrame into groups, keyed by one or more columns or by a mapper, so that an aggregation such as sum() can be computed for each group.

In [9]:
import pandas as pd

# Create a DataFrame
data = {'A': ['foo', 'bar', 'foo', 'bar'], 'B': [1, 2, 3, 4]}
df = pd.DataFrame(data)

# Group DataFrame by column 'A' and calculate the sum
grouped_df = df.groupby('A').sum()

print("Grouped DataFrame:\n", grouped_df)

Grouped DataFrame:
      B
A     
bar  6
foo  4
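
Several aggregations can be computed in one pass with agg() and named aggregation. A minimal sketch on the same data; the output column names `total`, `average`, and `rows` are arbitrary choices:

```python
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar'], 'B': [1, 2, 3, 4]})

# Each keyword is an output column: (source column, aggregation)
summary = df.groupby('A').agg(
    total=('B', 'sum'),
    average=('B', 'mean'),
    rows=('B', 'count'),
)

print(summary)
```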


* 10. pivot_table()
  
The pivot_table() function creates a spreadsheet-style pivot table as a DataFrame. Combinations of index and column values that have no data appear as NaN.

In [10]:
import pandas as pd

# Create a DataFrame
data = {'A': ['foo', 'foo', 'bar', 'bar'], 'B': ['one', 'one', 'two', 'two'], 'C': [1, 2, 3, 4]}
df = pd.DataFrame(data)

# Create a pivot table
pivot_table = df.pivot_table(values='C', index='A', columns='B', aggfunc='sum')

print("Pivot Table:\n", pivot_table)

Pivot Table:
 B    one  two
A            
bar  NaN  7.0
foo  3.0  NaN
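
The NaN cells in the output mark combinations that never occur in the data. If a default value is more useful there, `fill_value` supplies one. A sketch on the same data, assuming 0 is an appropriate default for a sum:

```python
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'foo', 'bar', 'bar'],
                   'B': ['one', 'one', 'two', 'two'],
                   'C': [1, 2, 3, 4]})

# fill_value replaces the NaN cells for missing combinations
pivot = df.pivot_table(values='C', index='A', columns='B',
                       aggfunc='sum', fill_value=0)

print(pivot)
```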


* 11. sort_values()

The sort_values() function sorts a DataFrame by the values in one or more columns (or, with axis=1, by the values in one or more rows).

In [11]:
import pandas as pd

# Create a DataFrame
data = {'A': [3, 2, 1], 'B': [6, 5, 4]}
df = pd.DataFrame(data)

# Sort DataFrame by values in column 'A' in descending order
# (the sample data is already in that order, so the rows do not move)
sorted_df = df.sort_values(by='A', ascending=False)

print("Sorted DataFrame:\n", sorted_df)

Sorted DataFrame:
    A  B
0  3  6
1  2  5
2  1  4
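
Because the sample data is already in descending order, the output above looks unchanged. The sketch below uses hypothetical unsorted data and sorts by two columns with a different direction for each:

```python
import pandas as pd

# Hypothetical unsorted data
df = pd.DataFrame({'team': ['b', 'a', 'b', 'a'],
                   'score': [10, 5, 20, 15]})

# Sort by team ascending, then by score descending within each team
sorted_df = df.sort_values(by=['team', 'score'], ascending=[True, False])

# ignore_index=True renumbers the rows 0..n-1 after sorting
renumbered = df.sort_values(by=['team', 'score'],
                            ascending=[True, False], ignore_index=True)

print(sorted_df)
print(renumbered)
```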


* 12. merge()

The merge() function combines DataFrame objects with a database-style join on one or more key columns. The default is an inner join, which keeps only the keys present in both DataFrames.

In [12]:
import pandas as pd

# Create two DataFrames
df1 = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']})
df2 = pd.DataFrame({'A': ['A0', 'A1'], 'C': ['C0', 'C1']})

# Merge the DataFrames based on common values in column 'A'
merged_df = pd.merge(df1, df2, on='A')

print("Merged DataFrame:\n", merged_df)

Merged DataFrame:
     A   B   C
0  A0  B0  C0
1  A1  B1  C1
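
In the example above every key appears in both DataFrames, so the join type makes no difference. The sketch below uses hypothetical data with non-matching keys to contrast the default inner join with `how='outer'`, and adds `indicator=True` to show where each row came from:

```python
import pandas as pd

# Hypothetical data: 'A2' exists only on the left, 'A3' only on the right
left = pd.DataFrame({'A': ['A0', 'A1', 'A2'], 'B': ['B0', 'B1', 'B2']})
right = pd.DataFrame({'A': ['A0', 'A1', 'A3'], 'C': ['C0', 'C1', 'C3']})

# Inner join (default): only keys found in both DataFrames
inner = pd.merge(left, right, on='A')

# Outer join: all keys; the '_merge' column records each row's origin
outer = pd.merge(left, right, on='A', how='outer', indicator=True)

print(inner)
print(outer)
```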


* 13. concat()

The concat() function concatenates pandas objects along a particular axis. Each object keeps its original index labels unless ignore_index=True is passed.

In [13]:
import pandas as pd

# Create two DataFrames
df1 = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']})
df2 = pd.DataFrame({'A': ['A2', 'A3'], 'B': ['B2', 'B3']})

# Concatenate the DataFrames along rows (axis=0)
concatenated_df = pd.concat([df1, df2], axis=0)

print("Concatenated DataFrame:\n", concatenated_df)

Concatenated DataFrame:
     A   B
0  A0  B0
1  A1  B1
0  A2  B2
1  A3  B3
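
The output above repeats the index labels 0 and 1, so label-based lookups such as `.loc[0]` return two rows. If the original labels carry no meaning, `ignore_index=True` produces a fresh index. A minimal sketch on the same data:

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']})
df2 = pd.DataFrame({'A': ['A2', 'A3'], 'B': ['B2', 'B3']})

# Discard the original labels and renumber the rows 0..3
combined = pd.concat([df1, df2], axis=0, ignore_index=True)

print(combined)
```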


* 14. fillna()

The fillna() function fills missing values (NaN) in a DataFrame with a given value.

In [14]:
import pandas as pd
import numpy as np

# Create a DataFrame with missing values
data = {'A': [1, np.nan, 3], 'B': [np.nan, 5, np.nan]}
df = pd.DataFrame(data)

# Fill missing values with 0
filled_df = df.fillna(0)

print("DataFrame after filling missing values:\n", filled_df)

DataFrame after filling missing values:
      A    B
0  1.0  0.0
1  0.0  5.0
2  3.0  0.0
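
A single fill value is not always appropriate for every column. fillna() also accepts a dictionary of per-column values, and ffill() carries the last valid value forward. The sketch below uses the same data; filling 'A' with its mean is an illustrative choice, not a recommendation.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, np.nan, 3], 'B': [np.nan, 5, np.nan]})

# Per-column fill values: the mean of 'A' for 'A', 0 for 'B'
filled = df.fillna({'A': df['A'].mean(), 'B': 0})

# Forward fill: a leading NaN stays NaN because nothing precedes it
forward = df.ffill()

print(filled)
print(forward)
```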
