# Pandas `count()`

This is the companion notebook for the Medium article [Getting more value from the Pandas count()](https://bindichen.medium.com/getting-more-value-from-the-pandas-count-3e45a62c7077).

Please check out the article for instructions.

**License**: [BSD 2-Clause](https://opensource.org/licenses/BSD-2-Clause)

```python
import pandas as pd
import numpy as np
```

```python
df = pd.DataFrame({
    "Person": ["John", "Tom", "Lewis", "John", "Myla"],
    "Age": [24., np.nan, 21., 33, 26],
    "Single": [False, True, True, None, np.datetime64('NaT')],
    "Department": ["Product", "IT", "IT", "IT", "Product"]
})
df
```

|   | Person | Age | Single | Department |
|---|--------|-----|--------|------------|
| 0 | John | 24.0 | False | Product |
| 1 | Tom | NaN | True | IT |
| 2 | Lewis | 21.0 | True | IT |
| 3 | John | 33.0 | None | IT |
| 4 | Myla | 26.0 | NaT | Product |


## 1. Counting non-NA cells for each column and row

```python
df.count()
```

```
Person        5
Age           4
Single        3
Department    5
dtype: int64
```

```python
df.count(axis=1)
```

```
0    4
1    3
2    4
3    3
4    3
dtype: int64
```

```python
df.count(axis='columns')
```

```
0    4
1    3
2    4
3    3
4    3
dtype: int64
```

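"Non-NA" here means anything `isna()` flags, so `NaN`, `None` and `NaT` are all skipped. A quick sketch, assuming pandas 1.x or 2.x, that checks `count()` against the `notna()` mask and derives the number of missing cells per column from it:

```python
import numpy as np
import pandas as pd

# Same example frame as above
df = pd.DataFrame({
    "Person": ["John", "Tom", "Lewis", "John", "Myla"],
    "Age": [24., np.nan, 21., 33, 26],
    "Single": [False, True, True, None, np.datetime64('NaT')],
    "Department": ["Product", "IT", "IT", "IT", "Product"]
})

# count() agrees with summing the notna() mask along the same axis
per_column = df.notna().sum()
per_row = df.notna().sum(axis=1)
assert per_column.tolist() == df.count().tolist()
assert per_row.tolist() == df.count(axis=1).tolist()

# NaN, None and NaT are all treated as missing
print(pd.isna(np.array([np.nan, None, np.datetime64('NaT')], dtype=object)))

# Missing cells per column = number of rows minus the non-NA count
missing = len(df) - df.count()
print(missing)
```

The last step is handy when you want both numbers side by side, since `count()` and `isna().sum()` always add up to `len(df)`.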
## 2. Counting non-NA cells on a MultiIndex DataFrame

```python
df = pd.read_csv(
    'titanic_train.csv',
    index_col=['Sex', 'Survived']
)
df.head()
```

| Sex | Survived | PassengerId | Pclass | Name | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|-----|----------|-------------|--------|------|-----|-------|-------|--------|------|-------|----------|
| male | 0 | 1 | 3 | Braund, Mr. Owen Harris | 22.0 | 1 | 0 | A/5 21171 | 7.25 | NaN | S |
| female | 1 | 2 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| female | 1 | 3 | 3 | Heikkinen, Miss. Laina | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 | NaN | S |
| female | 1 | 4 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | 35.0 | 1 | 0 | 113803 | 53.1 | C123 | S |
| male | 0 | 5 | 3 | Allen, Mr. William Henry | 35.0 | 0 | 0 | 373450 | 8.05 | NaN | S |


```python
# df.count(level="Sex") worked in older pandas; the `level` keyword was
# deprecated in 1.3 and removed in 2.0, so group on the index level instead
df.groupby(level="Sex").count()
```

| Sex | PassengerId | Pclass | Name | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|-----|-------------|--------|------|-----|-------|-------|--------|------|-------|----------|
| female | 314 | 314 | 314 | 261 | 314 | 314 | 314 | 314 | 97 | 312 |
| male | 577 | 577 | 577 | 453 | 577 | 577 | 577 | 577 | 107 | 577 |


```python
# Equivalent of df.count(level="Survived") in pandas < 2.0
df.groupby(level="Survived").count()
```

| Survived | PassengerId | Pclass | Name | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|----------|-------------|--------|------|-----|-------|-------|--------|------|-------|----------|
| 0 | 549 | 549 | 549 | 424 | 549 | 549 | 549 | 549 | 68 | 549 |
| 1 | 342 | 342 | 342 | 290 | 342 | 342 | 342 | 342 | 136 | 340 |

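Since pandas 2.0 no longer accepts `count(level=...)`, `groupby(level=...)` is the replacement, and it can also group on several index levels at once or refer to a level by position. A minimal sketch on a small hypothetical frame that mimics the `(Sex, Survived)` index (the rows and values are made up for illustration, not taken from the Titanic data):

```python
import numpy as np
import pandas as pd

# Hypothetical, made-up rows that mimic the (Sex, Survived) index used above
idx = pd.MultiIndex.from_tuples(
    [("male", 0), ("female", 1), ("female", 1), ("female", 0), ("male", 1)],
    names=["Sex", "Survived"],
)
df = pd.DataFrame(
    {"Age": [22.0, 38.0, np.nan, 35.0, np.nan],
     "Cabin": [None, "C85", None, "C123", "E46"]},
    index=idx,
)

# Replacement for the removed df.count(level="Sex")
by_sex = df.groupby(level="Sex").count()

# Levels can also be combined, or referenced by position (level=0 is "Sex")
by_both = df.groupby(level=["Sex", "Survived"]).count()
by_position = df.groupby(level=0).count()

print(by_sex)
print(by_both)
```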

## 3. Counting numeric columns only

```python
df.count(numeric_only=True)
```

```
PassengerId    891
Pclass         891
Age            714
SibSp          891
Parch          891
Fare           891
dtype: int64
```

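One detail worth knowing: in recent pandas releases (this sketch assumes 1.x or 2.x), `numeric_only=True` treats boolean columns as numeric, whereas `select_dtypes("number")` leaves them out. A small sketch on a hypothetical frame (the column names are made up) that makes the difference visible:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with int, float, bool and string columns
df = pd.DataFrame({
    "n_int": [1, 2, 3],
    "n_float": [1.5, np.nan, 3.0],
    "flag": [True, False, True],
    "label": ["a", None, "c"],
})

# numeric_only=True keeps the int, float and bool columns and drops "label"
print(df.count(numeric_only=True))

# select_dtypes("number") leaves out bool, so include it explicitly
# to get the same set of columns
print(df.select_dtypes(include=["number", "bool"]).count())
```

The Titanic frame above has no boolean columns, so the distinction does not change its result.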
## 4. Applying `count()` to the `groupby()` result

```python
df = pd.DataFrame({
    "Person": ["John", "Tom", "Lewis", "John", "Myla"],
    "Age": [24., np.nan, 21., 33, 26],
    "Single": [False, True, True, None, np.datetime64('NaT')],
    "Department": ["Product", "IT", "IT", "IT", "Product"]
})
df
```

|   | Person | Age | Single | Department |
|---|--------|-----|--------|------------|
| 0 | John | 24.0 | False | Product |
| 1 | Tom | NaN | True | IT |
| 2 | Lewis | 21.0 | True | IT |
| 3 | John | 33.0 | None | IT |
| 4 | Myla | 26.0 | NaT | Product |


```python
# On a specific column
df.groupby('Department')['Single'].count()
```

```
Department
IT         2
Product    1
Name: Single, dtype: int64
```

```python
# On a specific column, via agg('count')
df.groupby('Department')['Single'].agg('count')
```

```
Department
IT         2
Product    1
Name: Single, dtype: int64
```

```python
# On all columns except the grouping key
df.groupby('Department').count()
```

| Department | Person | Age | Single |
|------------|--------|-----|--------|
| IT | 3 | 2 | 2 |
| Product | 2 | 2 | 1 |


```python
# On all columns except the grouping key, via agg('count')
df.groupby('Department').agg('count')
```

| Department | Person | Age | Single |
|------------|--------|-----|--------|
| IT | 3 | 2 | 2 |
| Product | 2 | 2 | 1 |

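Because `count()` skips missing values inside each group, it is not the same as the group size. A short sketch on the same example frame, using `GroupBy.size()` (which counts rows whether or not they contain NA) for comparison:

```python
import numpy as np
import pandas as pd

# Same example frame as above
df = pd.DataFrame({
    "Person": ["John", "Tom", "Lewis", "John", "Myla"],
    "Age": [24., np.nan, 21., 33, 26],
    "Single": [False, True, True, None, np.datetime64('NaT')],
    "Department": ["Product", "IT", "IT", "IT", "Product"]
})

grouped = df.groupby("Department")

# count(): non-NA values of 'Single' in each group
non_na = grouped["Single"].count()

# size(): number of rows in each group, NA or not
rows = grouped.size()

print(pd.DataFrame({"count": non_na, "size": rows}))
```

The IT group has three rows but only two non-NA `Single` values, which is exactly the gap between the two columns.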

## 5. Combine the count back into the original DataFrame

```python
# Create a DataFrame
df = pd.DataFrame({
    "Person": ["John", "Tom", "Lewis", "John", "Myla"],
    "Age": [24., np.nan, 21., 33, 26],
    "Single": [False, True, True, None, np.datetime64('NaT')],
    "Department": ["Product", "IT", "IT", "IT", "Product"]
})
df
```

|   | Person | Age | Single | Department |
|---|--------|-----|--------|------------|
| 0 | John | 24.0 | False | Product |
| 1 | Tom | NaN | True | IT |
| 2 | Lewis | 21.0 | True | IT |
| 3 | John | 33.0 | None | IT |
| 4 | Myla | 26.0 | NaT | Product |


```python
# The problem with count(): it returns one row per department, so the result
# can't be lined up with the rows of the original DataFrame
df.groupby('Department')['Single'].count()
```

```
Department
IT         2
Product    1
Name: Single, dtype: int64
```

### Solution 1 - merge()

```python
temp_df = df.groupby('Department')['Single'].count().rename('department_total_count').to_frame()
temp_df
```

| Department | department_total_count |
|------------|------------------------|
| IT | 2 |
| Product | 1 |


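```python
# Optional: reset_index() turns the 'Department' index back into a column.
# Not required here, because merge(on=...) also accepts index level names.
# temp_df.reset_index()
```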

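```python
# Doing merge()
df_new = pd.merge(df, temp_df, on='Department', how='left')
df_new
```

|   | Person | Age | Single | Department | department_total_count |
|---|--------|-----|--------|------------|------------------------|
| 0 | John | 24.0 | False | Product | 1 |
| 1 | Tom | NaN | True | IT | 2 |
| 2 | Lewis | 21.0 | True | IT | 2 |
| 3 | John | 33.0 | None | IT | 2 |
| 4 | Myla | 26.0 | NaT | Product | 1 |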

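The optional `reset_index()` step can also be written out in full. The sketch below turns `Department` back into an ordinary column before merging and passes `validate="many_to_one"`, so pandas raises a `MergeError` if the lookup table ever contains a duplicated department; the final length check confirms the merge keeps one row per original row:

```python
import numpy as np
import pandas as pd

# Same example frame as above
df = pd.DataFrame({
    "Person": ["John", "Tom", "Lewis", "John", "Myla"],
    "Age": [24., np.nan, 21., 33, 26],
    "Single": [False, True, True, None, np.datetime64('NaT')],
    "Department": ["Product", "IT", "IT", "IT", "Product"]
})

counts = (
    df.groupby("Department")["Single"]
      .count()
      .rename("department_total_count")
      .reset_index()  # 'Department' becomes an ordinary column again
)

# validate="many_to_one": each Department must appear at most once in `counts`
df_new = df.merge(counts, on="Department", how="left", validate="many_to_one")

# A left merge against a unique lookup table preserves the row count
assert len(df_new) == len(df)
print(df_new)
```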

### Solution 2 - transform()

```python
# With transform
df.groupby('Department')['Single'].transform('count')
```

```
0    1
1    2
2    2
3    2
4    1
Name: Single, dtype: int64
```

```python
# Assign the result directly to a new column
df['department_total_single'] = df.groupby('Department')['Single'].transform('count')
```

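```python
df
```

|   | Person | Age | Single | Department | department_total_single |
|---|--------|-----|--------|------------|-------------------------|
| 0 | John | 24.0 | False | Product | 1 |
| 1 | Tom | NaN | True | IT | 2 |
| 2 | Lewis | 21.0 | True | IT | 2 |
| 3 | John | 33.0 | None | IT | 2 |
| 4 | Myla | 26.0 | NaT | Product | 1 |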

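Another option, not covered in the article, is `Series.map()`: given a Series, it looks up each value by that Series' index, so the per-department counts can be mapped straight onto the `Department` column. A minimal sketch on the same example frame:

```python
import numpy as np
import pandas as pd

# Same example frame as above
df = pd.DataFrame({
    "Person": ["John", "Tom", "Lewis", "John", "Myla"],
    "Age": [24., np.nan, 21., 33, 26],
    "Single": [False, True, True, None, np.datetime64('NaT')],
    "Department": ["Product", "IT", "IT", "IT", "Product"]
})

# One non-NA count per department, indexed by department name
counts = df.groupby("Department")["Single"].count()

# Look up each row's department in `counts`
df["department_total_single"] = df["Department"].map(counts)
print(df)
```

Like `transform()`, this keeps the original row order; it reads naturally when the per-group Series has already been computed for another purpose.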

### Thanks for reading
