In [72]:
import pandas as pd

# <div style="background-color:#595959; color: white; text-align:center; vertical-align: middle; padding: 12px; border-radius: 25px;">**101 Pandas** - *Groupby*</div>

[Tenger Data Technologies Ltd.](http://www.tengerdata.com/)

*Joe T. Boka*

*joe.tb (at) tengerdata (dot) com*

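In this notebook we will explore ***.groups*** and ***.get_group()***.

So, what's the difference between ***.groups*** and ***.get_group()***? In short, ***.groups*** is an attribute that maps every group key to the row labels in that group, while ***.get_group()*** is a method that returns the rows of one chosen group.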

# <div style="background-color:#21a2dd; color: white; text-align:center; vertical-align: middle; padding: 12px; border-radius: 25px;">**Example: pandas.core.groupby.GroupBy.groups**</div>

First, we construct a DataFrame.

In [73]:
data =  [['bar', 'cat1', 1],
         ['foo', 'cat3', 2],
         ['foo', 'cat2', 3],
         ['foo', 'cat3', 4],
         ['bar', 'cat2', 5], 
         ['bar', 'cat1', 6]]

In [74]:
df = pd.DataFrame(data,columns=['A','B','C'])
df

Unnamed: 0,A,B,C
0,bar,cat1,1
1,foo,cat3,2
2,foo,cat2,3
3,foo,cat3,4
4,bar,cat2,5
5,bar,cat1,6


**.groups**

The ***groups*** attribute is a dictionary that maps each unique group key to the index labels of the rows belonging to that group.

In [75]:
groupped = df.groupby('B').groups
groupped

{'cat1': Int64Index([0, 5], dtype='int64'),
 'cat2': Int64Index([2, 4], dtype='int64'),
 'cat3': Int64Index([1, 3], dtype='int64')}
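Because each value in ***.groups*** is an index of row labels, it can be handed to `.loc` to pull the matching rows back out of the original frame. The following is a minimal sketch of that idea; the variable names `groups` and `cat3_rows` are chosen here for illustration and are not part of the original notebook.

```python
import pandas as pd

df = pd.DataFrame(
    [['bar', 'cat1', 1], ['foo', 'cat3', 2], ['foo', 'cat2', 3],
     ['foo', 'cat3', 4], ['bar', 'cat2', 5], ['bar', 'cat1', 6]],
    columns=['A', 'B', 'C'],
)

groups = df.groupby('B').groups

# Each dict value holds the row labels of one group, so passing it
# to .loc recovers that group's rows from the original frame.
cat3_rows = df.loc[groups['cat3']]
print(cat3_rows)
```

This selects the same rows that `df.groupby('B').get_group('cat3')` would return, which is one way to see how the two features relate.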

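The ***groups*** attribute returns a dict, so we can easily construct a new DataFrame that uses the dict keys as column names. Note that this relies on every group having the same number of rows, as is the case here.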

In [76]:
ndf = pd.DataFrame(groupped).reset_index(drop=True)
ndf

Unnamed: 0,cat1,cat2,cat3
0,0,2,1
1,5,4,3
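The construction above works because all three groups contain exactly two rows. When group sizes differ, one possible workaround is to wrap each set of labels in a `pd.Series` so that pandas can align the columns and pad the shorter ones with `NaN`. The sketch below uses hypothetical data (`uneven`) that does not appear in the original notebook.

```python
import pandas as pd

# Hypothetical data: 'cat1' has three rows, 'cat2' has only one.
uneven = pd.DataFrame({'B': ['cat1', 'cat2', 'cat1', 'cat1'],
                       'C': [10, 20, 30, 40]})
groups = uneven.groupby('B').groups

# Wrapping each set of labels in a Series gives every column its own
# 0..n-1 index; pandas aligns on that index and fills gaps with NaN.
padded = pd.DataFrame({key: pd.Series(labels) for key, labels in groups.items()})
print(padded)
```

The padded column is converted to a float dtype because `NaN` cannot be stored in a plain integer column.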


# <div style="background-color:#21a2dd; color: white; text-align:center; vertical-align: middle; padding: 12px; border-radius: 25px;">**Example: pandas.core.groupby.GroupBy.get_group()**</div>

Using ***get_group()***, we can select the rows of a single group. When grouping by multiple columns, the group is identified by a tuple with one key per column.

In [77]:
df.groupby('B').get_group('cat2').reset_index(drop=True)  # select a single group

Unnamed: 0,A,B,C
0,foo,cat2,3
1,bar,cat2,5


In [78]:
df.groupby(['A','B']).get_group(('bar','cat1')).reset_index(drop=True)

Unnamed: 0,A,B,C
0,bar,cat1,1
1,bar,cat1,6
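One thing to keep in mind is that ***get_group()*** raises a `KeyError` when the requested key is not among the groups. A small guard that checks ***.groups*** first can avoid this. The helper below, `get_group_or_empty`, is a hypothetical name introduced only for this sketch; returning an empty frame is one possible design choice, not the only one.

```python
import pandas as pd

df = pd.DataFrame(
    [['bar', 'cat1', 1], ['foo', 'cat3', 2], ['foo', 'cat2', 3],
     ['foo', 'cat3', 4], ['bar', 'cat2', 5], ['bar', 'cat1', 6]],
    columns=['A', 'B', 'C'],
)

def get_group_or_empty(frame, by, key):
    """Hypothetical helper: return the rows of group `key`,
    or an empty frame with the same columns if the key is absent."""
    grouped = frame.groupby(by)
    if key in grouped.groups:
        return grouped.get_group(key)
    return frame.iloc[0:0]  # zero rows, same columns

print(get_group_or_empty(df, 'B', 'cat2'))
print(get_group_or_empty(df, 'B', 'cat9'))
```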


Using ***get_group()*** in a for loop, we can output a nested list of the grouped values:

In [79]:
g = df['B'].unique()
g

array(['cat1', 'cat3', 'cat2'], dtype=object)

In [80]:
result = []
grouped_c = df.groupby('B')['C']  # group once, outside the loop
for cat in g:
    lst = grouped_c.get_group(cat).tolist()
    print(lst)
    result.append(lst)

[1, 6]
[2, 4]
[3, 5]


In [81]:
result

[[1, 6], [2, 4], [3, 5]]
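The same nested list can also be built without calling ***get_group()*** at all, by iterating over the grouped object directly. This is a sketch of an alternative rather than a replacement for the loop above; `sort=False` is used here so that the groups come out in order of first appearance, matching the order of `df['B'].unique()`.

```python
import pandas as pd

df = pd.DataFrame(
    [['bar', 'cat1', 1], ['foo', 'cat3', 2], ['foo', 'cat2', 3],
     ['foo', 'cat3', 4], ['bar', 'cat2', 5], ['bar', 'cat1', 6]],
    columns=['A', 'B', 'C'],
)

# Iterating over a grouped column yields (key, Series) pairs.
# sort=False keeps the groups in order of first appearance.
nested = [values.tolist() for _, values in df.groupby('B', sort=False)['C']]
print(nested)
```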