I think it would be better to leave pandas out of this.

# Counting in Python
We'll demonstrate four different ways to count in Python. The Counter method is probably the simplest, but since our data will often already be in NumPy or pandas format, those methods are useful too. The basic approach is also important because it demonstrates core Python concepts.

In [1]:
import numpy as np
import pandas as pd
from collections import Counter

In [2]:
A = [7, 1, 5, 3, 0, 5, 3, 6, 6, 5, 1, 5]

Given the list A, how can we make a dictionary of counts like the following?
`{7: 1, 1: 2, 5: 4, 3: 2, 0: 1, 6: 2}`

## Basic approach

In [3]:
# Counting how often 5 occurs in A
A.count(5)

4

In [4]:
d = {}
for x in A:
    d[x] = A.count(x)
    print(d)

{7: 1}
{7: 1, 1: 2}
{7: 1, 1: 2, 5: 4}
{7: 1, 1: 2, 5: 4, 3: 2}
{7: 1, 1: 2, 5: 4, 3: 2, 0: 1}
{7: 1, 1: 2, 5: 4, 3: 2, 0: 1}
{7: 1, 1: 2, 5: 4, 3: 2, 0: 1}
{7: 1, 1: 2, 5: 4, 3: 2, 0: 1, 6: 2}
{7: 1, 1: 2, 5: 4, 3: 2, 0: 1, 6: 2}
{7: 1, 1: 2, 5: 4, 3: 2, 0: 1, 6: 2}
{7: 1, 1: 2, 5: 4, 3: 2, 0: 1, 6: 2}
{7: 1, 1: 2, 5: 4, 3: 2, 0: 1, 6: 2}


In [8]:
d = {}
for x in A:
    d[x] = A.count(x)
d

{7: 1, 1: 2, 5: 4, 3: 2, 0: 1, 6: 2}

In [9]:
d[5]

4

In [10]:
d[4]

KeyError: 4

In [11]:
d.get(4,0)

0

In [12]:
d.get(5,0)

4
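
The get method also suggests another way to build the dictionary: instead of calling A.count for every element, we can go through A once and add 1 to the count seen so far. The following is only a minimal sketch of that idea, using the same list A as above.

```python
A = [7, 1, 5, 3, 0, 5, 3, 6, 6, 5, 1, 5]

d = {}
for x in A:
    # d.get(x, 0) is the count so far (0 the first time we see x)
    d[x] = d.get(x, 0) + 1

print(d)  # {7: 1, 1: 2, 5: 4, 3: 2, 0: 1, 6: 2}
```

Each element of A is visited exactly once, so the list is never rescanned the way it is with A.count.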

In [15]:
# Sets in Python, as in mathematics, do not contain duplicates.
set(A)

{0, 1, 3, 5, 6, 7}

In [16]:
d = {}
for x in set(A):
    d[x] = A.count(x)
    print(d)

{0: 1}
{0: 1, 1: 2}
{0: 1, 1: 2, 3: 2}
{0: 1, 1: 2, 3: 2, 5: 4}
{0: 1, 1: 2, 3: 2, 5: 4, 6: 2}
{0: 1, 1: 2, 3: 2, 5: 4, 6: 2, 7: 1}


In [17]:
d = {}
for x in set(A):
    d[x] = A.count(x)
d

{0: 1, 1: 2, 3: 2, 5: 4, 6: 2, 7: 1}

In [18]:
rng = np.random.default_rng()

In [21]:
B = list(rng.integers(low=0,high=10,size=20000))

In [23]:
%%time
d = {}
for x in B:
    d[x] = B.count(x)
d

CPU times: user 7.79 s, sys: 24.8 ms, total: 7.81 s
Wall time: 7.82 s


{8: 1993,
 9: 2048,
 4: 2012,
 6: 2016,
 7: 1968,
 5: 1979,
 1: 2014,
 2: 2023,
 3: 1953,
 0: 1994}

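Iterating over the elements of set(B), rather than B, is over 1000 times faster! The reason is that B.count is called only once per distinct value (10 times) instead of once per element (20000 times).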

In [24]:
%%time
d = {}
for x in set(B):
    d[x] = B.count(x)
d

CPU times: user 5.34 ms, sys: 473 µs, total: 5.81 ms
Wall time: 5.41 ms


{0: 1994,
 1: 2014,
 2: 2023,
 3: 1953,
 4: 2012,
 5: 1979,
 6: 2016,
 7: 1968,
 8: 1993,
 9: 2048}
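
To see where the speedup comes from without timing anything, we can count how many times B.count would be called in each loop. This is a rough sketch. It uses the standard-library random module with a fixed seed (an assumption made here so the example is reproducible) rather than the NumPy generator used above.

```python
import random

random.seed(0)  # hypothetical seed, only for reproducibility
B = [random.randrange(10) for _ in range(20000)]

# Each call to B.count scans the whole list once.
calls_over_list = len(B)       # one call per element of B
calls_over_set = len(set(B))   # one call per distinct value, at most 10

print(calls_over_list)  # 20000
print(calls_over_set)
print(calls_over_list // calls_over_set)
```

With 10 distinct values the second loop makes 2000 times fewer scans of B, which is consistent with the timings above.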

In [25]:
d = {}
for x in set(A):
    d[x] = A.count(x)
d

{0: 1, 1: 2, 3: 2, 5: 4, 6: 2, 7: 1}

## Using collections.Counter

In [26]:
A

[7, 1, 5, 3, 0, 5, 3, 6, 6, 5, 1, 5]

In [28]:
c = Counter(A)
c

Counter({7: 1, 1: 2, 5: 4, 3: 2, 0: 1, 6: 2})

In [29]:
isinstance(c,dict)

True

In [30]:
type(c)

collections.Counter

In [31]:
c

Counter({7: 1, 1: 2, 5: 4, 3: 2, 0: 1, 6: 2})

In [32]:
d

{0: 1, 1: 2, 3: 2, 5: 4, 6: 2, 7: 1}

In [33]:
c[5]

4

In [34]:
d[5]

4

In [35]:
c[4]

0

In [36]:
d[4]

KeyError: 4

Recall from above that the get method avoids this KeyError by returning a default value.

In [37]:
d.get(4,0)

0

In [38]:
d.get(5,0)

4

In [39]:
c.most_common(2)

[(5, 4), (1, 2)]
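
Two more Counter behaviors are worth a quick look. Looking up a missing key returns 0 without adding that key, and update adds new observations to the existing counts. The extra values passed to update below are made up purely for illustration.

```python
from collections import Counter

A = [7, 1, 5, 3, 0, 5, 3, 6, 6, 5, 1, 5]
c = Counter(A)

# Looking up a missing key returns 0 and does not add the key.
missing = c[4]
print(missing, 4 in c)  # 0 False

# update adds counts from another iterable to the existing ones.
c.update([5, 4, 4])
print(c[5], c[4])  # 5 2

print(c.most_common(1))  # [(5, 5)]
```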

In [41]:
%%time
Counter(B)

CPU times: user 1.82 ms, sys: 1e+03 ns, total: 1.82 ms
Wall time: 1.82 ms


Counter({8: 1993,
         9: 2048,
         4: 2012,
         6: 2016,
         7: 1968,
         5: 1979,
         1: 2014,
         2: 2023,
         3: 1953,
         0: 1994})

In [42]:
# There's probably no reason to convert from c to a dictionary, but if you really want to:
dict(c)

{7: 1, 1: 2, 5: 4, 3: 2, 0: 1, 6: 2}

## Using NumPy

In [43]:
A

[7, 1, 5, 3, 0, 5, 3, 6, 6, 5, 1, 5]

In [44]:
arr = np.array(A)

In [45]:
np.unique(arr)

array([0, 1, 3, 5, 6, 7])

In [46]:
set(A)

{0, 1, 3, 5, 6, 7}

In [47]:
np.unique(arr,return_counts=True)

(array([0, 1, 3, 5, 6, 7]), array([1, 2, 2, 4, 2, 1]))

In [48]:
elts, counts = np.unique(arr,return_counts=True)

In [49]:
elts

array([0, 1, 3, 5, 6, 7])

In [50]:
counts

array([1, 2, 2, 4, 2, 1])

In [51]:
zip(elts,counts)

<zip at 0x7fbae123e0a0>

In [52]:
list(zip(elts,counts))

[(0, 1), (1, 2), (3, 2), (5, 4), (6, 2), (7, 1)]

In [54]:
d = dict(zip(elts,counts))
d

{0: 1, 1: 2, 3: 2, 5: 4, 6: 2, 7: 1}
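
One detail to keep in mind is that the keys and values of this dictionary are NumPy integers, not built-in Python ints. If plain Python ints are wanted (for example, before saving the result as JSON), one possible approach is to convert the arrays with tolist first. The names d_np and d_py are only for this sketch.

```python
import numpy as np

A = [7, 1, 5, 3, 0, 5, 3, 6, 6, 5, 1, 5]
elts, counts = np.unique(np.array(A), return_counts=True)

# Zipping the arrays directly keeps NumPy integer types.
d_np = dict(zip(elts, counts))
print(type(next(iter(d_np))))  # a NumPy integer type, not int

# tolist converts the array elements to built-in Python ints.
d_py = dict(zip(elts.tolist(), counts.tolist()))
print(d_py)  # {0: 1, 1: 2, 3: 2, 5: 4, 6: 2, 7: 1}
```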

In [55]:
d[5]

4

In [56]:
arr == 5

array([False, False,  True, False, False,  True, False, False, False,
        True, False,  True])

In [57]:
# This is the NumPy equivalent of A.count(5)
np.count_nonzero(arr==5)

4

In [67]:
elts == 5

array([False, False, False,  True, False, False])

In [59]:
# An example of what's called "Boolean indexing"
# Notice how elts == 5 is a Boolean array.
counts[elts == 5]

array([4])
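
When the values are non-negative integers, as they are here, np.bincount is another option. It returns an array whose entry at position i is the number of times i occurs, so values that never occur get a count of 0. This is a small sketch of an alternative, not a replacement for the np.unique approach above.

```python
import numpy as np

A = [7, 1, 5, 3, 0, 5, 3, 6, 6, 5, 1, 5]
arr = np.array(A)

# counts_by_value[i] is how often i occurs in arr, for i = 0, ..., arr.max()
counts_by_value = np.bincount(arr)
print(counts_by_value)     # [1 2 0 2 0 4 2 1]
print(counts_by_value[5])  # 4
print(counts_by_value[4])  # 0
```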

## Using pandas

In [60]:
A

[7, 1, 5, 3, 0, 5, 3, 6, 6, 5, 1, 5]

In [61]:
s = pd.Series(A)
s

0     7
1     1
2     5
3     3
4     0
5     5
6     3
7     6
8     6
9     5
10    1
11    5
dtype: int64

In [63]:
s2 = s.value_counts()
s2

5    4
1    2
3    2
6    2
0    1
7    1
dtype: int64

In [64]:
s2[5]

4

In [65]:
s2[4]

KeyError: 4

In [66]:
dict(s2)

{5: 4, 1: 2, 3: 2, 6: 2, 0: 1, 7: 1}
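
As with dictionaries, a pandas Series has a get method that returns a default instead of raising a KeyError. The sketch below also uses sort_index to order the counts by value rather than by frequency. The name by_value is just for illustration.

```python
import pandas as pd

A = [7, 1, 5, 3, 0, 5, 3, 6, 6, 5, 1, 5]
s2 = pd.Series(A).value_counts()

# get returns the default (here 0) when the label is missing.
print(s2.get(4, 0))  # 0
print(s2.get(5, 0))  # 4

# Order the counts by the values themselves instead of by frequency.
by_value = s2.sort_index()
print(by_value.to_dict())
```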