# Pandas 101 11-20 

## 11. How to bin a numeric series to 10 groups of equal size?

Bin the series ser into 10 deciles (groups containing equal numbers of values) and replace each value with its bin name.

In [5]:
# Input
import pandas as pd 
import numpy as np 

ser = pd.Series(np.random.random(20))
ser

0     0.403074
1     0.848082
2     0.719311
3     0.342033
4     0.351218
5     0.473166
6     0.979855
7     0.949054
8     0.404358
9     0.973079
10    0.494721
11    0.453708
12    0.760552
13    0.099435
14    0.474192
15    0.778515
16    0.019128
17    0.194700
18    0.830901
19    0.020884
dtype: float64

In [6]:
# Solution
pd.qcut(ser, q=[0, .10, .20, .3, .4, .5, .6, .7, .8, .9, 1], 
        labels=['1st', '2nd', '3rd', '4th', '5th', '6th', '7th', '8th', '9th', '10th']).head()

0    4th
1    9th
2    7th
3    3rd
4    3rd
dtype: category
Categories (10, object): [1st < 2nd < 3rd < 4th ... 7th < 8th < 9th < 10th]
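Passing an integer `q` asks `qcut` to compute that many equally spaced quantiles, so `q=10` should reproduce the explicit decile list above. The sketch below checks this. It assumes a seeded generator (`np.random.default_rng(0)`) so the result is reproducible. The original cell uses unseeded data.

```python
import numpy as np
import pandas as pd

# Seeded data (an assumption for reproducibility; the original is unseeded)
rng = np.random.default_rng(0)
ser = pd.Series(rng.random(20))

labels = ['1st', '2nd', '3rd', '4th', '5th', '6th', '7th', '8th', '9th', '10th']

# Explicit quantile edges, as in the solution above
explicit = pd.qcut(ser, q=[0, .1, .2, .3, .4, .5, .6, .7, .8, .9, 1], labels=labels)

# Integer q: pandas computes 10 equally spaced quantiles (deciles) itself
by_count = pd.qcut(ser, q=10, labels=labels)

print(explicit.cat.codes.equals(by_count.cat.codes))  # True
print(by_count.value_counts().tolist())  # [2, 2, 2, 2, 2, 2, 2, 2, 2, 2]
```

With 20 values and 10 deciles, each bin holds exactly 2 values.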

### pandas.cut:
pandas.cut(x, bins, right=True, labels=None, retbins=False, precision=3, include_lowest=False)

cut chooses bin edges from the range of the values themselves and spaces them evenly, so every bin has the same width.

In [7]:
pd.cut(np.array([.2, 1.4, 2.5, 6.2, 9.7, 2.1]), 3, retbins=True)
# retbins: bool. Whether to also return the bin edges

([(0.19, 3.367], (0.19, 3.367], (0.19, 3.367], (3.367, 6.533], (6.533, 9.7], (0.19, 3.367]]
 Categories (3, interval[float64]): [(0.19, 3.367] < (3.367, 6.533] < (6.533, 9.7]],
 array([0.1905    , 3.36666667, 6.53333333, 9.7       ]))
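The edges returned by `retbins=True` let you check the equal-width claim directly. The sketch below reuses the same six values. One detail shows up in the output above (0.1905 rather than 0.2): pandas lowers the first edge by 0.1% of the range so the minimum falls inside the right-closed first bin. As a result, the first bin is very slightly wider than the others.

```python
import numpy as np
import pandas as pd

values = np.array([.2, 1.4, 2.5, 6.2, 9.7, 2.1])

# retbins=True also returns the computed bin edges
binned, edges = pd.cut(values, 3, retbins=True)

# Bin widths: all equal (max - min) / 3, except the first,
# which is widened by 0.1% of the range
widths = np.diff(edges)
print(widths)

# Equal widths do not mean equal counts: most values land in the first bin
print(np.bincount(binned.codes))  # [4 1 1]
```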

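### pandas.qcut:
pandas.qcut(x, q, labels=None, retbins=False, precision=3, duplicates='raise')

qcut chooses bin edges from the quantiles (the frequency distribution) of the values, so each bin contains the same number of values, as far as the data allow.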

In [8]:
pd.qcut(range(5), 3, labels=["good", "medium", "bad"])

[good, good, medium, bad, bad]
Categories (3, object): [good < medium < bad]
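To contrast the two functions, the sketch below bins the six values from the `pd.cut` example both ways. `cut` uses equal widths and produces uneven counts. `qcut` places edges at the 1/3 and 2/3 quantiles and produces even counts. The `range(5)` example above cannot split evenly into 3 bins, which is why it yields 2, 1 and 2 items.

```python
import numpy as np
import pandas as pd

values = np.array([.2, 1.4, 2.5, 6.2, 9.7, 2.1])

# Equal-width bins: uneven counts
by_width = pd.cut(values, 3)

# Equal-frequency bins: edges at the 1/3 and 2/3 quantiles
by_freq = pd.qcut(values, 3)

print(np.bincount(by_width.codes))  # [4 1 1]
print(np.bincount(by_freq.codes))   # [2 2 2]
```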

## 12. How to convert a numpy array to a dataframe of given shape? (L1)

Reshape the series ser into a dataframe with 7 rows and 5 columns

In [11]:
# input 
ser = pd.Series(np.random.randint(1, 10, 35))

ser

0     2
1     4
2     1
3     2
4     5
5     1
6     8
7     3
8     4
9     3
10    1
11    1
12    2
13    1
14    6
15    8
16    1
17    4
18    6
19    1
20    8
21    5
22    3
23    2
24    9
25    3
26    8
27    9
28    9
29    5
30    6
31    8
32    9
33    7
34    1
dtype: int32

In [15]:
# solution 
print(type(ser.values))
ser.values

<class 'numpy.ndarray'>


array([2, 4, 1, 2, 5, 1, 8, 3, 4, 3, 1, 1, 2, 1, 6, 8, 1, 4, 6, 1, 8, 5,
       3, 2, 9, 3, 8, 9, 9, 5, 6, 8, 9, 7, 1])

In [16]:
df = pd.DataFrame(ser.values.reshape((7, 5)))
df

   0  1  2  3  4
0  2  4  1  2  5
1  1  8  3  4  3
2  1  1  2  1  6
3  8  1  4  6  1
4  8  5  3  2  9
5  3  8  9  9  5
6  6  8  9  7  1
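`reshape` fills the new array row by row, so the first five values of `ser` become the first row. Passing `-1` for one dimension lets NumPy infer it from the total size. A shape whose size does not match the number of values raises `ValueError`. The sketch below assumes a seeded generator for reproducibility.

```python
import numpy as np
import pandas as pd

# Seeded data (an assumption; the original is unseeded)
rng = np.random.default_rng(0)
ser = pd.Series(rng.integers(1, 10, 35))

# -1 lets NumPy infer the row count: 35 values / 5 columns = 7 rows
df = pd.DataFrame(ser.to_numpy().reshape(-1, 5))
print(df.shape)  # (7, 5)

# 6 * 5 = 30 != 35, so this reshape fails
try:
    ser.to_numpy().reshape(6, 5)
except ValueError as err:
    print("ValueError:", err)
```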


## 13. How to find the positions of numbers that are multiples of 3 from a series?

Find the positions of numbers that are multiples of 3 from ser.

In [19]:
# Input
ser = pd.Series(np.random.randint(1, 10, 7))
ser

0    2
1    1
2    2
3    6
4    4
5    6
6    6
dtype: int32

In [20]:
# Solution
print(ser)
np.argwhere(ser % 3 == 0)  # argwhere returns the positions where the condition is True

0    2
1    1
2    2
3    6
4    4
5    6
6    6
dtype: int32


array([[3],
       [5],
       [6]], dtype=int64)
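`np.argwhere` returns a column array of shape `(n, 1)`. To get a flat array of positions, use `np.where` on the boolean mask instead. If you want index labels rather than positions, filter the series and read its index. With the default `RangeIndex`, the two are the same. The sketch below hard-codes the values shown above.

```python
import numpy as np
import pandas as pd

ser = pd.Series([2, 1, 2, 6, 4, 6, 6])  # values from the example above

mask = ser % 3 == 0

# Flat array of integer positions
positions = np.where(mask)[0]
print(positions.tolist())  # [3, 5, 6]

# Index labels of the matching items
labels = ser[mask].index.tolist()
print(labels)  # [3, 5, 6]
```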

## 14. How to extract items at given positions from a series?

From ser, extract the items at positions in list pos.

In [23]:
# Input
ser = pd.Series(list('abcdefghijklmnopqrstuvwxyz'))
pos = [0, 4, 8, 14, 20]

ser

0     a
1     b
2     c
3     d
4     e
5     f
6     g
7     h
8     i
9     j
10    k
11    l
12    m
13    n
14    o
15    p
16    q
17    r
18    s
19    t
20    u
21    v
22    w
23    x
24    y
25    z
dtype: object

In [22]:
# Solution
ser.take(pos)

0     a
4     e
8     i
14    o
20    u
dtype: object
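`take` selects by integer position, and so does `iloc`. The difference from label-based `loc` only shows when the index is not the default `0..n-1`. The sketch below uses an illustrative index starting at 100 to make that visible.

```python
import pandas as pd

ser = pd.Series(list('abcdefghijklmnopqrstuvwxyz'))
pos = [0, 4, 8, 14, 20]

# take() and iloc[] both select by integer position
by_take = ser.take(pos)
by_iloc = ser.iloc[pos]
print(by_iloc.tolist())  # ['a', 'e', 'i', 'o', 'u']

# Illustrative non-default index: positions still work, labels 0..20 do not exist
shifted = pd.Series(list('abcdefghijklmnopqrstuvwxyz'), index=range(100, 126))
print(shifted.iloc[pos].tolist())  # ['a', 'e', 'i', 'o', 'u']
```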

## 15. How to stack two series vertically and horizontally?

Stack ser1 and ser2 vertically and horizontally (to form a dataframe).

In [24]:
# Input
ser1 = pd.Series(range(5))
ser2 = pd.Series(list('abcde'))

In [25]:
# Output
# Vertical
pd.concat([ser1, ser2])  # Series.append was removed in pandas 2.0

0    0
1    1
2    2
3    3
4    4
0    a
1    b
2    c
3    d
4    e
dtype: object

In [26]:
# Horizontal
df = pd.concat([ser1, ser2], axis=1)
print(df)

   0  1
0  0  a
1  1  b
2  2  c
3  3  d
4  4  e
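Two `pd.concat` options tidy up these results. With `ignore_index=True`, the vertical stack is renumbered 0 to 9 instead of repeating 0 to 4. With `keys=`, the columns of the horizontal stack get names. The column names below are illustrative.

```python
import pandas as pd

ser1 = pd.Series(range(5))
ser2 = pd.Series(list('abcde'))

# Vertical: a fresh 0..9 index instead of two copies of 0..4
vertical = pd.concat([ser1, ser2], ignore_index=True)
print(vertical.index.tolist())  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# Horizontal: keys become the column names (illustrative names)
horizontal = pd.concat([ser1, ser2], axis=1, keys=['num', 'char'])
print(horizontal.columns.tolist())  # ['num', 'char']
```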


## 16. How to get the positions of items of series A in another series B?

Get the positions of items of ser2 in ser1 as a list.

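pd.Index(ser1) builds an index whose labels are the values of ser1. For each value i, get_loc(i) then returns the integer position at which i occurs in ser1. This works as shown only when the values of ser1 are unique.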

In [29]:
# Input
ser1 = pd.Series([10, 9, 6, 5, 3, 1, 12, 8, 13])
ser2 = pd.Series([1, 3, 10, 13])

In [30]:
# Solution
[pd.Index(ser1).get_loc(i) for i in ser2]

[5, 4, 0, 8]
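Two other ways to get the same positions are sketched below. `np.where(ser1 == i)` finds every match, and the sketch keeps the first one. `Index.get_indexer` looks up all values at once and returns -1 for values it cannot find. Like `get_loc`, `get_indexer` expects unique index values. On a non-unique index, `get_loc` may return a slice or boolean mask instead of an int.

```python
import numpy as np
import pandas as pd

ser1 = pd.Series([10, 9, 6, 5, 3, 1, 12, 8, 13])
ser2 = pd.Series([1, 3, 10, 13])

# For each value in ser2, take the first position where ser1 equals it
positions = [int(np.where(ser1 == i)[0][0]) for i in ser2]
print(positions)  # [5, 4, 0, 8]

# Vectorised lookup over the whole of ser2
print(pd.Index(ser1).get_indexer(ser2).tolist())  # [5, 4, 0, 8]
```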

## 17. How to compute the mean squared error on a truth and predicted series?

Compute the mean squared error of truth and pred series.

In [31]:
# Input
truth = pd.Series(range(10))
pred = pd.Series(range(10)) + np.random.random(10)

In [32]:
# Solution
np.mean((truth-pred)**2)

0.3245596574717825
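The mean squared error is the average of the squared differences between truth and prediction: MSE = (1/n) * sum((truth_i - pred_i)^2). Because the solution above uses random predictions, its output changes on every run. The sketch below uses small made-up values so the result can be checked by hand.

```python
import numpy as np
import pandas as pd

truth = pd.Series([1.0, 2.0, 3.0, 4.0])
pred = pd.Series([1.5, 2.0, 2.0, 4.5])  # made-up predictions for illustration

errors = truth - pred          # [-0.5, 0.0, 1.0, -0.5]
mse = (errors ** 2).mean()     # (0.25 + 0.0 + 1.0 + 0.25) / 4
print(mse)  # 0.375
```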

## 18. How to convert the first character of each element in a series to uppercase?

Convert the first character of each word in ser to uppercase.

In [33]:
# Input
ser = pd.Series(['how', 'to', 'kick', 'ass?'])

In [34]:
# Solution 1
ser.map(lambda x: x.title())

0     How
1      To
2    Kick
3    Ass?
dtype: object

In [35]:
# Solution 2
ser.map(lambda x: x[0].upper() + x[1:])

0     How
1      To
2    Kick
3    Ass?
dtype: object
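The `.str` accessor provides vectorised versions of these operations. The three approaches agree on the single lowercase words above, but they differ on multi-word or mixed-case strings. `title()` capitalises every word and lowercases the rest. `capitalize()` uppercases only the first character of the string and lowercases the rest. The manual `x[0].upper() + x[1:]` leaves everything after the first character unchanged. The `demo` strings are illustrative.

```python
import pandas as pd

ser = pd.Series(['how', 'to', 'kick', 'ass?'])
print(ser.str.title().tolist())       # ['How', 'To', 'Kick', 'Ass?']
print(ser.str.capitalize().tolist())  # ['How', 'To', 'Kick', 'Ass?']

# Illustrative inputs where the approaches differ
demo = pd.Series(['hello world', 'mIXED'])
print(demo.str.title().tolist())                       # ['Hello World', 'Mixed']
print(demo.str.capitalize().tolist())                  # ['Hello world', 'Mixed']
print(demo.map(lambda x: x[0].upper() + x[1:]).tolist())  # ['Hello world', 'MIXED']
```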

## 19. How to calculate the number of characters in each word in a series?

Calculate the number of characters in each word of ser.

In [37]:
# Input
ser = pd.Series(['how', 'to', 'kick', 'ass?'])


In [38]:
# Solution
ser.map(lambda x: len(x))

0    3
1    2
2    4
3    4
dtype: int64
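`ser.str.len()` is the vectorised counterpart of `map(len)`. One practical difference is how missing values are handled. `map(len)` would raise `TypeError` on `None`, whereas `.str.len()` returns NaN for that entry and makes the result float. The `with_missing` series is illustrative.

```python
import pandas as pd

ser = pd.Series(['how', 'to', 'kick', 'ass?'])
print(ser.str.len().tolist())  # [3, 2, 4, 4]

# Illustrative series with a missing value
with_missing = pd.Series(['how', None])
print(with_missing.str.len().tolist())  # [3.0, nan]
```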

## 20. How to compute the difference of differences between consecutive numbers of a series?

Compute the difference of differences between consecutive numbers of ser.

In [39]:
ser = pd.Series([1, 3, 6, 10, 15, 21, 27, 35])

In [41]:
# Solution
print(ser.diff().tolist())

[nan, 2.0, 3.0, 4.0, 5.0, 6.0, 6.0, 8.0]


In [42]:
print(ser.diff().diff().tolist())

[nan, nan, 1.0, 1.0, 1.0, 1.0, 0.0, 2.0]
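Chaining `diff()` twice gives the second-order difference. `np.diff` with `n=2` computes the same values without the two leading NaNs. Do not confuse this with `ser.diff(2)`, which is a lag-2 difference (`x[i] - x[i-2]`) rather than a difference of differences.

```python
import numpy as np
import pandas as pd

ser = pd.Series([1, 3, 6, 10, 15, 21, 27, 35])

# Second-order difference without leading NaNs
print(np.diff(ser.to_numpy(), n=2).tolist())  # [1, 1, 1, 1, 0, 2]

# Lag-2 difference: x[i] - x[i-2], not a second-order difference
print(ser.diff(2).tolist())  # [nan, nan, 5.0, 7.0, 9.0, 11.0, 12.0, 14.0]
```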
