# Exploring Canadian naturalization records, 1915 to 1946

This notebook explores data harvested from the Library and Archives Canada database of [Naturalization Records, 1915-1946](http://www.bac-lac.gc.ca/eng/discover/immigration/citizenship-naturalization-records/naturalized-records-1915-1951/Pages/introduction.aspx).

To create this dataset, we first harvested records where `country` was listed as 'China'. However, the wives and children of a naturalised man were not assigned a country value, so that search alone misses them. We tried to compensate by adding records that appeared to belong to family members, but this part of the data may be inaccurate or incomplete.

The harvested data was saved as a [CSV file](lac-naturalisations-china-with-families.csv).

For full details of the harvesting process, see the [LAC section](https://glam-workbench.net/lac/) of the GLAM Workbench.

In [2]:
import pandas as pd
import altair as alt

In [7]:
df = pd.read_csv('lac-naturalisations-china-with-families.csv')
df.head()

Unnamed: 0,item_id,surname,given_names,country,relation,year,reference,page,pdf_id,pdf_url
0,2711,Fern,Charlie,China,,1922-1923,Canadian Gazette 1922-1923,364,P22-23_364,http://central.bac-lac.gc.ca/.item/?id=P22-23_...
1,3997,Hing,Mah Qong,China,,1922-1923,Canadian Gazette 1922-1923,389,P22-23_389,http://central.bac-lac.gc.ca/.item/?id=P22-23_...
2,4910,Ko,Jim Lee,China,,1922-1923,Canadian Gazette 1922-1923,406,P22-23_406,http://central.bac-lac.gc.ca/.item/?id=P22-23_...
3,5426,Lem,Frank Ho,China,,1922-1923,Canadian Gazette 1922-1923,416,P22-23_416,http://central.bac-lac.gc.ca/.item/?id=P22-23_...
4,5560,Ling,Chin Jeng,China,,1922-1923,Canadian Gazette 1922-1923,419,P22-23_419,http://central.bac-lac.gc.ca/.item/?id=P22-23_...


How many records are there?

In [6]:
df.shape[0]

626

How many records are relations (i.e. wives and children)?

In [5]:
df['relation'].value_counts()

Wife           108
Minor child     35
Name: relation, dtype: int64
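
By default `value_counts()` leaves out rows where `relation` is empty. Going by the totals above, 626 records minus 108 wives and 35 minor children leaves 483 rows with no relation, presumably the primary applicants. A minimal sketch of how to surface that group directly, using a small made-up frame (`sample` and its values are illustrative, not taken from the harvested CSV):

```python
import pandas as pd

# Hypothetical stand-in for the harvested data; the values are made up.
sample = pd.DataFrame({
    "surname": ["Lee", "Lee", "Lee", "Wong"],
    "relation": [None, "Wife", "Minor child", None],
})

# dropna=False keeps missing values as their own category
relation_counts = sample["relation"].value_counts(dropna=False)
print(relation_counts)

# Proportion of rows that are listed as relations
relation_share = sample["relation"].notna().mean()
print(f"{relation_share:.0%} of sample rows are relations")
```

Running the same two calls against `df` should show the empty `relation` rows alongside the wives and children.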

Some years are recorded as a range, so let's put the first year mentioned into a separate field for aggregation.

In [10]:
# Take the first four characters (the start year) and convert them to integers
df['year_int'] = df['year'].str.slice(0, 4).astype(int)
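
Slicing the first four characters assumes every value begins with a four-digit year. As a hedged alternative, a regular expression can pull out the first four-digit run wherever it sits in the string. The `years` series below is a hypothetical sample, not the real column:

```python
import pandas as pd

# Hypothetical year strings: ranges and single years
years = pd.Series(["1922-1923", "1925", "1914-1915"])

# Capture the first run of four digits, then convert to integers
first_year = years.str.extract(r"(\d{4})", expand=False).astype(int)
print(first_year.tolist())
```

If any value contained no four-digit year, `extract` would return a missing value and the `astype(int)` step would raise an error, which is a useful check in itself.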

Let's look at the number of records per year.

In [12]:
df['year_int'].value_counts()

1925    99
1924    74
1927    55
1914    40
1926    39
1928    39
1930    30
1921    29
1931    29
1946    28
1929    24
1920    24
1923    23
1944    19
1922    15
1936     9
1938     8
1935     8
1945     8
1939     6
1942     6
1941     5
1937     3
1932     2
1940     1
1933     1
1943     1
1934     1
Name: year_int, dtype: int64
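
`value_counts()` orders the result by frequency, which makes the busiest years easy to spot but hides the chronology. A small sketch of reordering the counts by year, using made-up values in `sample_years` rather than the real distribution:

```python
import pandas as pd

# Hypothetical first-year values; not the real distribution
sample_years = pd.Series([1925, 1924, 1925, 1946, 1924, 1925])

# value_counts sorts by frequency; sort_index reorders the result by year
by_year = sample_years.value_counts().sort_index()
print(by_year)
```

The same `.sort_index()` call can be appended to `df['year_int'].value_counts()`.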

Let's include the `relation` field as well, so we can highlight wives and children.

In [17]:
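# Assign the result back; fillna(inplace=True) on a selected column may not update df in recent pandas versions
df['relation'] = df['relation'].fillna('Not recorded')
# Count the records for each year/relation pair
year_counts = df.value_counts(['year_int', 'relation']).to_frame().reset_index()
year_counts.columns = ['year', 'relation', 'count']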

In [23]:
alt.Chart(year_counts).mark_bar(size=15).encode(
    x=alt.X('year:Q', axis=alt.Axis(format='c')),
    y=alt.Y('count:Q', stack=True),
    color='relation:N',
    tooltip=['year', 'relation', 'count']
).properties(width=700)
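
To read exact figures alongside the chart, the long table of counts can be pivoted into one row per year and one column per relation. This is only a sketch: `sample_counts` is a hypothetical frame with the same three columns as `year_counts`, and its numbers are invented.

```python
import pandas as pd

# Hypothetical counts in the same shape as year_counts; the numbers are made up
sample_counts = pd.DataFrame({
    "year": [1924, 1924, 1925, 1925, 1925],
    "relation": ["Not recorded", "Wife", "Not recorded", "Wife", "Minor child"],
    "count": [10, 2, 12, 3, 1],
})

# One row per year, one column per relation; missing combinations become 0
wide = (
    sample_counts
    .pivot(index="year", columns="relation", values="count")
    .fillna(0)
    .astype(int)
)
print(wide)
```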