
# Female employment rates

## 1971 to 2023 UK

Link to data source: https://www.ons.gov.uk/employmentandlabourmarket/peopleinwork/employmentandemployeetypes/timeseries/lf25/lms

The data has been retrieved and used to create a list of dictionary records.  Each record contains a Time_period and an Employment value, which is the female employment rate (seasonally adjusted, %) for the UK.

Some questions to ask of the data: how much has the percentage of the female population in the workplace changed since 1971? Is the rate of change different in different parts of the year?

Run the code cell below to generate, and print, a list of dictionaries, so that you can see the data.

In [None]:
import pandas as pd

def get_data():
  url = "https://drive.google.com/uc?id=1aQ8Xy-Sw046j-JCiIOu3ETqrACFsYuCj"
  df = pd.read_csv(url, skiprows=8, header=None)
  time_periods = df[0].tolist()
  employment = df[1].tolist()
  datalist = []
  for i in range(len(df)):
    datalist.append({"Time_period":time_periods[i], "Employment":employment[i]})
  return datalist

employment_data = get_data()
for item in employment_data:
  print(item)



The time periods come in several types: whole years, quarters and months.

Look through the list and then write code to create a set of smaller lists that make some sense.  Here are some ideas:

* a list of all the employment figures for whole years
* a list of all the employment figures for Q1 for all given years (and another for Q2, etc)
* a list of all the employment figures for January for all given years (and another for February, etc)  
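One way to approach these ideas with plain Python lists is sketched below. The `sample_data` records and the `figures_matching` helper are hypothetical, and the values are made up for illustration rather than taken from the ONS data. The sketch also assumes the labels look like "1971", "1971 Q1" and "1992 JAN".

```python
import re

# Illustrative records only: these values are made up, not real ONS figures.
sample_data = [
    {"Time_period": "1971", "Employment": 52.0},
    {"Time_period": "1971 Q1", "Employment": 51.0},
    {"Time_period": "1971 Q2", "Employment": 53.0},
    {"Time_period": "1992 JAN", "Employment": 61.0},
    {"Time_period": "1992 FEB", "Employment": 62.0},
]

def figures_matching(records, pattern):
    """Return the Employment values whose Time_period fully matches the regex pattern."""
    return [
        record["Employment"]
        for record in records
        if re.fullmatch(pattern, str(record["Time_period"]))
    ]

# Assumed label formats: "YYYY", "YYYY Qn" and "YYYY MMM".
whole_years = figures_matching(sample_data, r"\d{4}")
q1_figures = figures_matching(sample_data, r"\d{4} Q1")
january_figures = figures_matching(sample_data, r"\d{4} JAN")

print("Whole years:", whole_years)
print("Q1:", q1_figures)
print("January:", january_figures)
```

To try it on the real records, pass `employment_data` in place of `sample_data`, and change the pattern to pick another quarter or month.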


In [None]:
df = pd.DataFrame(employment_data)
#a list of all the employment figures for whole years
#str.match(r'^\d{4}$') keeps only values made up of exactly 4 digits, which excludes months and quarters
whole_year_figures = df[df["Time_period"].str.match(r'^\d{4}$')]
for index, row in whole_year_figures.iterrows():
  print(f'Time_period: {row["Time_period"]}, Employment: {row["Employment"]}')

In [None]:
#a list of all the employment figures for Q1 for all given years (to refer to another quarter, change "Q1" to "Q2" and so on)
q1_employment_figures = df[df["Time_period"].str.contains("Q1")]
for index, row in q1_employment_figures.iterrows():
  print(f'Time_period: {row["Time_period"]}, Employment: {row["Employment"]}')

In [None]:
#a list of all the employment figures for January for all given years (to refer to another month, change "JAN" to "FEB" and so on)
jan_employment_figures = df[df["Time_period"].str.contains("JAN")]
for index, row in jan_employment_figures.iterrows():
  print(f'Time_period: {row["Time_period"]}, Employment: {row["Employment"]}')

Calculate some statistics from your lists (e.g. mean, max, min, etc for all years, for the same quarter, or month, over all years, etc) and think about correlation between quarter and employment rate - can you show this?
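As a rough sketch of the kind of summary meant here, the block below computes the mean, minimum and maximum for each quarter, then the spread between the highest and lowest quarterly mean. The `figures_by_quarter` dictionary and the `summarise` helper are hypothetical, and the numbers are made up for illustration rather than taken from the ONS data.

```python
from statistics import mean

# Illustrative values only, not real ONS figures.
figures_by_quarter = {
    "Q1": [50.0, 60.0, 70.0],
    "Q2": [51.0, 61.0, 71.0],
    "Q3": [52.0, 62.0, 72.0],
    "Q4": [51.5, 61.5, 71.5],
}

def summarise(values):
    """Return the mean, minimum and maximum of a list of figures."""
    return {"mean": mean(values), "min": min(values), "max": max(values)}

summary = {quarter: summarise(values) for quarter, values in figures_by_quarter.items()}
for quarter, stats in summary.items():
    print(f"{quarter}: mean={stats['mean']:.1f}, min={stats['min']:.1f}, max={stats['max']:.1f}")

# A small spread between the quarterly means suggests the quarter has little effect.
quarter_means = [stats["mean"] for stats in summary.values()]
spread = max(quarter_means) - min(quarter_means)
print(f"Spread between quarterly means: {spread:.1f} percentage points")
```

Comparing the spread between quarters with the range within each quarter gives a first impression of whether the quarter matters much next to the long-term change over the years.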

In [None]:
#yearly statistics:
df["Year"] = df["Time_period"].str.extract(r'(\d{4})')
df["Quarter"] = df["Time_period"].str.extract(r'Q(\d)')
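#three capital letters (e.g. "JAN"); \w{3} would match the first three digits of the year instead
df["Month"] = df["Time_period"].str.extract(r'([A-Z]{3})')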

group_by_year = df.groupby("Year")
yearly_average = group_by_year["Employment"].mean()
print("The average employment rate for each year is: ")
print(yearly_average)

In [24]:
#yearly_average is a Series indexed by Year, so idxmax() returns the year with the highest average
year_with_highest_avg = yearly_average.idxmax()
print(year_with_highest_avg, yearly_average.max())

In [None]:
#quarterly statistics:
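#filter for Q1 directly, because q1_employment_figures may have been reassigned by the January cell above
quarterly_mean_in_q1 = df[df["Time_period"].str.contains("Q1")]["Employment"].mean()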
quarterly_mean_in_q2 = df[df["Time_period"].str.contains("Q2")]["Employment"].mean()
quarterly_mean_in_q3 = df[df["Time_period"].str.contains("Q3")]["Employment"].mean()
quarterly_mean_in_q4 = df[df["Time_period"].str.contains("Q4")]["Employment"].mean()
print("The mean employment rates for each quarter, across all years, are: ")
print("Q1:", quarterly_mean_in_q1)
print("Q2:", quarterly_mean_in_q2)
print("Q3:", quarterly_mean_in_q3)
print("Q4:", quarterly_mean_in_q4)
print("This shows that the quarter (Q1, Q2, etc.) makes little difference, as the employment rate averages out across the quarters")

Think about how you might visualise something like the average for each quarter over the years available.  Maybe you can print a line of stars for each quarter, with one star for each 1%?  e.g.

```
Q1 ***************************************************
Q2 **********************************************************
Q3 *****************************************************************
Q4 ***************************************************
```
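A minimal sketch of such a star chart follows. The `star_chart` helper and the `quarter_averages` values are hypothetical, and the numbers are deliberately small and made up so the output is easy to read. With real averages you would pass in the quarterly means calculated earlier.

```python
def star_chart(averages):
    """Build one line per label, with one star for each whole 1% (rounded)."""
    lines = []
    for label, value in averages.items():
        stars = "*" * round(value)
        lines.append(f"{label} {stars}")
    return lines

# Illustrative values only, not real ONS figures.
quarter_averages = {"Q1": 3.2, "Q2": 4.6, "Q3": 5.1, "Q4": 3.9}

for line in star_chart(quarter_averages):
    print(line)
```

If the real quarterly averages turn out to be very close together, the lines will look almost identical. One option is to subtract a common baseline from each average before drawing, so that the differences stand out.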
