# Appendix A - Python Refresher

|                | Static     | Interactive |
|----------------|------------|-------------|
|**Low-level**   | Matplotlib | Plotly      | 
|**High-level**  | Seaborn    | Plotly Express, Altair, Bokeh | 

This appendix provides a real-world example of data processing and can be used to test or refresh your Python knowledge.

The FDA publishes a medical device recalls dataset on a regular basis. The dataset has a field called "event_date_initiated".
This is the date when a medical device recall was initiated. We extracted this field and saved the data to the file event_date_initiated.csv for exploration and processing.

This example uses the following basic Python concepts:
- File input and output
    - Read data from a text file
    - Write data to a text file
- if/elif control flow
- for loops
- String operations 
    - split()
    - strip()
    - replace()
    - startswith()
    - indexing
- List operations
    - append()
    - join() (a string method that combines the elements of a list)
    - indexing
- List comprehension
- Sets
    - Members of a set must be unique
    - Convert a list to a set
- f-string substitution
- Built-in functions
    - print()
    - len()
    - type()
    - range(start, stop, step)
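
As a quick warm-up, the sketch below exercises a few of the concepts listed above on a single made-up line of text. The sample value and variable names are assumptions chosen for illustration and are not part of the dataset.

```python
# A made-up line of text standing in for one row of a CSV file.
line = " 2003-03-25\n"

clean = line.strip()           # remove surrounding whitespace, including the newline
parts = clean.split("-")       # ['2003', '03', '25']
rejoined = "-".join(parts)     # join() is called on the separator string and takes a list

print(f"year={parts[0]}, rejoined={rejoined}")  # f-string substitution
print(list(range(0, 10, 3)))   # range(start, stop, step) -> [0, 3, 6, 9]
```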

## 1. Basic Data Exploration

**Read the file and assign the lines to a list.**
- Option "r" for read
- Option "t" for text

In [42]:
with open("../data/event_date_initiated.csv", "rt") as f:
    date_list = f.readlines() 
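
The following is a minimal, self-contained sketch of the same read pattern. It assumes a small made-up file written to a temporary directory (the file name `sample_dates.csv` is hypothetical) so that it runs without the real dataset. It also shows that "rt" is the default mode of `open()`.

```python
import os
import tempfile

# Made-up sample content; the real file is much larger.
sample_text = "event_date_initiated\n2002-12-26\n2003-03-25\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample_dates.csv")  # hypothetical file name
    with open(sample_path, "wt") as f:
        f.write(sample_text)

    # "rt" (read, text) is the default mode, so both calls behave the same.
    with open(sample_path, "rt") as f:
        lines_rt = f.readlines()
    with open(sample_path) as f:
        lines_default = f.readlines()

print(lines_rt == lines_default)  # True
print(lines_rt)                   # each element keeps its trailing "\n"
```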

**Find out what type of object date_list is**

In [43]:
type(date_list)

list

**Find out the number of elements in the list**

In [44]:
len(date_list)

43750

**Display the first five elements**

In [45]:
date_list[:5]

['event_date_initiated\n',
 '2002-12-26\n',
 '2003-03-25\n',
 '2003-03-25\n',
 '2004-01-27\n']

**Ignore the first element**

In [46]:
date_list = date_list[1:]
date_list[:5]

['2002-12-26\n',
 '2003-03-25\n',
 '2003-03-25\n',
 '2004-01-27\n',
 '2003-12-10\n']

**Display the last five elements**

In [47]:
date_list[-5:]

['2020-09-09\n',
 '2020-07-06\n',
 '2020-01-17\n',
 '2020-05-28\n',
 '2020-08-20\n']

**Remove the newline "\n" using a list comprehension**

In [48]:
date_list = [x.strip("\n") for x in date_list]
date_list[:5]

['2002-12-26', '2003-03-25', '2003-03-25', '2004-01-27', '2003-12-10']
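
For readers new to list comprehensions, the sketch below shows the same transformation written first as an explicit for loop and then as a comprehension. The sample lines are made up for illustration.

```python
# Made-up sample lines, shaped like the output of readlines().
raw_lines = ["2002-12-26\n", "2003-03-25\n", "2004-01-27\n"]

# Explicit loop: build the cleaned list one element at a time.
cleaned_loop = []
for x in raw_lines:
    cleaned_loop.append(x.strip("\n"))

# List comprehension: the same transformation in one expression.
cleaned_comp = [x.strip("\n") for x in raw_lines]

print(cleaned_loop == cleaned_comp)  # True
print(cleaned_comp)
```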

## 2. Find out the time span (unique years)

**Extract the year from the date**

In [49]:
year_list = [x.split("-")[0] for x in date_list]
year_list[:5]

['2002', '2003', '2003', '2004', '2003']

In [50]:
len(year_list)

43749

**Get unique years using the set() function**

In [51]:
year_set = set(year_list)
len(year_set)

26

In [52]:
year_set

{'0010',
 '0012',
 '0013',
 '1997',
 '1998',
 '2000',
 '2001',
 '2002',
 '2003',
 '2004',
 '2005',
 '2006',
 '2007',
 '2008',
 '2009',
 '2010',
 '2011',
 '2012',
 '2013',
 '2014',
 '2015',
 '2016',
 '2017',
 '2018',
 '2019',
 '2020'}
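
A set does not guarantee any particular order, so it can help to sort the unique values before inspecting them. The sketch below uses a made-up sample of years. The 1900 cutoff is a hypothetical threshold chosen only to illustrate how implausible values might be flagged.

```python
# Made-up sample of year strings, including one implausible value.
years = ["2003", "2002", "0013", "2003", "2002", "1997"]

unique_years = set(years)        # duplicates collapse; order is not guaranteed
ordered = sorted(unique_years)   # sorted() returns a list in ascending order

# Hypothetical sanity check: flag years before 1900 as suspicious.
suspicious = [y for y in ordered if int(y) < 1900]

print(ordered)     # ['0013', '1997', '2002', '2003']
print(suspicious)  # ['0013']
```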

**Replace the years 0010, 0012, and 0013 with 2010, 2012, and 2013**

In [53]:
date_list2 = []

for init_date in date_list:
    if init_date.startswith("00"):
        date_list2.append(init_date.replace("00", "20", 1))
    else:
        date_list2.append(init_date)
        
date_list2[:10]

['2002-12-26',
 '2003-03-25',
 '2003-03-25',
 '2004-01-27',
 '2003-12-10',
 '2004-01-27',
 '2003-03-20',
 '2003-08-08',
 '2000-11-16',
 '2002-10-31']
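
As an alternative sketch, the same correction can be written with slicing instead of replace(). The helper name `fix_year_prefix` is hypothetical. The last lines illustrate why the startswith() guard matters: without it, replace() would also alter a valid year that merely contains "00".

```python
def fix_year_prefix(date_text):
    """Return date_text with a leading "00" replaced by "20" (hypothetical helper)."""
    if date_text.startswith("00"):
        return "20" + date_text[2:]  # keep everything after the first two characters
    return date_text

samples = ["0013-03-05", "2000-11-16", "0010-08-17"]  # sample values for illustration
fixed = [fix_year_prefix(d) for d in samples]
print(fixed)  # ['2013-03-05', '2000-11-16', '2010-08-17']

# Without the startswith() check, a valid date would be damaged.
unguarded = "2000-11-16".replace("00", "20", 1)
print(unguarded)  # 2200-11-16
```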

## 3. How do you know the changes were successful?

### Method one 
Find the new unique list of years

In [54]:
year_list2 = [x.split("-")[0] for x in date_list2]
set(year_list2)

{'1997',
 '1998',
 '2000',
 '2001',
 '2002',
 '2003',
 '2004',
 '2005',
 '2006',
 '2007',
 '2008',
 '2009',
 '2010',
 '2011',
 '2012',
 '2013',
 '2014',
 '2015',
 '2016',
 '2017',
 '2018',
 '2019',
 '2020'}
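
Instead of comparing the two sets by eye, set difference can report exactly which years disappeared. The sketch below uses a made-up subset of years. The names `before` and `after` are assumptions for illustration.

```python
# Made-up subsets of the unique years before and after the correction.
before = {"0010", "0012", "0013", "2010", "2012", "2013"}
after = {"2010", "2012", "2013"}

removed = before - after  # years present before the fix but not after
added = after - before    # years that appear only after the fix

print(sorted(removed))  # ['0010', '0012', '0013']
print(sorted(added))    # []

# A simple automated check: no remaining year starts with "00".
assert not any(y.startswith("00") for y in after)
```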

### Method two
Rewrite the previous block of code to display conversion results.

In [55]:
date_list2 = []

for i in range(len(date_list)):
    init_date = date_list[i]
    if init_date.startswith("00"):
        print(f"found and corrected problematic date {init_date} at position {i}")
        date_list2.append(init_date.replace("00", "20", 1))
    else:
        date_list2.append(init_date)

found and corrected problematic date 0012-12-06 at position 2290
found and corrected problematic date 0013-11-26 at position 2344
found and corrected problematic date 0012-11-30 at position 2432
found and corrected problematic date 0013-05-16 at position 5267
found and corrected problematic date 0013-03-05 at position 6045
found and corrected problematic date 0013-05-16 at position 6801
found and corrected problematic date 0012-12-13 at position 19636
found and corrected problematic date 0013-04-12 at position 19910
found and corrected problematic date 0013-03-05 at position 25453
found and corrected problematic date 0013-11-25 at position 27812
found and corrected problematic date 0013-12-13 at position 27855
found and corrected problematic date 0013-04-11 at position 29256
found and corrected problematic date 0013-03-05 at position 32382
found and corrected problematic date 0013-04-12 at position 33046
found and corrected problematic date 0010-08-17 at position 36031
...

In [56]:
print(f"The old date is {date_list[41681]}")

The old date is 0013-03-05


In [57]:
print(f"The new date is {date_list2[41681]}")

The new date is 2013-03-05
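
The index-based loop in this method can also be written with the built-in enumerate(), which yields the position and the element together. The sketch below is one possible variation on made-up sample data. It additionally records the changed positions (in the hypothetical list `changed_positions`) so they can be spot-checked afterwards.

```python
dates = ["2002-12-26", "0012-12-06", "2003-03-25", "0013-11-26"]  # made-up sample

fixed = []
changed_positions = []  # hypothetical helper list for later spot checks
for i, d in enumerate(dates):
    if d.startswith("00"):
        changed_positions.append(i)
        fixed.append(d.replace("00", "20", 1))
    else:
        fixed.append(d)

print(changed_positions)  # [1, 3]
for i in changed_positions:
    print(f"position {i}: {dates[i]} -> {fixed[i]}")
```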


### Method three
Compare the two lists and display the differences

In [58]:
for i in range(len(date_list)):
    if date_list[i] != date_list2[i]:
        print(date_list[i], "->", date_list2[i])

0012-12-06 -> 2012-12-06
0013-11-26 -> 2013-11-26
0012-11-30 -> 2012-11-30
0013-05-16 -> 2013-05-16
0013-03-05 -> 2013-03-05
0013-05-16 -> 2013-05-16
0012-12-13 -> 2012-12-13
0013-04-12 -> 2013-04-12
0013-03-05 -> 2013-03-05
0013-11-25 -> 2013-11-25
0013-12-13 -> 2013-12-13
0013-04-11 -> 2013-04-11
0013-03-05 -> 2013-03-05
0013-04-12 -> 2013-04-12
0010-08-17 -> 2010-08-17
0013-03-05 -> 2013-03-05
0013-03-05 -> 2013-03-05
0013-03-05 -> 2013-03-05
0013-03-05 -> 2013-03-05
0013-03-05 -> 2013-03-05
0013-03-05 -> 2013-03-05
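
The same comparison can be sketched with zip(), which pairs up elements from two lists of equal length. The sample lists `old` and `new` below are made up for illustration.

```python
old = ["2002-12-26", "0012-12-06", "0013-11-26"]  # made-up sample
new = ["2002-12-26", "2012-12-06", "2013-11-26"]

# zip() stops at the shorter list, so confirm the lengths match first.
assert len(old) == len(new)

diffs = [(a, b) for a, b in zip(old, new) if a != b]
for a, b in diffs:
    print(a, "->", b)
print(f"{len(diffs)} of {len(old)} dates changed")  # 2 of 3 dates changed
```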


## 4. Save the corrected data to a file

**Save the data to a text file**
- Option "w" for write
- Option "t" for text

In [59]:
with open("event_date_initiated_corrected.csv", "wt") as f:
    f.write("\n".join(date_list2))

**Read the saved file back in to make sure it works.**

In [60]:
with open("event_date_initiated_corrected.csv") as f:
    corrected_date_list = f.readlines() 
    
corrected_date_list[:5]

['2002-12-26\n',
 '2003-03-25\n',
 '2003-03-25\n',
 '2004-01-27\n',
 '2003-12-10\n']
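
Looking at the first five lines is a spot check. A fuller round-trip check compares everything read back with what was written. The sketch below assumes made-up sample data and a hypothetical file name in a temporary directory, so it does not touch any real file.

```python
import os
import tempfile

dates = ["2002-12-26", "2003-03-25", "2004-01-27"]  # made-up sample

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "dates_roundtrip.csv")  # hypothetical file name
    with open(path, "wt") as f:
        f.write("\n".join(dates))  # join() puts "\n" between items, not after the last

    with open(path, "rt") as f:
        lines_back = f.readlines()

round_trip = [x.strip("\n") for x in lines_back]

print(repr(lines_back[-1]))  # '2004-01-27' (no trailing newline on the last line)
print(round_trip == dates)   # True
```

Because the header element was dropped earlier, a file written this way contains only dates and no header row.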
