<a href="https://colab.research.google.com/github/vanderbilt-data-science/p4ai-essentials/blob/main/3_iteration_and_tidbits_solns.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Iteration and Final Tidbits
> Rounding out our Python knowledge and tidbits for success with HuggingFace

Welcome to the last day of our Python for AI crash course! In today's lesson, we'll cover conditional execution, iteration, and some last tidbits of knowledge to bolster your success with HuggingFace. We'll continue learning the syntax and grammar of the Python language to effectively communicate our goals to Python.

In this lesson, you'll learn:
* Communicating conditional execution syntax to Python
* Different ways of communicating iteration to Python
* Practice with common tools of iteration with HuggingFace
* Tips and tricks common in the HuggingFace documentation and coursework

Let's get started!


# Conditional Execution
Let's start by revisiting our oven example. Recall that our function was:

```
# A function to cook in oven
def cook_in_oven(food_to_cook, time=25, temperature=375, rack=2):
  '''
  Function cook_in_oven: cooks input food based on oven parameters
    food_to_cook: String indicating the food to be cooked
    time (default 25): Integer of the time required to cook the food in minutes
    temperature (default 375): Integer temperature at which food should be cooked in degrees Fahrenheit
    rack: Integer of rack that the food should be cooked at

    returns: string of food, if cooked, returns '- Cooked!' appended to the string
  '''

  #cook food, an example
  if time>20 and temperature>215:
    food_to_cook = food_to_cook + ' - Cooked!'
  
  return food_to_cook
```
This already has an example of conditional execution, in which a variable is only modified if conditions are met. Let's explore this in greater depth.

In [None]:
#@title Sweeping Generalizations about Recipes
#@markdown Let us for a moment suspend disbelief and assume that
#@markdown one can always cook certain foods at certain temperatures
#@markdown and times and we would like to encode this programmatically.
#@markdown Let's think about what these can be:

#@markdown **Condition A** -  food is a cake:
time = 0 #@param {type:"integer"}
temp = 0 #@param {type:"integer"}

#@markdown **Condition B** - the food is fish:
time = 0 #@param {type:"integer"}
temp = 0 #@param {type:"integer"}

#@markdown **Condition C** - the food is casserole:
time = 0 #@param {type:"integer"}
temp = 0 #@param {type:"integer"}

#@markdown If the food is anything else:
time = 0 #@param {type:"integer"}
temp = 0 #@param {type:"integer"}


## Writing conditional execution code
What we've written above is essentially the form of one or more `if`, `if-else`, `if-elif-else` statements. The syntax for communicating conditional execution looks like so:

### `if` statements
```
if condition:
  #code block for if condition true
```

### `if-else` statements
`if-else` statements allow binary, mutually exclusive decisions:

```
if condition:
  #code block for if condition true
else:
  #code block for if condition false
```

### `if-elif-else` statements

`if-elif-else` statements allow multiple, mutually exclusive decisions:
```
if condition A:
  #code block for if condition A true
elif condition B:
  #code block for if condition B true
elif condition C:
  #code block for if condition C true
else:
  #code block for none of the above conditions are true
```
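
As a minimal runnable sketch of the `if-elif-else` form above, the snippet below sorts a temperature into a category. The function name `describe_temperature` and the thresholds are made up for illustration and are not part of the oven example.

```python
# Hypothetical helper: classify an oven temperature (degrees Fahrenheit)
def describe_temperature(temperature):
  if temperature < 250:
    description = 'low'
  elif temperature < 400:
    description = 'moderate'
  else:
    description = 'hot'
  return description

print(describe_temperature(200))  # low
print(describe_temperature(375))  # moderate
print(describe_temperature(450))  # hot
```

Only one of the three branches runs for any given input, which is what makes the decisions mutually exclusive.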

Let's see what this looks like for our code.

## Guided Exercise 1 - `if`
Starting with the function that we had, let's consider ONLY Condition A, as if we only knew the settings for cake. Let's amend our code to set the cook time and temperature for cake foods only.

In [6]:
# Amend code to reflect only Condition A
def cook_in_oven_if(food_to_cook, time=20, temperature=375, rack=2):
  '''
  Function cook_in_oven: cooks input food based on oven parameters
    food_to_cook: String indicating the food to be cooked
    time (default 20): Integer of the time required to cook the food in minutes
    temperature (default 375): Integer temperature at which food should be cooked in degrees Fahrenheit
    rack: Integer of rack that the food should be cooked at

    returns: string of food, if cooked, returns '- Cooked!' appended to the string
  '''
  #New code for cake
  if 'cake' in food_to_cook:
    time = 30
    temperature = 350
    print('cake cooked at', time, 'at temperature', temperature, 'for', food_to_cook)

  #cook food, an example
  if time>20 and temperature>215:
    food_to_cook = food_to_cook + ' - Cooked!'

  return food_to_cook

In [12]:
#Verify behavior
demo_food_a = 'red velvet cake'
print(cook_in_oven_if(demo_food_a))

demo_food_b = 'lasagna'
print(cook_in_oven_if(demo_food_b))

cake cooked at 30 at temperature 350 for red velvet cake
red velvet cake - Cooked!
lasagna


## Guided Exercise 2 - `if-else`
Let's say we're unsatisfied with the code we started with and would prefer for uncooked food to have `' - Not Cooked!'` appended to it. Let's change the code above to make this happen.

In [14]:
# Amend Guided Exercise 1 code to reflect uncooked
def cook_in_oven_if_else(food_to_cook, time=20, temperature=375, rack=2):
  '''
  Function cook_in_oven: cooks input food based on oven parameters
    food_to_cook: String indicating the food to be cooked
    time (default 20): Integer of the time required to cook the food in minutes
    temperature (default 375): Integer temperature at which food should be cooked in degrees Fahrenheit
    rack: Integer of rack that the food should be cooked at

    returns: string of food, if cooked, returns ' - Cooked!' appended to the string, otherwise appends ' - Not Cooked!'
  '''
  if 'cake' in food_to_cook:
    time = 30
    temperature = 350
    print('cake cooked at', time, 'at temperature', temperature, 'for', food_to_cook)

  #cook food, an example
  if time>20 and temperature>215:
    food_to_cook = food_to_cook + ' - Cooked!'
  else:
    food_to_cook = food_to_cook + ' - Not Cooked!'

  return food_to_cook

In [15]:
#Verify behavior
demo_food_a = 'red velvet cake'
print(cook_in_oven_if_else(demo_food_a))

demo_food_b = 'lasagna'
print(cook_in_oven_if_else(demo_food_b))

cake cooked at 30 at temperature 350 for red velvet cake
red velvet cake - Cooked!
lasagna - Not Cooked!


## Guided Exercise 3 - `if-elif-else`
Now, let's encode our original form for cakes, fish, and casseroles.

In [23]:
# Amend Guided Exercise 2 code to reflect all foods
def cook_in_oven_if_elif_else(food_to_cook, time=20, temperature=375, rack=2):
  '''
  Function cook_in_oven: cooks input food based on oven parameters
    food_to_cook: String indicating the food to be cooked
    time (default 20): Integer of the time required to cook the food in minutes
    temperature (default 375): Integer temperature at which food should be cooked in degrees Fahrenheit
    rack: Integer of rack that the food should be cooked at

    returns: string of food, if cooked, returns ' - Cooked!' appended to the string, otherwise appends ' - Not Cooked!'
  '''
  if 'cake' in food_to_cook:
    time = 30
    temperature = 350
    print('cake cooked at', time, 'at temperature', temperature, 'for', food_to_cook)
  elif 'fish' in food_to_cook:
    time = 25
    temperature = 375
    print('fish cooked at', time, 'at temperature', temperature, 'for', food_to_cook)
  elif 'casserole' in food_to_cook:
    time = 60
    temperature = 350
    print('casserole cooked at', time, 'at temperature', temperature, 'for', food_to_cook)
  else:
    time = 20
    temperature = 325

  #cook food, an example
  if time>20 and temperature>215:
    food_to_cook = food_to_cook + ' - Cooked!'
  else:
    food_to_cook = food_to_cook + ' - Not Cooked!'

  return food_to_cook

In [24]:
#Verify behavior with demo strings
demo_food_a = 'red velvet cake'
demo_food_b = 'swordfish'
demo_food_c = 'green bean casserole'
demo_food_d = 'fish casserole'
demo_food_e = 'lasagna'

print(cook_in_oven_if_elif_else(demo_food_a))
print(cook_in_oven_if_elif_else(demo_food_b))
print(cook_in_oven_if_elif_else(demo_food_c))
print(cook_in_oven_if_elif_else(demo_food_d))
print(cook_in_oven_if_elif_else(demo_food_e))

cake cooked at 30 at temperature 350 for red velvet cake
red velvet cake - Cooked!
fish cooked at 25 at temperature 375 for swordfish
swordfish - Cooked!
casserole cooked at 60 at temperature 350 for green bean casserole
green bean casserole - Cooked!
fish cooked at 25 at temperature 375 for fish casserole
fish casserole - Cooked!
lasagna - Not Cooked!


Consider the above behavior for `fish casserole`. It executed the line of code for `fish`, but DID NOT execute the line of code for `casserole`. Why is this?

# Iteration
We've already seen some examples of iteration, where we need to cycle through a collection data structure to apply statements to one or more of the elements of the collection (e.g. dictionaries, lists).

There are 2 primary ways that you see iteration in Python:
* `for` loops
* `comprehensions` (list, dictionary, generator)

A final type that we will see _at length_ with HuggingFace is:
* `map`

Let's explore this.

## `for` loops
We've already seen some examples of `for` loops when we were learning about lists and dictionaries.

We said that a `for` loop repeats its code block once per element, moving on to the next element of the collection on each pass.

Our syntax was as follows:
```
for dummy_name in collection:
  ## indented code block steps to take
```
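
Here is a small runnable instance of that syntax, assuming a made-up list of temperatures. The loop variable `temperature` plays the role of `dummy_name`, and the indented block runs once per element.

```python
# Illustrative values; not taken from the oven example
temperatures = [350, 375, 325]

total = 0
for temperature in temperatures:
  # This indented block runs once for each element of the list
  total = total + temperature

print(total / len(temperatures))  # 350.0
```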

Let's explore this.

### Guided Exercise 1 - Avoiding copy/paste
Let's build on the exercise that we just did with our cakes. Even in trying to run the demo with the cakes, we found that we had to copy/paste individual elements over and over. Let's see how we can combine lists with iteration to help us.

In [25]:
#create the list of elements
demo_foods = ['red velvet cake',
              'swordfish',
              'green bean casserole',
              'fish casserole',
              'lasagna']

In [26]:
#write for loop to iterate through elements and do the task
for demo_food in demo_foods:
  print(cook_in_oven_if_elif_else(demo_food))

cake cooked at 30 at temperature 350 for red velvet cake
red velvet cake - Cooked!
fish cooked at 25 at temperature 375 for swordfish
swordfish - Cooked!
casserole cooked at 60 at temperature 350 for green bean casserole
green bean casserole - Cooked!
fish cooked at 25 at temperature 375 for fish casserole
fish casserole - Cooked!
lasagna - Not Cooked!


### Guided Practical Exercise - Updating Elements
In yesterday's class, we saw an output data structure where we didn't like the labels that were in the data. We wanted to replace elements of the form `LABEL_0` with their true name, e.g., `negative`. Let's explore this.

In [36]:
#output from classifier
output = [{'label': 'LABEL_2', 'score': 0.9303200244903564},
          {'label': 'LABEL_0', 'score': 0.575447142124176},
          {'label': 'LABEL_1', 'score': 0.8416591286659241},
          {'label': 'LABEL_0', 'score': 0.9006277322769165}]

In [37]:
#build basic intuition with iteration
for cls_dict in output:

  #get the label for the element
  cls_label = cls_dict['label']

  #conditional execution
  if '0' in cls_label:
    cls_dict['label'] = 'negative'
  elif '1' in cls_label:
    cls_dict['label'] = 'neutral'
  elif '2' in cls_label:
    cls_dict['label'] = 'positive'
  else:
    raise ValueError(' '.join(['The class label', cls_label, 'is invalid']))

output

[{'label': 'positive', 'score': 0.9303200244903564},
 {'label': 'negative', 'score': 0.575447142124176},
 {'label': 'neutral', 'score': 0.8416591286659241},
 {'label': 'negative', 'score': 0.9006277322769165}]

This is one way in which we could do this. However, in HuggingFace you'll more often see this done with `id2label`-style conversions that use dictionaries. Let's explore this.

In [38]:
# Establish lists
label_list = ['LABEL_0', 'LABEL_1', 'LABEL_2']
label_semantics = ['negative', 'neutral', 'positive']

# Create lookup dictionary
label2semantics = dict(zip(label_list, label_semantics))
label2semantics

{'LABEL_0': 'negative', 'LABEL_1': 'neutral', 'LABEL_2': 'positive'}
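
To see what `dict(zip(...))` is doing, it can help to look at the intermediate pairs. The sketch below reuses the two lists from the cell above. The `.get` call with a fallback value is an optional extra shown for illustration, and `'LABEL_9'` is a made-up key.

```python
label_list = ['LABEL_0', 'LABEL_1', 'LABEL_2']
label_semantics = ['negative', 'neutral', 'positive']

# zip pairs up elements position by position
pairs = list(zip(label_list, label_semantics))
print(pairs)
# [('LABEL_0', 'negative'), ('LABEL_1', 'neutral'), ('LABEL_2', 'positive')]

# dict turns each (key, value) pair into a dictionary entry
label2semantics = dict(pairs)

# .get returns a fallback instead of raising a KeyError for unknown keys
print(label2semantics.get('LABEL_9', 'unknown'))  # unknown
```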

Let's rewrite our for loop using this...

In [43]:
#output from classifier
output = [{'label': 'LABEL_2', 'score': 0.9303200244903564},
          {'label': 'LABEL_0', 'score': 0.575447142124176},
          {'label': 'LABEL_1', 'score': 0.8416591286659241},
          {'label': 'LABEL_0', 'score': 0.9006277322769165}]

Recall: When we look up a value using a key in a dictionary, the value is returned. So we can chain together this returned value with an operation.

What? Let's look.

In [44]:
# How do we get the value of a label from the dictionary element
output[0]['label']

'LABEL_2'

In [45]:
# How do we look up a value in a dictionary?
label2semantics['LABEL_2']

'positive'

In [46]:
# What does this look like chained together?
label2semantics[output[0]['label']]

'positive'

In [40]:
#update our for loop
for cls_dict in output:
  cls_dict['label'] = label2semantics[cls_dict['label']]

output

[{'label': 'positive', 'score': 0.9303200244903564},
 {'label': 'negative', 'score': 0.575447142124176},
 {'label': 'neutral', 'score': 0.8416591286659241},
 {'label': 'negative', 'score': 0.9006277322769165}]

### Try it Yourself!

In [None]:
#@markdown Recall yesterday's example where we wrote
#@markdown a function because we had a list of class IDs
#@markdown which we wanted to subtract 1 from.

#@markdown Yesterday, we wrote the function.

#@markdown In this exercise, you will apply that function to a list.

#@markdown The function that we wrote is shown below and
#@markdown placed in a code cell below this cell for execution.
#@markdown ```
#@markdown def adjust_label_values(label_in):
#@markdown 
#@markdown   #adjust value
#@markdown   label_in = label_in - 1
#@markdown 
#@markdown   #return the value
#@markdown   return label_in
#@markdown ```
#@markdown If you get stuck, use the `Show Code` button below
#@markdown to see the answer.

#@markdown Here is some guidance for how to think through how to solve this:
#@markdown 1. How do we access elements of a list?
#@markdown 2. How do we get the index and the value of an element when
#@markdown performing iteration? Recall the following code from Day 1 of our learning journey:
#@markdown ```
#@markdown for position, value in enumerate(floor0):
#@markdown 
#@markdown   # Room position 2 becomes a janitor closet
#@markdown   if position==2:
#@markdown     floor0[position] = 'janitor closet'
#@markdown   print(position, floor0[position])
#@markdown ```
#@markdown 3. What do we need to update the content of a single list element with?
#@markdown 4. Once we are able to access the index/position in a list,
#@markdown how do we update its contents?
#@markdown 5. How would we write the for loop to execute this task?

#1
# ids_list[0], where 0 is the position we're looking to see the contents of or change

#2
# We can use the enumerate function, which will provide both the index and value at
# that position in the list

#3
# We need to update the contents with the contents -1
# by calling our function

#4
# ids_list[pos] = adjust_label_values(single_id)

# Function
def adjust_label_values(label_in):

  #adjust value
  label_in = label_in - 1

  #return the value
  return label_in

# Demo list
ids_list = [2, 6, 3, 2, 1, 8, 2, 3, 2, 1]

# Code
for pos, single_id in enumerate(ids_list):
  ids_list[pos] = adjust_label_values(single_id)


In [47]:
# Function
def adjust_label_values(label_in):

  #adjust value
  label_in = label_in - 1

  #return the value
  return label_in

In [50]:
# Demo list
ids_list = [2, 6, 3, 2, 1, 8, 2, 3, 2, 1]

In [52]:
# Create your for loop here
for pos, single_id in enumerate(ids_list):
  ids_list[pos] = adjust_label_values(single_id)
ids_list

[1, 5, 2, 1, 0, 7, 1, 2, 1, 0]

In [56]:
#@title An alternate solution - Copying
#@markdown It can sometimes be super annoying to directly
#@markdown update the list while developing your code. If
#@markdown your list is short enough (or you can subset your list
#@markdown during development), you can either create a copy
#@markdown first or exercise other strategies to not modify
#@markdown the original list.

#@markdown Let's see what this looks like.

# Demo list
ids_list = [2, 6, 3, 2, 1, 8, 2, 3, 2, 1]

# Method 1: Create a copy
ids_copy = ids_list.copy()

for pos, single_id in enumerate(ids_copy):
  ids_copy[pos] = adjust_label_values(single_id)

print('Method 1: Copy')
print('Original list:\n', ids_list)
print('Copy operated upon:\n', ids_copy)

# Method 2: Populate from empty list
ids_blank = list()
for pos, single_id in enumerate(ids_list):
  ids_blank.append(adjust_label_values(single_id))

print('\nMethod 2: Populate from empty list')
print('Original list:\n', ids_list)
print('Grown list :\n', ids_blank)

# Method 3: Populate from zeros list at size
ids_zeros = [0] * len(ids_list)
for pos, single_id in enumerate(ids_list):
  ids_zeros[pos] = adjust_label_values(single_id)

print('\nMethod 3: Replace elements of Zero List at Size')
print('Original list:\n', ids_list)
print('Replaced Zeros list :\n', ids_zeros)

Method 1: Copy
Original list:
 [2, 6, 3, 2, 1, 8, 2, 3, 2, 1]
Copy operated upon:
 [1, 5, 2, 1, 0, 7, 1, 2, 1, 0]

Method 2: Populate from empty list
Original list:
 [2, 6, 3, 2, 1, 8, 2, 3, 2, 1]
Grown list :
 [1, 5, 2, 1, 0, 7, 1, 2, 1, 0]

Method 3: Replace elements of Zero List at Size
Original list:
 [2, 6, 3, 2, 1, 8, 2, 3, 2, 1]
Replaced Zeros list :
 [1, 5, 2, 1, 0, 7, 1, 2, 1, 0]


## List Comprehensions
Another very compact way to represent for loops is through list comprehensions. They're great if you:
* Essentially have one function to apply to a list of elements
* Want to do binary conditional execution on elements of a list
* Want to perform filtering of elements (reduce the size of the list based on some condition)

The difficulty with list and dictionary comprehensions is the syntax, because the for loop is expressed so concisely. In return, they offer wonderful functionality: they streamline the creation of new lists based on old lists. Let's look at a brief comparison.

<center>
<img src="https://github.com/vanderbilt-data-science/p4ai-essentials/blob/main/img/iteration_comparison.png?raw=true" width="800">
</center>
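
The three uses listed above (applying one expression, binary conditional execution, and filtering) each have their own shape. A minimal sketch, assuming a made-up list of cook times:

```python
# Illustrative cook times in minutes
cook_times = [30, 25, 60, 20]

# 1. Apply one expression to every element
doubled = [t * 2 for t in cook_times]

# 2. Binary conditional: choose between two values for each element
status = ['Cooked' if t > 20 else 'Not Cooked' for t in cook_times]

# 3. Filtering: keep only the elements that satisfy a condition
long_cooks = [t for t in cook_times if t > 25]

print(doubled)     # [60, 50, 120, 40]
print(status)      # ['Cooked', 'Cooked', 'Cooked', 'Not Cooked']
print(long_cooks)  # [30, 60]
```

Note where the `if` sits: before the `for` (with an `else`) it chooses a value for every element, while after the `for` it decides whether an element is kept at all.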

### Guided Example 1 - Avoiding Copy Paste
Yep, that's right - we're doing the exact same example again, except using list comprehensions. Don't think too hard about the outputs of this; the goal is to gain experience with the syntax of list comprehensions rather than to produce any particular output.

Recall that we had:
```
demo_foods = ['red velvet cake',
              'swordfish',
              'green bean casserole',
              'fish casserole',
              'lasagna']

for demo_food in demo_foods:
  print(cook_in_oven_if_elif_else(demo_food))
```
We can re-express this as a list comprehension.

In [57]:
#demo foods
demo_foods = ['red velvet cake',
              'swordfish',
              'green bean casserole',
              'fish casserole',
              'lasagna']

In [60]:
# Re-express as a list comprehension
[print(cook_in_oven_if_elif_else(demo_food)) for demo_food in demo_foods]

cake cooked at 30 at temperature 350 for red velvet cake
red velvet cake - Cooked!
fish cooked at 25 at temperature 375 for swordfish
swordfish - Cooked!
casserole cooked at 60 at temperature 350 for green bean casserole
green bean casserole - Cooked!
fish cooked at 25 at temperature 375 for fish casserole
fish casserole - Cooked!
lasagna - Not Cooked!


[None, None, None, None, None]
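
The trailing list of `None` values appears because `print` itself returns `None`, and the comprehension collects whatever each call returns. The sketch below shows this with a stand-in function (`shout` is hypothetical, used here so the snippet runs on its own), along with one way to keep the results and print them separately.

```python
# Hypothetical stand-in for a function that returns a string
def shout(text):
  return text.upper() + '!'

foods = ['cake', 'fish']

# print returns None, so this list holds only None values
printed = [print(shout(food)) for food in foods]
print(printed)  # [None, None]

# Collect the results first, then print them in a regular loop
results = [shout(food) for food in foods]
for result in results:
  print(result)
```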

### Guided Practical Exercise - Updating Elements
You guessed it! We'll now try list comprehensions with our example of replacing labels with their semantic equivalents! Before, we had:
```
#output from classifier
output = [{'label': 'LABEL_2', 'score': 0.9303200244903564},
          {'label': 'LABEL_0', 'score': 0.575447142124176},
          {'label': 'LABEL_1', 'score': 0.8416591286659241},
          {'label': 'LABEL_0', 'score': 0.9006277322769165}]

label_list = ['LABEL_0', 'LABEL_1', 'LABEL_2']
label_semantics = ['negative', 'neutral', 'positive']

# Create lookup dictionary
label2semantics = dict(zip(label_list, label_semantics))
label2semantics

for cls_dict in output:
  cls_dict['label'] = label2semantics[cls_dict['label']]
```

Let's do it again and re-express this as a list comprehension!

In [61]:
#express this as a list comprehension
[label2semantics[cls_dict['label']] for cls_dict in output]

['positive', 'negative', 'neutral', 'negative']

In [66]:
#express this as a list comprehension with the dictionary as the elements of the returned list
new_label_ids = [{**cls_dict, 'label':label2semantics[cls_dict['label']]} for cls_dict in output]
print(new_label_ids)
print(output)

[{'label': 'positive', 'score': 0.9303200244903564}, {'label': 'negative', 'score': 0.575447142124176}, {'label': 'neutral', 'score': 0.8416591286659241}, {'label': 'negative', 'score': 0.9006277322769165}]
[{'label': 'LABEL_2', 'score': 0.9303200244903564}, {'label': 'LABEL_0', 'score': 0.575447142124176}, {'label': 'LABEL_1', 'score': 0.8416591286659241}, {'label': 'LABEL_0', 'score': 0.9006277322769165}]
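
The `{**cls_dict, 'label': ...}` expression builds a brand-new dictionary, which is why `output` is unchanged above. A small sketch of that behavior on a single made-up dictionary:

```python
original = {'label': 'LABEL_2', 'score': 0.93}

# ** copies every key/value pair into the new dictionary;
# the 'label' entry written afterwards overrides the copied one
updated = {**original, 'label': 'positive'}

print(updated)   # {'label': 'positive', 'score': 0.93}
print(original)  # {'label': 'LABEL_2', 'score': 0.93}
```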


In [175]:
#@title Fun side tidbit
#@markdown When you're working with data structures like these,
#@markdown they lend themselves nicely to tabular inspection and usage.
#@markdown We can do this using the pandas API, which is a common
#@markdown Python package for tabular data manipulation. Let's check it out.

#@markdown Whoa, this visualization looks much better, and comes
#@markdown with tons of functionality for calculation and analysis!

import pandas as pd
pd.DataFrame(new_label_ids)

| | label | score |
|---|---|---|
| 0 | positive | 0.93032 |
| 1 | negative | 0.575447 |
| 2 | neutral | 0.841659 |
| 3 | negative | 0.900628 |


In [186]:
#@title Another fun side tidbit
#@markdown In the above example, we saw that pandas is super friendly
#@markdown for tabular data manipulation. Let's see how hard it would be
#@markdown if we wanted to concatenate this with our input text...

#make fake data fields
texts = ['The dog was happy and ran along playfully.',
         'The cat glared at me, judging me from afar.',
         'The groundhog peeked its head above the ground.',
         'Opposums are criminally underrated.']

#Create new data frame out of texts
text_data = pd.DataFrame({'texts':texts}, index=range(4))

#Concat this with a data frame made from new_label_ids on the fly
pd.concat([text_data, pd.DataFrame(new_label_ids)], axis=1)

#@markdown Fun fun in the sun!

| | texts | label | score |
|---|---|---|---|
| 0 | The dog was happy and ran along playfully. | positive | 0.93032 |
| 1 | The cat glared at me, judging me from afar. | negative | 0.575447 |
| 2 | The groundhog peeked its head above the ground. | neutral | 0.841659 |
| 3 | Opposums are criminally underrated. | negative | 0.900628 |


### Try it Yourself!

In [67]:
#@markdown That's right, you guessed it again! Now, we're going to try the same
#@markdown exercise of applying our `adjust_label_values` function to all of the
#@markdown elements of our list!

#@markdown Recall that our solution was:
#@markdown ```
#@markdown for pos, single_id in enumerate(ids_list):
#@markdown   ids_list[pos] = adjust_label_values(single_id)
#@markdown ```

#@markdown Here, write a list comprehension that returns a new list with the values of the corrected ids.
#@markdown Click the `Show Code` button if you need a little assistance!

# Function
def adjust_label_values(label_in):

  #adjust value
  label_in = label_in - 1

  #return the value
  return label_in

# Demo list
ids_list = [2, 6, 3, 2, 1, 8, 2, 3, 2, 1]

# Code
adjusted_labels = [adjust_label_values(id_value) for id_value in ids_list]
print('Adjusted labels', adjusted_labels)
print('Original labels', ids_list)


Adjusted labels [1, 5, 2, 1, 0, 7, 1, 2, 1, 0]
Original labels [2, 6, 3, 2, 1, 8, 2, 3, 2, 1]


In [None]:
# Original function code and labels list

# Function
def adjust_label_values(label_in):

  #adjust value
  label_in = label_in - 1

  #return the value
  return label_in

# Demo list
ids_list = [2, 6, 3, 2, 1, 8, 2, 3, 2, 1]

In [174]:
# Your code solution here
adjusted_labels = [adjust_label_values(id_value) for id_value in ids_list]
adjusted_labels

[1, 5, 2, 1, 0, 7, 1, 2, 1, 0]

# Iteration with HuggingFace
You'll see the iteration syntax above all through HuggingFace and Python, but you'll see the `map` functionality THE MOST with HuggingFace due to its favored data structures and desire for speed in vectorized operations.

Note that there is a map function provided through Python, and the functionality is similar. Today, we're going to talk about specifically the `map` _method_ of Datasets for iteration over your data in HuggingFace.
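
For comparison, here is a short sketch of Python's built-in `map` function, applied to the `adjust_label_values` function from earlier with a shortened list of ids:

```python
def adjust_label_values(label_in):
  # subtract 1 from the incoming label id
  return label_in - 1

ids_list = [2, 6, 3]

# map is lazy: it returns an iterator, so wrap it in list() to see the values
adjusted = list(map(adjust_label_values, ids_list))
print(adjusted)  # [1, 5, 2]
```

The idea carries over to Datasets: you hand `map` a function, and it takes care of applying that function across your data.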

In [68]:
%%capture

#Make HF datasets available to our computing environment
!pip install datasets

In [69]:
#load relevant packages
import pandas as pd #this provides interactivity with tabular data
from datasets import Dataset

In [75]:
#make fake data fields
texts = ['The dog was happy and ran along playfully.',
         'The cat glared at me, judging me from afar.',
         'The groundhog peeked its head above the ground.',
         'Opposums are criminally underrated.']
text_label_names = ['positive', 'negative', 'neutral', 'positive']

In [76]:
#create label configurations
id2label = {0:'negative', 1:'neutral', 2:'positive'}

#create label2id configuration
label2id = {value:key for key,value in id2label.items()}
print(label2id)

#create label ids for `text_label_names` above
labels = [label2id[text_label] for text_label in text_label_names]
labels

{'negative': 0, 'neutral': 1, 'positive': 2}


[2, 0, 1, 2]
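
The `label2id` line above is a dictionary comprehension that swaps keys and values. The sketch below walks through the round trip. It also shows that this kind of inversion only works cleanly when every value is unique, which is a property of Python dictionaries in general; the `duplicated` dictionary is a made-up example.

```python
id2label = {0: 'negative', 1: 'neutral', 2: 'positive'}

# Swap keys and values
label2id = {label: idx for idx, label in id2label.items()}
print(label2id['positive'])  # 2

# Looking up a label's id and mapping it back returns the same label
print(id2label[label2id['neutral']])  # neutral

# Made-up example: duplicate values collapse into one key when inverted
duplicated = {0: 'negative', 1: 'negative'}
print({label: idx for idx, label in duplicated.items()})  # {'negative': 1}
```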

In [77]:
#make a fake Dataset dictionary
dataset_dict = {'text':texts,
                'labels':labels,
                'label_names':text_label_names}
dataset_dict

{'label_names': ['positive', 'negative', 'neutral', 'positive'],
 'labels': [2, 0, 1, 2],
 'text': ['The dog was happy and ran along playfully.',
  'The cat glared at me, judging me from afar.',
  'The groundhog peeked its head above the ground.',
  'Opposums are criminally underrated.']}

In [81]:
#Observe the data in a more intuitive way using pandas
pd.DataFrame(dataset_dict)

| | text | labels | label_names |
|---|---|---|---|
| 0 | The dog was happy and ran along playfully. | 2 | positive |
| 1 | The cat glared at me, judging me from afar. | 0 | negative |
| 2 | The groundhog peeked its head above the ground. | 1 | neutral |
| 3 | Opposums are criminally underrated. | 2 | positive |


Let's first make a Dataset so that we can use the `map` method on our data.

In [166]:
#make the fake Dataset using the API
demo_dataset = Dataset.from_dict(dataset_dict)
demo_dataset

Dataset({
    features: ['text', 'labels', 'label_names'],
    num_rows: 4
})

In [80]:
#assess text field
demo_dataset['text']

['The dog was happy and ran along playfully.',
 'The cat glared at me, judging me from afar.',
 'The groundhog peeked its head above the ground.',
 'Opposums are criminally underrated.']

## Guided Example: Modifying data
Let's say that we wanted to uppercase the entirety of each text field. How could we do this? Keep in mind that the `map` method of a Dataset expects your function to return a dictionary.

In [87]:
test_dataset = demo_dataset.map(lambda x: {'upper_text':x['text'].upper()})
test_dataset['upper_text']

['THE DOG WAS HAPPY AND RAN ALONG PLAYFULLY.',
 'THE CAT GLARED AT ME, JUDGING ME FROM AFAR.',
 'THE GROUNDHOG PEEKED ITS HEAD ABOVE THE GROUND.',
 'OPPOSUMS ARE CRIMINALLY UNDERRATED.']
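
As a mental model, a non-batched `map` call hands your function one example at a time as a dictionary and adds the returned dictionary's entries to that example. The pure-Python sketch below imitates this with a hypothetical helper named `map_rows`. It is not how the `datasets` library is implemented, only a way to picture why the function returns a dictionary.

```python
# Hypothetical helper that imitates the idea, not the real implementation
def map_rows(rows, function):
  # For each row (a dict), merge in whatever dict the function returns
  return [{**row, **function(row)} for row in rows]

# Hand-written rows standing in for a Dataset
rows = [{'text': 'The dog was happy.'},
        {'text': 'The cat glared at me.'}]

new_rows = map_rows(rows, lambda x: {'upper_text': x['text'].upper()})
print(new_rows[0])
# {'text': 'The dog was happy.', 'upper_text': 'THE DOG WAS HAPPY.'}
```

The key of the returned dictionary (`upper_text`) becomes the name of the new column, which matches the `test_dataset['upper_text']` lookup above.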

## Practical Example: Tokenization
Tokenization is the process of taking our text inputs and processing them into something that can be used by a computer - some type of numerical representation.

Tokenizers go along with a model, so we use a tokenizer to process the text in a way that the model will understand.

Yesterday, we looked at pipelines, and we saw that there was a model and tokenizer field. Let's look at what happens if we _don't_ use a pipeline and instead instantiate the model and tokenizer from their classes.

In [88]:
%%capture

#Make HF transformers available to our computing environment
!pip install transformers

In [161]:
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

In [91]:
# Let's create a model and tokenizer using the API
mdl = AutoModelForSequenceClassification.from_pretrained('cardiffnlp/twitter-roberta-base-sentiment')
tokenizer = AutoTokenizer.from_pretrained('cardiffnlp/twitter-roberta-base-sentiment')

In [108]:
#explore tokenizer
tokenizer(['the dog is cute'], padding='longest', truncation=True, max_length=256, return_tensors='pt')

{'input_ids': tensor([[    0,   627,  2335,    16, 11962,     2]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1]])}

In [167]:
#create tokenizer function
def tokenize_data(input_example, tok, how='pt'):

  return_dict = tok(input_example['text'], padding='longest', truncation=True, max_length = 256, return_tensors = how)

  return return_dict

In [171]:
#explore known lambda functionality
tok_dataset = demo_dataset.map(lambda x: tokenize_data(x, tokenizer))
tok_dataset

Dataset({
    features: ['text', 'labels', 'label_names', 'input_ids', 'attention_mask'],
    num_rows: 4
})

In [172]:
#explore direct use of function in batch
tok_dataset = demo_dataset.map(tokenize_data,
                               batched=True,
                               fn_kwargs={'tok':tokenizer, 'how':'np'})
tok_dataset

Dataset({
    features: ['text', 'labels', 'label_names', 'input_ids', 'attention_mask'],
    num_rows: 4
})
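
With `batched=True`, the function receives a dictionary whose values are lists covering several examples at once, rather than a single example. The sketch below shows a function written for that shape. The names `uppercase_batch` and `fake_batch` are made up, and the batch is built by hand rather than coming from a real Dataset.

```python
def uppercase_batch(batch):
  # batch['text'] is a list of strings here, so loop over it
  return {'upper_text': [text.upper() for text in batch['text']]}

# Hand-built stand-in for what a batched call would pass in
fake_batch = {'text': ['The dog was happy.', 'The cat glared at me.']}

print(uppercase_batch(fake_batch))
# {'upper_text': ['THE DOG WAS HAPPY.', 'THE CAT GLARED AT ME.']}
```

With a real Dataset, a function like this would be passed along the lines of `demo_dataset.map(uppercase_batch, batched=True)`. The tokenizer accepts either a single string or a list of strings, which is why `tokenize_data` works in both modes.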

In [173]:
#call model
mdl(**{'input_ids':torch.tensor(tok_dataset['input_ids']),
       'attention_mask':torch.tensor(tok_dataset['attention_mask'])})

SequenceClassifierOutput([('logits', tensor([[-2.7216,  0.3107,  2.9494],
                                   [ 1.3014,  0.9623, -2.3747],
                                   [-1.7134,  1.8206, -0.0186],
                                   [ 2.2642, -0.0392, -2.3004]], grad_fn=<AddmmBackward0>))])
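
The logits above are raw scores, one per class for each text. A minimal sketch of turning them into label names: pick the position of the largest score in each row and look it up in `id2label` from earlier. The values are copied by hand from the output above, and plain Python lists stand in for the tensor.

```python
id2label = {0: 'negative', 1: 'neutral', 2: 'positive'}

# Logits copied from the model output above, one row per text
logits = [[-2.7216, 0.3107, 2.9494],
          [1.3014, 0.9623, -2.3747],
          [-1.7134, 1.8206, -0.0186],
          [2.2642, -0.0392, -2.3004]]

# row.index(max(row)) gives the position of the largest score in the row
predicted = [id2label[row.index(max(row))] for row in logits]
print(predicted)  # ['positive', 'negative', 'neutral', 'negative']
```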

# Congratulations!
You made it through the first crash course with Python for using HuggingFace! You now:

1. Know how to use Google Colab
2. Have built some intuition around what it is to program and that programming is just another language with syntax, semantics, and grammar
3. Know several standard Python data types and how to use them
4. Know several standard Python data structures and how to use them
5. Have learned how to use functions, what they expect, and what they return
6. Know how packages, libraries, modules, classes, functions, and methods all relate and how you can leverage this information to help understand APIs
7. Have experience using the HuggingFace API and understand APIs as contracts about what is expected to be input and what should be returned
8. Have learned how to communicate conditional execution
9. Have learned about standard types of iteration with Python
10. Have learned about the map method with HuggingFace

That is A LOT to cover in 3 days - and I'm proud of you for sticking with it!

Next week, we'll delve into Transformers, and we'll use this Python knowledge to help us understand tutorials and grow on our own in looking at the APIs and documentation HuggingFace provides.

# Homework Exercises

Remember that you can always click the `Show Code` button for help or to check your answers. Now that you're more familiar with the Google Colab interface, create your own cells - markdown or code - to answer the questions below.

In [187]:
#@title Question 1. Mutually Exclusive behavior
#@markdown Consider the mutually exclusive behavior of Exercise
#@markdown 3 in our Conditional Execution section. Semantically, it
#@markdown doesn't make sense as the final `else` somewhat assumes
#@markdown that we've specified all of the meaningful foods that we might
#@markdown be cooking. How might we go about changing this behavior
#@markdown so that the actual inputs of `time` and `temperature`
#@markdown can actually be used?

#1
#A set of `if` statements rather than `if-elif-else` statements.
#This does not confer the same mutually exclusive behavior.

In [189]:
#@title Question 2. Mutually Exclusive Behavior
#@markdown Following up to Question 1, what is the impact
#@markdown of this change to the execution of our code,
#@markdown particularly for "fish casserole"? Explain this behavior.

#In the if-elif-else construction, the code leaves the if-elif-else
#statement once a condition is satisfied. Thus, only the "fish" statement
#is executed.

#In the if-if-if solution of Question 1, ALL if statements are tested
#for matches. This means whatever statement that matches last is the one
#that will have the time and temp values.
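
The difference described in this answer can be sketched with two small functions. The names `first_match` and `last_match` are hypothetical, and the bodies are stripped down to show only which branch ends up applying.

```python
def first_match(food):
  # if-elif-else: Python stops at the first condition that is true
  if 'fish' in food:
    matched = 'fish'
  elif 'casserole' in food:
    matched = 'casserole'
  else:
    matched = 'other'
  return matched

def last_match(food):
  # separate ifs: every condition is tested, so later matches overwrite earlier ones
  matched = 'other'
  if 'fish' in food:
    matched = 'fish'
  if 'casserole' in food:
    matched = 'casserole'
  return matched

print(first_match('fish casserole'))  # fish
print(last_match('fish casserole'))   # casserole
```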

In [190]:
#@title Question 3: Dictionaries as purses
#@markdown Recall the question in class where someone was
#@markdown asking whether we could specify the parameters of
#@markdown a function in batch. My answer was that they could do this with
#@markdown a dictionary and dump out the values while calling the function.

#@markdown Using the `cook_in_oven_if_else` function, explore
#@markdown how you can do this.

#soln
param_dict = {'time': 40, 'temperature':800, 'rack':7}

cook_in_oven_if_else('pizza', **param_dict)

'pizza - Cooked!'
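
The `**` in the call above unpacks the dictionary into keyword arguments, so each key must match a parameter name. A small sketch with a made-up function (`describe_oven` is hypothetical) showing that the unpacked call and the spelled-out call are equivalent:

```python
# Hypothetical function with the same style of defaults as the oven example
def describe_oven(food, time=20, temperature=375, rack=2):
  return f'{food}: {time} min at {temperature}F on rack {rack}'

settings = {'time': 40, 'temperature': 400}

# These two calls are equivalent; rack keeps its default in both
unpacked = describe_oven('pizza', **settings)
spelled_out = describe_oven('pizza', time=40, temperature=400)

print(unpacked)     # pizza: 40 min at 400F on rack 2
print(spelled_out)  # pizza: 40 min at 400F on rack 2
```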

In [199]:
#@title Question 4: Iteration
#@markdown On day 2, we explored the usage of the `lower`
#@markdown String function and wrote an enclosing function around 
#@markdown it to demonstrate functions. Instead, let's use
#@markdown the lower function directly, and apply
#@markdown the function exactly as desired - to lower
#@markdown some uppercase labels.

#@markdown Using the `output` list of dictionaries we've
#@markdown been using:
#@markdown 1. Write a for loop to implement this functionality,
#@markdown updating the list with the appropriate lowercased values
#@markdown 2. Write a for loop to implement this functionality
#@markdown by populating either an empty list or a list of zeros
#@markdown 3. Write a list comprehension to generate the same list
#@markdown of dictionaries but with lowercased labels.

#@markdown Note that if you write these in order, you'll
#@markdown likely have to update (re-run) the cell which
#@markdown establishes the initial values of `output`
#@markdown between parts 1 and 2.

output = [{'label': 'LABEL_2', 'score': 0.9303200244903564},
          {'label': 'LABEL_0', 'score': 0.575447142124176},
          {'label': 'LABEL_1', 'score': 0.8416591286659241},
          {'label': 'LABEL_0', 'score': 0.9006277322769165}]

#1.
for cls_dict in output:
  cls_dict['label'] = cls_dict['label'].lower()

#2
new_output = list()
for cls_dict in output:
  new_dict = cls_dict.copy()
  new_dict['label'] = new_dict['label'].lower()
  new_output.append(new_dict)

#3
new_output_comp = [{**cls_dict, 'label':cls_dict['label'].lower()} for cls_dict in output]
