In the cell below, create a Python function that wraps your previous solution for the Bag of Words lab.

Requirements:

1. Your function should accept the following parameters:
    * `docs` [REQUIRED] - array of document paths.
    * `stop_words` [OPTIONAL] - array of stop words. The default value is an empty array.

1. Your function should return a Python object that contains the following:
    * `bag_of_words` - array of strings of normalized unique words in the corpus.
    * `term_freq` - array of the term-frequency vectors.
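
"Normalized unique words" can be read as: lowercase each document, strip punctuation, split into words, and keep each word once. A minimal sketch of that step follows. The helper names `tokenize` and `unique_in_order` and the `\w+` pattern are assumptions made for illustration, not part of the lab's required interface.

```python
import re

def tokenize(text):
    """Lowercase `text` and return its words with punctuation removed."""
    # \w+ matches runs of letters, digits, and underscores (an assumed definition of "word")
    return re.findall(r"\w+", text.lower())

def unique_in_order(words):
    """Drop repeated words while keeping first-appearance order."""
    return list(dict.fromkeys(words))

tokens = tokenize("Ironhack is cool. Ironhack is COOL!")
print(tokens)                   # ['ironhack', 'is', 'cool', 'ironhack', 'is', 'cool']
print(unique_in_order(tokens))  # ['ironhack', 'is', 'cool']
```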

In [28]:
# Import required libraries
import pandas as pd
import re
text1 = """
    Loop `docs` and read the content of each doc into a string in `corpus`.
    Remember to convert the doc content to lowercases and remove punctuation.
    """
text2 = """
    Loop `corpus`. Append the terms in each doc into the `bag_of_words` array. The terms in `bag_of_words` 
    should be unique which means before adding each term you need to check if it's already added to the array.
    In addition, check if each term is in the `stop_words` array. Only append the term to `bag_of_words`
    if it is not a stop word.
    """
text3 = """
    Loop `corpus` again. For each doc string, count the number of occurrences of each term in `bag_of_words`. 
    Create an array for each doc's term frequency and append it to `term_freq`.
    """
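# Define functions
def get_bag_of_words(text, stop_words=()):
    """Return the unique words of `text` longer than one character, in order of first appearance.

    Case is preserved, and `stop_words` is accepted but not applied here.
    """
    bag_of_words = []
    # Split on runs of non-word characters (punctuation, whitespace)
    for word in re.split(r'\W+', text):
        if len(word) > 1 and word not in bag_of_words:
            bag_of_words.append(word)
    return bag_of_words

def get_bow_from_docs(docs, stop_words=()):
    """Build one word list and one count list per document string in `docs`."""
    bag_of_words = []
    term_freq = []
    for doc in docs:
        bag_words = get_bag_of_words(doc)
        # str.count matches substrings, so 'in' is also counted inside 'into'
        word_freq = [doc.count(word) for word in bag_words]
        bag_of_words.append(bag_words)
        term_freq.append(word_freq)
    return {
        "bag_of_words": bag_of_words,
        "term_freq": term_freq
    }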
 
# Quick check on in-memory strings
text = [text1, text2, text3]
print(get_bow_from_docs(text))
   

{'bag_of_words': [['Loop', 'docs', 'and', 'read', 'the', 'content', 'of', 'each', 'doc', 'into', 'string', 'in', 'corpus', 'Remember', 'to', 'convert', 'lowercases', 'remove', 'punctuation'], ['Loop', 'corpus', 'Append', 'the', 'terms', 'in', 'each', 'doc', 'into', 'bag_of_words', 'array', 'The', 'should', 'be', 'unique', 'which', 'means', 'before', 'adding', 'term', 'you', 'need', 'to', 'check', 'if', 'it', 'already', 'added', 'In', 'addition', 'is', 'stop_words', 'Only', 'append', 'not', 'stop', 'word'], ['Loop', 'corpus', 'again', 'For', 'each', 'doc', 'string', 'count', 'the', 'number', 'of', 'occurrences', 'term', 'in', 'bag_of_words', 'Create', 'an', 'array', 'for', 'frequency', 'and', 'append', 'it', 'to', 'term_freq']], 'term_freq': [[1, 1, 2, 1, 2, 2, 1, 1, 3, 1, 1, 3, 1, 1, 3, 1, 1, 1, 1], [1, 1, 1, 5, 2, 5, 3, 1, 1, 3, 3, 1, 1, 2, 1, 1, 1, 1, 1, 5, 1, 1, 6, 2, 3, 3, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 5], [1, 1, 1, 1, 3, 2, 1, 1, 1, 1, 3, 1, 3, 3, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1]]}
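
The counts above come from `str.count`, which counts substring matches rather than whole words, so a short word such as `in` is also counted inside `into` and `string`. The sketch below contrasts that with counting whole tokens via `collections.Counter`. The sample sentence is made up for illustration.

```python
import re
from collections import Counter

doc = "read each doc into a string in corpus"

# Substring counting: 'in' matches inside 'into' and 'string' as well
substring_count = doc.count("in")

# Token counting: split into words first, then count exact matches
token_counts = Counter(re.findall(r"\w+", doc))

print(substring_count)     # 3
print(token_counts["in"])  # 1
```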

Test your function without stop words. You should see output like the following:

```{'bag_of_words': ['ironhack', 'is', 'cool', 'i', 'love', 'am', 'a', 'student', 'at'], 'term_freq': [[1, 1, 1, 0, 0, 0, 0, 0, 0], [1, 0, 0, 1, 1, 0, 0, 0, 0], [1, 0, 0, 1, 0, 1, 1, 1, 1]]}```
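
One way to produce output of this shape is sketched below. It is not the lab's reference solution: the function name `get_bow_from_paths`, the `\w+` tokenizer, and the temporary files that stand in for the lab's three documents are all assumptions made to keep the example self-contained.

```python
import os
import re
import tempfile

def get_bow_from_paths(docs, stop_words=()):
    """Return the vocabulary and per-document term counts for the files in `docs`."""
    # Read every document, lowercase it, and split it into word tokens
    corpus = []
    for path in docs:
        with open(path) as f:
            corpus.append(re.findall(r"\w+", f.read().lower()))

    # Collect unique non-stop words in order of first appearance
    bag_of_words = []
    for tokens in corpus:
        for term in tokens:
            if term not in stop_words and term not in bag_of_words:
                bag_of_words.append(term)

    # One vector per document, aligned with bag_of_words
    term_freq = [[tokens.count(term) for term in bag_of_words] for tokens in corpus]
    return {"bag_of_words": bag_of_words, "term_freq": term_freq}

# Hypothetical stand-ins for the lab's three text files
sentences = ["Ironhack is cool.", "I love Ironhack.", "I am a student at Ironhack."]
with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for i, sentence in enumerate(sentences, start=1):
        path = os.path.join(tmp, f"doc{i}.txt")
        with open(path, "w") as f:
            f.write(sentence + "\n")
        paths.append(path)
    bow = get_bow_from_paths(paths)

print(bow)
```

Counting with `tokens.count(term)` on a list of tokens matches whole words only, and every vector has one entry per vocabulary term, so documents that lack a term get a `0` in that position.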

In [29]:
# Define doc paths array
paths = ["./doc1.txt", "./doc2.txt", "./doc3.txt"]
# Write your code here
# Read each file's content; the context manager closes the file afterwards
docs = []
for path in paths:
    with open(path) as f:
        docs.append(f.read())
print(docs)
# Obtain BoW from your function
bow = get_bow_from_docs(docs)

# Print BoW
print(bow)

['Ironhack is cool.\n', 'I love Ironhack.\n', 'I am a student at Ironhack.\n']
{'bag_of_words': [['Ironhack', 'is', 'cool'], ['love', 'Ironhack'], ['am', 'student', 'at', 'Ironhack']], 'term_freq': [[1, 1, 1], [1, 1], [1, 1, 1, 1]]}


If your attempt above was successful, nice work!

Now test your function again with the stop words. In the previous lab we defined the stop words in a large array. In this lab, we'll import the stop words from Scikit-Learn.

In [51]:
# Newer scikit-learn releases no longer ship `sklearn.feature_extraction.stop_words`;
# the same `ENGLISH_STOP_WORDS` frozenset is exposed by `sklearn.feature_extraction.text`.
from sklearn.feature_extraction import text as stop_words
print(stop_words.ENGLISH_STOP_WORDS)

You should have seen a large list of words that looks like:

```frozenset({'across', 'mine', 'cannot', ...})```

A `frozenset` is an immutable Python set. In this lab you can pass it wherever an array of stop words is expected, with no conversion needed, because membership tests such as `word in stop_words` work the same way on both.
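
A short illustration of that point, using a tiny hand-made `frozenset` as a stand-in for the Scikit-Learn one (the name `sample_stop_words` and its contents are assumptions for this sketch):

```python
# A small stand-in for ENGLISH_STOP_WORDS; the real set is much larger
sample_stop_words = frozenset({"a", "am", "at", "i", "is"})

# Membership tests work the same way as on a list
print("is" in sample_stop_words)        # True
print("ironhack" in sample_stop_words)  # False

# Filtering a word list therefore needs no conversion
words = ["i", "love", "ironhack"]
kept = [w for w in words if w not in sample_stop_words]
print(kept)                             # ['love', 'ironhack']

# Immutable: a frozenset has no add/remove methods
print(hasattr(sample_stop_words, "add"))  # False
```

Unlike a list, a `frozenset` is unordered and cannot be indexed, so it suits lookups rather than positional access.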

Next, test your function by supplying `stop_words.ENGLISH_STOP_WORDS` as the second parameter.

In [None]:
bow = get_bow_from_docs(docs, stop_words.ENGLISH_STOP_WORDS)

print(bow)

You should have seen:

```{'bag_of_words': ['ironhack', 'cool', 'love', 'student'], 'term_freq': [[1, 1, 0, 0], [1, 0, 1, 0], [1, 0, 0, 1]]}```
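
The relationship between this result and the earlier one without stop words can be sketched as a column filter: drop every vocabulary entry that is a stop word, together with its column in each term-frequency vector. The helper `drop_stop_words` is hypothetical, and the stop-word set is a small stand-in chosen to be consistent with the expected output above; the Scikit-Learn set is much larger.

```python
def drop_stop_words(bow, stop_words):
    """Remove stop-word entries and their columns from a bag-of-words result."""
    # Positions of the vocabulary terms that survive the filter
    keep = [i for i, term in enumerate(bow["bag_of_words"]) if term not in stop_words]
    return {
        "bag_of_words": [bow["bag_of_words"][i] for i in keep],
        "term_freq": [[row[i] for i in keep] for row in bow["term_freq"]],
    }

# Expected result without stop words, as shown earlier in the lab
full = {
    "bag_of_words": ['ironhack', 'is', 'cool', 'i', 'love', 'am', 'a', 'student', 'at'],
    "term_freq": [[1, 1, 1, 0, 0, 0, 0, 0, 0],
                  [1, 0, 0, 1, 1, 0, 0, 0, 0],
                  [1, 0, 0, 1, 0, 1, 1, 1, 1]],
}
sample_stop_words = frozenset({"a", "am", "at", "i", "is"})  # stand-in set

filtered = drop_stop_words(full, sample_stop_words)
print(filtered)
```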