In the cell below, create a Python function that wraps your previous solution for the Bag of Words lab.

Requirements:

1. Your function should accept the following parameters:
    * `docs` [REQUIRED] - array of document paths.
    * `stop_words` [OPTIONAL] - array of stop words. The default value is an empty array.

1. Your function should return a Python object that contains the following:
    * `bag_of_words` - array of strings of normalized unique words in the corpus.
    * `term_freq` - array of the term-frequency vectors.
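
A side note on the empty-array default for `stop_words`: Python creates a literal `[]` default once and shares it across calls. That is harmless as long as the function only reads it, but a `None` sentinel is a common alternative. The sketch below illustrates that pattern with a hypothetical helper, `filter_words`, which is not part of the lab:

```python
def filter_words(words, stop_words=None):
    """Return the words that are not stop words (hypothetical helper)."""
    # None is the sentinel; a fresh empty list is created on each call.
    if stop_words is None:
        stop_words = []
    return [w for w in words if w not in stop_words]


print(filter_words(["i", "love", "ironhack"]))
print(filter_words(["i", "love", "ironhack"], stop_words=["i"]))
```

Either style meets the requirement above, since an omitted `stop_words` behaves like an empty array in both.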

In [1]:
# Import required libraries
import re

# Define doc paths array
docs = ['doc1.txt', 'doc2.txt', 'doc3.txt']

# Define function
def get_bow_from_docs(docs, stop_words=[]):
    # In the function, first define the variables you will use such as `corpus`, `bag_of_words`, and `term_freq`.
    corpus = []
    bag_of_words = set()
    term_freq = []

    # Read each document, split it on commas, periods, and whitespace,
    # then lowercase the tokens and drop the empty strings.
    for doc in docs:
        with open(doc, "r") as f:
            text = f.read()
        tokens = re.split(r'[,.\s]', text)
        corpus.append([token.lower() for token in tokens if token != ""])

    # Collect the unique words that are not stop words.
    for vector in corpus:
        for word in vector:
            if word not in stop_words:
                bag_of_words.add(word)
    final_bag_of_words = list(bag_of_words)

    # Count how often each word in the bag occurs in each document.
    for vector in corpus:
        vector_freq = []
        for word in final_bag_of_words:
            vector_freq.append(vector.count(word))
        term_freq.append(vector_freq)

    # Now return your output as an object
    return {
        "bag_of_words": final_bag_of_words,
        "term_freq": term_freq
    }

get_bow_from_docs(docs, stop_words=[])

{'bag_of_words': ['is',
  'student',
  'cool',
  'a',
  'love',
  'i',
  'ironhack',
  'at',
  'am'],
 'term_freq': [[1, 0, 1, 0, 0, 0, 1, 0, 0],
  [0, 0, 0, 0, 1, 1, 1, 0, 0],
  [0, 1, 0, 1, 0, 1, 1, 1, 1]]}

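Test your function without stop words. You should see output like the following. The order of the words, and of the matching columns in `term_freq`, may differ if your function collects the words in a `set`, which is unordered: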

```{'bag_of_words': ['ironhack', 'is', 'cool', 'i', 'love', 'am', 'a', 'student', 'at'], 'term_freq': [[1, 1, 1, 0, 0, 0, 0, 0, 0], [1, 0, 0, 1, 1, 0, 0, 0, 0], [1, 0, 0, 1, 0, 1, 1, 1, 1]]}```
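
Because the word order can vary, comparing the printed output by eye is error-prone. A minimal sketch of an order-independent check follows; `same_bow` is a hypothetical helper (not part of the lab) that turns each term-frequency vector into a `{word: count}` dictionary before comparing:

```python
def same_bow(a, b):
    """Compare two BoW results, ignoring word order (hypothetical helper)."""
    def as_doc_dicts(bow):
        # Build one {word: count} dict per document.
        return [dict(zip(bow["bag_of_words"], freqs)) for freqs in bow["term_freq"]]
    return as_doc_dicts(a) == as_doc_dicts(b)


# Expected output, copied from the lab text.
expected = {
    "bag_of_words": ["ironhack", "is", "cool", "i", "love", "am", "a", "student", "at"],
    "term_freq": [[1, 1, 1, 0, 0, 0, 0, 0, 0],
                  [1, 0, 0, 1, 1, 0, 0, 0, 0],
                  [1, 0, 0, 1, 0, 1, 1, 1, 1]],
}
# Same content with the words in a different order.
actual = {
    "bag_of_words": ["is", "student", "cool", "a", "love", "i", "ironhack", "at", "am"],
    "term_freq": [[1, 0, 1, 0, 0, 0, 1, 0, 0],
                  [0, 0, 0, 0, 1, 1, 1, 0, 0],
                  [0, 1, 0, 1, 0, 1, 1, 1, 1]],
}

print(same_bow(expected, actual))  # True
```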

In [2]:
# Define doc paths array
docs = ['doc1.txt', 'doc2.txt', 'doc3.txt']

# Obtain BoW from your function
bow = get_bow_from_docs(docs, stop_words=[])

# Print BoW
print(bow)

{'bag_of_words': ['is', 'student', 'cool', 'a', 'love', 'i', 'ironhack', 'at', 'am'], 'term_freq': [[1, 0, 1, 0, 0, 0, 1, 0, 0], [0, 0, 0, 0, 1, 1, 1, 0, 0], [0, 1, 0, 1, 0, 1, 1, 1, 1]]}


If your attempt above is successful, nice work!

Now test your function again with the stop words. In the previous lab we defined the stop words in a large array. In this lab, we'll import the stop words from Scikit-Learn.

In [3]:
from sklearn.feature_extraction import _stop_words
print(_stop_words.ENGLISH_STOP_WORDS)

frozenset({'however', 'perhaps', 'seems', 'whoever', 'hence', 'ours', 'also', 'almost', 'former', 'onto', 'call', 'wherever', 'enough', 'twenty', 'nothing', 'thick', 'less', 'first', 'more', 'where', 'etc', 'hers', 'move', 'herself', 'have', 'my', 'can', 'whence', 'below', 'thereby', 'what', 'others', 'therefore', 'she', 'of', 'everywhere', 'because', 'get', 'forty', 'after', 'around', 'if', 'myself', 'side', 'cannot', 'namely', 'along', 'i', 'besides', 'bottom', 'amoungst', 'than', 'us', 'become', 'co', 'had', 'being', 'himself', 'whether', 'noone', 'anyone', 'same', 'up', 'always', 'un', 'together', 'bill', 'beside', 'his', 'formerly', 'when', 'him', 'ever', 'within', 'from', 'further', 'nevertheless', 'whose', 'before', 'too', 'most', 'part', 'it', 'down', 'but', 'please', 'six', 'their', 'interest', 'wherein', 'was', 'per', 'sincere', 'beyond', 'by', 'name', 'indeed', 'many', 'been', 'we', 'here', 'ten', 'no', 'the', 'cry', 'these', 'see', 'eight', 'mostly', 'somewhere', 'anything'

You should have seen a large list of words that looks like:

```frozenset({'across', 'mine', 'cannot', ...})```

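`frozenset` is Python's immutable set type. It does not support indexing the way an array does, but it does support iteration and membership tests (`word in stop_words`), so in this lab you can pass it to your function without conversion.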
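
As a quick illustration of that behavior, here is a small sketch that uses a three-word stand-in for the Scikit-Learn set (the stand-in values are an assumption for demonstration only):

```python
# Small stand-in for ENGLISH_STOP_WORDS; not the real Scikit-Learn set.
stop_words = frozenset({"i", "am", "a"})

# Membership tests work exactly as they do with a list.
print("am" in stop_words)

# Filtering a token list against the frozenset.
kept = [w for w in ["i", "am", "a", "student"] if w not in stop_words]
print(kept)

# A frozenset is immutable, so it has no add() method.
try:
    stop_words.add("the")
except AttributeError as err:
    print("cannot modify:", err)
```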

Next, test your function by supplying `_stop_words.ENGLISH_STOP_WORDS` as the second parameter.

In [4]:
bow = get_bow_from_docs(docs, _stop_words.ENGLISH_STOP_WORDS)

print(bow)

{'bag_of_words': ['love', 'student', 'cool', 'ironhack'], 'term_freq': [[0, 0, 1, 1], [1, 0, 0, 1], [0, 1, 0, 1]]}


You should have seen something like the following (the word order may differ):

```{'bag_of_words': ['ironhack', 'cool', 'love', 'student'], 'term_freq': [[1, 1, 0, 0], [1, 0, 1, 0], [1, 0, 0, 1]]}```

___________________________________________________________________________________________________________________________