In the cell below, create a Python function that wraps your previous solution for the Bag of Words lab.

Requirements:

1. Your function should accept the following parameters:
    * `docs` [REQUIRED] - array of document paths.
    * `stop_words` [OPTIONAL] - array of stop words. The default value is an empty array.

1. Your function should return a Python dictionary that contains the following:
    * `bag_of_words` - array of the unique, normalized words in the corpus.
    * `term_freq` - array of term-frequency vectors, one per document, each aligned index-by-index with `bag_of_words`.

In [9]:
# Import required libraries
import re

# Define function
def get_doc_from_docs(docs, stop_words=()):
    # `stop_words` defaults to an empty tuple instead of `[]` to avoid a shared mutable default.
    corpus = []
    bag_of_words = []
    term_freq = []

    # Read each doc into a string in `corpus`: lowercase it, replace every run of
    # non-alphanumeric characters (punctuation, whitespace) with one space, and trim.
    pattern = '[^a-z0-9]+'
    for path in docs:
        with open(path) as f:  # `with` closes the file after reading
            text = f.read().lower()
        corpus.append(re.sub(pattern, ' ', text).strip())

    # Collect unique terms into `bag_of_words` in first-seen order,
    # skipping any term that appears in `stop_words`.
    for doc in corpus:
        for term in doc.split():
            if term not in bag_of_words and term not in stop_words:
                bag_of_words.append(term)

    # For each doc, count the occurrences of every term in `bag_of_words`
    # and append that term-frequency vector to `term_freq`.
    for doc in corpus:
        words = doc.split()
        term_freq.append([words.count(term) for term in bag_of_words])

    # Return the output as a dictionary
    return {
        "bag_of_words": bag_of_words,
        "term_freq": term_freq
    }
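The same logic can be checked without any files on disk. Below is a minimal sketch, assuming the documents are already loaded as strings; the helper name `bag_of_words_from_texts` and the three sample sentences are hypothetical, chosen only so that the result matches the expected output shown next. It also swaps the repeated `list.count` calls for `collections.Counter`, which counts each document's tokens in a single pass.

```python
import re
from collections import Counter


def bag_of_words_from_texts(texts, stop_words=()):
    """Hypothetical in-memory variant of get_doc_from_docs.

    texts: list of raw document strings (instead of file paths).
    """
    # Normalize: lowercase, collapse runs of non-alphanumeric characters
    # into single spaces, then split into tokens.
    tokenized = [re.sub(r'[^a-z0-9]+', ' ', t.lower()).split() for t in texts]

    # Vocabulary: unique non-stop-word terms in first-seen order.
    bag_of_words = []
    for tokens in tokenized:
        for term in tokens:
            if term not in stop_words and term not in bag_of_words:
                bag_of_words.append(term)

    # Term frequencies: one Counter per document, read out in vocabulary order.
    # A Counter returns 0 for terms that never occur in that document.
    term_freq = []
    for tokens in tokenized:
        counts = Counter(tokens)
        term_freq.append([counts[term] for term in bag_of_words])

    return {"bag_of_words": bag_of_words, "term_freq": term_freq}


# Hypothetical sample texts, not necessarily the contents of doc1.txt-doc3.txt.
sample = ["Ironhack is cool.", "I love Ironhack.", "I am a student at Ironhack."]
result = bag_of_words_from_texts(sample)
print(result)
```

`Counter` makes each document's counting linear in its length, whereas calling `words.count(term)` for every vocabulary term rescans the document once per term; for a three-document lab either is fine.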
    

Test your function without stop words. You should see output like the following:

```{'bag_of_words': ['ironhack', 'is', 'cool', 'i', 'love', 'am', 'a', 'student', 'at'], 'term_freq': [[1, 1, 1, 0, 0, 0, 0, 0, 0], [1, 0, 0, 1, 1, 0, 0, 0, 0], [1, 0, 0, 1, 0, 1, 1, 1, 1]]}```

In [10]:
# Define doc paths array
docs = ['doc1.txt', 'doc2.txt', 'doc3.txt']

# Obtain BoW from your function
bow = get_doc_from_docs(docs)

# Print BoW
print(bow)

{'bag_of_words': ['ironhack', 'is', 'cool', 'i', 'love', 'am', 'a', 'student', 'at'], 'term_freq': [[1, 1, 1, 0, 0, 0, 0, 0, 0], [1, 0, 0, 1, 1, 0, 0, 0, 0], [1, 0, 0, 1, 0, 1, 1, 1, 1]]}


If your output matches, nice work!

Now test your function again, this time with stop words. In the previous lab we defined the stop words in a large array; in this lab, we'll import them from scikit-learn.

If scikit-learn is not installed yet, install it first, for example with `pip install scikit-learn` (the PyPI package named `sklearn` is deprecated). The original install log pulled in scikit-learn 0.21.2 together with its dependencies scipy 1.3.0 and joblib 0.13.2.

In [11]:
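# `sklearn.feature_extraction.stop_words` was deprecated in scikit-learn 0.22 and removed in 0.24.
# The `text` module exposes the same ENGLISH_STOP_WORDS frozenset in old and new versions.
from sklearn.feature_extraction import text as stop_words
print(stop_words.ENGLISH_STOP_WORDS)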

frozenset({'before', 'found', 'name', 'will', 'another', 'fifty', 'first', 'show', 'would', 'interest', 'least', 'am', 'which', 'everywhere', 'throughout', 'every', 'where', 'full', 'elsewhere', 'forty', 'noone', 'nine', 'anyway', 'become', 'itself', 'she', 'thus', 'hereupon', 'whole', 'anyone', 'done', 'upon', 'seeming', 'while', 'beforehand', 'hundred', 'already', 'becoming', 'keep', 'ltd', 'whereby', 'how', 'most', 'for', 'above', 'below', 'beside', 'all', 'cry', 'becomes', 'ourselves', 'former', 'same', 'our', 'everyone', 'four', 'however', 'per', 'system', 'until', 'alone', 'with', 'not', 'around', 'bottom', 'amount', 'amongst', 'in', 'cant', 'to', 'himself', 'then', 'hers', 'meanwhile', 'some', 'thence', 'herself', 'both', 'toward', 'whence', 'herein', 'whom', 'without', 'along', 'sincere', 'why', 'whatever', 'because', 'since', 'sometimes', 'there', 'otherwise', 'are', 'enough', 'therein', 'hence', 'something', 'thereby', 'wherever', 'two', 'get', 'together', 'made', 'it', 'co',

You should have seen a large list of words that looks like:

```frozenset({'across', 'mine', 'cannot', ...})```

`frozenset` is Python's immutable set type. In this lab you can use it just like an array without converting it, because the function only tests membership with `in`, which works the same way on a `frozenset`.
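A quick illustration of why no conversion is needed: membership tests behave the same on a `frozenset` as on a list (and are hash-based rather than a linear scan). The small set below is a hypothetical stand-in for `ENGLISH_STOP_WORDS`, used so the snippet runs without scikit-learn.

```python
# Hypothetical stand-in for stop_words.ENGLISH_STOP_WORDS.
STOP = frozenset({'is', 'i', 'am', 'a', 'at'})

tokens = ['i', 'am', 'a', 'student', 'at', 'ironhack']

# Membership tests work exactly as they would on a list.
kept = [t for t in tokens if t not in STOP]
print(kept)  # ['student', 'ironhack']

# A frozenset is immutable: it has no add() method at all.
try:
    STOP.add('student')
except AttributeError as err:
    print('cannot modify:', err)

# To extend the stop words, build a new set with the union operator.
extended = STOP | {'student'}
print(sorted(extended))
```

Because the original set is never modified, the same `ENGLISH_STOP_WORDS` object can be shared safely across calls; a custom list is built as a new set instead.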

Next, test your function by passing `stop_words.ENGLISH_STOP_WORDS` as the second argument.

In [13]:
bow = get_doc_from_docs(docs, stop_words.ENGLISH_STOP_WORDS)

print(bow)

{'bag_of_words': ['ironhack', 'cool', 'love', 'student'], 'term_freq': [[1, 1, 0, 0], [1, 0, 1, 0], [1, 0, 0, 1]]}


You should have seen:

```{'bag_of_words': ['ironhack', 'cool', 'love', 'student'], 'term_freq': [[1, 1, 0, 0], [1, 0, 1, 0], [1, 0, 0, 1]]}```
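Each vector in `term_freq` is aligned index-by-index with `bag_of_words`. A small sketch, copying the expected output above as a literal, pairs them back up with `zip` so each document's counts can be read by term; the `doc1`-`doc3` labels simply assume the vectors follow the order of the `docs` paths.

```python
# The expected result of the stop-word run, copied as a literal.
expected = {'bag_of_words': ['ironhack', 'cool', 'love', 'student'],
            'term_freq': [[1, 1, 0, 0], [1, 0, 1, 0], [1, 0, 0, 1]]}

# Pair each count with its term, keeping only terms that occur in the doc.
per_doc = [
    {term: count for term, count in zip(expected['bag_of_words'], vector) if count}
    for vector in expected['term_freq']
]

for i, counts in enumerate(per_doc, start=1):
    print(f"doc{i}: {counts}")
```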