This code finds a specific word in a TEI document and returns the word in its context within the work. The size of the context returned is up to you, and will depend on the type of work you are consulting and the TEI schema used. The example text, Homer's Iliad as translated by Alexander Pope, [available from Project Gutenberg](https://www.gutenberg.org/ebooks/6130), uses paragraph divisions for the introduction and line divisions for the main text. Our inquiry will be within the main text, so the code is written assuming line divisions.

For more information on handling XML files in Python, consult the XML page.

In [7]:
# import the Natural Language Toolkit and the Beautiful Soup library
import nltk
from bs4 import BeautifulSoup

# store the text's filepath
filename = 'corpus/iliad.tei'

# open the file, read its contents, and store them in a variable called text.
with open(filename, 'r') as fin:
    text = fin.read()

# take the text, turn it into a BeautifulSoup object, and store in a variable called tei.
tei = BeautifulSoup(text, 'lxml')

Next, we will store the text divisions we are interested in according to the tags used by our TEI schema:

In [8]:
# <l> is the TEI tag for a line of verse, so each "paragraph" below is one line of the poem
paragraphs = tei.find_all('l')
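
If your inquiry were in the introduction instead, only the tag name would change. The following is a minimal sketch on a made-up fragment of markup, not on the real file: the `sample` string and the `get_divisions` helper are assumptions for illustration, and `html.parser` is used here only so the sketch runs without `lxml` installed.

```python
from bs4 import BeautifulSoup

# a tiny invented fragment that mixes paragraph and line divisions
sample = (
    '<div type="introduction"><p>An introductory paragraph.</p></div>'
    '<div type="book">'
    "<l>Achilles' wrath, to Greece the direful spring</l>"
    "<l>Of woes unnumber'd, heavenly goddess, sing!</l>"
    '</div>'
)
sample_soup = BeautifulSoup(sample, 'html.parser')

def get_divisions(soup, tag_name):
    """Return the text of every element with the given tag name (hypothetical helper)."""
    return [element.get_text() for element in soup.find_all(tag_name)]

print(get_divisions(sample_soup, 'l'))  # line divisions (main text)
print(get_divisions(sample_soup, 'p'))  # paragraph divisions (introduction)
```

Check the tags your own TEI file actually uses before choosing one, since schemas differ.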

The next step is to call NLTK to split the content of each tag into sentences:

In [9]:
# make a blank list for lines
paragraph_tokens = []

# loop over the lines, split each line's content into sentences, append the list of sentences to the blank list
for paragraph in paragraphs:
    sentences = nltk.sent_tokenize(paragraph.text)
    paragraph_tokens.append(sentences)

Store the word of interest:

In [10]:
word_of_interest = 'Apollo'
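
Note that the search further down is a plain substring test, so 'Apollo' would also match a longer word such as 'Apollonian'. If you want whole words only, a regular expression with word boundaries is one option. This is a sketch under that assumption; `contains_word` is a hypothetical helper and is not part of the code in this page.

```python
import re

def contains_word(sentence, word):
    """Return True if `word` occurs in `sentence` as a whole word."""
    # re.escape guards against characters in `word` that are special in a regex
    pattern = r'\b' + re.escape(word) + r'\b'
    return re.search(pattern, sentence) is not None

print(contains_word('Apollo.', 'Apollo'))
print(contains_word('Apollonian rites', 'Apollo'))
```

Possessives such as "Apollo's" still count as a match, because the apostrophe marks a word boundary.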

Store the context parameter, which sets how many surrounding lines are returned with each match, and loop over the token list:

In [11]:
# make a blank list to store results
contexts_of_word_of_interest = []

# store context parameter
context = 10

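# loop over the paragraph_tokens list and record the position of each line that contains word_of_interest
# num counts from 1, so each slice holds context - 1 lines before the match, the match itself, and context lines after it
for num, paragraph in enumerate(paragraph_tokens, start=1):
    for sentence in paragraph:
        if word_of_interest in sentence:
            # clamp at 0: a negative start would count from the end of the list and return the wrong lines
            start = max(0, num - context)
            end = num + context
            contexts_of_word_of_interest.append(paragraph_tokens[start:end])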
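
The window arithmetic can also be isolated in a small helper. The sketch below is one possible variant rather than the code used in this page: `context_window` is a hypothetical name, it takes a 0-based index, and it returns a symmetric window that is clamped to the bounds of the list, so matches near the start or end of the text still return their neighbours.

```python
def context_window(items, index, context):
    """Return the items within `context` positions on either side of `index` (0-based)."""
    start = max(0, index - context)              # never start before the first item
    end = min(len(items), index + context + 1)   # +1 because a slice end is exclusive
    return items[start:end]

# stand-in data shaped like paragraph_tokens: one list of sentences per line
sample_lines = [['line %d' % n] for n in range(20)]
print(context_window(sample_lines, 1, 3))
```

With `enumerate(paragraph_tokens)` (no `start=1`), the loop index could be passed to this helper directly.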



For ease of reading, print each context under a separator line:

In [12]:
for line in contexts_of_word_of_interest:
    print('======')
    print(line)

[['Whose limbs unburied on the naked shore,'], ['Devouring dogs and hungry vultures tore.Vultures: Pope is more accurate than the poet he translates,\nfor Homer writes "a prey to dogs and to all kinds of birds.', 'But\nall kinds of birds are not carnivorous.'], ['Since great Achilles and Atrides strove,'], ['Such was the sovereign doom, and such the will of Jove!—i.e.', 'during the whole time of their striving the will\nof Jove was being gradually accomplished.'], ['Declare, O Muse!', 'in what ill-fated hourCompare Milton\'s "Paradise Lost" i.', '6\n\n"Sing, heavenly Muse, that on the secret top\nOf Horeb, or of Sinai, didst inspire\nThat shepherd."'], ['"Sing, heavenly Muse, that on the secret top'], ['Of Horeb, or of Sinai, didst inspire'], ['That shepherd."'], ['Sprung the fierce strife, from what offended power'], ["Latona's son a dire contagion spread,—Latona's son: i.e.", 'Apollo.'], ["And heap'd the camp with mountains of the dead;"], ['The king of men his reverent priest defied