This code finds a specific word in a TEI document and returns the word in its context within the work. The size of the context returned is up to you and will depend on the type of work you are consulting and the TEI schema used. The example text, Homer's Iliad as translated by Alexander Pope, [available from Project Gutenberg](https://www.gutenberg.org/ebooks/6130), uses paragraph divisions for the introduction and line divisions for the main text. Our inquiry is within the main text, so the code is written assuming line divisions.

For more information on handling XML files in Python, consult the XML page.

In [1]:
# import the Natural Language Toolkit and the Beautiful Soup library
import nltk
from bs4 import BeautifulSoup

# store the text's filepath
filename = 'corpus/iliad.tei'

# open the file and read its contents into a variable called text.
with open(filename, 'r') as fin:
    text = fin.read()

# take the text, turn it into a BeautifulSoup object, and store in a variable called tei.
tei = BeautifulSoup(text, 'lxml')

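Next, we store the text divisions we are interested in, according to the tags used by our TEI schema. Here that tag is `lg`, the TEI element for a line group (an individual verse line is `l`), so each item in `lines` holds several lines of verse: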

In [2]:
lines = tei.find_all('lg')
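
Which tag to pass to `find_all` depends on how the file is encoded. One way to check, sketched here on a small made-up TEI fragment rather than the Iliad file, is to count the tag names Beautiful Soup finds and compare the candidate divisions. The `sample` string, its `div` wrapper, and the variable names are assumptions for illustration, and `html.parser` is used only so the sketch runs without `lxml` installed.

```python
from collections import Counter
from bs4 import BeautifulSoup

# hypothetical TEI fragment: two line groups (<lg>) holding three lines (<l>)
sample = """
<div type="book">
  <lg>
    <l>Declare, O Muse! in what ill-fated hour</l>
    <l>Sprung the fierce strife, from what offended power</l>
  </lg>
  <lg>
    <l>Latona's son a dire contagion spread,</l>
  </lg>
</div>
"""

soup = BeautifulSoup(sample, 'html.parser')

# find_all(True) returns every tag; count how often each tag name occurs
tag_counts = Counter(tag.name for tag in soup.find_all(True))
print(tag_counts.most_common())

# compare the two candidate divisions
line_groups = soup.find_all('lg')
single_lines = soup.find_all('l')
print(len(line_groups), len(single_lines))
```

If the counts show that `l` is far more frequent than `lg`, searching on `l` would give a finer-grained context unit; which one suits the inquiry is a judgment call.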

The next step is to call NLTK to split the content of each tag into sentences:

In [3]:
# make a blank list for lines
line_tokens = []

# loop over the lines, split their content into sentences, append each list of sentences to the blank list
for line in lines:
    tokens = nltk.sent_tokenize(line.text)
    line_tokens.append(tokens)

Store the word of interest:

In [4]:
word_of_interest = 'Apollo'
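
A plain substring test also matches longer words that happen to contain the search term. That is harmless for a name like 'Apollo' but matters for shorter words. The following is a minimal sketch, not part of the original notebook, comparing a substring test with a whole-word test built on the standard `re` module; the function names and the sample sentences are hypothetical.

```python
import re

# hypothetical sentences for illustration
sentences = [
    "Apollo's awful ensigns grace his hands",
    "The king of men his reverent priest defied",
    "The brother-kings, of Atreus' royal race",
]

def contains_substring(sentence, word):
    # equivalent to sentence.count(word) > 0
    return word in sentence

def contains_whole_word(sentence, word):
    # \b marks a word boundary; re.escape guards against special characters
    pattern = r'\b' + re.escape(word) + r'\b'
    return re.search(pattern, sentence) is not None

substring_hits = [s for s in sentences if contains_substring(s, 'king')]
whole_word_hits = [s for s in sentences if contains_whole_word(s, 'king')]

print(len(substring_hits), len(whole_word_hits))
```

Note that the whole-word test still matches "Apollo's", because the apostrophe counts as a word boundary.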

Store the context parameter and loop over the token list:

In [5]:
# make a blank list to store results
contexts_of_word_of_interest = []

# store context parameter: how many line groups to keep on each side of a match
context = 10

# loop over the line_tokens list, tracking the index of each line group
for num, line in enumerate(line_tokens):
    for token in line:
        # check whether this sentence contains word_of_interest
        if word_of_interest in token:
            print("=======")
            print(token)
            print(line)
            # clamp the start at 0 so a negative index does not wrap around the list
            start = max(0, num - context)
            end = num + context + 1
            # extend (rather than append) keeps the results as one flat list of line groups
            contexts_of_word_of_interest.extend(line_tokens[start:end])

Apollo.
['\nDeclare, O Muse!', 'in what ill-fated hourCompare Milton\'s "Paradise Lost" i.', '6\n\n"Sing, heavenly Muse, that on the secret top\nOf Horeb, or of Sinai, didst inspire\nThat shepherd."', "Sprung the fierce strife, from what offended power\nLatona's son a dire contagion spread,—Latona's son: i.e.", 'Apollo.', "And heap'd the camp with mountains of the dead;\nThe king of men his reverent priest defied,—King of men: Agamemnon.", "And for the king's offence the people died."]
Suppliant the venerable father stands;
Apollo's awful ensigns grace his hands
By these he begs; and lowly bending down,
Extends the sceptre and the laurel crown
He sued to all, but chief implored for grace
The brother-kings, of Atreus' royal race—Brother kings: Menelaus and Agamemnon.
["\nFor Chryses sought with costly gifts to gain\nHis captive daughter from the victor's chain.", "Suppliant the venerable father stands;\nApollo's awful ensigns grace his hands\nBy these he begs; and lowly bending down,\nE
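
A point worth checking when computing the window: in Python a negative slice start counts from the end of the list, so a match near the beginning of the text can return an empty or misplaced context unless the bounds are clamped. A minimal sketch of a clamped window, using hypothetical names (`context_window`, `groups`) and toy data:

```python
def context_window(items, index, context):
    """Return items[index] plus up to `context` neighbours on each side."""
    start = max(0, index - context)               # never go below the first item
    end = min(len(items), index + context + 1)    # +1 so the slice includes the far neighbour
    return items[start:end]

# toy stand-in for the list of tokenized line groups
groups = ['g0', 'g1', 'g2', 'g3', 'g4', 'g5']

# an unclamped slice near the start wraps around and comes back empty
unclamped = groups[0 - 2:0 + 2]

first = context_window(groups, 0, 2)
middle = context_window(groups, 3, 1)
last = context_window(groups, 5, 2)

print(unclamped, first, middle, last)
```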

For ease of reading, the following print statement is helpful:

In [6]:
#for line in contexts_of_word_of_interest:
    #print('======')
    #print(line)
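
The raw lists still carry newline characters from the source file. As one possible refinement, sketched here with a hypothetical helper `format_contexts` and a small made-up sample, each line group can be collapsed to a single line of text before printing:

```python
def format_contexts(contexts, separator='======'):
    """Join each list of sentences into one line, preceded by a separator."""
    blocks = []
    for sentences in contexts:
        blocks.append(separator)
        # split() with no arguments drops newlines and repeated spaces
        cleaned = [' '.join(sentence.split()) for sentence in sentences]
        blocks.append(' '.join(cleaned))
    return '\n'.join(blocks)

# made-up sample in the same shape as contexts_of_word_of_interest
example = [
    ['\nDeclare, O Muse!', 'Apollo.'],
    ["And for the king's offence the people died."],
]

formatted = format_contexts(example)
print(formatted)
```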