In [6]:
from arango import ArangoClient
import math
import string

In [7]:
client = ArangoClient(hosts='http://127.0.0.1:8529')
db = client.db('_system', username='root', password = 'openSesame')
db.collections()
pages = db.collection("Page")
people = db.collection("People")
places = db.collection("Places")
count_of_possible_solutions= math.factorial(100)

Cain's Jawbone contains exactly 100 pages, printed out of order. Before any analysis, there are exactly 100! = 93326215443944152681699238856266700490715968264381621468592963895217599993229915608941463976156518286253697920827223758251185210916864000000000000000000000000 possible orderings of the pages, each a candidate solution to the puzzle.

I think it would be fun to estimate how long it would take to read all possible combinations of the puzzle as we move forward, so I am going to calculate a word count for each page, as well as for the entire book. 

In [None]:
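total_count = 0
# Translation table that strips ASCII punctuation before testing each token.
strip_punctuation = str.maketrans("", "", string.punctuation)
for i in range(1, 101):  # pages are numbered 1 through 100
    _id = "Page/" + str(i)
    current_doc = db.document(_id)
    text = current_doc['content'].strip()
    # A token counts as a word if it is purely alphabetic once punctuation is removed.
    # split() breaks on any whitespace, so line breaks do not glue words together.
    is_word = [token.translate(strip_punctuation).isalpha() for token in text.split()]
    word_count = sum(is_word)
    current_doc['word_count'] = word_count
    db.update_document(current_doc)
    total_count += word_count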

In [None]:
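def how_long_to_read(possible_solutions, wpm):
    # The average person reads 238 words per minute.
    # Relies on the global total_count (words in the whole book) computed above.
    minutes = possible_solutions * (total_count / wpm)
    years = minutes / 525600  # minutes in a 365-day year
    millennia = years / 1000
    return millennia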


In [None]:
init_time = how_long_to_read(count_of_possible_solutions,238)
how_long_to_read(count_of_possible_solutions,1000000)

So before doing any analysis at all, it would take the average person about 1.24E151 millennia to read all possible combinations. Even at 1,000,000 words per minute, it would take about 2.96E147 millennia.
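
The conversion behind these figures can be written out. The symbols are introduced here only for illustration: N is the number of possible orderings, W is the total word count of the book, and r is the reading speed in words per minute.

$$
T_{\text{millennia}} = \frac{N \cdot W}{r \cdot 525600 \cdot 1000}
$$

Here 525600 is the number of minutes in a 365-day year. Because T is inversely proportional to r, going from 238 to 1,000,000 words per minute shortens the time by a factor of 1,000,000 / 238, or roughly 4,200, which agrees with the two estimates above.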

I believe the first big thing I can do to lower the number of possible connections is to describe the beginning and ending conditions that may or may not connect pages together. I am iterating through each page, and determining the first and last character. From there, I assign a value based on the type of ending each page has. For example, pages that end with poems can only be linked with pages that start with poems. 

Six pages end with the start of a poem and six pages begin with the end of one, so there are 6! = 720 ways to pair them up. That is still a lot, but using context clues and other information will help narrow down the choices.

The same is true for the pages that start and end in the middle of sentences. 

The pages that start with "I" are a wild card. Because "I" is always capitalized, it is difficult to know whether the pronoun falls at the beginning or in the middle of a sentence. Even so, it still limits the number of candidate pages.
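
As a rough sketch of these rules, the classification can be expressed as two small functions that work on a page's text alone. The function names and the 'starts with I' label are my own assumptions for illustration. Poem pages are left out because those are identified by hand.

```python
# Characters that count as a complete sentence ending.
SENTENCE_ENDINGS = ('.', '?', '!', '”')


def classify_start(passage):
    """Label how a page begins (hypothetical helper, not part of the notebook)."""
    text = passage.strip()
    if not text:
        return "None"
    # Drop trailing punctuation from the first token, e.g. "I," becomes "I".
    first_word = text.split()[0].rstrip(',.;:!?')
    # "I" and contractions such as "I’d" are ambiguous: always capitalized.
    if first_word == 'I' or first_word.startswith(("I'", "I’")):
        return 'starts with I'
    if text[0].islower():
        return 'incomplete beginning'
    return "None"


def classify_end(passage):
    """Label how a page ends (hypothetical helper, not part of the notebook)."""
    text = passage.strip()
    if not text or text[-1] in SENTENCE_ENDINGS:
        return "None"
    return 'incomplete end'


print(classify_start("I had been about a bit, as they say."))
print(classify_start("and then he left the room."))
print(classify_end("Then I remembered the quotation and"))
```

Keeping the checks in pure functions makes them easy to try on sample sentences before writing anything back to the database.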

In [None]:
incomplete_starts = 0
incomplete_ends = 0
poem_starts = 6
poem_ends = 6
starts_with_i = 0
normal_starts = 0
normal_ends = 0
for i in range(1, 101):  # pages are numbered 1 through 100
    _id = "Page/" +str(i)
    current_doc = db.document(_id)
    passage = current_doc['content'].strip()
    end = passage[-1]
    if i in [12, 23, 41,49,86,92]:
        current_doc['end_condition'] = 'poem starts on page'
    elif end not in ['.','?', '!', '”']:
        current_doc['end_condition'] = 'incomplete end'
        incomplete_ends+=1
        print(_id)
    else: 
        current_doc['end_condition'] = "None"
        normal_ends+=1

    if i in [13, 24, 42,50,87,93]:
        current_doc['start_condition'] = 'poem ends on page'
    elif passage[0].islower():
        current_doc['start_condition'] = 'incomplete beginning'
        incomplete_starts+=1
        print(_id)
    else:
        current_doc['start_condition'] = "None"
        normal_starts+=1
    db.update_document(current_doc)

# CHANGE TO ARANGO FUNCTION
The following function currently searches each page for a given string. I want to change this to an ArangoDB Analyzer if possible.
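
As an interim step short of an Analyzer, the same search could be pushed into the database with a plain AQL filter. This is only a sketch: the function name `find_pages_with_phrase` is hypothetical, it assumes the `Page` collection stores the text in a `content` field as in the cells above, and it still scans every document rather than using an index.

```python
def find_pages_with_phrase(db, phrase):
    """Return [{'page': key, 'content': text}, ...] for pages containing phrase.

    `db` is assumed to be a python-arango database handle like the one
    created at the top of this notebook.
    """
    # CONTAINS and LOWER are AQL string functions; @phrase is a bind parameter,
    # so the search text is never spliced into the query string.
    query = """
        FOR p IN Page
            FILTER CONTAINS(LOWER(p.content), LOWER(@phrase))
            RETURN { page: p._key, content: p.content }
    """
    cursor = db.aql.execute(query, bind_vars={'phrase': phrase})
    return list(cursor)
```

Calling `find_pages_with_phrase(db, "Henry")` would return the matching pages as a list of dictionaries, which can then be printed or filtered further.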

In [10]:
def find_lines_with_phrase(specific_string):
    # Case-insensitive substring search over every page.
    # `cains_jawbone` is assumed to be a list of page texts in page order,
    # defined elsewhere in the notebook.
    specific_string = specific_string.lower()
    matches = [
        (page_num, page)
        for page_num, page in enumerate(cains_jawbone, start=1)
        if specific_string in page.lower()
    ]

    if not matches:
        print("no")
    else:
        # Display each matching page under its page number.
        print("\n**** Lines containing \"" + specific_string + "\" ****\n")
        for page_num, page in matches:
            print("Page ", page_num)
            print(page, end="")
            print('\n\n\n')
        print()

We have 4 pages (66, 67, 73, and 74) that are most likely paired with each other, and 12 pages containing poems (12, 23, 41, 49, 86, 92 and 13, 24, 42, 50, 87, 93) that are also likely paired together in some way.

This means that there are only 2 ways to arrange the incomplete pages, and 6! (720) ways to arrange the pages containing the poems.

This does not account for the pages directly before or after these pairs, only the pairs themselves, which can still appear anywhere in the novel. 

However, we have 9 pages that have a limited number of predecessors. 

Setting those 16 pages aside leaves 84 unconstrained pages, so I think we can decrease the total number of possible orderings to:

84! * 6! * 2! 

This has saved us a few millennia. We are now looking at 6.38E122 millennia to read all possible combinations. Think of all the things you can get done now that you aren't wasting
12470323560952561367028935412358888882412853902766811155097386706704162155279925109869007431432624370574548868995653362416626605197646505954304956825600.000000 millennia reading the wrong combination of Cain's Jawbone!
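
To put the reduction in perspective, the estimate can be compared directly with the original 100! orderings. This is a small sketch using only the standard library, and the variable names are my own.

```python
import math

# Every possible ordering of the 100 pages.
total_orderings = math.factorial(100)

# The estimate above: 84 unconstrained pages, 6! poem pairings,
# and 2! arrangements of the incomplete pages.
reduced_orderings = math.factorial(84) * math.factorial(6) * math.factorial(2)

# Python integers have arbitrary precision, so the factorials are exact.
reduction_factor = total_orderings // reduced_orderings

print(f"6! = {math.factorial(6)}")
print(f"orderings shrink by a factor of about {reduction_factor:.3e}")
```

Reading time is proportional to the number of orderings, so it shrinks by the same factor, which is on the order of 10^28.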


In [None]:
After_end_analysis=math.factorial(84) * math.factorial(6)*2
after_end_analysis_time = how_long_to_read(After_end_analysis, 238)
time_saved = init_time-after_end_analysis_time
'{:f}'.format(time_saved)

# Who's Who? 
Each page mentions specific people. Some are characters, some are authors like Charlotte Bronte, some are animals, and some aren't people at all. Some people might go by multiple names, and we aren't sure who is who yet. For now, I am iterating through all the pages and building a list of characters, which gives us a place to start. The following code iterates through a range of pages (so I can run it in short bursts) and asks for the names of the characters on each page. It then adds a document for each new character, or updates the existing one as appropriate. Sometimes the new page doesn't print right away. If that's the case, just write blank.

In [16]:
start = int(input("Start on What Page?"))
end = int(input("End on What Page?")) #last on 41
for i in range(start,end):
    _id = "Page/" +str(i)
    current_doc = db.document(_id)
    
    who = []
    print("---------", _id,"-----------")
    print(current_doc['content'])
    
    names_mentioned = input("Enter Names of individuals mentioned on this page, enter -1 to stop.\n")
    while names_mentioned != "-1":
        who.append(names_mentioned)
        names_mentioned = input("Another name? enter -1 to stop.\n")
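    for person in who:
        # Build the document key from the name, e.g. "People/charlotte_bronte".
        lookup = "People/" + person.lower().replace(" ", "_")
        if not db.has_document(lookup):
            # First mention: create the person with this page.
            people.insert({'_id': lookup, 'Page': [_id]})
        else:
            current_person = db.document(lookup)
            # Avoid duplicate entries when a page is processed more than once.
            if _id not in current_person['Page']:
                current_person['Page'].append(_id)
                people.update(current_person)
            print(current_person)
    continue_prompt = input("Would you like to check another page? Y/N")
    if continue_prompt.strip().upper() == "N":
        break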

--------- Page/41 -----------
And then with horrid clearness I had seen a woman---not actually, if I could trust myself, there ; but aiming, directing, inspiring : slim, tawny, petulant, self-willed : wanton, but too calculated to be more than mistress of herself ; the kind that had made England terribly at sea. I looked back on my own youth ; I had been about a bit, as they say ; sometimes, to catch a whale, I had cast a sprat over the windmill. But it was not till my marriage with Henry that old Charles Goodfellow dared to hint that I was going gay. Poor lonely little Bat. But it was still the first dog, I couldn’t help realising that, after my husband’s training. Just as I could not help realising that, had I a mind to go there, I could now get moled and isled on the Selfridget side, though by no means in Bond Street. When I said means, I meant of course lawful ones. Then I remembered Henry’s favourite quotation : But M’Cullough ‘e wanted cabins with marble and maple and all And Bru