## Section Headings Curation and Sentence Splitter for Chilean Policies

This notebook contains a series of dictionaries and methods to curate the section headings of Chilean policies. These policies have a fairly regular structure in which the law text is organized under section headings. There are two kinds of sections: general ones, which are found in many policies, and specific ones. The general section headings often come with a whole range of name variants, which makes machine recognition difficult.

The goal of this notebook is to gather all the preprocessing methods that harmonize section headings, so that further processing is machine friendly.
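
As a rough illustration of what harmonizing name variants means, the sketch below reduces three made-up heading variants to one canonical string. The names `PREFIX`, `strip_accents` and `normalise_heading` are hypothetical and are not the functions defined later in this notebook; the sample headings are invented for the example.

```python
import re
import unicodedata

# Hypothetical pattern: a leading "CAPITULO"/"TITULO" word plus its number token.
PREFIX = re.compile(r'^(CAPITULO|TITULO)\s+\S+\s*')

def strip_accents(text):
    # Decompose accented characters and drop the combining marks.
    # Note: this also turns "ñ" into "n", which may not always be wanted.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")

def normalise_heading(heading):
    heading = strip_accents(heading).upper()
    heading = PREFIX.sub("", heading)            # drop "CAPITULO I", "TITULO II." ...
    heading = re.sub(r'[^\w\s]', " ", heading)   # punctuation to spaces
    return re.sub(r'\s+', " ", heading).strip()  # collapse whitespace

# Made-up variants of the same heading.
variants = [
    "CAPÍTULO I DISPOSICIONES GENERALES",
    "Capitulo 1: Disposiciones  Generales",
    "TÍTULO II. Disposiciones generales",
]
print({normalise_heading(v) for v in variants})
```

All three variants collapse to the same string, which is the property the dictionaries and regular expressions below try to achieve on real headings.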

In [1]:
from pathlib import Path
import boto3, json, operator, os, re, string
import numpy as np

### Dictionaries of particular vocabularies to help in the curation of section headings

In [127]:
# Most policies end with the final signatures. This is a piece of text that we want to be able to recognize. To make the
# detection of signatures easier, this dictionary contains the most common terms that can be found in these lines of text.
official_positions = {"ALCALDE" : 0,
"Alcalde" : 0,
"MINISTRA" : 0,
"Ministra" : 0,
"MINISTRO" : 0,
"Ministro" : 0,
"PRESIDENTA" : 0,
"Presidenta" : 0,
"PRESIDENTE" : 0,
"Presidente" : 0,
"REGIDOR" : 0,
"Regidor"  : 0,
"REGIDORA" : 0,
"Regidora" : 0,
"SECRETARIA" : 0,
"Secretaria" : 0,
"SECRETARIO" : 0,
"Secretario" : 0,
"SINDICA" : 0,
"Sindica" : 0,
"SINDICO" : 0,
"Sindico" : 0,
"VICEPRESIDENTA" : 0,
"Vicepresidenta" : 0,
"VICEPRESIDENTE" : 0,
"Vicepresidente" : 0
}

end_of_file_tags = {
    "Anótese" : 0,
    "Anotese" : 0,
    "Publíquese" : 0,
    "Publiquese" : 0
}
# This dictionary contains correspondences among different heading texts. It is under development and needs further
# improvement. The idea is to merge under a single name all the headings that point to the same concept. For example,
# "Definiciones" is a heading that can come alone or together with other terms, so it can appear as "Definiciones básicas" or
# "Definiciones generales". With the dictionary we can fetch all headings that contain the key "DEFINICIONES" and change the
# heading to the merged name, here "DISPOSICIONES GENERALES".
merges = {
    "CONCEPTOS" : "DISPOSICIONES GENERALES",
    "Considerando:" : "CONSIDERANDO",
    "DEFINICIONES" : "DISPOSICIONES GENERALES",
    "DISPOSICIONES FINALES" : "DISPOSICIONES GENERALES",
    "DISPOSICIONES GENERALES" : "DISPOSICIONES GENERALES",
    "DISPOSICIONES PRELIMINARES" : "DISPOSICIONES GENERALES",
    "DISPOSICIONES REGULADORAS" : "DISPOSICIONES ESPECIALES",
    "DISPOSICIONES RELATIVAS" : "DISPOSICIONES ESPECIALES",
    "DISPOSICIONES ESPECIALES" : "DISPOSICIONES ESPECIALES",
    "DISPOSICIONES TRANSITORIAS" : "DISPOSICIONES GENERALES",
    "DISPOSICIONES VARIAS Y TRANSITORIAS" : "DISPOSICIONES GENERALES",
    "DISPOSICIONES VARIAS" : "DISPOSICIONES GENERALES",
    "INCENTIVOS" : "INCENTIVOS",
    "INFRACCIONES" : "INFRACCIONES",
    "INFRACCION ES" : "INFRACCIONES",
    "OBJETIVO" : "OBJETO",
    "OBJETO" : "OBJETO",
    "DERECHOS" : "DERECHOS, OBLIGACIONES Y PROHIBICIONES",
    "DEBERES" : "DERECHOS, OBLIGACIONES Y PROHIBICIONES",
    "OBLIGACIONES" : "DERECHOS, OBLIGACIONES Y PROHIBICIONES",
    "OBLIGACIONE" : "DERECHOS, OBLIGACIONES Y PROHIBICIONES",
    "OBLIGACION" : "DERECHOS, OBLIGACIONES Y PROHIBICIONES",
    "OBLIGATORIEDAD" : "DERECHOS, OBLIGACIONES Y PROHIBICIONES",
    "PROHIBICIONES" : "DERECHOS, OBLIGACIONES Y PROHIBICIONES",
    "PROHIBICION" : "DERECHOS, OBLIGACIONES Y PROHIBICIONES",
    "DE LAS FORMAS DE AUTORIZACION" : "PERMISOS",
    "DE LOS PERMISOS Y LAS PATENTES" : "PERMISOS",
    "DE LOS PERMISOS" : "PERMISOS",
    "DE LAS SOLICITUDES DE PERMISOS" : "PERMISOS",
    "DEL OTORGAMIENTO DEL PERMISO" : "PERMISOS",
    "POR TANTO" : "POR TANTO",
    "POR LO TANTO" : "POR TANTO",
    "Decreto:" : "RESUELVO",
    "Resuelvo:" : "RESUELVO",
    "Se resuelve" : "RESUELVO",
    "S e  r e s u e l v e:" : "RESUELVO",
    "R e s u e l v o:" : "RESUELVO",
    "FISCALIZACION Y SANCIONES" : "SANCIONES",
    "DE LAS SANCIONES" : "SANCIONES",
    "Visto:" : "VISTO",
    "Vistos:" : "VISTO",
    "Vistos estos antecedentes:" : "VISTO",
    "--------------" : "HEADING"
}
section_tags = {
    "Considerando:" : "CONSIDERANDO",
    "Considerando\n" : "CONSIDERANDO",
    "Decreto:" : "RESUELVO",
    "Resuelvo:" : "RESUELVO",
    "R e s u e l v o:" : "RESUELVO",
    "Se resuelve" : "RESUELVO",
    "S e  r e s u e l v e:" : "RESUELVO",
    "Visto:" : "VISTO",
    "Vistos:" : "VISTO",
    "Vistos estos antecedentes:" : "VISTO",
    "--------------" : "HEADING"
}
merges_lower = {}
for key, value in merges.items():
    merges_lower[key.lower()] = value
# Even though accents are often omitted on uppercase letters in Spanish, there are many cases where a word in a heading
# appears accented. This is a dictionary to harmonize all headings without accents. The list is rather comprehensive, but there is
# still room for improvement.
# If we find a bug beyond the simple misspellings that a spell checker would solve, we can include it here. The example is in
# the first row with "ACTIVIDADESUSOS", which was found several times in headings.
bugs = {"ACTIVIDADESUSOS" : "ACTIVIDADES DE USOS"}
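
The cell above builds `merges_lower`, while `merge_concepts` (defined further down) matches keys case-sensitively against `merges`. As a minimal sketch of how a lowercased table could be used, the hypothetical helper `merge_concepts_ci` below does a case-insensitive substring lookup. It uses only three entries copied from `merges`, and the test headings are invented.

```python
# Three entries copied from `merges`; the full table is defined in the cell above.
sample_merges = {
    "DEFINICIONES": "DISPOSICIONES GENERALES",
    "OBJETIVO": "OBJETO",
    "Visto:": "VISTO",
}
merges_lower = {key.lower(): value for key, value in sample_merges.items()}

def merge_concepts_ci(line, table=merges_lower):
    # Hypothetical case-insensitive variant of merge_concepts:
    # return the merged name of the first key found in the line.
    lowered = line.lower()
    for key, target in table.items():
        if key in lowered:
            return target
    return line  # no key matched, keep the heading as it is

print(merge_concepts_ci("Definiciones básicas"))
print(merge_concepts_ci("VISTO: lo dispuesto"))
print(merge_concepts_ci("Ámbito de aplicación"))
```

Because the lookup returns on the first matching key, the order of the dictionary entries still decides which merged name wins when a heading contains several keys.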

### Connection to the AWS S3 bucket
To run this cell you need Omdena's credentials. Please keep them local and do not sync them to GitHub repos or cloud drives. Before doing anything with this JSON file, think about security!

In [147]:
json_folder = Path("C:/Users/user/Google Drive/Els_meus_documents/projectes/CompetitiveIntelligence/WRI/Notebooks/credentials/")
# json_folder = Path("C:/Users/jordi/Google Drive/Els_meus_documents/projectes/CompetitiveIntelligence/WRI/Notebooks/credentials/")
filename = "Omdena_key_S3.json"
file = json_folder / filename

with open(file, 'r') as f:
    cred = json.load(f) 

for key in cred:
    KEY = key
    SECRET = cred[key]

s3 = boto3.resource(
    service_name = 's3',
    region_name = 'us-east-2',
    aws_access_key_id = KEY,
    aws_secret_access_key = SECRET
)
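
The loop over `cred` above keeps whichever pair comes last, which suggests the JSON file holds a single access key and secret. Under that assumption, a slightly more defensive loader could look like the sketch below. The function name `load_single_credential` and the placeholder values are hypothetical, and the demonstration writes a throwaway file instead of touching real credentials.

```python
import json
import tempfile
from pathlib import Path

def load_single_credential(path):
    """Return (key_id, secret) from a JSON file holding exactly one pair."""
    with open(path, "r", encoding="utf-8") as handle:
        cred = json.load(handle)
    if not isinstance(cred, dict) or len(cred) != 1:
        raise ValueError("expected a JSON object with exactly one key/secret pair")
    (key_id, secret), = cred.items()
    return key_id, secret

# Demonstration with a throwaway file and placeholder values (not real credentials).
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo_key.json"
    demo.write_text(json.dumps({"EXAMPLE_KEY_ID": "example-secret"}), encoding="utf-8")
    key_id, secret = load_single_credential(demo)
    print(key_id)
```

Failing loudly on an unexpected file layout avoids silently passing the wrong pair to `boto3.resource`.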

### Regular expressions

In [111]:

# To clear HTML tags (here, basically to remove the page tags)
cleanr = re.compile(r'<.*?>')

# To catch accents and dictionary to change them
accents_out = re.compile(r'[áéíóúÁÉÍÓÚ]')
accents_dict = {"á":"a","é":"e","í":"i","ó":"o","ú":"u","Á":"A","É":"E","Í":"I","Ó":"O","Ú":"U"}

# To remove special characters
clean_special_char = re.compile(r'(\*\.)|(\”\.)')

# To search for acronyms
clean_acron = re.compile(r'(A\s*\.M\s*\.)|(\bart\s*\.)|(\bArt\s*\.)|(\bART\s*\.)|(\bArts\s*\.)|(\bAV\s*\.)|(\bDr\s*\.)|(\bIng\s*\.)|(\bLic\s*\.)|(\bLicda\s*\.)|(\bLIC\s*\.)|(mm\s*\.)|(mts\s*\.)|(\bNo\s*\.)|(P\s*\.M\s*\.)|(prof\s*\.)|(profa\s*\.)|(sp\s*\.)|(ssp\s*\.)|(sr\s*\.)|(sra\s*\.)|(to\s*\.)|(ta\s*\.)|(var\s*\.)')  

# Remove extra white spaces
whitespaces = re.compile(r'[ ]{2,}')

# Regular expression to clear punctuation from a string
clean_punct = re.compile('[%s]' % re.escape(string.punctuation))
# Regular expressions to clear words that introduce unnecessary variability to headings. Some of them still do not work 100%
# and need to be improved.
clean_capitulo = re.compile(r'(APARTADO \S*)|(APARTADO\s)|(^ART\.\s*\S*)|(^ART\.\s*)|(^Art\.\s*\S*)|(^Art\.\s*)|(^Arts\.\s*\S*)|(Capítulo \S*)|(CAPITULO \S*)|(CAPITULO\S*)|(CAPÍTULO \S*)|(CAPITULÓ \S*)|(CAPITULOS \S*)|(CAPITUO \S*)|(CATEGORIA\b)|(CATEGORÍA\b)|(SUBCATEGORIA\b)|(SUBCATEGORÍA\b)|(TITULO\s\S*)|(TÍTULO\s\S*)')
clean_bullet_char = re.compile(r'\b[A-Za-z]\s*\.|\b[A-Za-z]\s*\.\s*|\b[A-Za-z]\s*\-\s*|\b[A-Za-z]\s*\)\s*|\.\s*\b[B-Za-z]\b|\b[A-Z]{1,4}\s*\.|^\d+\s*\.\s*\D+|\d+\)')
clean_bullet_point = re.compile(r'^-\s*')
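
The accent handling above pairs a regular expression with a replacement dictionary. As an alternative sketch, not the notebook's method, the same ten characters can be mapped in one pass with `str.translate`. The names `ACCENT_TABLE` and `remove_accents_translate` are hypothetical.

```python
# Same ten characters as accents_dict, expressed as a translation table.
ACCENT_TABLE = str.maketrans("áéíóúÁÉÍÓÚ", "aeiouAEIOU")

def remove_accents_translate(text):
    # One pass over the string; characters outside the table (ñ, ü, ...) are kept.
    return text.translate(ACCENT_TABLE)

print(remove_accents_translate("CAPÍTULO Único: Disposición"))
print(remove_accents_translate("año"))
```

Like `accents_dict`, this table leaves "ñ" and "ü" untouched.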

### Functions

In [153]:

# Function to calculate the uppercase ratio in a string. It is used to detect section headings
def uppercase_ratio(string):
    if len(re.findall(r'[a-z]',string)) == 0:
        return 1
    else:
        return(len(re.findall(r'[A-Z]',string))/len(re.findall(r'[a-z]',string)))

def end_of_heading(line, flag, content, counter):
    if "URL" in line and "https:" in line:
        flag = False
        content = False
        counter = 0
        return flag, content, counter
    else:
        return flag, content, counter

def is_section(line):
    section = False
    for key in section_tags:
        if key in line:
            section = True
            break
    return section
            
def end_of_document(line):
    end_of_file = False
    for key in end_of_file_tags:
        if key in line:
            end_of_file = True
            break
    return end_of_file

# Function to clear HTML tags
def clean_html_tags(string):
  return cleanr.sub('', string)
    
def is_por_tanto(line):
    if "POR TANTO" in line:
        return True
    else:
        return False

# Function to remove the last lines of a document, the ones that contain the signatures of the officials. It depends on the
# dictionary "official_positions"
def remove_signatures(line):
    signature = False
    for key in official_positions:
        if key in line:
            signature = True
            break
    return signature

# Function to replace accented characters with their non-accented counterparts. It depends on the dictionary "accents_dict"
def remove_accents(string):
    for accent in accents_out.findall(string):
        string = string.replace(accent, accents_dict[accent])
    return string

# Function to merge headlines expressing the same concept in different words. It depends on the dictionary "merges"
def merge_concepts(line):
    for key in merges:
        if key in line:
            line = merges[key]
            break
    return line

def clean_bugs(line):
    for key in bugs:
        if key in line:
            line = line.replace(key, bugs[key])
    return line

def clean_special_characters(line):
    char = clean_special_char.findall(line)
    for item in char:
        for character in item:
            if character != '':
                line = line.replace(character, "")
    return line

def clean_acronyms(line):
    acro = clean_acron.findall(line)
    for item in acro:
        for acronym in item:
            if acronym != '':
                line = line.replace(acronym, clean_punct.sub('', acronym))
    return line

def clean_whitespace(line):
    # Collapse runs of spaces and trim both ends. String methods never return None, so no fallback branch is needed.
    return whitespaces.sub(' ', line).strip()

decimal_points = re.compile(r'(\b\d+\s*\.\s*\d+)')
def change_decimal_points(line):
    dec = decimal_points.findall(line)
    for decimal in dec:
        if decimal != '':
#             print(decimal)
            line = line.replace(decimal, clean_punct.sub(',', decimal))
    return line
                
# Function to strip chapter/article prefixes and bullet markers from a sentence
def clean_sentence(string):
    string = clean_capitulo.sub('', string)
    string = clean_bullet_char.sub('', string).rstrip().lstrip()
    string = clean_bullet_point.sub('', string).rstrip().lstrip()
#     string = clean_punct.sub('', string).rstrip().lstrip()
    return string
    
# points = re.compile(r'(\b\w+\s*\.\s*\b[^\d\W]+)')
# def check_points(line):
#     return points.findall(line)
#     print(points.findall(line))

points = re.compile(r'(\b\w+\b\s*){3,}')
def check_sentence(line):
    if points.findall(line):
        return True
    else:
        return False

def split_into_sentences(line, sep):
    sentence_list = []
    for sentence in line.split(sep):
        if check_sentence(sentence):
            sentence = sentence.rstrip().lstrip()
            sentence_list.append(sentence)
    return sentence_list

# Function to add items to the dictionary with duplicate removal
def add_to_dict(string, dictionary, dupl_dict):
    if string in dupl_dict or string == None:
        pass
    else:
        dupl_dict[string] = 0
        if string in dictionary:
            dictionary[string] = dictionary[string] + 1
        else:
            dictionary[string] = 1
    return dictionary
def full_cleaning(line):
    line = clean_html_tags(line)
    line = remove_accents(line)
    line = clean_special_characters(line)
    line = clean_bugs(line)
    line = clean_acronyms(line)
    line = clean_whitespace(line)
    line = clean_sentence(line)
    return line
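
One detail of `uppercase_ratio` is worth checking on sample input: it returns 1 both for lines without lowercase letters and for lines with exactly as many uppercase as lowercase letters, and also for lines with no letters at all. The sketch below reproduces the function and compares it with a stricter, hypothetical check `has_no_lowercase`. The sample strings are invented, and whether such lines occur in the corpus is not known.

```python
import re

def uppercase_ratio(text):
    # Same logic as the notebook function above.
    lower = len(re.findall(r'[a-z]', text))
    if lower == 0:
        return 1
    return len(re.findall(r'[A-Z]', text)) / lower

def has_no_lowercase(text):
    # Hypothetical stricter check: at least one uppercase letter and no lowercase ones.
    return bool(re.search(r'[A-Z]', text)) and not re.search(r'[a-z]', text)

samples = [
    "DISPOSICIONES GENERALES",   # a real heading shape
    "AaBbCcDdEeFf",              # as many uppercase as lowercase letters
    "01/01/2020 - 31/12/2020",   # no letters at all
    "Articulo 1 del reglamento",
]
for sample in samples:
    print(repr(sample), uppercase_ratio(sample) == 1, has_no_lowercase(sample))
```

Only the first sample passes the stricter check, while the first three all satisfy `uppercase_ratio(line) == 1`.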

In [30]:
test_string = "Que el Art. 204 Ordinal 3*. y 5”. de la Constitución, regula. A. Hola, em dic Jordi. B. No sé massa perquè l'Art *. 22 conté 22.34€. Tanmateix sembla que la Licda. una cosa. voldria  55.22. no fotis"
# test_string = "Prova senzilleta per veure què passa si no hi ha punt"
test_string = clean_sentence(test_string)
test_string = clean_special_characters(test_string)
test_string = clean_acronyms(test_string)
test_string = clean_whitespace(test_string)
test_string = change_decimal_points(test_string)
print(test_string)
# split_into_sentences needs the separator as its second argument
sentences = split_into_sentences(test_string, ".")
print(sentences)
# print(sentences)

# if check_sentence(test_string):
#     

Que el Art 204 Ordinal 3 y 5 de la Constitución, regula. Hola, em dic Jordi. No sé massa perquè l'Art 22 conté 22,34€. Tanmateix sembla que la Licda una cosa. voldria 55,22. no fotis
['Que el Art 204 Ordinal 3 y 5 de la Constitución, regula', 'Hola, em dic Jordi', "No sé massa perquè l'Art 22 conté 22,34€", 'Tanmateix sembla que la Licda una cosa']
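
The test above works because decimal points are replaced before the text is split on periods, and fragments with fewer than three words are dropped. A compact, self-contained sketch of that idea follows. It is a simplified variant, not the notebook's exact functions: the decimal pattern here only matches a period directly between two digits, and the name `split_protecting_decimals` and the sample sentence are made up.

```python
import re

# Simplified assumption: a period directly between two digits is a decimal point.
decimal_point = re.compile(r'(?<=\d)\.(?=\d)')
# Same three-word test as the `points` pattern in the functions cell.
three_words = re.compile(r'(\b\w+\b\s*){3,}')

def split_protecting_decimals(line, sep="."):
    line = decimal_point.sub(",", line)            # 22.34 -> 22,34
    parts = (part.strip() for part in line.split(sep))
    return [part for part in parts if three_words.search(part)]

text = "El area total es de 22.34 hectareas. Ver anexo. Se aplica a todo el territorio."
print(split_protecting_decimals(text))
```

The two-word fragment "Ver anexo" is discarded, and the decimal number no longer splits the first sentence in two.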


### Pipeline to process files from S3 bucket
By executing these cells you will go through all the Chilean policies in the bucket and process their section headings, which are saved in a dictionary. This should be merged with the notebook that builds the final JSON files out of plain txt files.
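
The dictionary built by the main pipeline cell in this section is nested as document, then section, then sentence id. The sketch below builds a tiny instance of that shape by hand. The helpers `new_section` and `add_sentence`, the document id and the sentence texts are all hypothetical and only illustrate the layout.

```python
import json

def new_section():
    # Each section carries a list of tags and a dictionary of sentences.
    return {"tags": [], "sentences": {}}

def add_sentence(document, section, doc_id, counter, text):
    # Sentence ids are the first 7 characters of the file name plus a running counter.
    sentence_id = doc_id[0:7] + "_" + str(counter)
    document.setdefault(section, new_section())
    document[section]["sentences"][sentence_id] = {"text": text, "labels": []}
    return sentence_id

doc_id = "0123456789abcdef"   # placeholder, not a real file name
document = {"HEADING": new_section()}
add_sentence(document, "HEADING", doc_id, 1, "Tipo Norma: Decreto")
add_sentence(document, "VISTO", doc_id, 2, "Lo dispuesto en la ley")
print(json.dumps({doc_id: document}, indent=2, ensure_ascii=False))
```

Since sentence ids are derived from a 7-character prefix of the file name, they are only unique across documents if that prefix is unique.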

In [17]:
in_folder = "text-extraction/"
out_folder = "JSON/"
counter = 0
name4 = {}
name5 = {}
name6 = {}
name7 = {}
for obj in s3.Bucket('wri-latin-talent').objects.all().filter(Prefix='text-extraction'):
    if in_folder in obj.key and obj.key.replace(in_folder, "") != "":  # and filename in obj.key   (uncomment this condition to run the code on a single sample document)
        file = obj.get()['Body'].read().decode('utf-8')  #get the file from S3 and read the body content
        lines = file.split("\n") # Split by end of line and pipe lines into a list
        file_name = obj.key.replace(in_folder, "").replace('.pdf.txt', '')        
        name4[file_name[0:4]] = 0
        name5[file_name[0:5]] = 0
        name6[file_name[0:6]] = 0
        name7[file_name[0:7]] = 0

        counter += 1

In [None]:
print(counter)
print(len(name4))
print(len(name5))
print(len(name6))
print(len(name7))
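
The counts printed above compare how many distinct file name prefixes exist at lengths 4 to 7. Presumably this is used to pick a prefix length that still identifies each document, since sentence ids are built from `filename[0:7]`. Under that assumption, the hypothetical helper below finds the shortest such length directly. The sample names are shortened versions of file names from this bucket.

```python
def shortest_unique_prefix_length(names):
    # Return the smallest prefix length at which all names are distinct,
    # or None if no such length exists.
    names = set(names)
    longest = max((len(name) for name in names), default=0)
    for length in range(1, longest + 1):
        if len({name[:length] for name in names}) == len(names):
            return length
    return None

sample = ["00a55afe", "00a4528a", "00abdb97", "010c03f3"]
print(shortest_unique_prefix_length(sample))
```

For these four shortened names the first three characters are not enough, so the function returns 4.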

In [155]:
in_folder = "Chile/full/"
out_folder = "JSON/"
# filename = "00a55afe4f55256567397a68df5d7f97e642480b" # This is only if you want to test on a single file
bag_of_words = {}  # Section headings and the number of documents in which each one appears
# sentences = []
# sentences_dict = {}
json_file = {}
for obj in s3.Bucket('wri-latin-talent').objects.all().filter(Prefix='Chile/full/'):
    if in_folder in obj.key and obj.key.replace(in_folder, "") != "":  # and filename in obj.key   (uncomment this condition to run the code on a single sample document)
        file = obj.get()['Body'].read().decode('utf-8')  # Get the file from S3 and read the body content
        lines = file.split("\n") # Split by end of line and pipe lines into a list
        key = obj.key.replace(in_folder, out_folder).replace('pdf.txt', 'json')
        filename = key.replace('.json', '').replace(out_folder, '')
        print(filename)
        json_file[filename] = {}
        line_counter = 0
        heading_flag = True
        heading_content = False
        has_section = False
        duplicates_dict = {}  # Assumption: reset per document, so add_to_dict counts each heading once per document
        json_file[filename]["HEADING"] = {"tags" : [], "sentences" : {}}
        for line in lines:
            line = clean_whitespace(line)
            # Processing document heading
            if heading_flag:
                if "Tipo Norma" in line:
                    heading_content = True
                if heading_content:
                    line = full_cleaning(line)
                    if line != None:
                        if ":" in line:
                            line_counter += 1
                            sentence_id = filename[0:7] + '_' + str(line_counter)
                            json_file[filename]["HEADING"]["sentences"][sentence_id] = {"text" : line, "labels" : []}
                            json_file[filename]["HEADING"]["sentences"][sentence_id]["text"] = line
                        else:
                            json_file[filename]["HEADING"]["sentences"][sentence_id]["text"] = json_file[filename]["HEADING"]["sentences"][sentence_id]["text"] + " " + line
#                 print("**", line)
                heading_flag, heading_content, line_counter = end_of_heading(line, heading_flag, heading_content, line_counter)

            # Breaking when document signatures are found    
            elif end_of_document(line):
#                 print(line)
                break
            # Getting section headings
            elif (uppercase_ratio(line) == 1 and len(line) > 10 and line_counter > 0) or is_section(line):
#                 print("line--", line)
                line = remove_accents(line)
                line = clean_bugs(line)
                line = clean_sentence(line)
                if not line:  # clean_sentence returns an empty string, never None
                    continue
                else:
                    has_section = True
                    section = merge_concepts(line)
#                     print("**", section)
                    json_file[filename][section] = {"tags" : [], "sentences" : {}}
                    bag_of_words = add_to_dict(section, bag_of_words, duplicates_dict)
                    if section == "VISTO" and len(split_into_sentences(line, ":")) > 1:
                        visto = split_into_sentences(line, ":")
                        for sentence in split_into_sentences(visto[1], ";"):
                            line_counter += 1
                            sentence_id = filename[0:7] + '_' + str(line_counter)
                            json_file[filename][section]["sentences"][sentence_id] = {"text" : sentence, "labels" : []}

            elif has_section:
                line = full_cleaning(line)  # Keep the cleaned text; the result was previously discarded
                if line == None:
                    continue                    
                else:
                    line = change_decimal_points(line)
                    for sentence in split_into_sentences(line, "."):
                        line_counter += 1
                        sentence_id = filename[0:7] + '_' + str(line_counter)
                        json_file[filename][section]["sentences"][sentence_id] = {"text" : sentence, "labels" : []}
#         s3.Object('wri-latin-talent', key).put(Body = str(json.dumps(json_file)))#This will save all the contents in the string variable "content" into a txt file in the Pre-processed folder
        


00029986237f77d713d2cf6451fb9b4a88eeeddd
001d742a7ee8b2e5b2583e0683d5e8dcb59b65f7
00251ab539abbc4c046593236e65d462983888ae
002c53058e85d383b057fa4cc25a6eb8e7d401e3
0031d55c90473158c09acded547d67d44be22325
003bebe9c5240e82e6de09f37b43cd4fdfdc0aeb
00434763867d6101054003829bfcba002581e042
0058c31b175befc120f2d0bebeabc42f02fbee3b
0066d2d2807026ad0439315e32fb6f8ab5d94af4
00a4528ab74327e2e52371a04741a6b677d80889
00abdb97553fd6745ba97425ff9a30fa8eb370c1
00b277afb2e446fc065815fb3dc14b114ab802f1
00d7509c395ad525fa4155a252fb8defaf1e4272
00d91bcac2667290e3fb42452f5241be28b031c2
00dce06383ae811f931dc45cba2264415df5b72e
010c03f3b734f9ee20c368573841516f798b242e
0117f4f5f4e792bb00aaa18da79d0562a0f3d3cd
011c604576f0a76dc3f485eae864546c899dccf2
011ef5183b901654ae82d86d43e5127ae982d032
01203a974410a65782afca6ff2c3bdb24a84b158
012fe65d33e065e1c381601f69644869ce4ce3dd
013a174010d4b8922489c85ccb2eb1ddf75070d4
0143181b7331514c749b91578a0dcc895a99f6e7
014c682777557d5a7333f178459e4959460dc9d2
01521829cda7b792

0afbe62a106b6bb8f3e428ebcede722b814bcb45
0b147b03fae8b3749c7c87cf71cc60cf65555972
0b1eeb72b07be710a741905d7e881d25104573a4
0b3e572f38a4b12c3ac0bc39b2dcdbe1b1899ab1
0b4f8a204086fdbee9d94e4787ed4922d50f4f3b
0b70130c5bf376068fddd600abd4283e0ac890b5
0b7b9a648430862c1f3ceed43975d5e829bc249e
0b7e6112a08b5ef659d584cfe7f0fe44a5fa0fbd
0b930613a9e2edc4aa74459158571cf18cf88369
0ba73e1e07593c7fb0d57e51d7ac098e92900f43
0bb3ec20217cfb7c7f284233a4ea8c7bd4306139
0bbb09327383c3cc7800f7f967ac5db35824ef3f
0bc0265fc3791c224e2749c5618f17f840efa1f6
0bc88d272496f9c607746e42f645dd66d70c1211
0bd631ef85785f3c67f82887616169a3e03c5a69
0bd79229d053d3b541b44dda3e3b8d998862acd0
0bdf6b16840d1dd541c3f88d11790d0d12c5f1a5
0be53663d79736adbc1afb2233d158575565065b
0c00321d185c0509cf523e231771f8b8ecc8fcda
0c0c7cb4bd67b3d02f470c79f920b3976ae236d4
0c0d05034c0ee120e3dd21907aafb223afbfb82b
0c10275a68d2cf7289543541b1b39c2e70a59b11
0c111bc04f96eb74cff0994324409601622a6825
0c27bf104dfd31a1126085587e8cbcf56897406b
0c424a0294c1a155

1596ac9fa31dd922455d9c730538beecfff9afff
15b1a56ab1616c0846d260420b17c569ed9ddcc0
15b7d11e1fab5f2db6d7b13f23db3e33abf5dba9
15c0f430477578acf87795d6eb36620d73900d25
15c8f2bff9b47dfec8c8513dec7e92c949ba1979
15d8bce39a38022f9d9cb563c30c66be4f6be71b
15da18582d618b1e09998b2283e6ff986b71650c
15df6c5e0c3fe155c9a0eea3217e30390e9878e6
15e0984ec98dbfebf76e91556519aff4dc5c0b0e
15e41ebdd10026ca4c70c71a525fa8a3b5c822d7
15f9c97d26b82e73b86c1dd20ee8e588cba7817f
161bb8da30f82d3d23f64e77f923671daadcd8e0
16241a798f82f2960b5b89eef789381d98beac3e
1629d99e4086f9f75bf4b4d7cd210932d825fe3a
162a419f9574d1bf372d4e586e278f8a2adc5043
1644d19808a33b90204c96f2f5f4787fc5976223
1649ae6176f2a64d126f8d799d5a2e78bdb611b6
165b846d1d03b08cdea1f0f83e0fb11c41b3aea9
16bae5eb0275dbc651661d2dbb967bf0f053e6ee
16c1d7d8f903469fd942f51460911b9e97f8ea75
16c3aedaf53c7de400fd54be2143cae6d2d61153
16f76a8bfcff5756e5b57bd4ebb15247eeeac011
16fa7cbff33cb72ce847855deb1d431612484d80
1702d377f29248f59e95dfc1b2203f664f2d426c
171015c688631f18

1fff589f6e0de0d3739f878ea52e2dee0d71f0da
2004d939d73d31ca3da4069bbaac2bc161712490
200a0b8450e9d6e4c07f7828d70b452434c5b8cc
20182ce9b91a0e110390dd78560cbd4405718fa0
20300512a4377a648ce07c0a0ed6aae2b271e9f1
2039ea0496d29da563ae5ebeb922690c84080fcd
205444f92ce5586b56fb44802d79c9b39c71b0bb
206b87117af8b34a2b779275cab10f3aa96609a0
2074a5a00f1be1fbceb9688c94d39ade8860ac20
20819709c71559440ae86ece949ad139f816ea70
2082854a36e9f45e20bd99541f093a7bf3725298
209284856707d68714acf58b5bd4fdcea4282525
2097e0b117531f12b73997232256fe228a57ab16
20a0c3f42b50203747f80c9a0fd3de60c208a24c
20a5fce72f0645100c1eafa992896048838af495
20e6870557b54215027cb60f2b6fe71dc10a4100
20e94b6b690d67993b4dd1822af3297030662867
20f41072c01d206616be7bad40520216c364fd78
20ff6eba5ef01991b2fbd0fafddec143d02c8f37
2103cfab4871b6b26b0d3ddad5971fdbb23ee492
2117a8fdcbf7e10003e59eb35b8476d124396eca
212756172aac8df20fd718550e07e15469c02206
2130715008c6edafbd690ab85ea922f0f8e56ab9
2135356d9f8b75b126d953cfe0d03027af0e202d
214b071b61664c62

2a8141b63350fce271ce17c6d9009e1f043769aa
2aa00ceff4eb74bfb259fcde5f06bffb3d6fd9b6
2aa245f482221c82d9011a54560b4449f60a032f
2adab34f756739ba912ab4c1998467e8ca3def64
2ae3a6db21ec34e666fb3fdfbb3f5c33ed3ae21a
2b1a208e19cafe508f909da2a416aa16a688555e
2b1d8026730e5d196c9a0069a79247647e10ab13
2b3fac988e164b88eec201c5f88f64603987d8eb
2b5530919aea16bcf1d5202fd469a1b7f7a054ce
2b58d535273d818ea346d26343a472825559e110
2b664938b7f2e6fb84edf02e76037bfadf31f19e
2b7b845c43e66d49fd5eb0c3633fd9db431a8568
2b957f2a2d0d6020006a7ed9cc1e7ab24d6aee7d
2bb3805c4c725d479752b504669c33b8769d43fb
2bb5cf10d8316fdc3e1bb4a9cb875b62e70bdbe6
2be356ad88a451ac2f9ae3786bb1a3b713f005cf
2be6f61014066c27b3257c27c3fd413ab67079bd
2beb09d71a71fa676ea59f5b4f8b4ff54163da89
2bec74e97c27fab39d06ffd46bbadf6f153f952b
2bf5c54dde29e9992fda0055f1e858f9bd9ebc72
2bf6e9571a715b29c16595f3547aba4b09a1fce7
2bfbda94ce2633ad65d4007450a1c9754beb1101
2c298d4de54ba7f98d4857edfdbb43ed046bf440
2c2ec8c610314597fed11bfc813b5c056f8c7a89
2c3301d18ff12232

3500b9274a4fc4efa29bdaf7d6a690ca30e2ff9d
3502444c237fcd6ebf3f1182179211f64597a800
350fe98d57777c43fdb20a14eb1405995184ae0a
351dcf4faa45603936e0dd0eb9da58a2d59c0d34
35213261242a00ef59cdee2477295f5bc0df9178
35387e5210de79ce39facae812925ee8dfadf051
355591ff0daa2edcb4e9cd30af0c8e696e9545cf
355628892f57d938f87a206e0858aff31dd212ad
35655b6fd3143b5b34f29d0d2deed3965a6669ce
356bb9abce14f17d09af1da6cb6599f75213f6cb
3572e8a7ac73d0cb58729fdb78ec4b959f66930c
357c7e54b1be25bb2d5e000c9108a6a18114002c
3584f48a5946019222a4bd3f169f974d358f55eb
3593243135d39990bc0d84e7c3679796b2e229ce
35961027584dbe46f18ab98998a799c52c7c00ec
35a1cf25a698e50a7814b6b28b58f69534980f0a
35af8cc74cc77bbd32293ad9f83aeedff2de33d8
35c0f8602c080b3fc8f97a350ee7d782983ead7a
35c31f6c828275218c188e609a3880731fcb4971
35c630734a9b96e53193a1e353d5204fa7e528d2
35c8ecd965014b2d936dde976aa5080ce30e6b0e
35d6c9913771ed8965eddd2d54dbacc7a51df941
35dd1529f888290eabae39556d73281edc284ade
35f24ec833e7aafe3ef04f714ce7b0587632f369
35f3e1fdde8764f0

3d181ac74fb0a8845ffd3352a2f6f092724660db
3d1ebdcee916fe2160104ae8f681dad7305f3201
3d29ab416ecd567f9c44e9e31b2d3426107e0aad
3d2bed34aba161da93839d5243532b4648cfb01c
3d39f637abafdeaba08337668117845a596b32a1
3d41334084d5cf83995094e75a2dfce0c5192815
3d45819447c4cff0667789e00d4c7eadb29c360d
3d497cba57c433fedc02ff960856b594e4d3de3c
3d669cad57d949c30993f5cf393b4c8773071fe4
3d75a978a21cf776f6bb2e96b8428f962513dae9
3d793de4cf88370034034a65ada6d5e1c3573221
3d7c74cf305c4b4912aa0d440c9bffc2b2c0a50f
3da552e6511ed42ae0fda23924bd6ffd490ff1ed
3db424a75cad6f28bd5c7ffdf63a24321949f269
3dbf0a100dc24656a95a485722ed2efbf5f2b8cd
3dc675ab073ec4da60efff6a176fde1f5d49ef3e
3dc9b3f08b0e7c8fba955844c6667291b345db04
3dd46487a1688fbdcb096e23d274763acb9abe15
3dd4e87dd048ec5d56164ac58526177f089d3ee9
3de7460cd503eb378ccfcdc1f6afccde8b899ce5
3deaff3710ad4538119fefcf658e568212d80ec2
3df1effc961dd8ecc8340c23cf7972d63043dd5a
3df999f2c1141de129a767de2b057918a5dd0aaf
3e0158de2fa24f2294323740d511858229736111
3e0bd802bd116d6c

4719e979cab66208e50d869dd7a0026cb35dcaea
471acdf4f214130b551b30f1b8eec2c31d5d0b50
471e30e5a66734a75ca29156056dfe526de63112
473c3e05ce27a7fad79b5cb5fdf928b30e647f22
475898f6cd6f7c7be58580745577c5db79425cd3
4765af15f44c1ab2e7ab94cebd80b74caba5801b
4778f24c1d7381e66dc8765b2ef813cef2af3c93
4781b045b23748518fe9803c8576ba0fb9961635
478aacf0517210ba0526b81a8fbe25d19991f63b
478b11a69fdda9b1bfff75451b0b540bfcc12e89
47926f32d28bd958fde6d031886c10b75b7b1950
47938d699855ba1beff8f1579737641178fa2f25
47955b2f33ff0450e1ef5805c08eecdbc7b83202
47a3cbd9f89ca1983fe7638135bf7ea508a1b1ab
47b517edd1c6e761751e70bbd1eaa295be6406b2
47c7b39ab4e751b39ac88fa276cd5151e25d6db3
47f8a02405e1dc4919b79e21243655a00e8da920
482155e46ad81264df5ace2d8d7819ba4fb8570d
482c29198c8afd4576660e6031845877203b6c34
48322e812b12d184bc627a4021fdf1a1fe52a535
483e124dac64193afea61d247d3a68e12f5982cd
4846c8f7003421c9ceb65fbd78c5c297ae03b4e5
4861b118e82ca4cc7f80a0a3114d6689a261502c
486392f2542a41e6bca0da2cd768b4ee4305bbb2
4873e53e5cb389e7

516cfdf7a8100e06df0e91b2e2fe77b0e1e7e317
51728af9eee0eba314b8a8c0dd75b7dfbc5dd4c3
5189c00aead75b983c1d41b56d2fcf0456fc732b
5198b3aaafbc09488b1faee1eccb60cb9f5bf3e0
51a26837781e2f640703b430b482d089d9963d50
51b872631fe92a52877126747f8b9722f583834a
51b977938a337d82b95a6550fdb842904c88d807
51bf5af40fcdd3172decd15fca8389eafec4c4cd
51ce2fc40408819197e31e8f932287e75b03106f
51d334eb3e2c272cf9510db0d3f694474fdb2318
51ee6db49fa320e4324a9f66d8c08416a9970ed6
521ddcfc6295465acbdc5f8aa3f3e06d33bc4abe
5228b48949130da8acb20dd0dbf6abe04d55a969
522fea89babeeb86ee67e5a430fff1a56935bd83
5235ba6a92cc92aa9c4d164771874a53aaa30ad2
5249fed351d7ebffb26a4ca2062d261c44b3aaea
5272161e02132828f73ff1a4329d5fe11722b8c8
527f891b1200acb16478578f829fb33989e89b49
52837d02854634c6808dcfbf9d3d4828589bc752
528558f8347f00b0b1905296d019a0d3f8f34761
528d4269549508716ad06db6b59ec13c8ee1a6e8
529427b75847fb6b671b836fad926cfc4ae4cd6b
52a3a10d9521f28d52a82d24173947778bd1b319
52b2adcaa6b50f68feb453fa0cb127144cfda5ed
52c0753a8c0099a1

5c65261f0f60d7c73aa493c60d2737a95c69b3e7
5c6e9766fb3c711c211ec97f713921fab19ad829
5c7c4631015dab80cd9d96f008ede7164eebe7c1
5c8832112743c39cf24c3027a058f9a1c0814f25
5c8c21b4316c1890955595ba71ff5c7516a7e003
5ca477f673044c0d63dd5a9a84fb311adfb9e636
5cb269f189c09464710d37811e0217789cbe7397
5cc328bcf2549f5d89a3dc4144069849178a3d91
5ce1f44f5cef47a4620fb44bdbaf070d4964c280
5cf62c7bfcf6f46e5fc24d8c19e7d7333de93ed5
5cfd321f4a4f0600038ab80d5608838524920d6b
5d056d1eb94edccde26a4d5e216b895efd85a198
5d1476f3cccc39da0e87b2dd8f4841e27fe1dd0c
5d15c61e7487be266a77f73340468c4f4a2f210f
5d19b82afcf19b5a064fa62769ab28f47b2a078d
5d265ab9bf58f599ead592052427bda1993968e9
5d2c9d2d85c4f91b6d19e1a95d574412b05458c9
5d3f46c99c8bf24a8a6d9aaf6498a5855fb8350e
5d4a4aff40e6e7d509c7c87de32194f00e30e4be
5d4e032b1c28347a6be5d58be2d27f136f7dcd4f
5d6917e565716fa656040040f6230315cf01d774
5d6b122dd5df1fbac1006e237aacc85e04ff8b0d
5dadd2b5c5518ad5c8be64df034dd5230cb2ee63
5dc4f26e0c2e3d70d17f351061c1293f432b7592
5de3f652d9b8ec82

66f9a26f627b8cf81940895ba22158b7cbcb8cde
67098192ea159930dc26af58dbbc19c546561c74
67240bec069de5d091f60b44093cd07c2b3836ac
6736e82dc5dd15678ede467d015fac06bf7ee924
673b007df08dcb657d40ed8ae0d2e53a835b1cf1
673e4e1e9c1b6a92b84a6256121d5473802b216b
6751e6416438415a3fe89f78fe128cbe7b75126b
675f3ce1dbeeca713d983ba73036fd4d5b5bb3e0
678e913327134da4e8fc1ede83632b0aa119fce9
67934b3ded728287b9be45ad97c6f542107e7b7d
679601f0006cf4eec6c4705146471c9cba5a3d40
679ac085617badfdf76ae800458c88e6432f436c
67aa0b2f5e24d8543cfa2eb5a1b368c508d7a9cd
67ceade4ae75d771090efcdd922981164c1ab295
67e430185003fe7c23ef4e652ba0d7f9c7c83899
67e9257568018d3d9b0a120e6c9cb3a8630e978b
67f07dc5d885b1ca2d907d5210a9430d329beb6f
67f81eaa0204d07b518fb1ecdbcb1451550fce15
67fe0f3fcfe8b215875869f03c1ed07a7bcc1e37
680396bdd36a2184de8296cbdcb89ffba99a5fdc
680d2f70f4f95a679ddef303a8a7b066d5805861
68254285d8c32faf5919e5cab647ab23a4ed16a4
682d13446290731ebb9512a89c281721b68c2f9e
682f7071fc731dda73d7428b82f4129e47a643a9
683d0b057a57a8fa

721c5761d25b0fb6c3b59d1d4257afbbc38a8578
721c829936565a0d278fab7727516ec6385feb52
721cb0014c27977b370cb512f73b6b9714e72b96
721dc5530e29278f5a23d15a6ff5d0d71f6d2433
722ff55d33ec84acb4f9928d73d006bbd05bd44f
7238bd1f220a8699ce09303e82324d4c009404c8
723d1bcd06e889e8b98088e87b6a88fde7c446ee
7246e6d24429e23c4027209909ad478dfe67beb7
7252587f58f52aa5a5ddc195b95db7987024fb5d
7253f0017e0c5e2339b01dd35ef9b5ff519f9eca
725c8efa454cc47b6f97ab974cfee6bca6755868
726fe7c374879bdc8556bf7352ed77428c2c7c17
72885aba2cf8a8c76f1b7456cd8dc1109c8c4b4c
728994926c5b4098443d3796af73525fa5c9ceae
72b391fcff6205498580c2a5f06f9ca1a58b843c
72b3a139a34a244eaa7d928235fa720058a391f6
72b50bbf8385a5204b7d14ff05823c24b06b0594
72bf3251880ab28fe112cc74ef793f283f44767a
72c5684fa5e83c9ad0572533437e85129574b058
72ce09a823e05958552486444251896bdd48a155
72cff83e22c17ac8d6404e66770fa7f449c0a41b
72fb5bcf70ec36664b9b1e35f6cfa945c2659dea
73071c6148019d489106c130f850d527bcc3146f
73158f8b681942774672af964c2a95b3546c59bc
7315a9f683946662

7d3906fdc6062fb71ee4d6639951cc8289967ff7
7d39f79015499ad093f78a984682539aa6e80739
7d429acbafd30ddbd16cefc9a19117531a7f3734
7d43803e4fc7cf0cf8df5c8737fe23f184a407d8
7d445c1d6e3c424220ebb6c4e5ceef77ad06c483
7d4df689b4bfc9e4c0050b9caeabf7eec0f0388f
7d4e8b28f2698c3935e5910cf3009e404a71b98b
7d5bc5c31f1fd64147986754f894af416f5e6b18
7d5faaa5371c3df20f4a1e11891de24975ce4167
7d6ce76775e117803ac4419440b6bf5c128bcfdf
7d7b29c889203f0a4971558db8bcf72b781fc9fb
7d7cc0a3fb19c3d0732ad5f29f8c9d3abce2443f
7d80577d5de1e8d93e0beb6a70cd70dc747602f2
7d848b49a8db5045426e08edb013261f63fe5aa0
7d9fca81bd0e47b6ed694386d1f895ba448e1706
7dac33302c41d2172559878c65bec93f1a472cbe
7dac3bcb47eb3be4f3f18da17ddb2102878fab05
7dac8db97fef00703afb3f812f310c0a2b97e1ad
7dac9fa1f84e1f91630ce6665d7d50a0d22b2ab6
7dacf383be0a4b1cd6bce96a9a076f905b65b90d
7dbb62173366ca613ea2a2433e8554096dcc9a3d
7dce94fc201131369f8aabf6085c7253291b8700
7dd27cbbbe92d1127278d8956e80527050951774
7dd45bc6d45f28ad766679fdcae6a1ce37ace267
7dd936a2686d3b82

87705406812069cd73911b3e3eb580ac1e27895d
8775f75191458a957eba87f92510ba86c180f53b
8784054be5aa5ffd8845ca00e5fceb4389495820
8784d84cb830fbdf7093f74ec1d00aaa5e42fabf
878ffcff2c07213d9b20af3f3059158bf697a363
87a9122764425768063242039b86e0e70319b1e5
87cb9b28f0676da27a5df534c9b8e24ed9bfebe6
87d0af4e7374f9eac2d64a8fdaeef397936d003c
87dba98926123fdffde15e4db1503fee46649749
87e072fca429890ca5f7ab44969fd57ec82fa36b
87f3a5d5d86e3e7ed869c7b04d45fb2fc6be9b66
880fead600084726d70c63c8051fa36ef31c2811
88156111205e61737c136c123409bda2ce3a073c
881987fb4c7b4f81bce36461e8f5ad3a7e5c2149
882ae85b4ae25db3c60f923cfe948b0ed028a045
882f78ba12ea83c69edb3864fe1a32d41401c780
8848bd50a787e6b59c7d3c3b6b32ffe7daf5fd8f
8867a3df3511045529e85f80340d6bb7dd8902eb
886853870bb93b4c4ee9ecf1cb74e97ae62dbf18
88992c5f231b4b121ac2048a274c4efcd84c8bcf
8899a6ea0467d24060229c41e9e96e8df1db4b83
88a7a8a6384aa823c20e7db1b62478f86448a165
88b7941e9a0548e0f8578c966b91a96d08b7c93c
88b7e096ad977c9a50bb89a9d7d5ed26d76c9bb8
88ca9c3f71c0946c

9226ff422ce354ccd2fb19b8888f8794e5cbf0b3
9229a779b9ef30e12de9b867d1a4336e9637cd69
922fb2c2784b679cbd44847f0f939ae91340cc61
922fdcb6bba36f428a034eec39aa45588719d53b
925071e546e42bae3899f358e9172919e935259c
92554b4981c50b84c21360879c5a822d9c4579cc
925e34a2e28f514fd630a0178bce8f26eb262047
926a28a2460583abd7f6894aee57e2c26ff80664
926df0e915ca6d1f084f7de91bf3e8f76e7f3644
9276e7ea44a791b12324c8439c5418ee9e276c0b
9277893c7f5f4e8bd34e0aa8e4294bffcc501fc2
928158b1fa410b8b8e3d82c2f6e357c378b0ac89
92827d2f1ece8505512d58cb718b31ed7bedd869
92888bd8b9775bd8c881d3ce735467c32e2cb53d
9297b050559af87c06efdc7c305ff5dea2015166
92a4f74e202b45b5f192d671d35664e236684347
92a73097f87a1dfa13eca923cc4d6d3cc17c3c44
92aa1d22488c44fac0595d109fcd95798d7a8f88
92b1ce42535216637e0662c7f0baaaa9c5db1b59
92b8f9b3de8895dc11e0045fa819ca9e1c4099da
92bfebceb81ece1ffe8296a6aecd7118b02401be
92cb4265cc1942746e10ab98d8e573607d0d1a5f
92d62973e211188b10cdf909b5abfc1bfe415b98
92ecf28f6896fbc76fa696cd21560144a13b78ce
92ed32841a1ada8b

9e78d7c1e8ee79c1bc66ab8285f51ea61f49f4b2
9e7b3617f0710d3cee6376eb3cac9843df9d3e63
9e7b8ab6765b4f019cf0821ee552b49c210dfa50
9e90546222f1c53f77fd1c3ee94743932a90b1c7
9e9729b9bf5592add02d09c2d1b908a9f0636f14
9ea84cb2ac9202084898c7bc138b65e278374cec
9eb1dcf8bb49125d6e837f00d938840405e35fe6
9ebf6b09b0fb94aaa61f6525078aa810b75f4209
9ec895f12830ec72176fa7009c9c091bda42c599
9ed06ae16590dac6b515f9c087fe91024a9f72c7
9ed2f923b08d7956eb19e5170bb4a833ed7585a2
9efa366735dc640c474cebbdac011f0fba0ded55
9f0265f3bf6e66e8bf39f823f1a487ff6384821e
9f07f3c871845c7a8354027ea8fd36d0c642f56e
9f23ce692e9b1fbbf13243b910a8086d7455bf7e
9f2a54a12d44d68ce4083ed074f9691383dc7912
9f3fbebd503b58c602ce2b82b2a53564b73f7811
9f459a4225a5e1c5430fb928a393eccd0b5cdfdb
9f4f5654722b27afacdec3bf71943daa3892ccda
9f557ea4989987432c4404f58caf7a44f4398f48
9f55bb88a736a19f6d01d6e1cf5566775c15a3bd
9f905a8ac43e269cf26a298eab27326743b08458
9f99392749e45dfd0d3b9ee8f524601eaa64a46f
9fa50f9c1143b794a8b07868e2c0bd1c061e3bbc
9facaa5103f41650

a963dd22a888562e9005cc1557c10c4c7e0851b2
a9657ce1c79e5980dd85368732665bdcae2e83e4
a98c7c945b6e0d8c57aabc66dbda0f558c67b07c
a99cba03d436ee913feefa04da844ce18eae092a
a9acbd11395e900a0419dc6e7e43db2a9c34177a
a9afe11e007c831641c28ba448a95d9611d62f0c
a9b3a47b1c1851ce64bf9c441bc9ba40977c11b1
a9b60cf764fdf5bf857cb48c6e8b0d60db8079a8
a9d53cba8c63187b963771a6f95638a625682567
a9d990a18eeca0652c9685e30c8ae7336875f27e
a9ef1f188246ec69eaad31274c3998df27f78271
a9f027000519a973366c1321842c73ea6abaa938
a9f272ad75b67f172c894228cfa5b49d41106dff
a9f5b0bac53caae4ce2ac8bf9b00accb1936259c
aa19c3151c996b045505675f39e42a575e0c64e5
aa1e39cefd7972d373531fc6e39441192c991f78
aa20db896a9cd9b400e8723821d7dfa85ca620c5
aa2335b63010f0913d3cbcbe735a9ff9405c4bb3
aa26ff96ff13093490d12a0efad34bc5a3d983c9
aa2a459b43913224b1b442299e87b1dd262c044b
aa3774700e87acb2b59a211930f9e50a0590d0e5
aa458247827a73dda9699719d235bfe8bb87bc8f
aa75f3718f7bb7238b81699421c6eca215b8a38b
aa785d32f12c00166e18c102f2092275ae804c83
aa7c2b5183e259fc

b43440e1998c7621bdb97d65207e1991f1762e41
b43ed535b56577390ed2ba5fd62e53be68a32cf9
b45045f9c5a912abb25b281f0e02d59df87c1dc4
b45d4bde10629422e369454d2592454b28b34449
b45ea4899412907fa2e281c0673888956110a720
b469f00d5a7ec5ef8ea7aa3221ac4f006d2487ae
b46c649309e15279a6529acc30ed69af74f81bdb
b49916752fd354d2c4cdb276a3616dc8a4a18ec7
b49a8e7d94bdbca9ec395b4f4d8a5208573a7b4e
b49c1197659366075417f77557bd22b3f30607be
b49e3b0fd8af5d0211aa2f50223e13fb17f893b4
b49fcd3fa6fc5d547e7f90e61848c45c32dc9206
b4a3a9f1615d036a852d0a84d4c3cc97dd11a600
b4baf62aca29d9dd50378a20f03b146e16dc93f4
b4e1293871781750a979d6876170593cec90694a
b4eb9c00749c896b87857af341be186d03e3432c
b4f06bb5079c46907a9c5911530968e694456c8e
b4f92f4555d3f732158f8875044ff6a7d85153a8
b4fb31ff721f20f4196e690b644ce3b6ebce34c8
b500ad762291ae42640758cc9b42af095474baa5
b514aab7881fc8ea917fbabca7b7bb7afea25540
b537c532268c331b524ac802821c88ed45c4d3f8
b53cdc803fe654cc676cca993957b60f6f0eff9b
b54063d048cdb825cc550b8bc7ee19dd34cca735
b54800e6b40cf767

bf8a6b665434d97fe0e3a2caa4ea936aa34a7cbf
bf8a92e71a7569545d792d7e5821a17c73d56f78
bf91d92c717457caabab34c72d59f17ae53d21b8
bfa97ec44e50cdafdbb6ed7c37de5ea7da9133da
bfb7b846b757b44a63ca7fdcb1d3a5aeb3a28208
bfe4f8020de28d0f4edba8c96f84728898af36ef
c01138ac6850ddc2cb3067abd8debd8dba441240
c01865d654dffd9a0046fe54f5fc736a9e887386
c01a91f466f55113c07479ccf565e5de7532d8c5
c02138d745e30a30dbc5102b14a50ef53d300ccb
c02e0afe37a6097a2c0f06413f73ab1e3a827daa
c02ea14841e5059b727f4698f9a2541f94ad32b1
c036b4b324de2cd2c9351da9d139faa6fd04343b
c0387f42c5b16b560504b5978b890cc176171811
c03c1eeeefed218e41a305f292abd3dd3ebb8809
c03f486b43ac5057cb07b0f69fad70755a25b544
c04cb7d010b845a6e7530d6cd3a91af36f16a615
c051694e9d48125299a9474fb85d430249f66bcc
c0536a1cca51bce5e13ab901c88172e91e50de78
c05ad89214ee8c38b765d0af52c04add21f30000
c06ca9223c3548831f423abcfff7e37ee9c8699b
c0762d3fbaec94583ad4c7df0e1055049a9a1bf8
c07e19be76b50554342c10792b8709503e5c056e
c09d1ec8e414477b3c7d57a01b258d6ea7e0c7e1
c0b794abc2c71568

c9c557dcd5c71cf848afb9dcb786d27eb0d2fb95
c9c82225009e6a1ab23ea9b7de35fe8fad427f30
c9f7f0950beb9b3929408d0d316128f8a05d3865
ca0b64c00c6719a68e25fe124aa90b2cfe764d6c
ca19204068f26285bb107cdedcf3488efdddb4af
ca245258ed4063a114e0e86e670f47f92ab905f3
ca2c78b103a8e26e99c5b3f6cb96b60b35abad2a
ca327ced11072a97193e62eb46206b380202a9f5
ca46e44f3baa63385497a5ba788b749a872387f4
ca545274e85e4b61c44f9eb678c12febc35b52e0
ca55203e157141b5fac62de577e7e48db09eaa4a
ca8dbfabe35b0a02797c2d91c7be60b69c1223eb
ca9b9bfeecb978da4f624f6d3bc31da40d68708d
ca9db5571d184dca0dc4c5f1c31d497d8a8ad3f9
cabcbd7bbc42b439f1e0286b383da2944a027e52
cac22969bdd35c72fcb60dc2e761a4a0cfe586b5
cad19851b20b07c5ac45b130020c1319e005d10f
cadc84a1f578858c207df7ee24d7a25ee8ccc409
cadc91fd7965eca60cd77f03b3d25f8e5c397819
caea11a482b3240112739688c287e85444bf6741
caf9539489e46de82865f90d8d2ce84534f92a1f
cb01d51bbe4b866c4203489de18936e50212d137
cb06754c5973b8a0ccb36d399a56857714253714
cb0ac8267c869fb999a44f2cffb96b320603eb8c
cb0cd3a712ee37a9

d31b134f962d582e9c3c42b3db806c7ed456ec20
d3312a64cd8641470dc2e452bf8a9dc2455bd6c1
d337f9cc325e66ba2ed7ba75797cd2ea857e8ef6
d34730f5e027f53a2df500b1c2e1dee33d1cfb95
d36104aa0d3122d64e8c81a6655aff40eccfefd5
d361938f5a993b909a3fe8207d67187142911a2a
d3859dcccf3088a9fb1bdba2da540ea3ffcaf4c1
d38c3d28151ee4d7f34b2039a7fa55c2e30c4493
d3a6d4044c32077bb8f2aa03bccf5a62fc547781
d3b3f42b6046531254b1c4102906f1e14ad113be
d3c297530d3a98e203507c3f3b7cb94f4a474a31
d3cba73fbed3026c9d009aa037ac316593234fb1
d3e9a2f07ac8b28f77ce5d6b6ef400491ffe3900
d3eaa9b0e1aae0f8e101ed442a7451d71285fe8a
d3f6004c29eec6a80bfb2dc545c023daee23b745
d4084074108f3d0107b1219610874ec0f4508083
d40daefbb3edc67a57d39fd058f0844f6e1fa57f
d4157b2b75458cc6805899199a88554712367dc6
d433680264068b05b132f1031e0e9b33d8260485
d435607b2a8739059f711edf213b1caccbe880eb
d43b2689fc7c12881d6b789c04ca9a7b58622bef
d440c6e9a22573d32b99f740d4e01f48cf5e116d
d441a4b2584d8a161669081d4526198ad12bcf83
d44310de745a6ce8652de712a84f82c3e957773d
d446850b37946bfc

dcfb7ee187173ef687d176221c18ec955fc9ede3
dd0948daa0bca0d60c7a58e82719132bd428cbbc
dd0c698fac1bdba159a33656a909b6940ac4f031
dd11a652d43b46592ae466537b9858a879ddf9f1
dd45cc80debe8dd1b8badddc07593cfd9cf0ff7b
dd486985418af10c142b659a234f92bcdc726d57
dd5b850804b6206dd8cdd2f6fb559aea6bd1b555
dd619c0e9aad1c51a51e631a3708673b310cc930
dd7c26fec1fa34ad881afe94f6586526b8828b8b
dd7ef46c60014c25eab8f4bab72c508beae063e1
dd89f53a2c4d6ce04d4572f3e5c8593b1647ccef
dda974c65cc55a8a8ba0e9c400564506121cf5ed
ddc3a907aa7d7e68a96c619d32e1fd4d4061abb1
ddc629832c7570e12df2ccf04e9fc094a020be10
ddca44d61d78593024f277d57bccfc9444bef360
ddcd83b71bbb408eef35723e13b214aa59fc8cf1
dde4cd9fbf554f84ca95c0928b429ec9849855ef
dde82aa19ee1d94be77f013864a6bfb1daabb124
ddf83577cc3666fd1e5db40743006923c2775cc9
ddfb3cea315a03821cd311d40ef7ab74650bd34d
de09e0c71f8ddf1df0134a6c7c47f958a4e92c55
de25f2053a80780051c64dfd868c51e8e3e3012f
de284c1b02194392d4d2bbfb9bb6b3aaab506629
de2ce59bf8ef978874e6828b9788780c0f11e902
de5bbf712ffd10ea

e7e4dedb4d5fad9d5dc64e2c15e4f0c0a42a63de
e7e5efd6eebf1f0566f9f2319e703ece123dd1c2
e7f0ce5fe8e588406a113719e1a193470461c5c1
e7f407ab25b417c5e4e50e5340d5b75e7b312b5c
e81f419ebe6622a9c3f6708738deec04eb8e3b63
e867f03077d1ef1cd30eb834465ed6bee0c06605
e8751f1a573df3e8bd1df8c425bc4af81109ffa8
e87df731def7af40f9b8f94dfab40e8717e920f9
e885f4fc3c59d169dfb281b350f5b68833ca9af1
e8885a2dd816f8fab8e1d67f33410b4658d75963
e89032703b44b61f77d17c4fe1d6065604ba4f78
e8b0d9e7b7e4b6b59c46cd42fa1acd9b6b7decb3
e8b2c382465c15c072eb8638bf4a7e99affbd4ee
e8c8d22177643ee1b4e41881370d63cbcef02711
e8de88cedde97b2e38803e5150f64a169c4e1fd5
e8e34e6c9254366372ad169bd87179c85a5e1464
e8e509c951397320509eabd2b6278ee729068253
e8edf6f237c3e3667ad6f6fc4dd5a546817cc081
e908b790221de67d0f750f13c3306fe9fab0eb70
e91bf6f4bae89b9ea747df080091e32a74eb2014
e920537980a4b3c53c223c3a0ae60f7980f6b9eb
e92c221860f9c3b2274bc7ad0a81ebc390b5fea7
e92fd6721fc50d7e1280a38c7cbc6b6f7ea8e9d8
e93279e4a41f24e2ac7395b7c5b4285a441b9875
e943474df988e6e9

f3062d06860c538a4b7a740b05e99312dd1673cd
f31878b4d92d4de51aa4bd982b51b0848afad547
f330ba08e43ac0118dfc2faa1cec794080cf4661
f3354c40ed24188b1fa6d292205333b27335634c
f33e1826abd15134fe3bbf6140261fb4eb9bf718
f33fed112e8a030bd07a13e9fc02f7b68e50f3e8
f35a169f7363156828a1da8ac8f7429c4b55b22b
f35b1af732077d24b7a7e418905629690368b297
f368ec819a974f8d6e0066a9d631b539d27d265a
f36df0e98d810adba988e7fb9c0aa957cd03102a
f37719fe52ea3949c1a95c935d0f8e3c1925b2c4
f39efc49f98b9ce394ce7f7d52929b67006cd0a5
f3ae1eac7a2469f1f20bc6536e292fa3b57756b3
f3b1280cc6116009332c23cecdd2fbc7298453b2
f3b770d76eb0bd0fcec3a7e9b041616efed1e9a7
f3be0dcaacb01fcfd75ed53b15b902037ca89c59
f3d93b84b90c2779e5bfc03f48579ae7d99476b8
f3e020854dbeb8864490984a9b3b7d24f4387218
f3e8d33fdcce0ee8c21b76013cd3423c45dd0c2b
f40b451eca6de4ff1777a616037f0d3b26a59bb1
f437aecf4c332a5020afa5ee68383196328191e1
f43e876a15f4ccfdf54f34de867e33516854d4f8
f443f153b88aa1a9ac34984119d8a2f99bf18373
f4499b12fdb33a0e8458bcbb8475aa1c24a552fa
f450aebed81c98d5

fd8e20f565c57287de44408e7b5f1bd03dcc408a
fda005a1591e6aa75dda4150fe02fb2394776183
fdaadada9109ba753f1df0953552395784372e21
fdada39d10a091bbcb4f9481f0be166f6226f74a
fdbd98d8627a419630e82da4ad8843df4bcf82a0
fdde983e9b2bb85cf4fd403c54c3f79712b892d1
fde98a52fd71ab73bd353bb851a371a23054824b
fe00a5f20ecf5104aaaec44b58f28480f59c229f
fe11fb0d70b1ad7cb875e782343e190f45d6b3c3
fe147e50e19ad5438cd09d8c25daffc20da65b85
fe291402904376acf661a7c4f67cfd25bee04365
fe84ef92c6c506512e015bacb0d45838cfe5860c
fe9c066d13147a82cb622c0834e7187114207f3b
fea652850b9200ea101d36d7cff3f53b74e879f4
fea8c090c9489d330b39a53dacf9e35444df829f
fead5b8bfcda40bfea54fac01886689a9decf44d
fec8a3b9ab2111e6fc98f8338070b6bac7af6906
feea09a701972e0d8525d3e431ec9cce34fdc325
fef726284d31a0c38d89e37d02daff86e1a5cc47
ff17ca39ea5cbf969067aec81d1f56f6ed418e06
ff1f8cf311f66824ad5ae27d051446e63350b284
ff2920420e8f8e166e368fcf8dff502822999127
ff662c9716c37bfe936c6bc42fbd3eb39244e082
ff6cdccf62923f94bac18add8875f4b799f0adb0
ff853913c32c409a

In [156]:
out_folder = Path("C:/Users/user/Google Drive/Els_meus_documents/projectes/CompetitiveIntelligence/WRI/Notebooks/Data/Processed/")
filename = "Chile.json"
file = out_folder / filename
with open(file, 'w') as fp:
    json.dump(json_file, fp, indent=4)

In [157]:
print(len(json_file))
# for k in sorted(sentences_dict):
#     print(k, ":", sentences_dict[k])

4842
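
The saved file can be read back to check its structure. The following is a minimal sketch, assuming the nested layout built by the pipeline (document, then section, then `"sentences"`, then sentence id). The helper `count_sentences_per_section` and the sample data are hypothetical and only serve as illustration; they are not part of the notebook.

```python
import json
from collections import Counter


def count_sentences_per_section(documents):
    """Count sentences per section name across all documents.

    `documents` is assumed to follow the layout written to Chile.json:
    {filename: {section: {"tags": [...], "sentences": {sentence_id: {...}}}}}
    """
    counts = Counter()
    for sections in documents.values():
        for section, content in sections.items():
            counts[section] += len(content.get("sentences", {}))
    return counts


# Tiny hand-made sample in the same layout (illustrative data only)
sample = {
    "doc_a": {
        "HEADING": {"tags": [], "sentences": {"doc_a_1": {"text": "Tipo Norma :Decreto", "labels": []}}},
        "VISTO": {"tags": [], "sentences": {}},
    },
    "doc_b": {
        "HEADING": {"tags": [], "sentences": {
            "doc_b_1": {"text": "Tipo Norma :Ley", "labels": []},
            "doc_b_2": {"text": "Id Norma :1", "labels": []},
        }},
    },
}

# Round-trip through JSON text to mimic loading the saved file
loaded = json.loads(json.dumps(sample))
section_counts = count_sentences_per_section(loaded)
print(section_counts.most_common())
```

With the real file, `loaded` would instead come from `json.load` on the path used above.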


#### Short summary

In [None]:
print("After preprocessing there are {} different headings in Chilean policies".format(len(bag_of_words)))
print("{} documents have been processed".format(i))
print("There are {} lines of text as sentences".format(len(sentences)))

#### Dictionary items sorted by occurrence

In [None]:
dict(sorted(bag_of_words.items(), key=operator.itemgetter(1), reverse=True))

#### Dictionary items sorted by heading text

In [None]:
for k in sorted(bag_of_words):
    print(k, ":", bag_of_words[k])

#### Saving sentences as a NumPy array (.npy)

In [None]:
print(sentences[0:2])

In [None]:
# path = Path("C:/Users/user/Google Drive/Els_meus_documents/projectes/CompetitiveIntelligence/WRI/Notebooks/Data/")
path = Path("C:/Users/jordi/Google Drive/Els_meus_documents/projectes/CompetitiveIntelligence/WRI/Notebooks/Data/")
filename = "sentences.npy"
file = path / filename
np_sentences = np.array(sentences)
with open(file, 'wb') as f:
    np.save(f, np_sentences)
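
If a plain CSV file is preferred over NumPy's binary format, the standard `csv` module is enough. This is a minimal sketch, assuming `sentences` is a flat list of strings; the helper `write_sentences_csv`, the `sentence` column name and the example sentences are hypothetical.

```python
import csv
import io


def write_sentences_csv(sentences, stream):
    """Write one sentence per row, preceded by a header row."""
    writer = csv.writer(stream)
    writer.writerow(["sentence"])
    for sentence in sentences:
        writer.writerow([sentence])


# In-memory demo; for a real file use open(path, "w", newline="", encoding="utf-8")
example_sentences = ["Apruébase la ordenanza, con sus anexos", "Anótese y publíquese"]
buffer = io.StringIO()
write_sentences_csv(example_sentences, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

The writer quotes any sentence that contains a comma, so the text survives a round trip through `csv.reader`.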

### Pipeline to process files from a local folder
This pipeline processes every file found in a local folder of sample documents.

In [75]:
path = "C:/Users/user/Google Drive/Els_meus_documents/projectes/CompetitiveIntelligence/WRI/Documents_de_mostra/Chile/"
files = os.listdir(path)
print(files[0])

002c53058e85d383b057fa4cc25a6eb8e7d401e3


In [154]:
path = "C:/Users/user/Google Drive/Els_meus_documents/projectes/CompetitiveIntelligence/WRI/Documents_de_mostra/Chile/"
data_folder = Path(path)
files = os.listdir(path)

bag_of_words = {}  # section heading -> number of occurrences
json_file = {}     # filename -> section -> {"tags": [], "sentences": {...}}

i = 0
for filename in files:
#     if i == 0:
    file_ = data_folder / filename
    with open(file_, 'r', encoding = 'utf-8') as file:
        lines = file.readlines()
#         print("\n", filename, "\n")
        json_file[filename] = {}
        line_counter = 0
        heading_flag = True
        heading_content = False
        has_section = False
        json_file[filename]["HEADING"] = {"tags" : [], "sentences" : {}}
        for line in lines:
            line = clean_whitespace(line)
            # Processing document heading
            if heading_flag:
                if "Tipo Norma" in line:
                    heading_content = True
                if heading_content:
                    line = full_cleaning(line)
                    if line is not None:
                        if ":" in line:
                            # A "key :value" line starts a new heading sentence
                            line_counter += 1
                            sentence_id = filename[0:7] + '_' + str(line_counter)
                            json_file[filename]["HEADING"]["sentences"][sentence_id] = {"text" : line, "labels" : []}
                        else:
                            # A line without ":" continues the previous heading sentence
                            json_file[filename]["HEADING"]["sentences"][sentence_id]["text"] += " " + line
#                 print("**", line)
                heading_flag, heading_content, line_counter = end_of_heading(line, heading_flag, heading_content, line_counter)

            # Breaking when document signatures are found    
            elif end_of_document(line):
#                 print(line)
                break
            # Getting section headings
            elif (uppercase_ratio(line) == 1 and len(line) > 10 and line_counter > 0) or is_section(line):
#                 print("line--", line)
                line = remove_accents(line)
                line = clean_bugs(line)
                line = clean_sentence(line)
                if line is None:
                    continue
                else:
                    has_section = True
                    section = merge_concepts(line)
#                     print("**", section)
                    json_file[filename][section] = {"tags" : [], "sentences" : {}}
                    bag_of_words = add_to_dict(section, bag_of_words, duplicates_dict)
                    if section == "VISTO" and len(split_into_sentences(line, ":")) > 1:
                        visto = split_into_sentences(line, ":")
                        for sentence in split_into_sentences(visto[1], ";"):
                            line_counter += 1
                            sentence_id = filename[0:7] + '_' + str(line_counter)
                            json_file[filename][section]["sentences"][sentence_id] = {"text" : sentence, "labels" : []}

            elif has_section:
                # Keep the cleaned line; the original call discarded the return value
                line = full_cleaning(line)
                if line is None:
                    continue
                else:
                    line = change_decimal_points(line)
                    for sentence in split_into_sentences(line, "."):
                        line_counter += 1
                        sentence_id = filename[0:7] + '_' + str(line_counter)
                        json_file[filename][section]["sentences"][sentence_id] = {"text" : sentence, "labels" : []}
        i += 1
    #     data = file.read().replace('\n', '')


 002c53058e85d383b057fa4cc25a6eb8e7d401e3 

** Biblioteca del Congreso Nacional
** --------------------------------
** 
** 
** 
** 
** Tipo Norma :Decreto 3157 EXENTO
** Fecha Publicacion :16-09-2016
** Fecha Promulgacion :18-08-2016
** Organismo :MUNICIPALIDAD DE PANQUEHUE
** Titulo :APRUEBA "ORDENANZA PARA LA EXTRACCION DE ARIDOS EN CAUCES
** Y ALVEOS DE CURSOS NATURALES DE AGUA QUE CONSTITUYEN BIENES
** NACIONALES DE USO PUBLICO Y EN POZOS LASTREROS DE PROPIEDAD
** PARTICULAR EN LA COMUNA DE PANQUEHUE" Y SUS RESPECTIVOS
** ANEXOS
** Tipo Version :Unica De : 16-SEP-2016
** Inicio Vigencia :16-09-2016
** Id Norma :1094879
** URL :https://www.leychile.cl/N?i=1094879&f=2016-09-16

 0031d55c90473158c09acded547d67d44be22325 

** Biblioteca del Congreso Nacional
** --------------------------------
** 
** 
** 
** 
** Tipo Norma :Resolucion 447 EXENTA
** Fecha Publicacion :27-06-2018
** Fecha Promulgacion :21-06-2018
** Organismo :MINISTERIO DE ENERGIA; COMISION NACIONAL DE ENERGIA
** Titul

In [140]:
json_file

{'002c53058e85d383b057fa4cc25a6eb8e7d401e3': {'HEADING': {'tags': [],
   'sentences': {'002c530_1': {'text': 'Tipo Norma :Decreto 3157 EXENTO',
     'labels': []},
    '002c530_2': {'text': 'Fecha Publicacion :16-09-2016', 'labels': []},
    '002c530_3': {'text': 'Fecha Promulgacion :18-08-2016', 'labels': []},
    '002c530_4': {'text': 'Organismo :MUNICIPALIDAD DE PANQUEHUE',
     'labels': []},
    '002c530_5': {'text': 'Titulo :APRUEBA "ORDENANZA PARA LA EXTRACCION DE ARIDOS EN CAUCES Y ALVEOS DE CURSOS NATURALES DE AGUA QUE CONSTITUYEN BIENES NACIONALES DE USO PUBLICO Y EN POZOS LASTREROS DE PROPIEDAD PARTICULAR EN LA COMUNA DE PANQUEHUE" Y SUS RESPECTIVOS ANEXOS',
     'labels': []},
    '002c530_6': {'text': 'Tipo Version :Unica De : 16-SEP-2016',
     'labels': []},
    '002c530_7': {'text': 'Inicio Vigencia :16-09-2016', 'labels': []},
    '002c530_8': {'text': 'Id Norma :1094879', 'labels': []},
    '002c530_9': {'text': 'URL :https://www.leychile.cl/N?i=1094879&f=2016-09-16',
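
Sentence ids are built from the first seven characters of the file name plus a running counter. They are unique within one document, but two documents whose names share the same seven-character prefix would produce identical ids if the sentences were ever flattened into a single table. The sketch below checks for that case; `find_prefix_collisions` and the example names are hypothetical.

```python
from collections import defaultdict


def find_prefix_collisions(filenames, prefix_len=7):
    """Group file names by their first `prefix_len` characters and
    return only the groups that contain more than one file."""
    groups = defaultdict(list)
    for name in filenames:
        groups[name[:prefix_len]].append(name)
    return {prefix: names for prefix, names in groups.items() if len(names) > 1}


# Two of these illustrative names share the same 7-character prefix
names = ["002c53058e85", "002c530aaaaa", "0031d55c9047"]
collisions = find_prefix_collisions(names)
print(collisions)
```

Running it on the list returned by `os.listdir(path)` would show whether a longer prefix is needed for the real corpus.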

#### Dictionary items sorted by occurrence

In [141]:
dict(sorted(bag_of_words.items(), key=operator.itemgetter(1), reverse=True))

{'VISTO': 14,
 'RESUELVO': 13,
 'CONSIDERANDO': 11,
 'PERMISOS': 2,
 'SANCIONES': 2,
 'DISPOSICIONES GENERALES': 2,
 'ORDENANZA PARA LA EXTRACCION DE ARIDOS EN CAUCES Y ALVEOS DE CURSOS NATURALES DE AGUA QUE CONSTITUYEN BIENES NACIONALES DE USO PUBLICO Y EN POZOS LASTREROS DE PROPIEDAD PARTICULAR, DE LA COMUNA DE PANQUEHUE': 1,
 'OBJETO': 1,
 'DERECHOS, OBLIGACIONES Y PROHIBICIONES': 1,
 'SISTEMA DE TRATAMIENTO INTEGRAL DE LAS AGUAS SERVIDAS': 1,
 'DE LA CIUDAD DE PUERTO MONTT SEGUNDA PARTE': 1,
 'ESTUDIO DE IMPACTO AMBIENTAL': 1,
 'NORMAS GENERALES': 1,
 'DE LOS PROYECTOS QUE INGRESAN AL SEIA': 1,
 'DE LA EXTRACCION': 1,
 'DEL TRANSPORTE': 1,
 'DE LA RECUPERACION': 1,
 '" BIS': 1,
 '"PLAN REGIONAL DE DESARROLLO URBANO - REGION DE ARICA Y PARINACOTA': 1,
 '1 LINEAMIENTOS DE DESARROLLO URBANO REGIONAL': 1,
 'UT-10, 11 y 12.': 1}
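
Headings that occur only once are often extraction noise rather than real section names. A short sketch for separating the most frequent headings from these singletons, so the latter can be reviewed by hand; `heading_counts` is illustrative data in the same shape as `bag_of_words`.

```python
from collections import Counter

# Illustrative counts in the same shape as bag_of_words
heading_counts = {"VISTO": 14, "RESUELVO": 13, "CONSIDERANDO": 11, "PERMISOS": 2, '" BIS': 1}

# Most frequent headings, highest count first
top_three = Counter(heading_counts).most_common(3)

# Headings seen exactly once are candidates for manual review
singletons = sorted(h for h, n in heading_counts.items() if n == 1)

print(top_three)
print(singletons)
```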

#### Dictionary items sorted by heading text

In [None]:
for k in sorted(bag_of_words):
    print(k, ":", bag_of_words[k])