### Load Libraries

In [1]:
import pdfplumber
import itertools
import json
import re
import spacy

from os import path
import csv
import pickle
import pandas as pd
import numpy as np

### Load Sample

In [2]:
pre_data = """
SECTION 02 41 20 – SELECTIVE BUILDING DEMOLITION
PART 1 - GENERAL
1.1
SUMMARY
A.
Section Includes:
1.
Systematic removal of portions of buildings and structures.
2.
Salvage of existing items for reuse.
3.
Salvage of construction materials for recycling.
4.
Supplementary components and accessories normally furnished or necessary for a
complete installation, whether or not such items are indicated on the Drawings or
included in the Specifications.
1.2
REFERENCES
A.
Definitions:
1.
Remove: Means to detach from existing construction and legally dispose off-site.
2.
Demolish: Means the same as “remove”.
3.
Dispose: Means to get rid of by throwing away; or by giving or selling to someone
else.
4.
Reuse: Means to use again for the same function without re-processing.
5.
New-Life Reuse: Means to use again for a different function without re-processing.
6.
Remove and Salvage: Means to detach from existing construction, prepare for reuse
or storage as applicable, and then deliver to the Owner.
7.
Remove and Reinstall: Means to detach from existing construction, prepare for reuse,
and reinstall where indicated.
8.
Recycle: Means to detach from existing construction, break down into raw materials,
and then process the materials to make new items.
9.
Existing-to-Remain: Means existing items that are not removed, reused, or recycled.
1.3
ADMINISTRATIVE REQUIREMENTS
A.
Coordination: Demolition drawings are diagrammatic and show existing conditions with
information developed from field surveys and to generally show the extent and type of
demolition required. The Owner will maintain conditions existing at the time of
inspection for bidding purposes as far as practicable.
1.
Before beginning demolition, make a detailed survey of existing conditions indicated
below in Part 3 of this specification Section, and report discrepancies or conflicts
between Drawings and actual conditions in writing to the Architect for clarifications
and instructions.
2.
Do not proceed, when such conflicts or discrepancies occur, before receipt of the
Architect's instructions.
B.
Pre-Demolition Meeting:
1.
To review methods and procedures related to the work of this specification Section,
hold a meeting at the project site after submittal approval and at least 10 business
days before beginning installation. At a minimum, the Contractor, demolition
subcontractor, and Architect must attend the meeting.
2.
During the meeting, review the Contract Documents, submittals, project conditions,
and demolition sequence and methods, including special details and conditions that
might affect demolition.
a.
Review and discuss existing conditions survey indicated below in Part 3 of this
specification Section.
b.
Inspect and discuss condition of construction to be selectively demolished.
c.
Review structural load limitations of existing structure.
d.
Review and finalize selective demolition schedule and verify availability of
materials, demolition personnel, equipment, and facilities needed to make
progress and avoid delays.
e.
Review requirements of work that rely on substrates exposed by selective
demolition operations.
f.
Review areas where construction is existing-to-remain and requires protection.
3.
Identify and discuss adverse or unfavorable conditions detrimental to protecting or
demolishing construction. Resolve each condition.
4.
Finalize construction schedule.
5.
Record significant discussions and distribute meeting minutes. Do not begin
demolition until disagreements are successfully resolved to the satisfaction of all
parties.
1.4
SUBMITTALS
A.
Informational Submittals:
1.
Schedule of Selective Demolition Activities: Indicate the following.
a.
Detailed sequence of selective demolition and removal work, with starting and
ending dates for each activity. Ensure Owner's on-site operations are
uninterrupted.
b.
Interruption of utility services. Indicate how long utility services will be
interrupted.
c.
Coordination for shutoff, capping, and continuation of utility services.
d.
Use of elevator and stairs.
e.
Locations of proposed dust- and noise-control temporary partitions and means
of egress.
f.
Coordination of Owner's continuing occupancy of portions of existing building
and of Owner's partial occupancy of completed work.
g.
Means of protecting existing-to-remain items in the path of waste removal.
2.
Inventory: After selective demolition is complete, submit a list of items that have
been removed and salvaged.
3.
Pre-Demolition Photographs or Videos: Submit videos or photographs showing
existing conditions of adjoining construction and site improvements, including finish
surfaces that might be misconstrued as damage caused by selective demolition
operations.
1.5
QUALITY ASSURANCE
A.
Quality Standards: Comply with the safety requirements of both American National
Standards Institute/ American Society of Safety Engineers publication ANSI/ASSE A10.6.
“Safety Requirements for Demolition Operations” and National Fire Protection Association
publication NFPA 241, “Standard for Safeguarding Construction, Alteration, and
Demolition Operations”.
1.6
PROJECT CONDITIONS
A.
Hazardous Materials: Hazardous materials may be encountered in the building or at the
project site. If materials suspected of containing hazardous materials are encountered,
then do not disturb; promptly notify the Architect and Owner.
PART 2 - PRODUCTS (NOT USED)
PART 3 - EXECUTION
3.1
EXAMINATION
A.
Oversight: Ensure adequate supervision practices are followed at the project site before
demolition work begins and at all times during installation.
B.
Survey: Engage a professional engineer to survey condition of building to determine
whether removing any element might result in structural deficiency or unplanned
collapse of any portion of structure or adjacent structures during selective demolition
operations.
1.
Survey existing conditions and correlate with requirements indicated to determine
extent of selective demolition required.
2.
Provide means to have digital molds created to repair ornate items in case of loss
(e.g., laser cloud point scan).
3.
Inventory and record the condition of items removed and reinstalled and removed
and salvaged.
4.
When unforeseen mechanical, electrical, or structural elements are encountered that
conflict with intended function or design, investigate and measure the nature and
extent of conflict. Promptly submit written report to Architect.
5.
Perform surveys as the work progresses to detect hazards resulting from selective
demolition activities.
3.2
PREPARATION
A.
Site Protection: Protect existing-to-remain sitework against damage and soiling during
demolition.
1.
Do not begin selective demolition work until temporary partitions, barricades,
warning signs, and other forms of protection are installed.
2.
Protect trees, plants, utilities, and existing improvements that are not to be removed
from injury or damage. Replace damaged landscaping, improvements, and utilities in
kind.
3.
During demolition, provide safeguards for protection of the public, Contractor's
employees, and existing improvements existing-to-remain, including warning signs
and lights, barricades, and the like.
4.
Provide and maintain shoring, bracing, and structural supports required to preserve
stability and prevent movement, settlement, or collapse of existing-to-remain
construction and finishes; and to prevent unexpected or uncontrolled movement or
collapse of construction being demolished.
B.
Building Protection: Protect existing-to-remain building construction against damage
and soiling during selective demolition.
1.
Do not begin selective demolition work until temporary building bracing, barricades,
and other protection necessary to prevent injury to people and damage to adjacent
existing-to-remain facilities.
2.
Do not allow water to enter existing-to-remain wall or roof insulation. Replace
insulation when it is wetted.
C.
Utilities, Services, and Building Systems Protection:
1.
Maintain existing-to-remain utility services and mechanical and electrical systems
and protect them against damage during selective demolition operations.
2.
Locate, identify, disconnect, and seal or cap off indicated utility services and
mechanical and electrical systems serving areas indicated for demolition.
a.
Arrange with utility companies to shut off indicated utilities.
b.
If building systems or mechanical and electrical systems are indicated as
removed, relocated, or abandoned, provide temporary services and systems that
bypass demolition areas and maintain continuity of services and systems to other
parts of building before proceeding with selective demolition.
c.
Cut off pipe or conduit in walls or partitions to be removed. Cap, valve, or plug
and seal remaining portion of pipe or conduit after bypassing.
3.3
DEMOLITION
A.
General Demolition Requirements:
1.
Coordinate demolition to assure the proper sequence, limits, methods, and time of
performance. Schedule demolition to impose minimum of hardship on present
facility operations and performance of the work.
2.
Conduct selective demolition and debris-removal operations to ensure the least
interference with roads, streets, walks, walkways, and other adjacent occupied and
used facilities.
3.
Demolish and remove existing construction only as shown and to the extent required
by new construction. Use methods necessary to complete the work within indicated
or specified limitations.
a.
Maintain existing building structure (including structural floor and roof decking)
and envelope (exterior skin and framing, excluding window assemblies and
nonstructural roofing material) not shown as demolished; do not demolish
existing construction beyond indicated limits.
b.
Maintain existing interior nonstructural elements (interior walls, doors, floor
coverings, and ceiling systems) not shown as demolished.
c.
Do not demolish existing construction beyond indicated limits.
4.
Neatly cut openings and holes plumb, square, and true to dimensions required. Use
cutting methods least likely to damage existing-to-remain construction or adjoining
construction.
5.
Use hand tools or small power tools designed for sawing or grinding, not hammering
and chopping, to minimize disturbance of adjacent surfaces.
6.
Cut or drill from the exposed or finished side into concealed surfaces to avoid
marring existing finished surfaces. Verify condition and contents of hidden space
before starting cutting operations.
7.
Do not use cutting torches until after work areas are cleared of flammable materials.
Maintain portable fire-suppression devices during flame-cutting operations.
8.
Temporarily cover existing-to-remain openings.
9.
Locate selective demolition equipment and remove debris and materials so as not to
impose excessive loads on supporting walls, floors, or framing.
10. Do not remove any item in a manner that that results in any warranty or guarantee
becoming void.
B.
Special Techniques:
1.
Removed and Salvaged Items:
a.
Clean salvaged items.
b.
Pack or crate items after cleaning. Identify contents of containers.
c.
Store items in a secure area or location until delivery to the Owner.
2.
Removed and Reinstalled Items:
a.
Clean and repair items to functional condition adequate for intended reuse. Paint
equipment to match new equipment.
b.
Pack or crate items after cleaning and repairing. Identify contents of containers.
c.
Protect items from damage during transport and storage.
d.
Reinstall items in locations indicated. Comply with installation requirements for
new materials and equipment. Provide connections, supports, and miscellaneous
materials necessary to make items functional for use indicated.
3.
Existing-to-Remain Items:
a.
When permitted by the Architect, items may be removed to a suitable, protected
storage location during selective demolition and cleaned and reinstalled in their
original locations after selective demolition operations are complete.
3.4
CORRECTION AND REPAIR
A.
Damaged existing-to-remain work must be patched and repaired. Correct and repair as
necessary, without limitation, including arranging all correction and repair work and
paying all correction and repair costs without reimbursement from Owner, until
accepted in writing by the Architect.
B.
Corrective and repair work must be performed in conformance with a correction and
repair plan submitted to and accepted in writing by the Architect before correction or
repair work begins. At a minimum, correction and repair plans must include
1.
written descriptions of non-conforming, damaged, and defective work;
2.
supporting sketches, diagrams, photographs, and other visual depictions of non-
conforming, damaged, and defective work; and
3.
similar written descriptions and visual depictions of Contractor-proposed
corrections and repairs.
C.
Do not correct, repair, or replace any item in a manner that that results in any warranty
or guarantee becoming void.
D.
Arrange and pay costs without reimbursement from Owner for removing and replacing
work that cannot be corrected or repaired to the Architect’s acceptance.
3.5
CLEANING
A.
Except for recycled, reused, salvaged, and reinstalled items and other existing-to-remain
items on Owner's property, remove demolished materials from the project site and
legally dispose off-site. Do not burn demolished materials.
B.
Removed items not indicated for reuse, reinstallation, or salvage are the property of the
Contractor and must be cleared from the project site.
1.
Continuously clean up and clear these items; do not allow them to accumulate in the
building or at the project site.
2.
Material and equipment may not be viewed by prospective purchasers nor sold on
the site.
3.
The Owner is not responsible for the condition, loss, or damage to removed items.
C.
Waste Management: After completing the work of this specification section, leave work
END OF SECTION
©AWCWEST 20A01 All rights reserved.
RSTLNI, TAGTIS, TAGDSC, ELMTF, CLSF3030, IDNO1

CONCRETE
"""


### Preprocessing for Section

In [3]:
pre_data = re.sub(' +', ' ', pre_data)

In [4]:
regex_end = r'\bEND\s*OF\s*(SECTION|DOCUMENT)\b'
end_match = re.search(regex_end, pre_data)
if end_match:
    # Drop everything after the end marker (copyright footer, trailing text)
    pre_data = pre_data[:end_match.end()]
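Both preprocessing steps above (collapsing repeated spaces and cutting the text right after the end marker) fit in one small function that can be reused per document. A minimal sketch, assuming the end marker is spelled `END OF SECTION` or `END OF DOCUMENT` as in the sample; `clean_spec_text` is a hypothetical name, not part of the notebook:

```python
import re

# Same end-marker pattern as the notebook cell above
END_MARKER = re.compile(r"\bEND\s*OF\s*(SECTION|DOCUMENT)\b")

def clean_spec_text(text: str) -> str:
    """Collapse runs of spaces and drop anything after the end marker (hypothetical helper)."""
    text = re.sub(" +", " ", text)
    match = END_MARKER.search(text)
    if match:
        # Keep the marker itself so later cells can still recognise it
        text = text[:match.end()]
    return text

sample = "PART 1  -  GENERAL\nEND OF SECTION\nFooter text\nCONCRETE\n"
print(repr(clean_spec_text(sample)))
```

Keeping the marker in the result matches the notebook, whose later filtering cell skips the `END OF SECTION` line explicitly.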

In [6]:
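start_index = re.search(r'SECTION|DOCUMENT', pre_data).start()  # start of the section header
end_index = pre_data.rindex("PART 1")          # last "PART 1": start of the body to map
first_part_index = pre_data.index("PART 1")    # first "PART 1": end of the header block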

section_details = pre_data[start_index:first_part_index]
section_details

'SECTION 02 41 20 – SELECTIVE BUILDING DEMOLITION\n'

In [7]:
section_details_to_skip = [item.strip() for item in section_details.split("\n") if item.strip() != ""]

In [8]:
section_details = re.sub(' +', ' ', section_details.replace("\n", " "))
section_details

'SECTION 02 41 20 – SELECTIVE BUILDING DEMOLITION '

In [9]:
nlp2 = spacy.load("Spacy Custom NER Dump/")

spec_number = ""  # left empty: this cell only takes the section name from the NER model
spec_name = "Not Found"
section_data = nlp2(section_details)
for ent in section_data.ents:
    if ent.label_ == 'section_name':
        spec_name = ent.text

print("Section Name - {}".format(spec_name))

Section Name - Not Found
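When the custom NER model returns no `section_name` entity, as in the output above, a plain regular expression over the header line is one possible fallback, and it can also fill `spec_number`, which the NER cell leaves empty. The sketch below assumes the header follows the `SECTION <number> – <TITLE>` shape of the sample (en dash, em dash or hyphen as separator); `parse_section_header` is a hypothetical helper, not part of the notebook:

```python
import re

# Assumed header shape: "SECTION 02 41 20 – TITLE"; \u2013 is an en dash, \u2014 an em dash
HEADER = re.compile(r"(SECTION|DOCUMENT)\s+(\d{2}(?:\s*\d{2}){2})\s*[\u2013\u2014-]\s*(.+)")

def parse_section_header(header: str):
    """Return (number, name) from a section header line, or (None, None) if it does not match."""
    match = HEADER.search(header)
    if not match:
        return None, None
    return match.group(2), match.group(3).strip()

number, name = parse_section_header("SECTION 02 41 20 – SELECTIVE BUILDING DEMOLITION ")
print(number, "|", name)  # 02 41 20 | SELECTIVE BUILDING DEMOLITION
```

In the notebook this could be called as `spec_number, fallback_name = parse_section_header(section_details)`, using `fallback_name` only when `spec_name == "Not Found"`.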


### Data Preprocessing For Mapping

In [10]:
# Select lines from the last occurrence of PART 1
data = pre_data[end_index:]

In [12]:
# Remove unwanted lines (end markers, blank lines, section header lines) and join the rest into one string
kept_lines = []
for line in data.splitlines():
    stripped = line.strip()
    if not stripped or "END OF SECTION" in stripped or "END OF DOCUMENT" in stripped:
        continue
    if any(stripped.startswith(ele) for ele in section_details_to_skip):
        continue
    kept_lines.append(stripped)
final_data = "".join(line + "\n" for line in kept_lines)

In [13]:
# Arrange lines so that each pointer (PART, 1.1, A., 1., 1), a)) starts one logical line
final_lines = []
for line in final_data.splitlines():
    line = line.strip()
    starts_pointer = (line.startswith("PART")
                      or re.search(r"^[0-9]\.[0-9]", line)
                      or re.search(r"^[A-Za-z]\.", line)
                      or re.search(r"^[0-9]+[.)]", line)
                      or re.search(r"^[a-z]+\)", line))
    if starts_pointer:
        final_lines.append(line)
    elif not final_lines or final_lines[-1].startswith("PART"):
        # Skip text before the first pointer and text that follows a PART heading
        continue
    elif line.isupper():
        # Wrapped uppercase heading text, e.g. "1.1" followed by "SUMMARY"
        final_lines[-1] = final_lines[-1] + " " + line
    elif re.search(r"^[0-9]+\.[0-9]", final_lines[-1]):
        # Body text directly under an article heading becomes its own line
        final_lines.append(line)
    else:
        # Wrapped continuation of the previous item
        final_lines[-1] = final_lines[-1] + " " + line
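The grouping rule in this cell boils down to: a line that starts with a pointer (`PART`, `1.1`, `A.`, `1.`, `1)`, `a)`) opens a new logical line, and any other line continues the previous one. A condensed sketch of that rule, using the same pointer patterns; `group_pointer_lines` is a hypothetical helper, and it deliberately leaves out the cell's special cases (dropping text after `PART` lines, and starting a new line for body text directly under an article heading):

```python
import re

# A line opens a new item if it starts with one of these pointers (same patterns as the cell above)
POINTER = re.compile(r"^(PART|[0-9]\.[0-9]|[A-Za-z]\.|[0-9]+[.)]|[a-z]+\))")

def group_pointer_lines(text: str) -> list[str]:
    """Merge wrapped lines into one line per pointer (simplified sketch)."""
    grouped: list[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if POINTER.match(line) or not grouped:
            grouped.append(line)
        else:
            # Continuation of the previous item: join with a single space
            grouped[-1] += " " + line
    return grouped

sample = """1.1
SUMMARY
A.
Section Includes:
1.
Systematic removal of portions of buildings and
structures."""
print(group_pointer_lines(sample))
```

On this input the result is `['1.1 SUMMARY', 'A. Section Includes:', '1. Systematic removal of portions of buildings and structures.']`. A wrapped line that happens to begin like a pointer (for example `e.g., ...`) would still be split off, as in the notebook cell.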
            

In [14]:
# All headings: PART lines and numbered articles, plus lettered (A.) lines that appear before the first of them
heading = []
flag = True
for line in final_lines:
    if(re.search(r"^[0-9]+\.[0-9]", line.strip()) or line.strip().startswith("PART")):
        heading.append(line)
        flag = False
    elif(re.search(r"^[A-Z]\.", line.strip()) and flag):
        heading.append(line)

print(heading)

['PART 1 - GENERAL', '1.1 SUMMARY', '1.2 REFERENCES', '1.3 ADMINISTRATIVE REQUIREMENTS', '1.4 SUBMITTALS', '1.5 QUALITY ASSURANCE', '1.6 PROJECT CONDITIONS', 'PART 2 - PRODUCTS (NOT USED)', 'PART 3 - EXECUTION', '3.1 EXAMINATION', '3.2 PREPARATION', '3.3 DEMOLITION', '3.4 CORRECTION AND REPAIR', '3.5 CLEANING']


In [15]:
# Heading pairs: keep each SUBMITTALS heading with the heading that follows it, plus the PART headings
# (the first entry assumes the document starts with "PART 1 - GENERAL")
res = list(map(list, zip(heading, heading[1:])))
heading_list = []
for i, pair in enumerate(res):
    if i == 0:
        heading_list.append("PART 1 - GENERAL")
    if "SUBMITTAL" in pair[0]:
        heading_list.append(pair)
    if "PART" in pair[1]:
        heading_list.append(pair[1])

heading_list

['PART 1 - GENERAL',
 ['1.4 SUBMITTALS', '1.5 QUALITY ASSURANCE'],
 'PART 2 - PRODUCTS (NOT USED)',
 'PART 3 - EXECUTION']

In [16]:
# Submittal line indices: (start, end) slice bounds for each SUBMITTALS pair; PART headings pass through
data_lines = []
for item in heading_list:
    if isinstance(item, list):
        # [submittals heading, next heading] -> positions in final_lines
        data_lines.append((final_lines.index(item[0]), final_lines.index(item[1])))
    elif "PART " in item:
        data_lines.append(item)

print(data_lines)

['PART 1 - GENERAL', (34, 46), 'PART 2 - PRODUCTS (NOT USED)', 'PART 3 - EXECUTION']
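The cells above locate the `SUBMITTALS` article by pairing each heading with the next one and slicing `final_lines` between their positions. The same idea can be written as a direct scan from the matching heading to the next article or PART heading, which also covers the case where the article is the last one in the document (a heading pair cannot exist there). This sketch assumes article headings look like `1.4 SUBMITTALS` and PART headings start with `PART`; `article_span` and `is_heading` are hypothetical names:

```python
import re

ARTICLE = re.compile(r"^[0-9]+\.[0-9]")  # e.g. "1.4 SUBMITTALS"

def is_heading(line: str) -> bool:
    return bool(ARTICLE.match(line)) or line.startswith("PART")

def article_span(lines: list[str], keyword: str = "SUBMITTAL"):
    """Return (start, end) indices of the first article whose heading contains keyword, else None."""
    for start, line in enumerate(lines):
        if ARTICLE.match(line) and keyword in line.upper():
            end = start + 1
            # Stop at the next article or PART heading, or at the end of the list
            while end < len(lines) and not is_heading(lines[end]):
                end += 1
            return start, end
    return None

lines = ["PART 1 - GENERAL", "1.3 ADMINISTRATIVE REQUIREMENTS", "A. Coordination.",
         "1.4 SUBMITTALS", "A. Informational Submittals:", "1. Schedule.",
         "1.5 QUALITY ASSURANCE", "A. Quality Standards."]
start, end = article_span(lines)
print(lines[start:end])  # ['1.4 SUBMITTALS', 'A. Informational Submittals:', '1. Schedule.']
```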


In [17]:
# Submittal records for mapping: PART headings plus the lines inside each submittal span
dataset = []
for pos in data_lines:
    if isinstance(pos, str):
        dataset.append(pos)
    else:
        start, end = pos
        dataset.extend(final_lines[start:end])

print(dataset)

['PART 1 - GENERAL', '1.4 SUBMITTALS', 'A. Informational Submittals:', '1. Schedule of Selective Demolition Activities: Indicate the following.', "a. Detailed sequence of selective demolition and removal work, with starting and ending dates for each activity. Ensure Owner's on-site operations are uninterrupted.", 'b. Interruption of utility services. Indicate how long utility services will be interrupted.', 'c. Coordination for shutoff, capping, and continuation of utility services.', 'd. Use of elevator and stairs.', 'e. Locations of proposed dust- and noise-control temporary partitions and means of egress.', "f. Coordination of Owner's continuing occupancy of portions of existing building and of Owner's partial occupancy of completed work.", 'g. Means of protecting existing-to-remain items in the path of waste removal.', '2. Inventory: After selective demolition is complete, submit a list of items that have been removed and salvaged.', '3. Pre-Demolition Photographs or Videos: Submit


In [15]:
# Map records to a data frame
mapp_dataset = pd.DataFrame(columns = ["SECTION", "SECTION NAME", "PART", "SUBSECTION", "SUBSECTION NAME", "DESCRIPTION"], dtype = str)
subsection_flag = False
subsection = "Not Found"
subsection1 = "Not Found"
subsection_name = "Not Found"
part_name = "Not Found"
heading_flag = False
for line in dataset:
    line = line.strip()
    last = len(mapp_dataset) - 1
    if line.startswith('PART'):
        part_name = line
    elif re.search(r"^[0-9]+\.[0-9]+", line):
        # Article heading, e.g. "1.4 SUBMITTALS"; a new article has no open lettered paragraph yet
        subsection = line.split()[0].rstrip(".")
        subsection_name = " ".join(line.split()[1:])
        subsection_flag = True
        heading_flag = False
    elif re.search(r"^[A-Z]\.", line):
        # A lettered paragraph starts a new row, e.g. SUBSECTION "1.4-A"
        heading_flag = True
        subsection1 = line[0]
        mapp_dataset.loc[len(mapp_dataset)] = [spec_number, spec_name, part_name, subsection + "-" + subsection1, subsection_name, line]
    elif heading_flag:
        # Numbered items start on a new line; lettered sub-items are joined with a space
        sep = " \n" if re.search(r"^[0-9]+\.", line) else " "
        mapp_dataset.loc[last, "DESCRIPTION"] += sep + line
    elif subsection_flag:
        # First line of an article that has no lettered paragraphs
        mapp_dataset.loc[len(mapp_dataset)] = [spec_number, spec_name, part_name, subsection, subsection_name, "\n" + line]
        subsection_flag = False
    else:
        # Further lines of such an article each start on a new line
        mapp_dataset.loc[last, "DESCRIPTION"] += "\n" + line


In [16]:
mapp_dataset

Unnamed: 0,SECTION,SECTION NAME,PART,SUBSECTION,SUBSECTION NAME,DESCRIPTION
0,,TAGGING AND IDENTIFICATION,PART 1 - GENERAL,1.4-A,INFORMATION SUBMITTALS,A. Action Submittals: \n1. Product Data: Manuf...
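The mapping cell grows `mapp_dataset` one `.loc` row at a time. An alternative is to collect plain dicts in a list and build the DataFrame once at the end, for example with `pd.DataFrame(records, columns=COLUMNS)`. The sketch below is a simplified, hypothetical `map_records` helper: it assumes the same six columns and uses one uniform join rule (numbered items on a new line, everything else joined with a space), so its output for articles without lettered paragraphs differs slightly from the notebook cell:

```python
import re

COLUMNS = ["SECTION", "SECTION NAME", "PART", "SUBSECTION", "SUBSECTION NAME", "DESCRIPTION"]

def map_records(lines, spec_number="", spec_name="Not Found"):
    """Group spec lines into one dict per row (simplified sketch of the mapping cell)."""
    rows = []
    part = subsection = subsection_name = "Not Found"
    open_row = False  # True while lines should be appended to rows[-1]

    def make_row(sub, text):
        return dict(zip(COLUMNS, [spec_number, spec_name, part, sub, subsection_name, text]))

    for line in (raw.strip() for raw in lines):
        if line.startswith("PART"):
            part, open_row = line, False
        elif re.match(r"[0-9]+\.[0-9]+", line):
            # Article heading such as "1.4 SUBMITTALS"
            subsection, _, subsection_name = line.partition(" ")
            open_row = False
        elif re.match(r"[A-Z]\.", line):
            # A lettered paragraph opens a row such as "1.4-A"
            rows.append(make_row(f"{subsection}-{line[0]}", line))
            open_row = True
        elif open_row:
            # Numbered items go on a new line; lettered sub-items are joined with a space
            sep = " \n" if re.match(r"[0-9]+\.", line) else " "
            rows[-1]["DESCRIPTION"] += sep + line
        else:
            # Article body without a lettered paragraph gets its own row
            rows.append(make_row(subsection, line))
            open_row = True
    return rows

records = map_records(["PART 1 - GENERAL", "1.4 SUBMITTALS", "A. Informational Submittals:",
                       "1. Schedule: Indicate the following.", "a. Detailed sequence."])
print(records[0]["SUBSECTION"], repr(records[0]["DESCRIPTION"]))
```

Building the frame once avoids reallocating `mapp_dataset` on every `.loc` append, which matters mostly when many spec files are processed in one run.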


In [50]:
# Inspect the description of the first mapped row (row 0 is the only row for this sample)
mapp_dataset.DESCRIPTION.iloc[0].splitlines()

In [102]:
# Access the description of the last row
mapp_dataset.loc[len(mapp_dataset) - 1, "DESCRIPTION"]

"Submit the following items for Owner approval: 1. Product Data: Manufacturer's catalog cut sheets and other published technical data for each of the following: a. Nameplates, instructions plates, signs and labels. b. Fasteners. 2. Samples: Provide samples of each color, lettering style, and other graphic representation required for identification materials. Provide samples of labels and signs. No material is to be ordered without this approval. 3. Provide a listing of proposed names, abbreviations and other designations used in identification. Provide an electronic copy of the schedule of proposed tags, nameplates and engraving for Owner approval. No material is to be ordered without this approval. 4. Provide a final and complete, electronic listing of all applied tags, nameplates and engravings. 5. Provide a Hand Valve schedule as an electronic version in Microsoft Excel. Mark valves which are intended for emergency shut-off and similar special uses, by special flags, in margin of sc

In [51]:
# Preserve leading zeros of the section number by writing it as an Excel text formula, e.g. ="02 41 20"
mapp_dataset.SECTION = mapp_dataset.SECTION.apply('="{}"'.format)
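The `'="{}"'.format` trick stores each section number as an Excel formula such as `="02 41 20"`, so Excel shows it as text instead of possibly converting a value like `024120` to a number and dropping the leading zero. The wrapper stays in the CSV, so Python code that reads the file back sees the literal `="..."` string. A pair of hypothetical helpers for both directions:

```python
def to_excel_text(value: str) -> str:
    """Wrap a value as an Excel formula so leading zeros survive when the CSV is opened in Excel."""
    return '="{}"'.format(value)

def from_excel_text(value: str) -> str:
    """Undo to_excel_text when the CSV is read back in Python; other values pass through."""
    if value.startswith('="') and value.endswith('"'):
        return value[2:-1]
    return value

wrapped = to_excel_text("02 41 20")
print(wrapped, from_excel_text(wrapped))  # ="02 41 20" 02 41 20
```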

In [52]:
# Generate the CSV, or append to it if it already exists
big_spec_name = "XOXOXO"
big_spec_name = big_spec_name + ".csv"

if path.exists(big_spec_name):
    existing = pd.read_csv(big_spec_name, dtype = str)
    # pd.concat is used because DataFrame.append was removed in pandas 2.0
    combined = pd.concat([existing, mapp_dataset], ignore_index = True)
    combined.to_csv(big_spec_name, index = False)
else:
    mapp_dataset.to_csv(big_spec_name, index = False)
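The append cell reads the whole CSV, concatenates and rewrites it. When the column set is fixed, opening the file in append mode and writing the header only for a new file gives the same result without re-reading it; with pandas that is `mapp_dataset.to_csv(big_spec_name, mode="a", header=not path.exists(big_spec_name), index=False)`. A standard-library sketch of the same pattern, assuming rows are dicts keyed by the column names; `append_rows` is a hypothetical helper:

```python
import csv
import os
import tempfile

def append_rows(csv_path, fieldnames, rows):
    """Append dict rows to csv_path, writing the header only when the file is new."""
    write_header = not os.path.exists(csv_path)
    # newline="" lets the csv module control line endings, as its documentation recommends
    with open(csv_path, "a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        if write_header:
            writer.writeheader()
        writer.writerows(rows)

# Usage: two appends to the same file produce one header and two data rows
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "XOXOXO.csv")
    fields = ["SECTION", "DESCRIPTION"]
    append_rows(target, fields, [{"SECTION": '="02 41 20"', "DESCRIPTION": "A. First row \n1. Item."}])
    append_rows(target, fields, [{"SECTION": '="02 41 20"', "DESCRIPTION": "A. Second row"}])
    with open(target, newline="", encoding="utf-8") as handle:
        result = list(csv.DictReader(handle))

print(len(result), result[1]["DESCRIPTION"])
```

This only works if every append uses the same columns in the same order; the read-concat-write approach in the cell above tolerates column changes at the cost of rewriting the file.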

In [98]:
## Load saved model, vectorizer and label encoder
with open("ML Model/vectorizer.pickle", 'rb') as file:
    vectorizer_saved = pickle.load(file)

with open("ML Model/label_encoder.pickle", 'rb') as file:
    encorder_saved = pickle.load(file)

with open("ML Model/type_classifier.pickle", 'rb') as file:
    classifier_saved = pickle.load(file)

In [100]:
da = vectorizer_saved.transform(["NA"])
classifier_saved.predict(da)

array([4])

In [18]:
## Load prepared data
new_dataset = pd.read_csv("YYY.csv")
new_dataset.head()

Unnamed: 0,SECTION,SECTION_NAME,PART,SUB SECTION,SUB SECTION HEADING,DECRIPTION
0,27 05 29,HANGERS AND SUPPORTS FOR COMMUNICATIONS SYSTEMS,PART 1 - GENERAL,1.03 A,SUBMITTALS,A. Refer to Section 27 05 00 for requirements ...
1,27 05 29,HANGERS AND SUPPORTS FOR COMMUNICATIONS SYSTEMS,PART 2 - PRODUCTS,2.02 A,STRUCTURAL SUPPORT SYSTEMS SUBMITTALS,A. Slotted strut supports \n1. Acceptable manu...


In [19]:
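# 'DECRIPTION' is the column name as spelled in YYY.csv; pandas reads empty cells (and a literal "NA") as NaN, so fill them first
description_vector = vectorizer_saved.transform(new_dataset['DECRIPTION'].fillna(""))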
predictions = classifier_saved.predict(description_vector)
new_dataset['TYPE'] = encorder_saved.inverse_transform(predictions)
new_dataset = new_dataset[['SECTION', 'SECTION_NAME', 'PART', 'SUB SECTION', 'SUB SECTION HEADING', 'TYPE','DECRIPTION']]
new_dataset.to_csv("YYY_Updated.csv", index = False)

In [2]:
# !jupyter nbconvert --to script "submittal_extraction_v9.ipynb"

[NbConvertApp] Converting notebook submittal_extraction_v8.ipynb to script
[NbConvertApp] Writing 10120 bytes to submittal_extraction_v8.py


In [43]:
# !jupyter nbconvert --to PDFviaHTML "submittal_extraction_v9.ipynb"


[NbConvertApp] Converting notebook submittal_extraction_v9.ipynb to PDFviaHTML
[NbConvertApp] Writing 189718 bytes to submittal_extraction_v9.pdf

