# Elog Tagging

The goal is to tag elog entries with the correct tag. To do this we need to:
* Scrape the entry data from the elog
* Get the corresponding tag for each entry
* Run the entries through an NLP algorithm to train a tagger (NOTE: we can't really do this yet because our existing tagging is unreliable, so we'd be training on bad data)

In [2]:
import pandas as pd
import numpy as np
import requests
from io import StringIO


In [3]:
# api-endpoint 
URL = "https://mccelog.slac.stanford.edu/elog/dev/mgibbs/dev_elog_display_json.php"

# defining a params dict, for now doing something simple
PARAMS = {'logbook': 'MCC', 'shifts': 6} 
  
# sending get request and saving the response as response object
r = requests.get(url=URL, params=PARAMS)

# fail early on an HTTP error instead of trying to parse an error page
r.raise_for_status()

# extracting data in json format
data = r.json()

# Turning list of json objects into dataframe
df = pd.DataFrame.from_records(data)

In [4]:
print(df.shape)
df.head()

(56, 14)


Unnamed: 0,elogid,title,text,logbook,author,eventTime,shift,parent,tag,attachments,children,supersedes,superseded_by,highPriority
0,1003611,* Re: Step 1: we are going to see if it is pos...,I'm going to drop this and proceed with the se...,"{'logbookid': 122, 'name': 'MCC'}","{'authorid': 1449, 'username': 'loney', 'first...",1581790285,"Day Shift, Sat, 15-Feb-20",1003608.0,LCLS,,,,,
1,1003610,We are able to get both chains of NIT presets ...,It isn't clear why this didn't work last night...,"{'logbookid': 122, 'name': 'MCC'}","{'authorid': 1449, 'username': 'loney', 'first...",1581788774,"Day Shift, Sat, 15-Feb-20",,LCLS,,,,,
2,1003609,We have a BTHW PPS Camera!,I doubt this is new today but it's news to me!,"{'logbookid': 122, 'name': 'MCC'}","{'authorid': 1449, 'username': 'loney', 'first...",1581788124,"Day Shift, Sat, 15-Feb-20",,LCLS,"[{'attachmentid': 386105, 'url': 'https://mcce...",,,,
3,1003608,Step 1: we are going to see if it is possible ...,,"{'logbookid': 122, 'name': 'MCC'}","{'authorid': 1449, 'username': 'loney', 'first...",1581787080,"Day Shift, Sat, 15-Feb-20",,LCLS,,[1003611],,,
4,1003607,We have two current Linac East PPS Logs in the...,One has an ODDE key in the Red Keysafe (the ne...,"{'logbookid': 122, 'name': 'MCC'}","{'authorid': 1449, 'username': 'loney', 'first...",1581785690,"Day Shift, Sat, 15-Feb-20",,LCLS,,,,,


### Now we have a way to store the data in a DataFrame, but there is still a lot of useless data here. Let's get rid of the useless columns:

* `logbook` (all MCC)
* `author`, `eventTime`, `shift`, `parent`, `children`, `attachments`, `supersedes`, `highPriority` (irrelevant)

This leaves the following columns: `title`, `text`, `elogid`, `tag`, and `superseded_by`
* `superseded_by` is useful because we can drop any row where it is not NaN. The reasoning is that a superseded entry is basically a duplicate, and we only want to keep one copy (the correct one). So we can drop the original entries, i.e. the rows where `superseded_by` is not NaN, and then delete this column
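
The cleanup described above can be sketched on a toy DataFrame. The rows and values here are made up purely for illustration (including the `FACET` tag), and selecting the columns to keep, rather than listing every column to drop, is just one possible way to do it:

```python
import numpy as np
import pandas as pd

# Toy stand-in for the elog DataFrame; all values are hypothetical.
toy = pd.DataFrame({
    "elogid": [101, 102, 103],
    "title": ["Original entry", "Corrected entry", "Unrelated entry"],
    "text": ["first draft", "fixed draft", "something else"],
    "tag": ["LCLS", "LCLS", "FACET"],
    "author": ["a", "a", "b"],
    "supersedes": [np.nan, 101, np.nan],
    "superseded_by": [102, np.nan, np.nan],
})

# Keep only rows that were never superseded (superseded_by is NaN).
current = toy[toy["superseded_by"].isna()]

# Keep only the columns of interest; every other column is left behind.
KEEP = ["elogid", "title", "text", "tag"]
cleaned = current[KEEP].reset_index(drop=True)

print(cleaned)
```

Entry 101 is dropped because entry 102 supersedes it, so only the corrected copy survives.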

Finally we'd be left with: `title`, `text`, `elogid`, `tag` <br>
<b> Questions </b> 
* The NLP algorithm should really only be working on the `title` of the entries, right? It's great if there are more keywords in the body, but the title should be enough to tag the location (in my head). If this is true then there's no need for `text`
* Do I really need `elogid` for anything...? If I keep both `title` and `text` I definitely do; if not, then I see no need
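
One way to keep both questions open is to build the model input through a small helper, so title-only and title-plus-text can be compared later. This is only a sketch: `build_model_input` and its `use_text` flag are hypothetical names, the toy rows are made up, and indexing by `elogid` is an assumption that predictions will need to be traced back to entries.

```python
import pandas as pd

def build_model_input(df, use_text=False):
    """Return one string per entry, indexed by elogid (hypothetical helper)."""
    indexed = df.set_index("elogid")
    combined = indexed["title"].fillna("")
    if use_text:
        # Append the body text; entries with no body contribute an empty string.
        combined = combined + " " + indexed["text"].fillna("")
    return combined.str.strip()

# Toy rows, made up for illustration only.
toy = pd.DataFrame({
    "elogid": [201, 202],
    "title": ["BPM readback frozen", "Klystron trip"],
    "text": ["Rebooted the IOC", None],
    "tag": ["LCLS", "LCLS"],
})

titles_only = build_model_input(toy)
titles_and_text = build_model_input(toy, use_text=True)
```

With `elogid` as the index, it costs nothing to keep and any predicted tag can be matched back to its entry, whichever input variant ends up being used.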

In [6]:
# Drop all rows where superseded_by is not null to essentially drop duplicates, then drop the superseded_by column
df = df[df['superseded_by'].isnull()]
df = df.drop(columns=['superseded_by'])

In [7]:
print(df.shape)
df.head()

(49, 13)


Unnamed: 0,elogid,title,text,logbook,author,eventTime,shift,parent,tag,attachments,children,supersedes,highPriority
0,1003611,* Re: Step 1: we are going to see if it is pos...,I'm going to drop this and proceed with the se...,"{'logbookid': 122, 'name': 'MCC'}","{'authorid': 1449, 'username': 'loney', 'first...",1581790285,"Day Shift, Sat, 15-Feb-20",1003608.0,LCLS,,,,
1,1003610,We are able to get both chains of NIT presets ...,It isn't clear why this didn't work last night...,"{'logbookid': 122, 'name': 'MCC'}","{'authorid': 1449, 'username': 'loney', 'first...",1581788774,"Day Shift, Sat, 15-Feb-20",,LCLS,,,,
2,1003609,We have a BTHW PPS Camera!,I doubt this is new today but it's news to me!,"{'logbookid': 122, 'name': 'MCC'}","{'authorid': 1449, 'username': 'loney', 'first...",1581788124,"Day Shift, Sat, 15-Feb-20",,LCLS,"[{'attachmentid': 386105, 'url': 'https://mcce...",,,
3,1003608,Step 1: we are going to see if it is possible ...,,"{'logbookid': 122, 'name': 'MCC'}","{'authorid': 1449, 'username': 'loney', 'first...",1581787080,"Day Shift, Sat, 15-Feb-20",,LCLS,,[1003611],,
4,1003607,We have two current Linac East PPS Logs in the...,One has an ODDE key in the Red Keysafe (the ne...,"{'logbookid': 122, 'name': 'MCC'}","{'authorid': 1449, 'username': 'loney', 'first...",1581785690,"Day Shift, Sat, 15-Feb-20",,LCLS,,,,
