# Elog Tagging

The goal is to tag elog entries with the correct tag. In order to do this we need to:
* Scrape the entry data from the elog
* Get the corresponding tag for each entry
* Run the entries through an NLP algorithm to train a tagger (NOTE: Can't really do this right now because the existing tagging is unreliable, so we'd be training on bad data)

In [1]:
import pandas as pd
import numpy as np
import requests
import time
from datetime import datetime
from sqlalchemy import create_engine

In [2]:
def get_data(s,e):
    '''
    --- Imports data from Elog and stores it in a workable format ---
    INPUT
        s: start time as Unix timestamp
        e: end time as Unix timestamp
    RETURN
        df: dataframe of uncleaned data between selected time range
    '''
    
    # api-endpoint 
    URL = "https://mccelog.slac.stanford.edu/elog/dev/mgibbs/dev_elog_display_json.php"

    PARAMS = {'logbook': 'MCC', 'start': s, 'end': e} 

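    # sending get request and saving the response as response object 
    r = requests.get(url = URL, params = PARAMS) 

    # raising an error right away if the server answered with an HTTP error status
    r.raise_for_status()

    # extracting data in json format 
    data = r.json()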

    # Turning list of json objects into dataframe
    df = pd.DataFrame.from_records(data)

    return df

In [99]:
# Just checking that things work as expected
s = datetime(2008, 1, 11, 0, 0).timestamp()
e = datetime(2009, 1, 11, 0, 0).timestamp()
df = get_data(s,e)
print(df.shape)
df.head()

(24284, 14)


Unnamed: 0,elogid,title,text,logbook,author,eventTime,shift,children,parent,attachments,superseded_by,supersedes,highPriority,tag
0,270417,"MCC Shift Change: Owl Shift, Sunday, 11-Jan-2009",.250 nC 13.6 GeV 10 Hz e- to main dump. Undula...,"{'logbookid': 122, 'name': 'MCC'}","{'authorid': 1053, 'username': 'spw', 'firstna...",1231660800,"Owl Shift, Sun, 11-Jan-09",,,,,,,
1,270419,SWING SHIFT SUMMARY,<table CellPadding=5 BORDER=1>\n\t\t <TR><TD><...,"{'logbookid': 122, 'name': 'MCC'}","{'authorid': 1161, 'username': 'jwarren', 'fir...",1231660799,"Swing Shift, Sat, 10-Jan-09",,,,,,,
2,270415,* RE: Frisch 6x6 misbehaving,Disabled BSY/LTU energy part of Frisch feedbac...,"{'logbookid': 122, 'name': 'MCC'}","{'authorid': 1161, 'username': 'jwarren', 'fir...",1231660530,"Swing Shift, Sat, 10-Jan-09",[270428],270413.0,,,,,
3,270412,Instructions for resetting BSOBTH02,Go to the large blue box on the <u>North</u> h...,"{'logbookid': 122, 'name': 'MCC'}","{'authorid': 1160, 'username': 'jab', 'firstna...",1231660060,"Swing Shift, Sat, 10-Jan-09",,,,,,,
4,270413,Frisch 6x6 misbehaving,LTU energy BPM DL1 oscillating about 2mm. Pag...,"{'logbookid': 122, 'name': 'MCC'}","{'authorid': 1161, 'username': 'jwarren', 'fir...",1231659900,"Swing Shift, Sat, 10-Jan-09",[270415],,,,,,
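
A side note on the time window: calling `.timestamp()` on a naive `datetime` interprets it in the local time zone of the machine running the notebook, so the same cell can request slightly different windows on different machines. Below is a minimal sketch of pinning the zone explicitly. It assumes UTC is acceptable for the query (which zone the elog API expects is not confirmed here), and `utc_timestamp` is a hypothetical helper name.

```python
from datetime import datetime, timezone


def utc_timestamp(year, month, day):
    """Return the Unix timestamp for midnight UTC on the given date.

    Hypothetical helper: using UTC is an assumption, not something the
    elog API is known to require.
    """
    return datetime(year, month, day, tzinfo=timezone.utc).timestamp()


s = utc_timestamp(2008, 1, 11)
e = utc_timestamp(2009, 1, 11)
print(int(s), int(e))
```

The resulting `s` and `e` can be passed to `get_data` exactly like the values above.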


### Now we have a method to store the data in a dataframe, but there is still a lot of useless data here. Let's get rid of the useless columns

* logbook (all MCC)
* author, eventTime, shift, parent, children, attachments, supersedes, highPriority (irrelevant)

This leaves the following columns: `title`, `text`, `elogid`, `tag`, and `superseded_by`
* `superseded_by` is useful because any row where it is not NaN can be dropped. When an entry is superseded there are essentially duplicate entries, and we only want to keep one copy (the correct one). So we can drop the original entries, i.e. the rows where `superseded_by` is not NaN, and then delete this column

Finally we'd be left with: `title`, `text`, `elogid`, `tag` <br>
<b> Questions </b> 
* The NLP algorithm should really only be working on the `title` of the entries, right? It's great if there are more keywords in the body, but the title should be enough to tag the location (in my head). If this is true then there's no need for `text`
* Do I really need `elogid` for anything...? If I keep title and text I definitely need it; if not, I see no need
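
To make the row-dropping rule concrete, here is a minimal sketch on a toy dataframe. The ids, titles and tags are made up for illustration; only the column names match the elog data.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the elog dataframe; the ids, titles and tags are made up
toy = pd.DataFrame({
    'elogid': [1, 2, 3, 4],
    'title': ['original entry', 'corrected entry', 'untagged entry', 'other entry'],
    'tag': ['LCLS', 'LCLS', None, 'FACET'],
    'superseded_by': [2.0, np.nan, np.nan, np.nan],
})

# Keep tagged rows that have not been replaced by a newer version
kept = toy[toy['tag'].notnull() & toy['superseded_by'].isnull()]
kept = kept.drop(columns=['superseded_by']).reset_index(drop=True)
print(kept)
```

Entry 1 is dropped because it was superseded by entry 2, and entry 3 is dropped because it has no tag.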

In [307]:
def clean_data(df):
    '''
    --- Cleans data frame ---
    INPUT
        df: dataframe (not cleaned)
    RETURN
        df: dataframe (cleaned)
    '''
    # Dropping rows without any tags (these rows are useless for us)
    df = df[df['tag'].notnull()]
    
    # Dropping useless columns (everything except the ones listed here)
    important_cols = ['title', 'text', 'elogid', 'tag', 'superseded_by']
    df = df.drop(columns=[col for col in df.columns if col not in important_cols])

    # Dropping all rows where superseded_by is not null to essentially drop duplicates. Then drop the superseded_by column
    df = df[df['superseded_by'].isnull()]
    df = df.drop(['superseded_by'], axis = 1)
    
    # Reset the index
    df = df.reset_index(drop=True)
    
    return df

In [101]:
# Just checking that things work as expected
df = clean_data(df)
print(df.shape)
df.head()

(49, 4)


Unnamed: 0,elogid,title,text,tag
0,265530,Restart LCLS Magnet ChannelWatcher,I've restarted the lcls magnet channel watcher...,LCLS
1,259842,BYKIK pulse width change,Tony Beukers and I chagned the BYKIK pulse wid...,LCLS
2,252459,* Re: SW: Reboot BC1 Bunch Length IOCs-,Greg Dallt from the Klystron Group is working ...,LCLS
3,252453,SW: Reboot BC1 Bunch Length IOCs-,Rebooted Bunch Length Monitor EPICS IOC in li2...,LCLS
4,252399,Fallout from 120Hz Testing: BCS: Gun SBI (20-5...,"Hello,\n\nAfter the 120Hz testing, after the c...",LCLS


In [98]:
# Checking to see the number of tags present
df.tag.value_counts()

LCLS    49
Name: tag, dtype: int64

<b> Now let's save the data in a way that we can easily access </b>

In [80]:
# Function to save the data into sql database
def save_data(df, database_filename):
    engine = create_engine('sqlite:///'+database_filename+'.db')
    df.to_sql(database_filename, engine, index=False)
    
    '''
    if only_tags == True:
        df.to_sql(database_filename, conn, if_exists='replace', index = False)
    
    
    if only_tags == False:
        df_big = np.array_split(df, n)
        chunk_list = list(range(0,n))
        for i in chunk_list:
            df_big[i].to_sql(database_filename+str(i), conn, if_exists='replace', index = False)
    '''
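
As written, `df.to_sql` uses the pandas default `if_exists='fail'`, so calling `save_data` a second time with the same name raises a `ValueError` because the table already exists. Below is a minimal sketch of a re-runnable save. It uses an in-memory SQLite connection and a made-up two-row dataframe so that nothing is written to disk, and `save_data_overwrite` is a hypothetical name.

```python
import sqlite3

import pandas as pd


def save_data_overwrite(df, table_name, conn):
    """Write df to table_name, replacing the table if it already exists."""
    df.to_sql(table_name, conn, if_exists='replace', index=False)


conn = sqlite3.connect(':memory:')
toy = pd.DataFrame({'elogid': [1, 2], 'title': ['a', 'b'], 'tag': ['LCLS', 'FACET']})

# Saving twice no longer fails; the second call overwrites the first
save_data_overwrite(toy, 'elog_data', conn)
save_data_overwrite(toy, 'elog_data', conn)

saved = pd.read_sql_query('SELECT * FROM elog_data', conn)
print(saved.shape)
```

Whether overwriting or appending (`if_exists='append'`) is the right choice depends on how the data ends up being collected.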
    

### Important changes that still need to be made:
* What time frame is a good time frame to capture all needed data?
> Looks like you want to capture data up until 2011. Perhaps the most efficient way to do this would be to pull the data by month or by year, process each chunk individually, and then combine the chunks into one giant dataframe. You would likely have to add another function that does this and call it from `main()`

In [267]:
def main():
    '''
    Will go through all the necessary steps to extract the data from the elog, clean it, and save the data
    in an SQL database
    '''
    s = datetime(2009, 1, 11, 0, 0).timestamp()
    e = datetime(2010, 1, 11, 0, 0).timestamp()
    df = get_data(s,e)
    df = clean_data(df)
    save_data(df,'elog_data')

In [126]:
# Running this will save the data that we want to collect
main()

### Due to what was found in the cells below, we realize that there are duplicates
We'll need to rewrite our `clean_data` function to incorporate a few things:
* This function should drop duplicates

We will also need to write another function (call it `join_data_2011`) that does the following:
* Uses `get_data` and collects data in one month intervals
* Cleans these individual months using the new `clean_data` function
* Joins the months together
* Drops duplicates if there are any overlaps
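
The one month intervals can be generated up front, which keeps the December to January rollover in one place. This is a minimal sketch using only the standard library; `month_windows` is a hypothetical helper, and whether the API treats the end of a window as inclusive is not known here.

```python
from datetime import datetime


def month_windows(first_year, first_month, last_year, last_month):
    """Yield (start, end) datetimes covering one calendar month each.

    Each window runs from the first of a month to the first of the next
    month, so December rolls over into January of the following year.
    """
    year, month = first_year, first_month
    while (year, month) <= (last_year, last_month):
        next_year, next_month = (year + 1, 1) if month == 12 else (year, month + 1)
        yield datetime(year, month, 1), datetime(next_year, next_month, 1)
        year, month = next_year, next_month


windows = list(month_windows(2007, 4, 2011, 12))
print(len(windows))
print(windows[0], windows[-1])
```

Each pair can be converted with `.timestamp()` and passed to `get_data`. Adjacent windows share a boundary instant, which is one more reason to drop duplicates after joining.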

In [309]:
# Creating dummy dataframe
s = datetime(2011, 4, 1, 0, 0).timestamp()
e = datetime(2011, 8, 1, 0, 0).timestamp()
df1 = get_data(s,e)
print('Number of entries in this dataframe: ' + str(df1.shape[0]))

# Printing out duplicates, just so that we have the visual proof
print('The duplicates for this four month period of time are shown below. Must be something wrong with the query')
bad_ids = df1[df1['elogid'].duplicated()]['elogid'].tolist()
print('Number of entries in this duplicates dataframe: ' + str(df1[df1['elogid'].isin(bad_ids)].shape[0]))
df1[df1['elogid'].isin(bad_ids)]

Number of entries in this dataframe: 8663
The duplicates for this four month period of time are shown below. Must be something wrong with the query
Number of entries in this duplicates dataframe: 128


Unnamed: 0,elogid,title,text,logbook,author,eventTime,shift,tag,parent,children,superseded_by,supersedes,attachments,highPriority
105,515398,FACET Summary:,* Recover beam to FACET dump & scav ext. l...,"{'logbookid': 122, 'name': 'MCC'}","{'authorid': 1301, 'username': 'mgibbs', 'firs...",1312009104,"Swing Shift, Fri, 29-Jul-11",FACET,,,,515397.0,,
106,515398,FACET Summary:,* Recover beam to FACET dump & scav ext. l...,"{'logbookid': 122, 'name': 'MCC'}","{'authorid': 1062, 'username': 'cfh', 'firstna...",1312009104,"Swing Shift, Fri, 29-Jul-11",FACET,,,,515397.0,,
587,514588,MD summary,1) Cathode cleaning\n2) CQ/SQ01 scans on IN20 ...,"{'logbookid': 122, 'name': 'MCC'}","{'authorid': 1242, 'username': 'cmelton', 'fir...",1311717711,"Day Shift, Tue, 26-Jul-11",LCLS,,,514591.0,514587.0,"[{'attachmentid': 250859, 'url': 'https://mcce...",
588,514588,MD summary,1) Cathode cleaning\n2) CQ/SQ01 scans on IN20 ...,"{'logbookid': 122, 'name': 'MCC'}","{'authorid': 1374, 'username': 'aegger', 'firs...",1311717711,"Day Shift, Tue, 26-Jul-11",LCLS,,,514591.0,514587.0,"[{'attachmentid': 250859, 'url': 'https://mcce...",
1193,511531,FACET summary,<b>Program</b>\n1) OTR size v. sext\n2) etax/e...,"{'logbookid': 122, 'name': 'MCC'}","{'authorid': 1242, 'username': 'cmelton', 'fir...",1311087473,"Owl Shift, Tue, 19-Jul-11",FACET,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8142,484955,* Re: Vacuum degraded in LI25,,"{'logbookid': 122, 'name': 'MCC'}","{'authorid': 1051, 'username': 'hvs', 'firstna...",1305161247,"Swing Shift, Wed, 11-May-11",,484935.0,,484956.0,,"[{'attachmentid': 234939, 'url': 'https://mcce...",
8143,484956,* Re: Vacuum degraded in LI25,,"{'logbookid': 122, 'name': 'MCC'}","{'authorid': 1003, 'username': 'stanek', 'firs...",1305161247,"Swing Shift, Wed, 11-May-11",,484935.0,,,484955.0,"[{'attachmentid': 234940, 'url': 'https://mcce...",
8144,484956,* Re: Vacuum degraded in LI25,,"{'logbookid': 122, 'name': 'MCC'}","{'authorid': 1051, 'username': 'hvs', 'firstna...",1305161247,"Swing Shift, Wed, 11-May-11",,484935.0,,,484955.0,"[{'attachmentid': 234940, 'url': 'https://mcce...",
8265,483960,* Re: switched to 60Hz BGRP for K Kim.,Switched back to 120 Hz for the moment.,"{'logbookid': 122, 'name': 'MCC'}","{'authorid': 1265, 'username': 'alsberg', 'fir...",1304537663,"Day Shift, Wed, 04-May-11",,483959.0,,,,,


In [310]:
print('Number of total tags:\n' + str(df1['tag'].value_counts())+ '\n')
print('Number of tags on duplicates:\n' + str(df1[df1['elogid'].isin(bad_ids) == True].tag.value_counts()))

Number of total tags:
LCLS     698
FACET    331
Name: tag, dtype: int64

Number of tags on duplicates:
FACET    12
LCLS      2
Name: tag, dtype: int64


In [311]:
# Apparently there are some entries with more than 2 duplicates
df1[df1['elogid'].isin(bad_ids) == True].elogid.value_counts()[:5]

488157    6
496626    6
488162    6
488164    4
488518    2
Name: elogid, dtype: int64

In [312]:
# Dropping the duplicates
df11 = df1.drop_duplicates(subset ="elogid", keep = 'first')
print('After dropping the duplicates, the total length is now: ' +str(df11.shape[0]))
df11[df11['elogid'].isin(bad_ids) == True].shape[0]

After dropping the duplicates, the total length is now: 8592


57

In [313]:
# Will use later to make sure functions are working correctly
df11 = clean_data(df11)
print(df11.shape[0])
df11['tag'].value_counts()

802


LCLS     554
FACET    248
Name: tag, dtype: int64

### Writing second versions of functions down below to make things clearer

In [5]:
def clean_data(df, only_tags = True):
    '''
    --- Cleans data frame ---
    INPUT
        df: dataframe (not cleaned)
        only_tags: if True, keep only the entries that have a tag
    RETURN
        df: dataframe (cleaned), or 0 if only_tags is True and the data has no tag column
    '''
    # Checks to make sure there are even entries with a tag in the specified time range
    if only_tags:
        if 'tag' not in df.columns:
            return 0

        # Dropping rows without any tags (these rows are useless for us)
        df = df[df['tag'].notnull()]
    
    # Dropping useless columns (everything except the ones listed here)
    important_cols = ['title', 'text', 'elogid', 'tag', 'superseded_by']
    df = df.drop(columns=[col for col in df.columns if col not in important_cols])

    # Dropping all rows where superseded_by is not null to essentially drop duplicates. Then drop the superseded_by column
    df = df[df['superseded_by'].isnull()]
    df = df.drop(['superseded_by'], axis = 1)

    # Dropping repeated copies of the same entry returned by the query
    df = df.drop_duplicates(subset = "elogid", keep = 'first')
    
    # Reset the index
    df = df.reset_index(drop=True)
    
    return df

In [9]:
# Checking to see if new clean function works
s = datetime(2012, 1, 1, 0, 0).timestamp()
e = datetime(2012, 2, 1, 0, 0).timestamp()
df2 = get_data(s,e)
df2 = clean_data(df2, only_tags = False)

In [31]:
print(df2.shape[0])
df2.tag.value_counts()

1768


LCLS     766
FACET     31
Name: tag, dtype: int64

In [34]:
# fillna returns a new dataframe; the result is not assigned here, so df2 itself is unchanged
df2.fillna(value={'tag': '0'})
df2.head()

Unnamed: 0,elogid,tag,text,title
0,567842,,Continue HXRSS studies\nTune-up 40 pC,"MCC Shift Change: Owl Shift, Wednesday, 01-Feb..."
1,567843,,"<table CellPadding=""5"" BORDER=1>\n<tr>\n<th>Co...",SWING SHIFT SUMMARY
2,567834,LCLS,But tuning was at a -40 MeV vernier. Franz-Jos...,SASE tuning - 600uJ!
3,567822,LCLS,And seeing results in the seeding (baseline co...,Starting Decker vernier scans
4,567816,LCLS,No problems indicated during the switch or at ...,Using Matlab UND launch fbk now. Switched with...


### Now create a `join_data_2011` function that will use `get_data` and `clean_data` to acquire the data up until the end of 2011

In [424]:
def join_data_2011():
    '''
    --- Builds one giant dataframe by concatenating data frames together one month at a time ---
    RETURN
        df: Cleaned dataframe of tagged entries from April 2007 - December 2011.    
    '''
    year_list = [2007,2008,2009,2010,2011]
    month_list = list(range(1,13))
    df = pd.DataFrame(columns=['elogid', 'title', 'text', 'tag'])
    for year in year_list:
        for month in month_list:
            # The time range starts in April 2007
            if (year == 2007 and month < 4):
                continue

            # Query from the first of this month to the first of the next month (December rolls over into January)
            s = datetime(year, month, 1, 0, 0).timestamp()
            if (month == 12):
                e = datetime(year+1, 1, 1, 0, 0).timestamp()
            else:
                e = datetime(year, month+1, 1, 0, 0).timestamp()
            df_temp = get_data(s,e)
            df_temp = clean_data(df_temp)
            
            # clean_data only returns a dataframe when the month actually has a tag column
            if isinstance(df_temp, pd.DataFrame):
                print(str(month)+'/'+str(year) + ':  ' + str(df_temp.shape[0]))
                df = pd.concat([df,df_temp], ignore_index = True)
    return df

In [425]:
# Testing to make sure the above function works
df_prac = join_data_2011()

4/2007:  2
5/2007:  3
8/2007:  2
12/2007:  3
1/2008:  11
2/2008:  11
3/2008:  15
4/2008:  5
6/2008:  3
7/2008:  1
8/2008:  5
11/2008:  1
12/2008:  1
1/2009:  3
2/2009:  10
4/2009:  2
5/2009:  19
6/2009:  51
7/2009:  30
8/2009:  39
9/2009:  25
10/2009:  18
11/2009:  32
12/2009:  16
4/2010:  28
5/2010:  22
6/2010:  20
7/2010:  15
8/2010:  17
9/2010:  12
10/2010:  22
11/2010:  9
12/2010:  12
1/2011:  21
2/2011:  13
3/2011:  5
4/2011:  4
5/2011:  2
6/2011:  14
7/2011:  782
8/2011:  1616
9/2011:  1246
10/2011:  1240
11/2011:  1043
12/2011:  462


In [426]:
df_prac.shape[0]
df_prac.tag.value_counts()

LCLS     5631
FACET    1282
Name: tag, dtype: int64

In [436]:
search = '<table cellpadding'
df_prac[df_prac['text'].str.startswith(search) == True]

Unnamed: 0,elogid,title,text,tag
26,203623,LCLS Swing Shift Summary,"<table cellpadding=1 border=2 align=""center"">\...",LCLS
32,219142,LCLS Owl Shift Summary,<table cellpadding=2 border=2>\n<tr>\n <td> LC...,LCLS
35,218481,LCLS Owl Summary,<table cellpadding=2 border=2>\n<tr>\n <td> LC...,LCLS
39,215969,LCLS Swing Summary,<table cellpadding=2 border=2>\n<tr>\n <td> LC...,LCLS
42,214092,LCLS Swing Shift Summary,<table cellpadding=2 border=2 align=center>\n<...,LCLS
44,213976,LCLS Swing Shift Summary,<table cellpadding=2 border=2 align=center>\n\...,LCLS
49,223087,LCLS day shift summary,<table cellpadding=2 border=2>\n<tr>\n <td> LC...,LCLS


In [441]:
title = 'Summary'
df_prac[df_prac['title'].str.endswith(title) == True]

Unnamed: 0,elogid,title,text,tag
26,203623,LCLS Swing Shift Summary,"<table cellpadding=1 border=2 align=""center"">\...",LCLS
32,219142,LCLS Owl Shift Summary,<table cellpadding=2 border=2>\n<tr>\n <td> LC...,LCLS
35,218481,LCLS Owl Summary,<table cellpadding=2 border=2>\n<tr>\n <td> LC...,LCLS
39,215969,LCLS Swing Summary,<table cellpadding=2 border=2>\n<tr>\n <td> LC...,LCLS
42,214092,LCLS Swing Shift Summary,<table cellpadding=2 border=2 align=center>\n<...,LCLS
44,213976,LCLS Swing Shift Summary,<table cellpadding=2 border=2 align=center>\n\...,LCLS
990,511689,Facet Summary,Progress - Delivered 6.2 Hours\n\n * Turned...,FACET
1112,509268,FACET Summary,<b>Program</b>: Match LI02 following new steer...,FACET
1142,509122,FACET Summary,Delivered: 3 hrs e- to FACET; Down: 5 hrs (DR1...,FACET
1166,509015,Facet Summary,"Program:\nDelivered: 3.7hrs, Down: 4.3hr (TIU)...",FACET


In [443]:
df_prac.at[3741,'text']

'<b>Program</b>:\n\n1. BBA\n2. R12/34 measurements with x/ycors in LI17 and LI20, and B1/2/3\n3. IP WS beam size vs. QFF*\n\n<b>Progress</b>:\n\n1. Finished BBA on day shift\n2. Finished R12/34 scans with correctors and bends\n3. Started IP WS scans from QFF4-QFF6 - accepted best values for QFF5-6\n\n<b>Problems</b>:\n\n1. xcor 3086 got stuck in middle of shift. Trimmed ok. Some corrs in chicane are sluggish in CRR plots.'

### Now that we have all the data we want through 2011 in a db file (`elog_data_2011`), open it and try to clean up the title and text columns
Things to clean:
* Make everything lower case
* Tables in text column (tricky)
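
One way to approach both items is sketched below, using only the standard library: strip the HTML tags, keep the text between them, collapse whitespace and lower-case the result. `strip_html` is a hypothetical helper, and the sample string is made up in the style of the shift summary entries.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text between tags and ignores the tags themselves."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_html(text):
    """Return text lower-cased, with HTML tags removed and whitespace collapsed."""
    if not text:
        return ''
    parser = _TextExtractor()
    parser.feed(text)
    parser.close()
    # Join chunks with a space so that neighbouring table cells do not run together
    return ' '.join(' '.join(parser.chunks).split()).lower()


sample = '<table cellpadding=2 border=2>\n<tr>\n <td> LCLS </td>\n <td> Delivered </td>\n</tr>\n</table>\n\nProgram: WS12 scan'
print(strip_html(sample))
```

It could be applied to a column with `df['text'].apply(strip_html)`. Note that this keeps the table cell contents as plain words, which may or may not be wanted.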

In [470]:
engine = create_engine('sqlite:///elog_data_2011.db')
df = pd.read_sql_table('elog_data_2011', engine)

In [471]:
print(df.shape[0])
df.tail()

6913


Unnamed: 0,elogid,title,text,tag
6908,554263,"Model manager hangs on ""gold"" stage",,LCLS
6909,554261,BBA Round 2 complete,,LCLS
6910,554258,Swapped 30-6 for 30-4,Will LEM the change in; seems to be alright so...,LCLS
6911,554257,1st Round BBA done,,LCLS
6912,554250,Emittance Measurements - 9491 eV (14.673 GeV...,<table class=emittanceTable>\n<tr><th></th><th...,LCLS


In [523]:
df['title_and_text'] = df['title'].str.cat(df['text'], sep =" ")
df.head()

Unnamed: 0,elogid,title,text,tag,title_and_text
0,137674,EPICS LCLS IOC heartbeat fault.,,LCLS,EPICS LCLS IOC heartbeat fault.
1,137670,RR monitor to 2 Hz,A. Prinz approves this change. Approval grante...,LCLS,RR monitor to 2 Hz A. Prinz approves this chan...
2,148112,PARANOIA Restart,PARANOIA was restarted with changes to handle ...,LCLS,PARANOIA Restart PARANOIA was restarted with c...
3,145825,Errorlog re: LCLS SOLN 121,15-MAY-2007 20:24:11 %CAU-E-EPICS_MSG_PEP CM...,LCLS,Errorlog re: LCLS SOLN 121 15-MAY-2007 20:24:1...
4,144311,Rack Location of LCLS BX01/BX02 Breaker,As one can see from Electrical Safety label th...,LCLS,Rack Location of LCLS BX01/BX02 Breaker As one...


In [524]:
df.tail()

Unnamed: 0,elogid,title,text,tag,title_and_text
6908,554263,"Model manager hangs on ""gold"" stage",,LCLS,"Model manager hangs on ""gold"" stage"
6909,554261,BBA Round 2 complete,,LCLS,BBA Round 2 complete
6910,554258,Swapped 30-6 for 30-4,Will LEM the change in; seems to be alright so...,LCLS,Swapped 30-6 for 30-4 Will LEM the change in; ...
6911,554257,1st Round BBA done,,LCLS,1st Round BBA done
6912,554250,Emittance Measurements - 9491 eV (14.673 GeV...,<table class=emittanceTable>\n<tr><th></th><th...,LCLS,Emittance Measurements - 9491 eV (14.673 GeV...


In [452]:
df['title'] = df['title'].str.lower()
df['text'] = df['text'].str.lower()
df.head()

Unnamed: 0,elogid,title,text,tag
0,137674,epics lcls ioc heartbeat fault.,,LCLS
1,137670,rr monitor to 2 hz,a. prinz approves this change. approval grante...,LCLS
2,148112,paranoia restart,paranoia was restarted with changes to handle ...,LCLS
3,145825,errorlog re: lcls soln 121,15-may-2007 20:24:11 %cau-e-epics_msg_pep cm...,LCLS
4,144311,rack location of lcls bx01/bx02 breaker,as one can see from electrical safety label th...,LCLS


In [521]:
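# Series.replace only swaps whole cell values, so use .str.replace to swap the newlines inside each string
df['text'] = df['text'].str.replace('\n', ' ', regex=False)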
df.head(50)

Unnamed: 0,elogid,title,text,tag
0,137674,EPICS LCLS IOC heartbeat fault.,,LCLS
1,137670,RR monitor to 2 Hz,A. Prinz approves this change. Approval grante...,LCLS
2,148112,PARANOIA Restart,PARANOIA was restarted with changes to handle ...,LCLS
3,145825,Errorlog re: LCLS SOLN 121,15-MAY-2007 20:24:11 %CAU-E-EPICS_MSG_PEP CM...,LCLS
4,144311,Rack Location of LCLS BX01/BX02 Breaker,As one can see from Electrical Safety label th...,LCLS
5,175210,"LCLS laser maintenance, beam suppressed.",,LCLS
6,175209,LCLS Station L1-S feedback railed itself,Railed the I&Q values so L1-S Ampl went to zer...,LCLS
7,191763,21-2 X-Band Klystron Bottom Coil Master/Slave ...,They are off on control failure showing phase ...,LCLS
8,190602,Energy dispersion on OTR11,,LCLS
9,190126,LCLS laser off for the night,Outgassing due to processing 21-1 is causing V...,LCLS


In [456]:
title = 'summary'
df[df['title'].str.endswith(title) == True].head(50)

Unnamed: 0,elogid,title,text,tag
11,197011,lcls operator summary,just before 9:00 stefanie made a 117.6 ns timi...,LCLS
17,194151,lcls owl shift summary,performed emittance scans on otr12 using xcor ...,LCLS
26,203623,lcls swing shift summary,"<table cellpadding=1 border=2 align=""center"">\...",LCLS
32,219142,lcls owl shift summary,<table cellpadding=2 border=2>\n<tr>\n <td> lc...,LCLS
35,218481,lcls owl summary,<table cellpadding=2 border=2>\n<tr>\n <td> lc...,LCLS
39,215969,lcls swing summary,<table cellpadding=2 border=2>\n<tr>\n <td> lc...,LCLS
41,214443,lcls summary,performed free-fun weekend scans and laser mai...,LCLS
42,214092,lcls swing shift summary,<table cellpadding=2 border=2 align=center>\n<...,LCLS
44,213976,lcls swing shift summary,<table cellpadding=2 border=2 align=center>\n\...,LCLS
49,223087,lcls day shift summary,<table cellpadding=2 border=2>\n<tr>\n <td> lc...,LCLS


In [481]:
df.at[120,'text']

'<p style="background-color: rgb(0, 0, 0);">\n<span style="color: rgb(255, 255, 255);">      | EMITX | BMAGX | EX*BX | HOURS \n--------------------------------------</span>\n<span style="color: rgb(0, 255, 0);"> OTR2 | 0.577 | 1.035 | 0.597 | 3.949  </span>\n<span style="color: rgb(0, 255, 0);"> WS12 | 0.767 | 1.062 | 0.814 | 3.840  </span>\n<span style="color: rgb(0, 255, 0);"> LI28 | 1.143 | 1.002 | 1.145 | 3.072  </span>\n<span style="color: rgb(0, 255, 0);"> LTU1 | 1.447 | 1.050 | 1.519 | 1.868  </span>\n<span style="color: rgb(255, 255, 255);">      | EMITY | BMAGY | EY*BY | HOURS \n--------------------------------------</span>\n<span style="color: rgb(0, 255, 0);"> OTR2 | 0.633 | 1.079 | 0.683 | 3.949  </span>\n<span style="color: rgb(0, 255, 0);"> WS12 | 0.484 | 1.017 | 0.492 | 3.727  </span>\n<span style="color: rgb(0, 255, 0);"> LI28 | 0.700 | 1.016 | 0.711 | 3.007  </span>\n<span style="color: rgb(0, 255, 0);"> LTU1 | 0.993 | 1.012 | 1.004 | 1.016  </span>\n</p>'

In [517]:
# Only one search string is used here; a list such as ['SUMMARY','Summary'] would have to be joined into a single pattern first
search = 'Swing'
df[df['title'].str.contains(search) == True]

Unnamed: 0,elogid,title,text,tag
26,203623,LCLS Swing Shift Summary,"<table cellpadding=1 border=2 align=""center"">\...",LCLS
39,215969,LCLS Swing Summary,<table cellpadding=2 border=2>\n<tr>\n <td> LC...,LCLS
42,214092,LCLS Swing Shift Summary,<table cellpadding=2 border=2 align=center>\n<...,LCLS
43,214059,Swing Shift Program,The goal tonight is to improve the emittance. ...,LCLS
44,213976,LCLS Swing Shift Summary,<table cellpadding=2 border=2 align=center>\n\...,LCLS
2461,520296,FACET Swing Shift Summary,No Beam at all the entire shift.\n\nFACET VVS'...,FACET
2783,517123,FACET Summary: Swingshift,\n Program: \n\n 1.) LI20 RAD. survey by ...,FACET
2808,517081,FACET Program: Swing,\n Program:\n\n 1.) LI20 RAD. survey by RF...,FACET
2863,516884,FACET Program: Swing,1.) Continued FACET commissioning with LI10...,FACET
2886,516751,FACET Summary: Swingshift.,Machine program was taken by Machine physics:...,FACET


In [497]:
search = '\n'
df[(df['text'].str.contains('>')) & (df['text'].str.contains('<'))].head(50)

Unnamed: 0,elogid,title,text,tag
26,203623,LCLS Swing Shift Summary,"<table cellpadding=1 border=2 align=""center"">\...",LCLS
32,219142,LCLS Owl Shift Summary,<table cellpadding=2 border=2>\n<tr>\n <td> LC...,LCLS
35,218481,LCLS Owl Summary,<table cellpadding=2 border=2>\n<tr>\n <td> LC...,LCLS
39,215969,LCLS Swing Summary,<table cellpadding=2 border=2>\n<tr>\n <td> LC...,LCLS
42,214092,LCLS Swing Shift Summary,<table cellpadding=2 border=2 align=center>\n<...,LCLS
44,213976,LCLS Swing Shift Summary,<table cellpadding=2 border=2 align=center>\n\...,LCLS
49,223087,LCLS day shift summary,<table cellpadding=2 border=2>\n<tr>\n <td> LC...,LCLS
54,240287,SW: LCLS Magnet PS Controls Release,Yeserday during the ROD I booted the latest ma...,LCLS
57,252453,SW: Reboot BC1 Bunch Length IOCs-,Rebooted Bunch Length Monitor EPICS IOC in li2...,LCLS
67,280473,Magnet string slave units BACT out-of tol FIXED,The BACT for the following string slaves were ...,LCLS


In [498]:
['Swing']

'<p style="background-color: rgb(0, 0, 0);">\n<span style="color: rgb(255, 255, 255);">      | EMITX | BMAGX | EX*BX | HOURS \n--------------------------------------</span>\n<span style="color: rgb(0, 255, 0);"> OTR2 | 0.573 | 1.011 | 0.579 | 0.338  </span>\n<span style="color: rgb(0, 255, 0);"> WS12 | 0.623 | 1.035 | 0.644 | 0.338  </span>\n<span style="color: rgb(0, 255, 0);"> LI28 | 1.395 | 1.025 | 1.429 | 0.338  </span>\n<span style="color: rgb(0, 255, 0);"> LTU1 | 1.920 | 1.008 | 1.935 | 0.338  </span>\n<span style="color: rgb(255, 255, 255);">      | EMITY | BMAGY | EY*BY | HOURS \n--------------------------------------</span>\n<span style="color: rgb(0, 255, 0);"> OTR2 | 0.658 | 1.008 | 0.663 | 0.338  </span>\n<span style="color: rgb(0, 255, 0);"> WS12 | 0.600 | 1.035 | 0.621 | 2.771  </span>\n<span style="color: rgb(0, 255, 0);"> LI28 | 0.791 | 1.001 | 0.791 | 0.020  </span>\n<span style="color: rgb(0, 255, 0);"> LTU1 | 1.271 | 1.008 | 1.281 | 0.338  </span>\n</p>'

In [None]:
# Python version of the replace-and-uppercase idea (text is any Python string)
text.replace("WelCome", "WelCome".upper())

In [None]:
# A Series can't be used directly in an if statement, so build a boolean mask and filter with it
df[df['text'].str.startswith(search, na=False)]

In [461]:
dumb = df.at[32, 'text']
dumb

"<table cellpadding=2 border=2>\n<tr>\n <td> lcls </td>\n <td> delivered </td>\n <td> md </td>\n <td> tuning </td>\n <td> down </td>\n <td> off</td>\n</tr>\n<tr>\n <td> - </td>\n <td> 0 </td>\n <td> 0 </td>\n <td> 8 </td>\n <td> 0 </td>\n <td> 0 </td>\n</tr>\n</table>\n\nprogram: to move l2 phase to -20' and do a ws12 scan; optimize beam to pr55.\n\nprogress:  started shift with poor beam to pr55 and by end of shift\nimproved beam to pr55.  lem, minor tweaks on linac correctors, and adding\na klystron (30-4) to give fb31 enby fbk more room, and turning on the\nli26 enby fbk all helped improve the beam on pr55 and it's stability vs.\nthe rf issues throughout the night.\n\nproblems: ws12 failed all but one scan this evening (the latest x\nemittance scan shown below).  we were plagued by rf issues which have\nsince been artemized.  these included constant trigger loss for all\nklystrons in li25, water faults on 22-5 and 29-3, cable faults on 24-6\n(bad pad requiring amrf intervention), as

In [465]:
new_dumb = dumb.split('\n\nprogram:')[1]
new_dumb.replace('\n', ' ')

" to move l2 phase to -20' and do a ws12 scan; optimize beam to pr55.  progress:  started shift with poor beam to pr55 and by end of shift improved beam to pr55.  lem, minor tweaks on linac correctors, and adding a klystron (30-4) to give fb31 enby fbk more room, and turning on the li26 enby fbk all helped improve the beam on pr55 and it's stability vs. the rf issues throughout the night.  problems: ws12 failed all but one scan this evening (the latest x emittance scan shown below).  we were plagued by rf issues which have since been artemized.  these included constant trigger loss for all klystrons in li25, water faults on 22-5 and 29-3, cable faults on 24-6 (bad pad requiring amrf intervention), as well as intermittent cable faults on 28-7 and 28-8 - which have steadily improved over the evening thanks to some work by pem, and the 29-2 modulator faulted twice."

> Ultimately it gets too messy when deciding what to take out and what to keep. For example, all emittance measurements are stored in tables, but so are summaries. Should I change that? Or leave as is?
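
One possible middle ground, offered only as a sketch: remove whole `<table>...</table>` blocks (emittance numbers and shift-hour tallies alike) and keep whatever prose surrounds them. Whether the table contents carry any signal for tagging is still an open question, `drop_tables` is a hypothetical helper, and the two sample strings are shortened, made-up versions of the entries shown above.

```python
import re

# Non-greedy match of one <table ...> ... </table> block, across newlines, in any letter case
TABLE_BLOCK = re.compile(r'<table\b.*?</table>', re.IGNORECASE | re.DOTALL)


def drop_tables(text):
    """Remove <table> blocks from text and collapse the leftover whitespace."""
    if not text:
        return ''
    without_tables = TABLE_BLOCK.sub(' ', text)
    return ' '.join(without_tables.split())


summary = '<table cellpadding=2 border=2>\n<tr>\n <td> lcls </td>\n</tr>\n</table>\n\nprogram: to move l2 phase'
emittance = '<table class=emittanceTable>\n<tr><th></th></tr>\n</table>'
print(repr(drop_tables(summary)))
print(repr(drop_tables(emittance)))
```

With this rule a summary keeps its program/progress/problems prose, while an entry that is nothing but a table ends up with empty text and would rely on its title alone. Nested tables are not handled by the non-greedy pattern.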