## AIML Online Capstone - AUTOMATIC TICKET ASSIGNMENT
## The Real Problem
One of the key activities of any IT function is to “keep the lights on” so that business operations are
not disrupted. IT relies on the Incident Management process to achieve this objective. An incident is an
unplanned interruption to an IT service, or a reduction in the quality of an IT service, that affects
users and the business. The main goal of the Incident Management process is to provide a quick fix,
workaround, or solution that resolves the interruption and restores the service to full capacity with no
business impact. In most organizations, incidents are created by business and IT users, by end users and
vendors who have access to the ticketing system, and by integrated monitoring systems and tools.
Assigning each incident to the appropriate person or unit in the support team is critical both for user
satisfaction and for good allocation of support resources. In many IT organizations, this assignment is
still a manual process. Manual assignment is time-consuming and labor-intensive. It is prone to human
error, and misrouted incidents waste support resources. It also increases response and resolution times,
which degrades user satisfaction and customer service.
## Business Domain Value
In the support process, incoming incidents are analyzed and assessed by the organization’s support teams
to fulfill the request. In many organizations, better allocation and more effective use of valuable
support resources directly results in substantial cost savings.
Currently, incidents are created by various stakeholders (business users, IT users, and monitoring
tools) within the IT Service Management tool and are assigned to Service Desk teams (L1 / L2 teams).
These teams review the incidents for correct categorization and priority, then carry out an initial
diagnosis to see whether they can resolve them. Around 54% of incidents are resolved by L1 / L2 teams. If
L1 / L2 cannot resolve an incident, they escalate or assign the ticket to functional teams from
Applications and Infrastructure (L3 teams). Some incidents are assigned directly to L3 teams by
monitoring tools or by callers / requestors. L3 teams carry out detailed diagnosis and resolve the
incidents. Around 56% of incidents are resolved by functional / L3 teams. If vendor support is needed,
they reach out to the vendor to close the incident.
L1 / L2 teams need to spend time reviewing Standard Operating Procedures (SOPs) before assigning
incidents to functional teams (at minimum, ~25-30% of incidents need an SOP review before ticket
assignment). About 15 minutes is spent on SOP review for each incident. At minimum, ~1 FTE of effort is
needed solely for assigning incidents to L3 teams.
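As a rough sanity check on the ~1 FTE figure, the sketch below back-solves the workload that 15-minute SOP reviews would imply. The 160 working hours per month is an assumption made here for illustration, not a number from the problem statement, and the incident volumes it prints are implied values rather than reported ones.

```python
# Assumption (not from the source): one FTE works about 160 hours per month.
FTE_HOURS_PER_MONTH = 160
MINUTES_PER_SOP_REVIEW = 15      # from the text above
REVIEW_SHARES = (0.25, 0.30)     # ~25-30% of incidents need an SOP review

# Number of SOP reviews one FTE can complete in a month.
reviews_per_fte = FTE_HOURS_PER_MONTH * 60 / MINUTES_PER_SOP_REVIEW

# Monthly incident volume at which SOP reviews alone consume one FTE.
implied_incidents = {share: reviews_per_fte / share for share in REVIEW_SHARES}

print(f"SOP reviews per FTE-month: {reviews_per_fte:.0f}")
for share, incidents in implied_incidents.items():
    print(f"review share {share:.0%}: ~{incidents:.0f} incidents per month")
```

Under that assumption, the "~1 FTE" statement corresponds to a volume in the low thousands of incidents per month.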
Proprietary content. ©Great Learning. All Rights Reserved. Unauthorized use or distribution prohibited
During assignment by L1 / L2 teams, there were multiple instances of incidents being assigned to the
wrong functional group. Around 25% of incidents are wrongly assigned to functional teams, and the
functional teams then need additional effort to reassign them to the right group. Meanwhile, some
incidents sit in the queue and are not addressed in time, resulting in poor customer service.
AI techniques that classify incidents into the right functional groups can help organizations reduce
resolution time and free support staff to focus on more productive tasks.
## Project Description
In this capstone project, the goal is to build a classifier that assigns tickets to groups by analyzing their text.
Details about the data and the dataset files are available at the link below:
https://drive.google.com/open?id=1OZNJm81JXucV3HmZroMq6qCT2m7ez7IJ

## Importing necessary libraries

In [89]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import plotly

import re

from bs4 import BeautifulSoup

from fuzzywuzzy import fuzz 
from fuzzywuzzy import process

import ftfy
from ftfy.badness import sequence_weirdness
import chardet
import googletrans
from googletrans import Translator
from textblob import TextBlob
import spacy
from spacy_langdetect import LanguageDetector
import spacy_langdetect
from spacy.tokens import Doc, Span
import string
import xml.etree

from emot.emo_unicode import UNICODE_EMO, EMOTICONS


## Reading the data

In [90]:
# Raw string so that backslashes in the Windows path are not treated as escape sequences
df = pd.read_excel(r"C:\GL AIML program\input_data.xlsx")

In [91]:
df.head()

Unnamed: 0,Short description,Description,Caller,Assignment group
0,login issue,-verified user details.(employee# & manager na...,spxjnwir pjlcoqds,GRP_0
1,outlook,\r\n\r\nreceived from: hmjdrvpb.komuaywn@gmail...,hmjdrvpb komuaywn,GRP_0
2,cant log in to vpn,\r\n\r\nreceived from: eylqgodm.ybqkwiam@gmail...,eylqgodm ybqkwiam,GRP_0
3,unable to access hr_tool page,unable to access hr_tool page,xbkucsvz gcpydteq,GRP_0
4,skype error,skype error,owlgqjme qhcozdfx,GRP_0


In [92]:
df.shape

(8500, 4)

In [93]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8500 entries, 0 to 8499
Data columns (total 4 columns):
Short description    8492 non-null object
Description          8499 non-null object
Caller               8500 non-null object
Assignment group     8500 non-null object
dtypes: object(4)
memory usage: 265.8+ KB


In [94]:
df.describe().transpose()

Unnamed: 0,count,unique,top,freq
Short description,8492,7481,password reset,38
Description,8499,7817,the,56
Caller,8500,2950,bpctwhsn kzqsbmtp,810
Assignment group,8500,74,GRP_0,3976
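The `describe()` output shows that `GRP_0` accounts for 3976 of the 8500 tickets. A small calculation makes the imbalance concrete. It is only arithmetic on the counts above; the "always predict GRP_0" baseline mentioned afterwards is a hypothetical reference point, not something this notebook builds.

```python
total_tickets = 8500   # from df.shape
grp0_tickets = 3976    # freq of the top Assignment group
n_groups = 74          # unique Assignment group values

# Share of all tickets that belong to the largest group.
majority_share = grp0_tickets / total_tickets

# Average ticket count across the remaining 73 groups.
avg_other = (total_tickets - grp0_tickets) / (n_groups - 1)

print(f"GRP_0 share: {majority_share:.1%}")
print(f"average tickets per other group: {avg_other:.1f}")
```

A model that always answered `GRP_0` would already be right on roughly 47% of these tickets, so any classifier accuracy is best read against that figure.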


## Handling missing value

In [95]:
df.isnull().sum()

Short description    8
Description          1
Caller               0
Assignment group     0
dtype: int64

In [96]:
df[df.isna().any(axis=1)]

Unnamed: 0,Short description,Description,Caller,Assignment group
2604,,\r\n\r\nreceived from: ohdrnswl.rezuibdt@gmail...,ohdrnswl rezuibdt,GRP_34
3383,,\r\n-connected to the user system using teamvi...,qftpazns fxpnytmk,GRP_0
3906,,-user unable tologin to vpn.\r\n-connected to...,awpcmsey ctdiuqwe,GRP_0
3910,,-user unable tologin to vpn.\r\n-connected to...,rhwsmefo tvphyura,GRP_0
3915,,-user unable tologin to vpn.\r\n-connected to...,hxripljo efzounig,GRP_0
3921,,-user unable tologin to vpn.\r\n-connected to...,cziadygo veiosxby,GRP_0
3924,,name:wvqgbdhm fwchqjor\nlanguage:\nbrowser:mic...,wvqgbdhm fwchqjor,GRP_0
4341,,\r\n\r\nreceived from: eqmuniov.ehxkcbgj@gmail...,eqmuniov ehxkcbgj,GRP_0
4395,i am locked out of skype,,viyglzfo ajtfzpkb,GRP_0


In [97]:
# Rows whose Description is just the single word "the"
df[df['Description'] == "the"].head()


Unnamed: 0,Short description,Description,Caller,Assignment group
1049,reset passwords for soldfnbq uhnbsvqd using pa...,the,soldfnbq uhnbsvqd,GRP_17
1054,reset passwords for fygrwuna gomcekzi using pa...,the,fygrwuna gomcekzi,GRP_17
1144,reset passwords for wvdxnkhf jirecvta using pa...,the,wvdxnkhf jirecvta,GRP_17
1184,reset passwords for pxvjczdt kizsjfpq using pa...,the,pxvjczdt kizsjfpq,GRP_17
1292,reset passwords for cubdsrml znewqgop using pa...,the,cubdsrml znewqgop,GRP_17


In [98]:
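# Treat missing values as empty strings so they do not become the literal text "nan"
df['final_Description'] = df['Description'].fillna('').map(str) + " " + df['Short description'].fillna('').map(str)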
df.head()

Unnamed: 0,Short description,Description,Caller,Assignment group,final_Description
0,login issue,-verified user details.(employee# & manager na...,spxjnwir pjlcoqds,GRP_0,-verified user details.(employee# & manager na...
1,outlook,\r\n\r\nreceived from: hmjdrvpb.komuaywn@gmail...,hmjdrvpb komuaywn,GRP_0,\r\n\r\nreceived from: hmjdrvpb.komuaywn@gmail...
2,cant log in to vpn,\r\n\r\nreceived from: eylqgodm.ybqkwiam@gmail...,eylqgodm ybqkwiam,GRP_0,\r\n\r\nreceived from: eylqgodm.ybqkwiam@gmail...
3,unable to access hr_tool page,unable to access hr_tool page,xbkucsvz gcpydteq,GRP_0,unable to access hr_tool page unable to access...
4,skype error,skype error,owlgqjme qhcozdfx,GRP_0,skype error skype error


## Data preprocessing and Cleaning

In [99]:
#Function for removing html
def html(text):
    return BeautifulSoup(text, "lxml").text

df['Description_clean'] = df['final_Description'].apply(html)

In [100]:
df['Description_clean'] = df['Description_clean'].replace(to_replace ='\r\n', value = " ", regex = True)

In [101]:
df['Description_clean'].iloc[1]

'received from: hmjdrvpb.komuaywn@gmail.com  hello team,  my meetings/skype meetings etc are not appearing in my outlook calendar, can somebody please advise how to correct this?  kind  outlook'

In [102]:
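# Replace email addresses with the placeholder token 'emailid' (raw string avoids invalid escape warnings)
df['Description_clean'] = df['Description_clean'].replace(to_replace =r"([a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+)", value = 'emailid', regex = True)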
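To see what the email pattern does on a single string, here is a minimal standalone sketch using only the standard `re` module. The helper name `mask_emails` is introduced here for illustration and is not part of the notebook.

```python
import re

# Same pattern as in the cell above, compiled once.
EMAIL_PATTERN = re.compile(r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+")

def mask_emails(text, placeholder="emailid"):
    """Replace every substring that looks like an email address."""
    return EMAIL_PATTERN.sub(placeholder, text)

sample = "received from: hmjdrvpb.komuaywn@gmail.com  hello team"
masked = mask_emails(sample)
print(masked)

# The last character class includes '.', so a sentence-ending period
# directly after an address is consumed as part of the match.
trailing = mask_emails("write to a@b.com.")
print(trailing)
```

The second call shows a side effect worth knowing about: a full stop that immediately follows an address disappears together with it.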

In [103]:

# Build the emoticon pattern once; the keys of EMOTICONS are used as regex alternatives
emoticon_pattern = re.compile(u'(' + u'|'.join(k for k in EMOTICONS) + u')')

# Function for removing emoticons
def remove_emoticons(text):
    return emoticon_pattern.sub(r'', text)

# Apply remove_emoticons to 'Description_clean'
df['Description_clean'] = df['Description_clean'].apply(remove_emoticons)

In [104]:
print("Description: \n", df['Description'][582])
print("Cleaned data: \n", df['Description_clean'][582])

Description: 
 hallo, kannst du einmal nachsehen, wo der e-mail button ist am drucker. er ist weg! :-) danke. uwe
Cleaned data: 
 hallo, kannst du einmal nachsehen, wo der e-mail button ist am drucker. er ist weg!  danke. uwe wk38 - qdxyifhj zbwtunpy
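The emoticon pattern is built by joining dictionary keys with `|`. Whether those keys are already regex-escaped depends on the installed `emot` version, so that is worth checking. The sketch below shows the same idea with a small hypothetical emoticon list and explicit escaping; `SAMPLE_EMOTICONS` and `strip_emoticons` are names made up for this illustration.

```python
import re

# Hypothetical stand-in for the emoticon list; the notebook uses emot's EMOTICONS.
SAMPLE_EMOTICONS = [":-)", ":)", ":-(", ";-)", ":-))"]

# Escape each emoticon so characters such as ")" are matched literally,
# and put longer emoticons first so they are not cut short by a shorter prefix.
pattern = re.compile(
    "|".join(re.escape(e) for e in sorted(SAMPLE_EMOTICONS, key=len, reverse=True))
)

def strip_emoticons(text):
    """Remove every listed emoticon from the text."""
    return pattern.sub("", text)

cleaned = strip_emoticons("er ist weg! :-) danke. uwe")
print(cleaned)
```

As in the notebook output, removing the emoticon leaves a double space behind; later whitespace normalization takes care of that.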


In [105]:
# Spell check using text blob for the first 5 records
#df['Description_clean'][:5].apply(lambda x: str(TextBlob(x).correct())) 


In [106]:
# Heuristic mojibake check: returns False when the text looks mis-decoded
def is_valid_unicode_str(text):
    if not sequence_weirdness(text):
        # nothing weird, should be okay
        return True
    try:
        text.encode('sloppy-windows-1252')
    except UnicodeEncodeError:
        # Not CP-1252 encodable, probably fine
        return True
    else:
        # Encodable as CP-1252, Mojibake alert level high
        return False

df['Unicode'] = df['Description_clean'].apply(lambda x: is_valid_unicode_str(x))
df.loc[df['Unicode'] == False].head()

Unnamed: 0,Short description,Description,Caller,Assignment group,final_Description,Description_clean,Unicode
99,password expiry tomorrow,\n\nreceived from: ecprjbod.litmjwsy@gmail.com...,ecprjbod litmjwsy,GRP_0,\n\nreceived from: ecprjbod.litmjwsy@gmail.com...,received from: emailid\n\nmy system says my pa...,False
116,server issues,\r\n\r\nreceived from: bgqpotek.cuxakvml@gmail...,bgqpotek cuxakvml,GRP_0,\r\n\r\nreceived from: bgqpotek.cuxakvml@gmail...,"received from: emailid hello, i have been tr...",False
124,mobile device activation,"from: tvcdfqgp nrbcqwgj \nsent: friday, octobe...",tvcdfqgp nrbcqwgj,GRP_0,"from: tvcdfqgp nrbcqwgj \nsent: friday, octobe...","from: tvcdfqgp nrbcqwgj \nsent: friday, octobe...",False
164,æ’¤å›ž: ticket_no1564867 -- comments added,\n\nreceived from: abcdri@company.com\n\nwindy...,tycludks cjofwigv,GRP_0,\n\nreceived from: abcdri@company.com\n\nwindy...,received from: emailid\n\nwindy shi å°†æ’¤å›žé...,False
170,[urgent!!] delivery note creation request!!,\n\nreceived from: fbvpcytz.nokypgvx@gmail.com...,fbvpcytz nokypgvx,GRP_18,\n\nreceived from: fbvpcytz.nokypgvx@gmail.com...,"received from: emailid\n\nhello it team,\n\nco...",False


In [107]:
df['Description_clean'] = df['Description_clean'].apply(lambda x: ftfy.fix_text(x))
df.loc[df['Unicode'] == False].head()

Unnamed: 0,Short description,Description,Caller,Assignment group,final_Description,Description_clean,Unicode
99,password expiry tomorrow,\n\nreceived from: ecprjbod.litmjwsy@gmail.com...,ecprjbod litmjwsy,GRP_0,\n\nreceived from: ecprjbod.litmjwsy@gmail.com...,received from: emailid\n\nmy system says my pa...,False
116,server issues,\r\n\r\nreceived from: bgqpotek.cuxakvml@gmail...,bgqpotek cuxakvml,GRP_0,\r\n\r\nreceived from: bgqpotek.cuxakvml@gmail...,"received from: emailid hello, i have been tr...",False
124,mobile device activation,"from: tvcdfqgp nrbcqwgj \nsent: friday, octobe...",tvcdfqgp nrbcqwgj,GRP_0,"from: tvcdfqgp nrbcqwgj \nsent: friday, octobe...","from: tvcdfqgp nrbcqwgj \nsent: friday, octobe...",False
164,æ’¤å›ž: ticket_no1564867 -- comments added,\n\nreceived from: abcdri@company.com\n\nwindy...,tycludks cjofwigv,GRP_0,\n\nreceived from: abcdri@company.com\n\nwindy...,"received from: emailid\n\nwindy shi 将撤回邮件""tick...",False
170,[urgent!!] delivery note creation request!!,\n\nreceived from: fbvpcytz.nokypgvx@gmail.com...,fbvpcytz nokypgvx,GRP_18,\n\nreceived from: fbvpcytz.nokypgvx@gmail.com...,"received from: emailid\n\nhello it team,\n\nco...",False
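Row 164 is a good example of what is being repaired here. The garbled short description is what you get when UTF-8 bytes are decoded as Windows-1252. The sketch below reproduces that one mechanism with the standard library only; `ftfy` applies more heuristics than this, so treat it as an illustration of the idea rather than a description of how the library works internally.

```python
original = "撤回"  # two characters from the repaired text in row 164

# UTF-8 bytes wrongly decoded as Windows-1252 produce mojibake.
garbled = original.encode("utf-8").decode("cp1252")
print(garbled)

# Reversing the wrong decoding step recovers the original text.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired)
```

Note that only `Description_clean` is passed through `ftfy.fix_text`, which is why the `Short description` column still shows the garbled form, and that the `Unicode` flag is not recomputed after the fix.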


In [108]:
# Remove ASCII punctuation characters (the set in string.punctuation)
def remove_punctuation(text):
    no_punct = "".join([c for c in text if c not in string.punctuation])
    return no_punct

In [109]:
df['Description_clean'] = df['Description_clean'].apply(lambda x: remove_punctuation(x))
df['Description_clean'].head()

0    verified user detailsemployee  manager name ch...
1    received from emailid  hello team  my meetings...
2    received from emailid  hi  i cannot log on to ...
3    unable to access hrtool page unable to access ...
4                            skype error  skype error 
Name: Description_clean, dtype: object
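Deleting punctuation outright fuses words that were separated only by punctuation, which is visible above in `detailsemployee` and `hrtool`. One possible alternative, sketched here as an option rather than as the notebook's method, is to map punctuation to spaces and then collapse whitespace. The helper name `punctuation_to_space` is hypothetical.

```python
import string

# Translation table that maps every ASCII punctuation character to a space.
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))

def punctuation_to_space(text):
    """Replace punctuation with spaces, then collapse runs of whitespace."""
    return " ".join(text.translate(_PUNCT_TO_SPACE).split())

sample = "-verified user details.(employee# & manager name)"

# Behavior of the notebook's approach: punctuation is simply dropped.
deleted = "".join(c for c in sample if c not in string.punctuation)
# Alternative: punctuation becomes a word boundary.
spaced = punctuation_to_space(sample)

print(deleted)
print(spaced)
```

The trade-off is that identifiers such as `hr_tool` are split into two tokens instead of being fused into one, so which variant is preferable depends on the downstream tokenizer.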

In [110]:
# Keep only the first occurrence of each word, preserving word order
def uniquify(text):
    output = []
    seen = set()
    for word in text.split():
        if word not in seen:
            output.append(word)
            seen.add(word)
    return ' '.join(output)

df['Description_clean'] = df['Description_clean'].apply(lambda x: uniquify(x))
df['Description_clean'].head()


0    verified user detailsemployee manager name che...
1    received from emailid hello team my meetingssk...
2    received from emailid hi i cannot log on to vp...
3                         unable to access hrtool page
4                                          skype error
Name: Description_clean, dtype: object
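The same order-preserving de-duplication can be written more compactly with `dict.fromkeys`, since dictionaries keep insertion order in Python 3.7 and later. This is an equivalent sketch, with `uniquify_words` as a name chosen here for illustration.

```python
def uniquify_words(text):
    """Drop repeated words, keeping the first occurrence of each in order."""
    # dict.fromkeys keeps one key per distinct word, in first-seen order.
    return " ".join(dict.fromkeys(text.split()))

print(uniquify_words("skype error skype error"))
print(uniquify_words("unable to access hrtool page unable to access hrtool page"))
```

Either version removes every repeat, not only the duplication introduced by concatenating `Description` and `Short description`, so word-frequency information within a ticket is lost from this point on.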

In [111]:

nlp =spacy.load(r"C:\Users\vijpal\AppData\Local\Continuum\anaconda3\Lib\site-packages\en_core_web_sm\en_core_web_sm-2.2.5")
nlp.add_pipe(LanguageDetector(), name="language_detector", last=True)
text = '-verified user details.(employee# & manager name)-checked the user name in ad and reset the password.-advised the user to login and check.-caller confirmed that he was able to login.-issue resolved. login issue'
doc = nlp(text)
# Document-level language detection (roughly the average language of the document)
print(doc._.language['language'])
# Sentence-level language detection
for sent in doc.sents:
    print(sent, sent._.language)

en
-verified {'language': 'af', 'score': 0.999995720574847}
user details.(employee# & manager name)-checked the user name in ad and reset the password.-advised {'language': 'en', 'score': 0.9999954685226186}
the user to login and check.-caller confirmed that he was able to login.-issue resolved. {'language': 'en', 'score': 0.9999957742918021}
login issue {'language': 'fr', 'score': 0.42856975269216213}


In [112]:
nlp =spacy.load(r"C:\Users\vijpal\AppData\Local\Continuum\anaconda3\Lib\site-packages\en_core_web_sm\en_core_web_sm-2.2.5")
nlp.add_pipe(LanguageDetector(), name="language_detector", last=True)

def detect_language(text):
    doc = nlp(text)
    # Return the first sentence together with its detected language
    # (a sentence-level result, not the document-level doc._.language);
    # returns None when the text yields no sentences
    for sent in doc.sents:
        return sent, sent._.language

In [113]:
df['Language'] = df['Description_clean'].apply(lambda x: detect_language(x))
df['Language'].head()


0    ((verified, user, detailsemployee, manager, na...
1    ((received, from, emailid, hello, team), {'lan...
2    ((received, from, emailid), {'language': 'en',...
3    ((unable, to, access, hrtool, page), {'languag...
4    ((skype, error), {'language': 'no', 'score': 0...
Name: Language, dtype: object
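Each entry in `Language` is a `(sentence, info)` tuple, and short fragments are often mislabeled, for example `skype error` as `no` above. A small helper can reduce the tuple to a single language code and fall back to a default when the detector is unsure. The function name, the `0.9` threshold, and the `"en"` fallback are all assumptions for this sketch, and plain strings stand in for the spaCy spans.

```python
def pick_language(result, min_score=0.9, fallback="en"):
    """Return a language code from a (sentence, info) tuple, or the fallback."""
    if result is None:
        # detect_language returns None when the text has no sentences
        return fallback
    _sentence, info = result
    if info["score"] < min_score:
        return fallback
    return info["language"]

# Values copied from the sentence-level output shown earlier.
low_conf = pick_language(("login issue", {"language": "fr", "score": 0.42856975269216213}))
high_conf = pick_language(("-verified", {"language": "af", "score": 0.999995720574847}))
print(low_conf, high_conf)
```

The second call shows the limit of this approach: a confident but wrong label such as `af` still passes the threshold, so a score cut-off alone does not make per-sentence detection reliable on very short text.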

In [114]:
df.to_csv("validate.csv")

In [119]:
df['Description_clean'].iloc[223]

'hallo es ist erneut passiert der pc hat sich zum wiederholten male aufgehängt und mir lediglich einen blauen bildschirm mit weisser schrift präsentiert was können wir da machen probleme bluescreen'