# DNS Packet Analysis and Visualization
This notebook analyses and visualises a capture of DNS packets.

## Importing Libraries

In [1]:
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go

## Loading the Dataset
The dataset was generated by a separate Python script and saved to output.xlsx. Each row holds the parsed header fields of one captured DNS packet.

In [2]:
df = pd.read_excel("output.xlsx")

In [3]:
df.head()

Unnamed: 0.1,Unnamed: 0,TYPE,VERSION,PROTOCOL,SOURCE_IP,DESTINATION_IP,ID,SOURCE_PORT,DESTINATION_PORT,TRANSACTION_ID,...,ttl,rdlen.1,mname,rname,serial,refresh,retry,expire,minimum,AR_record_Section
0,0,2048,4,17,192.168.43.90,192.168.43.1,10670,60951,53,25491,...,,,,,,,,,,[None]
1,1,2048,4,17,192.168.43.90,192.168.43.1,10671,52778,53,12071,...,,,,,,,,,,[None]
2,2,2048,4,17,192.168.43.90,192.168.43.1,10672,57108,53,49497,...,,,,,,,,,,[None]
3,3,2048,4,17,192.168.43.90,192.168.43.1,10673,63852,53,50783,...,,,,,,,,,,[None]
4,4,2048,4,17,192.168.43.1,192.168.43.90,6991,53,60951,25491,...,,,,,,,,,,[None]


In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2023 entries, 0 to 2022
Data columns (total 46 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   Unnamed: 0         2023 non-null   int64  
 1   TYPE               2023 non-null   int64  
 2   VERSION            2023 non-null   int64  
 3   PROTOCOL           2023 non-null   int64  
 4   SOURCE_IP          2023 non-null   object 
 5   DESTINATION_IP     2023 non-null   object 
 6   ID                 2023 non-null   int64  
 7   SOURCE_PORT        2023 non-null   int64  
 8   DESTINATION_PORT   2023 non-null   int64  
 9   TRANSACTION_ID     2023 non-null   int64  
 10  FLAG_QR            2023 non-null   int64  
 11  FLAG_OPCODE        2023 non-null   int64  
 12  FLAG_AA            2023 non-null   int64  
 13  FLAG_TC            2023 non-null   int64  
 14  FLAG_RD            2023 non-null   int64  
 15  FLAG_RA            2023 non-null   int64  
 16  FLAG_Z             2023 

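# Packet Length
All packets have PROTOCOL = 17, i.e. they are carried over <b>UDP</b>. Classic DNS over UDP limits a message to 512 bytes (RFC 1035), so unless the EDNS(0) extension is used to advertise a larger buffer, the DNS payload of each packet is at most 512 bytes. A longer answer would be truncated and marked with the TC flag.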
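The numeric TYPE, VERSION and PROTOCOL columns can be turned into readable labels with a small lookup. A minimal sketch, assuming only the standard values: EtherType 2048 (0x0800) is IPv4 and IP protocol 17 is UDP (6 would be TCP). The dictionaries, the `describe_transport` helper and the toy rows are illustrative and not part of the original script.

```python
import pandas as pd

# Illustrative lookup tables: standard IANA values, only those relevant here
ETHERTYPES = {0x0800: "IPv4", 0x86DD: "IPv6"}
IP_PROTOCOLS = {6: "TCP", 17: "UDP"}

def describe_transport(frame: pd.DataFrame) -> pd.Series:
    """Count packets per 'link type / IP version / transport protocol' label."""
    labels = (
        frame["TYPE"].map(ETHERTYPES).fillna("other")
        + "/v" + frame["VERSION"].astype(str)
        + "/" + frame["PROTOCOL"].map(IP_PROTOCOLS).fillna("other")
    )
    return labels.value_counts()

# Toy rows shaped like the first rows of df.head()
sample = pd.DataFrame({"TYPE": [2048, 2048], "VERSION": [4, 4], "PROTOCOL": [17, 17]})
print(describe_transport(sample))
```

On the real data, `describe_transport(df)` should return a single label if every packet is IPv4/UDP.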

# Port Analysis
Here we check which source and destination ports the packets travel between. Port 53 is the standard port for DNS.
## Source Ports

In [5]:
# Count packets per source port as a two-column table
source_port = (
    df['SOURCE_PORT']
    .value_counts()
    .rename_axis('source_port')
    .reset_index(name='count')
)
source_port

Unnamed: 0,source_port,count
0,53,987
1,63852,120
2,63853,6
3,64302,5
4,49566,4
...,...,...
816,50035,1
817,49864,1
818,53962,1
819,60109,1


A source port of 53 means the packet was sent by the DNS server, so 987 packets are responses.

## Destination Ports

In [6]:
# Count packets per destination port as a two-column table
destination_port = (
    df['DESTINATION_PORT']
    .value_counts()
    .rename_axis('destination_port')
    .reset_index(name='count')
)
destination_port

Unnamed: 0,destination_port,count
0,53,1036
1,63852,116
2,63853,6
3,64129,3
4,56511,3
...,...,...
814,55988,1
815,53941,1
816,50575,1
817,60090,1


A destination port of 53 means the packet was sent by the host to the DNS server, so 1036 packets are requests. The two counts add up to 2023, the total number of packets in the dataset.
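The port rule above can be applied to every row at once. A sketch under one assumption: only port 53 identifies the server side, and a packet touching neither side is labelled "other". The `packet_direction` helper and the toy rows are hypothetical.

```python
import pandas as pd

DNS_PORT = 53  # well-known DNS port

def packet_direction(frame: pd.DataFrame) -> pd.Series:
    """Label each packet 'query' (to port 53), 'response' (from port 53) or 'other'."""
    direction = pd.Series("other", index=frame.index)
    direction.loc[frame["DESTINATION_PORT"] == DNS_PORT] = "query"
    # A packet from port 53 is treated as a response, even if it also goes to 53
    direction.loc[frame["SOURCE_PORT"] == DNS_PORT] = "response"
    return direction

# Toy rows: two client queries and one server response
sample = pd.DataFrame({
    "SOURCE_PORT":      [60951, 52778, 53],
    "DESTINATION_PORT": [53,    53,    60951],
})
print(packet_direction(sample).value_counts())
```

Applied to `df`, the resulting counts can be compared with the 1036 and 987 figures above.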

# Flag Analysis
Here we analyse the flag fields of the DNS header.
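All of the flags below live in the 16-bit flags word of the DNS header (RFC 1035, with AD and CD added by DNSSEC in RFC 4035). The parsing script already stored each field in its own column. As a sketch of how such columns can be derived from the raw word, the hypothetical helper `decode_dns_flags` unpacks it with shifts and masks.

```python
def decode_dns_flags(word: int) -> dict:
    """Split the 16-bit DNS header flags word into its named fields (MSB first)."""
    return {
        "QR": (word >> 15) & 0x1,      # 0 = query, 1 = response
        "OPCODE": (word >> 11) & 0xF,  # 0 = standard query
        "AA": (word >> 10) & 0x1,      # authoritative answer
        "TC": (word >> 9) & 0x1,       # truncated
        "RD": (word >> 8) & 0x1,       # recursion desired
        "RA": (word >> 7) & 0x1,       # recursion available
        "Z": (word >> 6) & 0x1,        # reserved, must be zero
        "AD": (word >> 5) & 0x1,       # authenticated data (DNSSEC)
        "CD": (word >> 4) & 0x1,       # checking disabled (DNSSEC)
        "RCODE": word & 0xF,           # 0 = no error
    }

# 0x0100: a recursive query; 0x8180: a no-error response with recursion available
print(decode_dns_flags(0x0100))
print(decode_dns_flags(0x8180))
```

These two example words follow the same pattern as the counts reported in the following sections: queries with RD = 1 and RA = 0, responses with QR = 1, RD = 1 and RA = 1.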


## 1. FLAG_QR
FLAG_QR is a single bit: <br>
0 : the packet is a query <br>
1 : the packet is a response to a query <br>

In [7]:
df['FLAG_QR'].value_counts()

0    1036
1     987
Name: FLAG_QR, dtype: int64

There are 1036 query packets <br>
There are 987 response packets
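These counts match the port analysis, which suggests a row-level consistency check: every response should come from port 53 and every query should go to it. A minimal sketch; the `qr_matches_ports` helper and the toy rows are hypothetical.

```python
import pandas as pd

def qr_matches_ports(frame: pd.DataFrame) -> bool:
    """True if every response (QR = 1) comes from port 53 and every query goes to port 53."""
    from_server = frame["SOURCE_PORT"] == 53
    to_server = frame["DESTINATION_PORT"] == 53
    is_response = frame["FLAG_QR"] == 1
    return bool(((is_response & from_server) | (~is_response & to_server)).all())

# Toy rows: one query and its response
sample = pd.DataFrame({
    "SOURCE_PORT": [60951, 53],
    "DESTINATION_PORT": [53, 60951],
    "FLAG_QR": [0, 1],
})
print(qr_matches_ports(sample))  # True
```

Running `qr_matches_ports(df)` on the full capture tells whether the QR flag and the ports ever disagree.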

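## 2. FLAG_OPCODE
OPCODE is a 4-bit field that gives the kind of query <br>
0000 (0) : standard query <br>
0001 (1) : inverse query (obsolete) <br>
0010 (2) : server status request <br>
Other values are unassigned or defined by later extensions (for example 4 = NOTIFY and 5 = UPDATE) <br>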
 

In [8]:
df['FLAG_OPCODE'].value_counts()

0    2023
Name: FLAG_OPCODE, dtype: int64

All 2023 packets carry OPCODE 0, i.e. they are standard queries and responses to standard queries.

## 3. FLAG_AA
FLAG_AA is a single bit: <br>
0 : the answer is non-authoritative <br>
1 : the answer is authoritative, i.e. it comes from a server that is authoritative for the queried domain <br>

In [9]:
df['FLAG_AA'].value_counts()

0    2023
Name: FLAG_AA, dtype: int64

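None of the 2023 packets has AA set. AA is only meaningful in responses, so all 987 responses are non-authoritative, which is typical when a recursive resolver (here 192.168.43.1) answers on behalf of the client.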

## 4. FLAG_TC
FLAG_TC is a single bit: <br>
0 : the message is not truncated <br>
1 : the message was truncated because it did not fit the transport (for example the 512-byte UDP limit) <br>

In [10]:
df['FLAG_TC'].value_counts()

0    2023
Name: FLAG_TC, dtype: int64

None of the 2023 messages is truncated.

## 5. FLAG_RD
FLAG_RD (recursion desired) is a single bit: <br>
0 : recursion is not requested (iterative query) <br>
1 : the client asks the server to resolve the name recursively <br>

In [11]:
df['FLAG_RD'].value_counts()

1    2023
Name: FLAG_RD, dtype: int64

All 2023 packets have RD set: every query requests recursion, and each response copies the RD bit from its query.

## 6. FLAG_RA
FLAG_RA (recursion available) is a single bit, set by the server in responses: <br>
0 : recursion is not available <br>
1 : recursion is available <br>


In [12]:
df['FLAG_RA'].value_counts()

0    1036
1     987
Name: FLAG_RA, dtype: int64

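The 1036 packets with RA = 0 match the 1036 queries in number. Clients leave RA unset, so this does not mean the server failed to resolve them. <br>
The 987 packets with RA = 1 match the 987 responses: in each response the server advertises that recursion is available.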
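Equal counts alone do not prove that RA is set on exactly the responses; a cross-tabulation of FLAG_QR against FLAG_RA checks it row by row. A sketch with toy rows; the `ra_follows_qr` helper is hypothetical.

```python
import pandas as pd

def ra_follows_qr(frame: pd.DataFrame) -> bool:
    """True if RA is set on exactly the responses (QR = 1) and unset on the queries."""
    return bool((frame["FLAG_RA"] == frame["FLAG_QR"]).all())

# Toy rows: two queries and two responses
sample = pd.DataFrame({"FLAG_QR": [0, 0, 1, 1], "FLAG_RA": [0, 0, 1, 1]})
print(pd.crosstab(sample["FLAG_QR"], sample["FLAG_RA"]))
print(ra_follows_qr(sample))  # True
```

On `df`, a purely diagonal cross-table would confirm that all RA = 0 packets are queries.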

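## 7. FLAG_Z and FLAG_CD
FLAG_Z is reserved for future use and must be zero. FLAG_CD (checking disabled) was taken from the originally reserved bits by DNSSEC (RFC 4035): a client sets it to tell the resolver not to perform DNSSEC validation.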



## 8. FLAG_AD
0 : the data in the response has not been authenticated by the server <br>
1 : the server has verified all data in the answer and authority sections with DNSSEC <br>

In [13]:
df['FLAG_AD'].value_counts()

0    2023
Name: FLAG_AD, dtype: int64

AD is 0 in all 2023 packets, so none of the responses is marked as DNSSEC-validated.

## 9. FLAG_RCODE
RCODE is a 4-bit response code <br>
0000 (0) : no error <br>
0001 (1) : format error in the query <br>
0010 (2) : server failure <br>
0011 (3) : name does not exist (NXDOMAIN) <br>
0100 (4) : not implemented <br>
0101 (5) : refused <br>
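Raw RCODE numbers are easier to read with their standard mnemonics attached. A minimal sketch, assuming only the six codes defined in RFC 1035; the `rcode_summary` helper is hypothetical, and unknown codes fall back to a generic label.

```python
import pandas as pd

# RCODE values defined in RFC 1035, section 4.1.1
RCODE_NAMES = {
    0: "NOERROR",   # no error
    1: "FORMERR",   # format error in the query
    2: "SERVFAIL",  # server failure
    3: "NXDOMAIN",  # name does not exist
    4: "NOTIMP",    # query kind not implemented
    5: "REFUSED",   # refused for policy reasons
}

def rcode_summary(rcodes: pd.Series) -> pd.Series:
    """Count RCODE values and label them with their mnemonic names."""
    counts = rcodes.value_counts()
    counts.index = [RCODE_NAMES.get(code, f"RCODE {code}") for code in counts.index]
    return counts

print(rcode_summary(pd.Series([0, 0, 3])))
```

With the real data, `rcode_summary(df['FLAG_RCODE'])` labels the single value 0 as NOERROR.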

In [14]:
df['FLAG_RCODE'].value_counts()

0    2023
Name: FLAG_RCODE, dtype: int64

All 2023 packets have RCODE 0: every response reports no error (queries always carry 0).

# Request-Response Analysis


## Types of Request-Response
The record type defines how the value of a record is interpreted, for example as an IPv4 address or as an alias.

In [19]:
# Count the record type requested in the question section of each packet
query_type = (
    df['QUEARY_TYPE']
    .value_counts()
    .rename_axis('Query_type')
    .reset_index(name='count')
)
query_type

Unnamed: 0,Query_type,count
0,A,1087
1,AAAA,936


In [20]:
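# 'Type' and 'type' are two record-type columns written by the parsing script.
# Series.append was removed in pandas 2.0, so combine the counts with pd.concat.
response_type1 = df['Type'].value_counts()
response_type2 = df['type'].value_counts()
response_type = (
    pd.concat([response_type1, response_type2])
    .groupby(level=0, sort=False)  # merge a type that appears in both columns
    .sum()
    .rename_axis('Response type')
    .reset_index(name='count')
)
response_type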

Unnamed: 0,Response type,count
0,CNAME,446
1,A,291
2,AAAA,137
3,SOA,184


In [21]:
import plotly.graph_objects as go

# Stack query-type and response-type counts on a shared categorical x-axis
fig = go.Figure(data=[
    go.Bar(name='Request type', x=query_type['Query_type'], y=query_type['count']),
    go.Bar(name='Response type', x=response_type['Response type'], y=response_type['count'])])
fig.update_layout(autosize=False, width=1250, height=1000, barmode='stack',
                  margin=dict(l=50, r=50, b=100, t=100, pad=4))
fig.show()

The graph shows two query types (A and AAAA) and four record types in the responses (CNAME, A, AAAA and SOA). The query types add up to all 2023 packets (1087 + 936 = 2023), since a DNS response repeats the question section of its query. <br>
<br>
Six common record types are: <br>
<b>CNAME</b> - canonical name: the value is an alias for the official name of a host<br>
<b>A</b> - the value is a 32-bit IPv4 address<br>
<b>AAAA</b> - the value is a 128-bit IPv6 address<br>
<b>SOA</b> - start of authority: the record marks the beginning of a zone and holds its administrative parameters<br>
<b>MX</b> - mail exchange: the value names a mail server that accepts mail for the domain<br>
<b>NS</b> - name server: the value identifies an authoritative server for the zone<br>
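On the wire, these record types are numeric codes (A = 1, NS = 2, CNAME = 5, SOA = 6, MX = 15 from RFC 1035, and AAAA = 28 from RFC 3596). The parsing script here already stores mnemonics, but a sketch like the following, with a hypothetical `to_type_name` helper, normalises a column that might contain either form.

```python
import pandas as pd

# Numeric TYPE codes (RFC 1035; AAAA is defined in RFC 3596)
RR_TYPE_CODES = {"A": 1, "NS": 2, "CNAME": 5, "SOA": 6, "MX": 15, "AAAA": 28}
CODE_TO_TYPE = {code: name for name, code in RR_TYPE_CODES.items()}

def to_type_name(value):
    """Return a mnemonic for a record type given either as a name or as a numeric code."""
    if isinstance(value, str) or pd.isna(value):
        return value  # already a mnemonic, or a missing value
    code = int(value)
    return CODE_TO_TYPE.get(code, f"TYPE{code}")  # unknown codes use the TYPEnn style

# Toy column mixing names and numeric codes
sample = pd.Series(["A", 28, 5, "SOA", 99])
print(sample.map(to_type_name).tolist())
```

Missing values (for example rows without an answer) pass through unchanged, so the helper can be mapped over a whole column.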




## Top Talkers
In traffic analysis, the flows that use the most bandwidth are called top talkers; such a report supports security monitoring, accounting, load balancing and capacity planning. In this capture, we approximate the top talkers by the domain names that appear most often in the query-name field.

In [22]:
# Rank domain names by how often they appear in the question section
Top_talkers = (
    df['QUEARY_NAME']
    .value_counts()
    .rename_axis('Domain names')
    .reset_index(name='count')
)
Top_talkers

Unnamed: 0,Domain names,count
0,fonts.googleapis.com.,44
1,www.google.com.,43
2,www.gstatic.com.,38
3,www.facebook.com.,36
4,www.netacad.com.,31
...,...,...
165,cisco-tags.cisco.com.,4
166,forms.hsforms.com.,2
167,teredo.ipv6.microsoft.com.,2
168,tracking.crazyegg.com.,2


In [23]:
import plotly.express as px
fig = px.bar(Top_talkers, x='Domain names', y='count')
fig.update_layout(uniformtext_minsize=20, uniformtext_mode='hide',autosize=False,width=1250,height=1000)
fig.show()

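The graph shows that the top talker is "fonts.googleapis.com", which appears as the query name in 44 packets. Because responses repeat the question, this count includes both the queries and their responses.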
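To rank domains by requests only, the table can be restricted to packets with FLAG_QR = 0 before counting. A minimal sketch; the `top_queried_domains` helper and the toy rows are hypothetical.

```python
import pandas as pd

def top_queried_domains(frame: pd.DataFrame, n: int = 5) -> pd.DataFrame:
    """Rank domain names by how many queries (FLAG_QR == 0) asked for them."""
    queries = frame[frame["FLAG_QR"] == 0]
    return (
        queries["QUEARY_NAME"]
        .value_counts()
        .head(n)
        .rename_axis("Domain names")
        .reset_index(name="count")
    )

# Toy rows: two queries and one response for one name, one query for another
sample = pd.DataFrame({
    "QUEARY_NAME": ["www.google.com.", "www.google.com.", "www.google.com.",
                    "fonts.googleapis.com."],
    "FLAG_QR": [0, 0, 1, 0],
})
print(top_queried_domains(sample))
```

Calling `top_queried_domains(df, n=10)` gives a request-only ranking that can be compared with the table above.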

## Response-Request Count Analysis
Here we compare, for each domain name, how often it appears as a query name (requests) with how often it appears as the name of a resource record in the responses.


In [15]:
# Count query names, using the column names expected by the chart below
Query_name = (
    df['QUEARY_NAME']
    .value_counts()
    .rename_axis('Query_Name')
    .reset_index(name='count')
)
Query_name

Unnamed: 0,Query_Name,count
0,fonts.googleapis.com.,44
1,www.google.com.,43
2,www.gstatic.com.,38
3,www.facebook.com.,36
4,www.netacad.com.,31
...,...,...
165,cisco-tags.cisco.com.,4
166,forms.hsforms.com.,2
167,teredo.ipv6.microsoft.com.,2
168,tracking.crazyegg.com.,2


In [16]:
# Count how often each domain appears as the name of a resource record
Response_Name = (
    df['RR_name']
    .value_counts()
    .rename_axis('Response_Name')
    .reset_index(name='count')
)
Response_Name

Unnamed: 0,Response_Name,count
0,fonts.googleapis.com.,20
1,www.google.com.,19
2,www.gstatic.com.,19
3,www.facebook.com.,18
4,fonts.gstatic.com.,16
...,...,...
164,hexagon-analytics.com.,1
165,static.zdassets.com.,1
166,tracking.crazyegg.com.,1
167,syndication.twitter.com.,1


In [18]:
import plotly.graph_objects as go
fig = go.Figure(data=[
    go.Bar(name='Request Name', x=Query_name['Query_Name'], y=Query_name['count']),
    go.Bar(name='Response Name', x=Response_Name['Response_Name'], y=Response_Name['count'])
])
# Change the bar mode
fig.update_layout(autosize=False, width=1250, height=1000, barmode='stack',
                  margin=dict(l=50, r=50, b=100, t=100, pad=4))
fig.show()

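The graph shows that the most frequent domains appear more often as query names than as response record names (for example, fonts.googleapis.com: 44 versus 20), although the query-name count also includes the question repeated in each response. Overall the capture holds 1036 queries but only 987 responses, so some queries have no captured response, possibly because of retransmissions or timeouts.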
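The unanswered queries can be identified directly. In the first rows of the dataset, the query with TRANSACTION_ID 25491 leaves from port 60951 and its response returns to port 60951 with the same ID, so a sketch can match queries to responses on (transaction ID, client port). The `unanswered_queries` helper, the toy rows and their example.* names are hypothetical; on a busier capture, adding the client and server IP addresses to the join keys would make matching stricter.

```python
import pandas as pd

def unanswered_queries(frame: pd.DataFrame) -> pd.DataFrame:
    """Return queries with no response sharing their transaction ID and client port."""
    queries = frame[frame["FLAG_QR"] == 0]
    responses = frame[frame["FLAG_QR"] == 1]
    # A response is sent back to the port the query came from
    answered = (
        responses[["TRANSACTION_ID", "DESTINATION_PORT"]]
        .rename(columns={"DESTINATION_PORT": "SOURCE_PORT"})
        .drop_duplicates()
    )
    merged = queries.merge(answered, on=["TRANSACTION_ID", "SOURCE_PORT"],
                           how="left", indicator=True)
    return merged[merged["_merge"] == "left_only"].drop(columns="_merge")

# Toy rows: one answered query, one unanswered query
sample = pd.DataFrame({
    "TRANSACTION_ID":   [25491, 12071, 25491],
    "SOURCE_PORT":      [60951, 52778, 53],
    "DESTINATION_PORT": [53,    53,    60951],
    "FLAG_QR":          [0,     0,     1],
    "QUEARY_NAME": ["example.com.", "example.org.", "example.com."],
})
print(unanswered_queries(sample)[["TRANSACTION_ID", "QUEARY_NAME"]])
```

On the full capture, the number of rows returned by `unanswered_queries(df)` can be compared with the gap between the 1036 queries and 987 responses.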