<a href="https://colab.research.google.com/github/thelencm/thelencm.github.io/blob/master/01a_Working_with_Bytes_Notetaking_Guide.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# <center> Module 1a - Encoding Data
## <center> ENGR 580A2: Secure Vehicle and Industrial Networking
## <center><img src="https://www.engr.colostate.edu/~jdaily/Systems-EN-CSU-1-C357.svg" width="600" /> 
### <center> Instructor: Dr. Jeremy Daily<br>Fall 2020

## Lesson Outcomes
After completing this exercise, students should be able to:
1. Recognize that encoded data is not encrypted.
2. Carry out data encoding for integers of different lengths.
3. Present data as text strings or numbers based on the desired encoding.
4. Develop Python programming skills to work with different types of encoding.

### Overview
1. Define a string of bytes
2. Explore integer encoding
3. Learn the struct library
4. Understand how to encode data as text with different codecs
5. Represent binary data using only text with base64 encoding

In [1]:
# Given a series of bits
# This could represent some data frame in a CAN message
a = 0b0110000101110100011101000110000101100011011010110000110100001010

In [2]:
#What is a?
# The python default is an integer
print(a)
type(a)

7022365680606055690


int

In [3]:
# display integer as hex characters
print("{:X}".format(a))


61747461636B0D0A


In [None]:
#display integer as binary
print("{:064b}".format(a))

0110000101110100011101000110000101100011011010110000110100001010


In [None]:
# Python 3 also has a data type of bytes. 
# Most network traffic arrives as bytes
b = 
print(b)
type(b)

In [None]:
len(b)

In [None]:
# You can iterate through an array of bytes
for i in b:
    

In [None]:
# Let's print the hex characters
for i in b:
    

In [None]:
#Make a nice display of hex with raw bytes
# This uses an efficient coding concept called list comprehension


In [None]:
#Make a nice display of binary with raw bytes
" ".join(["{:08b}".format(i) for i in b])

In [None]:
# Pretend the last 4 bytes are a source IP address.
# Let's display the address


In [None]:
# Let's pretend the first four bytes are the destination IP address. 
# In Wireshark, this IP address would show up as hex:
" ".join(["{:02X}".format(i) for i in b[:4]])
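
The byte-handling steps in this part can be sketched as follows. This is only one possible way to fill in the cells: it assumes `b` is built from the integer `a` with `int.to_bytes` in big-endian order, and the dotted-decimal formatting of the last four bytes is an illustration rather than the required answer.

```python
# The same 64 bits as the integer a, written in hex for brevity
a = 0x61747461636B0D0A

# Assumption: big-endian order, so the most significant byte comes first
b = a.to_bytes(8, 'big')
print(b)
print(len(b))

# Iterating over a bytes object yields integers in the range 0..255
for i in b:
    print(i, "{:02X}".format(i))

# Hex display built with a list comprehension
hex_display = " ".join(["{:02X}".format(i) for i in b])
print(hex_display)

# Treat the last four bytes as an IPv4 address in dotted-decimal form
source_ip = ".".join([str(i) for i in b[-4:]])
print(source_ip)
```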

### Decoding Options
There are many options for decoding the raw bytes. The struct module is most helpful.

https://docs.python.org/3.8/library/struct.html
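
As a minimal sketch of what `struct.unpack` does, the example below reads the same 8 bytes in several ways. It assumes the byte string is `b'attack\r\n'` (the bytes of the integer `a` above); the variable names are illustrative only.

```python
import struct

data = b'attack\r\n'

# Eight unsigned 8-bit integers ('B'), written out one format character per byte
unsigned_bytes = struct.unpack('BBBBBBBB', data)

# Eight signed 8-bit integers ('b'), using a repeat count instead
signed_bytes = struct.unpack('8b', data)

# 0xDA has its top bit set, so the signed and unsigned readings differ
as_unsigned = struct.unpack('B', b'\xDA')[0]
as_signed = struct.unpack('b', b'\xDA')[0]

# Four unsigned 16-bit integers: the byte-order prefix changes the result
big_endian_words = struct.unpack('>HHHH', data)
little_endian_words = struct.unpack('<HHHH', data)

print(unsigned_bytes, signed_bytes)
print(as_unsigned, as_signed)
print(big_endian_words)
print(little_endian_words)
```

`struct.unpack` always returns a tuple, which is why a single value is taken with `[0]`.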

In [None]:
import struct

In [None]:
# We might have 8 single byte, unsigned integers


In [None]:
# We might have 8 single byte, signed integers


In [None]:
# These are the same. Let's try another example
# The \x escape gives a byte value as two hex digits
c = b'\xDA'
print(c)

In [None]:
# Unsigned Integer


In [None]:
# Signed Integer


In [None]:
# The return value is always a tuple.
type(struct.unpack('b',c))

In [None]:
# To get an integer, index the tuple
struct.unpack('b',c)[0]

In [None]:
type(struct.unpack('b',c)[0])

In [None]:
#What if we had 4 16-bit unsigned numbers?


In [None]:
# But the byte order matters


In [None]:
struct.unpack(">HHHH",b)

In [None]:
#Big endian (the way humans read)


In [None]:
#Big endian (Motorola Format)
struct.pack(">H",24948)

In [None]:
#little endian (Reverse Byte order)
(24948).to_bytes(2,'little')

In [None]:
#Little endian (Intel Format)
# Note 24948 = 0x6174 

print(d)

In [None]:
# Notice the reverse byte order (Little Endian or Intel)
" ".join(["{:02X}".format(i) for i in d])

In [None]:
# Signed 2-byte integers


In [None]:
# Signed 2-byte integers
struct.unpack("<h",b'\xda\x55')

In [None]:
# notice the reverse byte order
print(0x55da)

### Characters

In [8]:
# We might have 8 single characters

print(e)


In [9]:
#Combine the bytes into a single string of bytes.


In [10]:
#This is the same as packing
f = struct.pack("cccccccc",b'a', b't', b't', b'a', b'c', b'k', b'\r', b'\n')
print(f)


In [None]:
# Using s enables the creation of a byte string without join


In [None]:
#Convert to a string
str(f)

In [None]:
# Not really what we want. We know this is ascii text


In [None]:
# Notice the extra line from the carriage return \r and new line \n.
print(f.decode('ascii'))

In [None]:
# More modern is utf-8


In [None]:
# There are many character sets
f.decode('latin-1')

In [None]:
b'attack\xc4\xfa'.decode('latin-1')

In [None]:
b'attack\xc4\xfa'.decode('ascii')

In [None]:
# We can ignore some of the non-ascii characters
b'attack\xc4\xfa'.decode('ascii','ignore')

In [None]:
b'attack\xc4\xfa'.decode('utf-8')

In [None]:
# This keeps from getting errors
b'attack\xc4\xfa'.decode('utf-8','replace')

In [None]:
# Here's a valid UTF-8 string. The last character is encoded with 2 bytes.
b'attack\xc4\x8a'.decode('utf-8','strict')
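
The effect of the codec and the error handler can be summarized in one sketch. It reuses the byte strings from the cells above; the variable names are illustrative only.

```python
raw = b'attack\xc4\xfa'

# latin-1 maps every byte value to a character, so decoding never fails
latin_text = raw.decode('latin-1')

# ascii only covers 0x00-0x7F; 'ignore' silently drops the other bytes
ascii_text = raw.decode('ascii', 'ignore')

# Strict utf-8 decoding (the default) raises an error on invalid sequences
try:
    raw.decode('utf-8')
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# 'replace' substitutes U+FFFD for bytes that cannot be decoded
replaced_text = raw.decode('utf-8', 'replace')

# 0xC4 0x8A is a valid 2-byte utf-8 sequence for a single character
valid_text = b'attack\xc4\x8a'.decode('utf-8')

print(latin_text, ascii_text, strict_failed, replaced_text, valid_text)
```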

### Long Integers

In [None]:
struct.unpack('>LL',b)

In [None]:
# Notice the symmetry on the first four bytes: atta
struct.unpack('<L',b'ck\r\n')

In [None]:
struct.unpack('>L',b'ck\r\n')

In [None]:
struct.pack("<L",168651619)

In [None]:
#Hex as an int
0x0A0D6B63

In [None]:
# Reverse byte order


In [None]:
#Signed long integers
struct.unpack('>ll',b)

### Practical Example: Decoding Vehicle Miles
SAE J1939 defines PGN 65248, Vehicle Distance, as an 8-byte message that holds two 32-bit integers. The first four bytes are SPN 244, Trip Distance, and the last four bytes are SPN 245, Total Vehicle Distance (the odometer reading). Each value is a count of 0.125 km increments. This message can be found in many truck log files.
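
A minimal sketch of the whole decoding chain is shown here, using the log line from the cells that follow. It assumes the two values are unsigned (`L`) and little endian (`<`), and the constant names are made up for this illustration.

```python
import struct

log_text = "(012.102753)  can1  18FEE000   [8]  73 49 03 00 BC E0 33 00"

# Split on whitespace; items 4 through 11 are the eight data bytes as hex text
entries = log_text.split()
data_bytes = bytes(int(x, 16) for x in entries[4:12])

# Two little-endian 32-bit integers (assumed unsigned): SPN 244, then SPN 245
trip_counts, total_counts = struct.unpack('<LL', data_bytes)

KM_PER_COUNT = 0.125     # resolution stated in the text
KM_PER_MILE = 1.609344   # definition of the international mile

total_km = total_counts * KM_PER_COUNT
total_miles = total_km / KM_PER_MILE
print("Trip counts: {}, total counts: {}".format(trip_counts, total_counts))
print("The Total Vehicle Distance is {:0,.1f} miles.".format(total_miles))
```

`bytes.fromhex("".join(entries[4:12]))` would build the same `data_bytes` in one call.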

In [7]:
#First, print the PGN in hex
"{:X}".format(65248)

'FEE0'

In [5]:
# A CAN log file from a truck has this line corresponding to PGN 65248
log_text = "(012.102753)  can1  18FEE000   [8]  73 49 03 00 BC E0 33 00"

In [6]:
# Parse the line into a list:
entries = 
entries


In [None]:
#Convert to bytes
data_bytes = 
data_bytes

In [None]:
# another way
data_bytes = b''
for i in entries[4:12]:
    data_bytes += 
data_bytes

In [None]:
# J1939 is in little endian (Intel) format
pgn_values = 
pgn_values

In [None]:
# Compute mileage
SPN245 = 0.125*pgn_values[1]/1.609344  # 0.125 km per count, 1.609344 km per mile
print("The Total Vehicle Distance is {:0,.1f} miles.".format(SPN245))

In [None]:
# The long (bad) way
# Multiply each byte by its place value (a power of 256)
value = 0
value += data_bytes[4]
value += data_bytes[5]*256
value += data_bytes[6]*256*256
value += data_bytes[7]*256*256*256
value

### 64-bit numbers

In [None]:
#Convert the 64-bit (unsigned long long) integer a into bytes


In [None]:
# See how to convert back into a 64-bit integer


In [None]:
#Byte order is important
struct.unpack('<Q',b)

In [None]:
# Signed 64-bit integers
struct.unpack('>q',b)

In [None]:
# The most significant bit must be set to get a negative number
neg_num = struct.unpack('>q',b'\xd0ttack\r\n')
neg_num

In [None]:
# We can look at this as two floats


In [None]:
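# We can look at this as two little-endian floats (byte order still matters;
# only the first value is unchanged, because the bytes of 'atta' read the same both ways)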
struct.unpack('<ff',b)

In [None]:
# We can look at this as a single double-precision float


In [None]:
# The inverse
1/struct.unpack('>d',b)[0]
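
The 64-bit conversions in this section can be sketched together as follows. The big-endian choice for packing `a` is an assumption made for this illustration, and the variable names are not from the original notebook.

```python
import struct

a = 0x61747461636B0D0A

# Pack the integer into 8 bytes (big endian assumed) and read it back
packed = struct.pack('>Q', a)
restored = struct.unpack('>Q', packed)[0]

# Reading the same bytes as little endian gives a different integer
little = struct.unpack('<Q', packed)[0]

# A signed reading is negative only when the most significant bit is set
negative = struct.unpack('>q', b'\xd0ttack\r\n')[0]

# The same 8 bytes viewed as two 32-bit floats or one 64-bit double
floats_be = struct.unpack('>ff', packed)
floats_le = struct.unpack('<ff', packed)
double_be = struct.unpack('>d', packed)[0]

print(packed, restored, little, negative)
print(floats_be, floats_le, double_be)
```

The first float is the same in both byte orders only because the bytes `61 74 74 61` form a palindrome; the second float differs.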

## Sending bytes as text only
Base64 encoding

https://docs.python.org/3.8/library/base64.html


This is how binary data, such as cryptographic bytes, is commonly sent in e-mail.

In [None]:
import base64

In [None]:
#As bytes
g = 
print(g)

In [None]:
#As a string


In [None]:
len(g)

In [None]:
len(b)

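Base64 maps every 3 bytes to 4 text characters, so the data grows by a factor of about 4/3. Here, padding takes the 8 bytes to 12 characters, a factor of 1.5. In exchange, the data can be transmitted by email or HTTP. Base64 encoding is very common for storing cryptographic data.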
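
A short sketch of the size comparison follows. It assumes the 8-byte string `b'attack\r\n'` used throughout, and `b64_length` is a hypothetical helper written for this illustration, not part of the `base64` module.

```python
import base64

data = b'attack\r\n'

# b64encode returns bytes that contain only ASCII characters
encoded = base64.b64encode(data)
encoded_text = encoded.decode('ascii')

# Plain hex text needs two characters per byte
hex_text = data.hex()

# Decoding needs no key: anyone can reverse the encoding
decoded = base64.b64decode(encoded_text)

def b64_length(n_bytes):
    """Length of padded base64 text for n_bytes of input (hypothetical helper)."""
    return 4 * ((n_bytes + 2) // 3)

print(encoded_text, len(encoded_text))
print(hex_text, len(hex_text))
print(b64_length(len(data)), b64_length(300))
```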

In [None]:
b

In [None]:
#What about just converting to hex characters?
h = 
h

In [None]:
#Converting to printable hex doubles the length. Therefore base64 encoding is more efficient.
len(h)

In [None]:
# recall: display integer as hex characters
print("{:016X}".format(a))

In [None]:
#Decode
base64.b64decode('YXR0YWNrDQo=')

In [None]:
char_list = 
print(char_list)

In [None]:
#the alphabet:
j = 
print(j)

In [None]:
len(j)

In [None]:
len(j)/len(char_list)

In [None]:
len(char_list)

Note: Base64 encoded data is NOT encrypted. No additional information is needed to decode the data. There is no key.

## Crude Ciphers
### Simple XOR encryption
Given a key, use an XOR operation to encrypt and decrypt.
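
The idea can be sketched with a single-byte key. The key value `0x5A`, the short sample message, and the helper names `xor_bytes` and `is_printable_ascii` are all assumptions made for this illustration; the printable-ASCII test is just one plausible way to screen brute-force candidates.

```python
def xor_bytes(data, key):
    """XOR every byte with the same single-byte key (hypothetical helper)."""
    return bytes(x ^ key for x in data)

plain_bytes = b"attack at dawn"
key = 0x5A  # hypothetical key, any value from 0 to 255

# XOR is its own inverse, so the same function encrypts and decrypts
cipher_bytes = xor_bytes(plain_bytes, key)
recovered = xor_bytes(cipher_bytes, key)

def is_printable_ascii(data):
    """True when every byte is a printable ASCII character (space to tilde)."""
    return all(32 <= x < 127 for x in data)

# Without the key, there are only 256 possibilities to try
candidate_keys = [k for k in range(256)
                  if is_printable_ascii(xor_bytes(cipher_bytes, k))]

print(cipher_bytes)
print(recovered)
print(candidate_keys)
```

Several keys may pass the printable test, so the candidates still have to be read by a person or scored further.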

In [None]:
plain_text =  "Fourscore and seven years ago our fathers brought forth, on this continent, a new nation, conceived in liberty, and dedicated to the proposition that all men are created equal. Now we are engaged in a great civil war, testing whether that nation, or any nation so conceived, and so dedicated, can long endure."

In [None]:
key = 
struct.pack('B',key)

In [None]:
#convert to bytes
plain_bytes = bytes(plain_text,'utf-8')
print(plain_bytes)

In [None]:
cipher_bytes = 
cipher_bytes

In [None]:
#decrypt is the same process:


In [None]:
min(plain_bytes)

In [None]:
# What if you don't know the key?
for k in range(256):
    candidate = bytearray(x ^ k for x in cipher_bytes)
    if : #Ascii
        print(k)
        print(candidate)
        print()

In [None]:
# A Caesar Shift Cipher
shifted_text = 
shifted_text

In [None]:
# Deciphering a Caesar Shift Cipher
bytearray((x - key) for x in shifted_text)

In [None]:
#Compute the key based on frequency
#spaces are frequent, so
space_guess = ord('%')
print(space_guess)
space = ord(' ')
print(space)
key_guess = space_guess - space
print(key_guess)

In [None]:
# A classic Caesar shift cipher with a shift of 13 (half the alphabet)
import codecs
codecs.encode('aAbcdez', 'rot13')
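
The shift cipher and the frequency-based key guess can be sketched together. This sketch assumes a key of 5 (consistent with `ord('%') - ord(' ')` above), uses a short excerpt as the message, and wraps values modulo 256 so that any key stays within a byte; `shift_bytes` is a hypothetical helper.

```python
from collections import Counter

def shift_bytes(data, shift):
    """Add shift to every byte, wrapping around at 256 (hypothetical helper)."""
    return bytes((x + shift) % 256 for x in data)

plain_bytes = b"now we are engaged in a great civil war"
key = 5  # assumed key for this illustration

shifted = shift_bytes(plain_bytes, key)

# Spaces are the most frequent byte in this text, so the most common
# shifted byte is assumed to be a shifted space
most_common_byte = Counter(shifted).most_common(1)[0][0]
key_guess = (most_common_byte - ord(' ')) % 256

# Shifting back by the guessed key recovers the message
recovered = shift_bytes(shifted, -key_guess)

print(shifted)
print(chr(most_common_byte), key_guess)
print(recovered)
```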

#### Examine different character encodings

In [None]:
#UTF-8
" ".join()

In [None]:
#Latin-1
" ".join([struct.pack('B',x).decode('latin-1','ignore') for x in range(0x100)])

In [None]:
#Greek (ISO 8859-7)
" ".join([struct.pack('B',x).decode('greek','ignore') for x in range(0x100)])

## Concluding Remarks
* You should see and appreciate the different ways binary data can be encoded. 
* Communications rely heavily on common codecs. 
* Encoding is not encrypting.