In [15]:
import hashlib

# TLS Protocol

#### by Andon Gorchov (@thunderman913)

## 1 Fundamentals of Cryptography

Cryptography is a technique of securing information and communications through the use of codes so that only those persons for whom the information is intended can understand and process it, thus preventing unauthorized access to information. The prefix “crypt” means “hidden” and the suffix “graphy” means “writing”. In cryptography, the techniques that are used to protect information are obtained from mathematical concepts and a set of rule-based calculations known as algorithms, which convert messages in ways that make them hard to decode. These algorithms are used for cryptographic key generation, digital signing, and verification to protect data privacy, web browsing on the internet, and confidential transactions such as credit card and debit card transactions. [1]

It has the following features:
- Confidentiality: The communicated information can only be accessed by the person for whom it is intended and nobody else should be able to access it.
- Integrity: The received information must remain unaltered, accurate and exact.
- Non-repudiation: After sending the message/information, the sender cannot later deny having sent it. That provides evidence of the communication and is crucial in situations where accountability matters or legal disputes arise.
- Authentication: This is a mechanism used to verify the identity of the user, system or entity.
- Interoperability: It allows for secure communication between different systems and platforms.
- Adaptability: Cryptography must continuously evolve and improve to stay ahead of any possible security threats, since one security breach could be fatal.

### 1.1 Encryption Types

- **Symmetric Encryption**

Symmetric-key algorithms are algorithms for cryptography that use the same cryptographic keys for both the encryption of plaintext and the decryption of ciphertext. The keys may be identical, or there may be a simple transformation to go between the two keys. The keys, in practice, represent a shared secret between two or more parties that can be used to maintain a private information link. [2]

<img src="./pictures/symmetric_encryption.png" alt="drawing" width="600"/>

- **Asymmetric Encryption**

Asymmetric encryption, also known as public-key encryption, is the field of cryptographic systems that use pairs of related keys. Each key pair consists of a public key and a corresponding private key. Key pairs are generated with cryptographic algorithms based on mathematical problems termed one-way functions. Security of public-key cryptography depends on keeping the private key secret; the public key can be openly distributed without compromising security.

In a public-key encryption system, anyone with a public key can encrypt a message, yielding a ciphertext, but only those who know the corresponding private key can decrypt the ciphertext to obtain the original message. [3]

<img src="./pictures/asymmetric_encryption.png" alt="drawing" width="600"/>

### 1.2. Key Concepts in Cryptography

### 1.2.1 Key Exchange Mechanisms

Key exchange mechanisms are essential protocols or algorithms that enable two or more parties to securely establish shared keys over an insecure communication channel like the internet. They allow parties to share secret keys safely without having physically exchanged anything beforehand.

One of the most popular mechanisms is the **Diffie-Hellman key exchange**, which is found in security protocols such as TLS, SSH and IPsec. To implement it, two end users (or a client and a server) agree on positive whole numbers p and q, such that p is a prime number and q is a generator modulo p. A generator is a number that, when raised to the positive whole-number powers less than p and reduced modulo p, never produces the same result for any two such powers. p may be large, but q is usually small.
Once the users have agreed on p and q, which do not need to be kept secret, they each choose a positive whole-number personal key, a and b. Both must be less than the prime modulus p. Next, both parties compute public keys a* and b* based on their personal keys:

$$a^* = (q ^ a)\mod p$$
$$b^* = (q ^ b)\mod p$$

Then the users share the public keys a* and b* over the insecure communication channel. From these public keys, a number x can be generated by either user on the basis of their own personal key. Each user raises the other's public key to the power of their own personal key:

$$ x = (b^*)^a \mod p = (a^*)^b \mod p$$

That way both parties arrive at the same number x, which is never sent over the insecure channel, and neither are the private keys. Afterwards the users can communicate safely with an encryption method of their choice, using x as the shared secret key.

Although it may seem that the algorithm can be easily reversed by someone who knows q, p and a*, it is not that simple. When the numbers involved are large, and especially when p is large enough (for example 2048 bits), brute force and other known attack methods are not computationally feasible. That is because an attacker would have to solve the Discrete Logarithm Problem, for which there is currently no known efficient algorithm.

An implementation of the Diffie-Hellman algorithm can be found in the diffie-hellman.ipynb file.
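
As a toy illustration of the exchange described above, the following sketch uses deliberately tiny numbers (p = 23, q = 5, a = 4, b = 3). These values are assumptions chosen for this example only and offer no security. Python's built-in three-argument `pow` performs the modular exponentiation.

```python
# Toy Diffie-Hellman exchange with tiny, insecure example numbers.
p = 23  # public prime modulus (assumed example value)
q = 5   # public generator modulo p (assumed example value)

a = 4   # first user's personal key, kept secret
b = 3   # second user's personal key, kept secret

# Public keys, exchanged over the insecure channel.
a_pub = pow(q, a, p)
b_pub = pow(q, b, p)

# Each side combines the other's public key with its own personal key.
x_first = pow(b_pub, a, p)
x_second = pow(a_pub, b, p)

print(a_pub, b_pub)       # 4 10
print(x_first, x_second)  # 18 18
```

Both users end up with the same value without a, b or x ever being transmitted.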

TODO if have time -> go into RSA algorithm

### 1.2.2 Cryptographic Hash Functions

A cryptographic hash function is a mathematical function used in cryptography. Typically, such functions take inputs of variable length and return outputs of a fixed length. Since there are more possible inputs than outputs, two inputs can share an output, but for a secure function it should be computationally infeasible to find such a pair. There are many different algorithms, but we will look into two of them: MD5, which is old and currently considered insecure, and SHA-256, which is commonly used and widely regarded as secure.
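
The fixed output length can be observed directly with `hashlib`. The short sketch below also shows that changing one character of the input changes a large share of the output bits. The helper `differing_bits` is a hypothetical name introduced for this illustration.

```python
import hashlib


def differing_bits(x: bytes, y: bytes) -> int:
    """Count the bit positions in which two equal-length byte strings differ."""
    return sum(bin(i ^ j).count("1") for i, j in zip(x, y))


# Inputs of very different lengths produce digests of the same length.
short_digest = hashlib.sha256(b"a").digest()
long_digest = hashlib.sha256(b"a" * 10_000).digest()
print(len(short_digest), len(long_digest))  # 32 32

# Two inputs that differ in a single character.
d1 = hashlib.sha256(b"Hello world").digest()
d2 = hashlib.sha256(b"Hello worle").digest()
print(differing_bits(d1, d2), "of 256 bits differ")
```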

- MD5 

MD5 takes a message of any length as input and turns it into a fixed-length output of 16 bytes (128 bits). It was commonly used for authentication. Let's look into how the algorithm works.

The first step of the algorithm is to add padding bits to the original message. The new length of the message should be 64 bits less than an exact multiple of 512, i.e. if we have 600 bits, we should add 360 bits, so that the length is 960, which is 1024 - 64. The added bits should be in the following order: the first is '1' and all the rest are zeroes.

The next step is to append the length of the original message, in bits, as a 64-bit value. MD5 stores this value with the least significant byte first, so if the length is 24 bits we would add 00011000 (24 in binary) followed by 00000000 seven times. Including the length makes the padded input depend on the exact size of the original message.
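
The two padding steps above can be sketched for byte-aligned messages as follows. The function name `md5_pad` is hypothetical, and the length is written least significant byte first, as specified for MD5 in RFC 1321.

```python
def md5_pad(message: bytes) -> bytes:
    """Return the message with MD5 padding and the 64-bit bit length appended."""
    bit_length = (len(message) * 8) % 2**64
    padded = message + b"\x80"                     # a '1' bit followed by seven '0' bits
    padded += b"\x00" * ((56 - len(padded)) % 64)  # zeros until 64 bits short of a 512-bit multiple
    padded += bit_length.to_bytes(8, "little")     # original length in bits
    return padded


padded = md5_pad(b"abc")  # a 24-bit message
print(len(padded) * 8)    # 512
print(padded[-8:].hex())  # 1800000000000000
```

For the 600-bit example from the text, `md5_pad(b"x" * 75)` returns 128 bytes, i.e. 960 bits of message and padding plus the 64-bit length.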

Then we initialize the four 32-bit buffers that are predefined in the MD5 algorithm. They are constant values, which ensures that the starting point is always the same.

$$J = 0x67452301$$
$$K = 0xEFCDAB89$$
$$L = 0x98BADCFE$$
$$M = 0x10325476$$

The final step is to process each 512-bit block. A total of 64 operations are performed in 4 rounds, each containing 16 operations. A different function is applied in each round. Each function takes three of the above-mentioned buffers as input, and the buffers are updated on every operation. Here are the functions that are performed:

\begin{align*}
    F(K, L, M) &= (K \land L) \lor (\neg K \land M) \\
    G(K, L, M) &= (K \land M) \lor (L \land \neg M) \\
    H(K, L, M) &= K \oplus L \oplus M \\
    I(K, L, M) &= L \oplus (K \lor \neg M)
\end{align*}

Finally, after performing all the operations, we concatenate the four updated buffers, which are 4 bytes each, and end up with a 16-byte hash.

At first glance MD5 seems like a good hashing algorithm that is hard to crack, but it has several disadvantages:
- Different inputs that produce the same MD5 hash (collisions) can be found in practice, so MD5 is not collision resistant
- MD5 is fast to compute, which makes brute-force attacks easier and faster
- There are many rainbow tables with billions of hashes, which makes the most commonly hashed values easily accessible to attackers
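
To make the steps above concrete, here is a compact sketch of the whole procedure, following the layout of RFC 1321. The buffers are named `a`, `b`, `c`, `d` as in the RFC (they correspond to J, K, L, M in the text), and the rotation amounts and sine-derived constants are taken from the RFC rather than from this notebook. It is meant for study only; `hashlib.md5` remains the implementation to use.

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF
# Left-rotation amounts: four values repeated within each of the four rounds.
SHIFTS = [7, 12, 17, 22] * 4 + [5, 9, 14, 20] * 4 + [4, 11, 16, 23] * 4 + [6, 10, 15, 21] * 4
# Additive constants: floor(abs(sin(i + 1)) * 2**32) for each of the 64 operations.
SINES = [int(abs(math.sin(i + 1)) * 2**32) & MASK for i in range(64)]


def rotl(x: int, n: int) -> int:
    """Rotate a 32-bit value left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK


def md5_sketch(message: bytes) -> str:
    # Padding: a single 1 bit, zeros, then the 64-bit little-endian bit length.
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded)) % 64)
    padded += ((len(message) * 8) % 2**64).to_bytes(8, "little")

    # Initial buffer values from RFC 1321.
    a0, b0, c0, d0 = 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476

    for offset in range(0, len(padded), 64):
        words = struct.unpack("<16I", padded[offset:offset + 64])
        a, b, c, d = a0, b0, c0, d0
        for i in range(64):
            if i < 16:    # round 1, function F
                f, g = (b & c) | (~b & d), i
            elif i < 32:  # round 2, function G
                f, g = (b & d) | (c & ~d), (5 * i + 1) % 16
            elif i < 48:  # round 3, function H
                f, g = b ^ c ^ d, (3 * i + 5) % 16
            else:         # round 4, function I
                f, g = c ^ (b | ~d), (7 * i) % 16
            f = (f + a + SINES[i] + words[g]) & MASK
            a, d, c = d, c, b
            b = (b + rotl(f, SHIFTS[i])) & MASK
        # Add this block's result to the running buffers.
        a0, b0, c0, d0 = [(x + y) & MASK for x, y in zip((a0, b0, c0, d0), (a, b, c, d))]

    # Concatenate the four buffers, least significant byte first.
    return struct.pack("<4I", a0, b0, c0, d0).hex()


print(md5_sketch(b"Hello world") == hashlib.md5(b"Hello world").hexdigest())  # True
```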
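
The last two points can be illustrated with a plain precomputed lookup table, which is a simplification of a real rainbow table. The word list below is a hypothetical stand-in for the much larger lists attackers use.

```python
import hashlib

# Hypothetical word list; real tables cover billions of candidates.
common_passwords = ["123456", "password", "qwerty", "letmein"]

# Precompute hash -> plaintext once; every later lookup is a dictionary access.
lookup = {hashlib.md5(word.encode()).hexdigest(): word for word in common_passwords}

# An unsalted MD5 hash, as it might appear in a leaked database.
leaked_hash = hashlib.md5(b"password").hexdigest()
print(lookup.get(leaked_hash))  # password
```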

TODO if have time investigate the disadvantages

In [16]:
inputstring = "Hello world"
output = hashlib.md5(inputstring.encode())

print("Hash of the input string:")
print(output.hexdigest())

Hash of the input string:
3e25960a79dbc69b674cd4ec67a72c62


- SHA-256

The SHA-256 algorithm plays a pivotal role in modern digital security and data integrity. It produces a fixed-size output of 32 bytes (256 bits). Its nature is very similar to that of the MD5 algorithm, but it uses more complex functions and more buffers, and produces a bigger result. That makes its computation a bit slower, but its security a lot higher.

First, the initial hash values (buffers) must be initialized. They have constant values:

\begin{align*}
H_0 &= \texttt{0x6a09e667} \\
H_1 &= \texttt{0xbb67ae85} \\
H_2 &= \texttt{0x3c6ef372} \\
H_3 &= \texttt{0xa54ff53a} \\
H_4 &= \texttt{0x510e527f} \\
H_5 &= \texttt{0x9b05688c} \\
H_6 &= \texttt{0x1f83d9ab} \\
H_7 &= \texttt{0x5be0cd19}
\end{align*}
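
These constants are not arbitrary: the SHA-2 standard (FIPS 180-4) defines them as the first 32 bits of the fractional parts of the square roots of the first eight prime numbers. The sketch below recomputes them with exact integer arithmetic; the function name is hypothetical.

```python
import math

FIRST_PRIMES = [2, 3, 5, 7, 11, 13, 17, 19]


def sha256_initial_value(prime: int) -> int:
    """First 32 bits of the fractional part of sqrt(prime)."""
    # isqrt(prime << 64) equals floor(sqrt(prime) * 2**32), computed exactly;
    # masking keeps only the 32 bits that follow the binary point.
    return math.isqrt(prime << 64) & 0xFFFFFFFF


for index, prime in enumerate(FIRST_PRIMES):
    print(f"H_{index} = 0x{sha256_initial_value(prime):08x}")
# first line printed: H_0 = 0x6a09e667
```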

Then we prepare the initial input by appending a '1' bit and then appending '0' bits until the message is 64 bits less than an exact multiple of 512. Finally, similar to MD5, the length of the original message gets appended to the end as a 64-bit integer.

Afterwards, again similar to MD5, we divide the message into 512-bit blocks and each of them is further divided into sixteen 32-bit words. Then comes the core algorithm of SHA-256, which is a lot more complex than the one of MD5. More information on it can be found in [9].

After the algorithm is finished, we concatenate all the buffers and end up with the final hash.


In [17]:
inputstring = "Hello world"
output = hashlib.sha256(inputstring.encode())

print("Hash of the input string:")
print(output.hexdigest())

Hash of the input string:
64ec88ca00b268e5ba1a35678a1b5316d212f4f366b2477232534a8aeca37f3c


### 1.2.3 Digital Signatures

Digital signatures are the public-key primitive of message authentication. In the real world we use handwritten signatures; similarly, a digital signature is a technique that binds an entity to digital data. The binding can be independently verified by the receiver as well as by any third party. A digital signature is a cryptographic value that is calculated from the data and a secret key known only to the signer.

They most commonly involve public-key cryptography. The signer has two keys at his disposal, a private one and a public one. Only he knows his private key, and he should not share it with anyone. Whenever he sends data to another person, that person can verify its origin by using his public key.

<img src="./pictures/digital_signature_process.png" alt="drawing" width="600"/>

The signer prepares the data that needs to be sent securely and uses a hashing function to create a hash value from the data. He encrypts the hash value with the private key using a signature algorithm. Then he sends the data together with the digital signature.

The verifier receives the data and the digital signature from the signer. He uses the same hashing function to generate a hash value from the received data. He uses the public key and the verification algorithm to verify the digital signature. This involves decrypting the signature with the public key to obtain the original hash value. Then he checks whether the hash he computed is the same as the decrypted one, and if they match, the signature is genuine.
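
The sign and verify steps can be sketched with textbook RSA. This is a toy under stated assumptions: the key is built from two small, publicly known Mersenne primes, no padding scheme is used, and the hash is simply reduced modulo N, so it must not be used for real signatures. All names are hypothetical.

```python
import hashlib

# Toy key pair built from two known Mersenne primes (far too small for real use).
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                          # public modulus
E = 65537                          # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (requires Python 3.8+)


def digest_as_int(data: bytes) -> int:
    """Hash the data and reduce it so that it fits below the modulus."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def sign(data: bytes) -> int:
    """Signer: transform the hash with the private exponent."""
    return pow(digest_as_int(data), D, N)


def verify(data: bytes, signature: int) -> bool:
    """Verifier: recover the hash with the public exponent and compare."""
    return pow(signature, E, N) == digest_as_int(data)


signature = sign(b"transfer 100 EUR to Alice")
print(verify(b"transfer 100 EUR to Alice", signature))  # True
print(verify(b"transfer 900 EUR to Alice", signature))  # False
```

Real systems use much larger keys together with a padding scheme such as RSA-PSS, or a different algorithm such as ECDSA.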

### 1.2.4 Message Authentication Codes

TODO Explain what MAC's are

### 1.2.5 Cipher Suite

TODO explain cipher suite, what they are and how they combine most of the above

# 2. TLS Protocol

In order to establish secure communication between two sides in an insecure environment like the internet, we need to use all of the above-mentioned principles and security measures.


Cryptographic protocols are sets of rules that dictate how algorithms and cryptographic keys should be used to achieve secure communication. These protocols ensure the data is encrypted, transmitted securely and decrypted only by the intended recipient.

## 2.1 Overview of the TLS Protocol

There are many different protocols that are used for such purposes, but the most commonly seen is TLS (Transport Layer Security). It is the direct successor of SSL (Secure Sockets Layer), which is now deprecated. TLS is widely used on the internet, since websites that use HTTPS rely on it. It encrypts the transmitted data so that anyone who intercepts it cannot read it; only the intended recipient can. TLS uses all of the above-mentioned key concepts: symmetric and asymmetric encryption, key exchange mechanisms (like Diffie-Hellman), cryptographic hash functions (like SHA-256) and digital signatures.

### 2.1.1 Introduction to TLS

TLS is a critical protocol for ensuring secure communication over the internet. It was developed as the successor to SSL and enhances the security of the data transmitted between web servers and clients. It is designed to prevent eavesdropping, tampering and forgery in client-server communication. It is used across the internet: in instant messaging, email, web browsing and everything that uses the HTTPS protocol.

The protocol operates between the transport and application layers in the OSI model, enabling it to manage security independently of application protocols. This allows developers to integrate the protocol easily, without any changes to the application logic.

TLS is very significant in the modern digital landscape. It not only provides tools for secure communication, but also mechanisms for authenticating both the sender and the receiver. This ensures that the data cannot be read or altered unnoticed during transmission and that the parties in the communication are the ones they claim to be.

### 2.1.2 Key Features of TLS

TLS is essential for safeguarding data transmitted over the internet, ensuring secure communication between web servers and clients. In order to guarantee the safety, it must implement several key features.

- Encryption

TLS uses both symmetric and asymmetric encryption to provide security for data in transit. Symmetric encryption uses a single key to encrypt and decrypt information, enabling fast and efficient data handling. One of the most popular such algorithms is AES (Advanced Encryption Standard). Asymmetric encryption, on the other hand, uses a pair of keys, one public and one private, which TLS relies on during the handshake to authenticate the server and to establish the shared symmetric keys. A popular such algorithm is RSA (Rivest-Shamir-Adleman).

- Integrity

One key feature of a protocol like TLS is a mechanism to verify that the data received is the data that was sent. This is handled by MACs (Message Authentication Codes) and cryptographic hash functions. They allow any alteration of the data during transit, whether accidental or malicious, to be detected, thus ensuring the information remains unchanged from the sender to the receiver.

- Authentication

Authentication is a key feature of TLS, providing the means to verify the identity of the parties involved in the communication. This is achieved through digital certificates, which are issued and signed by trusted Certificate Authorities. That way TLS ensures that users are actually communicating with the servers they believe they are connected to, which prevents impersonation attempts and fraudulent activities.

- Non-Repudiation

Non-repudiation is a security principle that ensures a party in a transaction cannot deny the authenticity of their signature on a message that they sent. It is particularly important in legal and financial environments, where proof of actions may be required. It is again achieved by using digital signatures.
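
The integrity mechanism mentioned above can be sketched with Python's standard `hmac` module. The key and message below are hypothetical; in TLS the MAC key is derived during the handshake rather than hard-coded.

```python
import hashlib
import hmac

shared_key = b"example-session-key"  # hypothetical key for illustration
message = b"GET /index.html"

# Sender: compute a tag over the message with the shared key.
tag = hmac.new(shared_key, message, hashlib.sha256).digest()


def is_authentic(key: bytes, data: bytes, received_tag: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    expected = hmac.new(key, data, hashlib.sha256).digest()
    return hmac.compare_digest(expected, received_tag)


print(is_authentic(shared_key, message, tag))             # True
print(is_authentic(shared_key, b"GET /admin.html", tag))  # False
```

Only someone who knows the shared key can produce a valid tag, so a modified message is rejected by the receiver.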

### 2.1.3 TLS Protocol Structure and Layers

TLS is composed of several protocols, each serving a distinct purpose in the secure communication between the parties.

- Record Protocol

  The Record Protocol layers on top of a reliable connection-oriented transport, such as TCP. It provides data confidentiality using symmetric key cryptography and data integrity using a keyed Message Authentication Code (MAC). The keys are generated uniquely for each session based on the security parameters agreed during the TLS handshake. The Record Protocol is also used for encapsulating various upper layer protocols, most notably the TLS Handshake Protocol, in which case it can be used without encryption or message authentication. It performs the following operations in order:

  1. Read messages for transmission.
  1. Fragment messages into manageable chunks of data.
  1. Compress the data, if compression is required and enabled.
  1. Calculate a MAC.
  1. Encrypt the data.
  1. Transmit the resulting data to the peer.

  On the receiver side, the same operations are performed, but in reverse order:

  1. Read received data from the peer.
  1. Decrypt the data.
  1. Verify the MAC.
  1. Decompress the data, if compression is required and enabled.
  1. Re-assemble the message fragments.
  1. Deliver the message to upper protocol layers.

- Handshake Protocol

  The Handshake Protocol is the most complex part of TLS. It is responsible for the initial negotiation between the client and server when a connection is first established. It ensures that both parties have agreed on the security parameters and have exchanged the necessary keys securely, in order to begin their reliable connection. This protocol is examined in depth in section 2.1.4.

- Alert Protocol

  The Alert Protocol is there to allow signals to be sent between peers. These signals are mostly used to inform the peer about the cause of a protocol failure. Some of these signals are used internally by the protocol and the application protocol does not have to cope with them, and others refer to the application protocol solely. An alert signal includes a level indication which may be either fatal or warning. Fatal alerts always terminate the current connection, and prevent future re-negotiations using the current session ID. In the newest version of TLS, 1.3, all error alerts are treated as fatal and terminate the connection, while closure alerts such as `close_notify` are not errors.

  Some of the critical alerts are:
  - `unexpected_message`: An inappropriate message was received.
  - `bad_record_mac`: An incorrect MAC was received.
  - `decompression_failure`: The decompression function received improper input.
  - `handshake_failure`: Sender was unable to negotiate an acceptable set of security parameters given the options available.
  - `illegal_parameter`: A field in a handshake message was out of range or inconsistent with other fields.
  - `close_notify`: Notifies the recipient that the sender will not send any more messages on this connection. Each party is required to send a close_notify alert before closing the write side of a connection.
  - `no_certificate`: May be sent in response to a certificate request if no appropriate certificate is available.
  - `bad_certificate`: A received certificate was corrupt (e.g., contained a signature that did not verify).



- Change Cipher Spec Protocol

  The Change Cipher Spec Protocol is a critical component of the TLS suite, as it is designed to signal transitions in the security specification during a TLS session. When used, it indicates a change in the encryption and MAC settings that are used in the secure session. This change occurs near the end of the handshake, once the security parameters have been negotiated but before the 'Finished' messages are exchanged. After sending or receiving the Change Cipher Spec message, the client and the server must update their encryption parameters and begin encrypting and decrypting messages using the negotiated settings.

  It marks the final part of the negotiation phase of a TLS session. Once all the information for the session has been exchanged, both the server and the client send the Change Cipher Spec message. Afterwards both parties send a 'Finished' message, which is the first encrypted data sent using the new security settings. It serves to confirm that the whole process has been successful and no tampering has occurred.
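
The send and receive steps of the Record Protocol listed earlier in this section can be sketched as follows. This is a simplified model, not the TLS wire format: compression and record headers are omitted, the MAC is HMAC-SHA256 over a sequence number and the fragment, and the "cipher" is a toy SHA-256 keystream chosen only so that the example needs nothing beyond the standard library. All function names are hypothetical.

```python
import hashlib
import hmac
from typing import List

MAX_FRAGMENT = 2**14  # TLS limits a plaintext fragment to 16384 bytes
MAC_LEN = 32          # length of an HMAC-SHA256 tag


def keystream_xor(key: bytes, seq: int, data: bytes) -> bytes:
    """Toy cipher: XOR with a SHA-256 counter keystream. Not a real TLS cipher."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        block_input = key + seq.to_bytes(8, "big") + counter.to_bytes(4, "big")
        stream.extend(hashlib.sha256(block_input).digest())
        counter += 1
    return bytes(x ^ y for x, y in zip(data, stream))


def protect(message: bytes, enc_key: bytes, mac_key: bytes) -> List[bytes]:
    """Sender side: fragment, calculate a MAC, then encrypt each fragment."""
    records = []
    for seq, start in enumerate(range(0, len(message), MAX_FRAGMENT)):
        fragment = message[start:start + MAX_FRAGMENT]
        mac = hmac.new(mac_key, seq.to_bytes(8, "big") + fragment, hashlib.sha256).digest()
        records.append(keystream_xor(enc_key, seq, fragment + mac))
    return records


def unprotect(records: List[bytes], enc_key: bytes, mac_key: bytes) -> bytes:
    """Receiver side: decrypt, verify the MAC, then reassemble the fragments."""
    fragments = []
    for seq, record in enumerate(records):
        plain = keystream_xor(enc_key, seq, record)
        fragment, mac = plain[:-MAC_LEN], plain[-MAC_LEN:]
        expected = hmac.new(mac_key, seq.to_bytes(8, "big") + fragment, hashlib.sha256).digest()
        if not hmac.compare_digest(mac, expected):
            raise ValueError("bad_record_mac")
        fragments.append(fragment)
    return b"".join(fragments)


records = protect(b"x" * 40000, b"enc-key", b"mac-key")
print(len(records))                                                    # 3
print(unprotect(records, b"enc-key", b"mac-key") == b"x" * 40000)      # True
```

A record that is modified in transit fails the MAC check on the receiver side, which corresponds to the `bad_record_mac` alert described above.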

### 2.1.4 The TLS Handshake Process

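  We will look into the TLS 1.2 handshake process. In TLS 1.3 the process is simplified and needs only one round trip between the client and the server. This is possible partly because TLS 1.3 supports a smaller number of cipher suites and key exchange methods, so the client can already send its key exchange data in its first message.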

  <img src="./pictures/tls_handshake.png" alt="drawing" width="600"/>

  In order to begin a TLS communication, the client and server must go through a handshake process. It includes the following steps in the given order:

- **Client Hello** - The client sends the server information including the highest version of TLS that it supports and a list of the cipher suites that it supports. The cipher suite information includes cryptographic algorithms and key sizes.
- **Server Hello** - The server chooses the highest version of TLS and the best cipher suite that both the client and server support and sends this information to the client.
- **(Optional) Certificate** - The server sends the client a certificate or a certificate chain, typically beginning with the server's public key certificate and ending with the certificate authority's root certificate. This message is used whenever server authentication is required.
- **(Optional) Server Key Exchange** - The server sends the client a server key exchange message if the public key information from the Certificate is not sufficient for key exchange. For example, in cipher suites based on Diffie-Hellman (DH), this message contains the server's DH public key.
- **Server Hello Done** - The server tells the client that it is finished with its initial negotiation messages.
- **Client Key Exchange** - The client generates information used to create a key for symmetric encryption. For RSA, the client then encrypts this key information with the server's public key and sends it to the server. For cipher suites based on DH, this message contains the client's DH public key.
- **(Optional) Certificate Verify** - This message is sent by the client when the client presents a certificate. Its purpose is to allow the server to complete the process of authenticating the client. When this message is used, the client sends a digital signature, created with its private key, over a hash of the handshake messages exchanged so far. When the server verifies this signature with the client's public key, the server is able to authenticate the client.
- **Change Cipher Spec** - The client sends a message telling the server to change to encrypted mode.
- **Finished** - The client tells the server that it is ready for secure data communication to begin.
- **Change Cipher Spec** - The server sends a message telling the client to change to encrypted mode.
- **Finished** - The server tells the client that it is ready for secure data communication to begin. This is the end of the TLS handshake.

After the handshake is performed, the client and server transfer encrypted data between one another. At the end of the connection each side sends a close_notify alert to inform the peer that the connection is closed.
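
In practice the whole handshake is carried out by a TLS library. The sketch below uses Python's standard `ssl` module with a client context restricted to TLS 1.2, so that the negotiation follows the steps above. The function names are hypothetical, and the host in the commented call is only an illustration that requires network access.

```python
import socket
import ssl


def make_tls12_context() -> ssl.SSLContext:
    """Client context that verifies the server certificate and allows only TLS 1.2."""
    context = ssl.create_default_context()
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    context.maximum_version = ssl.TLSVersion.TLSv1_2
    return context


def handshake_info(host: str, port: int = 443) -> tuple:
    """Connect, run the handshake, and return (protocol version, cipher suite name)."""
    context = make_tls12_context()
    with socket.create_connection((host, port), timeout=10) as raw_sock:
        # wrap_socket performs the full handshake before returning.
        with context.wrap_socket(raw_sock, server_hostname=host) as tls_sock:
            return tls_sock.version(), tls_sock.cipher()[0]


# Example call (needs network access; the host name is only an illustration):
# print(handshake_info("www.python.org"))
```

The default context checks the server's certificate chain and host name, which corresponds to the Certificate step of the handshake.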

  

## 2.2 Certificate Authorities and Trust Models

  A certificate authority (CA) is a trusted organization that issues digital certificates for websites and other entities. CAs validate a website domain and, depending on the type of certificate, the ownership of the website, and then issue TLS/SSL certificates that are trusted by web browsers like Chrome, Safari and Firefox. Thus, CAs help keep the internet a safer place by verifying websites and other entities to enable more trust in online communications and transactions.

  A certificate authority is a company or organization that acts to validate the identities of entities (such as websites, email addresses, companies, or individual persons) and bind them to cryptographic keys through the issuance of electronic documents known as digital certificates.

  A digital certificate provides:

  - Authentication, by serving as a credential to validate the identity of the entity that it is issued to.
  - Encryption, for secure communication over insecure networks such as the internet.
  - Integrity of documents signed with the certificate so that they cannot be altered by a third party in transit.

  These certificates allow secure, encrypted communication between two parties through public key cryptography. The CA verifies the certificate applicant’s identity and issues a certificate containing their public key. The CA will then digitally sign the issued certificate with their own private key which establishes trust in the certificate’s validity.

  When requesting a certificate from a CA, the applicant first generates a public and private key pair. The private key should remain under the applicant’s sole control and ownership. The applicant then sends a certificate signing request (CSR) containing their public key and other identifying details to the CA through an online form.

  Next, the CA will take steps to validate the applicant’s identity and the right to claim credentials such as domain names for server certificates or email addresses for email certificates in the CSR.   If validation is successful, the CA issues the certificate containing the details and public key from the CSR. The CA digitally signs the issued certificate with their own private key to confirm they verified the identity.


# 3. Implementation and Comparison

## 3.1 Using Existing Code Libraries

- **Overview of TLS Libraries**: Description of commonly used libraries such as OpenSSL, BoringSSL, and others.
- **Advantages of Using Libraries**: Discuss the benefits including reliability, community support, and compliance with standards.
- **Integration Examples**: Show how these libraries can be integrated into existing projects.


## 3.2 Implementing Own TLS Components

- **Challenges of Implementation**: Discuss the complexities involved in developing custom cryptographic protocols.
- **Component Development**: Detailed process of developing key components such as encryption, key exchange, and certificate handling from scratch.


## 3.3 Comparison Between Using Libraries and Custom Implementation

- **Performance Analysis**: Compare the performance of existing libraries with the custom implementations in terms of speed and resource usage.
- **Security Assessment**: Evaluate the security strengths and vulnerabilities of each approach.
- **Use Case Suitability**: Analyze which approach is more suitable for different types of applications and environments.


# 4. Conclusion

# Bibliography


1. https://www.geeksforgeeks.org/cryptography-and-its-types/
1. https://en.wikipedia.org/wiki/Symmetric-key_algorithm
1. https://en.wikipedia.org/wiki/Public-key_cryptography
1. https://en.wikipedia.org/wiki/Diffie%E2%80%93Hellman_key_exchange
1. https://en.wikipedia.org/wiki/Elliptic-curve_Diffie%E2%80%93Hellman
1. https://www.tutorialspoint.com/cryptography/cryptography_hash_functions.htm
1. https://www.geeksforgeeks.org/what-is-the-md5-algorithm/
1. https://www.upgrad.com/blog/sha-256-algorithm/
1. https://blog.boot.dev/cryptography/how-sha-2-works-step-by-step-sha-256/
1. https://www.tutorialspoint.com/cryptography/cryptography_digital_signatures.htm
1. https://www.oreilly.com/library/view/the-ims-ip/9780470019061/9780470019061_tls_record_protocol.html
1. https://www.gnutls.org/manual/html_node/The-TLS-Alert-Protocol.html
1. https://www.ibm.com/docs/en/sdk-java-technology/8?topic=handshake-tls-12-protocol
