In [3]:
import hashlib

# SSL/TLS Protocols

#### by Andon Gorchov (@thunderman913)

## 1 Fundamentals of Cryptography

Cryptography is a technique for securing information and communications through the use of codes, so that only the people for whom the information is intended can understand and process it, thus preventing unauthorized access. The prefix “crypt” means “hidden” and the suffix “graphy” means “writing”. The techniques used to protect information are derived from mathematical concepts and sets of rule-based calculations, known as algorithms, that convert messages in ways that make them hard to decode. These algorithms are used for cryptographic key generation, digital signing, and verification to protect data privacy, web browsing on the internet, and confidential transactions such as credit card and debit card payments. [1]

It has the following features:
- Confidentiality: The communicated information can only be accessed by the person for whom it is intended and nobody else should be able to access it.
- Integrity: The received information must remain unaltered, accurate and exact.
- Non-repudiation: After sending the message/information, the sender cannot later deny having sent it. That provides evidence of the communication and is crucial in situations where accountability matters or legal disputes arise.
- Authentication: This is a mechanism used to verify the identity of the user, system or entity.
- Interoperability: It allows for secure communication between different systems and platforms.
- Adaptability: Cryptography must continuously evolve and improve to stay ahead of any possible security threats, since one security breach could be fatal.

### 1.1 Encryption Types

- **Symmetric Encryption**

Symmetric-key algorithms are algorithms for cryptography that use the same cryptographic keys for both the encryption of plaintext and the decryption of ciphertext. The keys may be identical, or there may be a simple transformation to go between the two keys. The keys, in practice, represent a shared secret between two or more parties that can be used to maintain a private information link. [2]
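To make the "same key in both directions" idea concrete, here is a minimal sketch that uses a byte-wise XOR as a stand-in cipher. The helper name `xor_bytes` is hypothetical, and XOR with a random key as long as the message is only a toy for illustration, not how TLS encrypts data.

```python
import secrets

def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR every data byte with the key byte at the same position."""
    if len(key) < len(data):
        raise ValueError("key must be at least as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))

message = b"Hello world"
shared_key = secrets.token_bytes(len(message))  # the secret both parties hold

ciphertext = xor_bytes(message, shared_key)     # encryption
recovered = xor_bytes(ciphertext, shared_key)   # decryption with the same key

print(recovered == message)
```

Applying the same operation with the same key twice restores the plaintext, which is the defining property of a symmetric scheme.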

<img src="./pictures/symmetric_encryption.png" alt="drawing" width="600"/>

- **Asymmetric Encryption**

Asymmetric encryption, also known as public-key encryption, is the field of cryptographic systems that use pairs of related keys. Each key pair consists of a public key and a corresponding private key. Key pairs are generated with cryptographic algorithms based on mathematical problems termed one-way functions. Security of public-key cryptography depends on keeping the private key secret; the public key can be openly distributed without compromising security.

In a public-key encryption system, anyone with a public key can encrypt a message, yielding a ciphertext, but only those who know the corresponding private key can decrypt the ciphertext to obtain the original message. [3]
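The public/private split can be sketched with textbook RSA on tiny numbers. RSA is used here only as one example of a public-key scheme; the primes 61 and 53, the exponent 17 and the function names are assumptions chosen for illustration, and real systems use much larger keys together with padding schemes.

```python
# Toy key generation (requires Python 3.8+ for the modular inverse via pow).
prime_1, prime_2 = 61, 53
n = prime_1 * prime_2                 # public modulus
phi = (prime_1 - 1) * (prime_2 - 1)
e = 17                                # public exponent, coprime with phi
d = pow(e, -1, phi)                   # private exponent: e * d = 1 (mod phi)

def encrypt(m: int) -> int:
    """Anyone who knows the public key (n, e) can encrypt."""
    return pow(m, e, n)

def decrypt(c: int) -> int:
    """Only the holder of the private exponent d can decrypt."""
    return pow(c, d, n)

message = 65
ciphertext = encrypt(message)
print(decrypt(ciphertext) == message)
```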

<img src="./pictures/asymmetric_encryption.png" alt="drawing" width="600"/>

### 1.2 Key Concepts in Cryptography

- **Key Exchange Mechanisms**

Key exchange mechanisms are essential protocols or algorithms that enable two or more parties to securely establish shared keys over an insecure communication channel like the internet. They allow entities to share secret keys safely without the need to have physically exchanged anything beforehand.

One of the most popular mechanisms is the **Diffie-Hellman key exchange**, which is used in security protocols such as TLS, SSH and IPsec. To implement it, two end users (or a client and a server) agree on two positive whole numbers p and q, such that p is a prime number and q is a generator of p. A generator is a number that, when raised to different positive whole-number powers less than p and reduced modulo p, never produces the same result for any two such powers. p may be large, but q is usually small.
Once the users have agreed on p and q, which do not need to be kept secret, they choose positive whole-number personal keys a and b. Both must be less than the prime modulus p. Next, both parties compute the public keys a* and b* from their personal keys:

$$a^* = (q ^ a)\mod p$$
$$b^* = (q ^ b)\mod p$$

Then the users share the public keys a* and b* over the insecure communication channel. From these public keys, either user can generate a number x using their own personal key and the other party's public key:

$$ x = (b^*)^a \mod p = (a^*)^b \mod p$$

That way both parties arrive at the same number, yet neither x nor the personal keys are ever sent over the insecure channel. Afterwards the users can communicate safely using an encryption method of their choice, with x as the shared key.

Although it may seem that the algorithm can be easily reversed by anyone who knows q, p and a*, it is not that simple. When p is large enough (for example 2048 bits), brute force and other known attacks are not computationally feasible, because the attacker would have to solve the discrete logarithm problem, for which no efficient algorithm is currently known.
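The exchange can be sketched with toy numbers. The values p = 23, q = 5, a = 4 and b = 3 are assumptions chosen for illustration and are far too small for real use; the helper `brute_force_private_key` is hypothetical and only shows why a small p offers no protection.

```python
p, q = 23, 5          # public parameters: prime modulus and generator
a, b = 4, 3           # personal (private) keys, never transmitted

a_pub = pow(q, a, p)  # a* = q^a mod p, sent over the insecure channel
b_pub = pow(q, b, p)  # b* = q^b mod p, sent over the insecure channel

x_first = pow(b_pub, a, p)   # first party:  (b*)^a mod p
x_second = pow(a_pub, b, p)  # second party: (a*)^b mod p

def brute_force_private_key(public_value: int, q: int, p: int):
    """Try every exponent; feasible only because p is tiny here."""
    for candidate in range(1, p):
        if pow(q, candidate, p) == public_value:
            return candidate
    return None

print(x_first == x_second)
print(brute_force_private_key(a_pub, q, p) == a)
```

With a 2048-bit p the loop in `brute_force_private_key` would need an astronomically large number of iterations, which is the practical meaning of the discrete logarithm problem being hard.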

An implementation of the Diffie-Hellman algorithm can be found in the diffie-hellman.ipynb file.

TODO if have time -> go into RSA algorithm

- **Cryptographic Hash Functions**

A cryptographic hash function is a mathematical function used in cryptography. It typically takes inputs of variable length and returns outputs of a fixed length. Because the output length is fixed, different inputs can in principle share an output, but for a secure hash function it should be computationally infeasible to find two such inputs. There are many different algorithms, but we will look into two of them: MD5, which is old and now considered insecure, and SHA-256, which is commonly used and regarded as secure.
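As a small illustration of the fixed-length property, the sketch below hashes inputs of very different lengths with `hashlib` and then compares the digests of two inputs that differ in a single character. The helper `differing_bits` and the sample strings are assumptions made for this example.

```python
import hashlib

samples = ["", "a", "Hello world", "Hello world" * 1000]
for text in samples:
    digest = hashlib.sha256(text.encode()).digest()
    # The input length varies, the digest length stays at 32 bytes.
    print(len(text), len(digest))

def differing_bits(first: bytes, second: bytes) -> int:
    """Count the bit positions in which two equal-length digests differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(first, second))

h1 = hashlib.sha256(b"Hello world").digest()
h2 = hashlib.sha256(b"Hello worle").digest()  # last character changed
print(differing_bits(h1, h2), "of", len(h1) * 8, "bits differ")
```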

- MD5 

MD5 takes a message of any length as input and turns it into a fixed-length digest of 16 bytes (128 bits). It was commonly used for authentication. Let's look into how the algorithm works.

The first step of the algorithm is to add padding bits to the original message. The new length of the message should be 64 bits less than an exact multiple of 512. For example, if we have 600 bits, we should add 360 bits, so that the length is 960, which is 1024 - 64. The first added bit is '1' and all the rest are zeroes.

The next step is to append the length of the original message, in bits, as a 64-bit value. MD5 stores this value with the least significant byte first, so if the length is 24 bits, we would add 00011000 (24 in binary) followed by 00000000 seven times. Including the length means that messages of different lengths end up with different padding.
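The two padding steps can be sketched as follows. This is a minimal illustration that works on whole bytes only; the function name `md5_pad` is hypothetical, and the length is packed little-endian (least significant byte first), as specified for MD5 in RFC 1321.

```python
import struct

def md5_pad(message: bytes) -> bytes:
    """Return the message with MD5-style padding and length appended."""
    bit_length = (len(message) * 8) % 2**64
    padded = message + b"\x80"                       # a '1' bit followed by seven '0' bits
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)  # zeros up to 448 mod 512 bits
    padded += struct.pack("<Q", bit_length)          # 64-bit length, low byte first
    return padded

example = md5_pad(b"x" * 75)   # 75 bytes = 600 bits, as in the text
print(len(example) * 8)        # total length in bits
```

For the 600-bit message the padded part before the length is 960 bits, and the final result is 1024 bits, two full 512-bit blocks.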

Then we initialize the four 32-bit buffers that are predefined in the MD5 algorithm. They are constant values, which ensures that the starting point is always the same.

$$J = 0x67452301$$
$$K = 0xEFCDAB89$$
$$L = 0x98BADCFE$$
$$M = 0x10325476$$

The final step is to process each 512-bit block. A total of 64 operations are performed in 4 rounds of 16 operations each, and a different function is applied in each round. Each function takes three of the buffers above as input, and the buffers are updated after every operation. These are the functions:

\begin{align*}
    F(K, L, M) &= (K \land L) \lor (\neg K \land M) \\
    G(K, L, M) &= (K \land M) \lor (L \land \neg M) \\
    H(K, L, M) &= K \oplus L \oplus M \\
    I(K, L, M) &= L \oplus (K \lor \neg M)
\end{align*}
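The four round functions translate directly into bitwise operations on 32-bit words. The sketch below follows the definitions in RFC 1321, using the buffer names K, L and M from this section; the `MASK` constant and the helper `not32` are assumptions needed because Python integers are not limited to 32 bits.

```python
MASK = 0xFFFFFFFF  # keep every result within 32 bits

def not32(x: int) -> int:
    """Bitwise NOT restricted to 32 bits."""
    return ~x & MASK

def F(K: int, L: int, M: int) -> int:
    return (K & L) | (not32(K) & M)

def G(K: int, L: int, M: int) -> int:
    return (K & M) | (L & not32(M))

def H(K: int, L: int, M: int) -> int:
    return K ^ L ^ M

def I(K: int, L: int, M: int) -> int:
    return L ^ (K | not32(M))

# F acts as a bitwise selector: where K has a 1 bit it picks L, otherwise M.
print(hex(F(0xFFFF0000, 0x12345678, 0x9ABCDEF0)))
```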

Finally, after all the operations have been performed, we concatenate the four resulting buffers, which are 4 bytes each, and end up with a 16-byte hash.

At first glance MD5 seems like a good hashing algorithm that is hard to crack, but it has several disadvantages:
- MD5 is not collision resistant: different inputs that produce the same hash value can be found in practice
- MD5 is fast to compute, which makes brute force attacks easier and faster
- There are many rainbow tables with billions of precomputed hashes, which makes the most commonly hashed values easily accessible to attackers
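The last point can be illustrated with a tiny precomputed lookup table. This is a simplification: real rainbow tables use hash chains to save space, whereas the sketch stores every hash directly. The password list and the function name `reverse_lookup` are hypothetical.

```python
import hashlib

# Hypothetical list of commonly used passwords.
common_passwords = ["123456", "password", "qwerty", "letmein"]

# Precompute hash -> password once; every later lookup is a dictionary access.
lookup = {hashlib.md5(pw.encode()).hexdigest(): pw for pw in common_passwords}

def reverse_lookup(md5_hex: str):
    """Return the known input for a hash, or None if it is not in the table."""
    return lookup.get(md5_hex)

leaked_hash = hashlib.md5(b"qwerty").hexdigest()
print(reverse_lookup(leaked_hash))
```

Because MD5 is fast, building such a table for billions of candidate inputs is cheap, which is why unsalted MD5 hashes of common values offer little protection.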

TODO if have time investigate the disadvantages

In [14]:
inputstring = "Hello world"
output = hashlib.md5(inputstring.encode())

print("Hash of the input string:")
print(output.hexdigest())

Hash of the input string:
3e25960a79dbc69b674cd4ec67a72c62


- SHA-256

The SHA-256 algorithm plays a pivotal role in modern digital security and data integrity. It produces a fixed-size output of 32 bytes (256 bits). Its nature is very similar to that of MD5, but it uses more complex functions and more buffers, and produces a bigger result. That makes its computation a bit slower, but its security a lot higher.

First, the initial hash values (buffers) must be initialized. They are constants:

\begin{align*}
H_0 &= \texttt{0x6a09e667} \\
H_1 &= \texttt{0xbb67ae85} \\
H_2 &= \texttt{0x3c6ef372} \\
H_3 &= \texttt{0xa54ff53a} \\
H_4 &= \texttt{0x510e527f} \\
H_5 &= \texttt{0x9b05688c} \\
H_6 &= \texttt{0x1f83d9ab} \\
H_7 &= \texttt{0x5be0cd19}
\end{align*}

Then we prepare the input by appending a '1' bit and then appending '0' bits until the message is 64 bits less than an exact multiple of 512. Finally, similar to MD5, the length of the original message is appended to the end as a 64-bit integer, although SHA-256 stores it with the most significant byte first.

Afterwards, similar to MD5, we divide the message into 512-bit blocks, and each of them is further divided into sixteen 32-bit words. Then comes the compression algorithm of SHA-256, which is a lot more complex than that of MD5. More information on it can be found in [9].
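The preparation steps before the compression function can be sketched as follows. The code covers only padding and splitting, not the SHA-256 rounds themselves; it works on whole bytes, and the function names `sha256_pad` and `blocks_of_words` are hypothetical.

```python
import struct

def sha256_pad(message: bytes) -> bytes:
    """Append the '1' bit, the zero bits and the 64-bit big-endian length."""
    bit_length = len(message) * 8
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    return padded + struct.pack(">Q", bit_length)

def blocks_of_words(padded: bytes) -> list:
    """Split padded data into 512-bit blocks of sixteen 32-bit words each."""
    blocks = []
    for offset in range(0, len(padded), 64):
        block = padded[offset:offset + 64]
        blocks.append(list(struct.unpack(">16I", block)))
    return blocks

blocks = blocks_of_words(sha256_pad(b"abc"))
print(len(blocks), len(blocks[0]))
print([hex(word) for word in blocks[0][:2]], blocks[0][-1])
```

For the three-byte input `b"abc"` everything fits into a single block: the first word holds the message bytes followed by the `0x80` marker, and the last word holds the length in bits.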

After the algorithm is finished, we concatenate all the buffers and end up with the final hashed product.


In [13]:
inputstring = "Hello world"
output = hashlib.sha256(inputstring.encode())

print("Hash of the input string:")
print(output.hexdigest())

Hash of the input string:
64ec88ca00b268e5ba1a35678a1b5316d212f4f366b2477232534a8aeca37f3c


- **Digital Signatures**
  - **Mechanisms for authentication and integrity**: Provides a means to verify the authenticity of digital messages or documents.
  - **Algorithm examples and their operational mechanisms**: Detailed look into how these algorithms function and are applied in real-world scenarios.

### 1.3 Principles of Secure Communications

#### 1.3.1 Cryptographic Protocols and Their Uses

- **Overview**: Protocols such as TLS utilize the cryptographic tools described above to secure communications across networks.
- **Real-world applications**: Secure communication is essential in applications such as web browsing, secure file transfers, and email.

#### 1.3.2 Public Key Infrastructure (PKI)

- **Role and structure**: Manages digital certificates and encryption keys to provide secure communications.
- **Certificate Authorities (CA)**: Issues and manages security credentials and public keys for digital certificates.



# 2. SSL/TLS Protocol Analysis

## 2.1 Overview of the SSL/TLS Protocol

### 2.1.1 Protocol Structure and Layers

- **Breakdown**: Detailed overview of the SSL/TLS protocol stack.
- **Function and purpose of each layer**: Includes the Record Protocol, Handshake Protocol, among others.


### 2.1.2 The SSL/TLS Handshake

- **Detailed analysis of the handshake phases**:
  - **ClientHello, ServerHello**: Initial communication stages where parameters are negotiated.
  - **Server certificate and key exchange**: Server provides its certificate and optionally a key exchange method.
  - **Client key exchange**: Client responds with its key exchange data.
  - **Certificate verification**: Authentication of the server's certificate.
  - **Completion of the handshake**: Change Cipher Spec and Finished messages finalize the secure connection setup.

### 2.1.3 Session Establishment and Data Transmission

- **Establishing a secure connection**: Process of using negotiated keys for a secure communication session.
- **Symmetric key encryption for data transfer**: Mechanism to encrypt and decrypt messages using symmetric keys.

## 2.2 Certificate Authorities and Trust Models


### 2.2.1 Role of Certificate Authorities (CAs)
- **Contribution to security in SSL/TLS**: How CAs underpin the trust model by issuing and managing digital certificates.


### 2.2.2 Mathematical Models of Trust

- **Algorithms used for verifying certificate authenticity**: Examination of the algorithms that ensure a certificate is valid and trustworthy.
- **Analysis of trust models in digital communications**: Discuss how trust is established and maintained in cryptographic protocols.


# 3. Implementation and Comparison

## 3.1 Using Existing Code Libraries

- **Overview of SSL/TLS Libraries**: Description of commonly used libraries such as OpenSSL, BoringSSL, and others.
- **Advantages of Using Libraries**: Discuss the benefits including reliability, community support, and compliance with standards.
- **Integration Examples**: Show how these libraries can be integrated into existing projects.
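As one possible integration example, Python's standard `ssl` module (which wraps OpenSSL) can secure an ordinary socket in a few lines. The sketch below is a minimal client under stated assumptions: the helper names `make_client_context` and `fetch_tls_info` are hypothetical, TLS 1.2 as the minimum version is an illustrative choice, and the host passed to `fetch_tls_info` is up to the caller.

```python
import socket
import ssl

def make_client_context() -> ssl.SSLContext:
    """Create a client context that verifies certificates and hostnames."""
    context = ssl.create_default_context()            # loads the system CA store
    context.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse older protocol versions
    return context

def fetch_tls_info(host: str, port: int = 443, timeout: float = 5.0):
    """Connect to host, run the TLS handshake and report what was negotiated."""
    context = make_client_context()
    with socket.create_connection((host, port), timeout=timeout) as raw_sock:
        # wrap_socket performs the handshake and checks the server certificate.
        with context.wrap_socket(raw_sock, server_hostname=host) as tls_sock:
            return tls_sock.version(), tls_sock.cipher()
```

Calling `fetch_tls_info` with a reachable HTTPS host returns the negotiated protocol version and cipher suite; the certificate validation and key exchange are handled entirely by the library.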


## 3.2 Implementing Own SSL/TLS Components

- **Challenges of Implementation**: Discuss the complexities involved in developing custom cryptographic protocols.
- **Component Development**: Detailed process of developing key components such as encryption, key exchange, and certificate handling from scratch.


## 3.3 Comparison Between Using Libraries and Custom Implementation

- **Performance Analysis**: Compare the performance of existing libraries with the custom implementations in terms of speed and resource usage.
- **Security Assessment**: Evaluate the security strengths and vulnerabilities of each approach.
- **Use Case Suitability**: Analyze which approach is more suitable for different types of applications and environments.


# 4. Conclusion

# Bibliography


1. https://www.geeksforgeeks.org/cryptography-and-its-types/
2. https://en.wikipedia.org/wiki/Symmetric-key_algorithm
3. https://en.wikipedia.org/wiki/Public-key_cryptography
4. https://en.wikipedia.org/wiki/Diffie%E2%80%93Hellman_key_exchange
5. https://en.wikipedia.org/wiki/Elliptic-curve_Diffie%E2%80%93Hellman
6. https://www.tutorialspoint.com/cryptography/cryptography_hash_functions.htm
7. https://www.geeksforgeeks.org/what-is-the-md5-algorithm/
8. https://www.upgrad.com/blog/sha-256-algorithm/
9. https://blog.boot.dev/cryptography/how-sha-2-works-step-by-step-sha-256/