# Data Structure

## Primitive Data Structures

💡 Data can be stored, processed, or retrieved in many different ways. Both basic and advanced data structures are used in nearly every program or software system. A strong understanding of data structures is therefore crucial to understanding how we can deal with data and computers.

🎯 Let's see how they work under the hood!

<img src="files/data_structure.webp" source="https://medium.com/swlh/data-structures-the-basics-dc356cb97111"></img>

In [None]:
import pandas as pd
import numpy as np

### Integer

An integer is a whole number that can be positive, negative, or zero and does not have any decimal points.

### How is an Integer Stored in the Binary System?

In the binary system, integers are represented using a sequence of bits (0s and 1s). The binary system is base-2, meaning each position in the sequence represents a power of 2. For example:

- The binary number `1011` represents the decimal number `11` because:
- $1 \times 2^3 + 0 \times 2^2 + 1 \times 2^1 + 1 \times 2^0$
- $1 \times 8 + 0 \times 4 + 1 \times 2 + 1 \times 1 = 8 + 0 + 2 + 1 = 11$
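
The positional expansion above can be checked in Python. This is a minimal sketch; the helper name `binary_to_decimal` is ours and only illustrates the sum of powers of 2, since Python already provides `int(..., 2)` and `bin(...)`.

```python
def binary_to_decimal(bits):
    """Sum bit * 2**position, counting positions from the right."""
    total = 0
    for position, bit in enumerate(reversed(bits)):
        total += int(bit) * 2 ** position
    return total

print(binary_to_decimal("1011"))  # 11
print(int("1011", 2))             # built-in equivalent: 11
print(bin(11))                    # 0b1011
```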

### Signed and unsigned integer

In computer science, integers can be represented as either signed or unsigned. The distinction between signed and unsigned integers lies in how the bits are interpreted to represent positive and negative values.

#### Unsigned Integer

An unsigned integer can only represent non-negative values (zero and positive values). An octet (8-bit) ranges from 0 to 255 (so 256 different values).

#### Example:

For an 8-bit unsigned integer:

In [None]:
np.uint8(0b00000000)

In [None]:
np.uint8(0b11111111)

#### Signed Integer

A signed integer can represent both positive and negative values, including zero. The most significant bit (MSb) is used to indicate the sign of the number. It's usually the leftmost (index 0) bit.

- If the MSb is `0`, the number is positive.
- If the MSb is `1`, the number is negative.

(The abbreviation MSB, with a capital B, can also mean Most Significant "Byte". A byte is equal to an octet most of the time, but not always!)

In that case, an octet ranges from -128 to 127, which is still 256 different values.

#### "Naive" Signed Notation (not used)

A "naive" notation could say that whenever the first bit is set to 1, the number is negative, and the remaining bits hold its magnitude.
In that case:

- `0b10000001` would be equal to -1.
- And `0b10000010` would be -2.

But the naive notation has two drawbacks:

- First, there are two representations of the number 0: `0b00000000` and `0b10000000`.
- Second, addition no longer works with the usual rules.

| Unsigned Decimal | Naive Signed Notation    | Decimal |
|------------------|--------------------------|---------|
| 3                |   00000011               | 3       |
| + 132            | + 10000100               | + (-4)  |
| = 135            | = 10000111               | **= -1 expected, but the bits read -7!** |

#### Standard Signed notation (Two's complement)

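Two's complement is the most common method of representing signed (positive, negative, and zero) integers on computers. To negate a number, invert (flip) all of its bits, changing every 0 to 1 and every 1 to 0 (like the bitwise NOT operator), and then add 1. For example, 4 is `00000100`; inverting gives `11111011`, and adding 1 gives `11111100`, which is -4.

With this notation there is a single representation of zero, and ordinary binary addition gives the right result for signed numbers. That's why signed integers are typically represented using two's complement notation, which allows for a straightforward way to perform arithmetic operations.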

| Unsigned Decimal | Standard Signed Notation   | Decimal |
|------------------|----------------------------|---------|
| 3                | 00000011                   | 3       |
| + 252            | + 11111100                 | + (-4)  |
| = 255            | = 11111111                 | **= -1 for both decimal and binary** |
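
The two's complement rule can be sketched in plain Python. The helper names `to_twos_complement` and `from_twos_complement` are hypothetical, written only for this illustration; they rely on the fact that an 8-bit two's complement pattern for a negative value `v` equals the unsigned pattern of `256 + v`.

```python
def to_twos_complement(value, bits=8):
    """Return the two's complement bit pattern of `value` on `bits` bits."""
    if not -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
        raise ValueError(f"{value} does not fit in {bits} signed bits")
    # Masking with 2**bits - 1 maps a negative value v to 2**bits + v
    return format(value & ((1 << bits) - 1), f"0{bits}b")

def from_twos_complement(pattern):
    """Interpret a bit string as a two's complement signed integer."""
    bits = len(pattern)
    unsigned = int(pattern, 2)
    # If the most significant bit is set, subtract 2**bits
    return unsigned - (1 << bits) if pattern[0] == "1" else unsigned

print(to_twos_complement(-4))            # 11111100
print(from_twos_complement("11111111"))  # -1

# Inverting the bits of 4 and adding 1 gives the same pattern as -4
inverted_plus_one = ((~4) + 1) & 0xFF
print(format(inverted_plus_one, "08b"))  # 11111100
```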

In [None]:
np.array(0b10000000).astype(np.int8)

In [None]:
np.array(0b00000000).astype(np.int8)

In [None]:
np.array(0b11111111).astype(np.int8)

In [None]:
np.array(0b01111111).astype(np.int8)

### Comparison between signed and unsigned integers

| Binary Representation   | Signed Integer | Unsigned Integer |
|-------------------------|----------------|------------------|
| `0b00000000`            | 0              | 0                |
| `0b00000001`            | 1              | 1                |
| `0b01111111`            | 127            | 127              |
| `0b10000000`            | -128           | 128              |
| `0b11111111`            | -1             | 255              |
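
The table can be reproduced with the standard library alone. In this small sketch, `int.from_bytes` reads the same single byte once as unsigned and once as two's complement signed; the variable names are ours.

```python
patterns = [0b00000000, 0b00000001, 0b01111111, 0b10000000, 0b11111111]

rows = []
for p in patterns:
    raw = bytes([p])  # one byte holding the bit pattern
    unsigned = int.from_bytes(raw, byteorder="big", signed=False)
    signed = int.from_bytes(raw, byteorder="big", signed=True)
    rows.append((unsigned, signed))
    print(f"0b{p:08b}  unsigned={unsigned:3d}  signed={signed:4d}")
```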

#### Overflow example

If you add two 8-bit integers together, the result may be unexpected.

In [None]:
np.int8(65) + np.int8(62)

In [None]:
np.int8(65) + np.int8(63)

Why? Because:

```python
  0b01000001 # (65 in decimal)
+ 0b00111111 # (63 in decimal)
= 0b10000000 # (-128 in decimal)
```

The same happens with unsigned integers.

In [None]:
np.uint8(250) + np.uint8(250)

```python
  0b11111010 # (250 in decimal)
+ 0b11111010 # (250 in decimal)
= 0b111110100 # a 9-bit number, which does not fit in 8 bits!
= 0b11110100 # So the leading bit is discarded, giving 244 in decimal
``` 
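
Both overflow cases amount to keeping only the lowest 8 bits of the true result. Here is a minimal sketch of that wraparound in plain Python, whose integers never overflow on their own; `wrap_unsigned` and `wrap_signed` are hypothetical helper names, not NumPy functions.

```python
def wrap_unsigned(value, bits=8):
    """Keep only the lowest `bits` bits, as a fixed-width register would."""
    return value & ((1 << bits) - 1)

def wrap_signed(value, bits=8):
    """Wrap into the two's complement range [-2**(bits-1), 2**(bits-1) - 1]."""
    wrapped = wrap_unsigned(value, bits)
    return wrapped - (1 << bits) if wrapped >= (1 << (bits - 1)) else wrapped

print(wrap_signed(65 + 62))      # 127, still fits
print(wrap_signed(65 + 63))      # -128, wrapped around
print(wrap_unsigned(250 + 250))  # 244, the ninth bit is dropped
```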

### 32 bits vs 64 bits

In Data Science, most technologies use 32-bit or 64-bit signed integers.

| Type        | Bits | Min Value                  | Max Value                  |
|-------------|------|----------------------------|----------------------------|
| Signed      | 32   | -2,147,483,648             | 2,147,483,647              |
| Unsigned    | 32   | 0                          | 4,294,967,295              |
| Signed      | 64   | -9,223,372,036,854,775,808 | 9,223,372,036,854,775,807  |
| Unsigned    | 64   | 0                          | 18,446,744,073,709,551,615 |
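
The limits in this table follow directly from the number of bits. The sketch below recomputes them with a hypothetical helper, `integer_range`; NumPy exposes the same limits through `np.iinfo`.

```python
def integer_range(bits, signed):
    """Return (min, max) for an integer stored on `bits` bits."""
    if signed:
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1

for bits in (32, 64):
    for signed in (True, False):
        low, high = integer_range(bits, signed)
        kind = "Signed" if signed else "Unsigned"
        print(f"{kind:<8} {bits} bits: {low:,} to {high:,}")
```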


### Float numbers

Python manages float numbers using the [IEEE754](https://en.wikipedia.org/wiki/IEEE_754) standard for floating-point arithmetic.

### IEEE 754 Standard

The IEEE 754 standard specifies the format for floating-point numbers, which includes:

1. **Sign Bit:** A single bit that indicates the sign of the number (0 for positive, 1 for negative).
2. **Exponent:** A field that represents the exponent of the number.
3. **Mantissa (or Significand or Coefficient):** A field that represents the significant digits of the number.

$ \text{value} = \text{sign} \times \text{mantissa} \times 2^{(\text{exponent} - \text{bias})}$

**Example :** $+1.25 \times 2^{-3}$

Here :
- **the sign** is positive.
- **the mantissa** is $1.25$.
- **the base** is $2$ (binary).
- **the exponent** is $-3$.
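
As a quick check of this example, the standard `math` module can rebuild the value from its mantissa and exponent. Note that `math.frexp` uses a slightly different convention (mantissa in the range 0.5 to 1), so it reports the same number with another mantissa/exponent pair.

```python
import math

value = math.ldexp(1.25, -3)  # 1.25 * 2**-3
print(value)                  # 0.15625

# math.frexp normalizes the mantissa into [0.5, 1), not [1, 2)
mantissa, exponent = math.frexp(value)
print(mantissa, exponent)     # 0.625 -2
```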



### Float Representation in Python

The default float format in Python is ```float64```, known as **double precision** (64-bit). ```float32``` is called **single precision**.

### Binary Encoding of Float Numbers

For a **64-bit floating-point number (double precision)**, the binary encoding is as follows:

1. **Sign Bit:** 1 bit
2. **Exponent:** 11 bits
3. **Mantissa** (fraction) **:** 52 bits

And for a **32-bit floating-point number (single precision)**, the binary encoding is as follows:

1. **Sign Bit:** 1 bit
2. **Exponent:** 8 bits
3. **Mantissa** (fraction) **:** 23 bits



### Example with a 32 bits float number

<img src='files/float32_binary_example.png'/>

### How is this computed ?

$ \text{value} = \text{sign} \times \text{mantissa} \times 2^{(\text{exponent} - \text{bias})}$

(with $ \text{sign} = \pm 1$ and $ \text{bias} = 2^{e-1} - 1$)

#### The Sign Bit

If it's "1", it's a negative number. Otherwise, it's a positive number. In this example the number is positive.

#### The Exponent

For a ```float32```, it's an octet that ranges from 0 to 255. In the example, the exponent is $0b01111100$, so **124** in decimal. But we still need to apply an exponent bias.

#### Exponent Bias

The exponent can be positive or negative. However, the usual representation of signed numbers (two's complement) would make the comparison between floating-point numbers a bit more difficult. To address this issue, the exponent is "biased" to store it as an unsigned number.

This bias is $2^{e-1} - 1$ (where $e$ represents the number of bits of the exponent); it is therefore a constant value once the number of bits $e$ is fixed (**127** for a 32 bits float and **1023** for a 64 bits float).

[source](https://fr.wikipedia.org/wiki/IEEE_754)
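
The bias formula is easy to verify. In this short sketch, `exponent_bias` is a name chosen for the illustration; it computes the bias for both formats and applies it to the stored exponent of the example.

```python
def exponent_bias(exponent_bits):
    """Bias for an exponent field of `exponent_bits` bits: 2**(e-1) - 1."""
    return 2 ** (exponent_bits - 1) - 1

print(exponent_bias(8))    # 127  (float32)
print(exponent_bias(11))   # 1023 (float64)

stored_exponent = 0b01111100                  # 124, as stored in the example
actual_exponent = stored_exponent - exponent_bias(8)
print(actual_exponent)                        # -3
```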

#### The Mantissa

The mantissa (also known as the significand) in the IEEE 754 floating-point representation is the part of the number that represents the significant digits.


| Binary Bit Position | Bit Value (2^n)  |   Decimal Contribution   |
|---------------------|------------------|--------------------------|
| 1                   | $2^{-1}$         | 0.5                      |
| 2                   | $2^{-2}$         | 0.25                     |
| 3                   | $2^{-3}$         | 0.125                    |
| 4                   | $2^{-4}$         | 0.0625                   |
| 5                   | $2^{-5}$         | 0.03125                  |
| 6                   | $2^{-6}$         | 0.015625                 |
| 7                   | $2^{-7}$         | 0.0078125                |
| 8                   | $2^{-8}$         | 0.00390625               |
| 9                   | $2^{-9}$         | 0.001953125              |
| 10                  | $2^{-10}$        | 0.0009765625             |
| 11                  | $2^{-11}$        | 0.00048828125            |
| 12                  | $2^{-12}$        | 0.000244140625           |
| 13                  | $2^{-13}$        | 0.0001220703125          |
| 14                  | $2^{-14}$        | 0.00006103515625         |
| 15                  | $2^{-15}$        | 0.000030517578125        |
| 16                  | $2^{-16}$        | 0.0000152587890625       |
| 17                  | $2^{-17}$        | 0.00000762939453125      |
| 18                  | $2^{-18}$        | 0.000003814697265625     |
| 19                  | $2^{-19}$        | 0.0000019073486328125    |
| 20                  | $2^{-20}$        | 0.00000095367431640625   |
| 21                  | $2^{-21}$        | 0.000000476837158203125  |
| 22                  | $2^{-22}$        | 0.0000002384185791015625 |
| 23                  | $2^{-23}$        | 0.00000011920928955078125|
   
**Total Decimal Value:** $1 - 2^{-23} \approx 0.99999988$

Example: `10100000000000000000000` = $2^{-1} + 2^{-3}$ = 0.625

##### Normalized and denormalized Mantissa

Once we've calculated the 23 bits (for a float32), we need to know whether the number is normalized or denormalized. Denormalized numbers are used to better describe very small numbers.

If the exponent field is neither zero nor all ones ($2^e - 1$, which is reserved for infinities and NaNs), the mantissa is said to be "normalized", meaning there's an implied "1" before the mantissa:

- Ex: if the mantissa is **01000000000000000000000** (with an implied leading 1), it reads 1.01 in binary, which is 1.25 in decimal.

If the exponent field is equal to zero, the mantissa is said to be "denormalized", meaning there's no implied "1" before the mantissa, and the scale factor is fixed at $2^{1 - \text{bias}}$:

- Ex: if the mantissa is **01000000000000000000000** (with no implied leading 1), it reads 0.01 in binary, which is 0.25 in decimal.
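
The rules above can be combined into a hand-written decoder. This is a sketch for finite float32 values only; `decode_float32` is a hypothetical helper, and the result is cross-checked against the `struct` module from the standard library.

```python
import struct

def decode_float32(bits):
    """Decode a 32-character bit string by hand (finite values only)."""
    if len(bits) != 32:
        raise ValueError("expected exactly 32 bits")
    sign = -1.0 if bits[0] == "1" else 1.0
    exponent_field = int(bits[1:9], 2)     # 8 exponent bits
    fraction = int(bits[9:], 2) / 2 ** 23  # 23 mantissa bits as a fraction
    bias = 127
    if exponent_field == 255:
        raise ValueError("infinity or NaN: not handled in this sketch")
    if exponent_field == 0:
        # Denormalized: no implied leading 1, exponent fixed at 1 - bias
        return sign * fraction * 2.0 ** (1 - bias)
    # Normalized: implied leading 1
    return sign * (1.0 + fraction) * 2.0 ** (exponent_field - bias)

bits = "00111110001000000000000000000000"
print(decode_float32(bits))  # 0.15625

# Cross-check against the standard library's own decoding
raw = int(bits, 2).to_bytes(4, byteorder="big")
print(struct.unpack(">f", raw)[0])  # 0.15625
```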

#### Conclusion about Float

If we take a look at the same float number as before:

<img src='files/float32_binary_example.png'/>

$ \text{value} = \text{sign} \times \text{mantissa} \times 2^{(\text{exponent} - \text{bias})}$

- Here the sign bit is 0, so it's a positive number.
- The exponent is $0b01111100$, so 124.
- The bias is 127 because it's a 32-bit float.
- The mantissa is equal to $0b01000000000000000000000$. As the exponent is neither 0 nor $2^e - 1$ (255), it's a normalized number and the first "1" is implied, meaning that the mantissa is equal to $1.01$ in binary, so $1.25$ in decimal.

In [None]:
# Let's make sure we did the maths right using the decimal notation!
1 * 1.25 * 2**(124 - 127) # 1.25 * 2**-3

In [None]:
# Let's make sure we did the maths right using the binary notation!
import numpy as np

binary_string = '00111110001000000000000000000000'

def binary_to_float32(binary_string):
    # Split the 32 bits into 4 octets and convert each octet to an integer
    bytes_list = [int(binary_string[i:i+8], 2) for i in range(0, 32, 8)]
    bytes_object = bytes(bytes_list)

    # Interpret the bytes object as a float32 in big-endian order ('>f4')
    return np.frombuffer(bytes_object, dtype='>f4')[0]

binary_to_float32(binary_string)

### Float Exceptions

The IEEE 754 standard also defines representations for the following special values (including NaN, "Not a Number").


#### IEEE 754 Floating-Point Representation

| Type                  | Biased Exponent | Mantissa       |
|-----------------------|-----------------|----------------|
| Zeros                 | 0               | 0              |
| Denormalized Numbers  | 0               | Non-zero       |
| Normalized Numbers    | 1 to $2^e - 2$  | Any            |
| Infinities            | $2^e - 1$       | 0              |
| NaNs                  | $2^e - 1$       | Non-zero       |
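
The table translates into a small classifier. The sketch below packs a Python float into 32 bits with `struct` and inspects the exponent and mantissa fields; `classify_float32` is a hypothetical name, and values too large for a float32 are outside the scope of this example.

```python
import struct

def classify_float32(value):
    """Classify a number by the exponent and mantissa fields of its float32 encoding."""
    raw = struct.unpack(">I", struct.pack(">f", value))[0]
    exponent_field = (raw >> 23) & 0xFF  # 8 bits after the sign bit
    mantissa = raw & 0x7FFFFF            # lowest 23 bits
    if exponent_field == 0:
        return "zero" if mantissa == 0 else "denormalized"
    if exponent_field == 0xFF:
        return "infinity" if mantissa == 0 else "nan"
    return "normalized"

for sample in (0.0, -0.0, 1e-40, 1.0, float("inf"), float("nan")):
    print(sample, "->", classify_float32(sample))
```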

#### Exact Representations (without the Sign Bit)


| Type                                          | Exponent  | Mantissa                     | Approximate Value       | Precision / Error   |
|-----------------------------------------------|-----------|------------------------------|-------------------------|---------------------|
| Zero                                          | 0000 0000 | 000 0000 0000 0000 0000 0000 | 0.0                     |                     |
| Smallest Denormalized Number                  | 0000 0000 | 000 0000 0000 0000 0000 0001 | 1.4 × 10⁻⁴⁵             | 1.4 × 10⁻⁴⁵         |
| Next Denormalized Number                      | 0000 0000 | 000 0000 0000 0000 0000 0010 | 2.8 × 10⁻⁴⁵             | 1.4 × 10⁻⁴⁵         |
| Next Denormalized Number                      | 0000 0000 | 000 0000 0000 0000 0000 0011 | 4.2 × 10⁻⁴⁵             | 1.4 × 10⁻⁴⁵         |
| Another Denormalized Number                   | 0000 0000 | 100 0000 0000 0000 0000 0000 | 5.9 × 10⁻³⁹             |                     |
| Largest Denormalized Number                   | 0000 0000 | 111 1111 1111 1111 1111 1111 | 1.175 494 21 × 10⁻³⁸    |                     |
| Smallest Normalized Number                    | 0000 0001 | 000 0000 0000 0000 0000 0000 | 1.175 494 35 × 10⁻³⁸    | 1.4 × 10⁻⁴⁵         |
| Next Normalized Number                        | 0000 0001 | 000 0000 0000 0000 0000 0001 | 1.175 494 49 × 10⁻³⁸    | 1.4 × 10⁻⁴⁵         |
| Almost Double                                 | 0000 0001 | 111 1111 1111 1111 1111 1111 | 2.350 988 56 × 10⁻³⁸    | 1.4 × 10⁻⁴⁵         |
| Next Normalized Number                        | 0000 0010 | 000 0000 0000 0000 0000 0000 | 2.350 988 70 × 10⁻³⁸    | 1.4 × 10⁻⁴⁵         |
| Next Normalized Number                        | 0000 0010 | 000 0000 0000 0000 0000 0001 | 2.350 988 98 × 10⁻³⁸    | 2.8 × 10⁻⁴⁵         |
| Almost 1                                      | 0111 1110 | 111 1111 1111 1111 1111 1111 | 0.999 999 94            | 0.6 × 10⁻⁷ = 2⁻²⁴   |
| 1                                             | 0111 1111 | 000 0000 0000 0000 0000 0000 | 1.000 000 00            |                     |
| Next Number after 1                           | 0111 1111 | 000 0000 0000 0000 0000 0001 | 1.000 000 12            | 1.2 × 10⁻⁷ = 2⁻²³   |
| Almost Largest Number                         | 1111 1110 | 111 1111 1111 1111 1111 1110 | 3.402 823 26 × 10³⁸     |                     |
| Largest Normalized Number                     | 1111 1110 | 111 1111 1111 1111 1111 1111 | 3.402 823 46 × 10³⁸     | 2 × 10³¹            |
| Infinity                                      | 1111 1111 | 000 0000 0000 0000 0000 0000 | Infinity                |                     |
| First Value (Denormalized) of NaN (signaling) | 1111 1111 | 000 0000 0000 0000 0000 0001 | NaN                     |                     |
| Normalized NaN (signaling)                    | 1111 1111 | 010 0000 0000 0000 0000 0000 | NaN                     |                     |
| Last Value (Denormalized) of NaN (signaling)  | 1111 1111 | 011 1111 1111 1111 1111 1111 | NaN                     |                     |
| First Value (Denormalized) of NaN (quiet)     | 1111 1111 | 100 0000 0000 0000 0000 0000 | NaN                     |                     |
| Last Value (Denormalized) of NaN (quiet)      | 1111 1111 | 111 1111 1111 1111 1111 1111 | NaN                     |                     |

#### A Few Things to consider

- There are two zeros: +0 and −0 (positive zero and negative zero), depending on the value of the sign bit;
- There are two infinities: +∞ and −∞, depending on the value of the sign bit;
- Zeros and denormalized numbers have a biased exponent of -127 + 127 = 0; all bits of the "exponent" field are therefore 0;
- NaNs and infinities have a biased exponent of 128 + 127 = 255; all bits of the "exponent" field are therefore 1;
- NaNs can have a sign and a significand, but these have no meaning as real values (except for signaling, which can trigger an exception, and error correction).
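
A few of these properties have visible consequences in everyday Python. The sketch below uses only the standard `math` module; the variable names are ours.

```python
import math

nan = float("nan")
nan_equals_itself = (nan == nan)
print(nan_equals_itself)        # False: NaN compares unequal to everything
print(math.isnan(nan))          # True: the reliable way to test for NaN

zeros_are_equal = (0.0 == -0.0)
print(zeros_are_equal)          # True: both zeros compare equal...
sign_of_negative_zero = math.copysign(1.0, -0.0)
print(sign_of_negative_zero)    # -1.0: ...but the sign bit is still there

print(math.isinf(float("-inf")))  # True
```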

In [None]:
# That's why in Python you can use infinity
float('inf')

In [None]:
# Or minus infinity
float('-inf')

In [None]:
# Positive zero
float(0.0)

In [None]:
# Negative zero
float(-0.0)

In [None]:
# Positive nan
float('nan')

In [None]:
# Negative nan
float('-nan')

In [None]:
# Let's make sure everything is right!
import struct

def float_to_binary(f):
    # Pack the float number into binary format
    packed_data = struct.pack('>f', f)
    # Unpack the binary data into an integer representation
    int_representation = struct.unpack('>I', packed_data)[0]
    # Convert the integer representation to a binary string
    binary_representation = bin(int_representation)[2:].zfill(32)
    return binary_representation

print(f"Binary representation of float('inf'):   {float_to_binary(float('inf'))}")
print(f"Binary representation of float('-inf'):  {float_to_binary(float('-inf'))}")
print(f"Binary representation of float('0.0'):   {float_to_binary(float('0.0'))}")
print(f"Binary representation of float('-0.0'):  {float_to_binary(float('-0.0'))}")
print(f"Binary representation of float('nan'):   {float_to_binary(float('nan'))}")
print(f"Binary representation of float('-nan'):  {float_to_binary(float('-nan'))}")

## Character (string)

### Character Encodings

For a computer, each character has a numeric value. The process of mapping characters to binary (or hexadecimal) values is called "character encoding".

#### The original ASCII

The first standard to stand out was ASCII (the American Standard Code for Information Interchange), published in 1963. It is a table that ranges from 0 to 127 (7 bits).
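
Python's built-in `ord` and `chr` expose this mapping directly, which gives a quick way to check a few entries of the ASCII table.

```python
for character in "Az!":
    code = ord(character)
    print(f"{character!r}: decimal={code}, hex=0x{code:02X}, binary={code:07b}")

encoded = "Hello".encode("ascii")  # one byte per character
print(list(encoded))               # [72, 101, 108, 108, 111]
print(chr(0x41))                   # A
```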

#### Extended ASCII

When 8-bit bytes became the standard, Extended ASCII was created in 1981, using the eighth bit to add 128 new characters (codes 128 to 255).
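
"Extended ASCII" in practice covers several 8-bit code pages that agree on codes 0 to 127 but can differ above 127. The sketch below assumes two of them as examples, IBM code pages 437 and 850 (available in Python as the `cp437` and `cp850` codecs), and shows the same byte decoding to different characters.

```python
# 'é' sits at code 130 (0x82) in both code pages
e_acute_437 = "é".encode("cp437")
e_acute_850 = "é".encode("cp850")
print(e_acute_437, e_acute_850)  # b'\x82' b'\x82'

# The same byte above 127 can map to different characters
byte = bytes([0x9B])
print(byte.decode("cp437"))      # ¢
print(byte.decode("cp850"))      # ø

# Codes 0 to 127 are plain ASCII in both
print(bytes([0x41]).decode("cp437"), bytes([0x41]).decode("cp850"))  # A A
```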


#### The original ASCII Table (from 0 to 127, left) and the Extended ASCII table (from 128 to 255, right).


| Decimal | Hexadecimal | Character | Description                  | Decimal | Hexadecimal | Character | Description                  |
|---------|-------------|-----------|------------------------------|---------|-------------|-----------|------------------------------|
| 0       | 0x00        | NUL       | Null character               | 128     | 0x80        | Ç         | C cedilla                    |
| 1       | 0x01        | SOH       | Start of Heading             | 129     | 0x81        | ü         | u with diaeresis             |
| 2       | 0x02        | STX       | Start of Text                | 130     | 0x82        | é         | e with acute accent          |
| 3       | 0x03        | ETX       | End of Text                  | 131     | 0x83        | â         | a with circumflex            |
| 4       | 0x04        | EOT       | End of Transmission          | 132     | 0x84        | ä         | a with diaeresis             |
| 5       | 0x05        | ENQ       | Enquiry                      | 133     | 0x85        | à         | a with grave accent          |
| 6       | 0x06        | ACK       | Acknowledgment               | 134     | 0x86        | å         | a with ring above            |
| 7       | 0x07        | BEL       | Bell                         | 135     | 0x87        | ç         | c with cedilla               |
| 8       | 0x08        | BS        | Backspace                    | 136     | 0x88        | ê         | e with circumflex            |
| 9       | 0x09        | HT        | Horizontal Tab               | 137     | 0x89        | ë         | e with diaeresis             |
| 10      | 0x0A        | LF        | Line Feed                    | 138     | 0x8A        | è         | e with grave accent          |
| 11      | 0x0B        | VT        | Vertical Tab                 | 139     | 0x8B        | ï         | i with diaeresis             |
| 12      | 0x0C        | FF        | Form Feed                    | 140     | 0x8C        | î         | i with circumflex            |
| 13      | 0x0D        | CR        | Carriage Return              | 141     | 0x8D        | ì         | i with grave accent          |
| 14      | 0x0E        | SO        | Shift Out                    | 142     | 0x8E        | Ä         | A with diaeresis             |
| 15      | 0x0F        | SI        | Shift In                     | 143     | 0x8F        | Å         | A with ring above            |
| 16      | 0x10        | DLE       | Data Link Escape             | 144     | 0x90        | É         | E with acute accent          |
| 17      | 0x11        | DC1       | Device Control 1            | 145     | 0x91        | æ         | ae ligature                  |
| 18      | 0x12        | DC2       | Device Control 2            | 146     | 0x92        | Æ         | AE ligature                  |
| 19      | 0x13        | DC3       | Device Control 3            | 147     | 0x93        | ô         | o with circumflex            |
| 20      | 0x14        | DC4       | Device Control 4            | 148     | 0x94        | ö         | o with diaeresis             |
| 21      | 0x15        | NAK       | Negative Acknowledgment      | 149     | 0x95        | ò         | o with grave accent          |
| 22      | 0x16        | SYN       | Synchronous Idle             | 150     | 0x96        | û         | u with circumflex            |
| 23      | 0x17        | ETB       | End of Transmission Block    | 151     | 0x97        | ù         | u with grave accent          |
| 24      | 0x18        | CAN       | Cancel                       | 152     | 0x98        | ÿ         | y with diaeresis             |
| 25      | 0x19        | EM        | End of Medium                | 153     | 0x99        | Ö         | O with diaeresis             |
| 26      | 0x1A        | SUB       | Substitute                   | 154     | 0x9A        | Ü         | U with diaeresis             |
| 27      | 0x1B        | ESC       | Escape                       | 155     | 0x9B        | ø         | o with slash                 |
| 28      | 0x1C        | FS        | File Separator               | 156     | 0x9C        | £         | Pound sign                   |
| 29      | 0x1D        | GS        | Group Separator              | 157     | 0x9D        | Ø         | O with slash                 |
| 30      | 0x1E        | RS        | Record Separator             | 158     | 0x9E        | ×         | Multiplication sign         |
| 31      | 0x1F        | US        | Unit Separator               | 159     | 0x9F        | ƒ         | Florin                       |
| 32      | 0x20        | SP        | Space                        | 160     | 0xA0        | á         | a with acute accent          |
| 33      | 0x21        | !         | Exclamation mark             | 161     | 0xA1        | í         | i with acute accent          |
| 34      | 0x22        | "         | Double quote                | 162     | 0xA2        | ó         | o with acute accent          |
| 35      | 0x23        | #         | Number sign                  | 163     | 0xA3        | ú         | u with acute accent          |
| 36      | 0x24        | $         | Dollar sign                  | 164     | 0xA4        | ñ         | n with tilde                 |
| 37      | 0x25        | %         | Percent sign                 | 165     | 0xA5        | Ñ         | N with tilde                 |
| 38      | 0x26        | &         | Ampersand                    | 166     | 0xA6        | ª         | Feminine ordinal indicator  |
| 39      | 0x27        | '         | Single quote                | 167     | 0xA7        | º         | Masculine ordinal indicator |
| 40      | 0x28        | (         | Left parenthesis            | 168     | 0xA8        | ¿         | Inverted question mark       |
| 41      | 0x29        | )         | Right parenthesis           | 169     | 0xA9        | ®         | Registered trademark         |
| 42      | 0x2A        | *         | Asterisk                     | 170     | 0xAA        | ¬         | Not sign                     |
| 43      | 0x2B        | +         | Plus sign                    | 171     | 0xAB        | ½         | One half                     |
| 44      | 0x2C        | ,         | Comma                        | 172     | 0xAC        | ¼         | One quarter                  |
| 45      | 0x2D        | -         | Hyphen                       | 173     | 0xAD        | ¡         | Inverted exclamation mark    |
| 46      | 0x2E        | .         | Period                       | 174     | 0xAE        | «         | Left-pointing double angle quotation mark |
| 47      | 0x2F        | /         | Slash                       | 175     | 0xAF        | »         | Right-pointing double angle quotation mark |
| 48      | 0x30        | 0         | Digit 0                      | 176     | 0xB0        | ░         | Light shade                  |
| 49      | 0x31        | 1         | Digit 1                      | 177     | 0xB1        | ▒         | Medium shade                 |
| 50      | 0x32        | 2         | Digit 2                      | 178     | 0xB2        | ▓         | Dark shade                   |
| 51      | 0x33        | 3         | Digit 3                      | 179     | 0xB3        | │         | Box drawings light vertical  |
| 52      | 0x34        | 4         | Digit 4                      | 180     | 0xB4        | ┤         | Box drawings light vertical and left |
| 53      | 0x35        | 5         | Digit 5                      | 181     | 0xB5        | Á         | A with acute accent          |
| 54      | 0x36        | 6         | Digit 6                      | 182     | 0xB6        | Â         | A with circumflex            |
| 55      | 0x37        | 7         | Digit 7                      | 183     | 0xB7        | Ã         | A with tilde                 |
| 56      | 0x38        | 8         | Digit 8                      | 184     | 0xB8        | Ä         | A with diaeresis             |
| 57      | 0x39        | 9         | Digit 9                      | 185     | 0xB9        | Å         | A with ring above            |
| 58      | 0x3A        | :         | Colon                       | 186     | 0xBA        | Æ         | AE ligature                  |
| 59      | 0x3B        | ;         | Semicolon                   | 187     | 0xBB        | Ç         | C with cedilla               |
| 60      | 0x3C        | <         | Less than                    | 188     | 0xBC        | Ê         | E with circumflex            |
| 61      | 0x3D        | =         | Equals sign                  | 189     | 0xBD        | Ë         | E with diaeresis             |
| 62      | 0x3E        | >         | Greater than                 | 190     | 0xBE        | Ì         | I with grave accent          |
| 63      | 0x3F        | ?         | Question mark                | 191     | 0xBF        | Í         | I with acute accent          |
| 64      | 0x40        | @         | At sign                     | 192     | 0xC0        | Î         | I with circumflex            |
| 65      | 0x41        | A         | Uppercase A                  | 193     | 0xC1        | Ï         | I with diaeresis             |
| 66      | 0x42        | B         | Uppercase B                  | 194     | 0xC2        | Ð         | Eth                          |
| 67      | 0x43        | C         | Uppercase C                  | 195     | 0xC3        | Ñ         | N with tilde                 |
| 68      | 0x44        | D         | Uppercase D                  | 196     | 0xC4        | Ò         | O with grave accent          |
| 69      | 0x45        | E         | Uppercase E                  | 197     | 0xC5        | Ó         | O with acute accent          |
| 70      | 0x46        | F         | Uppercase F                  | 198     | 0xC6        | Ô         | O with circumflex            |
| 71      | 0x47        | G         | Uppercase G                  | 199     | 0xC7        | Õ         | O with tilde                 |
| 72      | 0x48        | H         | Uppercase H                  | 200     | 0xC8        | Ö         | O with diaeresis             |
| 73      | 0x49        | I         | Uppercase I                  | 201     | 0xC9        | ×         | Multiplication sign         |
| 74      | 0x4A        | J         | Uppercase J                  | 202     | 0xCA        | Ø         | O with slash                 |
| 75      | 0x4B        | K         | Uppercase K                  | 203     | 0xCB        | Ù         | U with grave accent          |
| 76      | 0x4C        | L         | Uppercase L                  | 204     | 0xCC        | Ú         | U with acute accent          |
| 77      | 0x4D        | M         | Uppercase M                  | 205     | 0xCD        | Û         | U with circumflex            |
| 78      | 0x4E        | N         | Uppercase N                  | 206     | 0xCE        | Ü         | U with diaeresis             |
| 79      | 0x4F        | O         | Uppercase O                  | 207     | 0xCF        | Ý         | Y with acute accent          |
| 80      | 0x50        | P         | Uppercase P                  | 208     | 0xD0        | Þ         | Thorn                        |
| 81      | 0x51        | Q         | Uppercase Q                  | 209     | 0xD1        | ß         | Sharp s                      |
| 82      | 0x52        | R         | Uppercase R                  | 210     | 0xD2        | á         | a with acute accent          |
| 83      | 0x53        | S         | Uppercase S                  | 211     | 0xD3        | â         | a with circumflex            |
| 84      | 0x54        | T         | Uppercase T                  | 212     | 0xD4        | ä         | a with diaeresis             |
| 85      | 0x55        | U         | Uppercase U                  | 213     | 0xD5        | å         | a with ring above            |
| 86      | 0x56        | V         | Uppercase V                  | 214     | 0xD6        | æ         | ae ligature                  |
| 87      | 0x57        | W         | Uppercase W                  | 215     | 0xD7        | ç         | c with cedilla               |
| 88      | 0x58        | X         | Uppercase X                  | 216     | 0xD8        | è         | e with grave accent          |
| 89      | 0x59        | Y         | Uppercase Y                  | 217     | 0xD9        | é         | e with acute accent          |
| 90      | 0x5A        | Z         | Uppercase Z                  | 218     | 0xDA        | ê         | e with circumflex            |
| 91      | 0x5B        | [         | Left square bracket         | 219     | 0xDB        | ë         | e with diaeresis             |
| 92      | 0x5C        | \         | Backslash                   | 220     | 0xDC        | ì         | i with grave accent          |
| 93      | 0x5D        | ]         | Right square bracket        | 221     | 0xDD        | í         | i with acute accent          |
| 94      | 0x5E        | ^         | Caret                        | 222     | 0xDE        | î         | i with circumflex            |
| 95      | 0x5F        | _         | Underscore                   | 223     | 0xDF        | ï         | i with diaeresis             |
| 96      | 0x60        | `         | Grave accent                | 224     | 0xE0        | ð         | eth                          |
| 97      | 0x61        | a         | Lowercase a                  | 225     | 0xE1        | ñ         | n with tilde                 |
| 98      | 0x62        | b         | Lowercase b                  | 226     | 0xE2        | ò         | o with grave accent          |
| 99      | 0x63        | c         | Lowercase c                  | 227     | 0xE3        | ó         | o with acute accent          |
| 100     | 0x64        | d         | Lowercase d                  | 228     | 0xE4        | ô         | o with circumflex            |
| 101     | 0x65        | e         | Lowercase e                  | 229     | 0xE5        | õ         | o with tilde                 |
| 102     | 0x66        | f         | Lowercase f                  | 230     | 0xE6        | ö         | o with diaeresis             |
| 103     | 0x67        | g         | Lowercase g                  | 231     | 0xE7        | ÷         | Division sign                |
| 104     | 0x68        | h         | Lowercase h                  | 232     | 0xE8        | ø         | o with slash                 |
| 105     | 0x69        | i         | Lowercase i                  | 233     | 0xE9        | ù         | u with grave accent          |
| 106     | 0x6A        | j         | Lowercase j                  | 234     | 0xEA        | ú         | u with acute accent          |
| 107     | 0x6B        | k         | Lowercase k                  | 235     | 0xEB        | √ª         | u with circumflex            |
| 108     | 0x6C        | l         | Lowercase l                  | 236     | 0xEC        | √º         | u with diaeresis             |
| 109     | 0x6D        | m         | Lowercase m                  | 237     | 0xED        | √Ω         | y with acute accent          |
| 110     | 0x6E        | n         | Lowercase n                  | 238     | 0xEE        | √æ         | thorn                        |
| 111     | 0x6F        | o         | Lowercase o                  | 239     | 0xEF        | √ø         | y with diaeresis             |
| 112     | 0x70        | p         | Lowercase p                  | 240     | 0xF0        | ƒÄ         | A with macron                |
| 113     | 0x71        | q         | Lowercase q                  | 241     | 0xF1        | ƒÅ         | a with macron                |
| 114     | 0x72        | r         | Lowercase r                  | 242     | 0xF2        | ƒÇ         | A with breve                 |
| 115     | 0x73        | s         | Lowercase s                  | 243     | 0xF3        | ƒÉ         | a with breve                 |
| 116     | 0x74        | t         | Lowercase t                  | 244     | 0xF4        | ƒÑ         | A with ogonek                |
| 117     | 0x75        | u         | Lowercase u                  | 245     | 0xF5        | ƒÖ         | a with ogonek                |
| 118     | 0x76        | v         | Lowercase v                  | 246     | 0xF6        | ƒÜ         | C with acute                 |
| 119     | 0x77        | w         | Lowercase w                  | 247     | 0xF7        | ƒá         | c with acute                 |
| 120     | 0x78        | x         | Lowercase x                  | 248     | 0xF8        | ƒà         | C with circumflex            |
| 121     | 0x79        | y         | Lowercase y                  | 249     | 0xF9        | ƒâ         | c with circumflex            |
| 122     | 0x7A        | z         | Lowercase z                  | 250     | 0xFA        | ƒå         | C with caron                 |
| 123     | 0x7B        | {         | Left curly brace            | 251     | 0xFB        | ƒç         | c with caron                 |
| 124     | 0x7C        | \|         | Vertical bar                | 252     | 0xFC        | ƒé         | D with caron                 |
| 125     | 0x7D        | }         | Right curly brace           | 253     | 0xFD        | ƒè         | d with caron                 |
| 126     | 0x7E        | ~         | Tilde                       | 254     | 0xFE        | ƒò         | E with ogonek                |
| 127     | 0x7F        | DEL       | Delete                      | 255     | 0xFF        | ƒô         | e with ogonek                |


### The Unicode Standard

But writing in languages that do not use the Latin alphabet remained complicated. So in 1991 a new standard was created: the [Unicode Standard](https://www.unicode.org/standard/standard.html). As of Unicode version 16.0 (September 2024), it defines 155,063 characters and 3,790 emoji.

### The Character Encoding

There are different ways to encode Unicode. The most common is UTF-8.

#### How does UTF-8 work?

UTF-8 (Unicode Transformation Format – 8-bit) can encode all 1,112,064 valid Unicode code points, using one to four octets per character.

Basically, the leading bits of the first octet tell how many octets the character uses:
- If the first octet starts with `0`, the character is plain ASCII and is encoded exactly as in ASCII, on one octet.
- If it starts with `110`, the character is coded on two octets.
- If it starts with `1110`, it is coded on three octets.
- If it starts with `11110`, it is coded on four octets.
- Every following (continuation) octet starts with `10`.

(The exact explanation [can be found here](https://en.wikipedia.org/wiki/UTF-8#Byte_map) for example.)

In [None]:
unicode_string = "Aé€\U0002B81C"  # characters encoded on 1, 2, 3 and 4 bytes

for char in unicode_string:
    char_encoded = char.encode('utf-8')  # 'utf-8' is also the default encoding
    binary_representation = ' '.join(format(byte, '08b') for byte in char_encoded)
    print(f"{char}: {binary_representation}")
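
Going the other way, the number of octets in a UTF-8 sequence can be read from the leading bits of its first octet: `0` means one octet, `110` two, `1110` three and `11110` four. The sketch below illustrates this; the helper name `utf8_sequence_length` is hypothetical (it is not a standard-library function), and the result is simply compared with Python's own encoder.

```python
def utf8_sequence_length(first_byte: int) -> int:
    """Return how many bytes a UTF-8 sequence has, given its first byte."""
    if first_byte >> 7 == 0b0:        # 0xxxxxxx: plain ASCII
        return 1
    if first_byte >> 5 == 0b110:      # 110xxxxx: two-byte sequence
        return 2
    if first_byte >> 4 == 0b1110:     # 1110xxxx: three-byte sequence
        return 3
    if first_byte >> 3 == 0b11110:    # 11110xxx: four-byte sequence
        return 4
    raise ValueError("not a valid UTF-8 leading byte")


for char in "A\u00e9\u20ac\U0001F600":
    encoded = char.encode("utf-8")
    # The length predicted from the first byte matches the real encoding.
    assert utf8_sequence_length(encoded[0]) == len(encoded)
    print(char, len(encoded), format(encoded[0], "08b"))
```

A byte starting with `10` is a continuation byte, so it can never start a character and the helper rejects it.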

## Boolean data type

- Named after the English mathematician George Boole (1815 - 1864).
- 0 is `False`, and 1 is `True`.
- In principle a single bit would be enough, but a Python `bool` is a full object, so it takes more space in memory.
- Most comparisons return a `bool`.
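
As a small illustration of the last two points, the sketch below checks the type of a comparison result and asks the interpreter how much memory `True` occupies. The exact size depends on the Python implementation and platform, so no particular value is assumed here.

```python
import sys

# A comparison evaluates to a bool object.
result = 3 < 5
print(type(result))

# True is a full Python object, so it takes far more than a single bit.
# The exact number of bytes depends on the interpreter and platform.
size_in_bytes = sys.getsizeof(True)
print(size_in_bytes)
```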


### Boolean Operations

#### With `and`

In [None]:
True and True  # True
True and False  # False
False and True  # False
False and False  # False

#### With `or`

In [None]:
True or True  # True
True or False  # True
False or True  # True
False or False  # False

#### With `not`

In [None]:
not True  # False
not False  # True
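
The three operators can also be summarised as a truth table generated in a loop. This is only a compact restatement of the cells above; the names `truth_table` and `not_table` are chosen for the example.

```python
from itertools import product

# One row per combination of two booleans: (a, b, a and b, a or b).
truth_table = []
for a, b in product([True, False], repeat=2):
    truth_table.append((a, b, a and b, a or b))
    print(a, b, a and b, a or b)

# `not` takes a single operand and flips it.
not_table = {value: not value for value in (True, False)}
print(not_table)
```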


### Boolean Context

#### With `int` or `float`

- What happens if we convert an `int` or a `float` to `bool`?

In [None]:
print(bool(0))
print(bool(1))
print(bool(-1))
print(bool(121212))
print(bool(-121212))
print(bool(0.0001))
print(bool(0.0))

In Python, many values can be evaluated in a boolean context. This means they can be used in conditions where a boolean value is expected.

💡 With `int` or `float`: `0` is considered `False`, and any non-zero number is considered `True`.

In [None]:
if 0: print("Zero ? This line will not print")
if 1: print("One !")

#### With `str`

In [None]:
print(bool(""))
print(bool("Hello"))


💡 With `str`: An empty string `""` is considered `False`, and any non-empty string is considered `True`.

In [None]:
if "": print(' ""  ? This line will not print')
if "whatever": print("Working !")

#### With `list` and `dict`

In [None]:
print(bool([]))
print(bool([1, 2, 3]))
print(bool({}))
print(bool({'a': 1}))

In [None]:
if []: print("Won't print")
if [1]: print("Non empty list. Will print")
if {}: print("Won't print")
if {'a': 1}: print("Non empty dict. Will print")

#### With `None`

In [None]:
bool(None)

💡 With **None**: The `None` value is considered `False`.
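
To see these rules side by side, the following sketch splits a list of sample values into those that count as `True` and those that count as `False` in a boolean context. The sample values are the ones used in the cells above.

```python
values = [0, 1, -1, 0.0, 0.0001, "", "Hello", [], [1, 2, 3], {}, {"a": 1}, None]

# `if value` relies on the boolean context: no explicit bool() call is needed.
truthy = [value for value in values if value]
falsy = [value for value in values if not value]

print(truthy)  # [1, -1, 0.0001, 'Hello', [1, 2, 3], {'a': 1}]
print(falsy)   # [0, 0.0, '', [], {}, None]
```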


#### Other uses of `bool`

1. With **Conditional Statements**:

In [None]:
x = 10
y = 20

print(x < y)
print(y < x)


2. With **Loop Control**:

In [None]:
while True:
    print("This loop will run indefinitely")
    break  # Let's stop it right away


3. **Function Return Values**:


In [None]:
def is_even(number):
    return number % 2 == 0

print(is_even(4))  # True
print(is_even(7))  # False
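
A function that returns a boolean can be combined with the built-ins `any` and `all`, or used as a filter. The sketch below repeats `is_even` so that it runs on its own; the list `numbers` is made up for the example.

```python
def is_even(number):
    return number % 2 == 0

numbers = [4, 7, 10]

has_even = any(is_even(n) for n in numbers)   # at least one even number?
only_even = all(is_even(n) for n in numbers)  # are all numbers even?
evens = [n for n in numbers if is_even(n)]    # keep the even numbers

print(has_even)   # True
print(only_even)  # False, because 7 is odd
print(evens)      # [4, 10]
```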

### Subclass of `int`

In Python, `bool` is a subclass of `int` (https://docs.python.org/3/c-api/bool.html), so Booleans can be used just like integers: `True` behaves as `1` and `False` as `0`.

In [None]:
3 + True

In [None]:
3 + False

In [None]:
3 * True

In [None]:
3 * False

In [None]:
3 * [] # returns an empty list

In [None]:
3 * bool([]) # returns zero

In [None]:
3 * bool(["whatever"]) # returns 3
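
One practical consequence is that booleans can be summed to count how many items satisfy a condition. A minimal sketch, with a made-up list of temperatures:

```python
print(issubclass(bool, int))   # True
print(isinstance(True, int))   # True
print(True == 1, False == 0)   # True True

temperatures = [12, 25, 31, 18, 27]

# Each comparison yields True (1) or False (0), so sum() counts the matches.
hot_days = sum(t > 20 for t in temperatures)
print(hot_days)  # 3
```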

## Non-Primitive Data Structures

### Linear Data Structures

#### Array

An array has a fixed size and stores its elements in contiguous memory, which gives fast access to any element by its index.

<img src="files/array.webp" source="https://www.geeksforgeeks.org/">
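
As a rough sketch of why contiguous storage makes index access fast, the example below uses the standard-library `array` module. Note that, unlike a classic fixed-size array, a Python `array` can still grow; it is used here only because it stores its elements side by side in one memory block.

```python
from array import array

numbers = array("i", [10, 20, 30, 40])  # "i" means signed C int

base_address, length = numbers.buffer_info()
item_size = numbers.itemsize  # bytes per element (platform dependent)

# Because elements sit side by side, element i lives at base + i * item_size,
# so reaching it takes one multiplication and one addition, whatever i is.
index = 2
address_of_element = base_address + index * item_size

print(numbers[index])                     # 30
print(address_of_element - base_address)  # offset of the element in bytes
```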

#### Stack
A stack is LIFO (last in, first out): elements are added (push) and removed (pop) at the same end. It is useful for function calls and undo mechanisms.

<img src="files/stack.png" source="https://www.geeksforgeeks.org/">
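
In Python, a plain `list` is commonly used as a stack: `append` pushes and `pop` removes the most recently added element. A minimal sketch of an undo history (the action names are invented):

```python
stack = []
stack.append("open file")    # push
stack.append("type text")    # push
stack.append("delete word")  # push

last_action = stack.pop()    # pop: the last element pushed comes out first
print(last_action)  # delete word
print(stack)        # ['open file', 'type text']
```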


#### Queue

A queue is FIFO (first in, first out): elements are added at the back (enqueue) and removed from the front (dequeue). It is useful for scheduling and buffering.

<img src="files/queue.png" source="https://www.geeksforgeeks.org/">
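
For a queue, `collections.deque` from the standard library is a usual choice, because removing from the front of a `list` requires shifting every remaining element. A minimal sketch with invented job names:

```python
from collections import deque

queue = deque()
queue.append("job 1")   # enqueue at the back
queue.append("job 2")
queue.append("job 3")

first = queue.popleft()  # dequeue from the front: the oldest element leaves first
print(first)        # job 1
print(list(queue))  # ['job 2', 'job 3']
```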


#### Linked List
A linked list has a dynamic size and uses non-contiguous memory: each element contains a reference to the next element.

<img src="files/linked-list.webp" source="https://www.geeksforgeeks.org/">
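
Python has no built-in singly linked list, so the sketch below defines a small, hypothetical `Node` class to show the idea: each node holds a value and a reference to the next node.

```python
class Node:
    def __init__(self, value, next_node=None):
        self.value = value
        self.next_node = next_node  # reference to the next element, or None


# Build 1 -> 2 -> 3 by linking nodes together.
head = Node(1, Node(2, Node(3)))

# Insert 0 at the front: only one reference changes, nothing is shifted.
head = Node(0, head)

# Walk the list by following the references until the end.
values = []
current = head
while current is not None:
    values.append(current.value)
    current = current.next_node

print(values)  # [0, 1, 2, 3]
```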

## Non-Linear Data Structures

### Tree

A tree is a hierarchical structure with a root node and child nodes, useful for representing hierarchical data.

<img src="files/tree.webp" source="https://www.geeksforgeeks.org/">
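
A tree can be sketched with a node class that keeps a list of children. The `TreeNode` class, the `walk` helper and the folder names below are all assumptions made for the example; the traversal shown is depth-first, visiting a parent before its children.

```python
class TreeNode:
    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []


def walk(node, depth=0):
    """Yield (depth, name) pairs, parent first, then each child subtree."""
    yield depth, node.name
    for child in node.children:
        yield from walk(child, depth + 1)


root = TreeNode("/", [
    TreeNode("home", [TreeNode("alice"), TreeNode("bob")]),
    TreeNode("etc"),
])

for depth, name in walk(root):
    print("  " * depth + name)
```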

### Graph

A graph is a collection of nodes connected by edges, useful for representing networks and relationships.

<img src="files/graph.webp" source="https://www.geeksforgeeks.org/">
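
One common way to represent a graph in Python is an adjacency dictionary that maps each node to the list of its neighbours. The sketch below uses a made-up undirected graph and a hypothetical `bfs_order` helper that visits nodes breadth-first, starting from a given node.

```python
from collections import deque

# Each key is a node; its value lists the nodes it shares an edge with.
graph = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}


def bfs_order(graph, start):
    """Return the nodes reachable from start, in breadth-first order."""
    visited = [start]
    to_visit = deque([start])
    while to_visit:
        node = to_visit.popleft()
        for neighbour in graph[node]:
            if neighbour not in visited:
                visited.append(neighbour)
                to_visit.append(neighbour)
    return visited


print(bfs_order(graph, "A"))  # ['A', 'B', 'C', 'D', 'E']
```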

### Hash Table

A hash table maps keys to values using a hash function, providing fast insertion, deletion, and lookup operations.

<img src="files/hash-table.webp" source="https://www.geeksforgeeks.org/">
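
Python's built-in `dict` is already a hash table. To show the mechanism itself, here is a minimal sketch of a hash table that resolves collisions by chaining (each bucket is a list of key/value pairs). The `HashTable` class and its method names are hypothetical, and the sketch never resizes, which a real implementation would do.

```python
class HashTable:
    def __init__(self, bucket_count=8):
        self.buckets = [[] for _ in range(bucket_count)]

    def _bucket(self, key):
        # The hash function turns the key into a bucket index.
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # key already present: overwrite
                return
        bucket.append((key, value))

    def get(self, key):
        for existing_key, value in self._bucket(key):
            if existing_key == key:
                return value
        raise KeyError(key)

    def delete(self, key):
        bucket = self._bucket(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                del bucket[i]
                return
        raise KeyError(key)


table = HashTable()
table.put("apple", 3)
table.put("pear", 5)
table.put("apple", 4)      # same key: the value is replaced
print(table.get("apple"))  # 4
table.delete("pear")
```

Only one bucket is searched per operation, which is what keeps insertion, deletion and lookup fast as long as the buckets stay short.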