**Note**: Click on "*Kernel*" > "*Restart Kernel and Clear All Outputs*" in [JupyterLab](https://jupyterlab.readthedocs.io/en/stable/) *before* reading this notebook to reset its output. If you cannot run this file on your machine, you may want to open it [in the cloud <img height="12" style="display: inline-block" src="../static/link/to_mb.png">](https://mybinder.org/v2/gh/webartifex/intro-to-data-science/main?urlpath=lab/tree/00_python_in_a_nutshell/07_content_data_types.ipynb).

# Chapter 0: Python in a Nutshell (Part 4)

An important skill for any data scientist is to learn to "think" like a computer does. So far, we have seen that Python is a pretty "intuitive" language: Many concepts can be understood after seeing them once or just a couple of times. Many of the aspects that make other languages harder to learn are "magically" automated by Python in the background, most notably the management of memory.

This section introduces a couple of more "advanced" concepts that presumably are *not* so intuitive to beginners.

## "Simple" Data Types

First, let's review the concept of **object-orientation**, the paradigm by which Python organizes everything it stores in memory.

Take the following three examples. Whereas `a` and `b` have the same **value** (i.e., **semantic meaning**) to us humans, we see in this section that there are a couple of caveats to look out for.

In [1]:
a = 42
b = 42.0
c = 42.87

An important idea to understand is that each of the right-hand sides leads to a *new* **object** being created in the computer's memory *first*. An object can be thought of as a "box" in memory holding $1$s and $0$s (i.e., physical energy flows inside the computer).

Objects can and do exist without being **referenced** by a variable. Also, an object may even have several variables referencing it, just as a human may have different names in different contexts (e.g., a formal name in the passport, a name by which one is known to friends, and maybe a different name by which one is called by one's spouse).

In the example, while both `a` and `b` have the *same* value, they are two *distinct* objects. The `is` operator checks if the objects referenced by two variables are indeed the *same* one, or, in other words, have the same **identity**.

In [2]:
a == b

True

In [3]:
a is b

False
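To make the difference between value and identity more concrete, here is a minimal sketch. It uses `list` objects and the built-in `id()` function, which returns an integer identifying an object. The names `first`, `alias`, and `twin` are illustrative and not part of the notebook.

```python
# Illustrative names; not part of the original notebook.
first = [1, 2, 3]
alias = first      # a second variable referencing the *same* object
twin = [1, 2, 3]   # a *new* object that merely has the same value

same_value = first == twin       # compares the values
same_object = first is twin      # compares the identities
alias_is_first = alias is first  # two names, one object

# `is` boils down to comparing what `id()` returns for both objects.
ids_match = id(alias) == id(first)

print(same_value, same_object, alias_is_first, ids_match)
```

Only `same_object` is `False` here: `first` and `twin` are equal in value but live in two different "boxes" in memory.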

Every object always has some **data type**, which determines how the object behaves and what we can do with it. The types of `a` and `b` are `int` and `float`, respectively.

In [4]:
type(a)

int

In [5]:
type(b)

float

While it seems cumbersome to analyze numbers at this level of detail, the following code cell shows how `float`ing-point numbers, one gold standard of numbers in all of computer science and engineering, behave counter-intuitively. Yet, *nothing* is wrong here.

In [6]:
0.1 + 0.2 == 0.3

False
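The result above comes from the fact that `float` objects store binary approximations of decimal numbers, so tiny rounding errors can make two "equal" numbers differ. One common way to deal with this, shown here as a sketch rather than the only option, is to compare within a tolerance using `math.isclose()` from the standard library. The names `total`, `exactly_equal`, and `close_enough` are illustrative.

```python
import math

total = 0.1 + 0.2

# The sum is stored as the closest binary approximation, not exactly 0.3.
exactly_equal = total == 0.3

# Compare within a small relative tolerance instead of demanding exact equality.
close_enough = math.isclose(total, 0.3)

print(exactly_equal, close_enough)

# Printing more digits than the default reveals the approximation.
print(f"{total:.20f}")
```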

The data type of an object also determines which **methods** we can invoke on it. A method is just a function that is "attached" to an object and can be accessed with the `.` operator seen before. A method necessarily needs the object it is attached to as an input, which is why it is attached to an object to begin with.

For example, `float` objects come with an `.is_integer()` method that tells us whether the number is a whole number (i.e., has *no* non-`0` decimals).

In [7]:
b.is_integer()

True

In [8]:
c.is_integer()

False

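`int` objects, on the contrary, have no notion of the concept of decimals, which is why they did *not* have an `.is_integer()` method in Python versions before 3.12, where the cell below raises an `AttributeError`. Since Python 3.12, `int` objects provide the method as well, and it always returns `True`.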

In [9]:
a.is_integer()  # Note: In Python versions < 3.12 this cell raises an `AttributeError`

True

What we can do regardless of the Python version is to take `a` and pass it to the [float() <img height="12" style="display: inline-block" src="../static/link/to_py.png">](https://docs.python.org/3/library/functions.html#float) built-in, a so-called **constructor**, which takes the value of its input and creates a *new* object of the desired `float` type. Yet, we know the answer to `aa.is_integer()` already, even without executing the code cell, as `a` has no non-`0` decimals to begin with.

In [10]:
aa = float(a)

In [11]:
aa.is_integer()

True
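Other built-in types can be used as constructors in the same way. The following sketch shows a few conversions; the variable names are illustrative, and the important point is that each call creates a *new* object and leaves its input untouched.

```python
a = 42
aa = float(a)            # a new `float` object; `a` itself stays an `int`

whole = int(42.87)       # drops the decimals (truncates towards zero)
text = str(42)           # the text "42", not a number
parsed = float("42.87")  # parses text into a `float`

print(type(a), type(aa))
print(whole, repr(text), parsed)
```

Note that `int()` does not round: the decimals are simply cut off.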

Let's create another example `d` to see further examples of methods.

In [12]:
d = "Python rocks"

The type of `d` is `str`, which is short for "**string**" and is defined in computer science as a sequence of characters.

In [13]:
type(d)

str

`str` objects support various methods that "make sense" in the context of textual data, for example, the `.lower()` and `.upper()` methods.

In [14]:
d.lower()

'python rocks'

In [15]:
d.upper()

'PYTHON ROCKS'
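Note that neither method changes `d` itself: Each call returns a *new* `str` object. The sketch below illustrates this with two more `str` methods, `.replace()` and `.split()`; the names `shouted`, `replaced`, and `words` are illustrative.

```python
d = "Python rocks"

shouted = d.upper()                     # a new `str` object in upper case
replaced = d.replace("rocks", "rules")  # a new `str` with one word swapped
words = d.split()                       # a `list` of the words in `d`

print(shouted)
print(replaced)
print(words)
print(d)  # the original object still holds the original text
```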

## "Complex" Data Types

The examples in the previous section are considered "simple" as they only model *scalar* values (i.e., an individual object per example). However, we have already seen an example of a more "complex" object, namely the `list` called `numbers` from before.

In [16]:
numbers = [1, 2, 3, 4]

In [17]:
type(numbers)

list

In [18]:
numbers

[1, 2, 3, 4]

`list` objects also come with specific methods, for example, the `.append()` method that adds another element at the end of a `list`.

In [19]:
numbers.append(5)

Note how the `.append()` method does not lead to any output below the code cell. That is an indication that `numbers` is "changed in place." The formal term for this property is **mutability**. A good working definition is: Any object whose value can be changed *after* its creation is a **mutable** object. Objects *without* this property are called **immutable**.

An example of an immutable data type is the `tuple`. `tuple`s are simply `list`s with the additional property that they cannot be changed. Everything else is the same as for `list`s. `tuple`s are created with parentheses replacing the brackets.

In [20]:
more_numbers = (7, 8, 9)

`more_numbers` does not know about the `.append()` method.

In [21]:
more_numbers.append(10)

AttributeError: 'tuple' object has no attribute 'append'
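The contrast between the mutable `list` and the immutable `tuple` can be sketched as follows. The names `same_numbers`, `result`, and `longer` are illustrative and not part of the notebook.

```python
numbers = [1, 2, 3, 4]
same_numbers = numbers         # a second name for the same `list` object

result = numbers.append(5)     # changes the list in place and returns `None`

more_numbers = (7, 8, 9)
longer = more_numbers + (10,)  # builds a *new* tuple; the original is untouched

print(result)
print(same_numbers)            # the change is visible through both names
print(more_numbers, longer)
```

Because `.append()` works in place, the change shows up under every name referencing the `list`. With a `tuple`, the only way to "add" an element is to create a new object.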

Whereas both `list` and `tuple` objects preserve the **order** of their elements, the `set` data type does not. Additionally, any object may only be an element of a `set` at most once. The syntax to create `set`s is curly braces, `{` and `}`. By giving up order, `set` objects offer significantly increased processing speed in various situations.

In [22]:
other_numbers = {3, 2, 1, 3, 3, 2}

In [23]:
other_numbers

{1, 2, 3}
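A few typical `set` operations are sketched below. Checking membership with the `in` operator is one of the situations in which `set`s are usually fast. The variable names are illustrative.

```python
other_numbers = {3, 2, 1, 3, 3, 2}

size = len(other_numbers)               # duplicates are counted only once
has_two = 2 in other_numbers            # membership test
order_irrelevant = {1, 2, 3} == {3, 2, 1}

common = other_numbers & {2, 3, 4}      # intersection: elements in both sets
combined = other_numbers | {4}          # union: elements in either set

# Sorting gives a predictable order for printing.
print(size, has_two, order_irrelevant)
print(sorted(common), sorted(combined))
```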

One last example of a "complex" data type is the `dict`ionary type, which models a mapping relationship among the objects it contains. The syntax to create `dict`s also involves curly braces with the addition of using a `:` to specify the mapping relationships.

For example, to map `int`egers to `str`ings modeling the English words corresponding to the numbers, we could write the following. The objects to the left of the `:` take the role of the **keys** while the ones to the right take the role of the **values**.

In [24]:
to_words = {
    0: "zero",
    1: "one",
    2: "two",
}

The main purpose of `dict`s is to **look up** the value mapped to by some key. We can use the indexing notation to achieve that.

In [25]:
to_words[0]

'zero'

Looking up in the opposite direction, from a value to its key, can *not* be done, as the `KeyError` below shows.

In [26]:
to_words["zero"]

KeyError: 'zero'

Instead, we would have to create a `dict` mapping the words to numbers, like the one below.

In [27]:
to_numbers = {
    "zero": 0,
    "one": 1,
    "two": 2,
}

In [28]:
to_numbers["zero"]

0
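Instead of typing out the reversed mapping by hand, it could also be derived from `to_words`. The sketch below assumes that all values are unique; otherwise, some entries would be lost. It also shows two ways to avoid a `KeyError` for missing keys. The names `has_three` and `fallback` are illustrative.

```python
to_words = {
    0: "zero",
    1: "one",
    2: "two",
}

# Swap keys and values; this only works cleanly if the values are unique.
to_numbers = {word: number for number, word in to_words.items()}

# Check for a key before looking it up ...
has_three = 3 in to_words

# ... or ask for a default value instead of risking a `KeyError`.
fallback = to_words.get(3, "unknown")

print(to_numbers)
print(has_three, fallback)
```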

`dict`s are among the most optimized data types in the Python world and a major building block in codebases solving real-life problems.

A big factor in getting good at any programming language is to learn what data types to use in which situations. There is no "best" data type; choosing among a couple of data types always comes down to trade-offs.