# Session 09

## Images and Data Analysis

## Images

For image processing in Python we use [Pillow](https://python-pillow.org), a _fork_ of the Python Imaging Library.

```bash
python -m pip install pillow
```

> Note: In our case we installed it through the `requirements.txt` file.


In [None]:
from PIL import Image

im = Image.open("images/example.jpg")  # placeholder path: the original file name is not shown



In [None]:
print("Format:", im.format)
print("Size:", im.size)
print("Mode:", im.mode)

im.show()
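
To see what these three attributes hold without depending on a file on disk, here is a minimal sketch that builds an image in memory. The name `sample`, the size, and the color are arbitrary choices made for illustration.

```python
from PIL import Image

# Build a small in-memory image; the size and color are arbitrary.
sample = Image.new("RGB", (64, 32), color=(255, 0, 0))

print("Format:", sample.format)  # None, because it was not loaded from a file
print("Size:", sample.size)      # (width, height) in pixels
print("Mode:", sample.mode)      # "RGB" means three 8-bit channels per pixel
```

An image opened from disk reports the file format (for example `JPEG`) instead of `None`.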


Images normally open in their own window, but we can make Jupyter display them "inline" with matplotlib.


In [None]:
from matplotlib.pyplot import imshow

%matplotlib inline

imshow(im)

Each image has its own characteristics. [This is the famous "Lenna" image](https://www.cranbrooktownsman.com/opinion/booknotes-playboy-lena-sderberg-and-the-first-jpeg/).


In [None]:
lenna = Image.open("images/lenna.jpg")

print("Format:", lenna.format)
print("Size:", lenna.size)
print("Mode:", lenna.mode)

imshow(lenna)


We can automate image processing, for example by applying the same operation to every image in a folder.


In [None]:
import os

print(os.listdir("images"))


In [None]:
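for filename in os.listdir("images"):
    # splitext copes with names that have no dot or more than one dot
    extension = os.path.splitext(filename)[1].lower()
    if extension in [".jpg", ".png"]:
        print(filename)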
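
The extension check can be factored into a small helper. This is a minimal sketch that assumes `os.path.splitext` is an acceptable way to isolate the extension; `has_image_extension`, `IMAGE_EXTENSIONS`, and every sample file name except `lenna.jpg` are hypothetical names introduced for illustration.

```python
import os

# Extensions we treat as images; extend this set as needed.
IMAGE_EXTENSIONS = {".jpg", ".png"}


def has_image_extension(filename):
    """Return True if the filename ends in one of IMAGE_EXTENSIONS."""
    # splitext returns (name, ".ext"); the extension is "" when there is no dot
    extension = os.path.splitext(filename)[1]
    return extension.lower() in IMAGE_EXTENSIONS


for candidate in ["lenna.jpg", "photo.final.PNG", "README", "notes.txt"]:
    print(candidate, has_image_extension(candidate))
```

Unlike indexing the result of `split(".")`, this does not raise an error for a name without a dot, and it looks only at the last extension of a name with several dots.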


We will "split" the filename to use the name and extension separately.


In [None]:
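os.makedirs("new_images", exist_ok=True)  # make sure the output folder exists

for filename in os.listdir("images"):
    name, extension = os.path.splitext(filename)
    if extension.lower() in [".jpg", ".png"]:
        image = Image.open(os.path.join("images", filename))
        image.thumbnail((128, 128))  # resizes in place, keeping the aspect ratio
        image_rgb = image.convert("RGB")  # JPEG cannot store an alpha channel
        image_rgb.save(os.path.join("new_images", name + "_tmb.jpg"), "JPEG")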
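
Two details of `thumbnail()` are easy to miss: it changes the image in place instead of returning a new one, and it keeps the aspect ratio, so the result fits inside the requested box without necessarily filling it. The sketch below shows both on a blank in-memory image; the name `wide` and the 400x200 size are arbitrary.

```python
from PIL import Image

# A blank 400x200 image stands in for a real photo.
wide = Image.new("RGB", (400, 200))

# thumbnail() modifies the image in place and returns None.
result = wide.thumbnail((128, 128))

print(result)     # None
print(wide.size)  # (128, 64): fits inside 128x128 and keeps the 2:1 ratio
```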


In [None]:
from PIL import ImageDraw, ImageFont

draw = ImageDraw.Draw(lenna)
font = ImageFont.truetype("Sudo-Bold.ttf", 72)

draw.text((250, 350), "Lenna", fill=(0, 0, 0), font=font)

imshow(lenna)


In [None]:
lenna = Image.open("images/lenna.jpg")
draw = ImageDraw.Draw(lenna)
font = ImageFont.truetype("Sudo-Bold.ttf", 72)

caption = "Lenna"

bounding_box = draw.textbbox((0, 0), caption, font=font)
print(bounding_box)


In [None]:
text_width = bounding_box[2] - bounding_box[0] + 10
text_height = bounding_box[3] - bounding_box[1] + 15

image_width, image_height = lenna.size

x_start = (image_width - text_width) / 2
y_start = (image_height - 50) - text_height

print(x_start, y_start, text_width, text_height)
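
The placement arithmetic above can be wrapped in a function so it is reusable for other images and captions. This is only a sketch: `caption_box` and its parameter names are hypothetical, the default padding and margin mirror the constants used in the cell, and the numbers in the usage line are made up.

```python
def caption_box(image_size, text_bbox, pad_x=10, pad_y=15, bottom_margin=50):
    """Return (x_start, y_start, width, height) for a caption box.

    The box is centered horizontally and sits bottom_margin pixels
    above the bottom edge of the image.
    """
    left, top, right, bottom = text_bbox
    box_width = (right - left) + pad_x
    box_height = (bottom - top) + pad_y

    image_width, image_height = image_size
    x_start = (image_width - box_width) / 2
    y_start = (image_height - bottom_margin) - box_height
    return x_start, y_start, box_width, box_height


# Made-up numbers: a 512x512 image and a text bounding box 200 wide, 60 tall.
print(caption_box((512, 512), (0, 10, 200, 70)))  # (151.0, 387, 210, 75)
```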


In [None]:
draw.rectangle(
    (x_start, y_start, x_start + text_width, y_start + text_height),
    fill=(255, 255, 255),
    outline=(0, 0, 255),
)

draw.text(
    (x_start + 5, y_start - 5),
    caption,
    fill=(0, 0, 0),
    font=font,
)

imshow(lenna)


And we can save the new image.


In [None]:
lenna.save("new_images/lenna_captioned.png")
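
When no format is passed to `save()`, Pillow chooses it from the file extension. The sketch below checks this by saving a blank image to a temporary folder and reopening it; the names `picture` and `picture.png` are arbitrary.

```python
import os
import tempfile

from PIL import Image

picture = Image.new("RGB", (32, 32), color=(0, 0, 255))

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "picture.png")
    picture.save(path)  # no format argument: Pillow picks PNG from ".png"

    # Reopen the file to confirm what was written.
    with Image.open(path) as reopened:
        saved_format = reopened.format
        saved_size = reopened.size

print(saved_format, saved_size)  # PNG (32, 32)
```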


For more information about Pillow:

- [Tutorial](https://pillow.readthedocs.io/en/stable/handbook/tutorial.html)
- [Handbook](https://pillow.readthedocs.io/en/stable/handbook/index.html)


## Data Analysis

Data analysis is about **asking** and **answering** questions about your data.

| Characteristic | What it means                                                 |
| -------------- | ------------------------------------------------------------- |
| Accuracy       | Does it match reality? Is it error-free and exact?            |
| Consistency    | Is the data free of contradictions?                           |
| Completeness   | How comprehensive is the information?                         |
| Reliability    | Is it reproducible?                                           |
| Relevance      | Do you need this information?                                 |
| Timeliness     | Can we get it on time to be relevant?                         |
| Auditable      | Do we know where the data comes from? How was it manipulated? |
| Secure         | Do we know who has access to the data? Is PII secure?         |
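
Some of these questions can be turned into quick checks. The sketch below uses plain Python on a handful of toy records invented for this example. It counts missing values per field as a rough completeness check, and looks for repeated ids as a rough consistency check. Real projects would define these checks according to their own data.

```python
from collections import Counter

# Toy records invented for this example; None marks a missing value.
records = [
    {"id": 1, "name": "Ana", "age": 34},
    {"id": 2, "name": "Luis", "age": None},
    {"id": 2, "name": "Luis", "age": 41},
    {"id": 3, "name": None, "age": 29},
]

# Completeness: how many values are missing in each field?
missing = {
    field: sum(1 for row in records if row[field] is None)
    for field in records[0]
}

# Consistency: which ids appear more than once?
id_counts = Counter(row["id"] for row in records)
duplicated_ids = [value for value, count in id_counts.items() if count > 1]

print("Missing values per field:", missing)
print("Duplicated ids:", duplicated_ids)
```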


We have to be careful with bias.

> Data bias is when the source data is skewed, providing results that are not fully representative of what you are researching, and can be either intentionally or unintentionally done.
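
One simple way to look for this kind of skew is to compare how groups are represented in a sample against the population it is supposed to describe. The sketch below is an illustration only: the "urban" and "rural" groups and all counts are made up, and `group_shares` is a hypothetical helper.

```python
from collections import Counter


def group_shares(labels):
    """Return the fraction of labels that belongs to each group."""
    counts = Counter(labels)
    total = len(labels)
    return {group: count / total for group, count in counts.items()}


# Made-up data: the population is 60% urban, the sample is 90% urban.
population = ["urban"] * 60 + ["rural"] * 40
sample = ["urban"] * 18 + ["rural"] * 2

population_shares = group_shares(population)
sample_shares = group_shares(sample)

for group, expected in population_shares.items():
    observed = sample_shares.get(group, 0.0)
    print(f"{group}: population {expected:.0%}, sample {observed:.0%}")
```

A large gap between the two shares suggests that conclusions drawn from the sample may not hold for the whole population.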

