<h1 style="text-align: center">
<div style="color: #DD3403; font-size: 60%">Data Science DISCOVERY MicroProject</div>
<span style="">MicroProject: Image Steganography with DataFrames</span>
<div style="font-size: 60%;"><a href="https://discovery.cs.illinois.edu/microproject/image-steganography-with-dataframes/">https://discovery.cs.illinois.edu/microproject/image-steganography-with-dataframes/</a></div>
</h1>

<hr style="color: #DD3403;">

## Overview

**Steganography** describes the technique of hiding data within secondary, usually ordinary, data to avoid detection.  For example, an ordinary PNG image might look like a picture to us -- but, hidden inside of the image, is a special encoding that reveals hidden data that otherwise goes undetected.

In this MicroProject, you will explore steganography by decoding a message secretly hidden in an image just for you.  Let's nerd out!

<hr style="color: #DD3403;">

## Part 0: Using the DISCOVERY Library

Just as in the Mosaic Project you recently completed, we have provided you with the DISCOVERY library, which loads an image from disk and converts it into a DataFrame containing the `x`, `y`, `r`, `g`, and `b` values for every pixel.

The "Block I" image is included in this MicroProject and is just an ordinary Illini "Block I" (open it up and check it out!).  The following code imports the DISCOVERY library and loads the "Block I" image called `i.png`:

In [47]:
import DISCOVERY

df = DISCOVERY.df_image("i.png")
df

Unnamed: 0,x,y,r,g,b
0,0,0,255,255,255
1,0,1,255,255,255
2,0,2,255,255,255
3,0,3,255,255,255
4,0,4,255,255,255
...,...,...,...,...,...
73897,225,322,255,255,255
73898,225,323,255,255,255
73899,225,324,255,255,255
73900,225,325,255,255,255


In [48]:
### TEST CASE for Part 0: Using the DISCOVERY Library
tada = "\N{PARTY POPPER}"

assert("df" in vars())
assert(len(df) == 73902)
print(f"{tada} All Tests Passed! {tada}")

🎉 All Tests Passed! 🎉


<hr style="color: #DD3403;">

## Part 1: Steganography

In the Journal of Vision Research (June 2000), Andrew Stockman and Lindsay T. Sharpe's *["The spectral sensitivities of the middle- and long-wavelength-sensitive cones derived from measurements in observers of known genotype"](https://www.sciencedirect.com/science/article/pii/S0042698900000213)* presents an incredibly detailed study of the sensitivity that the three types of cones have to various wavelengths of light.

Their work quantifies that **the human eye is far more sensitive to green and red light than to blue light**.  Given this finding, this MicroProject will explore the hypothesis that we may be able to hide data by slightly changing the blue components of pixels in an image without those changes being detected visually.


## "Blue Steganography"

Based on the background research and hypothesis above, we will implement a way to detect "Blue Steganography" in an image.  A few basic foundations:

- We never want to change the blue (`b`) value of any pixel by more than 1 (e.g., if the original `b` value is `153`, the final value should be `152`, `153`, or `154`, adding or subtracting at most one).  Larger changes than this would be easier to detect.

- To store information in this way, we will hide our message in whether each `b` value is an **even number** or an **odd number**.  Specifically:

1. We will consider **all columns** but **only the first 27 rows** of any image (`y=0` to `y=26`).
2. **Exactly one** `b` value in the first 27 rows of each column will be an **even value**.   All other `b` values in the first 27 rows will be an odd value.
3. The **row** (y-value) in which the even value was found will represent the letter in our secret, hidden message.
    - The even `b` value in the first row of the image, `y=0`, represents an `A`,
    - The even `b` value in row `y=1` represents a `B`,
    - The even `b` value in row `y=2` represents a `C`,
    - ...and so on, until...
    - The even `b` value in row `y=25` represents a `Z`, and
    - The even `b` value in row `y=26` represents a space (` `).

*(This is only one possible steganography algorithm among many, but it should succeed in being an unnoticeable change to an image -- and it should be simple to process!)*
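
To make the rules above concrete, here is a minimal sketch of how a single column might be *encoded*. The names `ALPHABET` and `encode_column` are hypothetical (they are not part of the DISCOVERY library), and this is only one way to satisfy the rules, not necessarily how the provided images were created.

```python
# Hypothetical helper names; not part of the DISCOVERY library.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "  # row 0 -> "A", ..., row 25 -> "Z", row 26 -> " "

def encode_column(b_values, letter):
    """Return new blue values for rows y=0..26 that hide `letter`."""
    target_row = ALPHABET.index(letter)
    encoded = []
    for y, b in enumerate(b_values[:27]):
        should_be_even = (y == target_row)
        is_even = (b % 2 == 0)
        if should_be_even != is_even:
            # Flip the parity with a change of exactly 1, staying within 0..255.
            b = b - 1 if b > 0 else b + 1
        encoded.append(b)
    return encoded

white_column = [255] * 27                  # a made-up, all-white column
hidden = encode_column(white_column, "C")
print(hidden[:4])                          # [255, 255, 254, 255]
```

Only the pixel in row `y=2` changes (from `255` to `254`), so the column now hides the letter `C`.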

### Finding the First 27 Rows

Using the image `df` loaded earlier, first filter the DataFrame to **only the first 27 rows** (`y=0` to `y=26`) making sure to keep all the columns:

In [49]:
# Filter the DataFrame to only the first 27 rows (y=0 to y=26):
df = df[(df['y'] >= 0) & (df['y'] <= 26)]
df

Unnamed: 0,x,y,r,g,b
0,0,0,255,255,255
1,0,1,255,255,255
2,0,2,255,255,255
3,0,3,255,255,255
4,0,4,255,255,255
...,...,...,...,...,...
73597,225,22,255,255,255
73598,225,23,255,255,255
73599,225,24,255,255,255
73600,225,25,255,255,255
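
As an aside, the same filter can be written in more than one way. The sketch below uses a tiny made-up DataFrame (not the real image) to show that a two-sided comparison and the pandas `Series.between` method select the same rows; `toy` is a hypothetical name.

```python
import pandas as pd

# A made-up 2-column "image" with 30 rows per column (not the real `i.png`).
toy = pd.DataFrame(
    [{"x": x, "y": y, "b": 255} for x in range(2) for y in range(30)]
)

by_comparison = toy[(toy["y"] >= 0) & (toy["y"] <= 26)]
by_between = toy[toy["y"].between(0, 26)]   # inclusive on both ends by default

print(len(by_comparison), len(by_between))  # 54 54
```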


In [50]:
### TEST CASE for Part 1: Finding the First 27 Rows
tada = "\N{PARTY POPPER}"
assert(max(df.y) == 26), "The maximum value for `y` should be 26 (y=0...26) to consider only the first 27 rows."
assert(len(df) == 6102)

# ...and copy the DataFrame to avoid warnings about the slice:
df = df.copy()

print(f"{tada} All Tests Passed! {tada}")

🎉 All Tests Passed! 🎉


### Label the Pixels with an Even `b` Value

Now that you have a DataFrame with only the rows we're interested in, we need to find the one row in each column that has an **even value**.  To detect if a value is even, we can use the "modulo" operation.  The "modulo" operation -- denoted by the percentage operator ($\%$) -- is the integer remainder after doing integer/long division.

For example:
- $7 / 4$ has us divide $7$ by $4$.  The result of integer/long division is "$1$ remainder $3$".  Therefore, $7 \% 4 = 3$ (the remainder).
- Similarly, $43 / 11$ is "$3$ remainder $10$".  Therefore, $43 \% 11 = 10$ (since 10 is the remainder).
- Finally, $11 / 2$ is "$5$ remainder $1$".   Therefore, $11 \% 2 = 1$  (since 1 is the remainder).

When we "modulo" by 2, there's a very special property to determine if we have an even or odd number:
- Even numbers ALWAYS have a remainder of `0`
- Odd numbers ALWAYS have a remainder of `1`
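
The same arithmetic can be checked directly in Python, where `%` is the modulo operator. This is just a small sketch; the helper name `is_even` is made up for illustration.

```python
print(7 % 4)    # 3
print(43 % 11)  # 10
print(11 % 2)   # 1

def is_even(n):
    """True when n leaves no remainder after dividing by 2."""
    return n % 2 == 0

print(is_even(254), is_even(255))  # True False
```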

Create a new column called `mod2` that stores the result of the operation `df.b % 2` to find if the blue value of each pixel is even or odd:

In [51]:
# Create a new column `mod2`:
df["mod2"] = df["b"] % 2
df["mod2"]

0        1
1        1
2        1
3        1
4        1
        ..
73597    1
73598    1
73599    1
73600    1
73601    1
Name: mod2, Length: 6102, dtype: int64

In [52]:
### TEST CASE for Part 1: Label the Pixels with an Even b Value
tada = "\N{PARTY POPPER}"
assert("mod2" in df), "You must have the column `mod2` in your DataFrame"
assert(sum(df.mod2) == 6074), "You should have 6,074 odd blue pixels among your 6,102 pixels.  Check your mod2 logic."
print(f"{tada} All Tests Passed! {tada}")

🎉 All Tests Passed! 🎉


### Filter to Include ONLY Even-Valued Blue Pixels

Our steganography algorithm hides our message in the pixels with **even blue values**.  Using the `mod2` column you just created and your knowledge of the result of the modulo operation, filter your DataFrame to contain only the pixels with an even blue (`b`) value:

In [53]:
# Filter your DataFrame to contain ONLY even-valued blue pixels:
df = df[df['mod2'] == 0]
df

Unnamed: 0,x,y,r,g,b,mod2
24,0,24,255,255,254,0
341,1,14,255,255,254,0
674,2,20,255,255,254,0
1007,3,26,255,255,254,0
1313,4,5,255,255,254,0
1649,5,14,255,255,254,0
1982,6,20,255,255,254,0
2302,7,13,255,255,254,0
2619,8,3,255,255,254,0
2969,9,26,255,255,254,0
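
If you want to sanity-check rule 2 of the algorithm (no column should contain more than one even `b` value), one possible approach is to count the even pixels per column with `groupby`. The DataFrame `toy` below is made up for illustration.

```python
import pandas as pd

# Made-up even-valued pixels: one in column x=0 and one in column x=1.
toy = pd.DataFrame({"x": [0, 1], "y": [7, 8], "b": [254, 254]})

evens_per_column = toy.groupby("x").size()   # number of even pixels in each column
print(evens_per_column.max())                # 1
```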


In [54]:
### TEST CASE for Part 1: Filter your DataFrame to contain ONLY even-valued blue pixels
tada = "\N{PARTY POPPER}"
assert("mod2" in df), "You must have the column `mod2` in your DataFrame"
assert(sum(df.mod2) == 0), "You should have no odd-valued blue pixels"
assert(len(df) == 28), "You should have only 28 pixels"

# ...and copy the DataFrame to avoid warnings about the slice:
df = df.copy()

print(f"{tada} All Tests Passed! {tada}")

🎉 All Tests Passed! 🎉


<hr style="color: #DD3403;">

## Part 2: Uncovering the Hidden Message

You now have a DataFrame with **exactly one row for each column that hides a letter**, and the `y` value of that row tells us the letter in our hidden message.  Let's now decode the message!

### Creating a Character from a Number

In Python, the `chr` function translates a character code (a Unicode code point) into a character.  Unicode is an international standard that assigns a number to every character, and UTF-8 is a common way of storing those numbers as binary data.  For example, the letter `"A"` is encoded by the number `65`, `"B"` is encoded by the number `66`, and so on.

We can see this in action by running the following cells:

In [55]:
chr(65)

'A'

In [56]:
chr(87)

'W'

Let's use the fact that this is already built into Python (and into most other programming languages) to do the work for us!  To set up for using the `chr` function, add a new column to your DataFrame called `charCode`.

The `charCode` column can simply add `65` to the `y` value of each row so that `y=0` has a `charCode` of `65`, `y=1` has a `charCode` of `66`, and so on.  Create this new column below:

In [57]:
# Add the column "charCode":
df["charCode"]=df["y"]+65
df["charCode"]

24      89
341     79
674     85
1007    91
1313    70
1649    79
1982    85
2302    78
2619    68
2969    91
3289    84
3604    72
3928    69
4277    91
4585    72
4913    73
5235    68
5562    68
5890    69
6226    78
6566    91
6879    77
7198    69
7539    83
7866    83
8175    65
8508    71
8833    69
Name: charCode, dtype: int64

In [58]:
### TEST CASE for Part 2: Creating a Character from a Number
tada = "\N{PARTY POPPER}"
assert("charCode" in df), "You must have the column `charCode` in your DataFrame"
assert(min(df.charCode) == 65), "The smallest charCode value must be 65 (you have a smaller value in your DataFrame)"
assert(max(df.charCode) == 91), "The largest charCode value must be 91 (you have a larger value in your DataFrame)"
print(f"{tada} All Tests Passed! {tada}")

🎉 All Tests Passed! 🎉


### Apply the `chr` Function to Every Row

Occasionally, we will need to interface between standard Python and DataFrames.  The `chr` function is a Python function, and not a DataFrame operation, so we will need to "apply" this function to our DataFrame.

To do this, we can use the `apply` operation to run a function on every value in a column.  To use the `chr` function on every value in the `charCode` column, the following syntax will do exactly that:

> ```py
> # Run the `chr` function for each row, using the value stored in the column `charCode`:
> df["charCode"].apply(chr)
> ```

Store the result of the code above in a new column called `character`:

In [59]:
# Create a new column `character` by using the apply function:
df["character"] = df["charCode"].apply(chr)

df

Unnamed: 0,x,y,r,g,b,mod2,charCode,character
24,0,24,255,255,254,0,89,Y
341,1,14,255,255,254,0,79,O
674,2,20,255,255,254,0,85,U
1007,3,26,255,255,254,0,91,[
1313,4,5,255,255,254,0,70,F
1649,5,14,255,255,254,0,79,O
1982,6,20,255,255,254,0,85,U
2302,7,13,255,255,254,0,78,N
2619,8,3,255,255,254,0,68,D
2969,9,26,255,255,254,0,91,[
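
For intuition, `apply(chr)` does the same work as calling `chr` on each value in a plain Python loop. The sketch below uses a made-up Series named `codes` rather than the real image data.

```python
import pandas as pd

codes = pd.Series([72, 73])          # made-up character codes

via_apply = codes.apply(chr)         # pandas calls chr(72), then chr(73)
via_loop = [chr(c) for c in codes]   # the same work as a plain Python loop

print(list(via_apply), via_loop)     # ['H', 'I'] ['H', 'I']
```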


In [60]:
### TEST CASE for Part 2: Applying the chr Function
tada = "\N{PARTY POPPER}"
assert("character" in df), "You must have the column `character` in your DataFrame"
assert(len(df[df.character == "A"]) == 1), "Your character translation is incorrect"
assert(len(df[df.character == "E"]) == 4), "Your character translation is incorrect"
assert(len(df[df.character == "S"]) == 2), "Your character translation is incorrect"

print(f"{tada} All Tests Passed! {tada}")

🎉 All Tests Passed! 🎉


### Concatenate the Entire String

We're almost done!!  If you look at your DataFrame, and read down the `character` column, you will see the hidden message -- but we can do better -- we want to combine everything together into one string!

To combine all the strings in one column across an entire DataFrame, the `str.cat()` function will **concatenate** them into one large string.  For example:

> ```py
> message = df.columnName.str.cat()
> message    # combines all `columnName` strings together across the full DataFrame
> ```

Create a variable `message` that combines all of the `character`s together across the full DataFrame:

In [61]:
message = df["character"].str.cat()
message

'YOU[FOUND[THE[HIDDEN[MESSAGE'
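
On a column of strings, `str.cat()` with no arguments behaves like Python's `"".join(...)`. A minimal sketch with a made-up Series named `letters`:

```python
import pandas as pd

letters = pd.Series(["H", "I"])   # made-up characters

print(letters.str.cat())          # HI
print("".join(letters))           # HI
```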

### Fix the Spaces


Oh no, you may have seen this coming!  The space character is not correct -- it looks like it was mapped to the wrong character!

Use `message.replace("[", " ")` to replace all `"["` characters with a space `" "` character:

In [62]:
# Fix the spaces in our string:
message = message.replace("[", " ")
message

'YOU FOUND THE HIDDEN MESSAGE'
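
The `[` appears because row `y=26` becomes `26 + 65 = 91`, and `chr(91)` is `[`. One alternative that avoids the replacement step is to index into a 27-character lookup string instead of adding `65`. This is only a sketch; `ALPHABET` and `row_to_char` are hypothetical names.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "   # 26 letters plus a trailing space

def row_to_char(y):
    """Map a row number 0..26 to its character (row 26 is the space)."""
    return ALPHABET[y]

rows = [7, 8, 26, 24, 14, 20]                  # made-up row numbers
print("".join(row_to_char(y) for y in rows))   # HI YOU
```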

In [63]:
### TEST CASE for Part 2: Concatenate the Entire String and Fix the Spaces
tada = "\N{PARTY POPPER}"
assert("message" in vars()), "You must define the variable `message`"
assert(len(message) == 28), "Your message must be 28 characters long"
assert(message[1] + message[5] + message[21] == "OOM"), "Your message appears incorrect"
assert(message[3] == " "), "Your message must have spaces for spaces (not '[')"
print(f"{tada} All Tests Passed! {tada}")


🎉 All Tests Passed! 🎉


<hr style="color: #DD3403;">

## Part 3: Make a Reusable Function

Often, programmers will do a task once to understand it and then combine the steps together into a function that can be used with different inputs.

Complete the following function, `decodeHiddenMessage`, that takes in a filename (like `i.png`) and returns the hidden message contained within it.  *(This function should simply combine all the steps in the previous parts into a single function, but with the provided file name `fileName` instead of `i.png`.)*

In [64]:
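def decodeHiddenMessage(fileName):
    # Load the image as a DataFrame of pixels (x, y, r, g, b):
    df = DISCOVERY.df_image(fileName)
    # Keep only the first 27 rows (y=0 to y=26); copy to avoid warnings about the slice:
    df = df[(df['y'] >= 0) & (df['y'] <= 26)].copy()
    # Keep only the pixels with an even blue value:
    df['mod2'] = df['b'] % 2
    df = df[df['mod2'] == 0].copy()
    # Translate each row number into a character (y=0 -> "A", y=1 -> "B", ...):
    df['charCode'] = df['y'] + 65
    df['character'] = df['charCode'].apply(chr)
    # Join the characters and fix the spaces (y=26 maps to "["):
    message = df['character'].str.cat()
    message = message.replace("[", " ")
    return message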

Let's first make sure your function works with the same input we've worked with already:

In [65]:
decodeHiddenMessage("i.png")

'YOU FOUND THE HIDDEN MESSAGE'
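
One way to try the decoding logic without an image file is to put it in a helper that accepts a DataFrame directly. The sketch below is a variant under stated assumptions, not the graded solution: `decode_from_dataframe` is a hypothetical name, the input only needs `x`, `y`, and `b` columns, and the explicit `sort_values("x")` is an extra precaution in case the pixels are not already ordered by column.

```python
import pandas as pd

def decode_from_dataframe(df):
    """Decode a "Blue Steganography" message from a pixel DataFrame (x, y, b)."""
    top = df[df["y"] <= 26]
    evens = top[top["b"] % 2 == 0].sort_values("x")   # read columns left to right
    return (evens["y"] + 65).apply(chr).str.cat().replace("[", " ")

# Made-up 3-column image: every pixel is white (255) except one even pixel per column.
pixels = pd.DataFrame(
    [{"x": x, "y": y, "b": 255} for x in range(3) for y in range(27)]
)
for x, y in [(0, 7), (1, 26), (2, 8)]:   # rows for "H", " ", "I"
    pixels.loc[(pixels["x"] == x) & (pixels["y"] == y), "b"] = 254

print(decode_from_dataframe(pixels))     # H I
```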

In [66]:
### TEST CASE for Part 3: Reusable Function

tada = "\N{PARTY POPPER}"
assert("decodeHiddenMessage" in vars()), "You must define the function `decodeHiddenMessage`"
__test_msg = decodeHiddenMessage("i.png")
assert(len(__test_msg) == 28), "Your message must be 28 characters long"
assert(__test_msg[1] + __test_msg[5] + __test_msg[21] == "OOM"), "Your message appears incorrect"

__test_msg = decodeHiddenMessage("discovery.png")
assert(len(__test_msg) == 127), "Your message must be 127 characters long"
assert(__test_msg[1] + __test_msg[5] + __test_msg[21] == "HEU"), "Your message appears incorrect"

print(f"{tada} All Tests Passed! {tada}")

🎉 All Tests Passed! 🎉


<hr style="color: #DD3403;">

## Part 4: My Hidden Message for You

I've hidden a few more messages for you!  Let's use the function you created to decode the next one:

In [67]:
decodeHiddenMessage("discovery.png")

'THE NEXT IMAGE FOR YOU IS ON DISCORD  FIND IT IN THE DISCORD CHANNEL FOR THIS MICROPROJECT  YOU WILL DEFINITELY WANT TO DO THIS'

You can now use your `decodeHiddenMessage` function on any other image file if you want.

We can use this idea to hide messages all over in plain sight!  Who knows where else you might find hidden messages in images! :)

<hr style="color: #DD3403;">

## Submission

You're almost done!  All you need to do is to commit your lab to GitHub and run the GitHub Actions Grader:

1.  ⚠️ **Make certain to save your work.** ⚠️ To do this, go to **File => Save All**

2.  After you have saved, exit this notebook and return to https://discovery.cs.illinois.edu/microproject/image-steganography-with-dataframes/ and complete the section **"Commit and Grade Your Notebook"**.

3. If you see a 100% grade result on your GitHub Action, you've completed this MicroProject! 🎉