# Example: Fun with Text, Strings and Characters
Textual data on a computer is represented by the `String` data type. In languages such as [C](https://en.wikipedia.org/wiki/C_(programming_language)), strings were modeled as a sequence of characters, where each character was of type `char`.
* Characters were represented via the [American Standard Code for Information Interchange (ASCII) system](https://en.wikipedia.org/wiki/ASCII), which was a set of `7-bit` teleprinter codes for the [AT&T](https://www.att.com) Teletypewriter exchange (TWX) network. For example, the character `A` in the ASCII system has the code `65`.
* Later, `8-bit` character mappings were developed, i.e., the so-called [extended ASCII systems](https://en.wikipedia.org/wiki/Extended_ASCII), which had `256` possible character values ($0,\dots,255$).
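
The character-to-code mapping described above can be checked directly in Julia. The following is a minimal sketch; the variable names (`letter`, `code`, `roundtrip`) are illustrative choices and not part of the original notebook:

```julia
# Convert a character to its integer code and back again.
letter = 'A'                 # a Char literal uses single quotes
code = Int(letter)           # integer code of the character: 65
roundtrip = Char(code)       # back to the character 'A'

# Every original ASCII character has a code in 0..127, so it fits in 7 bits.
fits_in_7_bits = code < 2^7  # true

println(letter, " has code ", code)
```

Converting in both directions shows that a character is just a number with an agreed-upon meaning; ASCII is one such agreement.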

Some of this remains true today, while other things are very different. For example, modern languages have sophisticated built-in `String` types built on the [Unicode](https://en.wikipedia.org/wiki/Unicode) character set.
* The [Unicode standard](https://en.wikipedia.org/wiki/Unicode) defines approximately 1.1 million possible character codes (code points), the first `128` of which are the same as the original `ASCII` set. [Unicode](https://en.wikipedia.org/wiki/Unicode) characters, which use up to 4 bytes (32 bits) of storage per character, are indexed using the `base 16` (hexadecimal) number system.
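
To make the hexadecimal indexing and the variable storage size concrete, here is a small sketch in Julia. It assumes Julia's default UTF-8 string encoding; the helper `describe_char` and the sample characters are hypothetical and chosen only for illustration:

```julia
# Report the hexadecimal code point of a character and the number of
# bytes it occupies when encoded as UTF-8 (Julia's String encoding).
function describe_char(c::Char)
    cp = codepoint(c)                              # code point as a UInt32
    hex = uppercase(string(cp, base=16, pad=4))    # base 16, at least 4 digits
    return (hex = hex, nbytes = ncodeunits(c))     # ncodeunits counts UTF-8 bytes
end

for c in ['A', 'é', '€', '😀']
    info = describe_char(c)
    println(c, "  U+", info.hex, "  ", info.nbytes, " byte(s)")
end
```

The ASCII character `A` needs a single byte, while characters further along in the Unicode table need two, three, or four bytes.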

## Setup
Let's load some `external packages`, i.e., code that other people have made available to the world, using the [Julia package manager](https://docs.julialang.org/en/v1/stdlib/Pkg/) (which we'll explore in a few lectures from now):

In [1]:
# add -
using Pkg;
Pkg.add("DataFrames");
Pkg.add("PrettyTables");

[32m[1m   Resolving[22m[39m package versions...
[32m[1m  No Changes[22m[39m to `~/Desktop/julia_work/CHEME-4800-5800-Examples-AY-2024/week-1/Project.toml`
[32m[1m  No Changes[22m[39m to `~/Desktop/julia_work/CHEME-4800-5800-Examples-AY-2024/week-1/Manifest.toml`
[32m[1m   Resolving[22m[39m package versions...
[32m[1m  No Changes[22m[39m to `~/Desktop/julia_work/CHEME-4800-5800-Examples-AY-2024/week-1/Project.toml`
[32m[1m  No Changes[22m[39m to `~/Desktop/julia_work/CHEME-4800-5800-Examples-AY-2024/week-1/Manifest.toml`


In [2]:
# use -
using DataFrames
using PrettyTables

## Characters
Let's start by considering the representation of the characters in the `ASCII` character system. The original `ASCII` characters are encoded as the first `128` characters (`0`$\rightarrow$`127`) in the modern [Unicode](https://en.wikipedia.org/wiki/Unicode) system. Let's look at what they are:

In [8]:
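# build the integer codes 0,1,...,127 and materialize them as an array
ASCII_character_range = range(0,stop=127,step=1) |> collect; # what is going on here?
character_table_df = DataFrame(); # empty table, columns are created on the first push!
for i ∈ eachindex(ASCII_character_range)
    my_ascii_char_index = ASCII_character_range[i]; # integer code for this row
    c = convert(Char,my_ascii_char_index) # the character with that code

    # one table row: the integer code and its character
    row = (
        i = my_ascii_char_index,
        character = c
    );

    push!(character_table_df,row);
end
pretty_table(character_table_df, tf=tf_simple)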

 [1m     i [0m [1m character [0m
 [90m Int64 [0m [90m      Char [0m
      0          \0
      1        \x01
      2        \x02
      3        \x03
      4        \x04
      5        \x05
      6        \x06
      7          \a
      8          \b
      9          \t
     10          \n
     11          \v
     12          \f
     13          \r
     14        \x0e
     15        \x0f
     16        \x10
     17        \x11
     18        \x12
     19        \x13
     20        \x14
     21        \x15
     22        \x16
     23        \x17
     24        \x18
     25        \x19
     26        \x1a
     27          \e
     28        \x1c
     29        \x1d
     30        \x1e
     31        \x1f
     32
     33           !
     34           "
     35           #
     36           $
     37           %
     38           &
     39           '
     40           (
     41           )
     42           *
     43           +
     44           ,
     45           -
     46         