# String Encoding and Formatting

### String Encoding

- There are two widely used text encoding systems: ASCII and Unicode.
- ASCII was originally developed for the English alphabet and encodes only 128 characters (codes 0 to 127).

|	Decimal	|	Hex	|	Symbol	|	Decimal	|	Hex	|	Symbol	|
|	:----	|	:----	|	:----	|	:----	|	:----	|	:----	|
|	0	|	0	|	NUL (null)	|	64	|	40	|	@	|
|	1	|	1	|	SOH (start of heading)	|	65	|	41	|	A	|
|	2	|	2	|	STX (start of text)	|	66	|	42	|	B	|
|	3	|	3	|	ETX (end of text)	|	67	|	43	|	C	|
|	4	|	4	|	EOT (end of transmission)	|	68	|	44	|	D	|
|	5	|	5	|	ENQ (enquiry)	|	69	|	45	|	E	|
|	6	|	6	|	ACK (acknowledge)	|	70	|	46	|	F	|
|	7	|	7	|	BEL (bell)	|	71	|	47	|	G	|
|	8	|	8	|	BS (backspace)	|	72	|	48	|	H	|
|	9	|	9	|	TAB (horizontal tab)	|	73	|	49	|	I	|
|	10	|	A	|	LF (NL line feed, new line)	|	74	|	4A	|	J	|
|	11	|	B	|	VT (vertical tab)	|	75	|	4B	|	K	|
|	12	|	C	|	FF (NP form feed, new page)	|	76	|	4C	|	L	|
|	13	|	D	|	CR (carriage return)	|	77	|	4D	|	M	|
|	14	|	E	|	SO (shift out)	|	78	|	4E	|	N	|
|	15	|	F	|	SI (shift in)	|	79	|	4F	|	O	|
|	16	|	10	|	DLE (data link escape)	|	80	|	50	|	P	|
|	17	|	11	|	DC1 (device control 1)	|	81	|	51	|	Q	|
|	18	|	12	|	DC2 (device control 2)	|	82	|	52	|	R	|
|	19	|	13	|	DC3 (device control 3)	|	83	|	53	|	S	|
|	20	|	14	|	DC4 (device control 4)	|	84	|	54	|	T	|
|	21	|	15	|	NAK (negative acknowledge)	|	85	|	55	|	U	|
|	22	|	16	|	SYN (synchronous idle)	|	86	|	56	|	V	|
|	23	|	17	|	ETB (end of trans. block)	|	87	|	57	|	W	|
|	24	|	18	|	CAN (cancel)	|	88	|	58	|	X	|
|	25	|	19	|	EM (end of medium)	|	89	|	59	|	Y	|
|	26	|	1A	|	SUB (substitute)	|	90	|	5A	|	Z	|
|	27	|	1B	|	ESC (escape)	|	91	|	5B	|	[	|
|	28	|	1C	|	FS (file separator)	|	92	|	5C	|	\	|
|	29	|	1D	|	GS (group separator)	|	93	|	5D	|	]	|
|	30	|	1E	|	RS (record separator)	|	94	|	5E	|	^	|
|	31	|	1F	|	US (unit separator)	|	95	|	5F	|	_	|
|	32	|	20	|	 (space)	|	96	|	60	|	`	|
|	33	|	21	|	!	|	97	|	61	|	a	|
|	34	|	22	|	"	|	98	|	62	|	b	|
|	35	|	23	|	#	|	99	|	63	|	c	|
|	36	|	24	|	$	|	100	|	64	|	d	|
|	37	|	25	|	%	|	101	|	65	|	e	|
|	38	|	26	|	&	|	102	|	66	|	f	|
|	39	|	27	|	'	|	103	|	67	|	g	|
|	40	|	28	|	(	|	104	|	68	|	h	|
|	41	|	29	|	)	|	105	|	69	|	i	|
|	42	|	2A	|	*	|	106	|	6A	|	j	|
|	43	|	2B	|	+	|	107	|	6B	|	k	|
|	44	|	2C	|	,	|	108	|	6C	|	l	|
|	45	|	2D	|	-	|	109	|	6D	|	m	|
|	46	|	2E	|	.	|	110	|	6E	|	n	|
|	47	|	2F	|	/	|	111	|	6F	|	o	|
|	48	|	30	|	0	|	112	|	70	|	p	|
|	49	|	31	|	1	|	113	|	71	|	q	|
|	50	|	32	|	2	|	114	|	72	|	r	|
|	51	|	33	|	3	|	115	|	73	|	s	|
|	52	|	34	|	4	|	116	|	74	|	t	|
|	53	|	35	|	5	|	117	|	75	|	u	|
|	54	|	36	|	6	|	118	|	76	|	v	|
|	55	|	37	|	7	|	119	|	77	|	w	|
|	56	|	38	|	8	|	120	|	78	|	x	|
|	57	|	39	|	9	|	121	|	79	|	y	|
|	58	|	3A	|	:	|	122	|	7A	|	z	|
|	59	|	3B	|	;	|	123	|	7B	|	{	|
|	60	|	3C	|	<	|	124	|	7C	|	\|	|
|	61	|	3D	|	=	|	125	|	7D	|	}	|
|	62	|	3E	|	>	|	126	|	7E	|	~	|
|	63	|	3F	|	?	|	127	|	7F	|	DEL	|
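
The table can be cross-checked against Python itself. The following is a minimal sketch that rebuilds a few rows for the codes 65 to 70 using `chr()` and the `X` (hexadecimal) format type, which is covered in the String Formatting section; the names `rows` and `offset` are chosen only for this illustration.

```python
# Rebuild a few rows of the ASCII table from the character codes themselves.
rows = []
for code in range(65, 71):  # decimal codes 65..70
    # '{:>2X}' right-aligns the code written in uppercase hexadecimal
    rows.append('{:>3} | {:>2X} | {}'.format(code, code, chr(code)))
print('\n'.join(rows))

# In the table, each lowercase letter sits exactly 32 positions after its uppercase form.
offset = ord('a') - ord('A')
print(offset)
```

The first printed row should read ` 65 | 41 | A`, matching the corresponding entry in the table.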

- Unicode contains more than 100,000 characters covering more than 100 languages, and its first 128 code points are identical to ASCII.
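
As a small illustration of the difference, the sketch below applies `ord()` and `chr()` to a character outside the ASCII range. The euro sign and the UTF-8 encoding are choices made for this example only; UTF-8 is one common way of storing Unicode text as bytes and is not discussed elsewhere in this notebook.

```python
# ord() and chr() work with Unicode code points, not only with ASCII codes.
euro_code = ord('€')   # code point of the euro sign, which is outside ASCII
euro_char = chr(8364)  # back from the code point to the character

# Encoding turns text into bytes: in UTF-8, 'A' takes 1 byte and '€' takes 3.
ascii_bytes = 'A'.encode('utf-8')
euro_bytes = '€'.encode('utf-8')

print(euro_code, euro_char, len(ascii_bytes), len(euro_bytes))
```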

In [1]:
ord('B')  # Convert a single character to its ordinal number (character code)

66

In [2]:
ord('b')

98

In [3]:
ord('P'), ord('y'), ord('t'), ord('h'), ord('o'), ord('n')

(80, 121, 116, 104, 111, 110)

In [4]:
ord('P') + ord('y') + ord('t') + ord('h') + ord('o') + ord('n')

642

In [5]:
chr(67)  # Convert a number to the corresponding character

'C'

In [6]:
chr(99)

'c'

In [7]:
chr(80), chr(121), chr(116), chr(104), chr(111), chr(110)

('P', 'y', 't', 'h', 'o', 'n')

In [8]:
chr(80) + chr(121) + chr(116) + chr(104) + chr(111) + chr(110)

'Python'

In [9]:
n = int('12')  # Convert a string to an integer number
n

12

In [10]:
n = int('12.0')  # Raises ValueError: int() does not accept a string with a decimal point

ValueError: invalid literal for int() with base 10: '12.0'

In [11]:
n = float('12.0')
n

12.0

In [12]:
s = str(12)  # Convert a number to a string
s

'12'

In [13]:
s = str(12.0)
s

'12.0'
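
When a string such as `'12.0'` has to end up as an integer, one possible approach is to parse it as a float first. The helper below is a hypothetical sketch (`to_int` is not a built-in), and it assumes that truncating any fractional part toward zero is acceptable.

```python
def to_int(text):
    """Convert a numeric string such as '12' or '12.0' to an int (hypothetical helper)."""
    try:
        return int(text)         # works for '12', raises ValueError for '12.0'
    except ValueError:
        return int(float(text))  # parse as a float first, then truncate toward zero

print(to_int('12'), to_int('12.0'), to_int('12.9'))
```

Note that `'12.9'` becomes `12`, not `13`, because `int()` truncates instead of rounding; a string that is not numeric at all still raises `ValueError`.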

- The following program converts a message entered by the user to the corresponding ASCII codes.

In [14]:
msg = input('What is the message? ')
print('The encoded message:', end=' ')
for c in msg:
    print(ord(c), end=' ')

What is the message? Wake up, Neo...
The encoded message: 87 97 107 101 32 117 112 44 32 78 101 111 46 46 46 

<img src="images/wake_up_neo.jpg" alt="The Matrix: Wake up Neo..." style="width: 200px;"/>

- Now, let's decode the message.

In [15]:
encoded_msg = input('What is the encoded message? ')
codes = encoded_msg.split()  # Split the input into a list of separate number strings
codes

What is the encoded message? 87 97 107 101 32 117 112 44 32 78 101 111 46 46 46


['87',
 '97',
 '107',
 '101',
 '32',
 '117',
 '112',
 '44',
 '32',
 '78',
 '101',
 '111',
 '46',
 '46',
 '46']

In [16]:
decoded_msg = ''  # Start with an empty message
for num_str in codes:
    code = int(num_str)  # Convert each number string to an integer code
    decoded_msg += chr(code)  # Convert each code to a character and add it to the end of the msg
print("The decoded message:", decoded_msg)

The decoded message: Wake up, Neo...
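
The two programs above can also be written as a pair of functions, which makes the round trip easy to check without typing input. This is a compact sketch of the same idea rather than a replacement for the step-by-step versions; the function names `encode` and `decode` are chosen here for illustration.

```python
def encode(message):
    """Return the character codes of message as a space-separated string."""
    return ' '.join(str(ord(c)) for c in message)

def decode(encoded):
    """Rebuild the text from a space-separated string of character codes."""
    return ''.join(chr(int(num_str)) for num_str in encoded.split())

encoded = encode('Wake up, Neo...')
print(encoded)
print(decode(encoded))
```

Because a space in the message is itself encoded as `32`, splitting the encoded string on whitespace does not lose any characters.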


### String Formatting (Pretty It Up)

- The examples below come from the output of `help('FORMATTING')`, which can be run in a Jupyter Notebook cell. A replacement field has the general form:

```python
s = '...{[index][:][[fill]align][sign][#][0][width][grouping_option][.precision][type]}...'
s.format(...)
```

In [17]:
'{0}, {1}, {2}'.format('a', 'b', 'c')  # Accessing arguments by position

'a, b, c'

In [18]:
'{}, {}, {}'.format(10, -20.987649534590876, 30.4)

'10, -20.987649534590876, 30.4'

In [19]:
'{2}, {1}, {0}'.format('a', 'b', 'c')

'c, b, a'

In [20]:
'{0}{1}{0}'.format('abra', 'cad')  # Repeated indices

'abracadabra'

In [21]:
'Coordinates: {lat}, {lon}'.format(lat='37.24N', lon='-115.81W')  # Accessing arguments by name

'Coordinates: 37.24N, -115.81W'

In [22]:
coord = [3, 5]
'X: {0[0]};  Y: {0[1]}'.format(coord)  # Accessing arguments' items

'X: 3;  Y: 5'

In [23]:
'{:+}; {:+}'.format(3.14, -3.14)  # Show both the plus and minus signs

'+3.14; -3.14'

In [24]:
'{:,}'.format(1234567890)  # Using the comma as a thousands separator

'1,234,567,890'

In [25]:
'ratio = {:.2%}'.format(19/22)  # Expressing a percentage

'ratio = 86.36%'

In [26]:
'{:.2f}'.format(1234.5678)  # Precision: two digits after the decimal point

'1234.57'

In [27]:
'{:.2}'.format(1234.5678)  # Precision without a type: two significant digits in total

'1.2e+03'

In [28]:
'{:e}'.format(1234.5678)  # Precision: scientific notation with 6 digits after the decimal point

'1.234568e+03'

In [29]:
'{:.4E}'.format(1234.5678)  # Precision: scientific notation with 4 digits after the decimal point

'1.2346E+03'

In [30]:
'Alignment: {:<30}'.format('left aligned')  # Left-align the text within a field 30 characters wide

'Alignment: left aligned                  '

In [31]:
'Alignment: {:>30}'.format('right aligned')

'Alignment:                  right aligned'

In [32]:
'Alignment: {:^30}'.format('centered')

'Alignment:            centered           '

In [33]:
'Alignment: {:*^30}'.format('centered')  # Use '*' as a fill character

'Alignment: ***********centered***********'

In [34]:
"dict: {{ apples:{}, bananas:{} }}".format(10, 20)  # The brace character can be escaped by doubling

'dict: { apples:10, bananas:20 }'
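
The options shown one at a time above can be combined in a single replacement field, as long as they appear in the order given by the general form. The sketch below is illustrative only: the value `price` and the particular widths are arbitrary choices.

```python
price = 1234.5678

# fill='*', align='>', sign='+', width=12, grouping=',', precision=.2, type='f'
combined = '{0:*>+12,.2f}'.format(price)
print(combined)

# Width and precision can themselves be supplied as arguments through nested fields.
nested = '{:>{width}.{prec}f}'.format(price, width=10, prec=1)
print(nested)
```

The first call should produce `'***+1,234.57'` and the second `'    1234.6'`.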

- The following program computes the total value of some coins.
<img src="images/coins.jpg" alt="Coins" style="width: 200px;"/>

In [35]:
print("Enter the number (integer) of each type of coin.")
q = int(input("Quarters: "))
d = int(input("Dimes: "))
n = int(input("Nickels: "))
p = int(input("Pennies: "))
total = q * 25 + d * 10 + n * 5 + p
print("Total: ${:.2f}".format(total/100))

Enter the number (integer) of each type of coin.
Quarters: 12
Dimes: 1
Nickels: 0
Pennies: 4
Total: $3.14
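
The same calculation can be combined with the alignment options from the earlier examples to print a small summary table. This is a sketch under a few assumptions: the counts are hard-coded instead of read with `input()`, and `total_cents` is a hypothetical helper name.

```python
def total_cents(quarters, dimes, nickels, pennies):
    """Return the value of the coins in cents (hypothetical helper)."""
    return quarters * 25 + dimes * 10 + nickels * 5 + pennies

counts = {'Quarters': 12, 'Dimes': 1, 'Nickels': 0, 'Pennies': 4}

# Left-align the coin names in 10 columns, right-align the counts in 4 columns.
for name, count in counts.items():
    print('{:<10}{:>4}'.format(name, count))

total = total_cents(12, 1, 0, 4)
line = '{:<10}${:>6.2f}'.format('Total', total / 100)
print(line)
```

Keeping the running total in integer cents, as the original program does, means the only floating-point step is the final division by 100 for display.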


### Course Materials on YouTube and GitHub

- Course videos are hosted on YouTube (http://youtube.com/yongtwang).
- Course documents (Jupyter Notebooks and Python source code) are hosted on GitHub (http://github.com/yongtwang).