# Data Representation in C

How do we interpret the number 742? The number is:
* 7 hundreds
* 4 tens
* 2 ones

Different bases are more natural in different situations. Base 10 is convenient for humans, while binary suits digital systems, whose circuits have two states (*on* and *off*).

Base 2 (binary) has 2 symbols (0, 1). Decimal has 10 symbols (0-9), and hexadecimal has 16 symbols (0-9 plus A-F, where A-F stand for 10-15). For any base $n$, the symbols represent the values $0$ through $n-1$.

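As a generic representation, multiply each digit by its weight, then sum the results. Mathematically, if the four-digit number $d_3 d_2 d_1 d_0$ is written in base $n$ (with $d_0$ the rightmost digit):
$d_3 d_2 d_1 d_0 = d_3 \times n^3 + d_2 \times n^2 + d_1 \times n^1 + d_0 \times n^0$

For example, $742 = 7 \times 10^2 + 4 \times 10^1 + 2 \times 10^0$.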

In a computer, data are stored as binary digits in fixed-size cells, called *words*. An 8-bit word is usually called a *byte*, and is an extremely common size for a memory cell. 4 bits, in a burst of cuteness, is called a *nybble* and represents a single hexadecimal digit.

## Decimal to Binary Conversion

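```c
#include <stdio.h>
#include <stdlib.h>

// Convert a non-negative number to a decimal number whose digits spell its
// binary form, e.g. 5 -> 101. Only inputs below 2^20 fit, because the
// result (up to 20 digits) must fit in an unsigned long long.
unsigned long long decimalToBinary(unsigned long decimalnum){
    unsigned long long remainder = 0;
    unsigned long long temp = 1;      // weight of the current column: 1, 10, 100, ...
    unsigned long long binarynum = 0;
    
    while (decimalnum > 0){
        remainder  = decimalnum % 2;  // lowest binary digit
        decimalnum = decimalnum / 2;  // drop that digit
        binarynum = binarynum + remainder * temp;
        temp = temp * 10;             // shift left one column
    }
    return binarynum;
}

// print the binary form of 60000 through 65536
int main(void){
    for (unsigned long i = 60000; i <= 65536; i++){
        printf("%6lu = %20llu\n", i, decimalToBinary(i));
    }
    return EXIT_SUCCESS;
}
```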

```
 60000 =     1110101001100000
 60001 =     1110101001100001
 60002 =     1110101001100010
 60003 =     1110101001100011
 ...
 61438 =     1110111111111110
 61439 =     1110111111111111
 61440 =     1111000000000000
 61441 =     1111000000000001
 ...
 65455 =     1111111110101111
 65456 =     1111111110110000
 65457 =     1111111110110001
 ...
```

```c
#include <stdio.h>
#include <stdlib.h>

int main(void){
    // sizeof yields a size_t, whose printf conversion specifier is %zu
    printf("int type has %zu bytes\n", sizeof(int));
    printf("long type has %zu bytes.\n", sizeof(long));
    printf("long long type has %zu bytes.\n", sizeof(long long));
    printf("float type has %zu bytes.\n", sizeof(float));
    printf("double type has %zu bytes.\n", sizeof(double));
    return EXIT_SUCCESS;
}
```

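```
int type has 4 bytes
long type has 8 bytes.
long long type has 8 bytes.
float type has 4 bytes.
double type has 8 bytes.
```

These sizes are from a typical 64-bit Linux or macOS system. The C standard only guarantees minimum ranges (for example, `int` must be at least 16 bits and `long` at least 32 bits), so results vary. On 64-bit Windows, for instance, `long` is 4 bytes.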


## Signed Magnitude

In a signed integer type, we *lose* one bit to represent the sign. The naive implementation, called *signed magnitude*, uses the leftmost (most significant) bit of the word as the sign (1 for negative) and the remaining bits as the magnitude. In an 8-bit word this gives:

| Cell contents | Value in Base 10 |
|:-------------:|:----------------:|
| 01111111      |            +127  |
| ...           |            ...   |
| 00000000      |            +0    |
| 10000000      |            -0    |
| ...           |            ...   |
| 11111111      |            -127  |

It is bothersome that this implementation gives us two different representations of zero. Maybe we can do better.

A better implementation, used by virtually all modern hardware, is called *two's complement* format. We negate a binary number by:
* flip all the bits
* add one

| Cell contents | Value in Base 10 |
|:-------------:|:----------------:|
| 00000000      |            0     |
| 00000001      |            1     |
| ...           |            ...   |
| 01111111      |            +127  |
| 10000000      |            -128  |
| ...           |            ...   |
| 11111110      |            -2    |
| 11111111      |            -1    |


## Characters

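Characters are also represented by binary values, called *character codes*. Most C implementations use ASCII, the *American Standard Code for Information Interchange*, or an ASCII-compatible encoding such as UTF-8, although the C standard does not require it. ASCII has 128 values, so it fits in 7 bits: 95 printable characters (letters, digits, punctuation and the space) and 33 control codes.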

## Floating Point

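Binary numbers can have a *binary point*. Digits to its left are the whole part and digits to its right are the fractional part, analogously to the decimal point. Each place to the right is worth half the one before it: $\tfrac{1}{2}, \tfrac{1}{4}, \tfrac{1}{8}, \ldots$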

Some decimal fractions become repeating fractions in binary. For example, $0.1_{10} = 0.0\overline{0011}_2$. To store such a value in a fixed-size cell, it must be rounded or truncated, which introduces a small error.

```c
#include <stdio.h>
#include <stdlib.h>

int main(void){    
    int s = 0;
    return EXIT_SUCCESS;
}
```