# Floating Point Representation

To store a wider range of values than integers allow, including fractional values, we use **floating point representation**. Integers are stored (almost) entirely naïvely: their bits directly encode, in base 2, the number we want to store. Floats instead split their bits into fields with a specified internal format, and that format is what makes them so useful.

## IEEE 754

The standard and most common representation is [IEEE 754](https://en.wikipedia.org/wiki/IEEE_754), which is what we'll study.

Begin by reading [this summary of the format](https://steve.hollasch.net/cgindex/coding/ieeefloat.html). The questions below are mostly intended to be answered as you read, so they appear in the order the reading covers each topic.

Two notes:

* Though the reading treats both single (32-bit) and double (64-bit) floats, all questions refer to single-precision floats.

* You may use an [IEEE 754 converter](https://www.h-schmidt.net/FloatConverter/IEEE754.html) to help you understand concepts if you get stuck.
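
If you would rather inspect bit patterns from code than from the web converter, the following is a minimal Python sketch. The helper name `float_fields` is made up for this worksheet; it relies only on the standard `struct` module and assumes Python 3.9 or later.

```python
import struct

def float_fields(x: float) -> tuple[str, str, str]:
    """Return the (sign, exponent, mantissa) bit strings of x as a 32-bit float."""
    # Pack x as a big-endian single-precision float, then reinterpret
    # the same 4 bytes as an unsigned 32-bit integer.
    (raw,) = struct.unpack(">I", struct.pack(">f", x))
    bits = f"{raw:032b}"  # 32 characters, most significant bit first
    # 1 sign bit, 8 exponent bits, 23 mantissa bits
    return bits[0], bits[1:9], bits[9:]

sign, exponent, mantissa = float_fields(-0.15625)
print(sign, exponent, mantissa)
# 1 01111100 01000000000000000000000
```

Try it on values that are not in the questions first, so that you still work the answers out by hand.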



### Pre-reading

**(A)** Look up and explain (briefly) how scientific notation works.

**(B)** Look up and explain (briefly) the meaning of "least significant bit" and "most significant bit".

### During reading

**(C)** Explain why the range of a 31-bit storage format is half that of a 32-bit one. How is this doubled range achieved in IEEE 754? Additionally, why are -0 and +0 stored as separate values?
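
To see that the two zeros really are distinct bit patterns, here is a small Python sketch using only the standard library. It shows *that* they differ, not *why*; the why is still yours to explain.

```python
import math
import struct

pos, neg = 0.0, -0.0

# The two zeros compare equal...
print(pos == neg)                    # True
# ...but the sign is still recorded and can be recovered.
print(math.copysign(1.0, neg))       # -1.0
# As single-precision floats, only the first bit differs.
print(struct.pack(">f", pos).hex())  # 00000000
print(struct.pack(">f", neg).hex())  # 80000000
```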

**(D)** If you wanted to represent an exponent of `-50`, what would the 8 exponent bits look like?

**(E)** Why is it safe and helpful to assume (in normalized floats) an implicit leading bit of `1` for the mantissa?

<details>
<summary>Hint</summary>

> What happens when multiplying values less than 1 by each other?
</details>

**(F)** How do floats represent infinity, and what is the difference between quiet and signalling NaNs?

**(G)** What are the two different outcomes possible when attempting to divide by zero?

### Post-reading

**(H)** In your own words, explain the basic insight that allows floats to cover a much wider range than ints, both in how large the values can get and in how small a fraction they can express, within the same limit of 32 bits.

**(I)** A certain IEEE 754 float has the following binary representation: `0 10000000 10000000000000000000000`. What is its value in decimal representation? Explain how you got that answer.

<details>
<summary>Hint</summary>

> Don't forget the implicit leading bit of the mantissa.
</details>
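
Once you have an answer, you can check your method with a sketch like the one below. `decode_normalized` is a hypothetical helper written for this worksheet. It handles only normalized single-precision floats, and the example uses a different bit pattern from the one in the question.

```python
def decode_normalized(sign: str, exponent: str, mantissa: str) -> float:
    """Decode the three bit fields of a normalized 32-bit float."""
    if (len(sign), len(exponent), len(mantissa)) != (1, 8, 23):
        raise ValueError("expected 1 sign, 8 exponent and 23 mantissa bits")
    e = int(exponent, 2)
    if e == 0 or e == 255:
        # All-zero and all-one exponents are reserved for special values.
        raise ValueError("not a normalized float")
    # Implicit leading 1, followed by the 23 stored fraction bits.
    significand = 1 + int(mantissa, 2) / 2**23
    # The stored exponent is biased by 127.
    value = significand * 2.0 ** (e - 127)
    return -value if sign == "1" else value

print(decode_normalized("0", "01111100", "01000000000000000000000"))
# 0.15625
```

Here the exponent field is 124, so the true exponent is 124 - 127 = -3, and the significand is 1.25, giving 1.25 × 2⁻³ = 0.15625.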

**(J)** Which bits of the exponent and mantissa should be set to represent 8.5?

<details>
<summary>Hint</summary>

> 8.5 == 8 * 1.0625.
</details>

**(K)** Use the converter to try the values 0.5 and 0.6. Notice how many more bits are set for 0.6! For 0.6, what is the error, i.e. how far is the stored value from the intended one?

**(L)** Why is the stored value of 0.6 inaccurate? Why is 0.5 so much easier to store than 0.6?

<details>
<summary>Hint</summary>

> 0.125 is also much easier to store than 0.1.
</details>