## When comparing to NULL, you cannot write `= NULL`. You must write **IS NULL**

---

In [None]:
SELECT * FROM users WHERE email IS NULL;
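
To make the difference concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `users` table and its rows are made up for illustration, and SQLite stands in for whichever database you actually use; the point is only that `= NULL` never matches while `IS NULL` does.

```python
import sqlite3

# In-memory SQLite database with a hypothetical users table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("Ann", "ann@example.com"), ("Bob", None), ("Cid", None)],
)

# "= NULL" evaluates to unknown for every row, so nothing matches.
equals_null = conn.execute(
    "SELECT name FROM users WHERE email = NULL"
).fetchall()

# "IS NULL" is the test that actually finds the missing values.
is_null = conn.execute(
    "SELECT name FROM users WHERE email IS NULL ORDER BY name"
).fetchall()

print(equals_null)  # []
print(is_null)      # [('Bob',), ('Cid',)]
```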

<div style="display: flex; flex-direction: row;">
  <div style="flex: 1; margin:10px;">
  <div class="hexagon">
  <p>About binary &amp; hex</p>
  
<style>
.hexagon {
  width: 200px;
  height: 110px;
  position: relative;
  background-color: #262626;
  margin: 55px 0;
  display: flex;
  align-items: center;
  justify-content: center;
}
.hexagon p {
  color: #ffffff;
  font-size: 20px;
  text-align: center;
  margin: 0;
}
.hexagon:before,
.hexagon:after {
  content: "";
  position: absolute;
  width: 0;
  border-left: 100px solid transparent;
  border-right: 100px solid transparent;
}

.hexagon:before {
  top: -55px;
  border-bottom: 55px solid #262626;
}

.hexagon:after {
  bottom: -55px;
  border-top: 55px solid #262626;
}
</style>
</div>
</div>

<div style="flex: 3; margin:10px;">

**Decimal - Base 10**\
0123456789\
123

**Binary - Base 2**\
0000 **0**\
0001 **1**\
0010 **2**\
0011 **3**\
0100 **4**\
0101 **5**\
0110 **6**\
0111 **7**\
1000 **8**\
1001 **9**\
1010 **10**\
1011 **11**\
1100 **12**\
1101 **13**\
1110 **14**\
1111 **15**


**Hexadecimal - Base 16**\
0123456789ABCDEF
<p>
10 = 16 = 1*16+0<br>
1F = 31 = 1*16+15<br>
20 = 32 = 2*16+0<br>
FF = 255 = 15 * 16 + 15
</p>
4F = 01001111 = 79

</div>
</div>
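
The conversions above can be checked with a few lines of Python. This is only a sketch: `to_decimal` is a helper written for this note (not a library function) that spells out the positional arithmetic, and the built-ins `int` and `format` are used to cross-check it.

```python
def to_decimal(digits: str, base: int) -> int:
    """Positional notation: multiply the running value by the base, add the next digit."""
    value = 0
    for ch in digits:
        value = value * base + int(ch, 16)  # int(ch, 16) reads one digit 0-9/A-F
    return value

from_hex = to_decimal("4F", 16)           # 4*16 + 15
from_binary = to_decimal("01001111", 2)   # same number written in base 2
print(from_hex, from_binary)              # 79 79

# The built-ins do the same job in both directions.
print(int("FF", 16))       # 255
print(format(79, "08b"))   # 01001111
print(format(31, "X"))     # 1F
```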


---

# **Aggregation**

Data that has been combined from several data points (clusters), for example an average count, is called aggregated data.

Why? To make the data easier to survey.

Suppose we have raw data containing the time of day for every time a car crosses a bridge. To get an overview, we can *aggregate* the number of cars per hour (CPH).

An aggregate function takes a list of values, performs a calculation on them, and returns a single value (a scalar).

Common aggregate functions:

- Count()
- Sum()
- Avg()
- Stdev()
- Min()
- Max()
- String_agg()
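
The bridge example can be sketched with Python's built-in `sqlite3` module. The `crossings` table, its `passed_at` column, and the five timestamps are invented for illustration, and SQLite's `strftime` is used to extract the hour; other databases have their own date functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE crossings (passed_at TEXT)")  # hypothetical raw data
conn.executemany(
    "INSERT INTO crossings VALUES (?)",
    [
        ("2024-01-01 08:05:00",),
        ("2024-01-01 08:40:00",),
        ("2024-01-01 08:59:00",),
        ("2024-01-01 09:10:00",),
        ("2024-01-01 11:30:00",),
    ],
)

# Aggregate the individual crossings into one row per hour.
cars_per_hour = conn.execute(
    """
    SELECT strftime('%H', passed_at) AS hour, COUNT(*) AS cars
    FROM crossings
    GROUP BY hour
    ORDER BY hour
    """
).fetchall()

print(cars_per_hour)  # [('08', 3), ('09', 1), ('11', 1)]
```

Five raw rows become three aggregated rows. Hour 10 is missing from the result because no rows exist for it; aggregation only summarizes data that is there.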

## Example with periodic table

<div class="image-container">
  <img src="https://upload.wikimedia.org/wikipedia/commons/thumb/0/03/Simple_Periodic_Table_Chart-blocks.svg/2560px-Simple_Periodic_Table_Chart-blocks.svg.png" alt="Periodic Table">
</div>

<style>
.image-container {
  position: relative;
  background-color: #CECECE;
  margin: 55px 0;
  padding: 10px;
  display: flex;
  align-items: center;
  justify-content: center;
  overflow: hidden;
}

.image-container img {
  max-width: 70%;
  height: auto;
}
</style>

In [None]:
select count(*) from Elements
-- aggregate count

In [None]:
select
    count(*) as 'Number of rows',
    count(Meltingpoint) as 'Melting point values',
    count(Boilingpoint) as 'Boiling point values',
    count(*) - count(Meltingpoint) as 'Null values in meltingpoint',
    sum(Mass) as 'Sum of mass',
    avg(Boilingpoint) as 'Average boiling point',
    min(Boilingpoint) as 'Min boiling point',
    max(Boilingpoint) as 'Max boiling point',
    string_agg(symbol, ', ') -- Get one string with separator
FROM
    Elements

**About null values**

If a value is null, count(column) skips that row, effectively giving us the count of all occurring (non-null) values, whereas count(*) counts every row. The same goes for the other aggregate functions, including stdev(): null values are left out of the calculation. Details may differ between SQL databases.

In most databases, such as SQL Server and MySQL, the standard deviation is calculated from non-null values only.
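
A small sketch of how aggregates treat nulls, again using Python's built-in `sqlite3` as a stand-in database. The `readings` table and its four values are made up; SQLite has no built-in `stdev()`, so only `COUNT`, `SUM`, and `AVG` are shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (value REAL)")  # hypothetical table
conn.executemany(
    "INSERT INTO readings VALUES (?)",
    [(10.0,), (None,), (20.0,), (None,)],
)

total_rows, non_null, nulls, total, average = conn.execute(
    """
    SELECT COUNT(*),                 -- every row
           COUNT(value),             -- only rows where value is not null
           COUNT(*) - COUNT(value),  -- number of nulls
           SUM(value),
           AVG(value)
    FROM readings
    """
).fetchone()

print(total_rows, non_null, nulls, total, average)  # 4 2 2 30.0 15.0
```

Note that the average is 30 / 2, not 30 / 4: the null rows are skipped entirely rather than counted as zero.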


In [None]:
select * from Elements order by boilingPoint

In [None]:
SELECT count(DISTINCT land) from städer; -- number of distinct countries (land) in the cities table (städer)

In [None]:
SELECT count(period) from Elements -- number of non-null values
SELECT count(distinct period) from Elements -- number of unique non-null values

## Grouping


In [None]:
SELECT * from Elements -- 118 rows
SELECT count(DISTINCT period) from Elements -- 7 distinct periods

In [None]:
SELECT count(period) from Elements where Period = 1 -- 2 values (H, He)

In [None]:
SELECT count(period) from Elements group by period -- one count per period

In [None]:
SELECT
    period,
    count(period) as 'Number of elements',
    string_agg(Symbol, ', ') as 'Symbols'
from
    Elements
group by
    period -- 7 groups, each with its number of elements and symbols

In [None]:
SELECT
    period,
    count(period) as 'Number of elements',
    string_agg(Symbol, ', ') as 'Symbols'
from
    Elements
where
    Boilingpoint < 500
group by
    period
having
    count(period) >= 18

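Empty result: `where` first keeps only the elements with a boiling point below 500, and `having` then keeps only the periods that still contain at least 18 elements. No period has that many, so no rows are returned.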
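
The order of evaluation (`where` filters rows, `group by` forms groups, `having` filters groups) can be tried out with Python's built-in `sqlite3`. This is a sketch under assumptions: the lowercase `elements` table holds only the first twelve elements, and it filters on atomic number instead of boiling point so that no measured values have to be typed in. `periods_with_at_least` is a helper written for this note.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE elements (symbol TEXT, atomic_number INTEGER, period INTEGER)"
)
conn.executemany(
    "INSERT INTO elements VALUES (?, ?, ?)",
    [
        ("H", 1, 1), ("He", 2, 1),
        ("Li", 3, 2), ("Be", 4, 2), ("B", 5, 2), ("C", 6, 2),
        ("N", 7, 2), ("O", 8, 2), ("F", 9, 2), ("Ne", 10, 2),
        ("Na", 11, 3), ("Mg", 12, 3),
    ],
)

def periods_with_at_least(min_count: int) -> list:
    """Count elements per period after the WHERE filter, then filter the groups."""
    return conn.execute(
        """
        SELECT period, COUNT(*)
        FROM elements
        WHERE atomic_number <= 11   -- row filter, applied before grouping
        GROUP BY period
        HAVING COUNT(*) >= ?        -- group filter, applied after grouping
        ORDER BY period
        """,
        (min_count,),
    ).fetchall()

print(periods_with_at_least(2))   # [(1, 2), (2, 8)]
print(periods_with_at_least(18))  # []
```

Period 3 has two rows in the table, but only Na survives the `WHERE` filter, so the group of one is dropped by `HAVING COUNT(*) >= 2`. With a threshold of 18 no group qualifies and the result is empty.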

---

## LINKS

<a href="https://learn.microsoft.com/en-us/dotnet/standard/base-types/custom-numeric-format-strings">.NET / Custom numeric format strings</a>\
<a href="https://www.sqltutorial.org/sql-aggregate-functions">sqltutorial / sql-aggregate functions</a>\
<a href="https://learn.microsoft.com/en-us/sql/t-sql/data-types/data-types-transact-sql?view=sql-server-ver16">t-sql / data types</a>

# **Common data types in SQL**

### **Ints**

| Data type | Range | Range expression | Storage |
| --------- | ----- | ---------------- | ------- |
| bigint | -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 | -2^63 to 2^63-1 | 8 Bytes |
| int | -2,147,483,648 to 2,147,483,647 | -2^31 to 2^31-1 | 4 Bytes |
| smallint | -32,768 to 32,767 | -2^15 to 2^15-1 | 2 Bytes |
| tinyint | 0 to 255 | 2^0-1 to 2^8-1 | 1 Byte |
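
The ranges in the table follow directly from the storage size. As a quick check in Python (the helper `signed_range` is written for this note): a signed type with `b` bits spans -2^(b-1) to 2^(b-1)-1, while `tinyint` is unsigned and spans 0 to 2^8-1.

```python
def signed_range(bits: int) -> tuple:
    """Smallest and largest value of a signed two's-complement integer."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1

ranges = {
    "bigint": signed_range(64),    # 8 bytes
    "int": signed_range(32),       # 4 bytes
    "smallint": signed_range(16),  # 2 bytes
    "tinyint": (0, 2 ** 8 - 1),    # 1 byte, unsigned
}

for name, (low, high) in ranges.items():
    print(f"{name:9} {low:,} to {high:,}")
```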

### **Floats**

| `n` in `float(n)` | Precision | Storage size |
| ------- | --------- | ------------ |
| 1-24 | 7 digits | 4 bytes |
| 25-53 | 15 digits | 8 bytes |
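
To get a feel for what 7 versus 15 digits means, the sketch below round-trips one number through a 4-byte and an 8-byte IEEE 754 float using Python's `struct` module. It assumes that the two storage sizes in the table correspond to IEEE single and double precision; the test value is arbitrary.

```python
import struct

value = 0.123456789  # nine significant digits

# Pack into 4 bytes (single precision) and 8 bytes (double precision), then read back.
single = struct.unpack("<f", struct.pack("<f", value))[0]
double = struct.unpack("<d", struct.pack("<d", value))[0]

print(struct.calcsize("<f"), single)  # 4 bytes: only about 7 digits survive
print(struct.calcsize("<d"), double)  # 8 bytes: the value comes back unchanged
```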

### **DATE, DATETIME2, DATETIME**

- DATE: Stores dates only, without any time component. It can represent any date between January 1, 0001 and December 31, 9999, and it takes up 3 bytes of storage.

- DATETIME2: Stores both date and time values, with a higher precision than DATETIME. It can represent any date between January 1, 0001 and December 31, 9999, with a time accuracy of 100 nanoseconds.

- DATETIME: Stores both date and time values, but with a lower precision than DATETIME2. It can represent any date between January 1, 1753 and December 31, 9999, with a time accuracy of about 3.33 milliseconds.


### **CHAR, VARCHAR, TEXT** (ASCII)

- CHAR: Stores fixed-length character strings, where the length is specified when the column is defined. If a shorter string is stored, it is padded with spaces to fill the remaining length, so a CHAR(10) column always takes up 10 bytes of storage, regardless of the length of the actual string.

- VARCHAR: Stores variable-length character strings, where the maximum length is specified when the column is defined. If a shorter string is stored, it only takes up the necessary amount of storage, without any padding.

- TEXT: Stores large variable-length character strings, where the maximum length is not specified when the column is defined. TEXT columns can store up to 2^31-1 bytes of data (about 2 GB), and they do not require any padding or fixed-length allocation. TEXT is typically used for storing large amounts of text data, such as the contents of a book or a long article.
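
The padding difference between CHAR and VARCHAR can be modeled in a few lines of Python. This is a toy model, not how a database engine is implemented: `store_char` and `store_varchar` are hypothetical helpers that only mimic what ends up stored, and they raise an error for values that are too long, which may not match what your database does.

```python
def store_char(value: str, n: int) -> str:
    """Model of CHAR(n): pad with trailing spaces to exactly n characters."""
    if len(value) > n:
        raise ValueError(f"{value!r} does not fit in CHAR({n})")
    return value.ljust(n)

def store_varchar(value: str, n: int) -> str:
    """Model of VARCHAR(n): keep the value as it is, up to n characters."""
    if len(value) > n:
        raise ValueError(f"{value!r} does not fit in VARCHAR({n})")
    return value

as_char = store_char("abc", 10)
as_varchar = store_varchar("abc", 10)

print(len(as_char))     # 10
print(len(as_varchar))  # 3
```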

In [None]:
   / \  / ___| / ___|_ _|_ _|
  / _ \ \___ \| |    | | | |
 / ___ \ ___) | |___ | | | |
/_/   \_\____/ \____|___|___|

### **NCHAR, NVARCHAR, NTEXT** Same as above, but Unicode 🦄

# **Use nvarchar for new dbs**

In [None]:
🌑🌑🌑🌑🌑🌑🌑🌑🌑🌑🌑
🌑🌑🌑🌔🌕🌕🌕🌘🌑🌑🌑
🌑🌑🌓🌕🌕🌕🌕🌕🌗🌑🌑
🌑🌑🌔🌕🌘🌑🌒🌕🌗🌑🌑
🌑🌑🌕🌘🌑🌑🌑🌒🌖🌑🌑
🌑🌒🌕🌕🌘🌑🌒🌕🌕🌑🌑
🌑🌒🌕🌕🌕🌕🌕🌕🌕🌑🌑
🌑🌒🌕🌕🌕🌕🌕🌕🌕🌑🌑
🌑🌒🌕🌕🌕🌖🌕🌕🌕🌑🌑
🌑🌒🌕🌕🌗🌑🌓🌕🌕🌑🌑
🌒🌕🌕🌕🌗🌔🌕🌕🌕🌑🌑
🌑🌑🌑🌑🌑🌑🌑🌑🌑🌑🌑
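
Why the moon above needs a Unicode type can be illustrated in Python by trying to encode a few strings. This sketch assumes that the non-Unicode types behave like plain ASCII and that the `N` types store UTF-16, which is a simplification (real databases use code pages and collations). `bytes_needed` is a helper written for this note.

```python
def bytes_needed(text: str) -> tuple:
    """Return (ASCII bytes or None if not encodable, UTF-16 bytes)."""
    try:
        ascii_bytes = len(text.encode("ascii"))
    except UnicodeEncodeError:
        ascii_bytes = None  # contains characters outside ASCII
    utf16_bytes = len(text.encode("utf-16-le"))
    return ascii_bytes, utf16_bytes

for sample in ["SQL", "städer", "🌑"]:
    print(sample, bytes_needed(sample))
# SQL (3, 6)
# städer (None, 12)
# 🌑 (None, 4)
```

Plain English text fits in ASCII at one byte per character, while `ä` and the emoji do not fit at all. UTF-16 handles all three, at two bytes per character for most text and four for characters such as the emoji.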

---