# Relational Database Options for GCP

| Option | Notes | Use cases | Pricing |
|-|-|-|-|
| VM Instances | Gives full control over the whole database server (you need to set up everything yourself, from the OS installation to the database configuration). | Used when the database you need is not supported by Cloud SQL, BigQuery, or the other managed databases on GCP. | Storage, instance size, and runtime |
| Cloud SQL | Allows control over only some of the database configuration, because Google manages the OS and the database installation. | Ideal for relational data with OLTP (transaction-based) workloads. It can also handle simple OLAP (analytical) queries, such as counting the number of inventors. | Storage, instance size, and runtime |
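| BigQuery | Serverless: Google manages all of the infrastructure, so there is no instance, OS, or database installation to configure. | Ideal for complex OLAP queries over big data, because it scales better for that kind of work. For example, computing the average number of patents per inventor by country. | Storage and the amount of data processed by queries (on-demand pricing) |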
| Cloud Spanner | Allows control over only some of the database configuration, because Google manages the OS and the database installation. | Ideal for relational data with OLTP (transaction-based) workloads that need to scale globally (for example, 10,000 transactions per second around the clock). Commonly used in banking and gaming, where near-zero downtime is required. | Storage, instance size, and runtime |
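
To make the two example queries from the table concrete, here is a minimal sketch that runs both against a toy dataset. It uses Python's built-in `sqlite3` module as a stand-in for a real database server. The `inventors` and `patents` tables, their columns, and the sample rows are made up for illustration and are not the actual Patents View schema.

```python
import sqlite3

# In-memory database used only as a stand-in for Cloud SQL / BigQuery.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE inventors (
    id      INTEGER PRIMARY KEY,
    name    TEXT,
    country TEXT
);
CREATE TABLE patents (
    id          INTEGER PRIMARY KEY,
    inventor_id INTEGER REFERENCES inventors(id),
    title       TEXT
);
INSERT INTO inventors VALUES (1, 'Ada', 'GB'), (2, 'Grace', 'US'), (3, 'Linus', 'US');
INSERT INTO patents (inventor_id, title)
VALUES (1, 'A'), (2, 'B'), (3, 'C'), (3, 'D'), (3, 'E');
""")

# Simple analytical query: count the number of inventors.
inventor_count = conn.execute("SELECT COUNT(*) FROM inventors").fetchone()[0]

# More complex analytical query: average number of patents per inventor,
# grouped by country. Inventors with no patents are left out by the inner join.
avg_by_country = conn.execute("""
SELECT i.country, AVG(p.patent_count)
FROM inventors AS i
JOIN (
    SELECT inventor_id, COUNT(*) AS patent_count
    FROM patents
    GROUP BY inventor_id
) AS p ON p.inventor_id = i.id
GROUP BY i.country
ORDER BY i.country
""").fetchall()

print(inventor_count)
print(avg_by_country)
```

On a dataset this small both queries are instant. The difference only shows up at scale, where the second kind of query (a join plus aggregation over many rows) is the workload the table lists under BigQuery.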

Given our use case for the Patents View, BigQuery is the best fit. However, I need to study it thoroughly first, as I have only scratched the surface.
For now, let's start with Cloud SQL, as it is the one I'm most familiar with.

Let's revisit BigQuery once I am fully familiar with it (most likely after we finish the Backend topic).

More information here:
- https://www.toptal.com/database/google-bigquery-tutorial
- https://k21academy.com/google-cloud/cloud-sql-vs-cloud-spanner/

## Cloud SQL

You can think of this as a pre-built Docker container that you only need to start and stop.

Some of the things to take note of when creating an instance:
- `Instance ID` - The name of the instance (**This is a permanent configuration**)
- `Password` - The password for the database server
- `Database Version` - The "PostgreSQL" server version
    - Usually, we use the latest version to future-proof the database. If an existing system uses an older version that is incompatible with the latest one (this rarely happens), match the existing system's version instead
- `Region` - The region where the server will be deployed (**This is a permanent configuration**)
    - For production, use the region where most of the compute resources (Backend Services) will be deployed. This is generally the region where the customers are located
    - For testing, you can use the one closest to your location for lower network latency
- `Zonal Availability` - The zone within the region where the server will be deployed
    - For production systems, use multiple zones so that if the primary zone has an outage (a natural calamity, for example), the secondary zone takes over
    - For testing, use a single zone to minimize cost
    - Any zone will do. The only time you need to pick a specific zone is when strict regulations forbid deploying servers in certain locations
- `Machine Type` - The "size" of the instance where you pick CPU and Memory sizes
    - This will directly affect the performance of the instance
    - Upgrading/Downgrading the configurations here is commonly known as "vertical" scaling, since you are updating the "size" of the instance
    - For testing, we can start with the smallest one (shared), then work our way upwards
        - The same applies to production, except that instead of starting with the smallest size, we start with whichever size proved "fast enough" in testing
- `Storage` - The "size" of the storage for the database
    - Typically, we use `SSD` as it is much faster than `HDD`
    - For testing, we can use `HDD` to save some costs, since tests will likely hold less data than production
    - For the capacity, we can start with the smallest size and let Google "autoscale" it to avoid overprovisioning
- `Connections` - Allows us to select the networks that can connect to the server
    - For production, always use Private IP so that anyone from the internet cannot access it directly
    - Users who need to access the data typically connect to a backend service, and the backend service is the one that communicates with the database
        - This flow lets us show each user only the data they are authorized to see, rather than everything stored in the database
    - For testing, we can use public IP so that we can connect to it directly. I suggest we allow only our own IP addresses and not the whole internet
    - Also enable "private" connections so that the backend can connect once we start deploying it
- `Backups` - Allows for automatic backup creation for the database
    - Useful for production systems so that we can recover lost data after a breach or a mistake in production
    - For testing, we can turn this off to save some costs
- `Maintenance` - The schedule during which Google (or you) can stop the instance for maintenance purposes (such as upgrading the database version)
    - Pick a day and time with the fewest active users


Additional notes
- After creation, make sure to allow only SSL connections: select the instance, go to the "SECURITY" tab, then tick `Allow only SSL connections`
    - This will ensure that all data transfers are encrypted
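
The setting above makes the server reject unencrypted connections, so clients should also request SSL when connecting. As a minimal sketch, the snippet below builds a PostgreSQL (libpq-style) keyword/value connection string with `sslmode=require`. The host, database name, user, and password are placeholders, and `quote_value` and `build_dsn` are hypothetical helper names, not part of any library.

```python
def quote_value(value):
    """Single-quote a value for a libpq keyword/value connection string."""
    text = str(value)
    # Backslashes and single quotes must be escaped inside quoted values.
    escaped = text.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


def build_dsn(host, dbname, user, password, port=5432, sslmode="require"):
    """Build a connection string that asks the client to use SSL."""
    params = {
        "host": host,
        "port": port,
        "dbname": dbname,
        "user": user,
        "password": password,
        "sslmode": sslmode,  # "require" = refuse to connect without encryption
    }
    return " ".join(f"{key}={quote_value(val)}" for key, val in params.items())


# Placeholder values for illustration only.
dsn = build_dsn("10.0.0.5", "patents", "app_user", "s3cret pass")
print(dsn)
```

A string like this can be passed to a PostgreSQL driver (for example `psycopg2.connect(dsn)`, assuming that driver is installed). Note that `sslmode=require` encrypts the connection but does not verify the server's certificate; the stricter `verify-ca` and `verify-full` modes do, and they need the server's CA certificate on the client.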