# Functions for Manipulating Data in PostgreSQL


Below are some example queries practiced during the [course](https://campus.datacamp.com/courses/functions-for-manipulating-data-in-postgresql/).

Overview:
- Using Extensions: `SIMILARITY`, User-defined datatypes, User-defined functions
- Full text search: `TSVECTOR`, `LIKE`
- String operations: `SUBSTR`, `POSITION`, `LENGTH`, `CONCAT`, `REPLACE`, Changing Case
- Datetime operations: `DATE_TRUNC`, `TIMESTAMP`, `CAST`, `INTERVAL`
- Filtering using: Contains `@>` or `ANY`
- Accessing data in an Array using an index `[1]` and `=`
- Using `INFORMATION_SCHEMA.COLUMNS` to understand table datatypes

## Extending functionality using Extensions

In [None]:
-- Enable the pg_trgm extension
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Enable the fuzzystrmatch extension
CREATE EXTENSION IF NOT EXISTS fuzzystrmatch;

### Similarity between 2 strings

In [None]:
SELECT
	-- Calculate the similarity between the title and the description
	SIMILARITY(title, description),
	-- Select the title and description columns
	title,
	description
FROM dvdrentals.film;


| similarity | title | description |
|--------------|-----------|------------|
| 0 | BEACH HEARTBREAKERS | A Fateful Display of a Womanizer And a Mad Scientist who must Outgun a A Shark in Soviet Georgia |
| 0.022222223 | BEAST HUNCHBACK | A Awe-Inspiring Epistle of a Student And a Squirrel who must Defeat a Boy in Ancient China |
| 0.029126214 | BEDAZZLED MARRIED | A Astounding Character Study of a Madman And a Robot who must Meet a Mad Scientist in An Abandoned Fun House |
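
To get a feel for the scores without touching the `film` table, the functions can be tried on string literals. This is a minimal sketch that assumes both extensions from the previous section are enabled; the literals are made-up sample values, and `LEVENSHTEIN` (from `fuzzystrmatch`) is included only as a second way to compare strings.

```sql
-- Requires the pg_trgm and fuzzystrmatch extensions.
-- The string literals are illustrative sample values, not rows from the film table.
SELECT
    -- Trigram similarity ranges from 0 (no shared trigrams) to 1 (identical strings)
    SIMILARITY('GOLDFINGER', 'GOLDFINGER')   AS identical,
    SIMILARITY('GOLDFINGER', 'GOLD FINGER')  AS near_match,
    -- Edit distance: number of single-character edits between the two strings
    LEVENSHTEIN('GOLDFINGER', 'GOLD FINGER') AS edit_distance;
```

`identical` should come back as 1 and `edit_distance` as 1 (one inserted space); `near_match` lands somewhere in between 0 and 1.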

### User-defined Datatypes

In [None]:
-- Create an enumerated data type, compass_position
CREATE TYPE compass_position AS ENUM (
  	-- Use the four cardinal directions
  	'North',
  	'South',
  	'East',
  	'West'
);
-- Confirm the new data type is in the pg_type system table
SELECT typname
FROM pg_type
WHERE typname='compass_position';

Running this cell in the workspace fails with `permission denied for schema public`, because creating a type requires the `CREATE` privilege on the schema. With sufficient privileges, the confirmation query returns:

| typname |
|--------------|
| compass_position |

### User-defined Functions

In [None]:
-- Select the film title and inventory ids
SELECT
	f.title,
    i.inventory_id,
    -- Determine whether the inventory is held by a customer
    inventory_held_by_customer(i.inventory_id) as held_by_cust
FROM film as f
	INNER JOIN inventory AS i ON f.film_id=i.film_id
WHERE
	-- Only include results where the held_by_cust is not null
    inventory_held_by_customer(i.inventory_id) IS NOT NULL

Against the `dvdrentals` schema used in this notebook (with every table and the function schema-qualified), the query fails with `function dvdrentals.inventory_held_by_customer(integer) does not exist`: the user-defined function is not available there. The expected output is:

| title | inventory_id | held_by_cust |
|--------------|-----------|------------|
| ACE GOLDFINGER | 9 | 366 |
| AFFAIR PREJUDICE | 21 | 111 |
| AFRICAN EGG | 25 | 590 |
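
For reference, a user-defined function of this kind could look roughly like the sketch below. This is a hypothetical implementation, not the course's actual definition: it assumes that an unreturned rental (a `NULL` `return_date`) means the item is still held by the customer, and it needs the `CREATE` privilege on the target schema.

```sql
-- Hypothetical sketch: the course's real function may be implemented differently.
-- Uses the rental columns listed later in this notebook
-- (inventory_id, customer_id, return_date).
CREATE OR REPLACE FUNCTION dvdrentals.inventory_held_by_customer(p_inventory_id integer)
RETURNS integer AS $$
    -- A rental without a return date means the item is still with the customer
    SELECT r.customer_id
    FROM dvdrentals.rental AS r
    WHERE r.inventory_id = p_inventory_id
      AND r.return_date IS NULL
    LIMIT 1;
$$ LANGUAGE sql STABLE;
```

With such a function in place, the function returns `NULL` for inventory that is not currently rented out, which is what the `IS NOT NULL` filter in the query relies on.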

## Full text search

### `TO_TSVECTOR` as an alternative to `LIKE` and `REGEX`

In [1]:
SELECT title, description
FROM dvdrentals.film
-- Look for titles that contain the word 'elf'
WHERE to_tsvector(title) @@ to_tsquery('elf');

| title | description |
|--------------|-----------|
| ELF MURDER | A Action-Packed Story of a Frisbee And a Woman... |
| ENCINO ELF | A Astounding Drama of a Feminist And a Teacher... |
| GHOSTBUSTERS ELF | A Thoughtful Epistle of a Dog And a Feminist w... |
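
The difference from `LIKE` is that full text search compares whole words (lexemes) rather than substrings. A small sketch on literals illustrates this; `'SHELF LIFE'` is a made-up title used only for the comparison, and the `'english'` configuration is passed explicitly as an assumption so both sides are parsed the same way.

```sql
-- Literal sample titles; 'SHELF LIFE' is a made-up value for illustration.
SELECT
    -- Full text search matches the word 'elf'
    to_tsvector('english', 'ENCINO ELF') @@ to_tsquery('english', 'elf') AS fts_elf_title,
    -- ...but not a word that merely contains those letters
    to_tsvector('english', 'SHELF LIFE') @@ to_tsquery('english', 'elf') AS fts_shelf_title,
    -- LIKE matches any substring, so it also hits 'SHELF'
    'SHELF LIFE' LIKE '%ELF%' AS like_shelf_title;
```

The first and third columns should be true and the second false.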


### `LIKE` operator

In [2]:
SELECT *
FROM dvdrentals.film
-- Select only records that contain the word 'GOLD'
WHERE title LIKE '%GOLD%'
LIMIT 5;

| film_id | title | description | release_year | language_id | rental_duration | rental_rate | length | replacement_cost | rating | special_features |
|---|---|---|---|---|---|---|---|---|---|---|
| 2 | ACE GOLDFINGER | A Astounding Epistle of a Database Administrat... | 2010 | 1 | 3 | 4.99 | 48 | 12.99 | G | {Trailers,"Deleted Scenes"} |
| 95 | BREAKFAST GOLDFINGER | A Beautiful Reflection of a Student And a Stud... | 2009 | 4 | 5 | 4.99 | 123 | 18.99 | G | {Trailers,Commentaries,"Deleted Scenes"} |
| 365 | GOLD RIVER | A Taut Documentary of a Database Administrator... | 2008 | 6 | 4 | 4.99 | 154 | 21.99 | R | {Trailers,Commentaries,"Deleted Scenes","Behin... |
| 366 | GOLDFINGER SENSIBILITY | A Insightful Drama of a Mad Scientist And a Hu... | 2008 | 3 | 3 | 0.99 | 93 | 29.99 | G | {Trailers,Commentaries,"Behind the Scenes"} |
| 367 | GOLDMINE TYCOON | A Brilliant Epistle of a Composer And a Frisbe... | 2008 | 2 | 6 | 0.99 | 153 | 20.99 | R | {Trailers,"Behind the Scenes"} |


### String operations: LEFT, RIGHT, SUBSTRING, ...

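`SUBSTR` ( [full_string] , [start_char] , [ length_char ] ) - returns the substring of `length_char` characters starting at position `start_char` (positions start at 1)

`POSITION` ( [substring] IN [full_string] ) - returns the 1-based index of the first occurrence of the substring, or 0 if it is not found

`LENGTH` ( [string] ) - returns the number of characters in the string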

In [3]:
SELECT
  -- Select only the street name from the address table
  SUBSTR(address, POSITION(' ' IN address)+1, LENGTH(address)) AS substring
FROM dvdrentals.address
LIMIT 3;

| substring |
|--------------|
| MySakila Drive |
| MySQL Boulevard |
| Workhaven Lane |
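
The three functions can also be checked one at a time on a literal. In this sketch the address `'47 MySakila Drive'` is an assumed sample value (only the street name appears in the output above), and `SUBSTR` is called without a length, which returns everything from the start position to the end of the string.

```sql
-- '47 MySakila Drive' is an assumed sample address for illustration.
SELECT
    POSITION(' ' IN '47 MySakila Drive') AS first_space,   -- 3
    LENGTH('47 MySakila Drive')          AS total_length,  -- 17
    -- Start one character after the first space and read to the end
    SUBSTR('47 MySakila Drive',
           POSITION(' ' IN '47 MySakila Drive') + 1) AS street_name;  -- MySakila Drive
```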


### String operations: Concatenate, Replace...
Changing case: `UPPER`, `LOWER`, `INITCAP`, etc.

In [4]:
SELECT
  -- Concatenate the category name (converted to uppercase) with the film title (converted to title case)
  UPPER(c.category)  || ': ' || INITCAP(f.title) AS film_category,
  -- Convert the description column to lowercase, practice replacement too
  REPLACE(LOWER(f.description), ' ', '_') AS description
FROM
  dvdrentals.film AS f
  INNER JOIN dvdrentals.category AS fc
  	ON f.film_id = fc.film_id
  INNER JOIN dvdrentals.category AS c
  	ON fc.category = c.category
LIMIT 5;

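| film_category | description |
|--------------|-----------|
| DOCUMENTARY: Academy Dinosaur | a_epic_drama_of_a_feminist_and_a_mad_scientist... |
| DOCUMENTARY: Academy Dinosaur | a_epic_drama_of_a_feminist_and_a_mad_scientist... |
| DOCUMENTARY: Academy Dinosaur | a_epic_drama_of_a_feminist_and_a_mad_scientist... |
| DOCUMENTARY: Academy Dinosaur | a_epic_drama_of_a_feminist_and_a_mad_scientist... |
| DOCUMENTARY: Academy Dinosaur | a_epic_drama_of_a_feminist_and_a_mad_scientist... |

The five rows are identical because the second join matches `category` to itself on the category name, so each film is repeated once for every row that shares its category. Joining `dvdrentals.category` only once (and taking the category name from that join) would return one row per film.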
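
The same building blocks on literals, as a minimal sketch. The sample strings are illustrative; `CONCAT` is shown next to the `||` operator because the overview lists it.

```sql
SELECT
    -- Concatenation with the || operator
    UPPER('Documentary') || ': ' || INITCAP('ACADEMY DINOSAUR')      AS with_operator,  -- DOCUMENTARY: Academy Dinosaur
    -- The same result with the CONCAT function
    CONCAT(UPPER('Documentary'), ': ', INITCAP('ACADEMY DINOSAUR'))  AS with_concat,    -- DOCUMENTARY: Academy Dinosaur
    -- Lowercase first, then replace every space with an underscore
    REPLACE(LOWER('A Epic Drama'), ' ', '_')                         AS snake_case;     -- a_epic_drama
```

One difference worth knowing: `||` returns `NULL` if any operand is `NULL`, while `CONCAT` ignores `NULL` arguments.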


### Using `DATE_TRUNC` to truncate data at different precision levels 
(especially useful for `GROUP BY`)

In [8]:
-- Number of rentals per calendar day
SELECT
  DATE_TRUNC('day', rental_date) AS rental_day,
  -- Count total number of rentals
  COUNT(*) AS rentals
FROM dvdrentals.rental
GROUP BY 1
ORDER BY rental_day;

| rental_day | rentals |
|--------------|-----------|
| 2005-05-25 00:00:00+00:00 | 122 |
| 2005-05-26 00:00:00+00:00 | 166 |
| 2005-05-27 00:00:00+00:00 | 169 |
| 2005-05-28 00:00:00+00:00 | 193 |
| 2005-05-29 00:00:00+00:00 | 160 |
| 2005-05-30 00:00:00+00:00 | 159 |
| 2005-05-31 00:00:00+00:00 | 173 |
| 2005-06-01 00:00:00+00:00 | 14 |
| 2005-06-15 00:00:00+00:00 | 298 |
| 2005-06-16 00:00:00+00:00 | 335 |
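
To see what the different precision levels do, `DATE_TRUNC` can be applied to a single literal timestamp. This is a small sketch; the timestamp is a sample value written without a time zone so the results do not depend on the session settings.

```sql
SELECT
    DATE_TRUNC('hour',  TIMESTAMP '2005-05-25 02:54:33') AS to_hour,   -- 2005-05-25 02:00:00
    DATE_TRUNC('day',   TIMESTAMP '2005-05-25 02:54:33') AS to_day,    -- 2005-05-25 00:00:00
    DATE_TRUNC('month', TIMESTAMP '2005-05-25 02:54:33') AS to_month,  -- 2005-05-01 00:00:00
    DATE_TRUNC('year',  TIMESTAMP '2005-05-25 02:54:33') AS to_year;   -- 2005-01-01 00:00:00
```

Every timestamp inside the same hour, day, month or year collapses to the same value, which is what makes the function convenient as a `GROUP BY` key.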


### Built-in `timestamp` features, and using `CAST`

In [6]:
SELECT
	-- Select the current date
	CURRENT_DATE,
	-- CAST the current time (with zero fractional-second digits) to a time without time zone
	CAST( CURRENT_TIME(0) AS time );

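| current_date | current_time |
|------------|------------|
| 2023-10-10 00:00:00+00:00 | 2023-10-10 13:28:23 |

The notebook renders both values as full timestamps. As plain `date` and `time` values (from a separate run), the result looks like:

| CURRENT_DATE | CURRENT_TIME |
|------------|------------|
| 2023-10-10 | 15:27:48 |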
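
Casting the result of `NOW()` works the same way. The sketch below shows the standard `CAST` syntax next to PostgreSQL's `::` shorthand; the column aliases are arbitrary names chosen for this example, and the output depends on when the query is run.

```sql
SELECT
    NOW()                  AS now_with_time_zone,  -- full timestamp with time zone
    CAST(NOW() AS date)    AS today_cast,          -- date part only, standard CAST syntax
    NOW()::date            AS today_shorthand,     -- same cast, PostgreSQL :: shorthand
    CAST(NOW() AS time(0)) AS time_now;            -- time of day without fractional seconds
```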


### Datetime operations: `INTERVAL` subtract and add.

In [5]:
-- Prepare a CTE so the computed columns can be reused in further calculations
WITH joined_film_incl_expected_return AS (

    SELECT
        f.title,
        r.rental_date,
        f.rental_duration,
        -- Add the rental duration (in days) to the rental date
        INTERVAL '1' day * f.rental_duration + r.rental_date AS expected_return_date,
        r.return_date
    FROM dvdrentals.film AS f
    INNER JOIN dvdrentals.inventory AS i ON f.film_id = i.film_id
    INNER JOIN dvdrentals.rental AS r ON i.inventory_id = r.inventory_id
)

SELECT
    *,
    -- Compute the overdue_by interval (negative when the film was returned early)
    J.return_date - J.expected_return_date AS overdue_by
FROM joined_film_incl_expected_return AS J
-- Sort in the outer query: an ORDER BY inside a CTE is not guaranteed to carry over
ORDER BY J.title;

| row | title | rental_date | rental_duration | expected_return_date | return_date | overdue_by |
|---|---|---|---|---|---|---|
| 0 | ACADEMY DINOSAUR | 2005-08-03 00:13:10+00:00 | 6 | 2005-08-09 00:13:10+00:00 | 2005-08-12 01:35:10+00:00 | {'days': 3, 'hours': 1, 'minutes': 22} |
| 1 | ACADEMY DINOSAUR | 2005-08-02 04:47:19+00:00 | 6 | 2005-08-08 04:47:19+00:00 | 2005-08-03 04:02:19+00:00 | {'days': -5, 'minutes': -45} |
| 2 | ACADEMY DINOSAUR | 2005-07-10 17:07:31+00:00 | 6 | 2005-07-16 17:07:31+00:00 | 2005-07-16 17:03:31+00:00 | {'minutes': -4} |
| 3 | ACADEMY DINOSAUR | 2005-05-31 00:21:07+00:00 | 6 | 2005-06-06 00:21:07+00:00 | 2005-06-06 04:36:07+00:00 | {'hours': 4, 'minutes': 15} |
| 4 | ACADEMY DINOSAUR | 2005-08-23 03:56:37+00:00 | 6 | 2005-08-29 03:56:37+00:00 | 2005-08-25 22:58:37+00:00 | {'days': -3, 'hours': -4, 'minutes': -58} |
| ... | ... | ... | ... | ... | ... | ... |
| 16039 | ZORRO ARK | 2005-06-16 01:50:32+00:00 | 3 | 2005-06-19 01:50:32+00:00 | 2005-06-17 05:02:32+00:00 | {'days': -1, 'hours': -20, 'minutes': -48} |
| 16040 | ZORRO ARK | 2005-08-01 14:11:25+00:00 | 3 | 2005-08-04 14:11:25+00:00 | 2005-08-06 08:52:25+00:00 | {'days': 1, 'hours': 18, 'minutes': 41} |
| 16041 | ZORRO ARK | 2005-06-16 04:52:51+00:00 | 3 | 2005-06-19 04:52:51+00:00 | 2005-06-20 23:33:51+00:00 | {'days': 1, 'hours': 18, 'minutes': 41} |
| 16042 | ZORRO ARK | 2005-07-07 18:22:45+00:00 | 3 | 2005-07-10 18:22:45+00:00 | 2005-07-08 19:10:45+00:00 | {'days': -1, 'hours': -23, 'minutes': -12} |


### Filtering using the 'Contains' operator `@>`, an alternative to the `ANY` construct.

In [35]:
SELECT
  title,
  special_features
FROM dvdrentals.film
-- Filter where special_features contains 'Deleted Scenes'
WHERE special_features::text[] @> ARRAY['Deleted Scenes']::text[]
-- cast as an array of text using the ::text[] syntax. This will allow the comparison to be performed correctly.
LIMIT 10;

| title | special_features |
|--------------|-----------|
| ACADEMY DINOSAUR | {"Deleted Scenes","Behind the Scenes"} |
| ACE GOLDFINGER | {Trailers,"Deleted Scenes"} |
| ADAPTATION HOLES | {Trailers,"Deleted Scenes"} |
| AFRICAN EGG | {"Deleted Scenes"} |
| AGENT TRUMAN | {"Deleted Scenes"} |
| AIRPLANE SIERRA | {Trailers,"Deleted Scenes"} |
| ALABAMA DEVIL | {Trailers,"Deleted Scenes"} |
| ALADDIN CALENDAR | {Trailers,"Deleted Scenes"} |
| ALASKA PHANTOM | {Commentaries,"Deleted Scenes"} |
| ALI FOREVER | {"Deleted Scenes","Behind the Scenes"} |


### Filtering using the `ANY` operator

In [34]:
SELECT
    title,
    special_features
FROM
    dvdrentals.film
WHERE
    'Trailers' = ANY (special_features::text[])
-- By adding ::text[] after special_features, we are casting it as an array of text, which allows the ANY operator to work correctly.
LIMIT 5;

| title | special_features |
|--------------|-----------|
| ACE GOLDFINGER | {Trailers,"Deleted Scenes"} |
| ADAPTATION HOLES | {Trailers,"Deleted Scenes"} |
| AIRPLANE SIERRA | {Trailers,"Deleted Scenes"} |
| AIRPORT POLLOCK | {Trailers} |
| ALABAMA DEVIL | {Trailers,"Deleted Scenes"} |
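
Both filters can be compared side by side on a literal array. This is a minimal sketch with sample values; it also shows that `@>` can test for several values at once, which `= ANY` cannot do in a single comparison.

```sql
SELECT
    -- Does the left array contain every element of the right array?
    ARRAY['Trailers', 'Deleted Scenes'] @> ARRAY['Deleted Scenes']           AS contains_deleted,  -- true
    -- Is the value equal to any element of the array?
    'Deleted Scenes' = ANY (ARRAY['Trailers', 'Deleted Scenes'])             AS any_deleted,       -- true
    -- @> with two values: both must be present
    ARRAY['Trailers', 'Deleted Scenes'] @> ARRAY['Trailers', 'Commentaries'] AS contains_both;     -- false
```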


### Accessing data in an array the simple way, using an index and `=`

Note: writing `special_features::text[][1]` does not index the array. PostgreSQL reads `text[][1]` as a type name (array dimensions are part of the type syntax), so the whole column is cast to an array and then compared with the string `'Trailers'`, which fails with `malformed array literal: "Trailers"`. Wrapping the cast in parentheses applies the subscript to the cast result instead. Comparing two whole arrays (for example against `ARRAY['Trailers']`) only returns films whose single special feature is "Trailers".

In [14]:
-- Select the title and special features column
SELECT
  title,
  special_features
FROM dvdrentals.film
-- Cast to an array first, then take the first element (arrays are 1-based)
WHERE (special_features::text[])[1] = 'Trailers';
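
The effect of the parentheses can be seen on a literal, without the `film` table. In this sketch the array literal is a sample value in the same format as the `special_features` column.

```sql
SELECT
    -- Subscript applied to the parenthesised cast result (arrays are 1-based)
    ('{Trailers,"Deleted Scenes"}'::text[])[1] AS first_feature,   -- Trailers
    ('{Trailers,"Deleted Scenes"}'::text[])[2] AS second_feature,  -- Deleted Scenes
    -- Without parentheses, [][1] is read as part of the type name,
    -- so the result is still the whole array
    '{Trailers,"Deleted Scenes"}'::text[][1]   AS still_an_array;
```

Comparing `still_an_array` with a plain string is what triggers the `malformed array literal` error, because PostgreSQL then tries to read the string as an array.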

### Reading and doing calculations with Datetime:


In [32]:
SELECT
 	-- Select the rental and return dates
	rental_date,
	return_date,
 	-- Calculate the expected_return_date
	rental_date + interval '3 days' AS expected_return_date
FROM dvdrentals.rental
LIMIT 5;


| rental_date | return_date | expected_return_date |
|--------------|-----------|------------|
| 2005-05-25 02:54:33+00:00 | 2005-05-28 23:40:33+00:00 | 2005-05-28 02:54:33+00:00 |
| 2005-05-25 03:03:39+00:00 | 2005-06-02 02:12:39+00:00 | 2005-05-28 03:03:39+00:00 |
| 2005-05-25 03:04:41+00:00 | 2005-06-03 05:43:41+00:00 | 2005-05-28 03:04:41+00:00 |
| 2005-05-25 03:05:21+00:00 | 2005-06-02 08:33:21+00:00 | 2005-05-28 03:05:21+00:00 |
| 2005-05-25 03:08:07+00:00 | 2005-05-27 05:32:07+00:00 | 2005-05-28 03:08:07+00:00 |
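
The arithmetic itself can be checked on literals. This is a small sketch using the values of the first row above, written as timestamps without a time zone so the results do not depend on the session settings.

```sql
SELECT
    -- Timestamp plus interval gives a timestamp
    TIMESTAMP '2005-05-25 02:54:33' + INTERVAL '3 days'  AS plus_three_days,  -- 2005-05-28 02:54:33
    -- An interval can be multiplied by a number, e.g. a rental duration in days
    TIMESTAMP '2005-05-25 02:54:33' + INTERVAL '1 day' * 6 AS plus_six_days,  -- 2005-05-31 02:54:33
    -- Timestamp minus timestamp gives an interval
    TIMESTAMP '2005-05-28 23:40:33' - TIMESTAMP '2005-05-28 02:54:33' AS overdue_by;  -- 20:46:00
```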


### Let's see what data types we are dealing with:

In [31]:
-- Select the column names and data types from the COLUMNS system view
SELECT column_name, data_type
FROM INFORMATION_SCHEMA.COLUMNS
-- Filter by table and column name
WHERE table_name IN ('film')
	AND column_name IN ('title', 'special_features');

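| column_name | data_type |
|--------------|-----------|
| special_features | text |
| title | text |

`special_features` is stored as `text`, which is why the array queries above cast it with `::text[]`.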


In [27]:
-- Select the column names and data types from the COLUMNS system view
SELECT column_name, data_type
FROM INFORMATION_SCHEMA.COLUMNS
-- Filter by table name
WHERE table_name IN ('rental');

| column_name | data_type |
|--------------|-----------|
| rental_id | integer |
| rental_date | timestamp with time zone |
| inventory_id | integer |
| customer_id | integer |
| return_date | timestamp with time zone |


### Establishing the Version of PostgreSQL:

In [7]:
SELECT version();

| version |
|--------------|
| PostgreSQL 13.10 on aarch64-unknown-linux-gnu,... |
