# Data Manipulation in SQL

Here you can access every table used in the course. To access each table, you will need to specify the `soccer` schema in your queries (e.g., `soccer.match` for the `match` table, and `soccer.league` for the `league` table).

--- 
_Note: When using sample integrations such as those that contain course data, you have read-only access. You can run queries, but you cannot add, delete, or modify data, or create objects such as tables and views._

## Take Notes

Add notes about the concepts you've learned and SQL cells with queries you want to keep.

## **Raw data preview:**

**Match table:**

In [11]:
SELECT *
FROM soccer.match
LIMIT 5

| | id | country_id | season | stage | date | hometeam_id | awayteam_id | home_goal | away_goal |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 757 | 1 | 2011/2012 | 1 | 2011-07-29 00:00:00+00:00 | 1773 | 8635 | 2 | 1 |
| 1 | 758 | 1 | 2011/2012 | 1 | 2011-07-30 00:00:00+00:00 | 9998 | 9985 | 1 | 1 |
| 2 | 759 | 1 | 2011/2012 | 1 | 2011-07-30 00:00:00+00:00 | 9987 | 9993 | 3 | 1 |
| 3 | 760 | 1 | 2011/2012 | 1 | 2011-07-30 00:00:00+00:00 | 9991 | 9984 | 0 | 1 |
| 4 | 761 | 1 | 2011/2012 | 1 | 2011-07-30 00:00:00+00:00 | 9994 | 10000 | 0 | 0 |


**League table:**

In [1]:
SELECT *
FROM soccer.league
LIMIT 5

| | id | country_id | name |
|---|---|---|---|
| 0 | 1 | 1 | Belgium Jupiler League |
| 1 | 1729 | 1729 | England Premier League |
| 2 | 4769 | 4769 | France Ligue 1 |
| 3 | 7809 | 7809 | Germany 1. Bundesliga |
| 4 | 10257 | 10257 | Italy Serie A |


**Team table:**

In [9]:
SELECT *
FROM soccer.team
LIMIT 5;

| | id | team_api_id | team_long_name | team_short_name |
|---|---|---|---|---|
| 0 | 1 | 9987 | KRC Genk | GEN |
| 1 | 2 | 9993 | Beerschot AC | BAC |
| 2 | 3 | 10000 | SV Zulte-Waregem | ZUL |
| 3 | 4 | 9994 | Sporting Lokeren | LOK |
| 4 | 5 | 9984 | KSV Cercle Brugge | CEB |


**Country table:**

In [10]:
SELECT *
FROM soccer.country
LIMIT 5;

| | id | name |
|---|---|---|
| 0 | 1 | Belgium |
| 1 | 1729 | England |
| 2 | 4769 | France |
| 3 | 7809 | Germany |
| 4 | 10257 | Italy |


**Example 1.7** 

Let's generate a list of matches won by Italy's Bologna team.

In [12]:
-- Select the season, date, home_goal, and away_goal columns
SELECT
    season,
    date,
    home_goal,
    away_goal
FROM soccer.match
WHERE
    -- Keep only games won by Bologna (team id 9857).
    -- The CASE returns NULL when neither branch matches (Bologna did not win),
    -- so IS NOT NULL filters those rows out of the output.
    CASE
        WHEN hometeam_id = 9857 AND home_goal > away_goal
            THEN 'Bologna Win'
        WHEN awayteam_id = 9857 AND away_goal > home_goal
            THEN 'Bologna Win'
    END IS NOT NULL
LIMIT 100;
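
Because the `CASE` expression is non-NULL exactly when one of its two `WHEN` conditions is true, the same filter can also be written as a plain boolean predicate. The sketch below assumes the same `soccer.match` columns and the team id 9857 used in Example 1.7; it is an alternative phrasing, not the course's solution.

```sql
-- Sketch: the filter from Example 1.7 without a CASE expression.
-- A row is kept when Bologna (team id 9857) won at home OR won away.
SELECT
    season,
    date,
    home_goal,
    away_goal
FROM soccer.match
WHERE (hometeam_id = 9857 AND home_goal > away_goal)
   OR (awayteam_id = 9857 AND away_goal > home_goal)
LIMIT 100;
```

The `CASE` form is mainly useful when the label itself (`'Bologna Win'`) is also wanted as an output column. For filtering alone, the boolean form is shorter.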
		

### SELECT Subqueries:
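
A subquery in the `SELECT` list must return a single value, which is then repeated on every output row. As a minimal sketch, assuming the `soccer.match` columns shown in the raw data preview, the query below puts each match's goal total next to the overall average. The aliases `total_goals` and `overall_avg` are illustrative names.

```sql
-- Sketch: a scalar subquery in SELECT.
-- The inner query has no reference to the outer row, so it yields one value
-- (the average goals per match across the whole table).
SELECT
    id,
    season,
    (home_goal + away_goal) AS total_goals,
    (SELECT AVG(home_goal + away_goal)
     FROM soccer.match) AS overall_avg
FROM soccer.match
LIMIT 5;
```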

### FROM Subqueries:
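
A subquery in `FROM` acts as a temporary table that the outer query can filter, join, or aggregate further. The sketch below assumes the `soccer.match` columns from the raw data preview. The alias `sub` and the threshold `2.5` are arbitrary choices for illustration.

```sql
-- Sketch: a subquery in FROM.
-- The inner query aggregates matches per country; the outer query then
-- filters on the aggregated column, which a plain WHERE could not do.
SELECT
    sub.country_id,
    sub.avg_goals
FROM (
    SELECT
        country_id,
        AVG(home_goal + away_goal) AS avg_goals
    FROM soccer.match
    GROUP BY country_id
) AS sub
WHERE sub.avg_goals > 2.5   -- arbitrary illustrative threshold
ORDER BY sub.avg_goals DESC;
```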

### Correlated Subqueries

**Advantages of using subqueries:**

1. Work around the limits of joins: subqueries can break a complex query into smaller, more manageable parts.

2. Subqueries can match specific conditions or values from one table to another, allowing for more flexible and targeted queries.

3. Subqueries can simplify complex queries by filtering and aggregating data step by step.

4. Improve query performance: in some cases, a subquery can be more efficient than a join.

**Disadvantages:**

Correlated subqueries can have a high processing time because the inner query is evaluated repeatedly, conceptually once for each row of the outer query.

**Example 3.3** 

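What was the highest-scoring match for each `country` in each `season`?

The subquery is correlated on `country_id` and `season`, so conceptually it is re-evaluated for every row of the outer query, even though there are only `countries` × `seasons` distinct combinations.

(It took about 18 seconds to run on DataCamp.)

To do: compare the run time with another way of writing the query.

Note: the output may not look as expected at first. `season` is not selected, so rows from different seasons are hard to tell apart, and every match that ties for the maximum in the same country and season is returned.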


In [13]:
SELECT
    -- Select country ID, date, home, and away goals from match
    main.country_id,
    main.date,
    main.home_goal,
    main.away_goal
FROM soccer.match AS main
WHERE
    -- Filter for matches with the highest number of goals scored
    (main.home_goal + main.away_goal) =
        (SELECT MAX(sub.home_goal + sub.away_goal)
         FROM soccer.match AS sub
         WHERE main.country_id = sub.country_id
               AND main.season = sub.season);


| | country_id | date | home_goal | away_goal |
|---|---|---|---|---|
| 0 | 1 | 2011-10-29 00:00:00+00:00 | 4 | 5 |
| 1 | 1 | 2012-11-17 00:00:00+00:00 | 2 | 6 |
| 2 | 1 | 2012-12-09 00:00:00+00:00 | 1 | 7 |
| 3 | 1 | 2013-01-19 00:00:00+00:00 | 2 | 6 |
| 4 | 1 | 2012-08-19 00:00:00+00:00 | 2 | 6 |
| ... | ... | ... | ... | ... |
| 73 | 24558 | 2012-09-30 00:00:00+00:00 | 6 | 2 |
| 74 | 24558 | 2014-02-16 00:00:00+00:00 | 5 | 3 |
| 75 | 24558 | 2015-04-30 00:00:00+00:00 | 6 | 2 |
| 76 | 24558 | 2015-05-03 00:00:00+00:00 | 2 | 6 |
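
One way to compare run times is to compute the maximum per country and season once in a derived table and join back to it, instead of correlating a subquery. The sketch below assumes the same `soccer.match` columns. The aliases `best` and `max_goals` are illustrative, and `season` is added to the output so that rows from different seasons can be told apart.

```sql
-- Sketch: a join-based alternative to the correlated subquery in Example 3.3.
-- The derived table computes one maximum per (country_id, season);
-- the join keeps every match whose goal total equals that maximum,
-- so ties are still returned.
SELECT
    main.country_id,
    main.season,
    main.date,
    main.home_goal,
    main.away_goal
FROM soccer.match AS main
INNER JOIN (
    SELECT
        country_id,
        season,
        MAX(home_goal + away_goal) AS max_goals
    FROM soccer.match
    GROUP BY country_id, season
) AS best
    ON  main.country_id = best.country_id
    AND main.season = best.season
    AND (main.home_goal + main.away_goal) = best.max_goals
ORDER BY main.country_id, main.season, main.date;
```

Whether this runs faster than the correlated version depends on the database and its query planner, so it is worth timing both.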


## Common Table Expressions (CTE)

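- A later CTE can reference earlier ones, and the main query can reference any of them.
- Improves the organization and readability of code.
- Can improve query performance: the CTE is defined once and, depending on the database engine, may be evaluated once and reused rather than recomputed.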
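
The first point can be sketched with two chained CTEs. The example assumes the `soccer.match` and `soccer.country` columns shown in the raw data preview, and that `match.country_id` matches `country.id`. The names `match_totals` and `season_avg` are illustrative.

```sql
-- Sketch: a later CTE referencing an earlier one.
WITH match_totals AS (
    -- First CTE: total goals per match
    SELECT
        country_id,
        season,
        (home_goal + away_goal) AS total_goals
    FROM soccer.match
),
season_avg AS (
    -- Second CTE: reads from the first CTE by name
    SELECT
        country_id,
        season,
        AVG(total_goals) AS avg_goals
    FROM match_totals
    GROUP BY country_id, season
)
-- Main query: joins the second CTE to a regular table
SELECT
    c.name AS country,
    s.season,
    s.avg_goals
FROM season_avg AS s
LEFT JOIN soccer.country AS c ON s.country_id = c.id
ORDER BY c.name, s.season;
```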


**Before CTE:**
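
The original "before" cell is not included in these notes. As a sketch of how the same query might look without a CTE, the inner `SELECT` can sit in the `FROM` clause as a subquery. This assumes the tables live in the `soccer` schema and keeps the alias `match_list` for comparison.

```sql
-- Sketch: the query written with a FROM subquery instead of a CTE.
SELECT league, date, home_goal, away_goal
FROM (
    -- Select the league, date, home, and away goals
    SELECT
        l.name AS league,
        m.date,
        m.home_goal,
        m.away_goal,
        (m.home_goal + m.away_goal) AS total_goals
    FROM soccer.match AS m
    LEFT JOIN soccer.league AS l ON m.country_id = l.id
) AS match_list
-- Filter by total goals
WHERE total_goals >= 10;
```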

**With CTE:**

In [None]:
-- Set up your CTE
WITH match_list AS (
    -- Select the league, date, home, and away goals
    SELECT
        l.name AS league,
        m.date,
        m.home_goal,
        m.away_goal,
        (m.home_goal + m.away_goal) AS total_goals
    FROM soccer.match AS m
    LEFT JOIN soccer.league AS l ON m.country_id = l.id
)
-- Select the league, date, home, and away goals from the CTE
SELECT league, date, home_goal, away_goal
FROM match_list
-- Filter by total goals
WHERE total_goals >= 10;

## Explore Datasets
Use the `match`, `league`, and `country` tables to explore the data and practice your skills!
- Use the `match`, `league`, and `country` tables to return the number of matches played in Great Britain versus elsewhere in the world.
    - "England", "Scotland", and "Wales" should be categorized as "Great Britain"
    - All other leagues will need to be categorized as "World".
- Use the `match` and `country` tables to return the countries in which the average number of goals scored per match (home and away goals combined) is greater than the average across all matches.
- In a soccer league, points are assigned to teams based on the result of a game. Here, let's assume that 3 points are awarded for a win, 1 for a tie, and 0 for a defeat. Use the `match` table to calculate the running total of points earned by the team "Chelsea" (team id 8455) in the season "2014/2015".
    - The final output should have the match date, the points earned by Chelsea, and the running total.
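
The three exercises above could be approached as sketched below. These are possible approaches, not official solutions. They assume that `match.country_id` matches `country.id` (as the raw data preview suggests), that `hometeam_id` and `awayteam_id` hold the team id 8455 given in the exercise, and that the database supports window functions. The first query uses only `match` and `country`, since the country names are enough for the grouping. All aliases are illustrative.

```sql
-- 1. Matches played in Great Britain versus the rest of the world.
SELECT
    CASE
        WHEN c.name IN ('England', 'Scotland', 'Wales') THEN 'Great Britain'
        ELSE 'World'
    END AS region,
    COUNT(m.id) AS matches_played
FROM soccer.match AS m
LEFT JOIN soccer.country AS c ON m.country_id = c.id
GROUP BY
    CASE
        WHEN c.name IN ('England', 'Scotland', 'Wales') THEN 'Great Britain'
        ELSE 'World'
    END;

-- 2. Countries whose average goals per match exceed the overall average.
SELECT
    c.name AS country,
    AVG(m.home_goal + m.away_goal) AS avg_goals
FROM soccer.match AS m
LEFT JOIN soccer.country AS c ON m.country_id = c.id
GROUP BY c.name
HAVING AVG(m.home_goal + m.away_goal) >
       (SELECT AVG(home_goal + away_goal) FROM soccer.match)
ORDER BY avg_goals DESC;

-- 3. Running total of Chelsea's points (team id 8455) in 2014/2015:
--    3 points for a win, 1 for a tie, 0 for a defeat.
WITH chelsea AS (
    SELECT
        date,
        CASE
            WHEN hometeam_id = 8455 AND home_goal > away_goal THEN 3
            WHEN awayteam_id = 8455 AND away_goal > home_goal THEN 3
            WHEN home_goal = away_goal THEN 1
            ELSE 0
        END AS points
    FROM soccer.match
    WHERE season = '2014/2015'
      AND (hometeam_id = 8455 OR awayteam_id = 8455)
)
SELECT
    date,
    points,
    SUM(points) OVER (
        ORDER BY date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS running_total
FROM chelsea
ORDER BY date;
```

In the third query the CTE keeps the points logic separate from the window function, so each step can be checked on its own.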