# Real estate data project deliverable

Useful links: [Subject](https://simplonline.co/briefs/327207de-b68d-4563-bc7a-249a7637782d) | [Original dataset (.csv)](https://www.data.gouv.fr/fr/datasets/5c4ae55a634f4117716d5656/) <br>
This project was carried out by: Adil, Marianne, Nidal, Théo & Zohra

----
## 📍 TO DO:
- Find a solution for the data type of the `Code postal` column
<br>
<br>
-----

### Table of contents:
> 💡 Links only work when clicked from Théo's GitHub repo
- [Data dictionary](https://github.com/theoprovost/projet_imm#data_dictionary)
- [Conceptual data model (MCD)]()
- [Physical data model (MPD)]()

### Table of questions:
1. [Number of flats and houses sold in 2020](#)
2. [Number of properties sold per quarter](#)
3. [Proportion of property sales per quarter by property type](#)
4. [Proportion of flats sold by number of rooms](#)
5. [The 10 departments with the most property sales](#)
6. [The 10 departments with the fewest](#)
7. [Average price per square metre in IDF](#)
8. [List of the 10 most expensive flats with the department and the number of square metres](#)
9. [Evolution rate of the number of sales between the first and second quarter of 2020](#)
10. [List of municipalities where the number of sales rose by at least 20% between the first and second quarter of 2020](#)

In [3]:
# PG connection
import psycopg2 as pgconn

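def connect():
    """Open a connection to the local immo_db database."""
    try:
        conn = pgconn.connect(
            host='localhost',
            database='immo_db',
            user='postgres',
            password=''
        )
    except pgconn.Error as error:
        # Report the failure and re-raise: the caller must not receive
        # an exception object in place of a connection.
        print('⚠︎ Postgres connection error')
        print(error)
        raise
    # Only reached when the connection actually succeeded.
    print('Connection established with postgres.')
    return conn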
    

conn = connect()
conn.autocommit = True
cur = conn.cursor()

# Import pandas
import pandas as pd

# Import numpy
import numpy as np

# Import seaborn
import seaborn as sn

Connection established with postgres.
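
Every question in this notebook repeats the same pattern: execute a query, fetch the rows, then name the columns by hand. As a possible simplification, the column names can be read from the cursor itself. The sketch below is an assumption, not part of the original notebook: `fetch_records` is a hypothetical helper, and it is demonstrated on an in-memory `sqlite3` database (with made-up rows) only so that it runs without a Postgres server. It relies on `cursor.description`, which is defined by the DB-API 2.0 specification and is also available on psycopg2 cursors.

```python
import sqlite3


def fetch_records(cur, sql, params=None):
    """Run a query and return its rows as dicts keyed by column name.

    Hypothetical helper: intended for any DB-API 2.0 cursor.
    """
    if params is None:
        cur.execute(sql)
    else:
        cur.execute(sql, params)
    # cursor.description holds one sequence per column; item 0 is the column name.
    columns = [col[0] for col in cur.description]
    return [dict(zip(columns, row)) for row in cur.fetchall()]


# Demo on a throwaway in-memory database (made-up rows, not the project data).
demo_conn = sqlite3.connect(':memory:')
demo_cur = demo_conn.cursor()
demo_cur.execute('CREATE TABLE bien (id INTEGER, code_type_local INTEGER)')
demo_cur.executemany('INSERT INTO bien VALUES (?, ?)', [(1, 1), (2, 2), (3, 2)])

records = fetch_records(
    demo_cur,
    'SELECT code_type_local, count(*) AS total FROM bien '
    'GROUP BY code_type_local ORDER BY code_type_local',
)
print(records)
```

A list of dicts like `records` can be passed directly to `pd.DataFrame(records)`, which picks up the column names from the keys.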


----------
## Number of flats and houses sold in 2020

```sql
SELECT sum(total_houses + total_flats) AS sub_total, total_houses, total_flats FROM (
	SELECT * FROM 
	(
		SELECT sum(case when bien.code_type_local = 1 then 1 else 0 end) AS total_houses, 
		sum(case when bien.code_type_local = 2 then 1 else 0 end) AS total_flats
		FROM transaction LEFT JOIN bien ON transaction.bien_id = bien.id 
	) x 
) y GROUP BY total_flats, total_houses;
```
<br>

In [4]:
cur.execute('SELECT sum(total_houses + total_flats) AS sub_total, total_houses, total_flats FROM (SELECT * FROM (SELECT sum(case when bien.code_type_local = 1 then 1 else 0 end) AS total_houses, sum(case when bien.code_type_local = 2 then 1 else 0 end) AS total_flats FROM transaction LEFT JOIN bien ON transaction.bien_id = bien.id ) x ) y GROUP BY total_flats, total_houses;')
result = cur.fetchone()

df = pd.DataFrame(result)
df = df.transpose()
df = df.rename(columns={0: 'Total', 1: 'Sub-total houses', 2: 'Sub-total flats'})

df

| | Total | Sub-total houses | Sub-total flats |
|---|---|---|---|
| 0 | 69876 | 34756 | 35120 |


------------
## Number of properties sold per quarter

```sql
SELECT 
	sum(case when date_mutation <= '2020-03-31' then 1 else 0 end) as t1,
	sum(case when date_mutation BETWEEN '2020-04-01' AND '2020-06-30' then 1 else 0 end) as t2,
	sum(case when date_mutation BETWEEN '2020-07-01' AND '2020-09-30' then 1 else 0 end) as t3,
	sum(case when date_mutation >= '2020-10-01' then 1 else 0 end) as t4
FROM transaction;
```
<br>

In [6]:
cur.execute("SELECT sum(case when date_mutation <= '2020-03-31' then 1 else 0 end) as t1, sum(case when date_mutation BETWEEN '2020-04-01' AND '2020-06-30' then 1 else 0 end) as t2, sum(case when date_mutation BETWEEN '2020-07-01' AND '2020-09-30' then 1 else 0 end) as t3, sum(case when date_mutation >= '2020-10-01' then 1 else 0 end) as t4 FROM transaction")
result = cur.fetchone()

df = pd.DataFrame(result)
df = df.transpose()
df = df.rename(columns={0: 'T1', 1: 'T2', 2: 'T3', 3: 'T4'}, index={0 : '2020'})
df

| | T1 | T2 | T3 | T4 |
|---|---|---|---|---|
| 2020 | 61410 | 50793 | 49230 | 38567 |
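
The `CASE` boundaries in the query map each `date_mutation` to a calendar quarter. The same rule can be written as a small function, which makes the boundaries easy to check. This is an illustrative sketch: `quarter_of` is a hypothetical name and the sample dates are made up.

```python
from collections import Counter
from datetime import date


def quarter_of(d):
    """Return the calendar quarter (1 to 4) that contains date d."""
    return (d.month - 1) // 3 + 1


# Made-up sample dates sitting on the boundaries used in the SQL query.
sample_dates = [
    date(2020, 1, 1),
    date(2020, 3, 31),   # last day of T1
    date(2020, 4, 1),    # first day of T2
    date(2020, 6, 30),   # last day of T2
    date(2020, 9, 30),   # last day of T3
    date(2020, 10, 1),   # first day of T4
]

per_quarter = Counter(f'T{quarter_of(d)}' for d in sample_dates)
print(dict(per_quarter))
```

Unlike this function, the SQL counts every date up to 2020-03-31 as T1 and every date from 2020-10-01 as T4, whatever the year, so it is only correct if the `transaction` table holds 2020 data exclusively.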


-----------
## Proportion of property sales per quarter by property type

```sql
SELECT 
	sum(case when date_mutation <= '2020-03-31' then 1 else 0 end) as t1,
	sum(case when date_mutation BETWEEN '2020-04-01' AND '2020-06-30' then 1 else 0 end) as t2,
	sum(case when date_mutation BETWEEN '2020-07-01' AND '2020-09-30' then 1 else 0 end) as t3,
	sum(case when date_mutation >= '2020-10-01' then 1 else 0 end) as t4
FROM transaction LEFT JOIN bien on transaction.bien_id = bien.id GROUP BY bien.code_type_local ORDER BY bien.code_type_local;
```
<br>

In [7]:
# ORDER BY makes the row order deterministic (1, 2, 3, 4, then NULL), so it matches the index labels below.
cur.execute("SELECT sum(case when date_mutation <= '2020-03-31' then 1 else 0 end) as t1, sum(case when date_mutation BETWEEN '2020-04-01' AND '2020-06-30' then 1 else 0 end) as t2, sum(case when date_mutation BETWEEN '2020-07-01' AND '2020-09-30' then 1 else 0 end) as t3, sum(case when date_mutation >= '2020-10-01' then 1 else 0 end) as t4 FROM transaction LEFT JOIN bien on transaction.bien_id = bien.id GROUP BY bien.code_type_local ORDER BY bien.code_type_local;")
result = cur.fetchall()

df = pd.DataFrame(result, columns=['T1', 'T2', 'T3', 'T4'], index=['Houses', 'Flats', 'Isolated dependency', 'Industrial space', 'Not specified'])
df

| | T1 | T2 | T3 | T4 |
|---|---|---|---|---|
| Houses | 9643 | 8725 | 9020 | 7368 |
| Flats | 10161 | 8982 | 9040 | 6937 |
| Isolated dependency | 7986 | 7182 | 7324 | 5131 |
| Industrial space | 2455 | 1566 | 1546 | 1297 |
| Not specified | 31165 | 24338 | 22300 | 17834 |
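
The query returns counts, while the question asks for proportions. One way to get them is to divide each count by the total of its quarter. The sketch below hard-codes the counts from the result above; the `counts` layout and the `share_per_quarter` name are illustrative assumptions, not part of the original notebook.

```python
# Counts copied from the result above: one list [T1, T2, T3, T4] per property type.
counts = {
    'Houses': [9643, 8725, 9020, 7368],
    'Flats': [10161, 8982, 9040, 6937],
    'Isolated dependency': [7986, 7182, 7324, 5131],
    'Industrial space': [2455, 1566, 1546, 1297],
    'Not specified': [31165, 24338, 22300, 17834],
}


def share_per_quarter(counts):
    """Return, for each property type, its percentage of the sales in each quarter."""
    # Column totals: total number of sales per quarter, all types combined.
    totals = [sum(column) for column in zip(*counts.values())]
    return {
        label: [round(100 * n / total, 2) for n, total in zip(row, totals)]
        for label, row in counts.items()
    }


shares = share_per_quarter(counts)
for label, row in shares.items():
    print(label, row)
```

The column totals computed here (61410, 50793, 49230, 38567) match the quarterly totals obtained for the previous question, which is a quick consistency check on both queries.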


--------------
## Proportion of flats sold by number of rooms

```sql
SELECT count(transaction.id), bien.nb_pieces_principales FROM transaction LEFT JOIN bien ON transaction.bien_id = bien.id WHERE bien.code_type_local = 2 GROUP BY bien.nb_pieces_principales ORDER BY bien.nb_pieces_principales;
```
<br>

In [8]:
cur.execute('SELECT count(transaction.id), bien.nb_pieces_principales FROM transaction LEFT JOIN bien ON transaction.bien_id = bien.id WHERE bien.code_type_local = 2 GROUP BY bien.nb_pieces_principales ORDER BY bien.nb_pieces_principales;')
result = cur.fetchall()

df = pd.DataFrame(result, columns=['Total nb of flats', 'Nb of rooms'])
df

| | Total nb of flats | Nb of rooms |
|---|---|---|
| 0 | 78 | 0 |
| 1 | 7608 | 1 |
| 2 | 10978 | 2 |
| 3 | 10450 | 3 |
| 4 | 4674 | 4 |
| 5 | 1015 | 5 |
| 6 | 215 | 6 |
| 7 | 58 | 7 |
| 8 | 24 | 8 |
| 9 | 11 | 9 |


------------
## The 10 departments with the most property sales

❌ ERR: the postal code must not be lost. The automatic float cast done by `pd.read_csv()` corrupts the `Code postal` column: leading zeros are dropped, so the first 9 departments (01 to 09) disappear from the results.
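
One possible fix, sketched here under assumptions: if the column has already been cast to float, the original code can be rebuilt by dropping the decimal part and left-padding with zeros to five characters. `normalize_postal_code` is a hypothetical helper, not part of the original notebook. At load time the cast can be avoided altogether by asking pandas to keep the column as text, for example `pd.read_csv(path, dtype={'Code postal': str})`, where `path` stands for the CSV file.

```python
import math


def normalize_postal_code(value):
    """Rebuild a 5-character French postal code from a float, int, or string.

    Returns None for missing values (None, NaN, empty string).
    """
    if value is None:
        return None
    if isinstance(value, float):
        if math.isnan(value):
            return None
        value = int(value)          # 6190.0 -> 6190
    text = str(value).strip()
    if text.endswith('.0'):         # string produced from a float, e.g. '6190.0'
        text = text[:-2]
    if not text:
        return None
    return text.zfill(5)            # '6190' -> '06190'


for raw in (6190.0, '6190.0', 1000, '75008', float('nan')):
    code = normalize_postal_code(raw)
    print(repr(raw), '->', code, '| department:', code[:2] if code else None)
```

Taking the first two characters is only an approximation of the department: overseas departments, for instance, use three-digit codes (971, 972, ...).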

```sql
SELECT count(transaction.id), substr(code_postal, 0, 3) FROM transaction LEFT JOIN bien ON transaction.bien_id = bien.id GROUP BY substr(code_postal, 0, 3) ORDER BY count(transaction.id) DESC LIMIT 10
```
<br>

In [17]:
cur.execute('SELECT count(transaction.id), substr(code_postal, 0, 3) FROM transaction LEFT JOIN bien ON transaction.bien_id = bien.id GROUP BY substr(code_postal, 0, 3) ORDER BY count(transaction.id) DESC LIMIT 10')
result = cur.fetchall()

df = pd.DataFrame(result, columns=['N° of transactions', 'Department'])
df

| | N° of transactions | Department |
|---|---|---|
| 0 | 13196 | 10 |
| 1 | 11678 | 61 |
| 2 | 11520 | 62 |
| 3 | 10241 | 11 |
| 4 | 6950 | 64 |
| 5 | 6660 | 63 |
| 6 | 5632 | 65 |
| 7 | 5613 | 22 |
| 8 | 5529 | 21 |
| 9 | 5350 | 12 |


------------
## The 10 departments with the fewest sales

```sql
SELECT count(transaction.id), substr(code_postal, 0, 3) FROM transaction LEFT JOIN bien ON transaction.bien_id = bien.id GROUP BY substr(code_postal, 0, 3) ORDER BY count(transaction.id) ASC LIMIT 10
```
<br>

In [20]:
cur.execute('SELECT count(transaction.id), substr(code_postal, 0, 3) FROM transaction LEFT JOIN bien ON transaction.bien_id = bien.id GROUP BY substr(code_postal, 0, 3) ORDER BY count(transaction.id) ASC LIMIT 10')
result = cur.fetchall()

df = pd.DataFrame(result, columns=['N° of transactions', 'Department'])
df

| | N° of transactions | Department |
|---|---|---|
| 0 | 89 | 46 |
| 1 | 118 | 58 |
| 2 | 150 | 57 |
| 3 | 229 | 69 |
| 4 | 236 | 97 |
| 5 | 242 | 85 |
| 6 | 258 | 54 |
| 7 | 283 | 38 |
| 8 | 284 | 88 |
| 9 | 289 | 37 |


--------------
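## Average price per square metre in IDF

Note: the query below only covers Paris and its inner ring (departments 75, 92, 93 and 94). The full Île-de-France region also includes departments 77, 78, 91 and 95.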

```sql
	SELECT (sum(transaction.valeur_fonciere) / sum(bien.surface_reel_bati)) FROM transaction LEFT JOIN bien ON transaction.bien_id = bien.id WHERE substr(bien.code_postal, 0, 3) IN ('75', '92', '93', '94')
```
<br>

In [10]:
cur.execute("SELECT (sum(transaction.valeur_fonciere) / sum(bien.surface_reel_bati)) FROM transaction LEFT JOIN bien ON transaction.bien_id = bien.id WHERE substr(bien.code_postal, 0, 3) IN ('75', '92', '93', '94');")
result = cur.fetchone()

print(f'The average price in IDF is {round(result[0], 2)} €/m2')

The average price in IDF is 4400.79 €/m2
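
The query divides the total value by the total surface, which gives a surface-weighted average. That is not the same as averaging the price per m2 of each sale, where a small flat weighs as much as a large one. The sketch below shows the difference on made-up numbers; the `sales` list is purely illustrative and does not come from the dataset.

```python
# Made-up sales: (valeur_fonciere in euros, surface_reel_bati in m2).
sales = [
    (300_000, 30),    # 10,000 euros per m2
    (400_000, 100),   # 4,000 euros per m2
    (150_000, None),  # surface missing
]

# Keep only rows where both value and surface are usable,
# so numerator and denominator cover the same sales.
usable = [(value, surface) for value, surface in sales if value and surface]

# Surface-weighted average: total value divided by total surface.
weighted = sum(v for v, _ in usable) / sum(s for _, s in usable)

# Alternative: plain mean of the per-sale prices per m2.
mean_of_ratios = sum(v / s for v, s in usable) / len(usable)

print(round(weighted, 2))
print(round(mean_of_ratios, 2))
```

In SQL, each `sum()` skips NULLs independently, so the original query can add the value of a sale whose surface is missing to the numerator. Filtering on both columns first, as done here, avoids that.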


------------------
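## List of the 10 most expensive flats with the department and the number of square metres

Note: the query below does not filter on `code_type_local = 2`, so the result lists the most expensive transactions of every property type, not only flats. The postal code stands in for the department.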

```sql
SELECT valeur_fonciere, code_postal, surface_reel_bati
FROM transaction LEFT JOIN bien ON transaction.bien_id = bien.id
WHERE valeur_fonciere IS NOT NULL
ORDER BY valeur_fonciere DESC
LIMIT 10
```
<br>

In [11]:
cur.execute('SELECT valeur_fonciere, code_postal, surface_reel_bati FROM transaction LEFT JOIN bien ON transaction.bien_id = bien.id WHERE valeur_fonciere IS NOT NULL ORDER BY valeur_fonciere DESC LIMIT 10;')
result = cur.fetchall()

df = pd.DataFrame(result, columns=['Land value', 'Postal code', 'Surface'])
df

| | Land value | Postal code | Surface |
|---|---|---|---|
| 0 | 49000000.0 | 6190 | 70.0 |
| 1 | 49000000.0 | 6190 | 996.0 |
| 2 | 49000000.0 | 6190 | 0.0 |
| 3 | 47500000.0 | 6230 | 455.0 |
| 4 | 47500000.0 | 6230 | |
| 5 | 47500000.0 | 6230 | 93.0 |
| 6 | 47500000.0 | 6230 | 0.0 |
| 7 | 47500000.0 | 6230 | 455.0 |
| 8 | 43200000.0 | 6200 | |
| 9 | 43200000.0 | 6200 | |


--------------
## Evolution rate of the number of sales between the first and second quarter of 2020

Evolution rate formula: $t = \frac{V_A - V_D}{|V_D|}$, where $V_D$ is the starting value (first quarter) and $V_A$ the final value (second quarter). Multiply $t$ by 100 to express it as a percentage (%).
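
As a worked example, the formula can be applied to the quarterly counts found earlier (61410 sales in T1, 50793 in T2). The function name `evolution_rate` is an illustrative assumption. The result is multiplied by 100 so that it reads as a percentage.

```python
def evolution_rate(start, end):
    """Return the rate of change from start to end, as a percentage."""
    if start == 0:
        raise ValueError('rate of change is undefined when the starting value is 0')
    return (end - start) / abs(start) * 100


# Quarterly counts taken from the "sales per quarter" result.
t1, t2 = 61410, 50793
print(f'{evolution_rate(t1, t2):.2f}%')
```

With these counts the rate is about -17.29%, meaning sales dropped by roughly a sixth between the two quarters.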




```sql
SELECT (t2 - t1)::FLOAT / t1 FROM
	(SELECT 
		sum(case when date_mutation <= '2020-03-31' then 1 else 0 end) as t1,
		sum(case when date_mutation BETWEEN '2020-04-01' AND '2020-06-30' then 1 else 0 end) as t2
	FROM transaction) AS ts
```
<br>

In [12]:
cur.execute("SELECT (t2 - t1)::FLOAT / t1 FROM (SELECT sum(case when date_mutation <= '2020-03-31' then 1 else 0 end) as t1, sum(case when date_mutation BETWEEN '2020-04-01' AND '2020-06-30' then 1 else 0 end) as t2 FROM transaction) AS ts")
result = cur.fetchone()

# The query returns a fraction; multiply by 100 to print a percentage.
print(f'The evolution rate between the first and second quarter of 2020 is equal to {round(result[0] * 100, 1)}%')


The evolution rate between the first and second quarter of 2020 is equal to -17.3%


-------------
## List of municipalities where the number of sales rose by at least 20% between the first and second quarter of 2020

```sql
SELECT (t2 - t1)::FLOAT/ t1 , commune FROM
	(SELECT 
		sum(case when date_mutation <= '2020-03-31' then 1 else 0 end) as t1,
		sum(case when date_mutation BETWEEN '2020-04-01' AND '2020-06-30' then 1 else 0 end) as t2,
	 	commune
	FROM transaction
	LEFT JOIN bien ON transaction.bien_id = bien.id
	LEFT JOIN localisation ON bien.code_postal = localisation.code_postal
	LEFT JOIN commune ON localisation.code_commune = commune.code_commune
	GROUP BY commune.commune
	) ts WHERE (t2 - t1)::FLOAT/ t1 >= 0.2 AND commune IS NOT NULL ORDER BY (t2 - t1)::FLOAT/ t1 DESC;
```
<br>

In [21]:
cur.execute("SELECT (t2 - t1)::FLOAT/ t1 , commune FROM (SELECT sum(case when date_mutation <= '2020-03-31' then 1 else 0 end) as t1,sum(case when date_mutation BETWEEN '2020-04-01' AND '2020-06-30' then 1 else 0 end) as t2, commune FROM transaction LEFT JOIN bien ON transaction.bien_id = bien.id LEFT JOIN localisation ON bien.code_postal = localisation.code_postal LEFT JOIN commune ON localisation.code_commune = commune.code_commune GROUP BY commune.commune) ts WHERE (t2 - t1)::FLOAT/ t1 >= 0.2 AND commune IS NOT NULL ORDER BY (t2 - t1)::FLOAT/ t1 DESC LIMIT 20;")
result = cur.fetchall()

df = pd.DataFrame(result, columns=['Evolution rate', 'Municipality'])
df

| | Evolution rate | Municipality |
|---|---|---|
| 0 | 8.0 | SERAUCOURT LE GRAND |
| 1 | 4.444444 | SISSONNE |
| 2 | 2.941176 | MAGNIEU |
| 3 | 2.571429 | LA BOISSE |
| 4 | 2.208333 | NIVOLLET-MONTGRIFFON |
| 5 | 1.952096 | ARMIX |
| 6 | 1.769231 | ST-MARTIN-DU-FRESNE |
| 7 | 1.72093 | MONTANGES |
| 8 | 1.428571 | SAINT MICHEL |
| 9 | 1.219355 | MARSONNAS |
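
The same filter can be expressed in Python, which also makes one edge case explicit: a municipality with no sale in T1 has no defined evolution rate and has to be skipped (in SQL, dividing by `NULLIF(t1, 0)` would play the same role). This is a sketch on made-up counts; the commune names and the `growing_communes` helper are hypothetical.

```python
def growing_communes(counts, threshold=0.2):
    """Return (commune, rate) pairs whose sales grew by at least `threshold`.

    counts maps a commune name to its (t1, t2) sales counts.
    Rates are fractions: 0.2 means +20%.
    """
    rates = []
    for commune, (t1, t2) in counts.items():
        if t1 == 0:
            continue  # rate undefined: no sale in the first quarter
        rate = (t2 - t1) / t1
        if rate >= threshold:
            rates.append((commune, rate))
    # Highest growth first, as in the ORDER BY ... DESC of the query.
    return sorted(rates, key=lambda pair: pair[1], reverse=True)


# Made-up counts for illustration only.
sample = {
    'COMMUNE A': (10, 12),   # +20%, kept
    'COMMUNE B': (10, 11),   # +10%, dropped
    'COMMUNE C': (0, 5),     # no T1 sale, skipped
    'COMMUNE D': (4, 10),    # +150%, kept
}
print(growing_communes(sample))
```

As in the result above, the rates are fractions rather than percentages: a value of 8.0 means the number of sales was multiplied by nine.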
