# CSV export of the siretized INPI table

* The ID is kxb88sjjt94211q

## Objective(s)

* To share the siretization work, we need to export the table `ets_insee_inpi_no_duplicate`.
* Some `index_id` values in the table are duplicated, either because the data was poorly prepared or because the index cannot be deduplicated in its current state. These indexes must therefore be removed from the table to export.
* We will create a final table that contains both the raw INPI data and the transformed data. To do so, we use the table `ets_inpi_sql`.
  * The full table is then exported to CSV, along with a second table containing only the siren, siret, sequence ID and the establishment's reference variables as defined by the INPI.
  * Both CSV files will be available in [calfdata/TEMP_PARTAGE_DATA_INPI](https://s3.console.aws.amazon.com/s3/buckets/calfdata/TEMP_PARTAGE_DATA_INPI/?region=eu-west-3&tab=overview)
  
## Metadata

* Epic: Epic 5
* US: US 1
* Date Begin: 9/14/2020
* Duration Task: 1
* Description: Export of the siretized INPI database without duplicates
* Status: Active
  * Change Status task: Active
  * Update table: Modify rows
* Source URL: US 01 CSV INPI
* Task type: Jupyter Notebook
* Users: Thomas Pernet
* Watchers: Thomas Pernet
* User Account: https://937882855452.signin.aws.amazon.com/console
* Estimated Log points: 5
* Task tag: #s3,#export-csv,#siretisation,#inpi
* Toggl Tag: #share-result

## Input Cloud Storage [AWS/GCP]

If link from the internet, save it to the cloud first
Table/file

* Origin: 
    * Athena
* Name: 
    * ets_insee_inpi_no_duplicate
    * ets_inpi_sql
* Github: 
    * https://github.com/thomaspernet/InseeInpi_matching/blob/master/Notebooks_matching/Data_preprocessed/programme_matching/08_US_DATUM/11_creation_table_ets_insee_inpi_no_duplicate.md
    * https://github.com/thomaspernet/InseeInpi_matching/blob/master/01_Data_preprocessing/Data_preprocessed/programme_matching/01_preparation/05_nettoyage_enseigne_inpi.md

## Destination Output/Delivery

Table/file

* Origin: 
    * Athena
* Name:
    * ets_inpi_no_doublon_siret
* GitHub:
 * https://github.com/thomaspernet/InseeInpi_matching/blob/master/Notebooks_matching/Notebooks_matching/Data_preprocessed/programme_matching/09_export_tables/00_export_table_no_doublon_inpi_siret.ipynb


## Server connection

In [3]:
from awsPy.aws_authorization import aws_connector
from awsPy.aws_athena import service_athena
from awsPy.aws_s3 import service_s3
from pathlib import Path
import pandas as pd
import numpy as np
import seaborn as sns
import os, shutil

path = os.getcwd()
parent_path = str(Path(path).parent)
path_cred = r"{}/credential_AWS.json".format(parent_path)
con = aws_connector.aws_instantiate(credential = path_cred,
                                       region = 'eu-west-3')

region = 'eu-west-3'
bucket = 'calfdata'

In [4]:
bucket = 'calfdata'
con = aws_connector.aws_instantiate(credential = path_cred,
                                       region = region)
client= con.client_boto()
s3 = service_s3.connect_S3(client = client,
                      bucket = bucket, verbose = False) 

In [None]:
pandas_setting = True
if pandas_setting:
    cm = sns.light_palette("green", as_cmap=True)
    pd.set_option('display.max_columns', None)
    pd.set_option('display.max_colwidth', None)

In [5]:
s3_output = 'inpi/sql_output'
database = 'siretisation'

# Brief analysis

Some `index_id` values can be duplicated after merging with the table `ets_inpi_sql` because the transmission date (i.e. the timestamp) is identical. We did not anticipate this case during development; it was, however, taken into account when moving to production.

The query below lists the rows with duplicates:

In [10]:
query = """
WITH merge_inpi AS (
  SELECT 
    ROW_NUMBER() OVER (PARTITION BY ets_insee_inpi_no_duplicate.index_id ORDER BY file_timestamp) AS row_id_group,
    ets_insee_inpi_no_duplicate.index_id, 
    ets_insee_inpi_no_duplicate.siren, 
    ets_insee_inpi_no_duplicate.siret, 
    ets_insee_inpi_no_duplicate.sequence_id
  FROM 
    siretisation.ets_insee_inpi_no_duplicate 
    INNER JOIN siretisation.ets_inpi_sql ON ets_insee_inpi_no_duplicate.index_id = siretisation.ets_inpi_sql.index_id 
  WHERE 
    count_index = 1
) 
SELECT 

  index_id, COUNT(*) AS cnt
  
FROM 
  merge_inpi 
GROUP BY  index_id
ORDER BY cnt DESC
LIMIT 20
"""
s3.run_query(
            query=query,
            database=database,
            s3_output=s3_output,
  filename = "nb_index", ## Add filename to print dataframe
  destination_key = None ### Add destination key if need to copy output
        )

|   | index_id | cnt |
|---|---|---|
| 0 | 3610030 | 2 |
| 1 | 6825138 | 2 |
| 2 | 4408217 | 2 |
| 3 | 9651680 | 2 |
| 4 | 3610029 | 2 |
| 5 | 9651679 | 2 |
| 6 | 3610031 | 2 |
| 7 | 6825137 | 2 |
| 8 | 4408216 | 2 |
| 9 | 9651678 | 2 |
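
To illustrate why the join can return several rows per `index_id`, here is a minimal sketch in plain Python. The ids and timestamps are made up, not taken from the real tables: when the raw table holds two rows with the same `index_id` and the same transmission timestamp, an inner join returns both of them.

```python
from collections import Counter

# Hypothetical rows standing in for ets_insee_inpi_no_duplicate (one row per index_id)
matched = [{"index_id": 1, "siret": "A"}, {"index_id": 2, "siret": "B"}]

# Hypothetical rows standing in for ets_inpi_sql: index_id 2 appears twice
# with the same file_timestamp
raw = [
    {"index_id": 1, "file_timestamp": "2020-01-01"},
    {"index_id": 2, "file_timestamp": "2020-01-02"},
    {"index_id": 2, "file_timestamp": "2020-01-02"},
]

# Inner join on index_id
joined = [{**m, **r} for m in matched for r in raw if m["index_id"] == r["index_id"]]

# Equivalent of GROUP BY index_id with COUNT(*), ordered by count descending
counts = Counter(row["index_id"] for row in joined)
duplicates = {idx: cnt for idx, cnt in counts.most_common() if cnt > 1}
print(duplicates)
```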


# Creation tables

The previous query shows that 9 rows share an identical transmission timestamp. To avoid duplicates when creating the table `ets_inpi_no_doublon_siret`, we keep only the first row. This is not an optimal solution!
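
The "keep the first row" rule can be sketched in pandas on toy data (the values below are hypothetical). `groupby(...).cumcount() + 1` plays the role of `ROW_NUMBER() OVER (PARTITION BY index_id ORDER BY file_timestamp)`, and the filter mirrors `WHERE row_id_group = 1`.

```python
import pandas as pd

# Hypothetical joined data: index_id 2 has two rows with the same timestamp
df = pd.DataFrame({
    "index_id": [1, 2, 2, 3],
    "file_timestamp": ["2020-01-01", "2020-01-02", "2020-01-02", "2020-01-03"],
    "siret": ["A", "B", "B", "C"],
})

# Equivalent of ROW_NUMBER() OVER (PARTITION BY index_id ORDER BY file_timestamp)
df = df.sort_values(["index_id", "file_timestamp"], kind="stable")
df["row_id_group"] = df.groupby("index_id").cumcount() + 1

# Equivalent of WHERE row_id_group = 1
deduplicated = df[df["row_id_group"] == 1].reset_index(drop=True)
print(deduplicated)
```

When two rows tie on `file_timestamp`, SQL does not guarantee which one receives row number 1, which is one reason this approach is not optimal.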

{
	"StorageDescriptor": {
		"cols": {
			"FieldSchema": [
				{
					"name": "row_id_group",
					"type": "bigint",
					"comment": "Nombre de lignes par index_id. Normalement que des 1"
				},
				{
					"name": "index_id",
					"type": "bigint",
					"comment": "Identification "
				},
				{
					"name": "siren",
					"type": "string",
					"comment": ""
				},
				{
					"name": "siret",
					"type": "string",
					"comment": ""
				},
				{
					"name": "sequence_id",
					"type": "bigint",
					"comment": ""
				},
				{
					"name": "code_greffe",
					"type": "string",
					"comment": ""
				},
				{
					"name": "nom_greffe",
					"type": "string",
					"comment": ""
				},
				{
					"name": "numero_gestion",
					"type": "string",
					"comment": ""
				},
				{
					"name": "id_etablissement",
					"type": "string",
					"comment": ""
				},
				{
					"name": "status",
					"type": "string",
					"comment": ""
				},
				{
					"name": "origin",
					"type": "string",
					"comment": ""
				},
				{
					"name": "date_greffe",
					"type": "string",
					"comment": ""
				},
				{
					"name": "file_timestamp",
					"type": "string",
					"comment": ""
				},
				{
					"name": "libelle_evt",
					"type": "string",
					"comment": ""
				},
				{
					"name": "last_libele_evt",
					"type": "string",
					"comment": ""
				},
				{
					"name": "status_admin",
					"type": "varchar(1)",
					"comment": ""
				},
				{
					"name": "type",
					"type": "string",
					"comment": ""
				},
				{
					"name": "status_ets",
					"type": "varchar(5)",
					"comment": ""
				},
				{
					"name": "siège_pm",
					"type": "string",
					"comment": ""
				},
				{
					"name": "rcs_registre",
					"type": "string",
					"comment": ""
				},
				{
					"name": "adresse_ligne1",
					"type": "string",
					"comment": ""
				},
				{
					"name": "adresse_ligne2",
					"type": "string",
					"comment": ""
				},
				{
					"name": "adresse_ligne3",
					"type": "string",
					"comment": ""
				},
				{
					"name": "adresse_reconstituee_inpi",
					"type": "string",
					"comment": ""
				},
				{
					"name": "adresse_distance_inpi",
					"type": "string",
					"comment": ""
				},
				{
					"name": "list_numero_voie_matching_inpi",
					"type": "array<string>",
					"comment": ""
				},
				{
					"name": "numero_voie_matching",
					"type": "string",
					"comment": ""
				},
				{
					"name": "voie_clean",
					"type": "string",
					"comment": ""
				},
				{
					"name": "type_voie_matching",
					"type": "string",
					"comment": ""
				},
				{
					"name": "code_postal",
					"type": "string",
					"comment": ""
				},
				{
					"name": "code_postal_matching",
					"type": "string",
					"comment": ""
				},
				{
					"name": "ville",
					"type": "string",
					"comment": ""
				},
				{
					"name": "ville_matching",
					"type": "string",
					"comment": ""
				},
				{
					"name": "code_commune",
					"type": "string",
					"comment": ""
				},
				{
					"name": "pays",
					"type": "string",
					"comment": ""
				},
				{
					"name": "domiciliataire_nom",
					"type": "string",
					"comment": ""
				},
				{
					"name": "domiciliataire_siren",
					"type": "string",
					"comment": ""
				},
				{
					"name": "domiciliataire_greffe",
					"type": "string",
					"comment": ""
				},
				{
					"name": "domiciliataire_complément",
					"type": "string",
					"comment": ""
				},
				{
					"name": "siege_domicile_représentant",
					"type": "string",
					"comment": ""
				},
				{
					"name": "nom_commercial",
					"type": "string",
					"comment": ""
				},
				{
					"name": "enseigne",
					"type": "string",
					"comment": ""
				},
				{
					"name": "activité_ambulante",
					"type": "string",
					"comment": ""
				},
				{
					"name": "activité_saisonnière",
					"type": "string",
					"comment": ""
				},
				{
					"name": "activité_non_sédentaire",
					"type": "string",
					"comment": ""
				},
				{
					"name": "date_début_activité",
					"type": "string",
					"comment": ""
				},
				{
					"name": "activité",
					"type": "string",
					"comment": ""
				},
				{
					"name": "origine_fonds",
					"type": "string",
					"comment": ""
				},
				{
					"name": "origine_fonds_info",
					"type": "string",
					"comment": ""
				},
				{
					"name": "type_exploitation",
					"type": "string",
					"comment": ""
				},
				{
					"name": "csv_source",
					"type": "string",
					"comment": ""
				}
			]
		},
		"location": "s3://calfdata/inpi/sql_output/tables/bf7473f3-4aab-4389-abed-ccc92e1d42ec/",
		"inputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat",
		"outputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat",
		"compressed": "false",
		"numBuckets": "0",
		"SerDeInfo": {
			"name": "ets_inpi_no_doublon_siret",
			"serializationLib": "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe",
			"parameters": {}
		},
		"bucketCols": [],
		"sortCols": [],
		"parameters": {},
		"SkewedInfo": {},
		"storedAsSubDirectories": "false"
	},
	"parameters": {
		"EXTERNAL": "TRUE",
		"has_encrypted_data": "false"
	}
}
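
A schema laid out like the one above can be read with the standard `json` module to list column names and types. The excerpt below is shortened and only meant as an illustration of the layout, not as the full table definition.

```python
import json

# Shortened excerpt with the same layout as the schema above
schema_text = """
{
  "StorageDescriptor": {
    "cols": {
      "FieldSchema": [
        {"name": "index_id", "type": "bigint", "comment": ""},
        {"name": "siren", "type": "string", "comment": ""},
        {"name": "siret", "type": "string", "comment": ""}
      ]
    }
  }
}
"""

schema = json.loads(schema_text)
fields = schema["StorageDescriptor"]["cols"]["FieldSchema"]

# Map each column name to its declared type
column_types = {field["name"]: field["type"] for field in fields}
print(column_types)
```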

## Steps

In [12]:
query = """
DROP TABLE IF EXISTS `ets_inpi_no_doublon_siret`;
"""
s3.run_query(
            query=query,
            database=database,
            s3_output=s3_output,
  filename = None, ## Add filename to print dataframe
  destination_key = None ### Add destination key if need to copy output
        )

{'Results': {'State': 'SUCCEEDED',
  'SubmissionDateTime': datetime.datetime(2020, 9, 14, 15, 35, 53, 951000, tzinfo=tzlocal()),
  'CompletionDateTime': datetime.datetime(2020, 9, 14, 15, 35, 55, 83000, tzinfo=tzlocal())},
 'QueryID': 'ad43640e-cc28-4fbf-b404-68a203e4d0a6'}

In [14]:
query = """
CREATE TABLE siretisation.ets_inpi_no_doublon_siret
WITH (
  format='PARQUET'
) AS
WITH merge_inpi AS (
  SELECT 
    ROW_NUMBER() OVER (PARTITION BY ets_insee_inpi_no_duplicate.index_id ORDER BY file_timestamp) AS row_id_group,
    ets_insee_inpi_no_duplicate.index_id, 
    ets_insee_inpi_no_duplicate.siren, 
    ets_insee_inpi_no_duplicate.siret, 
    ets_insee_inpi_no_duplicate.sequence_id,
    code_greffe, 
    nom_greffe, 
    numero_gestion, 
    id_etablissement, 
    status, 
    origin, 
    date_greffe, 
    file_timestamp, 
    libelle_evt, 
    last_libele_evt, 
    ets_insee_inpi_no_duplicate.status_admin, 
    type, 
    ets_insee_inpi_no_duplicate.status_ets, 
    "siège_pm", 
    rcs_registre, 
    adresse_ligne1, 
    adresse_ligne2, 
    adresse_ligne3, 
    adresse_reconstituee_inpi, 
    ets_insee_inpi_no_duplicate.adresse_distance_inpi, 
    ets_insee_inpi_no_duplicate.list_numero_voie_matching_inpi, 
    numero_voie_matching, 
    voie_clean, 
    type_voie_matching, 
    code_postal, 
    code_postal_matching, 
    ville, 
    ville_matching, 
    code_commune, 
    pays, 
    domiciliataire_nom, 
    domiciliataire_siren, 
    domiciliataire_greffe, 
    "domiciliataire_complément", 
    "siege_domicile_représentant", 
    nom_commercial, 
    ets_insee_inpi_no_duplicate.enseigne, 
    "activité_ambulante", 
    "activité_saisonnière", 
    "activité_non_sédentaire", 
    ets_insee_inpi_no_duplicate."date_début_activité", 
    "activité", 
    origine_fonds, 
    origine_fonds_info, 
    type_exploitation, 
    csv_source 
  FROM 
    siretisation.ets_insee_inpi_no_duplicate 
    INNER JOIN siretisation.ets_inpi_sql ON ets_insee_inpi_no_duplicate.index_id = siretisation.ets_inpi_sql.index_id 
  WHERE 
    count_index = 1
) 
SELECT 

  *
  
FROM 
  merge_inpi 
WHERE row_id_group = 1    
"""

s3.run_query(
            query=query,
            database=database,
            s3_output=s3_output,
  filename = None, ## Add filename to print dataframe
  destination_key = None ### Add destination key if need to copy output
        )

{'Results': {'State': 'SUCCEEDED',
  'SubmissionDateTime': datetime.datetime(2020, 9, 14, 15, 36, 56, 523000, tzinfo=tzlocal()),
  'CompletionDateTime': datetime.datetime(2020, 9, 14, 15, 38, 9, 474000, tzinfo=tzlocal())},
 'QueryID': 'bf7473f3-4aab-4389-abed-ccc92e1d42ec'}

Now that the table is created, we can copy it to the folder [calfdata/TEMP_PARTAGE_DATA_INPI](https://s3.console.aws.amazon.com/s3/buckets/calfdata/TEMP_PARTAGE_DATA_INPI/?region=eu-west-3&tab=overview)

In [44]:
query = """
SELECT index_id, siren, siret, sequence_id, code_greffe, nom_greffe, numero_gestion, id_etablissement
FROM ets_inpi_no_doublon_siret 
"""
output = s3.run_query(
            query=query,
            database=database,
            s3_output=s3_output,
  filename = None, ## Add filename to print dataframe
  destination_key = None ### Add destination key if need to copy output
        )

In [52]:
source_key =  '{}/{}.csv'.format(s3_output, output['QueryID'])
destination_key_filename = '{}/{}.csv'.format('TEMP_PARTAGE_DATA_INPI', 'inpi_siret')
s3.copy_object_s3(source_key = source_key,
                              destination_key = destination_key_filename,
                              remove = True
                                                )

'inpi/sql_output/deb369ef-a55c-4dd6-968d-bc6e9c50ff2d.csv'
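
Athena writes the CSV result of a `SELECT` under the output location as `<QueryID>.csv`, which is why the source key is built from `output['QueryID']`. The sketch below shows what a copy-then-delete helper could look like with the boto3 S3 client methods `copy_object` and `delete_object`. It is an assumption about what `copy_object_s3(..., remove = True)` does, not the actual awsPy implementation; `move_query_result` and `FakeS3Client` are hypothetical names, and the fake client only exists so the example runs without AWS access.

```python
def move_query_result(client, bucket, s3_output, query_id, destination_key, remove=True):
    """Copy an Athena CSV result to a new key, optionally deleting the original.

    `client` is expected to behave like a boto3 S3 client.
    """
    source_key = "{}/{}.csv".format(s3_output, query_id)
    client.copy_object(
        Bucket=bucket,
        CopySource={"Bucket": bucket, "Key": source_key},
        Key=destination_key,
    )
    if remove:
        client.delete_object(Bucket=bucket, Key=source_key)
    return source_key


class FakeS3Client:
    """Stand-in client that records calls instead of contacting AWS."""

    def __init__(self):
        self.calls = []

    def copy_object(self, **kwargs):
        self.calls.append(("copy_object", kwargs))

    def delete_object(self, **kwargs):
        self.calls.append(("delete_object", kwargs))


fake = FakeS3Client()
moved = move_query_result(
    fake, "calfdata", "inpi/sql_output", "my-query-id",
    "TEMP_PARTAGE_DATA_INPI/inpi_siret.csv",
)
print(moved)
```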

In [56]:
query = """
SELECT *
FROM ets_inpi_no_doublon_siret 
"""
output = s3.run_query(
            query=query,
            database=database,
            s3_output=s3_output,
  filename = None, ## Add filename to print dataframe
  destination_key = None ### Add destination key if need to copy output
        )

In [57]:
source_key =  '{}/{}.csv'.format(s3_output, output['QueryID'])
destination_key_filename = '{}/{}.csv'.format('TEMP_PARTAGE_DATA_INPI', 'inpi_siret_full')
s3.copy_object_s3(source_key = source_key,
                              destination_key = destination_key_filename,
                              remove = True )

True

# Table analysis

Number of rows

In [16]:
query = """
SELECT COUNT(*) as cnt
FROM ets_inpi_no_doublon_siret
"""
s3.run_query(
            query=query,
            database=database,
            s3_output=s3_output,
  filename = "analyse_1", ## Add filename to print dataframe
  destination_key = None ### Add destination key if need to copy output
        )

|   | cnt |
|---|---|
| 0 | 9549221 |


Number of SIREN

In [18]:
query = """
SELECT COUNT(DISTINCT(siren)) as CNT
FROM ets_inpi_no_doublon_siret
"""
s3.run_query(
            query=query,
            database=database,
            s3_output=s3_output,
  filename = "analyse_2", ## Add filename to print dataframe
  destination_key = None ### Add destination key if need to copy output
        )

|   | CNT |
|---|---|
| 0 | 5315882 |


Number of SIRET

In [19]:
query = """
SELECT COUNT(DISTINCT(siret)) as CNT
FROM ets_inpi_no_doublon_siret
"""
s3.run_query(
            query=query,
            database=database,
            s3_output=s3_output,
  filename = "analyse_3", ## Add filename to print dataframe
  destination_key = None ### Add destination key if need to copy output
        )

|   | CNT |
|---|---|
| 0 | 6095107 |
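
Taken together, the three counts reported by the queries above give a rough idea of the table's density. The snippet below simply divides those figures; the variable names are illustrative.

```python
# Figures reported by the three COUNT queries above
n_rows = 9_549_221
n_siren = 5_315_882
n_siret = 6_095_107

# Average number of rows per distinct SIRET, and of distinct SIRET per SIREN
rows_per_siret = n_rows / n_siret
siret_per_siren = n_siret / n_siren

print("rows per siret: {:.2f}".format(rows_per_siret))
print("siret per siren: {:.2f}".format(siret_per_siren))
```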


Number of establishments per city

In [43]:
query = """
SELECT ville_matching, COUNT(DISTINCT(siret)) as CNT
FROM ets_inpi_no_doublon_siret
GROUP BY ville_matching
ORDER BY CNT DESC
LIMIT 25
"""
(
    s3.run_query(
            query=query,
            database=database,
            s3_output=s3_output,
  filename = "analyse_4", ## Add filename to print dataframe
  destination_key = None ### Add destination key if need to copy output
        )
    .set_index('ville_matching')
    .style
    .format("{:,.0f}")
    .bar(subset= ['CNT' ],
                       color='#d65f5f')
)

| ville_matching | CNT |
|---|---|
| PARIS | 566506 |
| MARSEILLE | 109371 |
| NICE | 56640 |
| TOULOUSE | 50138 |
| BORDEAUX | 40916 |
| MONTPELLIER | 34616 |
| NANTES | 31333 |
| BOULOGNEBILLANCOURT | 23676 |
| RENNES | 21390 |
| AIXENPROVENCE | 19322 |


Number of establishments per registry (greffe)

In [41]:
query = """
SELECT nom_greffe, COUNT(DISTINCT(siret)) as CNT
FROM ets_inpi_no_doublon_siret
GROUP BY nom_greffe
ORDER BY CNT DESC
LIMIT 25
"""
(
    s3.run_query(
            query=query,
            database=database,
            s3_output=s3_output,
  filename = "analyse_4", ## Add filename to print dataframe
  destination_key = None ### Add destination key if need to copy output
        )
    .set_index('nom_greffe')
    .style
    .format("{:,.0f}")
    .bar(subset= ['CNT' ],
                       color='#d65f5f')
)

| nom_greffe | CNT |
|---|---|
| Paris | 566518 |
| Nanterre | 192190 |
| Bordeaux | 156321 |
| Bobigny | 151958 |
| Marseille | 136482 |
| Versailles | 129834 |
| Toulouse | 128577 |
| Créteil | 122785 |
| Lyon | 117515 |
| Lille Métropole | 106684 |


Number of establishments created per year

In [27]:
query = """
SELECT YEAR(
Coalesce(
      try(
        date_parse(
          "date_début_activité", '%Y-%m-%d'
        )
      ), 
      try(
        date_parse(
          "date_début_activité", '%Y-%m-%d %H:%i:%s.%f'
        )
      ), 
      try(
        date_parse(
          "date_début_activité", '%Y-%m-%d %H:%i:%s'
        )
      ), 
      try(
        cast(
          "date_début_activité" as timestamp
        )
      )
    ) 
) as date_debut_activite,


COUNT(DISTINCT(siret)) as CNT
FROM ets_inpi_no_doublon_siret
GROUP BY YEAR(
Coalesce(
      try(
        date_parse(
          "date_début_activité", '%Y-%m-%d'
        )
      ), 
      try(
        date_parse(
          "date_début_activité", '%Y-%m-%d %H:%i:%s.%f'
        )
      ), 
      try(
        date_parse(
          "date_début_activité", '%Y-%m-%d %H:%i:%s'
        )
      ), 
      try(
        cast(
          "date_début_activité" as timestamp
        )
      )
    ) 
)
ORDER BY CNT DESC
LIMIT 25
"""
(
    s3.run_query(
            query=query,
            database=database,
            s3_output=s3_output,
  filename = "analyse_4", ## Add filename to print dataframe
  destination_key = None ### Add destination key if need to copy output
        )
    .dropna()
    .style
    .format("{:,.0f}")
    .bar(subset= ['CNT' ],
                       color='#d65f5f')
)

|   | date_debut_activite | CNT |
|---|---|---|
| 0 |  | 2481652 |
| 1 | 2019.0 | 413012 |
| 2 | 2018.0 | 402728 |
| 3 | 2016.0 | 392317 |
| 4 | 2017.0 | 387392 |
| 5 | 2015.0 | 335819 |
| 6 | 2014.0 | 275551 |
| 7 | 2013.0 | 250670 |
| 8 | 2012.0 | 239519 |
| 9 | 2011.0 | 229117 |
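
The `COALESCE(TRY(...), TRY(...), ...)` pattern used in the query tries several date formats in turn and yields NULL when none of them matches, which is where the row with an empty year comes from. A minimal Python sketch of the same idea follows; the format strings use Python's `strptime` syntax rather than Presto's, and `parse_year` and the sample values are hypothetical.

```python
from datetime import datetime

# Candidate formats, tried in order (Python strptime syntax, not Presto syntax)
FORMATS = ("%Y-%m-%d", "%Y-%m-%d %H:%M:%S.%f", "%Y-%m-%d %H:%M:%S")


def parse_year(value):
    """Return the year from the first matching format, or None if none matches."""
    if value is None:
        return None
    for fmt in FORMATS:
        try:
            return datetime.strptime(value, fmt).year
        except ValueError:
            continue
    return None


samples = ["2019-03-01", "2018-07-15 00:00:00.000", "2016-01-31 12:30:00", "", None]
years = [parse_year(sample) for sample in samples]
print(years)
```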


Number of SIRET per event

To check: why the number of rows differs from the number of SIRET per city

In [30]:
query = """
SELECT ville_matching, libelle_evt,
COUNT(DISTINCT(siret)) as CNT
FROM ets_inpi_no_doublon_siret
GROUP BY ville_matching, libelle_evt
ORDER BY CNT DESC
"""

output = (
    s3.run_query(
            query=query,
            database=database,
            s3_output=s3_output,
  filename = "analyse_4", ## Add filename to print dataframe
  destination_key = None ### Add destination key if need to copy output
        )
    #.dropna()
    #.style
    #.format("{:,.0f}")
    #.bar(subset= ['CNT' ],
    #                   color='#d65f5f')
)

In [40]:
(output
 #.dropna()
 .set_index(['ville_matching','libelle_evt'])
 .unstack(-1)
 .assign(total = lambda x: x.sum(axis = 1))
 .sort_values(by = 'total', ascending = False)
 .head(25)
 .fillna(0)
 .style
 .format("{:,.0f}")
 .bar(subset= ['total' ],
                       color='#d65f5f')
)

CNT per `libelle_evt`, with the row total:

| ville_matching | Etablissement ouvert | Etablissement supprimé | Modifications relatives au dossier | Modifications relatives à un établissement | total |
|---|---|---|---|---|---|
| PARIS | 539691 | 57735 | 0 | 145208 | 742634 |
| MARSEILLE | 108077 | 679 | 187 | 4853 | 113796 |
| NICE | 55725 | 352 | 96 | 2836 | 59009 |
| TOULOUSE | 49639 | 333 | 141 | 1450 | 51563 |
| BORDEAUX | 40315 | 443 | 83 | 2474 | 43315 |
| MONTPELLIER | 34192 | 865 | 60 | 2330 | 37447 |
| NANTES | 31177 | 2763 | 20 | 1027 | 34987 |
| BOULOGNEBILLANCOURT | 23347 | 153 | 33 | 1226 | 24759 |
| RENNES | 21020 | 164 | 93 | 1043 | 22320 |
| AIXENPROVENCE | 19144 | 879 | 35 | 1123 | 21181 |
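
One plausible explanation for the gap noted above (for PARIS, a total of 742,634 here against 566,506 distinct SIRET in the per-city query) is that a SIRET with several event types is counted once per event, so summing `COUNT(DISTINCT siret)` across events can exceed the distinct count per city. The pandas sketch below reproduces the effect on hypothetical data; it has not been checked against the real table.

```python
import pandas as pd

# Hypothetical events: siret "A" has two different events in PARIS
events = pd.DataFrame({
    "ville_matching": ["PARIS", "PARIS", "PARIS", "NICE"],
    "libelle_evt": ["Etablissement ouvert", "Etablissement supprimé",
                    "Etablissement ouvert", "Etablissement ouvert"],
    "siret": ["A", "A", "B", "C"],
})

# COUNT(DISTINCT siret) per city
per_city = events.groupby("ville_matching")["siret"].nunique()

# COUNT(DISTINCT siret) per city and event, then summed per city
per_city_event = events.groupby(["ville_matching", "libelle_evt"])["siret"].nunique()
summed = per_city_event.groupby(level="ville_matching").sum()

print(per_city["PARIS"], summed["PARIS"])
```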


# Report generation

In [None]:
import os, time, shutil, urllib.request, ipykernel, json
from pathlib import Path
from notebook import notebookapp

In [None]:
def create_report(extension = "html", keep_code = False):
    """
    Create a report from the current notebook and save it in the 
    Report folder (Parent-> child directory)
    
    1. Extract the current notebook name
    2. Convert the Notebook 
    3. Move the newly created report
    
    Args:
    extension: string. Can be "html", "pdf", "md"
    keep_code: boolean. If True, code cells are kept in the report
    
    
    """
    
    ### Get notebook name
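    connection_file = os.path.basename(ipykernel.get_connection_file())
    # Connection files are named "kernel-<kernel_id>.json"
    kernel_id = connection_file.split('-', 1)[1].split('.')[0]

    notebookname = None
    for srv in notebookapp.list_running_servers():
        try:
            if srv['token']=='' and not srv['password']:  
                req = urllib.request.urlopen(srv['url']+'api/sessions')
            else:
                req = urllib.request.urlopen(srv['url']+ \
                                             'api/sessions?token=' + \
                                             srv['token'])
            sessions = json.load(req)
            # Keep the session attached to the current kernel
            for session in sessions:
                if session['kernel']['id'] == kernel_id:
                    notebookname = session['name']
        except Exception:
            # Skip servers that cannot be reached or return unexpected data
            continue
    if notebookname is None:
        raise RuntimeError("Could not find the name of the current notebook")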
    
    sep = '.'
    path = os.getcwd()
    #parent_path = str(Path(path).parent)
    
    ### Path report
    #path_report = "{}/Reports".format(parent_path)
    #path_report = "{}/Reports".format(path)
    
    ### Path destination
    name_no_extension = notebookname.split(sep, 1)[0]
    source_to_move = name_no_extension +'.{}'.format(extension)
    dest = os.path.join(path,'Reports', source_to_move)
    
    ### Generate notebook
    if keep_code:
        os.system('jupyter nbconvert --to {} {}'.format(
    extension,notebookname))
    else:
        os.system('jupyter nbconvert --no-input --to {} {}'.format(
    extension,notebookname))
    
    ### Move notebook to report folder
    #time.sleep(5)
    shutil.move(source_to_move, dest)
    print("Report available at this address:\n {}".format(dest))

In [None]:
create_report(extension = "html")
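
The conversion step relies on the `jupyter nbconvert` command line, where `--no-input` hides code cells and `--to` selects the output format. As a sketch, the command can be built as a list of arguments instead of a formatted string; `build_nbconvert_command` is a hypothetical helper, not part of the notebook above.

```python
import shlex


def build_nbconvert_command(notebook_name, extension="html", keep_code=False):
    """Return the nbconvert command as a list of arguments."""
    command = ["jupyter", "nbconvert"]
    if not keep_code:
        command.append("--no-input")  # hide code cells in the report
    command += ["--to", extension, notebook_name]
    return command


cmd = build_nbconvert_command("my notebook.ipynb")
# Show the command as it would be typed in a shell
print(" ".join(shlex.quote(part) for part in cmd))
```

Passing such a list to `subprocess.run(cmd, check=True)` avoids quoting problems with notebook names that contain spaces, which a string formatted for `os.system` does not handle.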