## 1. Cluster_PreparaDatos

### Objective

General data preparation process that identifies the variables used in the subsequent clustering steps. All input variables are considered.

### Notebook overview

    1. Load the base data:
        - Historical trips
        - Demographics by neighborhood
        - Geography by BiciMAD station

    2. Join the datasets
    3. Export the DataFrame to CSV

## 1. Loading the base data

In [1]:
import time
import pandas as pd

from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score, adjusted_rand_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer

%run "../1. Librerias Mongo/MongoDB_Connections.ipynb"
%run "../1. Librerias Mongo/MongoDB_Funciones_Consultas.ipynb"

### Historical trips

Loads 2,912,138 historical BiciMAD trip records.

In [6]:
t_ini = time.time()
data_Tracks = pd.read_csv('../../Data/DataFrame_Final_Cierre_2017_2019.csv', parse_dates=['FECHA'])
t_end = time.time()
print (t_end - t_ini)

print(data_Tracks.shape)
data_Tracks.info()

2.7475762367248535
(2912138, 21)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2912138 entries, 0 to 2912137
Data columns (total 21 columns):
 #   Column               Dtype         
---  ------               -----         
 0   ESTACION             int64         
 1   ANIO                 int64         
 2   MES                  int64         
 3   DIA                  int64         
 4   HORA                 int64         
 5   FECHA                datetime64[ns]
 6   DIA_SEMANA           int64         
 7   AM_PM                object        
 8   TEMPORADA            object        
 9   TEMPORADA_NUM        int64         
 10  Es_Festivo           int64         
 11  Es_FinSemana         int64         
 12  TEMPERATURA          float64       
 13  VIENTO               float64       
 14  PRESION              float64       
 15  HUMEDAD              float64       
 16  PRECIPITACION_1h     float64       
 17  PRECIPITACION_3h     float64       
 18  DESC_TIEMPO          object  

Total demand (`DEMANDA`) and number of records (`CUENTA`) per station in the historical data.

In [7]:
data_Tracks.groupby('ESTACION').agg(DEMANDA=('DEMANDA', 'sum'), CUENTA=('DEMANDA', 'count'))

ESTACION,DEMANDA,CUENTA
1,90192,19681
2,42031,14344
3,69541,19659
4,48318,16108
5,44461,17600
...,...,...
171,58379,17302
172,57917,17496
173,19783,10912
174,33287,14239


Total demand per year of history.

In [8]:
data_Tracks.groupby('ANIO').agg(DEMANDA=('DEMANDA', 'sum'))

ANIO,DEMANDA
2017,2741661
2018,3387010
2019,3645679
2020,476381


### Demographic data by Madrid neighborhood

The demographic data is stored in the "Demografia" collection of the BiciMAD database in MongoDB Atlas.

In [9]:
# Demographics
db_DemoG = _connect_mongo('cloud', 'cluster0.15npsxw.mongodb.net', None, 'ucmtfm2022', 'UCM_2022', 'BiciMAD', 'Demografia')
data_Demografia = _consulta_Demografia(db_DemoG)
data_Demografia

Conexion OK
Collection(Database(MongoClient(host=['ac-x1d17w2-shard-00-01.15npsxw.mongodb.net:27017', 'ac-x1d17w2-shard-00-02.15npsxw.mongodb.net:27017', 'ac-x1d17w2-shard-00-00.15npsxw.mongodb.net:27017'], document_class=dict, tz_aware=False, connect=True, authsource='admin', replicaset='atlas-xyv6ql-shard-0', tls=True, serverselectiontimeoutms=4000, tlscafile='C:\\ProgramData\\Anaconda3\\lib\\site-packages\\certifi\\cacert.pem'), 'BiciMAD'), 'Demografia')


Unnamed: 0,Distrito,Distrito_Nombre,Barrio,Barrio_Nombre,Tasa_Paro,Renta_Media_Persona,Renta_Media_Hogar,Poblacion
0,1,Centro,01-03,Cortes,4087029691,209470742,409625241,10760
1,1,Centro,01-01,Palacio,5675629192,1944300705,3897353727,23708
2,1,Centro,01-05,Universidad,4283483014,1788222351,3534422565,33434
3,3,Retiro,03-01,Pacifico,4228419906,1952732758,4483175729,33879
4,3,Retiro,03-03,Estrella,3593663803,2369669349,6349510119,23504
...,...,...,...,...,...,...,...,...
126,19,Vicalvaro,19-02,Valdebernardo,7433348396,1443877896,4147793538,17851
127,20,San Blas - Canillejas,20-01,Simancas,5999410377,132980422,3359267455,28799
128,20,San Blas - Canillejas,20-08,El Salvador,493547958,2171906345,5731967856,11516
129,21,Barajas,21-02,Aeropuerto,5705474171,10330,27316,1911


In [10]:
data_Demografia.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 131 entries, 0 to 130
Data columns (total 8 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   Distrito             131 non-null    int64 
 1   Distrito_Nombre      131 non-null    object
 2   Barrio               131 non-null    object
 3   Barrio_Nombre        131 non-null    object
 4   Tasa_Paro            131 non-null    object
 5   Renta_Media_Persona  131 non-null    object
 6   Renta_Media_Hogar    131 non-null    object
 7   Poblacion            131 non-null    int64 
dtypes: int64(2), object(6)
memory usage: 8.3+ KB


### Geographic data by BiciMAD station

The geographic location of each BiciMAD station is stored in the "Estaciones" collection of the BiciMAD database in MongoDB Atlas.

In [11]:
db_Stations = _connect_mongo('cloud', 'cluster0.15npsxw.mongodb.net', None, 'ucmtfm2022', 'UCM_2022', 'BiciMAD', 'Estaciones')
data_Stations = _consulta_stations(db_Stations)

data_Stations = data_Stations.drop(columns='Finca')

data_Stations

Conexion OK
Collection(Database(MongoClient(host=['ac-x1d17w2-shard-00-01.15npsxw.mongodb.net:27017', 'ac-x1d17w2-shard-00-02.15npsxw.mongodb.net:27017', 'ac-x1d17w2-shard-00-00.15npsxw.mongodb.net:27017'], document_class=dict, tz_aware=False, connect=True, authsource='admin', replicaset='atlas-xyv6ql-shard-0', tls=True, serverselectiontimeoutms=4000, tlscafile='C:\\ProgramData\\Anaconda3\\lib\\site-packages\\certifi\\cacert.pem'), 'BiciMAD'), 'Estaciones')


Unnamed: 0,Id_Estacion,Nro_Estacion,Gis_X,Gis_Y,Fec_Alta,Distrito,Distrito_Nombre,Barrio,Barrio_Nombre,Calle,Tipo_Reserva,Plazas,Longitud,Latitud,Direccion
0,6,5,44044706,44755396,41813,1,CENTRO,01-04,JUSTICIA,"FUENCARRAL, CALLE, DE",BiciMAD,27,-3.702074,40.428362,"FUENCARRAL, CALLE, DE, 106"
1,5,4,4403964,447556536,41813,1,CENTRO,01-05,UNIVERSIDAD,"MANUELA MALASAÑA, CALLE, DE",BiciMAD,24,-3.702674,40.428590,"MANUELA MALASAÑA, CALLE, DE, 3"
2,10,9,43981351,447412947,41813,1,CENTRO,01-01,PALACIO,"SAN MIGUEL, PLAZA, DE",BiciMAD,24,-3.709409,40.415613,"SAN MIGUEL, PLAZA, DE, 9"
3,12,11,44031486,447539519,41813,1,CENTRO,01-05,UNIVERSIDAD,"SAN ANDRES, CALLE, DE",BiciMAD,24,-3.703619,40.427052,"SAN ANDRES, CALLE, DE, 20"
4,13,12,44009553,447556073,41813,1,CENTRO,01-05,UNIVERSIDAD,"SAN BERNARDO, CALLE, DE",BiciMAD,24,-3.706220,40.428527,"SAN BERNARDO, CALLE, DE, 87"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
261,257,249,44191427,448119542,44194,5,CHAMARTiN,05-06,CASTILLA,"CASTELLANA, PASEO, DE LA",BiciMAD,24,-3.685296,40.479416,"CASTELLANA, PASEO, DE LA, frente al 298"
262,258,250,44181282,447691124,44194,5,CHAMARTiN,05-01,EL VISO,"SERRANO, CALLE, DE",BiciMAD,24,-3.686100,40.440815,"SERRANO, CALLE, DE, 113 b"
263,259,251,44344795,447727258,44194,5,CHAMARTiN,05-02,PROSPERIDAD,"SANTA HORTENSIA, CALLE, DE",BiciMAD,24,-3.666853,40.444183,"SANTA HORTENSIA, CALLE, DE, 31"
264,263,255,44191045,447945404,44194,5,CHAMARTiN,05-05,NUEVA ESPAÑA,"GENERAL LOPEZ POZAS, CALLE, DEL",BiciMAD,24,-3.685182,40.463729,"GENERAL LOPEZ POZAS, CALLE, DEL, 2"


## 2. Joining the datasets

### Stations - Demographics

In [28]:
df_EstacionDemog = pd.merge(data_Stations, data_Demografia, how='left', 
                            left_on=['Distrito', 'Barrio'], 
                            right_on=['Distrito', 'Barrio'])
df_EstacionDemog = df_EstacionDemog.drop(columns=['Distrito_Nombre_y', 'Barrio_Nombre_y'])
df_EstacionDemog.rename(columns = {'Id_Estacion':'ESTACION', 'Distrito_Nombre_x':'Distrito_Nombre', 
                                   'Barrio_Nombre_x':'Barrio_Nombre'}, inplace = True)

In [29]:
df_EstacionDemog

Unnamed: 0,ESTACION,Nro_Estacion,Gis_X,Gis_Y,Fec_Alta,Distrito,Distrito_Nombre,Barrio,Barrio_Nombre,Calle,Tipo_Reserva,Plazas,Longitud,Latitud,Direccion,Tasa_Paro,Renta_Media_Persona,Renta_Media_Hogar,Poblacion
0,6,5,44044706,44755396,41813,1,CENTRO,01-04,JUSTICIA,"FUENCARRAL, CALLE, DE",BiciMAD,27,-3.702074,40.428362,"FUENCARRAL, CALLE, DE, 106",4374558304.0,2375966607.0,4820279271.0,18072.0
1,5,4,4403964,447556536,41813,1,CENTRO,01-05,UNIVERSIDAD,"MANUELA MALASAÑA, CALLE, DE",BiciMAD,24,-3.702674,40.42859,"MANUELA MALASAÑA, CALLE, DE, 3",4283483014.0,1788222351.0,3534422565.0,33434.0
2,10,9,43981351,447412947,41813,1,CENTRO,01-01,PALACIO,"SAN MIGUEL, PLAZA, DE",BiciMAD,24,-3.709409,40.415613,"SAN MIGUEL, PLAZA, DE, 9",5675629192.0,1944300705.0,3897353727.0,23708.0
3,12,11,44031486,447539519,41813,1,CENTRO,01-05,UNIVERSIDAD,"SAN ANDRES, CALLE, DE",BiciMAD,24,-3.703619,40.427052,"SAN ANDRES, CALLE, DE, 20",4283483014.0,1788222351.0,3534422565.0,33434.0
4,13,12,44009553,447556073,41813,1,CENTRO,01-05,UNIVERSIDAD,"SAN BERNARDO, CALLE, DE",BiciMAD,24,-3.70622,40.428527,"SAN BERNARDO, CALLE, DE, 87",4283483014.0,1788222351.0,3534422565.0,33434.0
5,14,13,43973378,447543465,41813,1,CENTRO,01-05,UNIVERSIDAD,"CONDE DUQUE, CALLE, DEL",BiciMAD,24,-3.710473,40.427365,"CONDE DUQUE, CALLE, DEL, 22",4283483014.0,1788222351.0,3534422565.0,33434.0
6,16,15,43999677,447531359,43881,1,CENTRO,01-05,UNIVERSIDAD,"NORTE, CALLE, DEL",BiciMAD,21,-3.707361,40.426294,"NORTE, CALLE, DEL, 10",4283483014.0,1788222351.0,3534422565.0,33434.0
7,21,020 ampliacion,44101452,447451649,43291,1,CENTRO,01-04,JUSTICIA,"ALCALA, CALLE, DE",BiciMAD,6,-3.69529,40.419186,"ALCALA, CALLE, DE, 49",4374558304.0,2375966607.0,4820279271.0,18072.0
8,24,24,43971567,447438475,41813,1,CENTRO,01-01,PALACIO,"CARLOS III, CALLE, DE",BiciMAD,24,-3.710587,40.417906,"CARLOS III, CALLE, DE, 1",5675629192.0,1944300705.0,3897353727.0,23708.0
9,30,26,44108203,447503158,41813,1,CENTRO,01-04,JUSTICIA,"SALESAS, PLAZA, DE LAS",BiciMAD,24,-3.694542,40.423831,"SALESAS, PLAZA, DE LAS, 8",4374558304.0,2375966607.0,4820279271.0,18072.0


In [30]:
df_EstacionDemog.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 266 entries, 0 to 265
Data columns (total 19 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   ESTACION             266 non-null    int64  
 1   Nro_Estacion         266 non-null    object 
 2   Gis_X                266 non-null    object 
 3   Gis_Y                266 non-null    object 
 4   Fec_Alta             266 non-null    object 
 5   Distrito             266 non-null    int64  
 6   Distrito_Nombre      266 non-null    object 
 7   Barrio               266 non-null    object 
 8   Barrio_Nombre        266 non-null    object 
 9   Calle                266 non-null    object 
 10  Tipo_Reserva         266 non-null    object 
 11  Plazas               266 non-null    int64  
 12  Longitud             266 non-null    float64
 13  Latitud              266 non-null    float64
 14  Direccion            266 non-null    object 
 15  Tasa_Paro            264 non-null    obj
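
In the `info()` output above, `Tasa_Paro` has 264 non-null values out of 266 rows, which suggests that two stations found no matching (`Distrito`, `Barrio`) pair in the demographic data. A minimal sketch of how such unmatched rows could be located with the `indicator` parameter of `pd.merge`; the two small frames and the station id 999 are hypothetical stand-ins for `data_Stations` and `data_Demografia`, not real data:

```python
import pandas as pd

# Hypothetical stand-ins for data_Stations and data_Demografia.
stations = pd.DataFrame({
    "Id_Estacion": [5, 6, 999],
    "Distrito": [1, 1, 1],
    "Barrio": ["01-05", "01-04", "01-99"],  # "01-99" has no demographic row
})
demographics = pd.DataFrame({
    "Distrito": [1, 1],
    "Barrio": ["01-04", "01-05"],
    "Poblacion": [18072, 33434],
})

# indicator=True adds a "_merge" column: "both" or "left_only" for a left join.
# validate="m:1" raises if a (Distrito, Barrio) pair is duplicated on the right.
merged = pd.merge(stations, demographics, how="left",
                  on=["Distrito", "Barrio"], indicator=True, validate="m:1")

# Stations whose neighborhood was not found in the demographic table.
unmatched = merged.loc[merged["_merge"] == "left_only", "Id_Estacion"].tolist()
print(unmatched)
```

Inspecting the unmatched keys before clustering makes it easier to decide whether the missing demographic values should be fixed at the source or imputed.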

### Stations and total consolidated trips

The goal of this join is to add to the stations DataFrame a value that represents the historical demand for trips starting at each station. The parameter added is DEMANDA_RATIO, calculated as:

DEMANDA_RATIO = ( DEMANDA_TOTAL / MESES )

This parameter makes overall demand comparable between stations, regardless of whether a station has been operating for 24 months or only 12.
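
As an illustration of the formula, the sketch below computes DEMANDA_RATIO on a toy table. The rows are invented for the example; `ANIOMES` is assumed to be a year-month key such as `201703`, and MESES is taken as the number of distinct months with at least one record:

```python
import pandas as pd

# Hypothetical hourly records: station 1 is active in 2 months, station 2 in 1.
tracks = pd.DataFrame({
    "ESTACION": [1, 1, 1, 2, 2],
    "ANIOMES":  ["201703", "201703", "201704", "201704", "201704"],
    "DEMANDA":  [7, 3, 10, 4, 2],
})

per_station = tracks.groupby("ESTACION").agg(
    DEMANDA_TOTAL=("DEMANDA", "sum"),
    MESES=("ANIOMES", "nunique"),   # distinct months with at least one record
)
# Average demand per active month.
per_station["DEMANDA_RATIO"] = per_station["DEMANDA_TOTAL"] / per_station["MESES"]
print(per_station)
```

With the figures reported in this notebook, station 1 accumulates 90192 trips over 36 months, that is 90192 / 36 ≈ 2505.33 trips per month.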

In [15]:
# Create the ANIOMES column (year plus zero-padded month, e.g. 201703)
data_Tracks['ANIOMES'] = data_Tracks['ANIO'].astype(str)+("0"+data_Tracks['MES'].astype(str)).str[-2:]
data_Tracks

Unnamed: 0,ESTACION,ANIO,MES,DIA,HORA,FECHA,DIA_SEMANA,AM_PM,TEMPORADA,TEMPORADA_NUM,...,TEMPERATURA,VIENTO,PRESION,HUMEDAD,PRECIPITACION_1h,PRECIPITACION_3h,DESC_TIEMPO,DESC_TIEMPO_detalle,DEMANDA,ANIOMES
0,1,2017,3,31,23,2017-03-31 23:00:00,6,PM,INVIERNO,1,...,12.58,7.72,1020.0,44.0,0.0,0.0,Clouds,few clouds,7,201703
1,2,2017,3,31,23,2017-03-31 23:00:00,6,PM,INVIERNO,1,...,12.58,7.72,1020.0,44.0,0.0,0.0,Clouds,few clouds,3,201703
2,3,2017,3,31,23,2017-03-31 23:00:00,6,PM,INVIERNO,1,...,12.58,7.72,1020.0,44.0,0.0,0.0,Clouds,few clouds,1,201703
3,4,2017,3,31,23,2017-03-31 23:00:00,6,PM,INVIERNO,1,...,12.58,7.72,1020.0,44.0,0.0,0.0,Clouds,few clouds,1,201703
4,5,2017,3,31,23,2017-03-31 23:00:00,6,PM,INVIERNO,1,...,12.58,7.72,1020.0,44.0,0.0,0.0,Clouds,few clouds,2,201703
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2912133,168,2020,2,29,23,2020-02-29 23:00:00,7,PM,INVIERNO,1,...,9.60,6.20,1015.0,69.0,0.0,0.0,Clouds,few clouds,2,202002
2912134,169,2020,2,29,23,2020-02-29 23:00:00,7,PM,INVIERNO,1,...,9.60,6.20,1015.0,69.0,0.0,0.0,Clouds,few clouds,8,202002
2912135,171,2020,2,29,23,2020-02-29 23:00:00,7,PM,INVIERNO,1,...,9.60,6.20,1015.0,69.0,0.0,0.0,Clouds,few clouds,2,202002
2912136,172,2020,2,29,23,2020-02-29 23:00:00,7,PM,INVIERNO,1,...,9.60,6.20,1015.0,69.0,0.0,0.0,Clouds,few clouds,1,202002
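
The string concatenation above left-pads the month with a zero so that every key has six characters. A sketch of two equivalent formulations on a toy frame; it assumes that `ANIO` and `MES` agree with the parsed `FECHA` column, which the source does not state explicitly, and the rows are invented:

```python
import pandas as pd

# Hypothetical rows mirroring the ANIO, MES and FECHA columns.
df = pd.DataFrame({
    "ANIO": [2017, 2019],
    "MES": [3, 11],
    "FECHA": pd.to_datetime(["2017-03-31 23:00:00", "2019-11-02 08:00:00"]),
})

# Option 1: zero-pad the month explicitly with str.zfill.
key_zfill = df["ANIO"].astype(str) + df["MES"].astype(str).str.zfill(2)

# Option 2: derive the key directly from the datetime column.
key_strftime = df["FECHA"].dt.strftime("%Y%m")

print(key_zfill.tolist())
print(key_strftime.tolist())
```

Either form yields a string key, so lexicographic order matches chronological order.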


In [19]:
# Number of distinct months with records for each station

estacion_meses = data_Tracks.groupby('ESTACION').ANIOMES.nunique()
estacion_meses

ESTACION
1      36
2      32
3      36
4      36
5      36
       ..
171    36
172    35
173    35
174    36
175    34
Name: ANIOMES, Length: 172, dtype: int64

In [20]:
# TOTAL demand per station

estacion_tracks = data_Tracks.groupby('ESTACION').agg(DEMANDA=('DEMANDA', 'sum'))
estacion_tracks

ESTACION,DEMANDA
1,90192
2,42031
3,69541
4,48318
5,44461
...,...
171,58379
172,57917
173,19783
174,33287


DEMANDA_RATIO is computed as the ratio between DEMANDA_TOTAL and the number of months with trips for each station.

In [21]:
estacion = pd.merge(estacion_tracks, estacion_meses, how='left', left_on='ESTACION', right_on='ESTACION')
estacion['DEMANDA_RATIO'] = estacion['DEMANDA']/estacion['ANIOMES']
estacion.sort_values('DEMANDA_RATIO')

ESTACION,DEMANDA,ANIOMES,DEMANDA_RATIO
28,12832,34,377.411765
119,17845,35,509.857143
29,18887,35,539.628571
173,19783,35,565.228571
144,21312,35,608.914286
...,...,...,...
149,120202,35,3434.342857
43,128966,36,3582.388889
175,126096,34,3708.705882
57,140045,36,3890.138889


### Joining stations with demographic data by neighborhood

In [22]:
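df_FinalCluster = pd.merge(estacion, df_EstacionDemog, how='left', left_on='ESTACION', right_on='ESTACION')
# The name columns were already renamed to 'Distrito_Nombre' and 'Barrio_Nombre' when df_EstacionDemog was built,
# so the '_x' suffixed names no longer exist when the notebook is run top to bottom
df_FinalCluster = df_FinalCluster.drop(columns=['Nro_Estacion', 'Fec_Alta', 'Distrito_Nombre', 'Barrio_Nombre', 'Calle', 'Direccion', 'Tipo_Reserva', 'DEMANDA', 'ANIOMES'])
df_FinalCluster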


Unnamed: 0,ESTACION,DEMANDA_RATIO,Gis_X,Gis_Y,Distrito,Barrio,Plazas,Longitud,Latitud,Tasa_Paro,Renta_Media_Persona,Renta_Media_Hogar,Poblacion
0,1,2505.333333,44044361,447429065,1,01-06,30,-3.701998,40.417111,4309681391,1755000279,3505688536,7665.0
1,2,1313.468750,44048056,447430174,1,01-06,30,-3.701564,40.417213,4309681391,1755000279,3505688536,7665.0
2,3,1931.694444,44013483,447467823,1,01-05,24,-3.705674,40.420580,4283483014,1788222351,3534422565,33434.0
3,4,1342.166667,44001298,447576068,7,07-02,18,-3.707212,40.430322,3985756232,2139721963,4688235448,24770.0
4,5,1235.027778,4403964,447556536,1,01-05,24,-3.702674,40.428590,4283483014,1788222351,3534422565,33434.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...
167,171,1621.638889,44305011,447567585,4,04-05,24,-3.671401,40.429772,3250073035,2372893596,544876824,21283.0
168,172,1654.771429,44263095,447872371,5,05-04,24,-3.676618,40.457200,3405684755,2703306823,6938919168,32201.0
169,173,565.228571,44202496,447840151,5,05-04,24,-3.683735,40.454255,3405684755,2703306823,6938919168,32201.0
170,174,924.638889,43947868,447394637,1,01-01,24,-3.713338,40.413940,5675629192,1944300705,3897353727,23708.0


In [23]:
df_FinalCluster.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 172 entries, 0 to 171
Data columns (total 13 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   ESTACION             172 non-null    int64  
 1   DEMANDA_RATIO        172 non-null    float64
 2   Gis_X                172 non-null    object 
 3   Gis_Y                172 non-null    object 
 4   Distrito             172 non-null    int64  
 5   Barrio               172 non-null    object 
 6   Plazas               172 non-null    int64  
 7   Longitud             172 non-null    float64
 8   Latitud              172 non-null    float64
 9   Tasa_Paro            172 non-null    object 
 10  Renta_Media_Persona  172 non-null    object 
 11  Renta_Media_Hogar    172 non-null    object 
 12  Poblacion            172 non-null    float64
dtypes: float64(4), int64(3), object(6)
memory usage: 18.8+ KB


In [31]:
# Data type changes for numeric variables

# These columns arrive as text with a decimal comma; convert them to float
cols_decimal_comma = ['Gis_X', 'Gis_Y', 'Tasa_Paro', 'Renta_Media_Persona', 'Renta_Media_Hogar']
for col in cols_decimal_comma:
    df_FinalCluster[col] = (df_FinalCluster[col].astype(str)
                            .str.replace(',', '.', regex=False)
                            .astype(float))

# Neighborhood codes such as '01-06' are identifiers, so keep them as strings
df_FinalCluster['Barrio'] = df_FinalCluster['Barrio'].astype('string')

df_FinalCluster.info()

pd.set_option('display.max_rows', None)
df_FinalCluster


<class 'pandas.core.frame.DataFrame'>
Int64Index: 172 entries, 0 to 171
Data columns (total 13 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   ESTACION             172 non-null    int64  
 1   DEMANDA_RATIO        172 non-null    float64
 2   Gis_X                172 non-null    float64
 3   Gis_Y                172 non-null    float64
 4   Distrito             172 non-null    int64  
 5   Barrio               172 non-null    string 
 6   Plazas               172 non-null    int64  
 7   Longitud             172 non-null    float64
 8   Latitud              172 non-null    float64
 9   Tasa_Paro            172 non-null    float64
 10  Renta_Media_Persona  172 non-null    float64
 11  Renta_Media_Hogar    172 non-null    float64
 12  Poblacion            172 non-null    float64
dtypes: float64(9), int64(3), string(1)
memory usage: 18.8 KB


Unnamed: 0,ESTACION,DEMANDA_RATIO,Gis_X,Gis_Y,Distrito,Barrio,Plazas,Longitud,Latitud,Tasa_Paro,Renta_Media_Persona,Renta_Media_Hogar,Poblacion
0,1,2505.333333,440443.61,4474290.65,1,01-06,30,-3.701998,40.417111,4.309681,17550.00279,35056.88536,7665.0
1,2,1313.46875,440480.56,4474301.74,1,01-06,30,-3.701564,40.417213,4.309681,17550.00279,35056.88536,7665.0
2,3,1931.694444,440134.83,4474678.23,1,01-05,24,-3.705674,40.42058,4.283483,17882.22351,35344.22565,33434.0
3,4,1342.166667,440012.98,4475760.68,7,07-02,18,-3.707212,40.430322,3.985756,21397.21963,46882.35448,24770.0
4,5,1235.027778,440396.4,4475565.36,1,01-05,24,-3.702674,40.42859,4.283483,17882.22351,35344.22565,33434.0
5,6,2531.638889,440447.06,4475539.6,1,01-04,27,-3.702074,40.428362,4.374558,23759.66607,48202.79271,18072.0
6,7,1473.111111,440754.26,4475071.08,1,01-04,24,-3.698409,40.424163,4.374558,23759.66607,48202.79271,18072.0
7,8,1696.166667,440811.97,4475187.49,1,01-04,21,-3.69774,40.425216,4.374558,23759.66607,48202.79271,18072.0
8,9,2879.194444,441020.56,4475463.16,7,07-04,24,-3.695307,40.427714,2.958769,26957.00651,68836.5746,19900.0
9,10,1562.333333,439813.51,4474129.47,1,01-01,24,-3.709409,40.415613,5.675629,19443.00705,38973.53727,23708.0


## 3. Exporting the DataFrame to CSV

In [36]:
df_FinalCluster.to_csv('../../data/df_FinalCluster.csv', index=False)
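
When the exported file is read back in a later step, pandas infers the column types again, so the `string` dtype assigned to `Barrio` is not preserved automatically. A minimal sketch of a round trip that pins the dtype explicitly; it uses an in-memory buffer and made-up rows instead of the real file:

```python
import io
import pandas as pd

# Made-up rows with the same kind of columns as df_FinalCluster.
df = pd.DataFrame({
    "ESTACION": [1, 2],
    "Barrio": pd.array(["01-06", "01-05"], dtype="string"),
    "DEMANDA_RATIO": [2505.5, 1313.25],
})

buffer = io.StringIO()
df.to_csv(buffer, index=False)   # index=False avoids an extra "Unnamed: 0" column
buffer.seek(0)

# Pin the dtype of the code column so it is read back as pandas "string".
restored = pd.read_csv(buffer, dtype={"Barrio": "string"})
print(restored.dtypes)
```

Writing with `index=False`, as the export cell does, keeps the column list identical after reloading.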