# Airline Database Preprocessing

This notebook preprocesses air traffic control (ATC) operations data for later analysis.

## Objectives:
- Load and explore the ATC operations data
- Extract the columns relevant to the analysis
- Organize the data by date
- Identify the unique airlines in the dataset

In [1]:
import pandas as pd

## 1. Data Loading

We import the pandas library and load the CSV dataset that contains the air traffic control operations.

**Note:** pandas shows a warning about columns with mixed data types (a `DtypeWarning`) when reading this file, which is common in large, heterogeneous datasets.
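
The warning can usually be avoided at load time. The sketch below uses a made-up three-column miniature of the file to show two standard `read_csv` options, `low_memory=False` and an explicit `dtype`. Which columns of the real file hold mixed types is not stated in this notebook, so the column choice here is an assumption.

```python
import io

import pandas as pd

# Hypothetical miniature of the CSV; the real file has many more columns.
csv_text = "id,arr_rwy,wtc\n1,,M\n2,09,M\n3,27L,H\n"

# low_memory=False reads the file in a single pass, so each column type is
# inferred from all of its values instead of chunk by chunk.
# An explicit dtype for known text columns removes the guesswork entirely.
sample = pd.read_csv(
    io.StringIO(csv_text),
    low_memory=False,
    dtype={"arr_rwy": "string", "wtc": "string"},
)
print(sample.dtypes)
```

Declaring text columns as strings also keeps values such as `09` from being read as the number 9.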

In [3]:
df = pd.read_csv("data/ATC csvs/atc_atcoperation_202512301506.csv")
df



Unnamed: 0,id,created,time,dof,fpl_number,atd,ata,eobt,etd,eta,...,sobt,arr_rwy,arr_stand,arr_taxi,dep_taxi,sta,std,flight_rules,type_of_flight,wtc
0,1,2024-06-06 17:24:25.602 +0000,2022-12-12 01:02:49.000 +0000,2022-12-12 00:50:00.000 +0000,22081,2022-12-12 00:59:16.000 +0000,1970-01-01 00:00:00.000 +0000,2022-12-12 00:50:00.000 +0000,1970-01-01 00:00:00.000 +0000,2022-12-12 02:01:16.000 +0000,...,,,,,,,,,,
1,2,2024-06-06 17:24:25.603 +0000,2022-12-12 01:47:17.000 +0000,2022-12-12 01:45:00.000 +0000,22145,2022-12-11 15:23:00.000 +0000,1970-01-01 00:00:00.000 +0000,2022-12-11 15:30:00.000 +0000,1970-01-01 00:00:00.000 +0000,2022-12-12 02:27:26.000 +0000,...,,,,,,,,,,
2,3,2024-06-06 17:24:25.603 +0000,2022-12-11 23:56:57.000 +0000,2022-12-11 22:46:00.000 +0000,22151,2022-12-11 14:09:00.000 +0000,1970-01-01 00:00:00.000 +0000,2022-12-11 14:06:00.000 +0000,1970-01-01 00:00:00.000 +0000,2022-12-12 00:18:59.000 +0000,...,,,,,,,,,,
3,4,2024-06-06 17:24:25.604 +0000,2022-12-11 23:49:07.000 +0000,2022-12-11 22:54:00.000 +0000,22154,1970-01-01 00:00:00.000 +0000,1970-01-01 00:00:00.000 +0000,2022-12-11 13:53:00.000 +0000,1970-01-01 00:00:00.000 +0000,2022-12-12 02:12:00.000 +0000,...,,,,,,,,,,
4,5,2024-06-06 17:24:25.604 +0000,2022-12-12 00:07:07.000 +0000,2022-12-12 00:05:00.000 +0000,22161,1970-01-01 00:00:00.000 +0000,1970-01-01 00:00:00.000 +0000,2022-12-11 14:53:00.000 +0000,1970-01-01 00:00:00.000 +0000,2022-12-12 00:46:35.000 +0000,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
634710,795846261,2025-12-30 15:05:53.621 +0000,2025-12-29 23:01:36.000 +0000,2025-12-29 21:52:00.000 +0000,295797,2025-12-29 21:47:00.000 +0000,,2025-12-29 21:44:00.000 +0000,,2025-12-30 00:18:22.000 +0000,...,,,,,,,,I,S,M
634711,795757819,2025-12-30 15:05:53.621 +0000,2025-12-29 21:48:25.000 +0000,2025-12-29 21:52:00.000 +0000,295977,,,2025-12-29 21:13:00.000 +0000,,2025-12-29 22:33:47.000 +0000,...,,,,,,,,I,G,M
634712,795854391,2025-12-30 15:05:53.632 +0000,2025-12-29 22:50:17.000 +0000,2025-12-29 22:55:00.000 +0000,295914,,,2025-12-29 22:45:00.000 +0000,,2025-12-30 01:04:57.000 +0000,...,,,,,,,,I,S,M
634713,795765832,2025-12-30 15:05:53.622 +0000,2025-12-29 21:53:29.000 +0000,2025-12-29 21:54:00.000 +0000,295733,,,2025-12-29 21:45:00.000 +0000,,2025-12-29 22:48:56.000 +0000,...,,,,,,,,I,S,M


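## 2. Extracting the Relevant Columns

We select the columns that matter most for our analysis:

- **time**: Timestamp of the record
- **gufi**: Unique flight identifier
- **acid**: Aircraft ID (the callsign, e.g. `JAF111`)
- **flight_type**: Flight type code. The data contains the values A, D, N, O, and P. In the sample rows, D marks flights leaving a Cuban airport (`MUVR`, `MUHA`) and O marks flights that neither start nor end in Cuba, so these codes appear to mean departure and overflight rather than domestic and international.
- **airline_code**: Airline code
- **aircraft_type**: Aircraft type
- **fl**: Flight level (altitude; the values appear to be in feet, e.g. 38000)
- **enter_point/exit_point**: Entry and exit points of the airspace
- **enter_time/exit_time**: Times at which the flight enters and leaves the airspace
- **planned_route**: Planned route
- **reg**: Aircraft registration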

In [4]:
# .copy() avoids SettingWithCopyWarning when columns are added to df_extracted later
df_extracted = df[["time", "gufi", "acid", "flight_type", "airline_code", "aircraft_type", "fl", "enter_point", "enter_time", "exit_point", "exit_time", "planned_route", "reg"]].copy()
df_extracted

Unnamed: 0,time,gufi,acid,flight_type,airline_code,aircraft_type,fl,enter_point,enter_time,exit_point,exit_time,planned_route,reg
0,2022-12-12 01:02:49.000 +0000,JAF111.MUVR.MMUN.221212,JAF111,D,JAF,B788,38000.00000,MUVR,2022-12-12 01:02:49.000 +0000,NOSAT,2022-12-12 01:37:00.000 +0000,DCT UHA UR522 MUPKI UB879 NOSAT NOSAT1D,OOLOE
1,2022-12-12 01:47:17.000 +0000,CFG2116.EDDF.MMUN.221211,CFG2116,O,CFG,A332,40000.00000,CANOA,2022-12-12 01:47:17.000 +0000,NOSAT,2022-12-12 02:13:31.000 +0000,CANOA UB879 OLABI UB879 NOSAT B879 XOPUT UB879...,DAIYB
2,2022-12-11 23:56:57.000 +0000,CFG2206.EDDF.MUHG.221211,CFG2206,A,CFG,B763,34000.00000,GHANN,2022-12-11 23:56:57.000 +0000,MUHG,1970-01-01 00:00:00.000 +0000,GHANN BEMUV5,DABUC
3,2022-12-11 23:49:07.000 +0000,EDW12.LSZH.MROC.221211,EDW12,O,EDW,A343,38000.00000,GHANN,2022-12-11 23:49:07.000 +0000,VIKRO,2022-12-12 00:14:00.000 +0000,GHANN UL347 VIKRO DCT DAGUD DCT SPP UP798 COLO...,HBJMD
4,2022-12-12 00:07:07.000 +0000,TAP277.LPPT.MMUN.221211,TAP277,O,TAP,A333,38000.00000,CANOA,2022-12-12 00:07:07.000 +0000,NOSAT,2022-12-12 00:33:40.000 +0000,CANOA UB879 NOSAT NOSAT1B,ECNTX
...,...,...,...,...,...,...,...,...,...,...,...,...,...
634710,2025-12-29 23:01:36.000 +0000,LPE2473.KMCO.SPJC.251229,LPE2473,O,LPE,A20N,35098.42632,TANIA,2025-12-29 23:01:36.000 +0000,GAXER,,TANIA272004 DCT GAXER UL780 TORIL/N0453F370 UL...,CCBHA
634711,2025-12-29 21:48:25.000 +0000,N111FJ.KFXE.MMUN.251229,N111FJ,O,N111FJ,F900,28097.11376,CANOA,2025-12-29 21:48:25.000 +0000,NOSAT,2025-12-29 22:19:49.000 +0000,CANOA UB879 NOSAT DCT,
634712,2025-12-29 22:50:17.000 +0000,UAL1854.MWCR.KIAH.251229,UAL1854,O,UAL,B739,18897.63840,ATUVI,2025-12-29 22:50:17.000 +0000,ALURU,2025-12-29 23:30:01.000 +0000,DCT ATUVI UL674 NUDIS/N0451F380 UL674 KEHLI A7...,N45440
634713,2025-12-29 21:53:29.000 +0000,CAY106.MWCR.KMIA.251229,CAY106,O,CAY,B38M,21998.03220,RIKEL,2025-12-29 21:53:29.000 +0000,IKBIX,2025-12-29 22:26:44.000 +0000,RIKEL2 RIKEL UG877 UCL UG448 IKBIX Y183 PEAKY ...,VPCIW


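## 3. Sorting by Time

We sort the data chronologically by the `time` column to make the time-based analysis easier and keep the records in a logical sequence. The column is still text at this point; sorting it as strings gives chronological order only because every timestamp uses the same format and the same `+0000` offset.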
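
If the timestamps ever mixed formats or offsets, sorting on parsed values would be safer than sorting on text. A minimal sketch, using three timestamps copied from the sample rows; the column name `time_utc` is introduced here for illustration only.

```python
import pandas as pd

toy = pd.DataFrame({
    "time": [
        "2022-12-12 01:02:49.000 +0000",
        "2022-12-11 23:56:57.000 +0000",
        "2022-12-12 00:07:07.000 +0000",
    ],
    "acid": ["JAF111", "CFG2206", "TAP277"],
})

# Parse the text into timezone-aware UTC timestamps, then sort on those.
toy["time_utc"] = pd.to_datetime(toy["time"], utc=True)
toy_sorted = toy.sort_values("time_utc", kind="stable").reset_index(drop=True)
print(toy_sorted[["time_utc", "acid"]])
```

The stable sort keeps records with identical timestamps in their original relative order.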

In [5]:
df = df_extracted.sort_values("time")
df

Unnamed: 0,time,gufi,acid,flight_type,airline_code,aircraft_type,fl,enter_point,enter_time,exit_point,exit_time,planned_route,reg
28,2022-12-11 22:51:33.000 +0000,NKS236.SKRG.KFLL.221211,NKS236,O,NKS,A20N,24000.00000,BEMOL,2022-12-11 22:51:33.000 +0000,BORDO,1970-01-01 00:00:00.000 +0000,MRN5B MRN W65 BUTAL/N0470F340 UP778 CTG UM779 ...,N926NK
118,2022-12-11 23:02:23.000 +0000,VPCXA.KMTH.MWCR.221211,VPCXA,O,VPCXA,DHC6,13000.00000,FUNDI,2022-12-11 23:02:23.000 +0000,ATUVI,2022-12-12 00:34:18.000 +0000,FUNDI DCT ALVEK DCT UCL DCT ATUVI DCT GCM DCT,
68,2022-12-11 23:08:44.000 +0000,AAL1028.TNCA.KMIA.221211,AAL1028,O,AAL,B38M,34000.00000,GELOG,2022-12-11 23:08:44.000 +0000,ZEUSS,2022-12-12 00:00:56.000 +0000,DIBO1F DIBOK UL795 ALTIB UM779 ZEUSS VIICE2,N324RA
51,2022-12-11 23:23:54.000 +0000,CEY974.MUHA.MUHG.221211,CEY974,N,CEY,A321,35000.00000,MUHA,2022-12-11 23:23:54.000 +0000,2-,2022-12-12 00:14:38.000 +0000,KAVUL5A XOPLI J2 ANETU BEMUV5,9HAMG
57,2022-12-11 23:30:32.000 +0000,AVA031.KMIA.SKRG.221211,AVA031,O,AVA,A320,35000.00000,URSUS,2022-12-11 23:30:32.000 +0000,PUTUL,2022-12-12 00:03:15.000 +0000,URSUS UP406 AKPEK AKPE2C,N398AV
...,...,...,...,...,...,...,...,...,...,...,...,...,...
634578,2025-12-30 14:28:24.000 +0000,DAL2014.KATL.MMUN.251230,DAL2014,O,DAL,A321,33999.34492,SHARQ,2025-12-30 14:28:24.000 +0000,NOSAT,2025-12-30 14:46:11.000 +0000,SHARQ UM463 WALKY UB879 NOSAT DCT CUN DCT,N335DN
634552,2025-12-30 14:28:44.000 +0000,FFT19.KMIA.MGGT.251230,FFT19,O,FFT,A21N,33999.34492,MAXIM,2025-12-30 14:28:44.000 +0000,NUKAN,2025-12-30 14:59:58.000 +0000,MAXIM UG765 TIKIS/N0449F320 DCT GT555 DCT,N616FR
634634,2025-12-30 14:31:08.000 +0000,AAL2678.MUHA.KMIA.251230,AAL2678,D,AAL,A319,0.00000,IHA,2025-12-30 14:31:08.000 +0000,MAXIM,2025-12-30 14:49:12.000 +0000,EPMAR4A MAXIM SNDBR3,N737US
634635,2025-12-30 14:35:32.000 +0000,SHARK67.MUGM.MKJP.251230,SHARK67,D,SHARK67,C130,14898.29444,N19190590W075405770,2025-12-30 14:35:32.000 +0000,N18550969W076011837,2025-12-30 14:52:37.000 +0000,DCT ERICC MLY,


## 4. Daily Grouping

We group the data by date to obtain a daily summary with:

- **acids_list**: Unique aircraft IDs seen that day
- **flight_types**: Flight types operated that day
- **airlines**: Airlines that operated that day
- **aircraft_types**: Aircraft types used that day

This lets us analyze daily patterns and trends over time.
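
One possible way to express this kind of summary is pandas named aggregation, which sets the output column names in the same call. The sketch below runs on three rows taken from the sample data; the helper `unique_list` and the extra `n_airlines` column are illustrative assumptions, not part of the notebook.

```python
import pandas as pd

toy = pd.DataFrame({
    "time": [
        "2022-12-11 23:56:57.000 +0000",
        "2022-12-11 23:49:07.000 +0000",
        "2022-12-12 00:07:07.000 +0000",
    ],
    "acid": ["CFG2206", "EDW12", "TAP277"],
    "airline_code": ["CFG", "EDW", "TAP"],
})
toy["date"] = pd.to_datetime(toy["time"], utc=True).dt.date


def unique_list(values):
    """Return the distinct values of a group, in order of first appearance."""
    return list(values.unique())


# Each keyword names an output column: (input column, aggregation).
daily = toy.groupby("date").agg(
    acids_list=("acid", unique_list),
    airlines=("airline_code", unique_list),
    n_airlines=("airline_code", "nunique"),
).reset_index()
print(daily)
```

A numeric column such as `n_airlines` is easier to plot or compare across days than a column of lists.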

In [6]:
# Convert the time column to datetime and extract the date
df_extracted['date'] = pd.to_datetime(df_extracted['time']).dt.date

# Group by date and collect the unique values of each column
daily_acids = df_extracted.groupby('date').agg({
    'acid': lambda x: list(x.unique()),
    'flight_type': lambda x: list(x.unique()),
    'airline_code': lambda x: list(x.unique()),
    'aircraft_type': lambda x: list(x.unique())
}).reset_index()

daily_acids.columns = ['date', 'acids_list', 'flight_types', 'airlines', 'aircraft_types']

# Show the first results
daily_acids.head()

Unnamed: 0,date,acids_list,flight_types,airlines,aircraft_types
0,2022-12-11,"[CFG2206, EDW12, AFR650, SWG645, N999GC, GTI85...","[A, O, D, N]","[CFG, EDW, AFR, SWG, N999GC, GTI, TPA, AVA, NK...","[B763, A343, B77W, B38M, GLF4, B744, A332, A20..."
1,2022-12-12,"[JAF111, CFG2116, TAP277, EVE813, EVE825, AEA0...","[D, O, A, N, P]","[JAF, CFG, TAP, EVE, AEA, AFR, N670CP, N95TG, ...","[B788, A332, A333, A359, B789, B772, CL30, E55..."
2,2022-12-13,"[VCV3498, TFF967, AEA051, AFR820, CFG2184, NOS...","[A, O, N, D, P]","[VCV, TFF, AEA, AFR, CFG, NOS, GTI, EDW, CEY, ...","[E190, GLF5, B788, B772, A332, B789, B744, A34..."
3,2022-12-14,"[IBE6405, VIV876, UPS385, AVA201, AAL1532, SWG...","[O, A, D]","[IBE, VIV, UPS, AVA, AAL, SWG, XBFXT, FWI, CMP...","[A359, A320, B763, A319, B738, TBM9, A20N, A33..."
4,2022-12-15,"[CMP441, LOT6533, AEA051, AFR820, EVE825, AFR6...","[O, A, D]","[CMP, LOT, AEA, AFR, EVE, SWG, CFG, GTI, TSC, ...","[B39M, B788, B772, A333, B77W, B738, B763, B74..."


In [None]:
# Save the daily data to a CSV file
daily_acids.to_csv("data/ATC csvs/atc_dailyacids_202512301506.csv", index=False)
print("Daily data saved to 'data/ATC csvs/atc_dailyacids_202512301506.csv'")
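
Writing list-valued columns to CSV turns each list into its string representation, so the lists come back as plain text when the file is read again. The sketch below shows the round trip on a toy frame with an in-memory buffer instead of a file. Passing `ast.literal_eval` as a `read_csv` converter is one possible way to restore the lists, not necessarily what this notebook does.

```python
import ast
import io

import pandas as pd

toy_daily = pd.DataFrame({
    "date": ["2022-12-11", "2022-12-12"],
    "airlines": [["CFG", "EDW"], ["JAF", "TAP"]],
})

# Write to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
toy_daily.to_csv(buffer, index=False)
csv_text = buffer.getvalue()

# A plain read returns the lists as strings.
reloaded = pd.read_csv(io.StringIO(csv_text))
print(type(reloaded.loc[0, "airlines"]))

# A converter parses each cell back into a Python list while reading.
restored = pd.read_csv(
    io.StringIO(csv_text),
    converters={"airlines": ast.literal_eval},
)
print(type(restored.loc[0, "airlines"]))
```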

## 4.1 Hourly Grouping

For a finer-grained analysis, we group the data by date and hour. The resulting table has these columns:

- **flight_count**: Number of records in that hour (rows with a non-null `acid`)
- **flight_types**: Flight types operated in that hour
- **airlines**: Airlines active in that hour
- **aircraft_types**: Aircraft types used in that hour
- **flight_ids**: List of unique flight identifiers (`gufi`)

The hourly view helps identify air traffic patterns, peak hours, and finer temporal variation. The hours are in UTC, since the timestamps carry a `+0000` offset.
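
An alternative way to bucket records by hour is to floor each timestamp to the start of its hour, which yields a single datetime key instead of separate date and hour columns. A minimal sketch on four rows from the sample data; the column name `hour_start` and the use of `nunique` on `gufi` are assumptions made for illustration.

```python
import pandas as pd

toy = pd.DataFrame({
    "time": [
        "2022-12-11 22:51:33.000 +0000",
        "2022-12-11 23:02:23.000 +0000",
        "2022-12-11 23:08:44.000 +0000",
        "2022-12-12 00:07:07.000 +0000",
    ],
    "gufi": [
        "NKS236.SKRG.KFLL.221211",
        "VPCXA.KMTH.MWCR.221211",
        "AAL1028.TNCA.KMIA.221211",
        "TAP277.LPPT.MMUN.221211",
    ],
})

# Floor every timestamp to the beginning of its hour (kept in UTC).
toy["hour_start"] = pd.to_datetime(toy["time"], utc=True).dt.floor("h")

# Count distinct flights per hourly bucket.
hourly = (
    toy.groupby("hour_start")
    .agg(flight_count=("gufi", "nunique"))
    .reset_index()
)
print(hourly)
```

Unlike a row count, `nunique` counts a flight once even if it appears in several records within the same hour.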

In [7]:
# Convert the time column to datetime and extract date and hour
df_extracted['datetime'] = pd.to_datetime(df_extracted['time'])
df_extracted['date'] = df_extracted['datetime'].dt.date
df_extracted['hour'] = df_extracted['datetime'].dt.hour

# Group by date and hour for the hourly analysis
hourly_data = df_extracted.groupby(['date', 'hour']).agg({
    'acid': 'count',  # Number of records with a non-null acid
    'flight_type': lambda x: list(x.unique()),
    'airline_code': lambda x: list(x.unique()),
    'aircraft_type': lambda x: list(x.unique()),
    'gufi': lambda x: list(x.unique())  # List of unique flights
}).reset_index()

hourly_data.columns = ['date', 'hour', 'flight_count', 'flight_types', 'airlines', 'aircraft_types', 'flight_ids']

# Show the first results
hourly_data.head(10)

Unnamed: 0,date,hour,flight_count,flight_types,airlines,aircraft_types,flight_ids
0,2022-12-11,22,1,[O],[NKS],[A20N],[NKS236.SKRG.KFLL.221211]
1,2022-12-11,23,22,"[A, O, D, N]","[CFG, EDW, AFR, SWG, N999GC, GTI, TPA, AVA, IB...","[B763, A343, B77W, B38M, GLF4, B744, A332, A20...","[CFG2206.EDDF.MUHG.221211, EDW12.LSZH.MROC.221..."
2,2022-12-12,0,45,"[O, A, D, N, P]","[TAP, EVE, AEA, N670CP, TSC, LRC, ACA, SWG, NK...","[A333, A359, B789, CL30, A21N, A320, BCS3, B38...","[TAP277.LPPT.MMUN.221211, EVE813.LEMD.MMUN.221..."
3,2022-12-12,1,45,"[D, O, A, N]","[JAF, CFG, EVE, AFR, TSC, ACA, CAY, SWG, LTG, ...","[B788, A332, A359, B772, A321, A333, B789, B38...","[JAF111.MUVR.MMUN.221212, CFG2116.EDDF.MMUN.22..."
4,2022-12-12,2,26,"[D, O, A, N]","[TSC, TOM, ROU, CFG, SWG, TAO, CUB, N156VP, SW...","[A21N, B788, A321, B763, B738, B38M, AT76, AT7...","[TSC957.MUHG.CYYZ.221212, TOM115.MKJS.EGCC.221..."
5,2022-12-12,3,20,"[D, A, O]","[TSC, SWG, NOS, TOM, ROU, LAN, TAP, SWQ, AEA, ...","[A332, B38M, B789, A319, B738, B788, A333, B73...","[TSC805.MUCC.CYYZ.221212, SWG630.CYOW.MUVR.221..."
6,2022-12-12,4,18,"[O, D, A]","[JAF, UPS, N716CG, EVE, IBE, UAL, CMP, LAE, AA...","[B788, B763, FA8X, A359, B738, B772, B38M, B78...","[JAF111.MMUN.EBBR.221212, UPS383.SEQM.KMIA.221..."
7,2022-12-12,5,9,"[O, D]","[N734JU, CFG, JBU, DAL, SWG, NKS]","[G280, A332, A320, B763, B738, A319, A321]","[N734JU.SKBO.KORL.221212, CFG2117.MMUN.EDDF.22..."
8,2022-12-12,6,17,"[O, A]","[NAC, ROU, KYE, GTI, ACA, AVA, LCO, LPE, JBU, ...","[B763, A319, B744, A333, A320, A21N, B739, A32...","[NAC863.SPJC.KMIA.221212, ROU1876.CYYZ.MUHA.22..."
9,2022-12-12,7,12,"[O, A]","[CFBNS, UAL, DAL, AVA, ACA, LAE, IBE, JBU, SHH...","[FA7X, B789, B764, A20N, B763, A359, A320, B75...","[CFBNS.SCEL.CYUL.221212, UAL844.SBGR.KORD.2212..."


In [9]:
# Build the time column by combining date and hour
hourly_data['time'] = pd.to_datetime(hourly_data['date'].astype(str) + ' ' + hourly_data['hour'].astype(str) + ':00:00')

# Show the first rows to check the new column
print("DataFrame with new 'time' column:")
hourly_data.head(10)

DataFrame with new 'time' column:


Unnamed: 0,date,hour,flight_count,flight_types,airlines,aircraft_types,flight_ids,time
0,2022-12-11,22,1,[O],[NKS],[A20N],[NKS236.SKRG.KFLL.221211],2022-12-11 22:00:00
1,2022-12-11,23,22,"[A, O, D, N]","[CFG, EDW, AFR, SWG, N999GC, GTI, TPA, AVA, IB...","[B763, A343, B77W, B38M, GLF4, B744, A332, A20...","[CFG2206.EDDF.MUHG.221211, EDW12.LSZH.MROC.221...",2022-12-11 23:00:00
2,2022-12-12,0,45,"[O, A, D, N, P]","[TAP, EVE, AEA, N670CP, TSC, LRC, ACA, SWG, NK...","[A333, A359, B789, CL30, A21N, A320, BCS3, B38...","[TAP277.LPPT.MMUN.221211, EVE813.LEMD.MMUN.221...",2022-12-12 00:00:00
3,2022-12-12,1,45,"[D, O, A, N]","[JAF, CFG, EVE, AFR, TSC, ACA, CAY, SWG, LTG, ...","[B788, A332, A359, B772, A321, A333, B789, B38...","[JAF111.MUVR.MMUN.221212, CFG2116.EDDF.MMUN.22...",2022-12-12 01:00:00
4,2022-12-12,2,26,"[D, O, A, N]","[TSC, TOM, ROU, CFG, SWG, TAO, CUB, N156VP, SW...","[A21N, B788, A321, B763, B738, B38M, AT76, AT7...","[TSC957.MUHG.CYYZ.221212, TOM115.MKJS.EGCC.221...",2022-12-12 02:00:00
5,2022-12-12,3,20,"[D, A, O]","[TSC, SWG, NOS, TOM, ROU, LAN, TAP, SWQ, AEA, ...","[A332, B38M, B789, A319, B738, B788, A333, B73...","[TSC805.MUCC.CYYZ.221212, SWG630.CYOW.MUVR.221...",2022-12-12 03:00:00
6,2022-12-12,4,18,"[O, D, A]","[JAF, UPS, N716CG, EVE, IBE, UAL, CMP, LAE, AA...","[B788, B763, FA8X, A359, B738, B772, B38M, B78...","[JAF111.MMUN.EBBR.221212, UPS383.SEQM.KMIA.221...",2022-12-12 04:00:00
7,2022-12-12,5,9,"[O, D]","[N734JU, CFG, JBU, DAL, SWG, NKS]","[G280, A332, A320, B763, B738, A319, A321]","[N734JU.SKBO.KORL.221212, CFG2117.MMUN.EDDF.22...",2022-12-12 05:00:00
8,2022-12-12,6,17,"[O, A]","[NAC, ROU, KYE, GTI, ACA, AVA, LCO, LPE, JBU, ...","[B763, A319, B744, A333, A320, A21N, B739, A32...","[NAC863.SPJC.KMIA.221212, ROU1876.CYYZ.MUHA.22...",2022-12-12 06:00:00
9,2022-12-12,7,12,"[O, A]","[CFBNS, UAL, DAL, AVA, ACA, LAE, IBE, JBU, SHH...","[FA7X, B789, B764, A20N, B763, A359, A320, B75...","[CFBNS.SCEL.CYUL.221212, UAL844.SBGR.KORD.2212...",2022-12-12 07:00:00


In [10]:
# Summary statistics of the hourly data
print("Flights per hour statistics:")
print(f"Average flights per hour: {hourly_data['flight_count'].mean():.2f}")
print(f"Maximum flights in one hour: {hourly_data['flight_count'].max()}")
print(f"Minimum flights in one hour: {hourly_data['flight_count'].min()}")
print(f"Total hours with activity: {len(hourly_data)}")

# Hours with the most traffic
busiest_hours = hourly_data.nlargest(5, 'flight_count')
print("\nBusiest hours:")
print(busiest_hours[['date', 'hour', 'flight_count']])

Flights per hour statistics:
Average flights per hour: 37.48
Maximum flights in one hour: 146
Minimum flights in one hour: 1
Total hours with activity: 16934

Busiest hours:
             date  hour  flight_count
8794   2025-01-04    16           146
10423  2025-03-22    16           139
8770   2025-01-03    16           137
9751   2025-02-22    16           137
9129   2025-01-25    16           135
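
All five of the busiest hours above fall at hour 16. A per-hour-of-day profile is one way to check whether that hour is busy in general or only on a few dates. A minimal sketch on toy data: the two hour-16 counts are taken from the output above, while the hour-15 counts are invented for illustration.

```python
import pandas as pd

toy_hourly = pd.DataFrame({
    "date": ["2025-01-03", "2025-01-03", "2025-01-04", "2025-01-04"],
    "hour": [15, 16, 15, 16],
    # 137 and 146 come from the output above; 90 and 100 are made up.
    "flight_count": [90, 137, 100, 146],
})

# Average and maximum traffic for each hour of the day, across all dates.
profile = toy_hourly.groupby("hour")["flight_count"].agg(["mean", "max"])
peak_hour = profile["mean"].idxmax()

print(profile)
print(f"Hour of day with the highest average traffic: {peak_hour}")
```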


In [13]:
# Save the hourly data to a CSV file
hourly_data.to_csv("data/ATC csvs/atc_hourlyacids_202512301506.csv", index=False)
print("Hourly data saved to 'data/ATC csvs/atc_hourlyacids_202512301506.csv'")

Hourly data saved to 'data/ATC csvs/atc_hourlyacids_202512301506.csv'


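## 5. Unique Airline Analysis

We extract the airline codes from the lists, which may be stored as strings. We keep only the codes with exactly 3 characters, the length of an ICAO airline designator (IATA airline codes have 2 characters).

**Result:** 536 unique three-character codes were found, which points to a wide variety of operators in the analyzed airspace. The length filter also lets through codes that contain digits, such as `N3W` and `N1F`, which are not valid ICAO designators, so the true number of airlines is somewhat lower.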
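
A length check alone also admits codes that contain digits, such as `N3W` and `N1F` in the output of this section. A minimal sketch of a stricter filter that requires three uppercase letters; the helper name `looks_like_airline_code` is hypothetical, and matching the pattern does not prove that a code belongs to a registered airline.

```python
import re

# ICAO airline designators consist of three letters.
ICAO_AIRLINE = re.compile(r"[A-Z]{3}")


def looks_like_airline_code(code):
    """Return True if code is a string of exactly three uppercase letters."""
    return isinstance(code, str) and ICAO_AIRLINE.fullmatch(code) is not None


# Values of airline_code that appear in this notebook's outputs.
candidates = ["DAL", "AAL", "N3W", "N1F", "N111FJ", "SHARK67"]
airline_like = [code for code in candidates if looks_like_airline_code(code)]
print(airline_like)
```

Checking the surviving codes against an official designator list would be the next step if an exact airline count is needed.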

In [None]:
import ast

# Convert strings that look like lists into real lists
def string_to_list(s):
    if isinstance(s, list):  # already a list (in-memory DataFrame)
        return s
    try:
        return ast.literal_eval(s)
    except (ValueError, SyntaxError, TypeError):
        return []

# Collect every airline code from the lists in the airlines column.
# daily_acids holds that column; df (the sorted extract) does not.
all_airlines = set()
for airlines_string in daily_acids['airlines']:
    airlines_list = string_to_list(airlines_string)
    # Keep only string codes with exactly 3 characters (skips NaN entries)
    airlines_list = [air for air in airlines_list if isinstance(air, str) and len(air) == 3]
    all_airlines.update(airlines_list)

print(f"Total unique airlines: {len(all_airlines)}")
print("\nAirlines found:")
print(all_airlines)


Total unique airlines: 536

Airlines found:
{'DAL', 'HRN', 'KFB', 'LWG', 'BBQ', 'PXT', 'LME', 'FAR', 'VIV', 'WJA', 'HFM', 'HVN', 'TPA', 'REA', 'CYX', 'CUL', 'MBK', 'N3W', 'WFL', 'MVJ', 'REE', 'PVA', 'AXH', 'RHH', 'SWA', 'JCT', 'LAL', 'ETA', 'TOM', 'AUA', 'FIC', 'XOJ', 'WWI', 'GLO', 'KFE', 'DAE', 'CRR', 'GKY', 'NRC', 'IJM', 'AAL', 'SUS', 'CFA', 'CEY', 'ACW', 'ULC', 'SAA', 'GMO', 'ADS', 'RUC', 'ANX', 'SDM', 'FEX', 'CSJ', 'VNB', 'ALZ', 'TKM', 'VCG', 'NEW', 'QFX', 'POV', 'PJZ', 'CSB', 'GTV', 'FBZ', 'MMD', 'HTS', 'WPT', 'GRP', 'FDX', 'GJE', 'RSD', 'LRQ', 'SLM', 'FAE', 'QQE', 'BQA', 'NJM', 'FLE', 'WSP', 'DRL', 'CCK', 'KFA', 'GPD', 'AMF', 'ABP', 'AAH', 'EAL', 'HRT', 'SLI', 'QTR', 'SHH', 'SON', 'ALE', 'PBR', 'FXT', 'QAF', 'GJW', 'DOW', 'AME', 'MTN', 'AMX', 'FAD', 'VJA', 'NUM', 'MAD', 'POE', 'BAW', 'ARE', 'LRT', 'VTU', 'SCX', 'NKS', 'TAO', 'N1F', 'ALL', 'PJV', 'MXY', 'FAG', 'UAL', 'DPJ', 'FAP', 'IFC', 'RAX', 'JBU', 'HVY', 'CBC', 'NOJ', 'LAK', 'BOV', 'KNT', 'TEK', 'SVL', 'SWQ', 'AFR