In [1]:
# To ensure Chinese characters are displayed correctly
options(encoding = "UTF-8")
Sys.setlocale("LC_CTYPE", "en_US.UTF-8")

# Progress storage

During the research process, you will need to save your progress and come back the next day to pick up where you left off. Saving regularly is good practice because it protects you from losing your work. In this section, we will learn how to save and load data in R.

In [4]:
flights <- list(
    data = list(
        file = "data/international_flights.json",
        meta = list(
            name = "國際航空定期時刻表",
            source_link = "https://data.gov.tw/dataset/161167"
        )
    )
)

# Exercise: understand your data

- What is the name of the dataset? Where does it come from?

# Import data

In [7]:
# Read JSON file

filepath <- flights$data$file
flightsData <- jsonlite::fromJSON(filepath)

# Store progress

In [None]:
flights$data
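One way to actually persist progress between sessions is base R's `saveRDS()` and `readRDS()`. The sketch below is only an illustration, not necessarily the workflow intended here: it writes to a temporary file, and the object `progress` and its contents are hypothetical stand-ins for `flights`.

```r
# Hypothetical progress object, mirroring the nested-list layout of `flights`
progress <- list(
  data = list(
    file = "data/international_flights.json",
    meta = list(name = "example dataset")
  )
)

# Save to a temporary .rds file; in a real project, pick a path inside the project folder
rds_path <- tempfile(fileext = ".rds")
saveRDS(progress, rds_path)

# In a later session: restore the object and pick up where you left off
restored <- readRDS(rds_path)
identical(progress, restored)
```

An `.rds` file holds exactly one R object, so the restored value can be assigned to any name you like.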

# Data acquaintance

## Data structure

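- Type of storage (`typeof()`): how R stores the object internally, e.g. `double`, `integer`, `character`, `logical`, or `list`.
- Class (`class()`): how R treats the object, e.g. `numeric`, `character`, `list`, `data.frame`, or `matrix`.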

In [12]:
typeof(flightsData)
class(flightsData)
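The difference between storage type and class is easiest to see on small objects. The following is a minimal sketch with made-up values (`df` and `m` are illustrative names, not part of the flight data):

```r
# A small data frame: stored as a list, but treated as a data.frame
df <- data.frame(x = c(1.5, 2.5), y = c("a", "b"))
typeof(df)    # "list"
class(df)     # "data.frame"

# A single column: stored as double, treated as numeric
typeof(df$x)  # "double"
class(df$x)   # "numeric"

# A matrix of integers: stored as integer; its class includes "matrix"
m <- matrix(1:4, nrow = 2)
typeof(m)     # "integer"
class(m)
```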

### Two classes of collective data 

- Observation by observation (obo): mostly a `list` class.
- Feature by feature (fbf): mostly a `data.frame` class.

[3.1.1 Json data](https://tpemartin.github.io/NTPU-R-for-Data-Science-EN/element-values.html#json-data)  
[4.2.4 Data frame](https://tpemartin.github.io/NTPU-R-for-Data-Science-EN/operations-on-atomic-vectors.html#data-frame)

[Benefit of data frame](https://hyp.is/K1GeGme-Ee6m2vNkdp7lnw/tpemartin.github.io/NTPU-R-for-Data-Science-EN/operations-on-atomic-vectors.html)

In [15]:
# Both versions read the same JSON source
concerts_url <- "https://cloud.culture.tw/frontsite/trans/SearchShowAction.do?method=doFindTypeJ&category=17"

# Observation by observation
concerts_obo <-
  jsonlite::fromJSON(concerts_url, simplifyDataFrame = FALSE)

# Feature by feature
concerts_fbf <-
  jsonlite::fromJSON(concerts_url, simplifyDataFrame = TRUE)

In [None]:
concerts_obo[[1]]


In [19]:

concerts_fbf[[1]]
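The same contrast can be tried offline with a toy example in base R. This is a sketch under stated assumptions: the records in `shows_obo` are made up, and `do.call(rbind, ...)` is just one simple way to rearrange them, not what `jsonlite` does internally.

```r
# Toy records (made-up values), stored observation by observation
shows_obo <- list(
  list(title = "Show A", price = 100),
  list(title = "Show B", price = 250)
)
shows_obo[[1]]   # everything known about the first observation

# The same data rearranged feature by feature
shows_fbf <- do.call(rbind, lapply(shows_obo, as.data.frame))
shows_fbf[[1]]   # the first feature (title) across all observations

# Column-wise work is now a one-liner
mean(shows_fbf$price)
```

With the obo layout, `[[1]]` returns one whole record; with the fbf layout, `[[1]]` returns one whole column, which is why summaries across observations are easier on a data frame.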

## Names of columns

In [13]:
names(flightsData)

Each name represents:

- `AirlineID`: the two-character code assigned by IATA to identify an airline (carrier), e.g. `3U`.
- `ScheduleStartDate`: the start date of the flight schedule season to which the row applies.
- `ScheduleEndDate`: the end date of the flight schedule season to which the row applies.
- `FlightNumber`: the flight number assigned by the carrier.
- `DepartureAirportID`: the three-letter code assigned by IATA to identify the departure airport, e.g. `CKG`.
- `ArrivalAirportID`: the three-letter code assigned by IATA to identify the arrival airport, e.g. `TSA`.
- `DepartureTime`: the scheduled departure time of the flight.
- `ArrivalTime`: the scheduled arrival time of the flight.
- `Monday` to `Sunday`: whether the flight operates on that day of the week, based on the departure date.
- `CodeShare`: code-share information. A code-share flight is booked through one airline but operated by another (as indicated by the carrier code).



## Check the first few rows

In [14]:
head(flightsData)

| | AirlineID | ScheduleStartDate | ScheduleEndDate | FlightNumber | DepartureAirportID | DepartureTime | CodeShare | ArrivalAirportID | ArrivalTime | Monday | Tuesday | Wednesday | Thursday | Friday | Saturday | Sunday | UpdateTime | VersionID | Terminal | num_codeShare |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | `<list>` | `<list>` | `<list>` | `<list>` | `<list>` | `<list>` | `<list>` | `<list>` | `<list>` | `<list>` | `<list>` | `<list>` | `<list>` | `<list>` | `<list>` | `<list>` | `<list>` | `<list>` | `<list>` | `<int>` |
| 1 | 3U | 2023-10-13 | 2023-10-15 | 3U3783 | CKG | 15:00 | | TSA | 18:00 | False | False | False | False | True | False | True | 2023-10-10T08:26:07+08:00 | 1111 | | 0 |
| 2 | 3U | 2023-10-20 | 2023-10-22 | 3U3783 | CKG | 15:00 | | TSA | 18:00 | False | False | False | False | True | False | True | 2023-10-10T08:26:07+08:00 | 1111 | | 0 |
| 3 | 3U | 2023-10-27 | 2023-10-27 | 3U3783 | CKG | 15:00 | | TSA | 18:00 | False | False | False | False | True | False | False | 2023-10-10T08:26:07+08:00 | 1111 | | 0 |
| 4 | 3U | 2023-10-13 | 2023-10-15 | 3U3784 | TSA | 19:00 | | CKG | 22:15 | False | False | False | False | True | False | True | 2023-10-10T08:26:07+08:00 | 1111 | | 0 |
| 5 | 3U | 2023-10-20 | 2023-10-22 | 3U3784 | TSA | 19:00 | | CKG | 22:15 | False | False | False | False | True | False | True | 2023-10-10T08:26:07+08:00 | 1111 | | 0 |
| 6 | 3U | 2023-10-27 | 2023-10-27 | 3U3784 | TSA | 19:00 | | CKG | 22:15 | False | False | False | False | True | False | False | 2023-10-10T08:26:07+08:00 | 1111 | | 0 |


In [None]:
install.packages("airportr")


In [23]:

head(airportr::airports)

| OpenFlights ID | Name | City | IATA | ICAO | Country | Country Code | Country Code (Alpha-2) | Country Code (Alpha-3) | Latitude | Longitude | Altitude | UTC | DST | Timezone | Type | Source |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| `<dbl>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` |
| 1 | Goroka Airport | Goroka | GKA | AYGA | Papua New Guinea | 598 | PG | PNG | -6.08169 | 145.392 | 5282 | 10 | U | Pacific/Port_Moresby | airport | OurAirports |
| 2 | Madang Airport | Madang | MAG | AYMD | Papua New Guinea | 598 | PG | PNG | -5.20708 | 145.789 | 20 | 10 | U | Pacific/Port_Moresby | airport | OurAirports |
| 3 | Mount Hagen Kagamuga Airport | Mount Hagen | HGU | AYMH | Papua New Guinea | 598 | PG | PNG | -5.82679 | 144.296 | 5388 | 10 | U | Pacific/Port_Moresby | airport | OurAirports |
| 4 | Nadzab Airport | Nadzab | LAE | AYNZ | Papua New Guinea | 598 | PG | PNG | -6.569803 | 146.726 | 239 | 10 | U | Pacific/Port_Moresby | airport | OurAirports |
| 5 | Port Moresby Jacksons International Airport | Port Moresby | POM | AYPY | Papua New Guinea | 598 | PG | PNG | -9.44338 | 147.22 | 146 | 10 | U | Pacific/Port_Moresby | airport | OurAirports |
| 6 | Wewak International Airport | Wewak | WWK | AYWK | Papua New Guinea | 598 | PG | PNG | -3.58383 | 143.669 | 19 | 10 | U | Pacific/Port_Moresby | airport | OurAirports |


In [30]:
flightsData$ArrivalAirportID |> unlist() |> table() |> sort(decreasing = TRUE)


 TPE  PVG  KHH  HKG  NRT  BKK  KIX  ICN  TSA  SIN  MFM  FUK  HAN  BWN  LAX  SGN 
2268  156  140  138  134  120  120  108   99   82   70   66   63   60   60   59 
 SFO  MNL  KUL  OKA  PEK  CEB  PUS  SDJ  DAD  CTS  HND  SZX  CAN  KMQ  SHA  HGH 
  57   54   50   49   42   41   37   36   35   33   33   33   27   27   24   23 
 XMN  DMK  RMQ  GMP  NGO  PEN  BKI  JFK  DXB  NKG  FOC  TFU  YVR  CGK  CKG  CNX 
  23   22   22   21   21   21   19   18   16   16   15   15   15   12   12   12 
 MEL  ORD  PNH  TAO  YYZ  BNE  TAE  CRK  DPS  LHR  SEA  SYD  TAK  WUH  VIE  AKL 
  12   12   12   12   12   11   10    9    9    9    9    9    9    9    8    6 
 AMS  CJJ  CJU  FCO  FRA  GAJ  HIJ  IAH  IST  KMJ  MUC  NGB  KIJ  MXP  TKS  CXR 
   6    6    6    6    6    6    6    6    6    6    6    6    5    5    5    4 
 PRG  AKJ  CDG  CGO  HKD  HKT  HNA  HSG  HUI  IBR  KCZ  KLO  MPH  OKJ  ONT  PPS 
   4    3    3    3    3    3    3    3    3    3    3    3    3    3    3    3 
 RGN  ROR  TNN  AOJ  IZO  T
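The frequency table above comes from a three-step pipeline: flatten the list column, count each value, then sort. A minimal sketch on a toy list (the six values in `arrivals` are made up for illustration) shows each step's role:

```r
# A toy list column, similar in shape to flightsData$ArrivalAirportID
arrivals <- list("TPE", "HAN", "TPE", "NRT", "TPE", "HAN")

counts <- arrivals |>
  unlist() |>                 # list -> character vector
  table() |>                  # count occurrences of each code
  sort(decreasing = TRUE)     # most frequent first

counts
names(counts)[1]   # the most frequent code
counts[["TPE"]]    # the count for one specific code
```

`unlist()` is needed because `table()` expects an atomic vector (or factor) rather than a list of single values.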

In [36]:
han <- airportr::airport_lookup("HAN")

han 

“data set ‘airports’ not found”


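> A warning, unlike an error, does not stop execution: the command still runs and `han` is assigned. Most warnings like this one can be ignored, but it is worth reading each warning before deciding to do so.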
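To see how a warning differs from an error, here is a small base R sketch. The function `noisy_lookup()` is hypothetical and only mimics a lookup that warns; it is not part of `airportr`.

```r
# Hypothetical function: emits a warning but still returns a value
noisy_lookup <- function(code) {
  warning("something looks off")
  toupper(code)
}

# Record the warning message, silence it, and let the call finish
caught <- character(0)
result <- withCallingHandlers(
  noisy_lookup("han"),
  warning = function(w) {
    caught <<- c(caught, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
result   # the value is still returned
caught   # the warning text that was raised along the way

# An error, by contrast, aborts the call; here the handler returns NULL instead
failed <- tryCatch(stop("lookup failed"), error = function(e) NULL)
is.null(failed)
```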