In [1]:
# To ensure Chinese characters are displayed correctly
options(encoding = "UTF-8")
Sys.setlocale("LC_CTYPE", "en_US.UTF-8")

# Progress storage

During the research process, you will need to save your progress and come back the next day to pick up where you left off. Saving regularly also protects you from losing your work. In this section, we will learn how to save and load data in R.

In [2]:
flights <- list(
    data = list(
        file = "data/international_flights.json",
        meta = list(
            name = "國際航空定期時刻表",
            source_link = "https://data.gov.tw/dataset/161167"
        )
    )
)

- We will keep saving our research progress in the `flights` list.
- When we want to save our progress, run:

In [3]:
saveRDS(flights, "data/flights.rds")

- If you turn off your computer and come back the next day, run:

In [None]:
flights = readRDS("data/flights.rds")
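To confirm that nothing is lost between sessions, the sketch below saves the list and reads it back. It writes to a temporary file from `tempfile()` rather than to `data/flights.rds`, so your real progress file is not overwritten.

```r
# Rebuild the progress list from above
flights <- list(
    data = list(
        file = "data/international_flights.json",
        meta = list(
            name = "國際航空定期時刻表",
            source_link = "https://data.gov.tw/dataset/161167"
        )
    )
)

# Save to a temporary file so the real data/flights.rds is left untouched
path <- tempfile(fileext = ".rds")
saveRDS(flights, path)

# Read it back, as you would in a new session
restored <- readRDS(path)
identical(flights, restored)  # TRUE: the whole list survives the round trip
```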

# Exercise: understand your data

- What is the name of the dataset? Where does it come from?

# Import data

In [7]:
# Read JSON file

filepath = flights$data$file
flightsData <- jsonlite::fromJSON(filepath)

# Store progress

In [None]:
flights$data
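The cell above only displays what is stored so far. To store the imported data itself, you can add it as a new element of the list and then save again. The element name `value` and the small stand-in data frame are hypothetical. In the notebook, you would assign the real `flightsData` instead.

```r
flights <- list(data = list(file = "data/international_flights.json"))

# Hypothetical stand-in for the imported data (the real one comes from jsonlite::fromJSON)
flightsData <- data.frame(
    AirlineID = c("3U", "3U"),
    FlightNumber = c("3U3783", "3U3784")
)

# Store the data next to its file path (element name `value` is hypothetical)
flights$data$value <- flightsData
names(flights$data)  # "file" "value"

# Then save progress as before, e.g. saveRDS(flights, "data/flights.rds")
```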

# Data acquaintance

## Data structure

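- Type of storage (`typeof()`): how R stores the values internally, e.g. `"double"`, `"integer"`, `"character"`, `"logical"`, or `"list"`.
- Class (`class()`): how R treats the object, e.g. `"numeric"`, `"character"`, `"list"`, `"data.frame"`, or `"matrix"`.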

In [12]:
typeof(flightsData)
class(flightsData)
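The two functions can disagree, so it is worth checking both. Here is a small illustration with made-up values:

```r
x <- c(1.5, 2, 3)
typeof(x)   # "double"
class(x)    # "numeric"

df <- data.frame(a = 1:2, b = c("u", "v"))
typeof(df)  # "list": a data frame is stored as a list of columns
class(df)   # "data.frame"

m <- matrix(1:4, nrow = 2)
typeof(m)   # "integer"
class(m)    # "matrix" "array" (R 4.0.0 and later)
```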

### Two ways of stacking collective data

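- Observation by observation (obo): each element holds one observation with all of its features. This layout is usually stored as a `list` of lists.
- Feature by feature (fbf): each element holds one feature across all observations. This layout is usually stored as a `data.frame`.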

[3.1.1 Json data](https://tpemartin.github.io/NTPU-R-for-Data-Science-EN/element-values.html#json-data)  
[4.2.4 Data frame](https://tpemartin.github.io/NTPU-R-for-Data-Science-EN/operations-on-atomic-vectors.html#data-frame)

[Benefit of data frame](https://hyp.is/K1GeGme-Ee6m2vNkdp7lnw/tpemartin.github.io/NTPU-R-for-Data-Science-EN/operations-on-atomic-vectors.html)

In [53]:
person1 <- list(
    name = "John",
    age = 30,
    married = TRUE
)
person2 <- list(
    name = "Mary",
    age = 25,
    married = FALSE
)
person3 <- list(
    name = "Tom",
    age = 35,
    married = TRUE
)

# observation by observation stacking
data_obo <- list(person1, person2, person3)

In [54]:
names = c("John", "Mary", "Tom")
ages = c(30, 25, 35)
isMarried = c(TRUE, FALSE, TRUE)

# feature by feature stacking
data_fbf <- list(
    name = names, 
    age = ages, 
    married = isMarried)

- Feature by feature: each element has the same length.

In [None]:
length(data_fbf$name)
length(data_fbf$age)
length(data_fbf$married)
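Base R's `lengths()` returns the length of every element in one call. This is easier than calling `length()` once per feature when there are many features:

```r
data_fbf <- list(
    name = c("John", "Mary", "Tom"),
    age = c(30, 25, 35),
    married = c(TRUE, FALSE, TRUE)
)

# Length of each feature at once: name 3, age 3, married 3
lengths(data_fbf)

# TRUE when all features have the same length, as fbf data requires
length(unique(lengths(data_fbf))) == 1
```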

- Within each feature, its values are of the same type.

In [57]:
typeof(data_fbf$name)
typeof(data_fbf$age)
typeof(data_fbf$married)
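The two layouts hold the same information. One way to turn the obo list into the fbf list is to pull out one feature at a time with `vapply()`. This is a sketch rather than the only approach. Its `FUN.VALUE` argument also enforces that every value of a feature has the expected type.

```r
data_obo <- list(
    list(name = "John", age = 30, married = TRUE),
    list(name = "Mary", age = 25, married = FALSE),
    list(name = "Tom", age = 35, married = TRUE)
)

# Pull one feature out of every observation; FUN.VALUE fixes the expected type
data_fbf <- list(
    name = vapply(data_obo, function(p) p$name, character(1)),
    age = vapply(data_obo, function(p) p$age, numeric(1)),
    married = vapply(data_obo, function(p) p$married, logical(1))
)

# Each feature is now one atomic vector: "character", "double", "logical"
vapply(data_fbf, typeof, character(1))
```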

For fbf stacking, we will mostly use a dedicated class to store the data: `data.frame`.

### Data frame

In [56]:
df <- list2DF(data_fbf) # use list2DF to convert a list fbf into a data frame


| name | age | married |
|---|---|---|
| `<chr>` | `<dbl>` | `<lgl>` |
| John | 30 | TRUE |
| Mary | 25 | FALSE |
| Tom | 35 | TRUE |


In [None]:

df <- data.frame(
    name = names, 
    age = ages, 
    married = isMarried
) # directly forming a data frame
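Once the data are in a data frame, you can select rows and columns with the same `[row, column]` notation. The sketch below recreates the `df` built above so that it runs on its own. It assumes R 4.0.0 or later, where `name` stays a character column.

```r
df <- data.frame(
    name = c("John", "Mary", "Tom"),
    age = c(30, 25, 35),
    married = c(TRUE, FALSE, TRUE)
)

dim(df)                   # 3 rows, 3 columns
df$age                    # one feature as an atomic vector
df[df$married, ]          # observations where married is TRUE
df[df$age > 28, "name"]   # names of people older than 28: "John" "Tom"
```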

In [15]:
# Concert listings from the culture.tw open data API (category=17)
concerts_url <- "https://cloud.culture.tw/frontsite/trans/SearchShowAction.do?method=doFindTypeJ&category=17"

# Observation by observation: each record stays a separate list
concerts_obo <- jsonlite::fromJSON(concerts_url, simplifyDataFrame = FALSE)

# Feature by feature: records are simplified into a data frame
concerts_fbf <- jsonlite::fromJSON(concerts_url, simplifyDataFrame = TRUE)
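The only difference between the two calls is `simplifyDataFrame`. The self-contained sketch below runs both settings on a tiny JSON string, made up for illustration, to show what each one returns:

```r
library(jsonlite)

txt <- '[{"name": "John", "age": 30}, {"name": "Mary", "age": 25}]'

# simplifyDataFrame = FALSE: a list with one element per record (obo)
obo <- fromJSON(txt, simplifyDataFrame = FALSE)
class(obo)        # "list"
obo[[1]]$name     # "John"

# simplifyDataFrame = TRUE: records become rows of a data frame (fbf)
fbf <- fromJSON(txt, simplifyDataFrame = TRUE)
class(fbf)        # "data.frame"
fbf$age           # 30 25

# Both hold the same observations
length(obo) == nrow(fbf)  # TRUE
```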

In [51]:
head(concerts_obo,3)


In [52]:
head(concerts_fbf)

| | version | UID | title | category | showInfo | showUnit | discountInfo | descriptionFilterHtml | imageUrl | masterUnit | ⋯ | supportUnit | otherUnit | webSales | sourceWebPromote | comment | editModifyDate | sourceWebName | startDate | endDate | hitRate |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<list>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<list>` | ⋯ | `<list>` | `<list>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<int>` |
| 1 | 1.4 | 64b83da673f77c49dcd265d4 | 國際沈文程日 十二月九彼下暗巡迴演唱會 | 17 | 2023/12/09 19:30:00 , 高雄市鹽埕區真愛路1號 , 高雄流行音樂中心 海音館 , Y , 3600、3200、2800、2400、2000、1600、1200、800, 22.6181635 , 120.2885658 , 2023/12/09 23:00:00 | | | | | | ⋯ | | | https://ticket.com.tw/Application/UTK02/UTK0201_.aspx?PRODUCT_ID=P07VL56F | https://ticket.com.tw/Application/UTK02/UTK0201_.aspx?PRODUCT_ID=P07VL56F | | | 年代 | 2023/12/09 | 2023/12/09 | 39 |
| 2 | 1.4 | 64dfbac673f77c2b484e677e | 「歌之饗宴8-破繭而出 SAYA 張惠春」演唱會 | 17 | 2023/10/28 19:00:00 , 2023/10/29 19:00:00 , 台北市信義區信義路五段一號 , 台北市信義區信義路五段一號 , 台北國際會議中心大會堂 , 台北國際會議中心大會堂 , Y , Y , 4000、3600、3200、2800、2400、2000、1600, 4000、3600、3200、2800、2400、2000、1600, 25.0336111 , 25.0336111 , 121.5608333 , 121.5608333 , 2023/10/28 23:59:00 , 2023/10/29 23:59:00 | | | | | | ⋯ | | | https://ticket.com.tw/Application/UTK02/UTK0201_.aspx?PRODUCT_ID=P091DXL3 | https://ticket.com.tw/Application/UTK02/UTK0201_.aspx?PRODUCT_ID=P091DXL3 | | | 年代 | 2023/10/28 | 2023/10/29 | 133 |
| 3 | 1.4 | 64de6cdd73f77c2b484e6718 | 時也 運也 命也 2024楊哲公益巡迴演唱會 | 17 | 2024/01/20 19:30:00 , 台中市南區興大路145號 , 台中中興大學惠蓀堂 , Y , 6800、5800、3800、2800、1800、1600、1200、800, 24.1234884 , 120.6769368 , 2024/01/20 21:30:00 | | | | | | ⋯ | | | https://ticket.com.tw/Application/UTK02/UTK0201_.aspx?PRODUCT_ID=P09843UA | https://ticket.com.tw/Application/UTK02/UTK0201_.aspx?PRODUCT_ID=P09843UA | | | 年代 | 2024/01/20 | 2024/01/20 | 117 |
| 4 | 1.4 | 64e7b90573f77c2b484e693c | 黃金歲月青春99演唱會 | 17 | 2023/12/02 15:00:00 , 台中市南區興大路145號 , 台中中興大學惠蓀堂 , Y , 5200、4200、3600、3000、2400、1800、1500、1200, 24.1234884 , 120.6769368 , 2023/12/02 23:59:00 | | | | | | ⋯ | | | https://ticket.com.tw/Application/UTK02/UTK0201_.aspx?PRODUCT_ID=P093YMH8 | https://ticket.com.tw/Application/UTK02/UTK0201_.aspx?PRODUCT_ID=P093YMH8 | | | 年代 | 2023/12/02 | 2023/12/02 | 28 |
| 5 | 1.4 | 64e50d8973f77c2b484e68a5 | 師與徒聖歌演唱會-濟公本色 | 17 | 2023/12/23 19:30:00 , 2023/12/24 14:30:00 , 2024/01/06 19:30:00 , 2024/01/07 14:30:00 , 高雄市小港區學府路115號 , 高雄市小港區學府路115號 , 臺中市豐原區圓環東路782號 , 臺中市豐原區圓環東路782號 , 高雄市立社會教育館演藝廳 , 高雄市立社會教育館演藝廳 , 臺中市葫蘆墩文化中心演奏廳, 臺中市葫蘆墩文化中心演奏廳, Y , Y , Y , Y , 1200、1000、800、600 , 1200、1000、800、600 , 1200、1000、800、600 , 1200、1000、800、600 , 22.5652855 , 22.5652855 , 24.2528099 , 24.2528099 , 120.3592853 , 120.3592853 , 120.73019 , 120.73019 , 2023/12/23 23:00:00 , 2023/12/24 23:00:00 , 2024/01/06 23:00:00 , 2024/01/07 23:00:00 | | | | | | ⋯ | | | https://ticket.com.tw/Application/UTK02/UTK0201_.aspx?PRODUCT_ID=P090G7AF | https://ticket.com.tw/Application/UTK02/UTK0201_.aspx?PRODUCT_ID=P090G7AF | | | 年代 | 2023/12/23 | 2024/01/07 | 26 |
| 6 | 1.4 | 64f0ec2573f77c2b484e6bf8 | 2024夏川里美出道25週年十全十美台北演唱會 | 17 | 2024/02/03 19:30:00 , 台北市信義區信義路五段一號 , 台北國際會議中心大會堂 , Y , 5000、4500、4000、3500、3000、2500、2000, 25.0336111 , 121.5608333 , 2024/02/03 23:59:00 | | | | | | ⋯ | | | https://ticket.com.tw/Application/UTK02/UTK0201_.aspx?PRODUCT_ID=P09ETYZR | https://ticket.com.tw/Application/UTK02/UTK0201_.aspx?PRODUCT_ID=P09ETYZR | | | 年代 | 2024/02/03 | 2024/02/03 | 36 |


## Names of columns

In [13]:
names(flightsData)

Each column holds:

- `AirlineID`: the two-character IATA designator that identifies the airline (carrier), e.g. `3U`.
- `ScheduleStartDate`: the first date of the period to which the schedule in this row applies.
- `ScheduleEndDate`: the last date of the period to which the schedule in this row applies.
- `FlightNumber`: the flight number assigned by the carrier, prefixed with its airline designator, e.g. `3U3783`.
- `DepartureAirportID`: the three-letter IATA code of the departure airport, e.g. `CKG`.
- `ArrivalAirportID`: the three-letter IATA code of the arrival airport, e.g. `TSA`.
- `DepartureTime`: the scheduled departure time of the flight.
- `ArrivalTime`: the scheduled arrival time of the flight.
- `Monday` to `Sunday`: whether the flight operates on that day of the week (`TRUE`/`FALSE`), based on the departure date.
- `CodeShare`: code-share information for the flight. A code-share flight is booked under one airline's flight number but operated by another airline (as indicated by the carrier code).
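Because `Monday` to `Sunday` are TRUE/FALSE flags, summing them counts the rows that operate on each weekday. The sketch below uses a stand-in built from the first three rows shown by `head(flightsData)` further down. In `flightsData` itself these columns are stored as lists, so they would need `unlist()` first.

```r
# Stand-in: weekday flags for flight 3U3783 (first three rows of head(flightsData))
weekdays_df <- data.frame(
    Monday    = c(FALSE, FALSE, FALSE),
    Tuesday   = c(FALSE, FALSE, FALSE),
    Wednesday = c(FALSE, FALSE, FALSE),
    Thursday  = c(FALSE, FALSE, FALSE),
    Friday    = c(TRUE, TRUE, TRUE),
    Saturday  = c(FALSE, FALSE, FALSE),
    Sunday    = c(TRUE, TRUE, FALSE)
)

# TRUE counts as 1 and FALSE as 0, so column sums count operating rows per day
colSums(weekdays_df)   # Friday 3, Sunday 2, every other day 0

# Row sums give the number of operating days in each row
rowSums(weekdays_df)   # 2 2 1
```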



## Check the first few rows

In [14]:
head(flightsData)

| | AirlineID | ScheduleStartDate | ScheduleEndDate | FlightNumber | DepartureAirportID | DepartureTime | CodeShare | ArrivalAirportID | ArrivalTime | Monday | Tuesday | Wednesday | Thursday | Friday | Saturday | Sunday | UpdateTime | VersionID | Terminal | num_codeShare |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | `<list>` | `<list>` | `<list>` | `<list>` | `<list>` | `<list>` | `<list>` | `<list>` | `<list>` | `<list>` | `<list>` | `<list>` | `<list>` | `<list>` | `<list>` | `<list>` | `<list>` | `<list>` | `<list>` | `<int>` |
| 1 | 3U | 2023-10-13 | 2023-10-15 | 3U3783 | CKG | 15:00 | | TSA | 18:00 | FALSE | FALSE | FALSE | FALSE | TRUE | FALSE | TRUE | 2023-10-10T08:26:07+08:00 | 1111 | | 0 |
| 2 | 3U | 2023-10-20 | 2023-10-22 | 3U3783 | CKG | 15:00 | | TSA | 18:00 | FALSE | FALSE | FALSE | FALSE | TRUE | FALSE | TRUE | 2023-10-10T08:26:07+08:00 | 1111 | | 0 |
| 3 | 3U | 2023-10-27 | 2023-10-27 | 3U3783 | CKG | 15:00 | | TSA | 18:00 | FALSE | FALSE | FALSE | FALSE | TRUE | FALSE | FALSE | 2023-10-10T08:26:07+08:00 | 1111 | | 0 |
| 4 | 3U | 2023-10-13 | 2023-10-15 | 3U3784 | TSA | 19:00 | | CKG | 22:15 | FALSE | FALSE | FALSE | FALSE | TRUE | FALSE | TRUE | 2023-10-10T08:26:07+08:00 | 1111 | | 0 |
| 5 | 3U | 2023-10-20 | 2023-10-22 | 3U3784 | TSA | 19:00 | | CKG | 22:15 | FALSE | FALSE | FALSE | FALSE | TRUE | FALSE | TRUE | 2023-10-10T08:26:07+08:00 | 1111 | | 0 |
| 6 | 3U | 2023-10-27 | 2023-10-27 | 3U3784 | TSA | 19:00 | | CKG | 22:15 | FALSE | FALSE | FALSE | FALSE | TRUE | FALSE | FALSE | 2023-10-10T08:26:07+08:00 | 1111 | | 0 |
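The `<list>` markers show that every column of `flightsData` except `num_codeShare` is a list column, which is why the next cell calls `unlist()`. The sketch below flattens list columns on a hypothetical stand-in called `toy`. It only flattens a column when every cell holds exactly one value. Cells with no value (possibly the blank `CodeShare` and `Terminal` entries above) would be dropped by `unlist()`, and the column would then no longer line up with the rows.

```r
# Hypothetical stand-in with list columns, mimicking flightsData
toy <- data.frame(row = 1:3)
toy$FlightNumber <- list("3U3783", "3U3783", "3U3783")
toy$Sunday <- list(TRUE, TRUE, FALSE)
toy$CodeShare <- list(character(0), character(0), character(0))  # hypothetical empty cells

# A list column is safe to flatten only if every cell has exactly one value
is_flat <- function(col) is.list(col) && all(lengths(col) == 1)

for (col in names(toy)) {
    if (is_flat(toy[[col]])) toy[[col]] <- unlist(toy[[col]])
}

str(toy)  # FlightNumber becomes chr, Sunday becomes logi, CodeShare stays a list
```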


In [30]:
# Count rows by arrival airport, most frequent first
flightsData$ArrivalAirportID |> unlist() |> table() |> sort(decreasing = TRUE)


 TPE  PVG  KHH  HKG  NRT  BKK  KIX  ICN  TSA  SIN  MFM  FUK  HAN  BWN  LAX  SGN 
2268  156  140  138  134  120  120  108   99   82   70   66   63   60   60   59 
 SFO  MNL  KUL  OKA  PEK  CEB  PUS  SDJ  DAD  CTS  HND  SZX  CAN  KMQ  SHA  HGH 
  57   54   50   49   42   41   37   36   35   33   33   33   27   27   24   23 
 XMN  DMK  RMQ  GMP  NGO  PEN  BKI  JFK  DXB  NKG  FOC  TFU  YVR  CGK  CKG  CNX 
  23   22   22   21   21   21   19   18   16   16   15   15   15   12   12   12 
 MEL  ORD  PNH  TAO  YYZ  BNE  TAE  CRK  DPS  LHR  SEA  SYD  TAK  WUH  VIE  AKL 
  12   12   12   12   12   11   10    9    9    9    9    9    9    9    8    6 
 AMS  CJJ  CJU  FCO  FRA  GAJ  HIJ  IAH  IST  KMJ  MUC  NGB  KIJ  MXP  TKS  CXR 
   6    6    6    6    6    6    6    6    6    6    6    6    5    5    5    4 
 PRG  AKJ  CDG  CGO  HKD  HKT  HNA  HSG  HUI  IBR  KCZ  KLO  MPH  OKJ  ONT  PPS 
   4    3    3    3    3    3    3    3    3    3    3    3    3    3    3    3 
 RGN  ROR  TNN  AOJ  IZO  T
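Step by step, the pipeline flattens the list column into a character vector, counts each airport code, and orders the counts from largest to smallest. The sketch below runs it on a short made-up vector. It requires R 4.1.0 or later for the native pipe `|>`.

```r
# Hypothetical stand-in for flightsData$ArrivalAirportID (a list column)
arrivals <- list("TPE", "PVG", "TPE", "KHH", "TPE", "PVG")

counts <- arrivals |>
    unlist() |>                # list -> character vector
    table() |>                 # frequency of each airport code
    sort(decreasing = TRUE)    # most frequent first

counts
head(counts, 2)    # the two most common arrival airports
names(counts)[1]   # "TPE"
```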

## Understand column values

### `...AirportID`

In [None]:
install.packages("airportr")


In [49]:
library(airportr)
han <- airport_lookup("HAN", output_type = "city")  # city served by the HAN airport

han 

In [48]:
tpe_han <- airport_distance("TPE", "HAN")  # great-circle distance between the two airports

tpe_han
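To compare several destinations at once, you can apply the same lookup and distance calls over a vector of IATA codes. This sketch assumes the `airportr` package is installed. The three codes are taken from the arrival table above. `airport_distance()` reports great-circle distance, in kilometres according to the package documentation.

```r
library(airportr)

destinations <- c("HAN", "NRT", "LAX")

# City served by each airport (same call as above, applied to each code)
cities <- sapply(destinations, airport_lookup, output_type = "city")

# Great-circle distance from TPE to each destination
dists <- sapply(destinations, function(code) airport_distance("TPE", code))

data.frame(airport = destinations, city = cities, distance = dists)
```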

> A warning message may appear here. It can usually be ignored, because the command still completes.