# Creating data frame objects

## Objectives

- Creating data frames
- Using RCall.jl to integrate with R
- Understanding the Tables.jl interface
- Plotting a correlation matrix
- Constructing a data frame interactively by adding rows to it
- Serializing Julia objects

## Recap

```julia
using DataFrames, Downloads, CodecZstd, CSV
file = "new_puzzles.csv.zst"
url = "https://database.lichess.org/lichess_db_puzzle.csv.zst"
if isfile(file)
    @info "File already downloaded"
else
    @info "Downloading file"
    Downloads.download(url, file)
end
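# read the compressed file as raw bytes and decompress it in memory
compressed = read(file)
plain = transcode(ZstdDecompressor, compressed)
# parse the decompressed bytes; column names are passed explicitly via `header`
puzzles = CSV.read(plain, DataFrame; delim=",", maxwarnings=0,
                   header=["PuzzleId", "FEN", "Moves", "Rating",
                           "RatingDeviation", "Popularity", "NbPlays", "Themes",
                           "GameUrl", "OpeningTags"]);
# drop the references to the byte buffers so they can be garbage collected
compressed = plain = nothing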

using Plots
# histograms of the numeric columns in a 2×2 grid
plot([histogram(puzzles[!, col]; label=col)
      for col in ["Rating", "RatingDeviation", "Popularity", "NbPlays"]]...,
     layout=(2, 2))

# data cleaning
using Statistics
# keep puzzles played more often than the median number of plays
f1 = puzzles.NbPlays .> median(puzzles.NbPlays)
# keep ratings above the median but below the 99th percentile
f2 = median(puzzles.Rating) .< puzzles.Rating .< quantile(puzzles.Rating, 0.99)
qc = f1 .&& f2
dat = puzzles[qc, [:Rating, :Popularity]]
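
# Aside (a toy sketch, not part of the original analysis): the filters above
# rely on elementwise (broadcast) operators and Boolean-mask indexing.
# The `toy_*` names and the values below are made up for illustration only.
toy_x = [1, 5, 10]
toy_inrange = 2 .< toy_x .< 8          # chained broadcast comparison, as in f2
toy_big = toy_x .> 4                   # plain broadcast comparison, as in f1
toy_both = toy_inrange .&& toy_big     # elementwise "and" (Julia 1.7+), as in qc
toy_kept = toy_x[toy_both]             # mask indexing, as in puzzles[qc, ...]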

plot([histogram(dat[!, col]; label=col) for col in ["Rating", "Popularity"]]..., layout=(2, 1))
describe(dat)