In [2]:
using CSV, DataFrames

In [27]:
data = CSV.read("./Data/sample.csv",DataFrame)
first(data,10)

Unnamed: 0_level_0,1,"Eldon Base for stackable storage shelf, platinum",Muhammed MacIntyre
Unnamed: 0_level_1,Int64,String,String31
1,2,"1.7 Cubic Foot Compact ""Cube"" Office Refrigerators",Barry French
2,3,"Cardinal Slant-D\xae Ring Binder, Heavy Gauge Vinyl",Barry French
3,4,R380,Clay Rozendal
4,5,Holmes HEPA Air Purifier,Carlos Soltero
5,6,G.E. Longer-Life Indoor Recessed Floodlight Bulbs,Carlos Soltero
6,7,"Angle-D Binders with Locking Rings, Label Holders",Carl Jackson
7,8,"SAFCO Mobile Desk Side File, Wire Frame",Carl Jackson
8,9,"SAFCO Commercial Wire Shelving, Black",Monica Federle
9,10,Xerox 198,Dorothy Badders
10,11,Xerox 1980,Neola Schneider
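
In the output above, the first record of the file appears where the column names should be, which suggests that `sample.csv` has no header row. A minimal sketch of how this could be handled, assuming that is the case: the inline data is a hypothetical stand-in for the real file, and the column names in the second call are made up for illustration.

```julia
using CSV, DataFrames

# Hypothetical stand-in for a header-less file such as sample.csv
raw = """
1,Eldon Base,Muhammed MacIntyre
2,Office Refrigerator,Barry French
"""

# header=false tells CSV.jl that the first row is data;
# column names are then auto-generated (Column1, Column2, ...)
no_header = CSV.read(IOBuffer(raw), DataFrame; header=false)

# Alternatively, supply the column names yourself (assumed names)
named = CSV.read(IOBuffer(raw), DataFrame;
                 header=["ID", "Product Name", "Customer Name"])
```

In both calls the first line of the input is kept as a data row instead of being consumed as the header.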


### CSV.read()

- parses the input and materializes it directly into a valid sink type, such as a DataFrame, without making extra copies of the columns.

### CSV.File()

- parses the input into a ```CSV.File``` object. Passing that object on to a sink such as a DataFrame copies the columns by default, so the data is copied once more than with ```CSV.read()```.

Our advice is to almost exclusively use ```CSV.read()```.

Source: [Pumas.ai](https://tutorials.pumas.ai/html/DataWranglingInJulia/04-read_data.html#reading-a-single-csv-file)
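
The difference between the two entry points can be sketched as follows. This is only an illustration on hypothetical inline data (`csv_text` is not one of the notebook's files); it shows that both routes produce the same table and differ only in how the columns reach the DataFrame.

```julia
using CSV, DataFrames

csv_text = "a,b\n1,x\n2,y\n"   # hypothetical inline data

# CSV.read parses and hands the columns straight to the sink
df_read = CSV.read(IOBuffer(csv_text), DataFrame)

# CSV.File parses into a CSV.File object first; the DataFrame
# constructor then copies its columns by default
file = CSV.File(IOBuffer(csv_text))
df_file = DataFrame(file)

# Both routes hold the same contents
same_contents = df_read == df_file
```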

### Custom CSV File Specifications

CSV.jl might sometimes fail to correctly guess the specifications of the underlying file, in which case you have to pass them yourself. Fortunately, ```CSV.read()``` accepts the following keyword arguments:

- **delim**: either a character, e.g. ';', or a string, e.g. "\t", that indicates how the values are delimited in the file. If omitted, CSV.jl will try to detect the delimiter from the first 10 rows of the file.

- **decimal**: a character that will be used to parse floating point numbers. If ',', then 3,14 will be parsed as a float. The default is a dot, '.'.

- **missingstring**: By default, every blank value in a CSV file will be parsed as a missing value. Sometimes missing values will be hardcoded as some other value instead of a blank value. For example, it could be a dot "." or a string "NA". In fact, any value can be specified to represent missing.
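
The three keyword arguments above can be combined in one call. A minimal sketch, assuming a semicolon-delimited file that uses a decimal comma and marks missing values with "NA"; the inline data and its column names are hypothetical.

```julia
using CSV, DataFrames

# Hypothetical data: ';' as delimiter, ',' as decimal mark, "NA" for missing
raw = """
id;price;status
1;3,14;NA
2;2,5;ok
"""

parsed = CSV.read(IOBuffer(raw), DataFrame;
                  delim=';',            # values are separated by semicolons
                  decimal=',',          # 3,14 is parsed as the float 3.14
                  missingstring="NA")   # "NA" becomes `missing`
```

Here `parsed.price` holds floating point numbers and the first entry of `parsed.status` is `missing`.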

In [18]:
readlines("./Data/sample2.csv")[1:5]

5-element Vector{String}:
 "1;\"Eldon Base for stackable sto" ⋯ 62 bytes ⋯ "avut;Storage & Organization;0.8"
 "2;\"1.7 Cubic Foot Compact Cube " ⋯ 51 bytes ⋯ "6;68.02;Nunavut Appliances;0.58"
 "3;\"Cardinal Slant-D® Ring Binde" ⋯ 66 bytes ⋯ "ers and Binder Accessories;0.39"
 "4;R380 Clay Rozendal;483;1198.9" ⋯ 24 bytes ⋯ "lephones and Communication;0.58"
 "5;Holmes HEPA Air Purifier;Carl" ⋯ 23 bytes ⋯ ".78;5.94;Nunavut;Appliances;0.5"

In [43]:
df = CSV.read("./Data/sample2.csv",DataFrame; delim=';')

Unnamed: 0_level_0,ID,Product Name,Customer Name,Date
Unnamed: 0_level_1,Int64,String,String31,Int64
1,1,Eldon Base for stackable storage shelf platinum,Muhammed MacIntyre,3
2,2,Cubic Foot Compact Cube Office Refrigerators,Barry French,293
3,3,Cardinal Slant-D Ring Binder Heavy Gauge Vinyl,Barry French,293
4,4,R380 Clay Rozendal,Barry French,483
5,5,Holmes HEPA Air Purifier,Carlos Soltero,515
6,6,G.E. Longer-Life Indoor Recessed Floodlight Bulbs,Carlos Soltero,515
7,7,Angle-D Binders with Locking Rings Label Holders,Carl Jackson,613
8,8,SAFCO Mobile Desk Side File Wire Frame,Carl Jackson,613
9,9,SAFCO Commercial Wire Shelving; Black,Monica Federle,643
10,10,Xerox 198,Dorothy Badders,678


### Specifying Custom Types for Columns while Reading a CSV File

Sometimes you’ll want to override the column types that CSV.jl infers automatically. This can be done with the keyword argument ```types```. It accepts several kinds of input, but the easiest and most customizable is a Julia dictionary whose keys are either integers (column indices) or strings/symbols (column names) and whose values are the desired types.

In [48]:
custom_read = CSV.read("./Data/sample2.csv", DataFrame; delim=';', types=Dict("Date" => String))

Unnamed: 0_level_0,ID,Product Name,Customer Name,Date
Unnamed: 0_level_1,Int64,String,String31,String
1,1,Eldon Base for stackable storage shelf platinum,Muhammed MacIntyre,3
2,2,Cubic Foot Compact Cube Office Refrigerators,Barry French,293
3,3,Cardinal Slant-D Ring Binder Heavy Gauge Vinyl,Barry French,293
4,4,R380 Clay Rozendal,Barry French,483
5,5,Holmes HEPA Air Purifier,Carlos Soltero,515
6,6,G.E. Longer-Life Indoor Recessed Floodlight Bulbs,Carlos Soltero,515
7,7,Angle-D Binders with Locking Rings Label Holders,Carl Jackson,613
8,8,SAFCO Mobile Desk Side File Wire Frame,Carl Jackson,613
9,9,SAFCO Commercial Wire Shelving; Black,Monica Federle,643
10,10,Xerox 198,Dorothy Badders,678
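
A minimal sketch of the two kinds of dictionary keys, using hypothetical inline data in which the `code` column has leading zeros that would be lost if the column were parsed as a number. Each dictionary sticks to one key type.

```julia
using CSV, DataFrames

raw = "code,amount\n001,5\n002,7\n"   # hypothetical inline data

# Key by column name: keep `code` as text so the leading zeros survive
by_name = CSV.read(IOBuffer(raw), DataFrame; types=Dict(:code => String))

# Key by column index: force the second column to Float64
by_index = CSV.read(IOBuffer(raw), DataFrame; types=Dict(2 => Float64))
```

Columns that are not mentioned in the dictionary keep their automatically detected types.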


### Selecting and Dropping Columns while Reading a CSV File

By default, ```CSV.read()``` will parse and read all the columns in a CSV file. If you want to select or drop specific columns you can do so with the following arguments. Both accept a vector of either strings or integers that represents the names and indices, respectively, of the columns to select or drop.

- select
- drop

In [52]:
selected = CSV.read("./Data/sample2.csv", DataFrame; delim=';', select= [1,3,6])

Unnamed: 0_level_0,ID,Customer Name,Order Quantity
Unnamed: 0_level_1,Int64,String31,Float64
1,1,Muhammed MacIntyre,38.94
2,2,Barry French,208.16
3,3,Barry French,8.69
4,4,Barry French,195.99
5,5,Carlos Soltero,21.78
6,6,Carlos Soltero,6.64
7,7,Carl Jackson,7.3
8,8,Carl Jackson,42.76
9,9,Monica Federle,138.14
10,10,Dorothy Badders,4.98


Notice that we can be clever with the indices. To select a range of indices, we can materialize it into a vector using ```collect()```.

In [53]:
selected_collect = CSV.read("./Data/sample2.csv", DataFrame; delim=';', select=collect(1:6))

Unnamed: 0_level_0,ID,Product Name,Customer Name,Date
Unnamed: 0_level_1,Int64,String,String31,Int64
1,1,Eldon Base for stackable storage shelf platinum,Muhammed MacIntyre,3
2,2,Cubic Foot Compact Cube Office Refrigerators,Barry French,293
3,3,Cardinal Slant-D Ring Binder Heavy Gauge Vinyl,Barry French,293
4,4,R380 Clay Rozendal,Barry French,483
5,5,Holmes HEPA Air Purifier,Carlos Soltero,515
6,6,G.E. Longer-Life Indoor Recessed Floodlight Bulbs,Carlos Soltero,515
7,7,Angle-D Binders with Locking Rings Label Holders,Carl Jackson,613
8,8,SAFCO Mobile Desk Side File Wire Frame,Carl Jackson,613
9,9,SAFCO Commercial Wire Shelving; Black,Monica Federle,643
10,10,Xerox 198,Dorothy Badders,678


In [54]:
dropped = CSV.read("./Data/sample2.csv", DataFrame; delim=';', drop=["Product Name"])

Unnamed: 0_level_0,ID,Customer Name,Date,Order Priority,Order Quantity,Sales,Discount
Unnamed: 0_level_1,Int64,String31,Int64,Float64,Float64,Float64,String7
1,1,Muhammed MacIntyre,3,-213.25,38.94,35.0,Nunavut
2,2,Barry French,293,457.81,208.16,68.02,Nunavut
3,3,Barry French,293,46.71,8.69,2.99,Nunavut
4,4,Barry French,483,1198.97,195.99,3.99,Nunavut
5,5,Carlos Soltero,515,30.94,21.78,5.94,Nunavut
6,6,Carlos Soltero,515,4.43,6.64,4.95,Nunavut
7,7,Carl Jackson,613,-54.04,7.3,7.72,Nunavut
8,8,Carl Jackson,613,127.7,42.76,6.22,Nunavut
9,9,Monica Federle,643,-695.26,138.14,35.0,Nunavut
10,10,Dorothy Badders,678,-226.36,4.98,8.33,Nunavut
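
To see that `select` and `drop` are two views of the same choice, here is a small sketch on hypothetical inline data: selecting two columns by name gives the same table as dropping the remaining column by index.

```julia
using CSV, DataFrames

raw = "id,name,qty\n1,a,10\n2,b,20\n"   # hypothetical inline data

# Keep only the listed columns, identified by name
kept = CSV.read(IOBuffer(raw), DataFrame; select=["id", "qty"])

# Drop the second column, identified by index
without_name = CSV.read(IOBuffer(raw), DataFrame; drop=[2])

# Both calls end up with the columns id and qty
same_result = kept == without_name
```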


### Writing CSV Files

In order to write CSV files, you’ll use the ```CSV.write()``` function which can be used in two ways:

By passing a file path as a string as the first argument and a table (such as a DataFrame) as the second argument.

In [58]:
my_df =  DataFrame(A=1:2:100, B=repeat(1:50), C=1:50)
CSV.write("./Data/my_file.csv", my_df)

"./Data/my_file.csv"

where ```my_df``` is a DataFrame.

By *“piping”* the table (such as a DataFrame) into the ```CSV.write()``` function, which then receives only the file path as its argument:

In [None]:
my_df |> CSV.write("./Data/my_file.csv")
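
Both forms can be checked with a quick round trip. The sketch below is only an illustration: it uses a small hypothetical DataFrame (`out_df`) and a temporary directory instead of `./Data`, so it does not touch the notebook's files.

```julia
using CSV, DataFrames

out_df = DataFrame(A=1:3, B=["x", "y", "z"])   # hypothetical table

# Write to a temporary directory so the sketch leaves the project untouched
path = joinpath(mktempdir(), "my_file.csv")

CSV.write(path, out_df)      # form 1: file path first, table second
out_df |> CSV.write(path)    # form 2: piping; overwrites the same file

# Reading the file back should reproduce the table
round_trip = CSV.read(path, DataFrame)
```

Comparing `round_trip` with `out_df` is a cheap sanity check that nothing was lost on the way to disk.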