# Data

Let's see how to download and work with data in Julia.


In [1]:
using Pkg

Pkg.add("BenchmarkTools")
Pkg.add("DataFrames")
Pkg.add("DelimitedFiles")
Pkg.add("CSV")
Pkg.add("XLSX")

[32m[1m   Updating[22m[39m registry at `C:\Users\tirth\.julia\registries\General`

[?25l


[32m[1m   Updating[22m[39m git-repo `https://github.com/JuliaRegistries/General.git`




[32m[1m  Resolving[22m[39m package versions...
[32m[1m   Updating[22m[39m `C:\Users\tirth\.julia\environments\v1.4\Project.toml`
[90m [no changes][39m
[32m[1m   Updating[22m[39m `C:\Users\tirth\.julia\environments\v1.4\Manifest.toml`
[90m [no changes][39m
[32m[1m  Resolving[22m[39m package versions...
[32m[1m   Updating[22m[39m `C:\Users\tirth\.julia\environments\v1.4\Project.toml`
[90m [no changes][39m
[32m[1m   Updating[22m[39m `C:\Users\tirth\.julia\environments\v1.4\Manifest.toml`
[90m [no changes][39m
[32m[1m  Resolving[22m[39m package versions...
[32m[1m   Updating[22m[39m `C:\Users\tirth\.julia\environments\v1.4\Project.toml`
[90m [no changes][39m
[32m[1m   Updating[22m[39m `C:\Users\tirth\.julia\environments\v1.4\Manifest.toml`
[90m [no changes][39m
[32m[1m  Resolving[22m[39m package versions...
[32m[1m   Updating[22m[39m `C:\Users\tirth\.julia\environments\v1.4\Project.toml`
[90m [no changes][39m
[32m[1m   Updating[2

In [2]:
using BenchmarkTools
using DataFrames
using DelimitedFiles
using CSV
using XLSX

### Downloading Data

In [3]:
?download # Inspect the signature and docs of download method

search: download



```
download(url::AbstractString, [localfile::AbstractString])
```

Download a file from the given url, optionally renaming it to the given local file name. If no filename is given this will download into a randomly-named file in your temp directory. Note that this function relies on the availability of external tools such as `curl`, `wget` or `fetch` to download the file and is provided for convenience. For production use or situations in which more options are needed, please use a package that provides the desired functionality instead.

Returns the filename of the downloaded file.

---

```
download(url::Union{AbstractString, AbstractPath}, localfile::AbstractPath)
```

Download a file from the remote url and save it to the localfile path.

NOTE: Not downloading into a `localfile` directory matches the base julia behaviour. https://github.com/rofinn/FilePathsBase.jl/issues/48


In [4]:
p = download("https://raw.githubusercontent.com/nassarhuda/easy_data/master/programming_languages.csv", "programminglanguages.csv")

"programminglanguages.csv"

In [5]:
# Uncomment the lines below on a Linux machine (a leading `;` runs a shell command)
# ;wget -O programminglanguages.csv "https://raw.githubusercontent.com/nassarhuda/easy_data/master/programming_languages.csv"
# ;head programminglanguages.csv
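Without a shell (for example on Windows), a rough equivalent of `head` can be written in plain Julia. This is a sketch, not part of the original notebook: `head` is a hypothetical helper, and `sample.csv` is a made-up file written to a temporary directory so the snippet runs on its own. Pass `"programminglanguages.csv"` instead to inspect the real download.

```julia
# Return the first `n` lines of a text file, similar to the shell's `head`.
function head(path::AbstractString, n::Integer = 10)
    open(path) do io
        # eachline is lazy, so only the first n lines are actually read
        collect(Iterators.take(eachline(io), n))
    end
end

# Hypothetical sample file standing in for programminglanguages.csv
path = joinpath(mktempdir(), "sample.csv")
write(path, "year,language\n1951,Regional Assembly Language\n1952,Autocode\n1954,IPL\n")

foreach(println, head(path, 3))
```

Opening the file with `open(path) do io ... end` closes it as soon as the lines have been read, even though the rest of the file is never consumed.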

### Reading data from a text file

In [6]:
?readdlm

search: readdlm readdir



```
readdlm(source, T::Type; options...)
```

The columns are assumed to be separated by one or more whitespaces. The end of line delimiter is taken as `\n`.

# Examples

```jldoctest
julia> using DelimitedFiles

julia> x = [1; 2; 3; 4];

julia> y = [5; 6; 7; 8];

julia> open("delim_file.txt", "w") do io
           writedlm(io, [x y])
       end;

julia> readdlm("delim_file.txt", Int64)
4×2 Array{Int64,2}:
 1  5
 2  6
 3  7
 4  8

julia> readdlm("delim_file.txt", Float64)
4×2 Array{Float64,2}:
 1.0  5.0
 2.0  6.0
 3.0  7.0
 4.0  8.0

julia> rm("delim_file.txt")
```

---

```
readdlm(source, delim::AbstractChar, T::Type; options...)
```

The end of line delimiter is taken as `\n`.

# Examples

```jldoctest
julia> using DelimitedFiles

julia> x = [1; 2; 3; 4];

julia> y = [1.1; 2.2; 3.3; 4.4];

julia> open("delim_file.txt", "w") do io
           writedlm(io, [x y], ',')
       end;

julia> readdlm("delim_file.txt", ',', Float64)
4×2 Array{Float64,2}:
 1.0  1.1
 2.0  2.2
 3.0  3.3
 4.0  4.4

julia> rm("delim_file.txt")
```

---

```
readdlm(source; options...)
```

The columns are assumed to be separated by one or more whitespaces. The end of line delimiter is taken as `\n`. If all data is numeric, the result will be a numeric array. If some elements cannot be parsed as numbers, a heterogeneous array of numbers and strings is returned.

# Examples

```jldoctest
julia> using DelimitedFiles

julia> x = [1; 2; 3; 4];

julia> y = ["a"; "b"; "c"; "d"];

julia> open("delim_file.txt", "w") do io
           writedlm(io, [x y])
       end;

julia> readdlm("delim_file.txt")
4×2 Array{Any,2}:
 1  "a"
 2  "b"
 3  "c"
 4  "d"

julia> rm("delim_file.txt")
```

---

```
readdlm(source, delim::AbstractChar; options...)
```

The end of line delimiter is taken as `\n`. If all data is numeric, the result will be a numeric array. If some elements cannot be parsed as numbers, a heterogeneous array of numbers and strings is returned.

# Examples

```jldoctest
julia> using DelimitedFiles

julia> x = [1; 2; 3; 4];

julia> y = [1.1; 2.2; 3.3; 4.4];

julia> open("delim_file.txt", "w") do io
           writedlm(io, [x y], ',')
       end;

julia> readdlm("delim_file.txt", ',')
4×2 Array{Float64,2}:
 1.0  1.1
 2.0  2.2
 3.0  3.3
 4.0  4.4

julia> z = ["a"; "b"; "c"; "d"];

julia> open("delim_file.txt", "w") do io
           writedlm(io, [x z], ',')
       end;

julia> readdlm("delim_file.txt", ',')
4×2 Array{Any,2}:
 1  "a"
 2  "b"
 3  "c"
 4  "d"

julia> rm("delim_file.txt")
```

---

```
readdlm(source, delim::AbstractChar, eol::AbstractChar; options...)
```

If all data is numeric, the result will be a numeric array. If some elements cannot be parsed as numbers, a heterogeneous array of numbers and strings is returned.

---

```
readdlm(source, delim::AbstractChar, T::Type, eol::AbstractChar; header=false, skipstart=0, skipblanks=true, use_mmap, quotes=true, dims, comments=false, comment_char='#')
```

Read a matrix from the source where each line (separated by `eol`) gives one row, with elements separated by the given delimiter. The source can be a text file, stream or byte array. Memory mapped files can be used by passing the byte array representation of the mapped segment as source.

If `T` is a numeric type, the result is an array of that type, with any non-numeric elements as `NaN` for floating-point types, or zero. Other useful values of `T` include `String`, `AbstractString`, and `Any`.

If `header` is `true`, the first row of data will be read as header and the tuple `(data_cells, header_cells)` is returned instead of only `data_cells`.

Specifying `skipstart` will ignore the corresponding number of initial lines from the input.

If `skipblanks` is `true`, blank lines in the input will be ignored.

If `use_mmap` is `true`, the file specified by `source` is memory mapped for potential speedups. Default is `true` except on Windows. On Windows, you may want to specify `true` if the file is large, and is only read once and not written to.

If `quotes` is `true`, columns enclosed within double-quote (") characters are allowed to contain new lines and column delimiters. Double-quote characters within a quoted field must be escaped with another double-quote.  Specifying `dims` as a tuple of the expected rows and columns (including header, if any) may speed up reading of large files.  If `comments` is `true`, lines beginning with `comment_char` and text following `comment_char` in any line are ignored.

# Examples

```jldoctest
julia> using DelimitedFiles

julia> x = [1; 2; 3; 4];

julia> y = [5; 6; 7; 8];

julia> open("delim_file.txt", "w") do io
           writedlm(io, [x y])
       end

julia> readdlm("delim_file.txt", '\t', Int, '\n')
4×2 Array{Int64,2}:
 1  5
 2  6
 3  7
 4  8

julia> rm("delim_file.txt")
```


In [7]:
#=
readdlm is short for "read delimited": it reads delimited text files.
Full signature (everything after `eol` is a keyword argument):
readdlm(
    source,
    delim::AbstractChar,
    T::Type,
    eol::AbstractChar;
    header=false,
    skipstart=0,
    skipblanks=true,
    use_mmap,
    quotes=true,
    dims,
    comments=false,
    comment_char='#'
)
=#
P, H = readdlm("programminglanguages.csv", ',', header=true)

(Any[1951 "Regional Assembly Language"; 1952 "Autocode"; … ; 2012 "Julia"; 2014 "Swift"], AbstractString["year" "language"])
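Because one column holds years and the other holds names, `readdlm` falls back to an `Any` matrix. The minimal sketch below uses a small hypothetical in-memory sample instead of the downloaded file. It shows how `header=true` splits off the header row and how each column can then be converted to a concrete element type, which later code can work with more efficiently.

```julia
using DelimitedFiles

# Hypothetical in-memory stand-in for programminglanguages.csv
io = IOBuffer("year,language\n1951,Regional Assembly Language\n1952,Autocode\n2012,Julia\n")

# header=true returns a (data, header) tuple instead of just the data
data, header = readdlm(io, ',', header=true)

# The data matrix mixes numbers and strings, so its element type is Any;
# converting column by column gives concretely typed vectors.
years     = Int.(data[:, 1])
languages = String.(data[:, 2])

println(eltype(data), " -> ", eltype(years), ", ", eltype(languages))
```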

In [8]:
@show typeof(P)
P[1:10, :]

typeof(P) = Array{Any,2}


10×2 Array{Any,2}:
 1951  "Regional Assembly Language"
 1952  "Autocode"
 1954  "IPL"
 1955  "FLOW-MATIC"
 1957  "FORTRAN"
 1957  "COMTRAN"
 1958  "LISP"
 1958  "ALGOL 58"
 1959  "FACT"
 1959  "COBOL"

In [9]:
@show typeof(H)
H

typeof(H) = Array{AbstractString,2}


1×2 Array{AbstractString,2}:
 "year"  "language"

In [10]:
#=
writedlm is short for "write delimited": it writes an array to a file,
separating the elements with the given delimiter (here '-').
The `data/` directory must already exist.
=#
writedlm("data/programminglanguages_copy.txt", P, '-')
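`writedlm` and `readdlm` undo each other when both sides use the same delimiter. The sketch below round-trips a hypothetical three-row slice of `P` (here called `P_rt`) through an in-memory `IOBuffer` instead of a file. It uses a tab as the delimiter. This is an arbitrary choice, but unlike `'-'` a tab never appears inside names such as `FLOW-MATIC`.

```julia
using DelimitedFiles

# Hypothetical slice shaped like P: years in column 1, names in column 2
P_rt = Any[1951 "Regional Assembly Language"; 1955 "FLOW-MATIC"; 2012 "Julia"]

# Write to an in-memory buffer instead of a file on disk
io = IOBuffer()
writedlm(io, P_rt, '\t')
text = String(take!(io))   # the text that would have been written to disk
print(text)

# Read it back with the same delimiter
roundtrip = readdlm(IOBuffer(text), '\t')
println(roundtrip == P_rt)
```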

### Reading data using CSV

In [11]:
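# The second argument is the sink, i.e. the table type CSV.read should build.
# Recent CSV.jl versions require it.
C = CSV.read("programminglanguages.csv", DataFrame);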

In [12]:
@show typeof(C)
C[1:10, :]

typeof(C) = DataFrame


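10×2 DataFrame
│ Row │ year  │ language                   │
│     │ Int64 │ String                     │
├─────┼───────┼────────────────────────────┤
│ 1   │ 1951  │ Regional Assembly Language │
│ 2   │ 1952  │ Autocode                   │
│ 3   │ 1954  │ IPL                        │
│ 4   │ 1955  │ FLOW-MATIC                 │
│ 5   │ 1957  │ FORTRAN                    │
│ 6   │ 1957  │ COMTRAN                    │
│ 7   │ 1958  │ LISP                       │
│ 8   │ 1958  │ ALGOL 58                   │
│ 9   │ 1959  │ FACT                       │
│ 10  │ 1959  │ COBOL                      │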
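Unlike `readdlm`, CSV.jl infers a concrete element type for each column. The sketch below uses a small hypothetical CSV string and reads it from an `IOBuffer` rather than the downloaded file. It assumes a CSV.jl version that accepts the `DataFrame` sink argument.

```julia
using CSV, DataFrames

# Hypothetical in-memory sample with the same columns as programminglanguages.csv
csv_text = """
year,language
1951,Regional Assembly Language
1952,Autocode
2012,Julia
"""

# CSV.read accepts any IO source; DataFrame is the sink type to build
df = CSV.read(IOBuffer(csv_text), DataFrame)

println(names(df))          # column names as strings
println(eltype(df.year))    # Int64, not Any as with readdlm
println(df.language[end])
```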


In [13]:
names(C)

2-element Array{String,1}:
 "year"
 "language"

In [14]:
@show C.year
@show C.language
describe(C)

C.year = [1951, 1952, 1954, 1955, 1957, 1957, 1958, 1958, 1959, 1959, 1959, 1962, 1962, 1962, 1963, 1964, 1964, 1964, 1966, 1967, 1968, 1969, 1970, 1970, 1972, 1972, 1972, 1973, 1975, 1978, 1980, 1983, 1984, 1984, 1984, 1985, 1986, 1986, 1986, 1987, 1988, 1988, 1989, 1990, 1991, 1991, 1993, 1993, 1994, 1995, 1995, 1995, 1995, 1995, 1995, 1997, 2000, 2001, 2001, 2002, 2003, 2003, 2005, 2006, 2007, 2009, 2010, 2011, 2011, 2011, 2011, 2012, 2014]
C.language = ["Regional Assembly Language", "Autocode", "IPL", "FLOW-MATIC", "FORTRAN", "COMTRAN", "LISP", "ALGOL 58", "FACT", "COBOL", "RPG", "APL", "Simula", "SNOBOL", "CPL", "Speakeasy", "BASIC", "PL/I", "JOSS", "BCPL", "Logo", "B", "Pascal", "Forth", "C", "Smalltalk", "Prolog", "ML", "Scheme", "SQL ", "C++ ", "Ada", "Common Lisp", "MATLAB", "dBase III", "Eiffel", "Objective-C", "LabVIEW ", "Erlang", "Perl", "Tcl", "Wolfram Language ", "FL ", "Haskell", "Python", "Visual Basic", "Lua", "R", "CLOS ", "Ruby", "Ada 95", "Java", "Delphi ", "JavaSc

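2×8 DataFrame
│ Row │ variable │ mean    │ min      │ median │ max       │ nunique │ nmissing │ eltype   │
│     │ Symbol   │ Union…  │ Any      │ Union… │ Any       │ Union…  │ Nothing  │ DataType │
├─────┼──────────┼─────────┼──────────┼────────┼───────────┼─────────┼──────────┼──────────┤
│ 1   │ year     │ 1982.99 │ 1951     │ 1986.0 │ 2014      │         │          │ Int64    │
│ 2   │ language │         │ ALGOL 58 │        │ dBase III │ 73      │          │ String   │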
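The numbers in the `describe` summary can be reproduced by hand with the `Statistics` standard library. Note that `min` and `max` of a string column compare by character code, so every uppercase letter sorts before a lowercase one. That is likely why `dBase III` is reported as the maximum. The sketch below uses hypothetical sample values rather than the full dataset.

```julia
using Statistics

# Hypothetical sample values, not the full dataset
sample_years = [1951, 1952, 1954, 1955, 1957, 1957]
sample_langs = ["FORTRAN", "ALGOL 58", "dBase III", "COBOL"]

println("mean    = ", mean(sample_years))     # 1954.333...
println("median  = ", median(sample_years))   # 1954.5 (average of the two middle values)
println("min/max = ", extrema(sample_years))  # (1951, 1957)

# Strings compare by code point: all uppercase letters sort before 'd'
println("min/max = ", extrema(sample_langs))         # ("ALGOL 58", "dBase III")
println("nunique = ", length(unique(sample_langs)))  # 4
```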


In [15]:
@btime P, H = readdlm("programminglanguages.csv", ',', header=true);
@btime C = CSV.read("programminglanguages.csv", DataFrame);

  241.200 μs (323 allocations: 51.14 KiB)
  356.600 μs (190 allocations: 19.58 KiB)


In [16]:
# Writing a CSV file using CSV.
# `DataFrame(P, [:year, :language])` converts the matrix `P` into a
# DataFrame with named columns before it is written out.
CSV.write("data/programminglanguages_copy.csv", DataFrame(P, [:year, :language]))

"programminglanguages_copy.csv"

### Reading an XLSX file

In [17]:
T = XLSX.readdata("data/zillow_data_download_april2020.xlsx",
    "Sale_counts_city", # sheet name
    "A1:P9" # cell range
)

9×16 Array{Any,2}:
      "RegionID"  "RegionName"    …      "2009-01"      "2009-02"
  6181            "New York"             missing        missing
 12447            "Los Angeles"      1523           1514
 39051            "Houston"          1776           1918
 17426            "Chicago"          1897           1618
  6915            "San Antonio"   …   856           1016
 13271            "Philadelphia"     1152           1029
 40326            "Phoenix"          1891           2174
 18959            "Las Vegas"        2087           2097

In [18]:
G = XLSX.readtable("data/zillow_data_download_april2020.xlsx", "Sale_counts_city");

In [19]:
G[1] # data

148-element Array{Any,1}:
 Any[6181, 12447, 39051, 17426, 6915, 13271, 40326, 18959, 54296, 38128  …  396952, 397236, 398030, 398104, 398357, 398712, 398716, 399081, 737789, 760882]
 Any["New York", "Los Angeles", "Houston", "Chicago", "San Antonio", "Philadelphia", "Phoenix", "Las Vegas", "San Diego", "Dallas"  …  "Barnard Plantation", "Windsor Place", "Stockbridge", "Mattamiscontis", "Chase Stream", "Bowdoin College Grant West", "Summerset", "Long Pond", "Hideout", "Ebeemee"]
 Any["New York", "California", "Texas", "Illinois", "Texas", "Pennsylvania", "Arizona", "Nevada", "California", "Texas"  …  "Maine", "Missouri", "Wisconsin", "Maine", "Maine", "Maine", "South Dakota", "Maine", "Utah", "Maine"]
 Any[1, 2, 3, 4, 5, 6, 7, 8, 9, 10  …  28750, 28751, 28752, 28753, 28754, 28755, 28756, 28757, 28758, 28759]
 Any[missing, 1446, 2926, 2910, 1479, 1609, 1310, 1618, 772, 1158  …  0, 0, 0, 0, 0, 0, 0, 0, 1, 0]
 Any[missing, 1705, 3121, 3022, 1529, 1795, 1519, 1856, 1057, 1232  …  0, 0, 0, 0

In [20]:
G[2] # Header

148-element Array{Symbol,1}:
 :RegionID
 :RegionName
 :StateName
 :SizeRank
 Symbol("2008-03")
 Symbol("2008-04")
 Symbol("2008-05")
 Symbol("2008-06")
 Symbol("2008-07")
 Symbol("2008-08")
 Symbol("2008-09")
 Symbol("2008-10")
 Symbol("2008-11")
 ⋮
 Symbol("2019-03")
 Symbol("2019-04")
 Symbol("2019-05")
 Symbol("2019-06")
 Symbol("2019-07")
 Symbol("2019-08")
 Symbol("2019-09")
 Symbol("2019-10")
 Symbol("2019-11")
 Symbol("2019-12")
 Symbol("2020-01")
 Symbol("2020-02")

In [21]:
G[1][1][1:10] # takes the first column of data and shows its first 10 entries

10-element Array{Any,1}:
  6181
 12447
 39051
 17426
  6915
 13271
 40326
 18959
 54296
 38128

In [22]:
# G is a (data, header) tuple. The splat operator `...` unpacks it,
# so this call is equivalent to DataFrame(G[1], G[2]).
D = DataFrame(G...)

│ Row │ RegionID │ RegionName   │ StateName    │ SizeRank │ 2008-03 │ 2008-04 │ 2008-05 │
│     │ Any      │ Any          │ Any          │ Any      │ Any     │ Any     │ Any     │
├─────┼──────────┼──────────────┼──────────────┼──────────┼─────────┼─────────┼─────────┤
│ 1   │ 6181     │ New York     │ New York     │ 1        │ missing │ missing │ missing │
│ 2   │ 12447    │ Los Angeles  │ California   │ 2        │ 1446    │ 1705    │ 1795    │
│ 3   │ 39051    │ Houston      │ Texas        │ 3        │ 2926    │ 3121    │ 3220    │
│ 4   │ 17426    │ Chicago      │ Illinois     │ 4        │ 2910    │ 3022    │ 2937    │
│ 5   │ 6915     │ San Antonio  │ Texas        │ 5        │ 1479    │ 1529    │ 1582    │
│ 6   │ 13271    │ Philadelphia │ Pennsylvania │ 6        │ 1609    │ 1795    │ 1709    │
│ 7   │ 40326    │ Phoenix      │ Arizona      │ 7        │ 1310    │ 1519    │ 1654    │
│ 8   │ 18959    │ Las Vegas    │ Nevada       │ 8        │ 1618    │ 1856    │ 1961    │
│ 9   │ 54296    │ San Diego    │ California   │ 9        │ 772     │ 1057    │ 1195    │
│ 10  │ 38128    │ Dallas       │ Texas        │ 10       │ 1158    │ 1232    │ 1240    │
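Every column of `D` has element type `Any`, because that is what the XLSX reader returned. The sketch below builds a small hypothetical stand-in for the `(data, header)` tuple (`G_small`) and reproduces the splat call. It then narrows each column to its tightest element type. Broadcasting `identity` over a `Vector{Any}` re-collects the values and picks a concrete type such as `Int64` or `Union{Missing, Int64}`.

```julia
using DataFrames

# Hypothetical stand-in for the (data, header) tuple returned by XLSX.readtable:
# a vector of Any-typed column vectors plus a vector of column names.
data   = Any[Any[6181, 12447, 39051],
             Any["New York", "Los Angeles", "Houston"],
             Any[missing, 1446, 2926]]
header = [:RegionID, :RegionName, Symbol("2008-03")]
G_small = (data, header)

# Splatting the tuple is the same as DataFrame(data, header)
D_small = DataFrame(G_small...)

# Re-collect each column so it gets the narrowest element type
D_narrow = mapcols(col -> identity.(col), D_small)

println(eltype.(eachcol(D_small)))
println(eltype.(eachcol(D_narrow)))
```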


In [23]:
?innerjoin

search: innerjoin



```
innerjoin(df1, df2; on, makeunique = false,
          validate = (false, false))
innerjoin(df1, df2, dfs...; on, makeunique = false,
          validate = (false, false))
```

Perform an inner join of two or more data frame objects and return a `DataFrame` containing the result. An inner join includes rows with keys that match in all passed data frames.

# Arguments

  * `df1`, `df2`, `dfs...`: the `AbstractDataFrames` to be joined

# Keyword Arguments

  * `on` : A column name to join `df1` and `df2` on. If the columns on which `df1` and `df2` will be joined have different names, then a `left=>right` pair can be passed. It is also allowed to perform a join on multiple columns, in which case a vector of column names or column name pairs can be passed (mixing names and pairs is allowed). If more than two data frames are joined then only a column name or a vector of column names are allowed. `on` is a required argument.
  * `makeunique` : if `false` (the default), an error will be raised if duplicate names are found in columns not joined on; if `true`, duplicate names will be suffixed with `_i` (`i` starting at 1 for the first duplicate).
  * `validate` : whether to check that columns passed as the `on` argument  define unique keys in each input data frame (according to `isequal`).  Can be a tuple or a pair, with the first element indicating whether to  run check for `df1` and the second element for `df2`.  By default no check is performed.

When merging `on` categorical columns that differ in the ordering of their levels, the ordering of the left data frame takes precedence over the ordering of the right data frame.

If more than two data frames are passed, the join is performed recursively with left associativity. In this case the `validate` keyword argument is applied recursively with left associativity.

See also: [`leftjoin`](@ref), [`rightjoin`](@ref), [`outerjoin`](@ref), [`semijoin`](@ref), [`antijoin`](@ref), [`crossjoin`](@ref).

# Examples

```julia
julia> name = DataFrame(ID = [1, 2, 3], Name = ["John Doe", "Jane Doe", "Joe Blogs"])
3×2 DataFrame
│ Row │ ID    │ Name      │
│     │ Int64 │ String    │
├─────┼───────┼───────────┤
│ 1   │ 1     │ John Doe  │
│ 2   │ 2     │ Jane Doe  │
│ 3   │ 3     │ Joe Blogs │

julia> job = DataFrame(ID = [1, 2, 4], Job = ["Lawyer", "Doctor", "Farmer"])
3×2 DataFrame
│ Row │ ID    │ Job    │
│     │ Int64 │ String │
├─────┼───────┼────────┤
│ 1   │ 1     │ Lawyer │
│ 2   │ 2     │ Doctor │
│ 3   │ 4     │ Farmer │

julia> innerjoin(name, job, on = :ID)
2×3 DataFrame
│ Row │ ID    │ Name     │ Job    │
│     │ Int64 │ String   │ String │
├─────┼───────┼──────────┼────────┤
│ 1   │ 1     │ John Doe │ Lawyer │
│ 2   │ 2     │ Jane Doe │ Doctor │

julia> job2 = DataFrame(identifier = [1, 2, 4], Job = ["Lawyer", "Doctor", "Farmer"])
3×2 DataFrame
│ Row │ identifier │ Job    │
│     │ Int64      │ String │
├─────┼────────────┼────────┤
│ 1   │ 1          │ Lawyer │
│ 2   │ 2          │ Doctor │
│ 3   │ 4          │ Farmer │

julia> innerjoin(name, job2, on = :ID => :identifier)
2×3 DataFrame
│ Row │ ID    │ Name     │ Job    │
│     │ Int64 │ String   │ String │
├─────┼───────┼──────────┼────────┤
│ 1   │ 1     │ John Doe │ Lawyer │
│ 2   │ 2     │ Jane Doe │ Doctor │

julia> innerjoin(name, job2, on = [:ID => :identifier])
2×3 DataFrame
│ Row │ ID    │ Name     │ Job    │
│     │ Int64 │ String   │ String │
├─────┼───────┼──────────┼────────┤
│ 1   │ 1     │ John Doe │ Lawyer │
│ 2   │ 2     │ Jane Doe │ Doctor │
```


In [24]:
# Joining the dataframes in Julia
foods = ["apple", "cucumber", "tomato", "banana"]
calories = [105, 47, 22, 105]
prices = [0.85, 1.6, 0.8, 0.6]
dataframe_calories = DataFrame(item=foods, calories=calories)
@show dataframe_calories
dataframe_prices   = DataFrame(item=foods, price=prices)
@show dataframe_prices

# See also: leftjoin, rightjoin, outerjoin, semijoin, antijoin, crossjoin.
df_inner = innerjoin(dataframe_calories, dataframe_prices, on=:item)
@show df_inner
df_outer = outerjoin(dataframe_calories, dataframe_prices, on=:item)
@show df_outer;

dataframe_calories = 4×2 DataFrame
│ Row │ item     │ calories │
│     │ String   │ Int64    │
├─────┼──────────┼──────────┤
│ 1   │ apple    │ 105      │
│ 2   │ cucumber │ 47       │
│ 3   │ tomato   │ 22       │
│ 4   │ banana   │ 105      │
dataframe_prices = 4×2 DataFrame
│ Row │ item     │ price   │
│     │ String   │ Float64 │
├─────┼──────────┼─────────┤
│ 1   │ apple    │ 0.85    │
│ 2   │ cucumber │ 1.6     │
│ 3   │ tomato   │ 0.8     │
│ 4   │ banana   │ 0.6     │
df_inner = 4×3 DataFrame
│ Row │ item     │ calories │ price   │
│     │ String   │ Int64    │ Float64 │
├─────┼──────────┼──────────┼─────────┤
│ 1   │ apple    │ 105      │ 0.85    │
│ 2   │ cucumber │ 47       │ 1.6     │
│ 3   │ tomato   │ 22       │ 0.8     │
│ 4   │ banana   │ 105      │ 0.6     │
df_outer = 4×3 DataFrame
│ Row │ item     │ calories │ price    │
│     │ String?  │ Int64?   │ Float64? │
├─────┼──────────┼──────────┼──────────┤
│ 1   │ apple    │ 105      │ 0.85     │
│ 2   │ cucumber │ 47    
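In the cell above, every item appears in both tables, so `innerjoin` and `outerjoin` return the same four rows and the difference between join types does not show. The sketch below uses hypothetical data whose keys only partly overlap (the `kiwi` row and its price are made up for illustration, and `cal_df` and `price_df` are illustrative names), which makes the difference visible.

```julia
using DataFrames

# Hypothetical data: no price for "banana", and a "kiwi" price with no calories
cal_df   = DataFrame(item = ["apple", "cucumber", "tomato", "banana"], calories = [105, 47, 22, 105])
price_df = DataFrame(item = ["apple", "cucumber", "tomato", "kiwi"],   price = [0.85, 1.6, 0.8, 0.5])

inner = innerjoin(cal_df, price_df, on = :item)  # only keys present in both
left  = leftjoin(cal_df, price_df, on = :item)   # every row of cal_df; price is missing for banana
outer = outerjoin(cal_df, price_df, on = :item)  # every key from either side

# Row order of join results is not guaranteed, so sort before displaying
for (name, df) in ("inner" => inner, "left" => left, "outer" => outer)
    println(name, ": ", nrow(df), " rows")
    show(sort(df, :item))
    println()
end
```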

In [25]:
# Writing data to disk using XLSX.writetable.
# Run this at your own risk! This line crashed my Jupyter
# notebook the first time I ran it.
# XLSX.writetable("data/sales_data_cpy.")

In [26]:
println("Please tell me it didn't crash on me ass")

Please tell me it didn't crash on me ass


# Processing Data in Julia!!

In [27]:
P # Recall the programming languages dataset; this is what we will process

73×2 Array{Any,2}:
 1951  "Regional Assembly Language"
 1952  "Autocode"
 1954  "IPL"
 1955  "FLOW-MATIC"
 1957  "FORTRAN"
 1957  "COMTRAN"
 1958  "LISP"
 1958  "ALGOL 58"
 1959  "FACT"
 1959  "COBOL"
 1959  "RPG"
 1962  "APL"
 1962  "Simula"
    ⋮  
 2003  "Scala"
 2005  "F#"
 2006  "PowerShell"
 2007  "Clojure"
 2009  "Go"
 2010  "Rust"
 2011  "Dart"
 2011  "Kotlin"
 2011  "Red"
 2011  "Elixir"
 2012  "Julia"
 2014  "Swift"

Let's try to answer some quick questions:
 - In which year was a given language invented?
 - How many languages were created in a given year?

In [28]:
function year_invented(language::String, df)
    # findfirst returns the index of the first `true` value
    # in a Bool array, or `nothing` if there is none.
    loc = findfirst(df[:, 2] .== language)
    return df[loc, 1]
end

year_invented (generic function with 1 method)

In [29]:
year_invented("Julia", P)

2012

In [30]:
function year_invented_with_exception(language::String, df)
    loc = findfirst(df[:, 2] .== language)
    !isnothing(loc) && return df[loc, 1]
    error("$language not found in the dataframe")
end

year_invented_with_exception (generic function with 1 method)

In [31]:
year_invented_with_exception("tirth is the best", P)

ErrorException: tirth is the best not found in the dataframe
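`findfirst` scans the whole column on every call. If many lookups are needed, one alternative (not what the notebook does) is to build a `Dict` from language to year once. The sketch below runs on a hypothetical three-row slice of `P` (`P_lookup`). It assumes each language name appears only once; if a name were duplicated, later rows would overwrite earlier ones.

```julia
# Hypothetical slice of P: years in column 1, names in column 2
P_lookup = Any[1951 "Regional Assembly Language"; 1995 "Java"; 2012 "Julia"]

# Build the language => year table once; each query is then a hash lookup
year_lookup = Dict(String(P_lookup[i, 2]) => Int(P_lookup[i, 1]) for i in axes(P_lookup, 1))

# get returns the fallback (here `nothing`) for unknown languages instead of throwing
println(get(year_lookup, "Julia", nothing))              # 2012
println(get(year_lookup, "tirth is the best", nothing))  # nothing
```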

In [32]:
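function all_in_year(year::Int, df)
    # findall returns a (possibly empty) vector of indices, never `nothing`,
    # so check for an empty result instead.
    loc = findall(df[:, 1] .== year)
    !isempty(loc) && return df[loc, :]
    error("$year not found in the dataframe")
end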

all_in_year (generic function with 1 method)

In [33]:
all_in_year(2011, P)

4×2 Array{Any,2}:
 2011  "Dart"
 2011  "Kotlin"
 2011  "Red"
 2011  "Elixir"
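`all_in_year` returns the matching rows, so answering the second question (how many languages appeared in a given year) only requires counting them. The sketch below defines two hypothetical helpers, `languages_in_year` for a single year and `languages_per_year` for all years at once, and runs them on a made-up slice of `P`.

```julia
# Hypothetical slice of P: years in column 1, names in column 2
P_years = Any[1959 "FACT"; 1959 "COBOL"; 1959 "RPG"; 2011 "Dart"; 2011 "Kotlin"; 2012 "Julia"]

# Number of languages created in one year: count the matching rows
languages_in_year(year::Int, df) = count(==(year), df[:, 1])

# Counts for every year at once, accumulated in a Dict
function languages_per_year(df)
    counts = Dict{Int, Int}()
    for y in df[:, 1]
        counts[y] = get(counts, y, 0) + 1
    end
    return counts
end

println(languages_in_year(1959, P_years))  # 3
println(languages_per_year(P_years))       # Dict order is not guaranteed
```

Unlike `all_in_year`, `languages_in_year` simply returns 0 for a year with no entries instead of raising an error.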