### Introduction to DataFrames
Bogumił Kamiński, July 25, 2018

Let's get started by loading the DataFrames package.

In [1]:
using DataFrames

### Constructors and conversion
#### Constructors
In this section, you'll see many ways to create a DataFrame using the DataFrame() constructor.

First, we can create an empty DataFrame:

In [2]:
DataFrame()
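An empty `DataFrame` usually serves as a starting point that columns are added to later. A minimal sketch, assuming a recent DataFrames.jl (1.x) release, where assigning to `df.name` adds a column. The column names `A` and `B` are arbitrary:

```julia
using DataFrames

df = DataFrame()          # no rows, no columns
@show size(df)            # (0, 0)

# Assigning to a new property adds a column; the first column fixes the row count.
df.A = [1, 2, 3]
df.B = ["a", "b", "c"]
@show size(df)            # (3, 2)
```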

Or we can call the constructor with keyword arguments, each of which adds a column to the DataFrame:

In [5]:
using Random  # randstring comes from the Random standard library (Julia 0.7 and later)
DataFrame(A=1:3, B=rand(3), C=randstring.([3,3,3]))


| Row | A | B | C |
|---|---|---|---|
| 1 | 1 | 0.534296 | VV7 |
| 2 | 2 | 0.0300075 | h6C |
| 3 | 3 | 0.128319 | xg6 |
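To check what the keyword form produces, the sketch below inspects the column names and element types. It assumes a recent DataFrames.jl (1.x) release, where `names` returns the column names as strings:

```julia
using DataFrames
using Random  # provides randstring

df = DataFrame(A=1:3, B=rand(3), C=randstring.([3, 3, 3]))

# Keyword names become column names, in the order given.
@show names(df)          # ["A", "B", "C"]
# Each column keeps the element type of the value passed in.
@show eltype(df.B)       # Float64
@show eltype(df.C)       # String
```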


In [7]:
x=Dict("A"=>[1,2],"B"=>["a","b"],"C"=>[true,false])
DataFrame(x)

| Row | A | B | C |
|---|---|---|---|
| 1 | 1 | a | true |
| 2 | 2 | b | false |
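A plain `Dict` has no guaranteed iteration order, so for `Dict` input the constructor sorts the keys to make the column order deterministic. A small sketch of that behavior, assuming a recent DataFrames.jl (1.x) release, with the keys deliberately inserted out of alphabetical order:

```julia
using DataFrames

# Keys inserted out of alphabetical order on purpose.
d = Dict("C" => [true, false], "A" => [1, 2], "B" => ["a", "b"])
df = DataFrame(d)

# Columns come out sorted by key, regardless of insertion order.
@show names(df)   # ["A", "B", "C"]
```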


Rather than explicitly creating a dictionary first, as above, we can pass the columns directly to `DataFrame` as key-value pairs.

Note that in this case we use symbols to denote the column names, and the columns keep the order in which they are passed instead of being sorted. For example, the symbol `:A` in the first pair becomes `A`, the name of the first column:

In [8]:
DataFrame(:A=>rand(2),:B=>["a","b"],:C=>[true,false])

| Row | A | B | C |
|---|---|---|---|
| 1 | 0.691821 | a | true |
| 2 | 0.232557 | b | false |
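By contrast with the `Dict` case, pair arguments keep the order in which they are written. A sketch assuming a recent DataFrames.jl (1.x) release:

```julia
using DataFrames

# Pair arguments keep the order in which they are written.
df = DataFrame(:C => [true, false], :A => [0.5, 0.25], :B => ["a", "b"])
@show names(df)   # ["C", "A", "B"]
```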


We can also pass a vector of columns, with a vector of column names as the second argument:

In [9]:
DataFrame([1:3,4:6,7:9],[:A,:B,:C])

| Row | A | B | C |
|---|---|---|---|
| 1 | 1 | 4 | 7 |
| 2 | 2 | 5 | 8 |
| 3 | 3 | 6 | 9 |
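Since both arguments are ordinary vectors, the columns and their names can be generated programmatically. A sketch assuming a recent DataFrames.jl (1.x) release; the `col1`, `col2`, `col3` naming scheme is purely illustrative:

```julia
using DataFrames

# Three columns of three values each: 1:3, 4:6, 7:9.
cols = [collect(i:i+2) for i in 1:3:7]
# Generated names :col1, :col2, :col3.
colnames = Symbol.("col", 1:3)

df = DataFrame(cols, colnames)
@show df.col2   # [4, 5, 6]
```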



Here we create a DataFrame from a matrix; the columns get the default names `x1`, `x2`, and so on:

In [10]:
DataFrame(rand(3,4))

| Row | x1 | x2 | x3 | x4 |
|---|---|---|---|---|
| 1 | 0.926772 | 0.547195 | 0.648547 | 0.503928 |
| 2 | 0.917894 | 0.266197 | 0.30814 | 0.740986 |
| 3 | 0.0381707 | 0.699405 | 0.993821 | 0.527116 |
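Recent DataFrames.jl releases (1.x) expect the automatic `x1`, `x2`, ... names to be requested explicitly by passing `:auto` as the second argument. A sketch under that assumption:

```julia
using DataFrames

m = rand(3, 4)
df = DataFrame(m, :auto)   # columns named x1, x2, x3, x4
@show names(df)            # ["x1", "x2", "x3", "x4"]
```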


In [11]:
DataFrame(rand(3,4), Symbol.('a':'d'))  # name the columns :a, :b, :c, :d

| Row | a | b | c | d |
|---|---|---|---|---|
| 1 | 0.0500302 | 0.806752 | 0.565299 | 0.835141 |
| 2 | 0.510626 | 0.456285 | 0.715045 | 0.797076 |
| 3 | 0.128075 | 0.471924 | 0.427107 | 0.941769 |
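The conversion also works in the other direction: `Matrix(df)` collects the columns back into a matrix, promoting them to a common element type. A sketch assuming a recent DataFrames.jl (1.x) release, using an integer matrix so the round trip is exact:

```julia
using DataFrames

m = [1 2 3; 4 5 6]
df = DataFrame(m, [:a, :b, :c])   # one column per matrix column
back = Matrix(df)                  # 2×3 Matrix{Int}
@show back == m                    # true
```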


We can also construct an uninitialized DataFrame.

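Here we pass the column types, the column names, and the number of rows. The cells are not initialized: the `Int` and `Float64` columns contain whatever bits happened to be in memory, and the `String` column `:C` shows `#undef` because its elements are references that have not been assigned yet.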

In [14]:
DataFrame([Int,Float64,String],[:A,:B,:C],2)

| Row | A | B | C |
|---|---|---|---|
| 1 | 139929756403920 | 6.91345e-310 | #undef |
| 2 | 139928372720784 | 6.91345e-310 | #undef |
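To see what "uninitialized" means, the same kind of table can be assembled by hand from `Vector{T}(undef, n)` columns. A sketch assuming a recent DataFrames.jl (1.x) release, where `copycols=false` keeps the passed vectors as they are; `isassigned` distinguishes slots that hold arbitrary bits from unassigned references:

```julia
using DataFrames

types = [Int, Float64, String]
colnames = [:A, :B, :C]
n = 2

# Allocate each column without initializing it; copycols=false keeps these exact vectors.
cols = [Vector{T}(undef, n) for T in types]
df = DataFrame(cols, colnames; copycols=false)

# Int and Float64 slots hold arbitrary bits, but String slots are unassigned references.
@show isassigned(df.A, 1)   # true
@show isassigned(df.C, 1)   # false
```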


A similar syntax gives a quick way to create a homogeneous DataFrame, here with 3 rows and 5 columns of type `Int` (again uninitialized):

In [16]:
DataFrame(Int,3,5)  # element type, number of rows, number of columns

| Row | x1 | x2 | x3 | x4 | x5 |
|---|---|---|---|---|---|
| 1 | 281479271677952 | 281479271677952 | 282574488338432 | 281479271677952 | 281479271677952 |
| 2 | 139925739536384 | 0 | 139925739536384 | 139925739536384 | 139925739536384 |
| 3 | 0 | 0 | 0 | 0 | 0 |
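Because the values above are whatever happened to be in memory, an initialized homogeneous table is often more useful in practice. A sketch assuming a recent DataFrames.jl (1.x) release, building a 3×5 table of `Int` zeros from a matrix with automatically generated names:

```julia
using DataFrames

df = DataFrame(zeros(Int, 3, 5), :auto)   # columns x1 to x5, all zero

@show size(df)                                     # (3, 5)
@show all(col -> eltype(col) == Int, eachcol(df))  # true
```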


In [17]:
DataFrame([Int, Float64], 4)  # column types and number of rows; names default to x1, x2

| Row | x1 | x2 |
|---|---|---|
| 1 | 282574488338432 | 1.39069e-309 |
| 2 | 139925739536384 | 6.91325e-310 |
| 3 | 0 | 0.0 |
| 4 | 0 | 0.0 |
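If placeholder values should be recognizable rather than arbitrary, one option is to allocate columns that accept `missing` and fill them with it. `Vector{Union{Missing, T}}(missing, n)` is a Base constructor that does exactly that. A sketch assuming a recent DataFrames.jl (1.x) release; the names `x1` and `x2` simply mirror the default naming:

```julia
using DataFrames

types = [Int, Float64]
n = 4

# Each column accepts missing and starts out as all-missing.
cols = [Vector{Union{Missing, T}}(missing, n) for T in types]
df = DataFrame(cols, [:x1, :x2])

# Real values can be written in place later.
df.x1[1] = 10
@show count(ismissing, df.x1)   # 3
```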
