# SPB IX - Data access

---

In this task you'll learn how to explore and manipulate tabular data using `data frames`. We first take a look at the core building block of data frames, the `series`, and then combine several series into a data frame.

### The following documentation may be helpful in completing the tasks:

* Deedle: https://fslab.org/Deedle


### Referencing the necessary libraries

The following cell must be executed once, otherwise you cannot use the referenced libraries:

In [None]:
#r "nuget: Deedle, 3.0.0-beta.1"
#r "nuget: Deedle.Interactive, 3.0.0-beta.1"
open Deedle

### Working with Deedle
Should you see this error message:
```
9_Data_exploration_using_FSharp.fsx(113,5): error FS0030: Value restriction. The value 'cpw'' has been inferred to have generic type val cpw' : Series<(string * int),'_a>      
Either define 'cpw'' as a simple data term, make it a function with explicit arguments or, if you do not intend for it to be generic, add a type annotation.
```
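Then you should resort to an explicit type annotation. The key type in the annotation has to match the row keys of the frame: `int` for `persons`, `(string * int)` for a frame grouped by a string column, as in the error message above.
Instead of:
```
let cpw' = persons |> Frame.getCol "cpw"
```
You should use:
```
let cpw' : Series<int,float> = persons |> Frame.getCol "cpw"
```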


In [None]:
Series.ofValues ["Kevin";"Lukas";"Benedikt";"Michael"]
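
A minimal sketch of what a series looks like once it has been created. The data and the names `cities`, `keys`, `second` and `nameLengths` are hypothetical and only serve as illustration; the `#r` line repeats the reference cell above so that the snippet also runs on its own.

```fsharp
#r "nuget: Deedle, 3.0.0-beta.1"
open Deedle

// Hypothetical example data.
let cities = Series.ofValues ["Mainz"; "Kaiserslautern"; "Trier"]

// Series.ofValues assigns the integer keys 0, 1, 2, ... to the values.
let keys = cities |> Series.keys |> Seq.toList

// A single value can be looked up by its key.
let second = cities.[1]

// Series.mapValues applies a function to every value and keeps the keys.
let nameLengths = cities |> Series.mapValues String.length
```

Because `Series.ofValues` generates integer keys, the resulting type here is `Series<int,string>`, and `nameLengths` is a `Series<int,int>` with the same keys.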

# Task 1: Basics

## Task 1.1
Use the function `Series.mapValues` to triple the values of `"coffeesPerWeek"`.

In [None]:
let coffeesPerWeek  = Series.ofValues [14;16;5;1] 

## Task 1.2
Create a frame based on the three given series and bind it to the name `"persons"`.


In [None]:
let firstNames      = Series.ofValues ["Kevin";"Lukas";"Benedikt";"Michael"]
let lastNames       = Series.ofValues ["Schneider";"Weil";"Venn";"Schroda"]  
let group           = Series.ofValues ["CSB";"CSB";"CSB";"MBS"] 


## Task 1.3
Add a newly created series named `"teasPerWeek"` and the given series `"coffeesPerWeek"` as columns to the frame. Bind the resulting frame to a new name.
Tip: Create a `Series<int,int>` first, then use `Frame.addCol`.



In [None]:
// example: 
// persons is the result of Task 1.2
// persons
// |> Frame.addCol "coffeesPerWeek" coffeesPerWeek

## Task 1.4
Sum the columns `"teasPerWeek"` and `"coffeesPerWeek"` element-wise, so that each row holds the total for one person. Add the resulting series as a column with the title `"completeConsume"` to the previously created frame.

Tip 1: This task can be solved in several ways.

Tip 2: Via `Series.values` you can access the values of each Series. Then you could iterate over both collections with `Seq.map2`. 
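
The approach from Tip 2 could be sketched as follows. The series `apples` and `pears` are hypothetical and unrelated to the task data; this is only one of several possible ways.

```fsharp
#r "nuget: Deedle, 3.0.0-beta.1"
open Deedle

// Hypothetical example data with identical keys (0, 1, 2).
let apples = Series.ofValues [1; 2; 3]
let pears  = Series.ofValues [10; 20; 30]

// Series.values yields the values as a sequence.
// Seq.map2 combines both sequences position by position,
// and Series.ofValues turns the result back into a series.
let fruit =
    Seq.map2 (+) (Series.values apples) (Series.values pears)
    |> Series.ofValues
```

Note that `Seq.map2` pairs the values by position, not by key, so this sketch assumes that both series contain the same keys in the same order.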



## Task 1.5
Determine the sum of `"completeConsume"`.



# Task 2: Frame Operations




## Task 2.1
Group the rows of the frame from Task 1.3 according to the elements of the `"group"` column.
Tip: Explicit type annotation (see: [Working with Deedle](#Working-with-Deedle)). 

## Task 2.2
Result tables often contain more than 40 columns, of which only a few are interesting for an individual analysis.
It is therefore often useful to create a frame that contains fewer columns. Starting from the frame from Task 1.3, use the function `Frame.sliceCols`
to create a frame that contains only the columns `"firstNames"` and `"teasPerWeek"`.



## Task 2.3
Many times you want to aggregate based on groupings. Calculate the sum of the `"teasPerWeek"` column for each group.
Tip: Extract the `"teasPerWeek"` column from the result of Task 2.1. Proceed as demonstrated in the lecture. 
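
As an illustration of grouping followed by aggregation, here is a minimal sketch on hypothetical data (the frame `shops` and its columns are made up). It shows one possible way using `Series.applyLevel`, which is not necessarily the approach demonstrated in the lecture.

```fsharp
#r "nuget: Deedle, 3.0.0-beta.1"
open Deedle

// Hypothetical data, unrelated to the frame from the tasks.
let city  = Series.ofValues ["Berlin"; "Berlin"; "Mainz"]
let sales = Series.ofValues [3; 4; 10]

let shops =
    frame ["city" => city]
    |> Frame.addCol "sales" sales

// Grouping turns the row keys into (group key * original key) tuples.
// The type annotation avoids the value restriction error.
let byCity : Frame<(string * int), string> =
    shops |> Frame.groupRowsBy "city"

// Extracting a column from the grouped frame also needs an annotation.
let salesByCity : Series<(string * int), int> =
    byCity |> Frame.getCol "sales"

// Collapse the keys to their first component and sum the values per group.
let totals : Series<string, int> =
    salesByCity |> Series.applyLevel fst (Series.values >> Seq.sum)
```

Here `fst` selects the group part of each tuple key, so `totals` holds one value per city.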



## Task 2.4
Often you want to save intermediate results. Save the frame from Task 1.3 as a CSV file. Use `';'` as the separator.



## Task 2.5
Use the function `Frame.ReadCsv` to read the file again. Remember to specify the separator that was used when saving.
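
A CSV round trip could look roughly like the sketch below. The frame `shops`, the file name `deedle_example.csv` and the use of the temporary directory are assumptions made for illustration only.

```fsharp
#r "nuget: Deedle, 3.0.0-beta.1"
open System.IO
open Deedle

// Hypothetical frame, unrelated to the frame from the tasks.
let shops =
    frame ["city" => Series.ofValues ["Berlin"; "Mainz"]]
    |> Frame.addCol "sales" (Series.ofValues [7; 10])

// Hypothetical target file in the temporary directory.
let path = Path.Combine(Path.GetTempPath(), "deedle_example.csv")

// Write the frame without its row keys, using ';' as separator.
shops.SaveCsv(path, includeRowKeys = false, separator = ';')

// Read the file back; the separator is passed as a string here.
let reloaded = Frame.ReadCsv(path, separators = ";")
```

Note that `SaveCsv` takes the separator as a `char`, whereas `Frame.ReadCsv` expects a `string` in its `separators` parameter.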

