# mdata

This repo contains:

- a [dataframe](https://github.com/mchav/dataframe) example taken from a
  kaggle run.
- [perf](https://github.com/tonyday567/perf) which is being used to
  measure performance of common usage patterns.
- [chart-svg](https://github.com/tonyday567/chart-svg) and dataframe
  integration and development
- a live chart build using
  [prettychart](https://github.com/tonyday567/prettychart)
- some CI infrastructure to begin to measure integration.
- some pandoc conversion experiments: from org =\> markdown =\> ipynb
  (nbconvert and jupytext do not cater for this with respect to cell
  outputs)

# pandocs

The goal is a lossless round trip: after converting t3.md to t3.ipynb
and back again, t3b.md should be identical to t3.md.

``` bash
pandoc -i t3.md -f markdown -o t3.ipynb -t ipynb
pandoc -i t3.ipynb -f ipynb -o t3b.md -t markdown
```
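
To see where a round trip loses information, the two markdown files can be compared line by line. The sketch below is a hypothetical helper, not part of this repo or of pandoc; the names `diffLines`, `zipLong` and `checkRoundTrip` are made up for illustration, and it only needs base.

```haskell
-- Hypothetical helpers for checking t3.md against t3b.md.

-- | Pair up the lines of two texts and keep the pairs that differ,
-- tagged with their 1-based line number. A missing line is shown as "",
-- so extra trailing blank lines are not reported.
diffLines :: String -> String -> [(Int, String, String)]
diffLines a b =
  [ (n, x, y)
  | (n, (x, y)) <- zip [1 ..] (zipLong (lines a) (lines b))
  , x /= y
  ]

-- | Like zip, but pads the shorter list with empty strings.
zipLong :: [String] -> [String] -> [(String, String)]
zipLong (x : xs) (y : ys) = (x, y) : zipLong xs ys
zipLong xs [] = [(x, "") | x <- xs]
zipLong [] ys = [("", y) | y <- ys]

-- | Print every differing line and report whether the files matched.
checkRoundTrip :: FilePath -> FilePath -> IO Bool
checkRoundTrip before after = do
  a <- readFile before
  b <- readFile after
  let ds = diffLines a b
  mapM_ print ds
  pure (null ds)
```

In GHCi, `checkRoundTrip "t3.md" "t3b.md"` would return `True` only when the round trip was lossless.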

# Imports

``` haskell-ng
:r

:set -XNoImplicitPrelude
:set -XImportQualifiedPost
:set -Wno-type-defaults
:set -Wno-name-shadowing
:set -XOverloadedLabels
:set -XOverloadedStrings
:set -XTupleSections
:set -XQuasiQuotes

-- base, text & bytestring encoding (compatibility check, also)
import Prelude as P
import NumHask.Prelude qualified as N
import Control.Category ((>>>))
import Data.Function
import Data.Maybe
import Data.Bool
import Data.List qualified as List
import Control.Monad
import Data.Bifunctor
import Data.ByteString.Char8 qualified as C
import Data.Text qualified as T

-- prettyprinter (dev help)
import Prettyprinter

-- common dataframe imports
import DataFrame qualified as D
import DataFrame.Functions qualified as F
import DataFrame.Internal.Expression qualified as D
import DataFrame.Internal.Statistics qualified as D
import qualified Data.Vector.Algorithms.Intro as VA
import qualified Data.Vector.Unboxed as VU
import qualified Data.Vector.Unboxed.Mutable as VUM

-- common chart-svg imports
import Chart
import Prettychart
import Chart.Examples
import Optics.Core hiding ((|>),(<|))
import Control.Lens qualified as Lens
import Data.Data.Lens qualified as Lens

-- dev helpers
import Perf
import Flow

-- functions not yet transferred elsewhere
import MData

-- example data from https://www.kaggle.com/competitions/playground-series-s5e11
dfTest <- D.readCsv "data/s5e11/test.csv"

```

## Live charts

This gives you a browser page and live charting capabilities.

``` haskell-ng
(display, quit) <- startChartServer (Just "mdata")
disp x = display $ x & set (#markupOptions % #markupHeight) (Just 250) & set (#hudOptions % #frames % ix 1 % #item % #buffer) 0.1
```

<http://localhost:9160/>

testing, testing; one, two, three

``` haskell-ng
disp unitExample
```

## dataframe creation

idiomatic dataframe style?

``` haskell-ng
df0 = mempty |> D.insert "item" ["person","woman","man","camera","tv"] |> D.insert "value" [20,23.1,31,16,10]
v = F.col @Double "value"
xs = D.columnAsList @Double "value" df0
xs' = (/ sum xs) <$> xs
df = D.insert "prop" xs' df0
df
```

### expr method

``` haskell-ng
df0 = mempty |> D.insert "item" ["person","woman","man","camera","tv"] |> D.insert "value" [20,23.1,31,16,10]
v = F.col @Double "value"
prop e = e / F.sum e
df = D.derive "prop" (prop v) df0
df
```

### F.sum bug?

``` haskell-ng
df0 = mempty |> D.insert "item" ["person","woman","man","camera","tv"] |> D.insert "value" [20,23.1,31,16,10]
v = F.col @Double "value"
df = D.derive "sum" (F.sum v) df0
df
```

## stacked bar

### version 1: single stacked vertical bar chart

``` haskell-ng
ls = T.pack <$> D.columnAsList @String "item" df
vs = D.columnAsList @Double "prop" df
bd = BarData (fmap pure vs) ["item"] ls
```

``` haskell-ng
bc = barChart (defaultBarOptions |> set #displayValues False |> set #barStacked Stacked |> set (#barRectStyles % each % #borderSize) 0) bd
disp bc
writeChartOptions "other/bar1.svg" bc
```

![](attachment:other/bar1.svg)

### version 2: skinny

``` haskell-ng
bc' = Lens.transformOnOf Lens.template Lens.uniplate (over chroma' (*1.5) .> over opac' (*0.6)) bc |> set (#markupOptions % #chartAspect) (FixedAspect 0.4)

disp (bc')
writeChartOptions "other/bar2.svg" bc'
```

![](attachment:other/bar2.svg)

### version 3: remove legend and embed labels

``` haskell-ng

acc0 = List.scanl' (+) 0 vs <> [1]
mids = zipWith (\a0 a1 -> (a0+a1)/2) acc0 (List.drop 1 acc0)
ct = zipWith (\c (t,a) -> TextChart (defaultTextStyle |> set #size 0.05 |> set #color (palette c |> over lightness' (*0.6))) [(t, Point zero (0.5-a))]) [0..] (zip ls mids)

bc'' = bc' |> set (#hudOptions % #legends) mempty |> over #chartTree (<> named "labels" ct)

disp (bc'')
writeChartOptions "other/bar3.svg" bc''

```

![](attachment:other/bar3.svg)

## pie secants

Pie chart convention starts at the y-axis and lays out secant slices
clockwise.

`ra` maps a value in (0,1) (the proportional pie slice) to a point on
the unit circle (by this convention).

``` haskell-ng
ra = (+(-0.25)) .> (*(-2 * pi)) .> ray @(Point Double)
secantPie (Secant o r a0 a1) = singletonPie o (ArcPosition (o N.+ ra a0) (o N.+ ra a1) (ArcInfo (Point r r) 0 False True))
```
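
The convention can be checked without chart-svg. This is a minimal sketch that assumes `ray` sends an angle θ to (cos θ, sin θ); plain tuples stand in for `Point`, and `ra'` and `near` are illustrative names, not library functions.

```haskell
-- | Map a proportion in [0,1] to a point on the unit circle, starting at
-- the top (positive y-axis) and moving clockwise. Tuples stand in for
-- chart-svg's Point so the sketch needs only base.
ra' :: Double -> (Double, Double)
ra' x = (cos t, sin t)
  where
    -- shift back by a quarter turn, then negate so that angles run clockwise
    t = (x - 0.25) * (-2 * pi)

-- | Compare two points up to a small tolerance.
near :: (Double, Double) -> (Double, Double) -> Bool
near (a, b) (c, d) = abs (a - c) < 1e-9 && abs (b - d) < 1e-9
```

Under that assumption, `map ra' [0, 0.25, 0.5, 0.75]` visits the top, right, bottom and left of the circle in turn, up to floating-point error.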

This cumulative sum is a very common scan for a Column.

``` haskell-ng
acc0 = List.scanl' (+) 0 vs <> [1]
mids = zipWith (\a0 a1 -> (a0+a1)/2) acc0 (List.drop 1 acc0)

xs = zipWith (\a0 a1 -> secantPie (Secant (0.05 N.*| ra ((a0+a1)/2)) one a0 a1)) acc0 (List.drop 1 acc0)

cs = zipWith (\c x -> PathChart (defaultPathStyle |> set #borderSize 0 |> set #color (paletteO c 0.3)) x) [0..] xs

ct = zipWith (\c (t,a) -> TextChart (defaultTextStyle |> set #size 0.05 |> set #color (palette c & over lightness' (*0.6))) [(t, 0.7 N.*| ra a)]) [0..] (zip ls mids)
co = (mempty :: ChartOptions) & set (#markupOptions % #chartAspect) ChartAspect & set #chartTree ((cs <> ct) |> unnamed)
disp co
writeChartOptions "other/pie.svg" co
```
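
The scan used for both the bar labels and the pie slices can be isolated as three small list functions. This is a base-only sketch on plain lists rather than dataframe columns; `proportions`, `boundaries` and `midpoints` are illustrative names, and the sketch leaves out the final `1` that the blocks above append to the running totals.

```haskell
import Data.List (scanl')

-- | Turn raw values into proportions that sum to 1.
proportions :: [Double] -> [Double]
proportions xs = map (/ sum xs) xs

-- | Running totals, starting at 0: n inputs give n + 1 boundaries.
boundaries :: [Double] -> [Double]
boundaries = scanl' (+) 0

-- | Midpoint of each adjacent pair of boundaries, one per input value.
midpoints :: [Double] -> [Double]
midpoints bs = zipWith (\a0 a1 -> (a0 + a1) / 2) bs (drop 1 bs)
```

Each slice then spans a pair of adjacent boundaries, and its label sits at the matching midpoint: `midpoints (boundaries (proportions vs))` has one entry per value.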

![](attachment:other/pie.svg)


# kaggle example

## Initial build

``` bash
cabal init  --non-interactive mdata -d "base,dataframe,perf,chart-svg,prettychart,vector"
```

## dataframe check

``` haskell-ng
D.describeColumns df
```

``` haskell-ng
D.summarize df
```

# chart dev

## boxPlot example

``` haskell-ng
c0 = (either (error . show) id) (D.columnAsDoubleVector "interest_rate" df)
ch = boxPlot defaultBoxPlotOptions c0
writeChartOptions "other/box1.svg" ch
disp ch
```

![](attachment:other/box1.svg)

## scatterPlot example

``` example
True
```

``` haskell-ng
c0 = (either (error . show) id) (D.columnAsDoubleVector "interest_rate" df)
c1 = (either (error . show) id) (D.columnAsDoubleVector "loan_amount" df)

ch = GlyphChart defaultGlyphStyle (Prelude.take 1000 $ zipWith Point (VU.toList c0) (VU.toList c1))

ch' = (mempty :: ChartOptions) & set #chartTree (named "scatterPlot" [ch]) & set #hudOptions defaultHudOptions & set (#hudOptions % #titles) [(Priority 8 (defaultTitleOptions "interest_rate" & set #place PlaceBottom & set (#style % #size) 0.06)),(Priority 8 (defaultTitleOptions "loan_amount" & set #place PlaceLeft & set (#style % #size) 0.06 & set #buffer 0.1))]

writeChartOptions "other/scatter1.svg" ch'
disp ch'
```

Using MData.scatterPlot

``` haskell-ng
v0 = (either (error . show) id) (D.columnAsDoubleVector "interest_rate" df)
v1 = (either (error . show) id) (D.columnAsDoubleVector "loan_amount" df)
ch = scatterPlot defaultScatterPlotOptions (Just "interest_rate", v0) (Just "loan_amount", v1)

writeChartOptions "other/scatter1.svg" ch
disp ch
```

![](attachment:other/scatter1.svg)

# reference

Comparable python:

<https://www.kaggle.com/code/ravitejagonnabathula/predicting-loan-payback>

notebook best practice:

<https://marimo.io/blog/lessons-learned>

converting to ipynb:

<https://pandoc.org/installing.html>

``` bash
pandoc readme.md -o mdata.ipynb
```

chart-svg api tree

<https://hackage-content.haskell.org/package/chart-svg-0.8.2.1/docs/other/ast.svg>

# (deprecated) testing snippets

## file read testing

Reading the test CSV is a good, chunky first example to time.

``` haskell-ng
s <- readFile "other/test.csv"
length s
```

``` haskell-ng
rf = readFile "other/test.csv"
(m,n) <- tickIO (length <$> rf)
print n
toSecs m
```

``` haskell-ng
(m,df) <- tickIO (D.readCsv "other/test.csv")
print $ toSecs m
:t df
```
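
For readers without perf installed, a rough stand-in for the timing pattern above can be written with base alone. This is only a sketch and not how perf's `tickIO` is implemented; `timeIO` is a hypothetical name, and it measures wall-clock time with `GHC.Clock`.

```haskell
import GHC.Clock (getMonotonicTimeNSec)

-- | Run an action and return the elapsed wall-clock seconds with its result.
-- The result is forced to weak head normal form before the clock stops,
-- so a lazy value such as a list is only evaluated to its first constructor.
timeIO :: IO a -> IO (Double, a)
timeIO action = do
  start <- getMonotonicTimeNSec
  result <- action
  end <- result `seq` getMonotonicTimeNSec
  pure (fromIntegral (end - start) / 1e9, result)
```

Timing `length <$> readFile "other/test.csv"` this way forces the whole file, because `length` has to walk every character before it can return.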

Example data is from
<https://www.kaggle.com/competitions/playground-series-s5e11>

## get a Column and compute quartiles.

``` haskell-ng
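-- extract the column as an unboxed Double vector (errors on a Left)
c = (either (error . show) id) (D.columnAsDoubleVector "interest_rate" df)
:t c
-- quantiles 0 to 4 of 4: min, q1, median, q3, max
q4s = VU.toList $ D.quantiles' (VU.fromList [0,1,2,3,4]) 4 c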
:t q4s
q4s
```
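
To make the five numbers concrete, here is a base-only sketch of a quartile calculation on a plain list. It is not dataframe's `quantiles'`: the linear-interpolation rule is an assumption and the library may use a different one, and `quantile` and `fiveNumber` are illustrative names.

```haskell
import Data.List (sort)

-- | The q-th quantile (0 <= q <= 1) of a non-empty list, using linear
-- interpolation between the two nearest order statistics.
quantile :: Double -> [Double] -> Double
quantile q xs
  | null xs = error "quantile: empty list"
  | otherwise = lo + (hi - lo) * frac
  where
    sorted = sort xs
    n = length sorted
    -- fractional position of the quantile within the sorted list
    pos = q * fromIntegral (n - 1)
    i = max 0 (min (n - 1) (floor pos))
    frac = pos - fromIntegral i
    lo = sorted !! i
    hi = sorted !! min (n - 1) (i + 1)

-- | Min, Q1, median, Q3 and max, in that order.
fiveNumber :: [Double] -> [Double]
fiveNumber xs = [quantile (k / 4) xs | k <- [0 .. 4]]
```

The result has the same shape as `q4s` above: a five-element list running from the minimum to the maximum.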

## box plot constructor

A box plot is:

- (maybe) a vertical tick at the min
- a LineChart from min to q1
- a RectChart from q1 to q2
- a RectChart from q2 to q3
- a LineChart from q3 to max
- (maybe) a vertical tick at the max

``` haskell-ng
l1 = LineChart defaultLineStyle [[Point (q4s !! 0) 0.5, Point (q4s !! 1) 0.5]]
l2 = LineChart defaultLineStyle [[Point (q4s !! 3) 0.5, Point (q4s !! 4) 0.5]]
r1 = RectChart defaultRectStyle [Rect (q4s !! 1) (q4s !! 2) 0 1]
r2 = RectChart defaultRectStyle [Rect (q4s !! 2) (q4s !! 3) 0 1]
```
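
The same decomposition can be written as data, which avoids the partial `!!` lookups. This is a sketch only: `BoxPart` and `boxParts` are hypothetical names, not chart-svg types, and the optional ticks at min and max are left out.

```haskell
-- | The pieces of a horizontal box plot, described by their x-extents.
data BoxPart
  = Whisker Double Double -- a line from one value to another
  | Box Double Double     -- a rectangle between two quantiles
  deriving (Eq, Show)

-- | Build the four main parts from min, Q1, median, Q3 and max.
-- Returns Nothing unless exactly five values are supplied.
boxParts :: [Double] -> Maybe [BoxPart]
boxParts [mn, q1, q2, q3, mx] =
  Just [Whisker mn q1, Box q1 q2, Box q2 q3, Whisker q3 mx]
boxParts _ = Nothing
```

Each `Whisker` would map onto a `LineChart` and each `Box` onto a `RectChart`, in the same order as `l1`, `r1`, `r2`, `l2`.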

``` haskell-ng
c = (mempty :: ChartOptions) & set #hudOptions defaultHudOptions & set #chartTree (unnamed [l1,r1,r2,l2])
```

``` haskell-ng
disp c
```

``` haskell-ng
writeChartOptions "other/c.svg" c
```

![](attachment:other/c.svg)

## vertical version

``` haskell-ng
qs = q4s
l1 = LineChart defaultLineStyle [[Point 0.5 (qs !! 0), Point 0.5 (qs !! 1)]]
l2 = LineChart defaultLineStyle [[Point 0.5 (qs !! 3), Point 0.5 (qs !! 4)]]
r1 = RectChart defaultRectStyle [Rect 0 1 (qs !! 1) (qs !! 2)]
r2 = RectChart defaultRectStyle [Rect 0 1 (qs !! 2) (qs !! 3)]
```

``` haskell-ng
c = (mempty :: ChartOptions) & set (#markupOptions % #chartAspect) (FixedAspect 0.25) & set #hudOptions defaultHudOptions & over (#hudOptions % #axes) (Prelude.drop 1) & set #chartTree (named "boxplot" [l1,r1,r2,l2])
disp c
```