# Ruby World Conference 2016, Matsue City

# Scientific Computing in Ruby

# IRuby

## This is an iruby notebook.

## It is a browser-based Ruby REPL.

## Each 'cell' can take Markdown or Ruby code as input.

## Your output can be any Ruby object, or you can write a `#to_html` method to render Ruby objects as good-looking HTML right in your browser.
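As a rough illustration of the `#to_html` hook mentioned above, here is a minimal sketch. The `Scoreboard` class, its data and the `escape_html` helper are hypothetical names made up for this example; the only thing taken from the notebook is the idea that an object exposing `#to_html` can be shown as HTML.

```ruby
# Hypothetical class for illustration only; it is not part of IRuby.
class Scoreboard
  HTML_ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;' }.freeze

  def initialize(scores)
    @scores = scores # Hash of name => score
  end

  # Returns an HTML table as a String, one row per entry.
  def to_html
    rows = @scores.map do |name, score|
      "<tr><td>#{escape_html(name)}</td><td>#{escape_html(score)}</td></tr>"
    end
    "<table><tr><th>Name</th><th>Score</th></tr>#{rows.join}</table>"
  end

  private

  # Small helper so that cell contents cannot break the markup.
  def escape_html(value)
    value.to_s.gsub(/[&<>"]/, HTML_ESCAPES)
  end
end

board = Scoreboard.new('Tokyo' => 20, 'Mumbai' => 35)
html = board.to_html
puts html
```

Returning `board` as the last expression of a cell should then display the table instead of the default `#inspect` string.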

In [2]:
result = {}
result[:rows] = 4
result[:cols] = 4
result[:num] = 5

{:rows=>"4", :cols=>"4", :num=>"5", :done=>true}

In [3]:
require 'matrix'

num = result[:num]
rows = result[:rows]
cols = result[:cols]

Matrix[*[[num]*rows]*cols]

Matrix[[5, 5, 5, 5], [5, 5, 5, 5], [5, 5, 5, 5], [5, 5, 5, 5]]

# NMatrix

In [4]:
require 'nmatrix'

true

## Create a 4x4 NMatrix containing 64-bit floating point numbers.

### Specify the shape, elements and dtype. The storage type is dense by default.

In [5]:
NMatrix.new([4,4], [1,2,3,4]*4, dtype: :float64)

#<NMatrix:0x8cd2d34 shape:[4,4] dtype:float64 stype:dense>

### Specify the data type as `:int8` to create a matrix of 8-bit integers. This example also sets the storage type to `:yale` (sparse).

In [6]:
n = NMatrix.new([4,4], [1,2,3,129]*4, dtype: :int8, stype: :yale)

#<NMatrix:0x8b41b00 shape:[4,4] dtype:int8 stype:yale capacity:17>

### Select elements with the `#[]` operator.

In [7]:
n[0,3]

-127
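The element was written as 129 but reads back as -127. That is consistent with the value being stored as a signed 8-bit integer, whose range is -128 to 127. The sketch below reproduces the wrap-around in plain Ruby; `wrap_int8` is a hypothetical helper for illustration, not an NMatrix function.

```ruby
# Hypothetical helper: reinterpret an Integer as a signed 8-bit value
# (two's complement wrap-around into the range -128..127).
def wrap_int8(value)
  ((value + 128) % 256) - 128
end

puts wrap_int8(129) # => -127
puts wrap_int8(127) # => 127 (already in range, unchanged)
```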

### You can also assign with `#[]=`.

In [8]:
n[0,1] = 56
n

#<NMatrix:0x8b41b00 shape:[4,4] dtype:int8 stype:yale capacity:17>

### ...but you cannot grow an NMatrix beyond its shape, unlike a Ruby `Array`, which expands on assignment.

In [9]:
n[0,4] = 43

RangeError: slice is larger than matrix in dimension 1 (slice component 2)
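For contrast, this is how a plain Ruby `Array` behaves when you assign past its end. The variable name `row` is only illustrative.

```ruby
row = [1, 2, 3, 4]

# Assigning beyond the current size grows the array and pads the gap with nil.
row[6] = 43

p row      # => [1, 2, 3, 4, nil, nil, 43]
p row.size # => 7
```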

### Load the NMatrix ATLAS plugin.

ATLAS (Automatically Tuned Linear Algebra Software) is a very fast C library for linear algebra. NMatrix exposes almost all of its functions through Ruby.

In [10]:
require 'nmatrix/atlas'

true

### Use the `NMatrix#dot` method to perform matrix multiplication.

In [11]:
require 'benchmark'

true

In [12]:
Benchmark.bm do |x|
  [5, 10, 50, 100, 150, 200].each do |size|
    x.report("nm-atlas with size #{size}") do
      n = NMatrix.new([size,size], [1]*size*size, dtype: :float32)
      n.dot(n)
    end
  end
end
nil

       user     system      total        real
nm-atlas with size 5  0.000000   0.000000   0.000000 (  0.042805)
nm-atlas with size 10  0.000000   0.000000   0.000000 (  0.000070)
nm-atlas with size 50  0.010000   0.000000   0.010000 (  0.003241)
nm-atlas with size 100  0.000000   0.000000   0.000000 (  0.034316)
nm-atlas with size 150  0.010000   0.000000   0.010000 (  0.006806)
nm-atlas with size 200  0.010000   0.000000   0.010000 (  0.014408)


### Now let's benchmark the same multiplication with Ruby's standard-library `Matrix#*`, which computes the same matrix product as `NMatrix#dot`.

In [13]:
require 'matrix'
Benchmark.bm do |x|
  [5, 10, 50, 100, 150, 200].each do |size|
    x.report("ruby matrix with size #{size}") do
      n = Matrix[*[[1]*size]*size]
      n * n
    end
  end
end
nil

       user     system      total        real
ruby matrix with size 5  0.000000   0.000000   0.000000 (  0.000192)
ruby matrix with size 10  0.000000   0.000000   0.000000 (  0.000910)
ruby matrix with size 50  0.090000   0.000000   0.090000 (  0.088591)
ruby matrix with size 100  0.430000   0.000000   0.430000 (  0.431833)
ruby matrix with size 150  1.320000   0.000000   1.320000 (  1.317862)
ruby matrix with size 200  3.290000   0.000000   3.290000 (  3.291083)
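To put the two tables side by side, the sketch below divides the `real` times reported above (copied verbatim from the two benchmark outputs) and prints the ratio for each size. The constant names are made up for this example, and the figures come from a single run on one machine, so read them as indicative only.

```ruby
SIZES      = [5, 10, 50, 100, 150, 200].freeze
# "real" column of the nm-atlas benchmark above, in seconds.
ATLAS_REAL = [0.042805, 0.000070, 0.003241, 0.034316, 0.006806, 0.014408].freeze
# "real" column of the ruby matrix benchmark above, in seconds.
RUBY_REAL  = [0.000192, 0.000910, 0.088591, 0.431833, 1.317862, 3.291083].freeze

# Ratio of Ruby Matrix time to NMatrix/ATLAS time for each size.
speedups = SIZES.each_with_index.map do |size, i|
  [size, (RUBY_REAL[i] / ATLAS_REAL[i]).round(1)]
end.to_h

speedups.each { |size, ratio| puts "size #{size}: #{ratio}x" }
```

At size 200 the ratio comes out at roughly 228x in favour of NMatrix with ATLAS. At size 5 the ratio is below 1, possibly because the first `dot` call in the session carries one-time overhead; the notebook does not say.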


### Solving systems of linear equations with NMatrix

#### Say you have the following three equations:

$$x + y − z = 4$$

$$x − 2y + 3z = −6$$

$$2x + 3y + z = 7$$

#### These can be expressed as the following matrices, one holding the coefficients and one holding the right-hand-side constants:

In [14]:
coeffs = NMatrix.new([3,3],
  [1, 1,-1,
   1,-2, 3,
   2, 3, 1], dtype: :float32)

#<NMatrix:0x857d7a0 shape:[3,3] dtype:float32 stype:dense>

In [15]:
rhs = NMatrix.new([3,1],
  [4,
  -6,
   7], dtype: :float32)

#<NMatrix:0x851932c shape:[3,1] dtype:float32 stype:dense>
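Written out, `coeffs` and `rhs` describe the system $A\mathbf{x} = \mathbf{b}$, where the unknown vector holds $x$, $y$ and $z$:

$$
\begin{bmatrix} 1 & 1 & -1 \\ 1 & -2 & 3 \\ 2 & 3 & 1 \end{bmatrix}
\begin{bmatrix} x \\ y \\ z \end{bmatrix}
=
\begin{bmatrix} 4 \\ -6 \\ 7 \end{bmatrix}
$$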

#### Compute the solution with the `NMatrix#solve` method, which uses the ATLAS xGESV function internally.

In [16]:
solution = coeffs.solve(rhs)

#<NMatrix:0x8485348 shape:[3,1] dtype:float32 stype:dense>
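The cell output above only shows the object summary, not the values. As a cross-check that does not need NMatrix, the same system can be solved with Ruby's standard-library `Matrix`. This is a sketch using exact rational arithmetic through `Matrix#inverse`, not the ATLAS routine the notebook uses.

```ruby
require 'matrix'

coeffs = Matrix[[1,  1, -1],
                [1, -2,  3],
                [2,  3,  1]]
rhs = Matrix.column_vector([4, -6, 7])

# With Integer entries, Matrix#inverse works in Rationals, so the result is exact.
solution = coeffs.inverse * rhs
x, y, z = solution.column(0).to_a

puts "x = #{x.to_i}, y = #{y.to_i}, z = #{z.to_i}"

# Substituting back reproduces the right-hand side.
puts coeffs * solution == rhs
```

This gives x = 1, y = 2, z = -1, which satisfies all three equations.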


# Nyaplot

In [17]:
require 'nyaplot'

true

## Plot a simple line graph of a sine function.

In [18]:
x = Array.new(360) {|i| i}
y = x.map { |i| Math::sin(i*Math::PI/180)}

p = Nyaplot::Plot.new
p.add(:line, x, y)
p.show

## Plot multiple line and scatter plots on the same diagram with a legend.

In [19]:
# Curate data
x = Array.new(360) {|i| i}
siny = x.map { |i| Math::sin(i*Math::PI/180) }
cosy = x.map { |i| Math::cos(i*Math::PI/180) }

# Plot this
p = Nyaplot::Plot.new
p.add(:line, x, siny).color("#FF0000").title("SINE WAVE")
p.add(:scatter, x, cosy).color("#00FF00").title("COSINE WAVE")
p.legend true
p.show

## Plot a bar graph and a histogram separately, and add a filter on the X axis of the histogram.

In [20]:
# Curate data
require 'countries'
data = {
  :band => [],
  :popularity => [],
  :country => []
  }
bands = ['Metallica', 'Megadeth', 'Iron Maiden', 'Porcupine Tree']
countries = ISO3166::Country.find_all_countries_by_region('Asia')
50.times do |idx|
  data[:band] << bands.sample
  data[:popularity] << rand(100)
  data[:country] << countries[idx].name
end

df = Nyaplot::DataFrame.new(data)

band,popularity,country
Metallica,47,United Arab Emirates
Porcupine Tree,40,Afghanistan
Metallica,62,Armenia
Porcupine Tree,52,Azerbaijan
Metallica,94,Bangladesh
Porcupine Tree,62,Bahrain
Porcupine Tree,72,Brunei Darussalam
Iron Maiden,69,Bhutan
Metallica,38,China
Porcupine Tree,31,Cyprus


In [21]:
# Histogram
popularity = Nyaplot::Plot.new
popularity.add_with_df(df, :histogram, :popularity)
popularity.configure do
  x_label('Popularity')
  y_label('Frequency')
  filter({target:'x'})
  yrange([0,10])
end

# Bar Graph
band = Nyaplot::Plot.new
band.add_with_df(df, :bar, :band)
band.configure do
  x_label('Band name')
  y_label('Frequency')
end

frame = Nyaplot::Frame.new
frame.add(popularity)
frame.add(band)
frame.show

### Verify that the plots are correct

In [22]:
df.filter {|row| row[:popularity] < 20}

band,popularity,country
Porcupine Tree,7,Georgia
Porcupine Tree,19,Hong Kong
Metallica,5,Mongolia
Iron Maiden,16,"Palestine, State of"
Metallica,7,Singapore
Iron Maiden,0,Thailand
Megadeth,18,Uzbekistan


# daru (Data Analysis in RUby)

## daru is a Ruby gem for analysis, manipulation and cleaning of data. It works well with all the above gems and makes it very easy to perform complex data analysis, cleaning and visualization tasks.

In [23]:
require 'daru'

true

## A simple case of `Daru::Vector`

In [24]:
index = Daru::Index.new(['a','b','ef','gh','i', 'j'])
vec = Daru::Vector.new([1,4,6,4,3,7], index: index)

Daru::Vector(6)
a,1
b,4
ef,6
gh,4
i,3
j,7


### Select an element by index

In [25]:
vec['gh']

4

In [26]:
vec['a', 'gh']

Daru::Vector(2)
a,1
gh,4


## A simple case of `Daru::DataFrame`

If you leave out the index, it will index from 0 to size-1 by default.

In [27]:
df = Daru::DataFrame.new({
  a: [1,2,3,4,5],
  b: 'a'..'e',
  c: Array.new(5) {|i| i}
  })
df

Daru::DataFrame(5x3)
,a,b,c
0,1,a,0
1,2,b,1
2,3,c,2
3,4,d,3
4,5,e,4


Data can be indexed with a `Daru::Index` or one of its subclasses:

* `Daru::MultiIndex`
* `Daru::DateTimeIndex`
* `Daru::CategoricalIndex`

## A MultiIndex is a hierarchical index

In [28]:
multi_index = Daru::MultiIndex.from_tuples([
  [:a, :b, :c],
  [:a, :b, :d],
  [:a, :b, :p],
  [:a, :q, :p],
  [:b, :r, :f],
  [:c, :o, :t],
  [:c, :p, :w]
  ])

Daru::MultiIndex(7x3)
a,b,c
a,b,d
a,b,p
a,q,p
b,r,f
c,o,t
c,p,w


## It allows you to create and query hierarchically named data

In [29]:
vec = Daru::Vector.new([1,2,3]*2 << 66, index: multi_index)

Daru::Vector(7)
a,b,c,1
a,b,d,2
a,b,p,3
a,q,p,1
b,r,f,2
c,o,t,3
c,p,w,66


## You can select data by specifying the level of nesting in the `#[]` method.

In [30]:
vec[:a, :b]

Daru::Vector(3)
c,1
d,2
p,3


In [31]:
vec[:a]

Daru::Vector(4)
b,c,1
b,d,2
b,p,3
q,p,1
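One way to picture the hierarchical selection shown above is as a prefix match on the index tuples. The sketch below mimics that with a plain Ruby `Hash` keyed by tuples; `select_level` is a hypothetical helper written for this illustration and says nothing about how daru implements `Daru::MultiIndex` internally.

```ruby
# Same tuples and values as the MultiIndex vector above.
data = {
  [:a, :b, :c] => 1,
  [:a, :b, :d] => 2,
  [:a, :b, :p] => 3,
  [:a, :q, :p] => 1,
  [:b, :r, :f] => 2,
  [:c, :o, :t] => 3,
  [:c, :p, :w] => 66
}

# Hypothetical helper: keep entries whose key starts with `prefix`,
# then drop the matched levels from the remaining keys.
def select_level(data, *prefix)
  data.select { |key, _value| key.first(prefix.size) == prefix }
      .map { |key, value| [key.drop(prefix.size), value] }
      .to_h
end

p select_level(data, :a, :b) # => {[:c]=>1, [:d]=>2, [:p]=>3}
p select_level(data, :a)     # => {[:b, :c]=>1, [:b, :d]=>2, [:b, :p]=>3, [:q, :p]=>1}
```

The two results line up with the outputs of `vec[:a, :b]` and `vec[:a]`.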


## The DateTimeIndex allows you to index timestamp-based data like stock prices

In [32]:
date_time = Daru::DateTimeIndex.date_range(start: '2011', end: '2013', freq: '3D')

#<Daru::DateTimeIndex(244, frequency=3D) 2011-01-01T00:00:00+00:00...2012-12-30T00:00:00+00:00>

In [33]:
vec = Daru::Vector.new([15]*date_time.size, index: date_time)
vec.head

Daru::Vector(10)
2011-01-01T00:00:00+00:00,15
2011-01-04T00:00:00+00:00,15
2011-01-07T00:00:00+00:00,15
2011-01-10T00:00:00+00:00,15
2011-01-13T00:00:00+00:00,15
2011-01-16T00:00:00+00:00,15
2011-01-19T00:00:00+00:00,15
2011-01-22T00:00:00+00:00,15
2011-01-25T00:00:00+00:00,15
2011-01-28T00:00:00+00:00,15


### Query data based on date.

In [34]:
vec['2011-1'..'2011-2-10']

Daru::Vector(14)
2011-01-01T00:00:00+00:00,15
2011-01-04T00:00:00+00:00,15
2011-01-07T00:00:00+00:00,15
2011-01-10T00:00:00+00:00,15
2011-01-13T00:00:00+00:00,15
2011-01-16T00:00:00+00:00,15
2011-01-19T00:00:00+00:00,15
2011-01-22T00:00:00+00:00,15
2011-01-25T00:00:00+00:00,15
2011-01-28T00:00:00+00:00,15


In [35]:
vec['2012']

Daru::Vector(122)
2012-01-02T00:00:00+00:00,15
2012-01-05T00:00:00+00:00,15
2012-01-08T00:00:00+00:00,15
2012-01-11T00:00:00+00:00,15
2012-01-14T00:00:00+00:00,15
2012-01-17T00:00:00+00:00,15
2012-01-20T00:00:00+00:00,15
2012-01-23T00:00:00+00:00,15
2012-01-26T00:00:00+00:00,15
2012-01-29T00:00:00+00:00,15
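The sizes reported above (244 index entries, 14 matches for the January to 10 February range, 122 matches for 2012) can be cross-checked with Ruby's standard `date` library. The sketch assumes that `'2011'` and `'2013'` resolve to 1 January of those years and that both range ends are inclusive; those assumptions are inferred from the outputs, not taken from daru's documentation.

```ruby
require 'date'

# Every third day from 2011-01-01 up to (at most) 2013-01-01.
dates = Date.new(2011, 1, 1).step(Date.new(2013, 1, 1), 3).to_a

puts dates.size # => 244
puts dates.last # => 2012-12-30

# Equivalent of vec['2011-1'..'2011-2-10'] under the assumptions above.
window = Date.new(2011, 1, 1)..Date.new(2011, 2, 10)
in_window = dates.select { |d| window.cover?(d) }

# Equivalent of vec['2012'].
in_2012 = dates.select { |d| d.year == 2012 }

puts in_window.size # => 14
puts in_2012.size   # => 122
```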


### Create a dataframe and sort it based on a particular column.

In [36]:
df = Daru::DataFrame.new({
  name: ['Tokyo', 'Mumbai', 'Kyoto', 'Singapore', 'Osaka'],
  temp:  [20,35,25,30,24] 
  }, order: [:name, :temp])

df.sort!([:temp])

Daru::DataFrame(5x2)
,name,temp
0,Tokyo,20
4,Osaka,24
2,Kyoto,25
3,Singapore,30
1,Mumbai,35
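Note that the index labels (0, 4, 2, 3, 1) travel with their rows when the frame is sorted. A plain-Ruby sketch of the same idea, using the data from the cell above, might look like this; it illustrates the behaviour and is not how daru performs the sort.

```ruby
names = ['Tokyo', 'Mumbai', 'Kyoto', 'Singapore', 'Osaka']
temps = [20, 35, 25, 30, 24]

# Attach the original position to each row before sorting.
rows = names.zip(temps).each_with_index.map { |(name, temp), idx| [idx, name, temp] }

# Sort by the temperature column; each row keeps its original index label.
sorted = rows.sort_by { |_idx, _name, temp| temp }

sorted.each { |idx, name, temp| puts "#{idx},#{name},#{temp}" }
p sorted.map(&:first) # => [0, 4, 2, 3, 1]
```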


## Plotting can be easily done with daru and nyaplot

In [38]:
df.plot type: :bar, x: :name, y: :temp do |plot|
  plot.x_label "City"
  plot.y_label "Temperature"
  plot.yrange [0,50]
end