# Myria-Python & IPython

<img src="overview.png" style="height: 300px"/>

### To install `Myria-Python`:

```
git clone https://github.com/uwescience/myria-python
cd myria-python
sudo python setup.py install
```

### Or:

```
pip install myria-python
```



## 1. Connecting to Myria

```
from myria import *
import numpy

# Create a connection to a Myria EC2 cluster
connection = MyriaConnection(
    rest_url='http://ec2-52-26-242-67.us-west-2.compute.amazonaws.com:8753',
    execution_url='http://ec2-52-26-242-67.us-west-2.compute.amazonaws.com:8080')
```

```
connection
```

```
<myria.connection.MyriaConnection at 0x114177c50>
```
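
Both URLs above point at the same host and differ only in port: 8753 for the REST API (`rest_url`) and 8080 for the web API (`execution_url`). A small helper like the hypothetical `cluster_urls` below (not part of Myria-Python) can build both from one host name. The default ports simply mirror this example cluster and are assumptions for any other deployment.

```python
def cluster_urls(host, rest_port=8753, execution_port=8080, scheme="http"):
    """Return (rest_url, execution_url) for a Myria deployment on `host`.

    Hypothetical helper: the default ports mirror the example cluster above
    and may differ on other deployments.
    """
    base = "{}://{}".format(scheme, host)
    return ("{}:{}".format(base, rest_port),
            "{}:{}".format(base, execution_port))


rest_url, execution_url = cluster_urls(
    "ec2-52-26-242-67.us-west-2.compute.amazonaws.com")
print(rest_url)
print(execution_url)
# The pair can then be passed on as
# MyriaConnection(rest_url=rest_url, execution_url=execution_url)
```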

## 2. Myria: Connections, Relations, and Queries (and Schemas and Plans)

```
# How many datasets are there on the server?
print(len(connection.datasets()))
```

```
5
```


```
# Let's look at the datasets...
print(connection.datasets())
```

```
[{u'created': u'2016-01-26T09:36:53.787Z', u'numTuples': 100, u'uri': u'http://ec2-52-33-72-224.us-west-2.compute.amazonaws.com:8753/dataset/user-public/program-adhoc/relation-CC', u'howPartitioned': {u'workers': [1, 2], u'pf': None}, u'queryId': 88, u'relationKey': {u'userName': u'public', u'relationName': u'CC', u'programName': u'adhoc'}, u'schema': {u'columnNames': [u'id2', u'min_min_component_id'], u'columnTypes': [u'LONG_TYPE', u'LONG_TYPE']}}, {u'created': u'2016-01-26T08:57:32.045Z', u'numTuples': 2715, u'uri': u'http://ec2-52-33-72-224.us-west-2.compute.amazonaws.com:8753/dataset/user-public/program-adhoc/relation-JustX', u'howPartitioned': {u'workers': [1, 2], u'pf': None}, u'queryId': 6, u'relationKey': {u'userName': u'public', u'relationName': u'JustX', u'programName': u'adhoc'}, u'schema': {u'columnNames': [u'x'], u'columnTypes': [u'LONG_TYPE']}}, {u'created': u'2016-01-26T09:03:45.437Z', u'numTuples': 2715, u'uri': u'http://ec2-52-33-72-224.us-west-2.compute.amazonaws.com:
```
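
Each entry returned by `connection.datasets()` is a plain dictionary with `relationKey`, `numTuples`, and `schema` fields, as the output above shows. The sketch below uses trimmed copies of the first two entries to print a compact summary. `summarize` is a hypothetical helper, not a Myria-Python function; against a live server you would pass `connection.datasets()` instead of the sample list.

```python
# Trimmed copies of the first two entries printed above (only the fields used here).
datasets = [
    {"numTuples": 100,
     "relationKey": {"userName": "public", "programName": "adhoc", "relationName": "CC"},
     "schema": {"columnNames": ["id2", "min_min_component_id"],
                "columnTypes": ["LONG_TYPE", "LONG_TYPE"]}},
    {"numTuples": 2715,
     "relationKey": {"userName": "public", "programName": "adhoc", "relationName": "JustX"},
     "schema": {"columnNames": ["x"], "columnTypes": ["LONG_TYPE"]}},
]


def summarize(entries):
    """Return one (relation name, tuple count, [(column, type), ...]) row per entry."""
    rows = []
    for entry in entries:
        schema = entry["schema"]
        columns = list(zip(schema["columnNames"], schema["columnTypes"]))
        rows.append((entry["relationKey"]["relationName"], entry["numTuples"], columns))
    return rows


for row in summarize(datasets):
    print(row)
```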

### Three parts to a relation name:

```
# What's the name of the first relation?
name = connection.datasets()[0]['relationKey']
name
```

```
{u'programName': u'adhoc',
 u'relationName': u'TwitterCC',
 u'userName': u'public'}
```
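
A relation is identified by `userName`, `programName`, and `relationName`. Compiled plans print the same key joined by colons (for example `public:adhoc:TwitterK` in the `%%plan` output further down). The two helpers below are hypothetical conveniences, not Myria-Python functions, that convert between the dictionary and that colon-separated form.

```python
KEY_FIELDS = ("userName", "programName", "relationName")


def to_qualified(key):
    """Join a relationKey dict into 'user:program:relation'."""
    return ":".join(key[field] for field in KEY_FIELDS)


def from_qualified(name):
    """Split 'user:program:relation' back into a relationKey dict."""
    parts = name.split(":")
    if len(parts) != 3:
        raise ValueError("expected user:program:relation, got %r" % name)
    return dict(zip(KEY_FIELDS, parts))


key = {"programName": "adhoc", "relationName": "TwitterCC", "userName": "public"}
print(to_qualified(key))  # public:adhoc:TwitterCC
```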

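```
# Load a CSV file from a URL into a new relation, TwitterK2
# (the optional [a, b] list names the columns used to partition the stored relation)
query = MyriaQuery.submit(
    '''T1 = load("https://goo.gl/YqKALA", csv(schema(a:int, b:int), skip=0));
       store(T1, TwitterK2, [a, b]);''',
    connection=connection)
query.status
```

```
u'SUCCESS'
```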
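
Because a MyriaL program is just a string, parameterized loads can be generated in Python before calling `MyriaQuery.submit`. The helper `load_and_store` below is hypothetical. It only reproduces the `load`/`csv`/`schema`/`store` pattern used in the cell above and does no quoting, escaping, or validation of its inputs.

```python
def load_and_store(url, columns, relation, partition_by=None):
    """Build a MyriaL program that loads a CSV from `url` and stores it.

    columns: list of (name, type) pairs, e.g. [("a", "int"), ("b", "int")].
    partition_by: optional list of column names passed as store's third argument.
    """
    schema = ", ".join("{}:{}".format(name, typ) for name, typ in columns)
    program = 'T1 = load("{}", csv(schema({}), skip=0));\n'.format(url, schema)
    if partition_by:
        program += "store(T1, {}, [{}]);".format(relation, ", ".join(partition_by))
    else:
        program += "store(T1, {});".format(relation)
    return program


program = load_and_store("https://goo.gl/YqKALA", [("a", "int"), ("b", "int")],
                         "TwitterK2", partition_by=["a", "b"])
print(program)
# With a live connection: MyriaQuery.submit(program, connection=connection)
```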

```
print(query.status)
print(len(connection.datasets()))
print(connection.datasets()[-1])
```

```
SUCCESS
6
{u'created': u'2016-01-26T17:49:07.159Z', u'numTuples': -1, u'uri': u'http://ec2-52-26-242-67.us-west-2.compute.amazonaws.com:8753/dataset/user-public/program-logs/relation-Sending', u'howPartitioned': {u'workers': [1, 2], u'pf': None}, u'queryId': 2, u'relationKey': {u'userName': u'public', u'relationName': u'Sending', u'programName': u'logs'}, u'schema': {u'columnNames': [u'queryId', u'subQueryId', u'fragmentId', u'nanoTime', u'numTuples', u'destWorkerId'], u'columnTypes': [u'LONG_TYPE', u'INT_TYPE', u'INT_TYPE', u'LONG_TYPE', u'LONG_TYPE', u'INT_TYPE']}}
```


```
# Two-hop paths: join each edge (src, link) with every edge leaving link
query = MyriaQuery.submit(
'''T1 = scan(TwitterK);
  T2 = scan(TwitterK);
  Joined = [from T1, T2
            where T1.$1 = T2.$0
            emit T1.$0 as src, T1.$1 as link, T2.$1 as dst];
  store(Joined, TwoHopsInTwitter);
''', connection=connection)
query.status
```

```
u'SUCCESS'
```
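
The relation name `TwoHopsInTwitter` describes paths of length two: each edge `(src, link)` paired with each edge whose source is `link`. The plain-Python sketch below computes that result on a small, made-up edge list, using a hash table keyed on the join column, which is one common way to evaluate such an equi-join. It illustrates the result only and says nothing about how Myria actually executes the query.

```python
from collections import defaultdict


def two_hops(edges):
    """Return (src, link, dst) for every pair of edges src->link, link->dst."""
    out_edges = defaultdict(list)          # build side: link -> [dst, ...]
    for src, dst in edges:
        out_edges[src].append(dst)
    # probe side: for each edge (src, link), look up the edges leaving link
    return [(src, link, dst) for src, link in edges for dst in out_edges[link]]


edges = [(1, 2), (2, 3), (2, 4), (3, 1)]
print(two_hops(edges))
```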

### Setting a default connection

```
# Set the default connection for the session
MyriaRelation.DefaultConnection = connection
```

# Myria IPython Extensions

## 1. Loading the Extension

```
%load_ext myria
```

```
The myria extension is already loaded. To reload it, use:
  %reload_ext myria
```


## 2. Configuration Options

```
%config MyriaExtension
```

```
MyriaExtension options
--------------------
MyriaExtension.execution_url=<Unicode>
    Current: u'https://demo.myria.cs.washington.edu'
    Myria web API endpoint URL
MyriaExtension.language=<Unicode>
    Current: u'MyriaL'
    Language for Myria queries
MyriaExtension.rest_url=<Unicode>
    Current: u'https://rest.myria.cs.washington.edu:1776'
    Myria REST API endpoint URL
MyriaExtension.timeout=<Int>
    Current: 60
    Query timeout (in seconds)
```


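The option you will most likely need to change is the query timeout, in seconds (60 above). Raise it for long-running queries. Note that `%config` expects the `Class.option=value` form:

```
%config MyriaExtension.timeout=120
```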

## 3. Ambient Connection to Myria

View `connect` arguments:

```
%connect?
```

Connect to the EC2 cluster:

```
%connect http://ec2-52-26-242-67.us-west-2.compute.amazonaws.com:8753 http://ec2-52-26-242-67.us-west-2.compute.amazonaws.com:8080
```

```
<myria.connection.MyriaConnection at 0x212095950>
```

This is just the IPython equivalent of setting the default `MyriaConnection`.

## 4. Executing Queries

```
%%query
const partition: 0.5;
const epsilon: 0.0000106;

def mod(x, n): x - int(x/n)*n;
def cell(v): int((v - mod(v, partition)) * (1/partition));
def is_ghost(xoffset, yoffset, zoffset):
  case when xoffset = 0 and
            yoffset = 0 and
            zoffset = 0 then 0 else 1 end;
def is_replicated(x, y, z, xoffset, yoffset, zoffset):
  is_ghost(xoffset, yoffset, zoffset) = 0 or
  cell(x + epsilon*xoffset) != cell(x) or
  cell(y + epsilon*yoffset) != cell(y) or
  cell(z + epsilon*zoffset) != cell(z);
def distance(x1, x2, y1, y2, z1, z2): sqrt((x1-x2)*(x1-x2) +
                                           (y1-y2)*(y1-y2) +
                                           (z1-z2)*(z1-z2));

points = load("https://s3-us-west-2.amazonaws.com/uwdb/sampleData/sampleCrossmatch/points.txt",
              csv(schema(id:int,
                         x:float,
                         y:float,
                         z:float), skip=0));
permutations = load("https://s3-us-west-2.amazonaws.com/myria/permutations",
                    csv(schema(xoffset:int,
                               yoffset:int,
                               zoffset:int), skip=0));

-- Partition into a grid with edges of size partition
-- Replicate any point that falls within epsilon of a partition boundary

partitions = [from points, permutations
              where is_replicated(x, y, z, xoffset, yoffset, zoffset)
              emit id, x, y, z,
                   cell(x) + xoffset as px,
                   cell(y) + yoffset as py,
                   cell(z) + zoffset as pz,
                   is_ghost(xoffset, yoffset, zoffset) as ghost];

--store(partitions, partitions, [px, py, pz]);

-------------------------------------------

--partitions = scan(partitions);

-- Cross product on partition + ghost cells; no shuffle required
local = [from partitions left,
              partitions right
         where left.px = right.px and
               left.py = right.py and
               left.pz = right.pz
         emit *];

-- Compute distances within each cell and keep pairs within epsilon;
-- id < id1 and ghost = 0 report each pair once, from its home cell
distances = [from local
             where id < id1 and
                   ghost = 0 and
                   distance(x, x1, y, y1, z, z1) <= epsilon
             emit id as id1,
                  id1 as id2, ghost, ghost1,
                  distance(x, x1, y, y1, z, z1)];

store(distances, distances);
```


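| | _COLUMN4_ | ghost | ghost1 | id1 | id2 |
|---|---|---|---|---|---|
| 0 | 6e-06 | 0 | 0 | 4 | 104 |
| 1 | 6e-06 | 0 | 0 | 6 | 106 |
| 2 | 5e-06 | 0 | 0 | 16 | 116 |
| 3 | 4e-06 | 0 | 0 | 17 | 117 |
| 4 | 8e-06 | 0 | 0 | 18 | 118 |
| 5 | 8e-06 | 0 | 0 | 21 | 121 |
| 6 | 7e-06 | 0 | 0 | 26 | 126 |
| 7 | 9e-06 | 0 | 0 | 27 | 127 |
| 8 | 9e-06 | 0 | 0 | 30 | 130 |
| 9 | 6e-06 | 0 | 0 | 36 | 136 |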
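
To make the partition-and-replicate idea concrete, here is a single-machine Python transcription of the same helper functions and filters. It is a sketch for understanding the MyriaL, not a replacement for it: everything runs in one process, `OFFSETS` assumes the permutations file contains all 27 offsets in {-1, 0, 1}^3 (its contents are not shown here), Python's truncating `int()` is assumed to match MyriaL's, and the sample points are made up.

```python
import itertools
import math

PARTITION = 0.5       # grid cell edge length, as in the MyriaL program
EPSILON = 0.0000106   # match radius, as in the MyriaL program
# Assumption: the permutations file holds every offset in {-1, 0, 1}^3.
OFFSETS = list(itertools.product((-1, 0, 1), repeat=3))


def mod(x, n):
    # int() truncates toward zero; assumed to match MyriaL's int().
    return x - int(x / n) * n


def cell(v):
    return int((v - mod(v, PARTITION)) * (1 / PARTITION))


def is_ghost(dx, dy, dz):
    return 0 if (dx, dy, dz) == (0, 0, 0) else 1


def is_replicated(x, y, z, dx, dy, dz):
    return (is_ghost(dx, dy, dz) == 0 or
            cell(x + EPSILON * dx) != cell(x) or
            cell(y + EPSILON * dy) != cell(y) or
            cell(z + EPSILON * dz) != cell(z))


def crossmatch(points):
    """points: list of (id, x, y, z). Returns sorted (id1, id2, distance) triples."""
    # "partitions": each point in its home cell, plus ghost copies near boundaries
    cells = {}
    for pid, x, y, z in points:
        for dx, dy, dz in OFFSETS:
            if is_replicated(x, y, z, dx, dy, dz):
                key = (cell(x) + dx, cell(y) + dy, cell(z) + dz)
                cells.setdefault(key, []).append((pid, x, y, z, is_ghost(dx, dy, dz)))
    # "local" + "distances": all pairs within a cell, filtered as in the query
    matches = []
    for members in cells.values():
        for (i, x1, y1, z1, g1), (j, x2, y2, z2, _) in itertools.product(members, repeat=2):
            d = math.sqrt((x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2)
            if i < j and g1 == 0 and d <= EPSILON:
                matches.append((i, j, d))
    return sorted(matches)


points = [(1, 0.499999, 0.1, 0.1),   # just left of the x = 0.5 cell boundary
          (2, 0.500001, 0.1, 0.1),   # just right of it
          (3, 0.2, 0.2, 0.2),
          (4, 0.2, 0.2, 0.200005)]
result = crossmatch(points)
for i, j, d in result:
    print(i, j, d)
```

Points 1 and 2 land in different cells, but each is replicated as a ghost into the other's cell, so the pair is still found. Because every point is a non-ghost in exactly one cell and the filter requires the left side to be that copy with `id < id1`, each close pair is reported only once.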


```
%%query
E = scan(TwitterK);
V = select distinct E.$0 from E;
CC = [from V emit V.$0 as node_id, V.$0 as component_id];
do
  new_CC = [from E, CC where E.$0 = CC.$0 emit E.$1, CC.$1] + CC;
  new_CC = [from new_CC emit new_CC.$0, MIN(new_CC.$1)];
  delta = diff(CC, new_CC);
  CC = new_CC;
while [from delta emit count(*) > 0];
comp = [from CC emit CC.$1 as id, count(CC.$0) as cnt];
store(comp, TwitterCC);
```


| | cnt | id |
|---|---|---|
| 0 | 3 | 498 |
| 1 | 378 | 12 |
| 2 | 2 | 443 |
| 3 | 1 | 724 |
| 4 | 1 | 877 |
| 5 | 1 | 975 |
| 6 | 1 | 419 |
| 7 | 3 | 395 |
| 8 | 5 | 510 |
| 9 | 1 | 220 |
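
The loop above is iterative label propagation. Every source node starts with its own id as its label, each round takes the minimum of a node's label and the labels of its in-neighbors, and the loop stops once no existing label changes (`diff(CC, new_CC)` is empty). The in-memory Python sketch below follows the same steps on small, made-up edge lists so the fixpoint logic can be checked by hand. It is an illustration only, not how Myria evaluates the loop. Because labels flow along edge direction, the result is a connected-components labeling when each edge is present in both directions.

```python
from collections import Counter


def connected_components(edges):
    """Label propagation mirroring the MyriaL do/while loop.

    edges: list of (src, dst) pairs. Returns a Counter mapping
    component id -> number of nodes, like the `comp` relation.
    """
    cc = {src: src for src, _ in edges}        # CC: each source labels itself
    while True:
        new_cc = dict(cc)                      # "+ CC" keeps the current labels
        for src, dst in edges:                 # join E with CC on E.$0 = CC.$0
            label = cc[src]
            new_cc[dst] = min(label, new_cc.get(dst, label))  # MIN per node
        # delta = diff(CC, new_CC): rows of CC whose label changed
        delta = [node for node, label in cc.items() if new_cc[node] != label]
        cc = new_cc                            # CC = new_CC, then test the loop
        if not delta:
            break
    return Counter(cc.values())


print(connected_components([(1, 2), (2, 3), (4, 5)]))
```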


```
# Grab the results of the most recent execution
query = _
```

```
query
```

| | cnt | id |
|---|---|---|
| 0 | 3 | 498 |
| 1 | 378 | 12 |
| 2 | 2 | 443 |
| 3 | 1 | 724 |
| 4 | 1 | 877 |
| 5 | 1 | 975 |
| 6 | 1 | 419 |
| 7 | 3 | 395 |
| 8 | 5 | 510 |
| 9 | 1 | 220 |


## 5. Plans and Delayed Execution

Use the `%%plan` cell magic to compile a query into a plan without executing it:

```
%%plan
T1 = scan(TwitterK);
T2 = [from T1 where $0 >= 999 emit $0];
store(T2, JustX);
```

```
{u'language': u'MyriaL',
 u'logicalRa': u'Store(public:adhoc:JustX)[Apply(a=$0)[Select(($0 >= 999))[Scan(public:adhoc:TwitterK)]]]',
 u'plan': {u'fragments': [{u'operators': [{u'opId': 0,
      u'opName': u'MyriaScan(public:adhoc:TwitterK)',
      u'opType': u'TableScan',
      u'relationKey': {u'programName': u'adhoc',
       u'relationName': u'TwitterK',
       u'userName': u'public'}},
     {u'argChild': 0,
      u'argPredicate': {u'rootExpressionOperator': {u'left': {u'columnIdx': 0,
         u'type': u'VARIABLE'},
        u'right': {u'type': u'CONSTANT',
         u'value': u'999',
         u'valueType': u'LONG_TYPE'},
        u'type': u'GTEQ'}},
      u'opId': 1,
      u'opName': u'MyriaSelect(($0 >= 999))',
      u'opType': u'Filter'},
     {u'argChild': 1,
      u'emitExpressions': [{u'outputName': u'a',
        u'rootExpressionOperator': {u'columnIdx': 0, u'type': u'VARIABLE'}}],
      u'opId': 2,
      u'opName': u'MyriaApply(a=$0)',
      u'opType': u'Apply'},
     {u'argChil
```
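
The compiled plan is an ordinary Python dictionary, so it can be inspected before it is submitted. The sketch below walks `plan['plan']['fragments']` and lists each operator with the `argChild` it reads from. The key names come from the output above, the input is a trimmed copy of its first three operators (the printed output is cut off after them), and `describe` is a hypothetical helper. Operators with more than one input may use other argument keys, which this sketch ignores.

```python
# Trimmed copy of the plan printed above: the first three operators only.
plan = {
    "language": "MyriaL",
    "plan": {"fragments": [{"operators": [
        {"opId": 0, "opType": "TableScan", "opName": "MyriaScan(public:adhoc:TwitterK)"},
        {"opId": 1, "opType": "Filter", "opName": "MyriaSelect(($0 >= 999))", "argChild": 0},
        {"opId": 2, "opType": "Apply", "opName": "MyriaApply(a=$0)", "argChild": 1},
    ]}]},
}


def describe(plan):
    """Return one line per operator: fragment index, opId, opType, and input."""
    lines = []
    for f, fragment in enumerate(plan["plan"]["fragments"]):
        for op in fragment["operators"]:
            child = op.get("argChild")
            suffix = "" if child is None else " <- op {}".format(child)
            lines.append("fragment {}: op {} {}{}".format(f, op["opId"], op["opType"], suffix))
    return lines


for line in describe(plan):
    print(line)
```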

```
plan = _
result = MyriaQuery.submit_plan(plan).to_dataframe()
result
```

| | a |
|---|---|
| 0 | 999 |


# Myria in your own EC2 Cluster!

# Where to find more information:

#### Documentation
[Myria Website](http://myria.cs.washington.edu/)<br /> 
[Myria Python](http://myria.cs.washington.edu/docs/myriapython.html)<br /> 
[Additional Language Documentation](http://myria.cs.washington.edu/docs/myriaql.html)<br /> 
[This Notebook](https://github.com/uwescience/myria-python/blob/master/ipnb%20examples/myria%20examples.ipynb) 

#### Repositories
[Myria](https://github.com/uwescience/myria)<br /> 
[Myria-Python](https://github.com/uwescience/myria-python)<br /> 
[Myria-EC2](https://github.com/uwescience/myria-ec2)

#### Mailing List
[myria-users@cs.washington.edu](mailto:myria-users@cs.washington.edu)

#### IPython
[Homepage](http://ipython.org/)

#### Pandas/DataFrames
[Homepage](http://pandas.pydata.org/)