# Fire up GraphLab Create


In [1]:
import graphlab as gl

A newer version of GraphLab Create (v1.6.1) is available! Your current version is v1.6.
New features in 1.6:
- Time Series data type
- Model tuning in Canvas
- Churn prediction toolkit
- Product sentiment analysis toolkit
- DBSCAN for clustering toolkit
- Record linker for data matching toolkit
- Frequent pattern mining toolkit
- Support adaptive Predictive Services model serving through endpoint policies
- Distributed Machine Learning in EC2
- Interface between DataFrames and SFrames in scala

Notable performance improvements:
- Improve service latency for all supervised learning models
- Improved performance of nearest neighbor toolkit by constructing a similarity graph directly
- Fast approximation of nearest neighbors through locality-sensitive hashing
- More efficient and faster access of data in S3
- Improved performance of distributed graph analytics

For detailed release notes please visit:
https://dato.com/download/release-notes.html

-
You can use pip to upgrade the graphlab-

# Load a tabular data set

In [4]:
sf = gl.SFrame('people-example.csv')

PROGRESS: Finished parsing file /home/hades/devel/machine-learning/people-example.csv
PROGRESS: Parsing completed. Parsed 7 lines in 0.011141 secs.
------------------------------------------------------
Inferred types from first line of file as 
column_type_hints=[str,str,str,int]
If parsing fails due to incorrect types, you can correct
the inferred type list above and pass it to read_csv in
the column_type_hints argument
------------------------------------------------------
PROGRESS: Finished parsing file /home/hades/devel/machine-learning/people-example.csv
PROGRESS: Parsing completed. Parsed 7 lines in 0.006086 secs.
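The log above says the column types were inferred from the first line of the file. As a rough illustration of that idea only (not GraphLab's actual parser), here is a minimal standard-library sketch. The helper names `infer_type` and `load_columns` are hypothetical, and the CSV rows are inlined from the table shown in this notebook so the example runs on its own.

```python
import csv
import io

# Same rows as people-example.csv, inlined so the sketch is self-contained.
CSV_TEXT = """First Name,Last Name,Country,age
Bob,Smith,United States,24
Alice,Williams,Canada,23
Malcolm,Jone,England,22
Felix,Brown,USA,23
Alex,Cooper,Poland,23
Tod,Campbell,United States,22
Derek,Ward,Switzerland,25
"""


def infer_type(value):
    """Guess a column type from one sample value (hypothetical helper)."""
    for candidate in (int, float):
        try:
            candidate(value)
            return candidate
        except ValueError:
            pass
    return str


def load_columns(text):
    """Parse CSV text into (type hints, dict of column name -> typed values)."""
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]
    # Infer each column's type from the first data row only.
    hints = [infer_type(value) for value in data[0]]
    columns = {
        name: [hint(row[i]) for row in data]
        for i, (name, hint) in enumerate(zip(header, hints))
    }
    return hints, columns


hints, columns = load_columns(CSV_TEXT)
print([h.__name__ for h in hints])
print(columns["age"])
```

Inferring from a single row is fast but fragile: if a later row holds a value that does not fit the guessed type, the conversion fails. That is the situation in which the log suggests passing a corrected `column_type_hints` list to `read_csv`.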


# SFrame basics

In [5]:
sf  # display the first few rows of the table

First Name,Last Name,Country,age
Bob,Smith,United States,24
Alice,Williams,Canada,23
Malcolm,Jone,England,22
Felix,Brown,USA,23
Alex,Cooper,Poland,23
Tod,Campbell,United States,22
Derek,Ward,Switzerland,25


In [7]:
sf.head()

First Name,Last Name,Country,age
Bob,Smith,United States,24
Alice,Williams,Canada,23
Malcolm,Jone,England,22
Felix,Brown,USA,23
Alex,Cooper,Poland,23
Tod,Campbell,United States,22
Derek,Ward,Switzerland,25


# GraphLab Canvas

In [8]:
# Take any data structure in GraphLab Create and call .show() on it
sf.show()

Canvas is accessible via web browser at the URL: http://localhost:60156/index.html
Opening Canvas in default web browser.


In [9]:
# Render Canvas views inline in the notebook instead of a separate browser tab
gl.canvas.set_target('ipynb')



In [10]:
sf['age'].show(view="Categorical")

# Inspect some columns of the data set


In [12]:
sf['Country']

dtype: str
Rows: 7
['United States', 'Canada', 'England', 'USA', 'Poland', 'United States', 'Switzerland']

In [13]:
sf['age']

dtype: int
Rows: 7
[24, 23, 22, 23, 23, 22, 25]

In [14]:
sf['age'].mean()


23.142857142857146

In [15]:
sf['age'].max()

25
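For comparison, the same two summaries can be computed in plain Python from the values printed for `sf['age']`. This is only a cross-check of the arithmetic, not how GraphLab computes its aggregates.

```python
# Values of sf['age'] as printed above.
ages = [24, 23, 22, 23, 23, 22, 25]

mean_age = sum(ages) / float(len(ages))  # 162 / 7
max_age = max(ages)

print(mean_age)
print(max_age)
```

The trailing digits of a floating-point mean can depend on how the sum is accumulated, so it is safer to compare such results with a small tolerance than to expect them to match digit for digit.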

# Create new columns in our SFrame


In [16]:
sf

First Name,Last Name,Country,age
Bob,Smith,United States,24
Alice,Williams,Canada,23
Malcolm,Jone,England,22
Felix,Brown,USA,23
Alex,Cooper,Poland,23
Tod,Campbell,United States,22
Derek,Ward,Switzerland,25


In [19]:
sf['Full Name'] = sf['First Name'] + ' ' + sf['Last Name']

In [20]:
sf

First Name,Last Name,Country,age,Full Name
Bob,Smith,United States,24,Bob Smith
Alice,Williams,Canada,23,Alice Williams
Malcolm,Jone,England,22,Malcolm Jone
Felix,Brown,USA,23,Felix Brown
Alex,Cooper,Poland,23,Alex Cooper
Tod,Campbell,United States,22,Tod Campbell
Derek,Ward,Switzerland,25,Derek Ward


In [22]:
sf['age'] * sf['age']

dtype: int
Rows: 7
[576, 529, 484, 529, 529, 484, 625]
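Both operations above work element by element: the `+` joins the strings in each row, and the `*` multiplies the values in each row. A plain-Python sketch of the same row-wise logic, using lists copied from the table in this notebook, looks like this. It is an illustration of the semantics, not of GraphLab's implementation.

```python
# Column values copied from the table shown above.
first_names = ["Bob", "Alice", "Malcolm", "Felix", "Alex", "Tod", "Derek"]
last_names = ["Smith", "Williams", "Jone", "Brown", "Cooper", "Campbell", "Ward"]
ages = [24, 23, 22, 23, 23, 22, 25]

# Row-wise string concatenation, like sf['First Name'] + ' ' + sf['Last Name'].
full_names = [first + " " + last for first, last in zip(first_names, last_names)]

# Row-wise multiplication, like sf['age'] * sf['age'].
ages_squared = [age * age for age in ages]

print(full_names)
print(ages_squared)
```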

# Use the apply function to do an advanced transformation


In [23]:
sf['Country']

dtype: str
Rows: 7
['United States', 'Canada', 'England', 'USA', 'Poland', 'United States', 'Switzerland']

In [24]:
sf['Country'].show()
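The Canvas output is not captured in this export. As a rough stand-in, counting how often each value occurs shows why the column needs cleaning: the same country appears under two spellings. The sketch below uses `collections.Counter` on the values printed for `sf['Country']`.

```python
from collections import Counter

# Values of sf['Country'] as printed above.
countries = ["United States", "Canada", "England", "USA",
             "Poland", "United States", "Switzerland"]

country_counts = Counter(countries)

# List the values from most to least frequent.
for country, count in country_counts.most_common():
    print(country, count)
```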

In [29]:
def transform_country(country):
    """Map the alias "USA" to "United States"; return other values unchanged."""
    if country == "USA":
        return "United States"
    return country

In [30]:
transform_country("Brazil")

'Brazil'

In [31]:
transform_country("USA")

'United States'

In [32]:
sf['Country'].apply(transform_country)

dtype: str
Rows: 7
['United States', 'Canada', 'England', 'United States', 'Poland', 'United States', 'Switzerland']

In [33]:
sf["Country"] = sf['Country'].apply(transform_country)
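If more spellings need to be merged later, one possible variation is to keep the aliases in a dictionary instead of a chain of `if` statements. The names `COUNTRY_ALIASES` and `normalize_country` below are hypothetical. The sketch applies the function to a plain list so that it runs without GraphLab; the assumption is that such a function could be passed to `apply` in the same way as `transform_country`.

```python
# Hypothetical alias table: each key is rewritten to its value.
COUNTRY_ALIASES = {"USA": "United States"}


def normalize_country(country):
    """Return the canonical name for a country, or the input if no alias exists."""
    return COUNTRY_ALIASES.get(country, country)


# Values of sf['Country'] before the transformation, as printed above.
countries = ["United States", "Canada", "England", "USA",
             "Poland", "United States", "Switzerland"]

# Call the function once per element, which is what apply does for a column.
normalized = [normalize_country(country) for country in countries]
print(normalized)
```

Adding another alias then means adding one dictionary entry rather than another branch.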

In [36]:
sf.show()