## H2OFrame and Pandas DataFrame

**On Windows, Japanese text in an H2OFrame appears to come out garbled (no known fix).**  
**On Linux, make sure files are read as [UTF-8](http://docs.h2o.ai/h2o/latest-stable/h2o-docs/getting-data-into-h2o.html?highlight=utf#getting-data-into-your-h2o-cluster).**

In [1]:
import h2o
print(h2o.__version__)

import pandas as pd

3.30.1.2


In [2]:
# Start the H2O cluster
h2o.init()
# h2o.init(url='http://localhost:54321')  # alternative: pass the server URL explicitly

Checking whether there is an H2O instance running at http://localhost:54321 ..... not found.
Attempting to start a local H2O server...
  Java Version: java version "1.8.0_261"; Java(TM) SE Runtime Environment (build 1.8.0_261-b12); Java HotSpot(TM) 64-Bit Server VM (build 25.261-b12, mixed mode)
  Starting server from /home/ec2-user/anaconda3/envs/h2o_3/lib/python3.6/site-packages/h2o/backend/bin/h2o.jar
  Ice root: /tmp/tmp43mpkakm
  JVM stdout: /tmp/tmp43mpkakm/h2o_ec2_user_started_from_python.out
  JVM stderr: /tmp/tmp43mpkakm/h2o_ec2_user_started_from_python.err
  Server is running at http://127.0.0.1:54321
Connecting to H2O server at http://127.0.0.1:54321 ... successful.


H2O_cluster_uptime:,02 secs
H2O_cluster_timezone:,Asia/Tokyo
H2O_data_parsing_timezone:,UTC
H2O_cluster_version:,3.30.1.2
H2O_cluster_version_age:,1 month and 7 days
H2O_cluster_name:,H2O_from_python_ec2_user_v1t2ji
H2O_cluster_total_nodes:,1
H2O_cluster_free_memory:,3.399 Gb
H2O_cluster_total_cores:,4
H2O_cluster_allowed_cores:,4


In [3]:
#h2o.cluster().list_timezones()
#h2o.cluster().timezone = "Asia/Tokyo"
h2o.cluster().timezone

'UTC'

In [4]:
h2o.cluster().datafile_parser_timezone

'UTC'

### <span style="color:blue">Creating H2OFrame and Pandas DataFrame from dict</span>

In [20]:
businessman = {
    'Name':['Yamada','Suzuki','Sato','Tanaka'],
    'Age':[30,35,27,45],
    'Monthly_Income':[50,38,25,47]
}

In [16]:
type(businessman)

dict
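Both constructors used below treat each dict key as a column name and each list as that column's values. The following stdlib-only sketch illustrates that column-to-row mapping; the names `columns` and `rows` are illustrative and are not part of pandas or H2O.

```python
# Same dict as in the cell above: one key per column, one list per column.
businessman = {
    'Name': ['Yamada', 'Suzuki', 'Sato', 'Tanaka'],
    'Age': [30, 35, 27, 45],
    'Monthly_Income': [50, 38, 25, 47],
}

# Every column should have the same number of values.
lengths = {len(values) for values in businessman.values()}
assert len(lengths) == 1, "columns have different lengths"

# Column names come from the keys; rows come from zipping the columns.
columns = list(businessman.keys())
rows = list(zip(*businessman.values()))

print(columns)
for row in rows:
    print(row)
```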

- Pandas DataFrame

In [21]:
df_panda = pd.DataFrame(businessman)
df_panda

,Name,Age,Monthly_Income
0,Yamada,30,50
1,Suzuki,35,38
2,Sato,27,25
3,Tanaka,45,47


In [18]:
type(df_panda)

pandas.core.frame.DataFrame

- H2OFrame

In [22]:
df_h2o = h2o.H2OFrame(businessman)
df_h2o

Parse progress: |█████████████████████████████████████████████████████████| 100%


Name,Age,Monthly_Income
Yamada,30,50
Suzuki,35,38
Sato,27,25
Tanaka,45,47




In [10]:
type(df_h2o)

h2o.frame.H2OFrame

### <span style="color:blue">Converting H2OFrame and Pandas DataFrame</span>

- from H2OFrame to DataFrame

In [67]:
df_h2o.as_data_frame()

,Name,Age,Monthly_Income
0,Yamada,30,50
1,Suzuki,35,38
2,Sato,27,25
3,Tanaka,45,47


In [38]:
type(df_h2o.as_data_frame())

pandas.core.frame.DataFrame

- from DataFrame to H2OFrame

In [69]:
h2o.H2OFrame.from_python(df_panda)

Parse progress: |█████████████████████████████████████████████████████████| 100%


Name,Age,Monthly_Income
Yamada,30,50
Suzuki,35,38
Sato,27,25
Tanaka,45,47




In [70]:
type(h2o.H2OFrame.from_python(df_panda))

Parse progress: |█████████████████████████████████████████████████████████| 100%


h2o.frame.H2OFrame

### <span style="color:blue">Loading from file</span>

- Pandas read_csv method

In [11]:
# Shift_JIS-encoded file: the encoding must be given explicitly
pd.read_csv('../data/businessman_sjis.csv', encoding='sjis')

,Name,Name_JP,Age,Monthly_Income
0,Yamada,山田,30,50
1,Suzuki,鈴木,35,38
2,Sato,佐藤,27,25
3,Tanaka,田中,45,47


In [12]:
# UTF-8 encoded file: read_csv decodes as UTF-8 by default
pd.read_csv('../data/businessman_utf8.csv')

,Name,Name_JP,Age,Monthly_Income
0,Yamada,山田,30,50
1,Suzuki,鈴木,35,38
2,Sato,佐藤,27,25
3,Tanaka,田中,45,47


- H2OFrame import_file method

In [13]:
# Shift_JIS file: the Name_JP column comes back garbled in the output
h2o.import_file('../data/businessman_sjis.csv')

Parse progress: |█████████████████████████████████████████████████████████| 100%


Name,Name_JP,Age,Monthly_Income
Yamada,�R�c,30,50
Suzuki,���,35,38
Sato,����,27,25
Tanaka,�c��,45,47
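The garbled `Name_JP` values above are consistent with Shift_JIS bytes being decoded as UTF-8. The sketch below reproduces that effect with the Python standard library only. It does not call H2O, and it is an assumption that H2O's parser behaves like a UTF-8 decoder that substitutes replacement characters.

```python
# Encode a Japanese name as Shift_JIS, as it would be stored in the sjis CSV.
sjis_bytes = "山田".encode("shift_jis")

# Decode those bytes as UTF-8. errors="replace" substitutes U+FFFD
# for each byte that is not valid UTF-8.
garbled = sjis_bytes.decode("utf-8", errors="replace")
print(garbled)

# Decoding with the matching codec recovers the original text.
restored = sjis_bytes.decode("shift_jis")
print(restored)
```

The bytes that happen to be valid ASCII (here `R` and `c`) survive, which is why a few Latin letters show up among the replacement characters.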




In [14]:
h2o.import_file('../data/businessman_utf8.csv')

Parse progress: |█████████████████████████████████████████████████████████| 100%


Name,Name_JP,Age,Monthly_Income
Yamada,山田,30,50
Suzuki,鈴木,35,38
Sato,佐藤,27,25
Tanaka,田中,45,47
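Since the UTF-8 file imports cleanly while the Shift_JIS file does not, one possible workaround is to re-encode the file as UTF-8 before calling `h2o.import_file`. The following is a minimal sketch using only the standard library. The helper name `reencode_to_utf8`, the output path, and the choice of `cp932` (Microsoft's Shift_JIS variant) as the source codec are assumptions, not something this notebook tested.

```python
from pathlib import Path


def reencode_to_utf8(src, dst, src_encoding="cp932"):
    """Rewrite the text file `src` as UTF-8 at `dst` and return `dst` as a Path."""
    # Decode with the source codec (assumed cp932 here).
    text = Path(src).read_text(encoding=src_encoding)
    dst_path = Path(dst)
    # Write the same text back out as UTF-8.
    dst_path.write_text(text, encoding="utf-8")
    return dst_path


# Hypothetical usage with the file from this notebook (not run here):
# utf8_path = reencode_to_utf8('../data/businessman_sjis.csv',
#                              '../data/businessman_sjis_as_utf8.csv')
# h2o.import_file(str(utf8_path))
```

Writing to a separate file keeps the original Shift_JIS data intact in case the source encoding guess turns out to be wrong.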




### <span style="color:blue">Exporting H2OFrame</span>

In [72]:
# Write the H2OFrame to a CSV file (pass force=True to overwrite an existing file)
h2o.export_file(df_h2o, "df_h2o.csv")

Export File progress: |███████████████████████████████████████████████████| 100%


In [23]:
# Shut down the H2O cluster once finished
h2o.cluster().shutdown()

H2O session _sid_940f closed.
