# Loading streaming waveform data

Simply changing `read` to `readStream` lets you read a Delta Lake table as a stream that is updated live.

**References**
- [Monitoring medical device data with machine learning using Delta Lake, Keras, and MLflow \- Qiita](https://qiita.com/taka_yayoi/items/65e463a3eab84d4e2ce7) (in Japanese)
- [Monitoring patient medical device data with ML \+ Delta Lake, Keras, and MLflow](https://databricks.com/blog/2019/09/12/monitor-medical-device-data-with-machine-learning-using-delta-lake-keras-and-mlflow-on-demand-webinar-and-faqs-now-available.html)

<table>
  <tr><th>Author</th><th>Databricks Japan</th></tr>
  <tr><td>Date</td><td>2021/7/9</td></tr>
  <tr><td>Version</td><td>1.0</td></tr>
  <tr><td>Cluster</td><td>8.3ML</td></tr>
</table>
<img style="margin-top:25px;" src="https://jixjiadatabricks.blob.core.windows.net/images/databricks-logo-small-new.png" width="140">

In [0]:
# Use the same path as in the previous notebook, "3. データのストリーミング" (Streaming the data)
stream_path = '/tmp/takaaki.yayoi@databricks.com/hls/ecg/streaming/'

df = spark.readStream.format('delta').load(stream_path)

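## Inspecting the arriving data

You can work with this DataFrame just like a conventional one. The results of queries against the table keep updating as data arrives. This also applies to the [`display` function](https://docs.databricks.com/user-guide/visualizations/index.html#display-function): each time you reference the DataFrame as a table, new records appear.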

In [0]:
display(df.drop('signals'))

record_id,patient_id,comments,time_interval
patient217-s0439_re,patient217,"Map(Ventriculography -> n/a, In hospital medication -> n/a, Left coronary artery stenoses (RIVA) -> n/a, Pulmonary artery pressure (laod) (mean) -> n/a, Peripheral blood Pressure (syst/diast) -> n/a, Medication after discharge -> n/a, Therapy -> , Cardiac index (load) -> n/a, Cardiac output (at rest) -> n/a, Previous infarction (2) date -> n/a, Hemodynamics -> , Pulmonary capillary wedge pressure (load) -> n/a, Pulmonary artery pressure (at rest) (syst/diast) -> n/a, Cardiac output (load) -> n/a, Acute infarction (localization) -> no, Right coronary artery stenoses (RCA) -> n/a, Number of coronary vessels involved -> unknown, age -> 62, ECG date -> 12/04/1996, sex -> female, Medication pre admission -> n/a, Pulmonary artery pressure (at rest) (mean) -> n/a, Stroke volume index (load) -> n/a, Pulmonary capillary wedge pressure (at rest) -> n/a, Aorta (at rest) (syst/diast) -> n/a, Lytic agent -> n/a, Dosage (lytic agent) -> n/a, Cardiac index (at rest) -> n/a, Left ventricular enddiastolic pressure -> n/a, Catheterization date -> n/a, Diagnose -> , Left coronary artery stenoses (RCX) -> n/a, Smoker -> no, Pulmonary artery pressure (laod) (syst/diast) -> n/a, Former infarction (localization) -> no, Previous infarction (1) date -> n/a, Echocardiography -> LV hypertrophy, slightly diminuished contractility, Hypokinesia od the interventricular septum (12mm). Left atrium slightly enlarged (50mm), normal valves, Additional medication -> n/a, Additional diagnoses -> Diabetes mellitus, Arterial hypertension, Reason for admission -> Bundle branch block, Aorta (at rest) mean -> n/a, Chest X-ray -> n/a, Stroke volume index (at rest) -> n/a, Start lysis therapy (hh.mm) -> n/a, Admission date -> n/a, Infarction date (acute) -> n/a, Infarction date -> n/a)",19
patient288-s0549_re,patient288,"Map(Ventriculography -> n/a, In hospital medication -> n/a, Left coronary artery stenoses (RIVA) -> n/a, Pulmonary artery pressure (laod) (mean) -> n/a, Peripheral blood Pressure (syst/diast) -> n/a, Medication after discharge -> n/a, Therapy -> , Cardiac index (load) -> n/a, Cardiac output (at rest) -> n/a, Previous infarction (2) date -> n/a, Hemodynamics -> , Pulmonary capillary wedge pressure (load) -> n/a, Pulmonary artery pressure (at rest) (syst/diast) -> n/a, Cardiac output (load) -> n/a, Acute infarction (localization) -> no, Right coronary artery stenoses (RCA) -> n/a, Number of coronary vessels involved -> unknown, age -> 67, ECG date -> 27/03/1997, sex -> male, Medication pre admission -> n/a, Pulmonary artery pressure (at rest) (mean) -> n/a, Stroke volume index (load) -> n/a, Pulmonary capillary wedge pressure (at rest) -> n/a, Aorta (at rest) (syst/diast) -> n/a, Lytic agent -> n/a, Dosage (lytic agent) -> n/a, Cardiac index (at rest) -> n/a, Left ventricular enddiastolic pressure -> n/a, Catheterization date -> n/a, Diagnose -> , Left coronary artery stenoses (RCX) -> n/a, Smoker -> unknown, Pulmonary artery pressure (laod) (syst/diast) -> n/a, Former infarction (localization) -> no, Previous infarction (1) date -> n/a, Echocardiography -> n/a, Additional medication -> n/a, Additional diagnoses -> Dilated Cardiomyopathy, Recurrent ventricular tachycardias, Reason for admission -> Cardiomyopathy, Aorta (at rest) mean -> n/a, Chest X-ray -> n/a, Stroke volume index (at rest) -> n/a, Start lysis therapy (hh.mm) -> n/a, Admission date -> n/a, Infarction date (acute) -> n/a, Infarction date -> n/a)",19
patient264-s0500_re,patient264,"Map(Ventriculography -> n/a, In hospital medication -> n/a, Left coronary artery stenoses (RIVA) -> n/a, Pulmonary artery pressure (laod) (mean) -> n/a, Peripheral blood Pressure (syst/diast) -> n/a, Medication after discharge -> n/a, Therapy -> , Cardiac index (load) -> n/a, Cardiac output (at rest) -> n/a, Previous infarction (2) date -> n/a, Hemodynamics -> , Pulmonary capillary wedge pressure (load) -> n/a, Pulmonary artery pressure (at rest) (syst/diast) -> n/a, Cardiac output (load) -> n/a, Acute infarction (localization) -> no, Right coronary artery stenoses (RCA) -> n/a, Number of coronary vessels involved -> unknown, age -> 45, ECG date -> 27/02/1997, sex -> male, Medication pre admission -> n/a, Pulmonary artery pressure (at rest) (mean) -> n/a, Stroke volume index (load) -> n/a, Pulmonary capillary wedge pressure (at rest) -> n/a, Aorta (at rest) (syst/diast) -> n/a, Lytic agent -> n/a, Dosage (lytic agent) -> n/a, Cardiac index (at rest) -> n/a, Left ventricular enddiastolic pressure -> n/a, Catheterization date -> n/a, Diagnose -> , Left coronary artery stenoses (RCX) -> n/a, Smoker -> unknown, Pulmonary artery pressure (laod) (syst/diast) -> n/a, Former infarction (localization) -> no, Previous infarction (1) date -> n/a, Echocardiography -> n/a, Additional medication -> n/a, Additional diagnoses -> no, Reason for admission -> Healthy control, Aorta (at rest) mean -> n/a, Chest X-ray -> n/a, Stroke volume index (at rest) -> n/a, Start lysis therapy (hh.mm) -> n/a, Admission date -> n/a, Infarction date (acute) -> n/a, Infarction date -> n/a)",19
patient245-s0474_re,patient245,"Map(Ventriculography -> n/a, In hospital medication -> n/a, Left coronary artery stenoses (RIVA) -> n/a, Pulmonary artery pressure (laod) (mean) -> n/a, Peripheral blood Pressure (syst/diast) -> n/a, Medication after discharge -> n/a, Therapy -> , Cardiac index (load) -> n/a, Cardiac output (at rest) -> n/a, Previous infarction (2) date -> n/a, Hemodynamics -> , Pulmonary capillary wedge pressure (load) -> n/a, Pulmonary artery pressure (at rest) (syst/diast) -> n/a, Cardiac output (load) -> n/a, Acute infarction (localization) -> no, Right coronary artery stenoses (RCA) -> n/a, Number of coronary vessels involved -> unknown, age -> 30, ECG date -> 15/11/1996, sex -> male, Medication pre admission -> n/a, Pulmonary artery pressure (at rest) (mean) -> n/a, Stroke volume index (load) -> n/a, Pulmonary capillary wedge pressure (at rest) -> n/a, Aorta (at rest) (syst/diast) -> n/a, Lytic agent -> n/a, Dosage (lytic agent) -> n/a, Cardiac index (at rest) -> n/a, Left ventricular enddiastolic pressure -> n/a, Catheterization date -> n/a, Diagnose -> , Left coronary artery stenoses (RCX) -> n/a, Smoker -> unknown, Pulmonary artery pressure (laod) (syst/diast) -> n/a, Former infarction (localization) -> no, Previous infarction (1) date -> n/a, Echocardiography -> n/a, Additional medication -> n/a, Additional diagnoses -> no, Reason for admission -> Healthy control, Aorta (at rest) mean -> n/a, Chest X-ray -> n/a, Stroke volume index (at rest) -> n/a, Start lysis therapy (hh.mm) -> n/a, Admission date -> n/a, Infarction date (acute) -> n/a, Infarction date -> n/a)",19
patient281-s0537_re,patient281,"Map(Ventriculography -> n/a, In hospital medication -> n/a, Left coronary artery stenoses (RIVA) -> n/a, Pulmonary artery pressure (laod) (mean) -> n/a, Peripheral blood Pressure (syst/diast) -> n/a, Medication after discharge -> n/a, Therapy -> , Cardiac index (load) -> n/a, Cardiac output (at rest) -> n/a, Previous infarction (2) date -> n/a, Hemodynamics -> , Pulmonary capillary wedge pressure (load) -> n/a, Pulmonary artery pressure (at rest) (syst/diast) -> n/a, Cardiac output (load) -> n/a, Acute infarction (localization) -> n/a, Right coronary artery stenoses (RCA) -> n/a, Number of coronary vessels involved -> n/a, age -> 68, ECG date -> 14/10/1994, sex -> male, Medication pre admission -> n/a, Pulmonary artery pressure (at rest) (mean) -> n/a, Stroke volume index (load) -> n/a, Pulmonary capillary wedge pressure (at rest) -> n/a, Aorta (at rest) (syst/diast) -> n/a, Lytic agent -> n/a, Dosage (lytic agent) -> n/a, Cardiac index (at rest) -> n/a, Left ventricular enddiastolic pressure -> n/a, Catheterization date -> n/a, Diagnose -> , Left coronary artery stenoses (RCX) -> n/a, Smoker -> n/a, Pulmonary artery pressure (laod) (syst/diast) -> n/a, Former infarction (localization) -> n/a, Previous infarction (1) date -> n/a, Echocardiography -> n/a, Additional medication -> n/a, Additional diagnoses -> n/a, Reason for admission -> n/a, Aorta (at rest) mean -> n/a, Chest X-ray -> n/a, Stroke volume index (at rest) -> n/a, Start lysis therapy (hh.mm) -> n/a, Admission date -> n/a, Infarction date (acute) -> n/a, Infarction date -> n/a)",19
patient261-s0497_re,patient261,"Map(Ventriculography -> n/a, In hospital medication -> n/a, Left coronary artery stenoses (RIVA) -> n/a, Pulmonary artery pressure (laod) (mean) -> n/a, Peripheral blood Pressure (syst/diast) -> n/a, Medication after discharge -> n/a, Therapy -> , Cardiac index (load) -> n/a, Cardiac output (at rest) -> n/a, Previous infarction (2) date -> n/a, Hemodynamics -> , Pulmonary capillary wedge pressure (load) -> n/a, Pulmonary artery pressure (at rest) (syst/diast) -> n/a, Cardiac output (load) -> n/a, Acute infarction (localization) -> no, Right coronary artery stenoses (RCA) -> n/a, Number of coronary vessels involved -> unknown, age -> 51, ECG date -> 27/02/1997, sex -> male, Medication pre admission -> n/a, Pulmonary artery pressure (at rest) (mean) -> n/a, Stroke volume index (load) -> n/a, Pulmonary capillary wedge pressure (at rest) -> n/a, Aorta (at rest) (syst/diast) -> n/a, Lytic agent -> n/a, Dosage (lytic agent) -> n/a, Cardiac index (at rest) -> n/a, Left ventricular enddiastolic pressure -> n/a, Catheterization date -> n/a, Diagnose -> , Left coronary artery stenoses (RCX) -> n/a, Smoker -> unknown, Pulmonary artery pressure (laod) (syst/diast) -> n/a, Former infarction (localization) -> inferior, Previous infarction (1) date -> 01-Nov-96, Echocardiography -> n/a, Additional medication -> n/a, Additional diagnoses -> unknown, Reason for admission -> Myocardial infarction, Aorta (at rest) mean -> n/a, Chest X-ray -> n/a, Stroke volume index (at rest) -> n/a, Start lysis therapy (hh.mm) -> n/a, Admission date -> n/a, Infarction date (acute) -> n/a, Infarction date -> n/a)",19
patient240-s0468_re,patient240,"Map(Ventriculography -> n/a, In hospital medication -> n/a, Left coronary artery stenoses (RIVA) -> n/a, Pulmonary artery pressure (laod) (mean) -> n/a, Peripheral blood Pressure (syst/diast) -> n/a, Medication after discharge -> n/a, Therapy -> , Cardiac index (load) -> n/a, Cardiac output (at rest) -> n/a, Previous infarction (2) date -> n/a, Hemodynamics -> , Pulmonary capillary wedge pressure (load) -> n/a, Pulmonary artery pressure (at rest) (syst/diast) -> n/a, Cardiac output (load) -> n/a, Acute infarction (localization) -> no, Right coronary artery stenoses (RCA) -> n/a, Number of coronary vessels involved -> unknown, age -> 28, ECG date -> 24/10/1996, sex -> male, Medication pre admission -> n/a, Pulmonary artery pressure (at rest) (mean) -> n/a, Stroke volume index (load) -> n/a, Pulmonary capillary wedge pressure (at rest) -> n/a, Aorta (at rest) (syst/diast) -> n/a, Lytic agent -> n/a, Dosage (lytic agent) -> n/a, Cardiac index (at rest) -> n/a, Left ventricular enddiastolic pressure -> n/a, Catheterization date -> n/a, Diagnose -> , Left coronary artery stenoses (RCX) -> n/a, Smoker -> unknown, Pulmonary artery pressure (laod) (syst/diast) -> n/a, Former infarction (localization) -> no, Previous infarction (1) date -> n/a, Echocardiography -> n/a, Additional medication -> n/a, Additional diagnoses -> no, Reason for admission -> Healthy control, Aorta (at rest) mean -> n/a, Chest X-ray -> n/a, Stroke volume index (at rest) -> n/a, Start lysis therapy (hh.mm) -> n/a, Admission date -> n/a, Infarction date (acute) -> n/a, Infarction date -> n/a)",19
patient196-s0002_re,patient196,"Map(Ventriculography -> n/a, In hospital medication -> n/a, Left coronary artery stenoses (RIVA) -> n/a, Pulmonary artery pressure (laod) (mean) -> n/a, Peripheral blood Pressure (syst/diast) -> n/a, Medication after discharge -> n/a, Therapy -> , Cardiac index (load) -> n/a, Cardiac output (at rest) -> n/a, Previous infarction (2) date -> n/a, Hemodynamics -> , Pulmonary capillary wedge pressure (load) -> n/a, Pulmonary artery pressure (at rest) (syst/diast) -> n/a, Cardiac output (load) -> n/a, Acute infarction (localization) -> no, Right coronary artery stenoses (RCA) -> n/a, Number of coronary vessels involved -> unknown, age -> 84, ECG date -> 13/08/1990, sex -> female, Medication pre admission -> n/a, Pulmonary artery pressure (at rest) (mean) -> n/a, Stroke volume index (load) -> n/a, Pulmonary capillary wedge pressure (at rest) -> n/a, Aorta (at rest) (syst/diast) -> n/a, Lytic agent -> n/a, Dosage (lytic agent) -> n/a, Cardiac index (at rest) -> n/a, Left ventricular enddiastolic pressure -> n/a, Catheterization date -> n/a, Diagnose -> , Left coronary artery stenoses (RCX) -> n/a, Smoker -> no, Pulmonary artery pressure (laod) (syst/diast) -> n/a, Former infarction (localization) -> unknown, Previous infarction (1) date -> n/a, Echocardiography -> n/a, Additional medication -> n/a, Additional diagnoses -> Arterial hypertension, Recurrent pulmonary oedema, Diabetes mellitus, Hyperlipoproteinemia, Reason for admission -> Unstable angina, Aorta (at rest) mean -> n/a, Chest X-ray -> n/a, Stroke volume index (at rest) -> n/a, Start lysis therapy (hh.mm) -> n/a, Admission date -> n/a, Infarction date (acute) -> n/a, Infarction date -> n/a)",19
patient033-s0121lre,patient033,"Map(Ventriculography -> Limited hypokinesia anterior wall and apex, In hospital medication -> Ca-antagonist Isosorbit-Mononitrate ASA Isosorbit-Dinitrate Colestyramin, Left coronary artery stenoses (RIVA) -> RIVA 80%., Pulmonary artery pressure (laod) (mean) -> 35 cmH2O, Peripheral blood Pressure (syst/diast) -> 100/65 mmHg, Medication after discharge -> ASA Isosorbit-Dinitrate Colestyramin, Therapy -> , Cardiac index (load) -> 6,8 l/min/sqrmBSA, Cardiac output (at rest) -> 6,03 l/min, Previous infarction (2) date -> n/a, Hemodynamics -> , Pulmonary capillary wedge pressure (load) -> 29 cmH2O, Pulmonary artery pressure (at rest) (syst/diast) -> 20/14 cmH2O, Cardiac output (load) -> 13,4 l/min, Acute infarction (localization) -> antero-septal, Right coronary artery stenoses (RCA) -> n/a, Number of coronary vessels involved -> 2, age -> 60, ECG date -> 30/01/1991, sex -> male, Medication pre admission -> -, Pulmonary artery pressure (at rest) (mean) -> 12 cmH2O, Stroke volume index (load) -> 70,5 ml/beat, Pulmonary capillary wedge pressure (at rest) -> 8 cmH2O, Aorta (at rest) (syst/diast) -> n/a, Lytic agent -> Streptokinase, Dosage (lytic agent) -> 1.5 Mio IE, Cardiac index (at rest) -> 3,05 l/min/sqrmBSA, Left ventricular enddiastolic pressure -> n/a, Catheterization date -> 28-Jan-91, Diagnose -> , Left coronary artery stenoses (RCX) -> RCX 60%, Smoker -> no, Pulmonary artery pressure (laod) (syst/diast) -> 56/19 cmH2O, Former infarction (localization) -> no, Previous infarction (1) date -> n/a, Echocardiography -> n/a, Additional medication -> Nitrate Heparin Triflupromazin, Additional diagnoses -> Hypercholesterinemia, Reason for admission -> Myocardial infarction, Aorta (at rest) mean -> n/a, Chest X-ray -> normal, Stroke volume index (at rest) -> 51.0 ml/beat, Start lysis therapy (hh.mm) -> 13, Admission date -> 20-Jan-91, Infarction date (acute) -> 20-Jan-91, Infarction date -> 20-Jan-91)",8
patient022-s0149lre,patient022,"Map(Ventriculography -> Akinesia inferior wall and apex, In hospital medication -> ASA Isosorbit-Mononitrate Ca-antagonist Diazepam, Left coronary artery stenoses (RIVA) -> No stenoses, Pulmonary artery pressure (laod) (mean) -> 16 cmH2O, Peripheral blood Pressure (syst/diast) -> 120/80 mmHg, Medication after discharge -> ASA, Therapy -> , Cardiac index (load) -> 7,1 l/min/sqrmBSA, Cardiac output (at rest) -> 7,3 l/min, Previous infarction (2) date -> n/a, Hemodynamics -> , Pulmonary capillary wedge pressure (load) -> 9 cmH2O, Pulmonary artery pressure (at rest) (syst/diast) -> 12/5 cmH2O, Cardiac output (load) -> 13,5 l/min, Acute infarction (localization) -> infero-lateral, Right coronary artery stenoses (RCA) -> RCA peripheral 20%, Number of coronary vessels involved -> 1, age -> 43, ECG date -> 12/04/1991, sex -> male, Medication pre admission -> -, Pulmonary artery pressure (at rest) (mean) -> 9 cmH2O, Stroke volume index (load) -> 56 ml/beat, Pulmonary capillary wedge pressure (at rest) -> 6 cmH2O, Aorta (at rest) (syst/diast) -> n/a, Lytic agent -> Urokinase, Dosage (lytic agent) -> 1.5 Mio IE, Cardiac index (at rest) -> 3,7 l/min/sqrmBSA, Left ventricular enddiastolic pressure -> 9 cmH2O, Catheterization date -> 06-Dec-90, Diagnose -> , Left coronary artery stenoses (RCX) -> RCX distal to ramus marginalis sinister_2 90%, Smoker -> yes, Pulmonary artery pressure (laod) (syst/diast) -> 23/11 cmH2O, Former infarction (localization) -> no, Previous infarction (1) date -> n/a, Echocardiography -> n/a, Additional medication -> Heparin ASA Isosorbit-Mononitrate Atropin Diazepam, Additional diagnoses -> no, Reason for admission -> Myocardial infarction, Aorta (at rest) mean -> n/a, Chest X-ray -> normal, Stroke volume index (at rest) -> 47.4 ml/beat, Start lysis therapy (hh.mm) -> 11, Admission date -> 29-Nov-90, Infarction date (acute) -> 29-Nov-90, Infarction date -> 29-Nov-90)",8


This capability becomes more interesting once you chart it. Here we plot the number of records processed so far. The chart updates live, and you can watch the record count grow at a steady rate.

In [0]:
display(df.groupBy(df.time_interval).count())

time_interval,count
128,39
330,108
22,9
209,63
372,124
47,16
140,43
177,53
416,138
259,82

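As a conceptual aid, the live-updating count can be pictured as a running total that grows with each micro-batch. The sketch below is plain Python, not Spark's implementation; the helper `running_counts` and the sample batches are hypothetical and exist only to illustrate the idea.

```python
from collections import Counter

def running_counts(micro_batches):
    """Yield the cumulative count per key after each micro-batch."""
    totals = Counter()
    for batch in micro_batches:
        totals.update(batch)  # add this batch's keys to the running totals
        yield dict(totals)

# Hypothetical micro-batches of time_interval values, in arrival order
batches = [[19, 19, 8], [8, 19], [22]]
snapshots = list(running_counts(batches))
for snapshot in snapshots:
    print(snapshot)
```

Each printed snapshot corresponds to one refresh of the chart: existing groups grow and new groups appear as further batches arrive.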

This applies to every `display` chart type. A pie chart by reason for admission is particularly interesting: you can watch the distribution shift as each patient arrives.

In [0]:
display(df.groupBy(df.comments["Reason for admission"]).count())

comments[Reason for admission],count
Heart failure (NYHA 3),104
Cardiomyopathy,1072
Healthy control,5281
Myocardial infarction,28517
Valvular heart disease,494
Heart failure (NYHA 2),106
,1807
Myocarditis,182
Hypertrophy,331
Unstable angina,80

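To see what the pie chart encodes, the counts can be converted into shares of the total. The sketch below uses the ten rows shown in the output above as a single snapshot. The helper name `shares` is hypothetical, and a live stream would produce different numbers at each refresh.

```python
# Snapshot copied from the ten rows shown above; the empty label is kept as-is
counts = {
    "Heart failure (NYHA 3)": 104,
    "Cardiomyopathy": 1072,
    "Healthy control": 5281,
    "Myocardial infarction": 28517,
    "Valvular heart disease": 494,
    "Heart failure (NYHA 2)": 106,
    "": 1807,
    "Myocarditis": 182,
    "Hypertrophy": 331,
    "Unstable angina": 80,
}

def shares(counts):
    """Return each label's fraction of the total count."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

slice_shares = shares(counts)
# Print the slices from largest to smallest
for label, share in sorted(slice_shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{label or '(blank)'}: {share:.1%}")
```

In this snapshot, myocardial infarction accounts for roughly three quarters of the records, which is why it dominates the chart.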

The same works for patient age.

In [0]:
from pyspark.sql.types import IntegerType
from pyspark.sql.functions import trim

display(df.select(trim(df.comments["age"]).cast(IntegerType()).alias('age')))

age
60
43
60
50
40
63
43
60
43
60

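The `trim` and `cast` calls above turn the string stored under `comments["age"]` into an integer. A rough plain-Python analogue is sketched below, assuming Spark runs with ANSI mode disabled so that an unparseable string becomes null instead of raising an error. The helper `to_int_or_none` and the sample values are hypothetical, and Python's `int` does not accept exactly the same strings as Spark's cast.

```python
def to_int_or_none(raw):
    """Strip surrounding whitespace and parse an integer; return None if parsing fails."""
    if raw is None:
        return None
    try:
        return int(raw.strip())
    except ValueError:
        # Approximates a null result for values such as "n/a"
        return None

# Hypothetical raw values as they might appear in the comments map
raw_ages = ["60", " 43 ", "n/a"]
ages = [to_int_or_none(value) for value in raw_ages]
print(ages)
```

Records whose age is missing would therefore show up as nulls in the chart rather than stopping the stream.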

## Combining MLflow with streaming

With MLflow and Structured Streaming, you can apply a model as data arrives and deliver real-time reporting. First, we use the [mlflow.keras](https://www.mlflow.org/docs/latest/python_api/mlflow.keras.html) library to load the tracked Keras model from an MLflow run. In the cell below, replace the run ID with that of the model you want to use.

In [0]:
import mlflow.keras

run_id = "5060b23ff2fd4faa985c547d65042776" # Replace with your own MLflow run ID
model_uri = "runs:/" + run_id + "/model"
model = mlflow.keras.load_model(model_uri=model_uri)

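A simple way to score new data as it arrives is to call the Keras model from a UDF. The cells below define a UDF that transforms each row to match the Keras model's input, passes the data to the Keras model, and returns the prediction. To maximize efficiency, the UDF is defined as a [Pandas UDF](https://docs.databricks.com/spark/latest/spark-sql/udf-python-pandas.html) so that it can use [Apache Arrow](https://spark.apache.org/docs/latest/sql-pyspark-pandas-with-arrow.html), an efficient in-memory format for exchanging data between Spark and Python.

`ModelWrapperPickable`, defined in the cell below, works around a pickling error that occurs when PySpark creates the UDF.

**References**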
- [apache spark \- Using tensorflow\.keras model in pyspark UDF generates a pickle error \- Stack Overflow](https://stackoverflow.com/questions/61096573/using-tensorflow-keras-model-in-pyspark-udf-generates-a-pickle-error)
- [Pickling Keras Models](http://zachmoshe.com/2017/04/03/pickling-keras-models.html)

In [0]:
class ModelWrapperPickable:

  def __init__(self, model):
    self.model = model

  def __getstate__(self):
    import tempfile
    import tensorflow
    
    model_str = ''
    with tempfile.NamedTemporaryFile(suffix='.hdf5', delete=True) as fd:
      tensorflow.keras.models.save_model(self.model, fd.name, overwrite=True)
      model_str = fd.read()
      d = { 'model_str': model_str }
      return d

  def __setstate__(self, state):
    import tempfile
    import tensorflow
    
    with tempfile.NamedTemporaryFile(suffix='.hdf5', delete=True) as fd:
      fd.write(state['model_str'])
      fd.flush()
      self.model = tensorflow.keras.models.load_model(fd.name)

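The wrapper relies on Python's standard pickle hooks: `__getstate__` decides what gets serialized, and `__setstate__` rebuilds the object on the receiving side. The sketch below shows the same pattern in isolation. `FakeModel` and `PickableWrapper` are hypothetical stand-ins (no TensorFlow involved), and the sketch uses a temporary directory rather than a named temporary file so that the file can be reopened by path on any platform.

```python
import os
import pickle
import tempfile
import threading

class FakeModel:
    """Hypothetical stand-in for a model that can only be saved to and loaded from a file."""

    def __init__(self, weights):
        self.weights = weights
        self.handle = threading.Lock()  # locks cannot be pickled, like some live runtime handles

    def save(self, path):
        with open(path, "w") as f:
            f.write(",".join(str(w) for w in self.weights))

    @classmethod
    def load(cls, path):
        with open(path) as f:
            return cls([float(w) for w in f.read().split(",")])

class PickableWrapper:
    def __init__(self, model):
        self.model = model

    def __getstate__(self):
        # Save the model to a temporary file and pickle only that file's bytes
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, "model.txt")
            self.model.save(path)
            with open(path, "rb") as f:
                return {"model_bytes": f.read()}

    def __setstate__(self, state):
        # Write the bytes back to a temporary file and reload the model from it
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, "model.txt")
            with open(path, "wb") as f:
                f.write(state["model_bytes"])
            self.model = FakeModel.load(path)

# Round trip: the wrapper pickles even though FakeModel itself does not
wrapper = PickableWrapper(FakeModel([0.5, 1.5]))
restored = pickle.loads(pickle.dumps(wrapper))
print(restored.model.weights)
```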
In [0]:
model_wrapper = ModelWrapperPickable(model)

In [0]:
import numpy as np
import pandas as pd

def predict_using_model(signals_array):
    """
    Predict one class per ECG record in the batch.

    1. Divides each record's signal data into windows of size window_size
       (2048, which is enough to capture 3 heart beats).
    2. Scores every window with the Keras model and takes a majority vote.

    args:
        signals_array: pandas Series; each element holds the per-channel signal arrays of one record
    returns:
        pandas Series with one prediction per record (0 or 1)
    """
    
    window_size = 2048
    n_channels = 15
    
    preds = []
    
    for signals in signals_array:
    
      # Convert the record's signals into a 2D array of shape (n_channels, n_samples)
      signal_data = np.array(signals.tolist())
      n_windows = len(signal_data[0]) // window_size

      # Windowed ECG data (shape = n_windows, n_channels, window_size)
      dataX = np.zeros((n_windows, n_channels, window_size))
      dataX[0:n_windows] = np.array([signal_data[:,i*window_size:(i+1)*window_size] for i in range(n_windows)])
    
      predictions = model_wrapper.model.predict(dataX)
    
      # Count the windows voting for each class; both counters start at 0
      class0 = 0
      class1 = 0
    
      for x in predictions:
        if x[0] > x[1]:
          class0 += 1
        else:
          class1 += 1
        
      # Majority vote over the windows (ties go to class 1)
      preds.append(0 if class0 > class1 else 1)
    
    return pd.Series(preds)

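The two core steps of the scoring function, windowing and majority voting, can be tried on toy data without Spark or a model. The sketch below is illustrative only: the helper names, the tiny 2-channel signal, and the per-window scores are hypothetical, and both vote counts start at zero.

```python
import numpy as np

def split_into_windows(signal_data, window_size):
    """Cut a (n_channels, n_samples) array into (n_windows, n_channels, window_size)."""
    n_channels, n_samples = signal_data.shape
    n_windows = n_samples // window_size  # trailing samples that do not fill a window are dropped
    trimmed = signal_data[:, : n_windows * window_size]
    return trimmed.reshape(n_channels, n_windows, window_size).transpose(1, 0, 2)

def majority_class(window_scores):
    """Return 0 if most windows score class 0 higher, otherwise 1 (ties go to 1)."""
    votes_for_0 = int(np.sum(window_scores[:, 0] > window_scores[:, 1]))
    votes_for_1 = len(window_scores) - votes_for_0
    return 0 if votes_for_0 > votes_for_1 else 1

# Hypothetical toy record: 2 channels, 10 samples, windows of 4 samples
signal = np.arange(20).reshape(2, 10)
windows = split_into_windows(signal, window_size=4)
print(windows.shape)

# Hypothetical per-window scores standing in for the model's predict output
scores = np.array([[0.9, 0.1], [0.2, 0.8], [0.7, 0.3]])
print(majority_class(scores))
```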
Before the function can be used, it must be registered as a UDF.

In [0]:
from pyspark.sql.types import ArrayType, IntegerType
from pyspark.sql import functions as F

predict_pudf = F.pandas_udf(predict_using_model, IntegerType())

In [0]:
predict_pudf(F.map_values(df.signals))

We can now apply the model continuously to the live streaming dataset and predict whether a patient has a heart condition!

0 means healthy and 1 means disease.

In [0]:
display(df.select(predict_pudf(F.map_values(df.signals)).alias('prediction')))

prediction
1
1
1
1
1
0
1
1
1
1


# END