#Learning PySpark - MapType in a DataFrame Schema

PySpark's MapType describes a column whose values are maps of key-value pairs, similar to a Python dictionary. This article covers how to define a MapType column, how to access its elements, and a few related functions: `explode()`, `map_keys()`, and `map_values()`.


In [1]:
%pip install pyspark

Collecting pyspark
  Downloading pyspark-3.5.0.tar.gz (316.9 MB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: pyspark
  Building wheel for pyspark (setup.py) ... done
  Created wheel for pyspark: filename=pyspark-3.5.0-py2.py3-none-any.whl size=317425345 sha256=916d52fcc07db3f308c331aea5afd3b7e324d318a87b623914ab8af140414aa1
  Stored in directory: /root/.cache/pip/wheels/41/4e/10/c2cf2467f71c678cfc8a6b9ac9241e5e44a01940da8fbb17fc
Successfully built pyspark
Installing collected packages: pyspark
Successfully installed pyspark-3.5.0


In [2]:
# Only the names used in this notebook are imported
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, MapType, StringType, IntegerType
from pyspark.sql.functions import explode, map_keys, map_values

In [3]:
spark = SparkSession.builder.appName("Belajar PySpark - MapType").getOrCreate()

##MapType in a DataFrame Schema

In [4]:
mySchema = StructType([
    StructField("nama", StringType(), True),
    StructField("jurusan", StringType(), True),
    StructField("nilai", MapType(StringType(), IntegerType()), True)
])

###Using MapType in a DataFrame

In [20]:
data = [['Agus Supono','F',{"uts":100,"uas":150,"tugas":150}],
        ['Budi Sumardi','B',{"uts":200,"uas":100,"tugas":150}],
        ['Dina Mariana','F',{"uts":150,"uas":150,"tugas":130}],
        ['Dedi Setiadi','B', {"uts":50,"uas":100,"tugas":100,"remedial":100}]]

df = spark.createDataFrame(data, mySchema)
df.show(truncate=False)
df.printSchema()

+------------+-------+------------------------------------------------------+
|nama        |jurusan|nilai                                                 |
+------------+-------+------------------------------------------------------+
|Agus Supono |F      |{tugas -> 150, uts -> 100, uas -> 150}                |
|Budi Sumardi|B      |{tugas -> 150, uts -> 200, uas -> 100}                |
|Dina Mariana|F      |{tugas -> 130, uts -> 150, uas -> 150}                |
|Dedi Setiadi|B      |{tugas -> 100, remedial -> 100, uts -> 50, uas -> 100}|
+------------+-------+------------------------------------------------------+

root
 |-- nama: string (nullable = true)
 |-- jurusan: string (nullable = true)
 |-- nilai: map (nullable = true)
 |    |-- key: string
 |    |-- value: integer (valueContainsNull = true)



##Accessing a MapType Column

In [10]:
df.withColumn("UTS",df.nilai.getItem("uts")) \
  .withColumn("UAS",df.nilai.getItem("uas")) \
  .withColumn("TUGAS",df.nilai.getItem("tugas")) \
  .show()

+------------+-------+--------------------+---+---+-----+
|        nama|jurusan|               nilai|UTS|UAS|TUGAS|
+------------+-------+--------------------+---+---+-----+
| Agus Supono|      F|{tugas -> 150, ut...|100|150|  150|
|Budi Sumardi|      B|{tugas -> 150, ut...|200|100|  150|
|Dina Mariana|      F|{tugas -> 130, ut...|150|150|  130|
|Dedi Setiadi|      B|{tugas -> 100, re...| 50|100|  100|
+------------+-------+--------------------+---+---+-----+



In [16]:
df.select(df.nama, df.jurusan,
          df.nilai.getItem("uts").alias("UTS"),
          df.nilai.getItem("uas").alias("UAS"),
          df.nilai.getItem("tugas").alias("TUGAS")).show()

+------------+-------+---+---+-----+
|        nama|jurusan|UTS|UAS|TUGAS|
+------------+-------+---+---+-----+
| Agus Supono|      F|100|150|  150|
|Budi Sumardi|      B|200|100|  150|
|Dina Mariana|      F|150|150|  130|
|Dedi Setiadi|      B| 50|100|  100|
+------------+-------+---+---+-----+



In [17]:
df.select(df.nama, df.jurusan,
          df.nilai["uts"].alias("UTS"),
          df.nilai["uas"].alias("UAS"),
          df.nilai["tugas"].alias("TUGAS")).show()

+------------+-------+---+---+-----+
|        nama|jurusan|UTS|UAS|TUGAS|
+------------+-------+---+---+-----+
| Agus Supono|      F|100|150|  150|
|Budi Sumardi|      B|200|100|  150|
|Dina Mariana|      F|150|150|  130|
|Dedi Setiadi|      B| 50|100|  100|
+------------+-------+---+---+-----+



###MapType Functions

####The `map_keys()` Function

This function returns an array containing all the keys of the map.


In [22]:
from pyspark.sql.functions import map_keys

df.select(df.nama, df.jurusan, map_keys(df.nilai)).show(truncate=False)

+------------+-------+---------------------------+
|nama        |jurusan|map_keys(nilai)            |
+------------+-------+---------------------------+
|Agus Supono |F      |[tugas, uts, uas]          |
|Budi Sumardi|B      |[tugas, uts, uas]          |
|Dina Mariana|F      |[tugas, uts, uas]          |
|Dedi Setiadi|B      |[tugas, remedial, uts, uas]|
+------------+-------+---------------------------+



####The `map_values()` Function

This function returns an array containing all the values of the map.


In [23]:
from pyspark.sql.functions import map_values

df.select(df.nama, df.jurusan, map_values(df.nilai)).show(truncate=False)

+------------+-------+-------------------+
|nama        |jurusan|map_values(nilai)  |
+------------+-------+-------------------+
|Agus Supono |F      |[150, 100, 150]    |
|Budi Sumardi|B      |[150, 200, 100]    |
|Dina Mariana|F      |[130, 150, 150]    |
|Dedi Setiadi|B      |[100, 100, 50, 100]|
+------------+-------+-------------------+



####The `explode()` Function

The `explode()` function turns each key-value pair of a map into its own row, producing two new columns that are named `key` and `value` by default. Applied to the DataFrame above:


In [24]:
from pyspark.sql.functions import explode

df.select(df.nama, df.jurusan, explode(df.nilai)).show(truncate=False)

+------------+-------+--------+-----+
|nama        |jurusan|key     |value|
+------------+-------+--------+-----+
|Agus Supono |F      |tugas   |150  |
|Agus Supono |F      |uts     |100  |
|Agus Supono |F      |uas     |150  |
|Budi Sumardi|B      |tugas   |150  |
|Budi Sumardi|B      |uts     |200  |
|Budi Sumardi|B      |uas     |100  |
|Dina Mariana|F      |tugas   |130  |
|Dina Mariana|F      |uts     |150  |
|Dina Mariana|F      |uas     |150  |
|Dedi Setiadi|B      |tugas   |100  |
|Dedi Setiadi|B      |remedial|100  |
|Dedi Setiadi|B      |uts     |50   |
|Dedi Setiadi|B      |uas     |100  |
+------------+-------+--------+-----+

