

# Parallel Processing with Dask
* **Products used:**
[gm_s2_annual](https://explorer.digitalearth.africa/gm_s2_annual)
* **Prerequisites**: Users of this notebook should have a basic understanding of:
  * How to run a [Jupyter notebook](http://43.218.254.133:8888/notebooks/panduan-pengguna/01_Jupyter_notebooks.ipynb)
  * How to inspect the available products and measurements in [Piksel Products and measurement](http://43.218.254.133:8888/notebooks/panduan-pengguna/02_Product_dan_measurement.ipynb)
  * How to [open Piksel data](http://43.218.254.133:8888/notebooks/panduan-pengguna/03_Membuka_data.ipynb)
  * How to [plot data](http://43.218.254.133:8888/notebooks/panduan-pengguna/04_Plotting.ipynb)
  * How to run a [basic analysis](http://43.218.254.133:8888/notebooks/panduan-pengguna/05_Analisis_dasar.ipynb)




## Background
[Dask](https://dask.org/) is a useful tool for large-scale analyses (in space, in time, or both) because it breaks data into chunks that fit comfortably in memory.
Dask can also use multiple processing cores to speed up computations.
This notebook covers how both features benefit an analysis.




## Description
This notebook shows how to enable Dask as part of the data-loading process, which makes it possible to analyse larger areas and longer time spans without crashing the Piksel environment, and can also speed up computations.

Topics covered in this notebook include:

1. The difference between the standard load command and loading with Dask.
2. Enabling Dask and the Dask Dashboard.
3. Setting chunk sizes for data loading.
4. Loading data with Dask.
5. Chaining operations before loading data and understanding task graphs.

***



## Getting started
To run this introduction to Dask, run all the cells in the notebook, starting with the "Load packages" cell. For help with running notebook cells, refer back to the [Jupyter notebook](http://43.218.254.133:8888/notebooks/panduan-pengguna/01_Jupyter_notebooks.ipynb) notebook.


### Load packages
The cell below imports the `datacube` package, which already includes Dask functionality.
The `dea_tools` package provides useful helper functions in its `dask` module, in particular `create_local_dask_cluster`.

In [9]:
import datacube

from dea_tools.dask import create_local_dask_cluster



### Connect to the datacube
The next step is to connect to the datacube database.
The resulting `dc` object can then be used to load data.
The `app` parameter is a unique name that identifies the notebook; it has no effect on the analysis.

In [10]:
dc = datacube.Datacube(app="08_parallel_processing_with_dask")



## Standard processing
By default, the `datacube` library does **not** use Dask when loading data.
This means that when `dc.load()` is used, all data matching the load query is requested and loaded into memory.

For very large areas or long time spans, this can cause the Jupyter notebook to crash.

For more information on how to use `dc.load()`, see the Piksel [Opening data](http://43.218.254.133:8888/notebooks/panduan-pengguna/03_Membuka_data.ipynb) notebook.
Below, we show a standard load example:

In [12]:
data = dc.load(product='s2_l2a',
               measurements=['red', 'green', 'blue'],
               x=(107.05926, 107.11926),
               y=(-6.49799, -6.43799),
               output_crs='EPSG:32748',
               resolution=(-10, 10),
               time=("2024-07-11", "2024-07-13"))

data

Querying product Product(name='s2_l2a', id_=9)
Resolution should be provided as a single int or float, or the axis order specified using odc.geo.resxy_ or odc.geo.resyx_

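To get a feel for when a standard load becomes a problem, the memory it needs can be estimated before loading anything. The sketch below is a rough back-of-the-envelope calculation; the function name `estimate_load_bytes` and the example query (3 bands, 73 time steps, 10,000 x 10,000 pixels) are hypothetical and not taken from this notebook.

```python
def estimate_load_bytes(n_times, n_y, n_x, n_bands, bytes_per_value=2):
    """Estimate the in-memory size of a fully loaded dataset.

    bytes_per_value=2 assumes 16-bit integer measurements.
    """
    return n_times * n_y * n_x * n_bands * bytes_per_value


# Hypothetical query: 3 bands, 73 time steps, 10,000 x 10,000 pixels
n_bytes = estimate_load_bytes(n_times=73, n_y=10_000, n_x=10_000, n_bands=3)
n_gib = n_bytes / 1024 ** 3
print(f"Estimated size: {n_gib:.1f} GiB")
```

Under these assumptions the estimate is roughly 41 GiB, which would not fit in the memory of a typical notebook environment if loaded in one go.
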



## Enabling Dask
One of Dask's key features is that it can use multiple CPU cores to speed up computation, known as distributed computing.
This is especially useful when you need to run many calculations over large datasets.

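The pattern behind this can be sketched with the Python standard library: split the data into chunks, let several workers process the chunks independently, then combine the partial results. This is only an illustration of the idea, not how Dask is implemented; threads stand in for Dask workers, and `split_into_chunks`, `values` and the chunk size are made up for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(values, chunk_size):
    """Split a list into consecutive pieces of at most chunk_size items."""
    return [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]


values = list(range(1, 1001))            # stand-in for a large dataset
chunks = split_into_chunks(values, 250)  # four chunks of 250 values

# Each worker sums one chunk; the partial results are combined at the end
with ThreadPoolExecutor(max_workers=2) as pool:
    partial_sums = list(pool.map(sum, chunks))

total = sum(partial_sums)
```
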
To set up distributed computing with Dask, first create a Dask client using the function below:

In [5]:
create_local_dask_cluster()

Perhaps you already have a cluster running?
Hosting the HTTP server on port 45505 instead


* **Client**
  * Connection method: Cluster object
  * Cluster type: distributed.LocalCluster
  * Dashboard: /proxy/45505/status
* **Cluster**
  * Dashboard: /proxy/45505/status
  * Workers: 1
  * Total threads: 2
  * Total memory: 4.84 GiB
  * Status: running
  * Using processes: True
* **Scheduler**
  * Comm: tcp://127.0.0.1:41171
  * Workers: 1
  * Dashboard: /proxy/45505/status
  * Total threads: 2
  * Started: Just now
  * Total memory: 4.84 GiB
* **Worker**
  * Comm: tcp://127.0.0.1:37691
  * Total threads: 2
  * Dashboard: /proxy/37325/status
  * Memory: 4.84 GiB
  * Nanny: tcp://127.0.0.1:39769
  * Local directory: /tmp/dask-scratch-space/worker-qn6fkxav




An output display will appear, showing information about the `Client` and the `Cluster`.
For now, the most important part is the link after **Dashboard:**, which looks like [/user/<email>/proxy/8787/status](#), where [\<email\>](#) is your email for Piksel.

This link lets you watch the progress of any computations that are running. There are two ways to view the dashboard:

1. Click the link, which opens a new tab in your browser.
2. Set up the dashboard inside the Piksel environment.

The next section covers the second option.



### Dask Dashboard in Piksel
On the menu bar on the left, click the Dask icon, as shown below:

![Image](../Supplementary_data/08_parallel_processing_with_dask/dask.png)

Copy and paste the **Dashboard** link from the Client printout into the DASK DASHBOARD URL text box:

![Image](../Supplementary_data/08_parallel_processing_with_dask/dask_url_filled.png)

If the URL is valid, the buttons change from grey to orange.
Click the orange **PROGRESS** button in the Dask panel, which opens a new tab inside the Piksel environment.

To view the Dask window and your active notebook at the same time, drag the new Dask Progress tab to the bottom of the screen.

Now, whenever you run a computation with Dask, you will see its progress in the new Dask window.


## Lazy load
When using Dask, the `dc.load()` function switches from loading the data immediately to "lazy-loading" it.
This means the data is only read when a computation actually needs it, which can save time and memory.

Lazy-loading changes the data structure returned by `dc.load()`: the returned `xarray.Dataset` is made up of `dask.array` objects.

To request lazy-loaded data, add the `dask_chunks` parameter to your `dc.load()` call:

In [10]:
lazy_data = dc.load(product='s2_l2a',
                    measurements=['red', 'green', 'blue'],
                    x=(107.05926, 107.11926),
                    y=(-6.49799, -6.43799),
                    output_crs='EPSG:32748',
                    resolution=(-10, 10),
                    time=("2024-07-11", "2024-07-13"),
                    dask_chunks={'time': 1, 'x': 100, 'y': 100})

lazy_data

Each of the three variables (`red`, `green`, `blue`) is shown with the same Dask array summary:

|  | Array | Chunk |
|---|---|---|
| Bytes | 870.23 kiB | 19.53 kiB |
| Shape | (1, 667, 668) | (1, 100, 100) |
| Dask graph | 49 chunks in 1 graph layer | |
| Data type | uint16 numpy.ndarray | |


The function should return much faster, as it is not reading any data from disk.

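The behaviour described above can be imitated with a small toy class. This is a sketch of the lazy-loading idea only, not how `datacube` or Dask implement it; the class `LazyArray`, the function `read_from_disk` and the pixel values are hypothetical.

```python
class LazyArray:
    """Toy stand-in for a lazily loaded array (not a real Dask object)."""

    def __init__(self, read_func):
        self._read_func = read_func   # remembers how to get the data
        self._values = None           # nothing has been read yet

    @property
    def is_loaded(self):
        return self._values is not None

    def load(self):
        """Read the data on first use, then reuse the stored values."""
        if self._values is None:
            self._values = self._read_func()
        return self._values


read_calls = []


def read_from_disk():
    """Pretend to read pixel values from disk and record that it happened."""
    read_calls.append("read")
    return [[10967, 11105], [10773, 10660]]


lazy_red = LazyArray(read_from_disk)   # returns immediately, no read yet
reads_before = len(read_calls)
values = lazy_red.load()               # the read happens here
reads_after = len(read_calls)
```

Creating `lazy_red` is cheap because nothing is read; the cost is only paid when `.load()` is called.
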
### Dask chunks

After adding the `dask_chunks` parameter to `dc.load()`, the lazy-loaded data contains `dask.array` objects with the `chunksize` listed.
The `chunksize` should match the `dask_chunks` parameter originally passed to `dc.load()`.

Dask works by breaking up large datasets into chunks, which can be read individually. You may specify the number of pixels in each chunk for each dataset dimension.

For example, we passed the following chunk definition to `dc.load()`:
```
dask_chunks = {'time': 1, 'x': 100, 'y': 100}
```

This definition tells Dask to cut the data into chunks containing 100 pixels in the `x` and `y` dimensions and one measurement in the `time` dimension.
For Piksel, we always set `'time': 1` in the `dask_chunks` definition, since the data files only span a single time.

If a chunk size is not provided for a given dimension, or if it is set to -1, then the chunk will be set to the size of the array in that dimension.
This means all the data in that dimension will be loaded at once, rather than being broken into smaller chunks.

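The number of chunks that a given definition produces follows from dividing each dimension by its chunk size and rounding up. The sketch below works this out for the array shape reported in the output above; the helper name `chunks_per_dimension` is made up for the example.

```python
import math


def chunks_per_dimension(shape, chunk_sizes):
    """Return how many chunks each dimension is split into.

    A chunk size of -1 means "do not split this dimension".
    """
    counts = []
    for n_pixels, chunk in zip(shape, chunk_sizes):
        if chunk == -1:
            chunk = n_pixels
        counts.append(math.ceil(n_pixels / chunk))
    return tuple(counts)


# Array shape (time, y, x) and chunk sizes (time, y, x) used in this notebook
shape = (1, 667, 668)
counts = chunks_per_dimension(shape, (1, 100, 100))
total_chunks = math.prod(counts)

# Setting x to -1 keeps the whole x dimension in a single chunk
counts_unsplit_x = chunks_per_dimension(shape, (1, 100, -1))
```

Here `counts` is `(1, 7, 7)`, giving the 49 chunks listed in the Dask graph row. The last chunk along `y` and `x` is smaller than 100 pixels because 667 and 668 are not multiples of 100.
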
### Viewing Dask chunks

To get a visual intuition for how the data has been broken into chunks, we can use the `.data` attribute provided by `xarray`.
This attribute can be used on individual measurements from the lazy-loaded data.
When used in a Jupyter Notebook, it provides a table summarising the size of individual chunks and the number of chunks needed.

An example is shown below, using the `red` measurement from the lazy-loaded data:

In [11]:
lazy_data.red.data

|  | Array | Chunk |
|---|---|---|
| Bytes | 870.23 kiB | 19.53 kiB |
| Shape | (1, 667, 668) | (1, 100, 100) |
| Dask graph | 49 chunks in 1 graph layer | |
| Data type | uint16 numpy.ndarray | |


From the Chunk column of the table, we can see that the data has been broken into 49 chunks, with each full chunk having a shape of `(1 time, 100 pixels, 100 pixels)` and taking up 19.53 kiB of memory (chunks along the array edges are smaller).
Comparing this with the Array column, using Dask means that we can load 49 pieces of at most 19.53 kiB each, rather than one piece of 870.23 kiB.

This is valuable when it comes to working with large areas or time-spans, as the entire array may not always fit into the memory available.
Breaking large datasets into chunks and loading chunks one at a time means that you can do computations over large areas without crashing the Piksel environment.

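The sizes in the table can be reproduced by hand: the number of values in the array (or chunk) multiplied by the bytes per value. The sketch below assumes 2 bytes per value, which matches the `uint16` data type shown in the table; the helper name `size_in_kib` is made up for the example.

```python
BYTES_PER_VALUE = 2  # uint16 values take 2 bytes each


def size_in_kib(shape, bytes_per_value=BYTES_PER_VALUE):
    """Return the memory needed for an array of the given shape, in kiB."""
    n_values = 1
    for n in shape:
        n_values *= n
    return n_values * bytes_per_value / 1024


array_kib = size_in_kib((1, 667, 668))   # the whole array
chunk_kib = size_in_kib((1, 100, 100))   # one full chunk
print(f"Array: {array_kib:.2f} kiB, chunk: {chunk_kib:.2f} kiB")
```

The results agree with the 870.23 kiB and 19.53 kiB reported in the Bytes row.
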
## Loading lazy data

When working with lazy-loaded data, you have to explicitly ask Dask to read and load the data when you want to use it.
Until you do this, the lazy-loaded dataset only knows where the data is, not its values.

To load the data from disk, call `.load()` on the `DataArray` or `Dataset`.
If you opened the Dask progress window, you should see the computation proceed there.

In [12]:
loaded_data = lazy_data.load()

In [13]:
loaded_data

The Dask arrays constructed by the lazy load
```
red      (time, y, x) uint16 dask.array<chunksize=(1, 100, 100), meta=np.ndarray>
```
have now been replaced with actual numbers:

```
red      (time, y, x) uint16 10967 11105 10773 10660 ... 12431 12410 12313
```

After applying the `.load()` command, the lazy-loaded data is the same as the data loaded from the first query.



## Lazy operations

In addition to breaking data into smaller chunks that fit in memory, Dask has another advantage: it can track how you want to work with the data, then perform only the necessary operations later.

We'll now explore how to do this by calculating the normalised difference vegetation index (NDVI) for our data.
To do this, we'll perform the lazy-load operation again, this time adding the near-infrared band (`nir`) to the `dc.load()` command:

In [15]:
lazy_data = dc.load(product='s2_l2a',
                    measurements=['red', 'green', 'blue', 'nir'],
                    x=(107.05926, 107.11926),
                    y=(-6.49799, -6.43799),
                    output_crs='EPSG:32748',
                    resolution=(-10, 10),
                    time=("2024-07-11", "2024-07-13"),
                    dask_chunks={'time': 1, 'x': 100, 'y': 100})

lazy_data

Each of the four variables (`red`, `green`, `blue`, `nir`) is shown with the same Dask array summary:

|  | Array | Chunk |
|---|---|---|
| Bytes | 870.23 kiB | 19.53 kiB |
| Shape | (1, 667, 668) | (1, 100, 100) |
| Dask graph | 49 chunks in 1 graph layer | |
| Data type | uint16 numpy.ndarray | |

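The idea of recording operations first and running only what is needed later can be sketched as a tiny task graph. The example below is a toy evaluator written for illustration, not Dask's scheduler; the graph keys, the reflectance values and the function `evaluate` are hypothetical.

```python
from operator import add, sub, truediv

# A toy task graph: each key maps either to a value or to
# a tuple of (function, name of input, name of input)
graph = {
    "red": 1000.0,
    "nir": 3000.0,
    "green": 800.0,                     # defined but never needed below
    "diff": (sub, "nir", "red"),
    "sum": (add, "nir", "red"),
    "ndvi": (truediv, "diff", "sum"),
}


def evaluate(graph, key, visited):
    """Compute `key`, recursively computing only the tasks it depends on."""
    visited.append(key)
    task = graph[key]
    if isinstance(task, tuple):
        func, *inputs = task
        return func(*(evaluate(graph, name, visited) for name in inputs))
    return task


visited = []
ndvi = evaluate(graph, "ndvi", visited)
```

Asking for `ndvi` walks back through `diff` and `sum` to `nir` and `red`, while `green` is never touched. This toy version recomputes shared inputs; a real scheduler would typically reuse them.
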

### Adding more tasks

The power of this method comes from chaining tasks together before loading the data.
This is because Dask will only load the data that is required by the final operation in the chain.

We can demonstrate this by requesting only a small portion of the red band from the lazy-loaded data:

In [33]:
extract_from_red = lazy_data.red[:, 100:200, 100:200]

This adds a new `getitem` task to the task graph, and it only applies to the one chunk that covers the requested window.
If we call `.load()` on the `extract_from_red` Dask array, Dask traces the operation back through the graph to find only the relevant data.
This can save both memory and time.

We can establish that the above operation yields the same result as loading the data without Dask and subsetting it by running the command below:

In [34]:
lazy_red_subset = extract_from_red.load()
data_red_subset = data.red[:, 100:200, 100:200]

print(f"The loaded arrays match: {lazy_red_subset.equals(data_red_subset)}")

The loaded arrays match: True

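To see why the slice `100:200` only needs one chunk, the chunk indices that a slice overlaps can be worked out with integer division. This is a minimal sketch assuming the 100-pixel chunks used in this notebook; the helper name `chunks_touched` is made up for the example.

```python
def chunks_touched(start, stop, chunk_size):
    """Return the indices of the chunks that overlap pixels [start, stop)."""
    first = start // chunk_size
    last = (stop - 1) // chunk_size
    return list(range(first, last + 1))


# Slice 100:200 along y and x, with 100-pixel chunks
y_chunks = chunks_touched(100, 200, 100)
x_chunks = chunks_touched(100, 200, 100)
n_needed = len(y_chunks) * len(x_chunks)

# A slice that is not aligned with the chunk grid overlaps more chunks
y_chunks_unaligned = chunks_touched(150, 250, 100)
```

Because the slice lines up with the chunk grid, `n_needed` is 1 out of the 49 chunks in the band. A window of the same size starting at pixel 150 would overlap two chunks along that dimension.
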

Since the arrays are the same, it is worth using lazy-loading to chain operations together, then calling `.load()` when you're ready to get the answer.
This saves time and memory, since Dask will only load the input data that is required to get the final output.
In this example, the lazy-load only needed to load a small section of the `red` band, whereas the original load to get `data` had to load the `red`, `green` and `blue` bands, then subset the `red` band, meaning time and memory were spent loading data that wasn't used.

### Multiple tasks

The power of using lazy-loading in Dask is that you can continue to chain operations together until you are ready to get the answer.

Here, we chain multiple steps together to calculate a new band for our array. Specifically, we use the `red` and `nir` bands to calculate the normalised difference vegetation index (NDVI):

In [35]:
band_diff = lazy_data.nir - lazy_data.red
band_sum = lazy_data.nir + lazy_data.red

lazy_data['ndvi'] = band_diff / band_sum

Doing this adds the new `ndvi` Dask array to the `lazy_data` dataset:

In [36]:
lazy_data

The four band variables (`red`, `green`, `blue`, `nir`) are shown with this Dask array summary:

|  | Array | Chunk |
|---|---|---|
| Bytes | 870.23 kiB | 19.53 kiB |
| Shape | (1, 667, 668) | (1, 100, 100) |
| Dask graph | 49 chunks in 1 graph layer | |
| Data type | uint16 numpy.ndarray | |

The new `ndvi` variable is shown with this summary:

|  | Array | Chunk |
|---|---|---|
| Bytes | 3.40 MiB | 78.12 kiB |
| Shape | (1, 667, 668) | (1, 100, 100) |
| Dask graph | 49 chunks in 5 graph layers | |
| Data type | float64 numpy.ndarray | |

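NDVI is defined as (NIR - Red) / (NIR + Red), so it is negative wherever `red` exceeds `nir`. One thing worth checking, given that the bands are reported as `uint16`: unsigned integers cannot hold a negative difference and wrap around instead. The pure-Python sketch below imitates 16-bit unsigned arithmetic to illustrate the effect; the pixel values and helper names are hypothetical, and whether this matters for a given analysis depends on the data.

```python
UINT16_RANGE = 2 ** 16  # uint16 holds values 0..65535


def ndvi_uint16(nir, red):
    """NDVI where the difference and sum wrap around like uint16 values."""
    band_diff = (nir - red) % UINT16_RANGE
    band_sum = (nir + red) % UINT16_RANGE
    return band_diff / band_sum


def ndvi_float(nir, red):
    """NDVI computed after converting both bands to floating point."""
    nir, red = float(nir), float(red)
    return (nir - red) / (nir + red)


# Hypothetical pixel where red reflectance exceeds near-infrared
nir, red = 500, 1500
wrapped = ndvi_uint16(nir, red)
expected = ndvi_float(nir, red)
```

With these values the floating-point result is -0.5, while the wrapped version falls far outside the valid NDVI range of -1 to 1. Casting the bands to a floating-point type before the arithmetic, for example with `.astype('float32')`, is a common way to avoid this.
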

Finally, we can calculate the NDVI values by calling the `.load()` command.
We'll store the result in the `ndvi_load` variable:

In [37]:
ndvi_load = lazy_data.ndvi.load()
ndvi_load

Note that running the `.load()` command also modifies the `ndvi` entry in the `lazy_data` dataset:

In [38]:
lazy_data

The four band variables (`red`, `green`, `blue`, `nir`) are still shown as Dask arrays with this summary:

|  | Array | Chunk |
|---|---|---|
| Bytes | 870.23 kiB | 19.53 kiB |
| Shape | (1, 667, 668) | (1, 100, 100) |
| Dask graph | 49 chunks in 1 graph layer | |
| Data type | uint16 numpy.ndarray | |

You can see that `ndvi` now holds actual numbers, whereas all the other variables are still Dask arrays.

### Keeping variables as Dask arrays
If you want to calculate the NDVI values but leave `ndvi` as a Dask array in `lazy_data`, you can use the `.compute()` command instead.

To demonstrate this, we first redefine the `ndvi` variable so that it becomes a Dask array again:

In [39]:
lazy_data['ndvi'] = band_diff / band_sum
lazy_data

The four band variables (`red`, `green`, `blue`, `nir`) are shown with this Dask array summary:

|  | Array | Chunk |
|---|---|---|
| Bytes | 870.23 kiB | 19.53 kiB |
| Shape | (1, 667, 668) | (1, 100, 100) |
| Dask graph | 49 chunks in 1 graph layer | |
| Data type | uint16 numpy.ndarray | |

The redefined `ndvi` variable is a Dask array again:

|  | Array | Chunk |
|---|---|---|
| Bytes | 3.40 MiB | 78.12 kiB |
| Shape | (1, 667, 668) | (1, 100, 100) |
| Dask graph | 49 chunks in 5 graph layers | |
| Data type | float64 numpy.ndarray | |


Now, we perform the same steps as before to calculate NDVI, but use `.compute()` instead of `.load()`:

In [40]:
ndvi_compute = lazy_data.ndvi.compute()
ndvi_compute

You can see that the values have been calculated, but as shown below, the `ndvi` variable in `lazy_data` is still a Dask array.

In [41]:
lazy_data

| Variable | Array size | Chunk size | Array shape | Chunk shape | Dask graph | Data type |
|---|---|---|---|---|---|---|
| Measurement bands (each) | 870.23 kiB | 19.53 kiB | (1, 667, 668) | (1, 100, 100) | 49 chunks in 1 graph layer | uint16 numpy.ndarray |
| `ndvi` | 3.40 MiB | 78.12 kiB | (1, 667, 668) | (1, 100, 100) | 49 chunks in 5 graph layers | float64 numpy.ndarray |
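If you would rather check this programmatically than read the output, the sketch below shows one way to do so. It does not use the notebook's data: `demo` is a small hypothetical stand-in for `lazy_data`, built from constant arrays, and the band names `red` and `nir` are assumptions made for illustration. It relies on the `chunks` attribute of an xarray `DataArray`, which is `None` when the data is held in memory rather than in a Dask array.

```python
import dask.array as da
import numpy as np
import xarray as xr

# Hypothetical stand-in for `lazy_data`: two small Dask-backed bands.
shape, chunks = (1, 6, 6), (1, 3, 3)
dims = ("time", "y", "x")
demo = xr.Dataset(
    {
        "red": (dims, da.full(shape, 1000, chunks=chunks, dtype="uint16")),
        "nir": (dims, da.full(shape, 3000, chunks=chunks, dtype="uint16")),
    }
)

# Cast to float before subtracting so uint16 values cannot wrap around.
red = demo.red.astype("float64")
nir = demo.nir.astype("float64")
demo["ndvi"] = (nir - red) / (nir + red)

# .compute() returns a new, in-memory DataArray ...
ndvi_values = demo.ndvi.compute()

# ... while every variable stored in the dataset is still lazy.
is_lazy = {name: var.chunks is not None for name, var in demo.data_vars.items()}
print(is_lazy)
print(type(ndvi_values.data).__name__)
```

The same dictionary comprehension should work on any Dask-backed dataset, so it can serve as a quick check of which variables have already been loaded.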


Using `.compute()` lets you calculate intermediate steps and store the results without modifying the original Dask dataset or array.
However, be careful when using `.compute()`: it can cause confusion about what you have and have not modified, and it can lead to the same quantity being computed more than once.
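The point about repeated computation can be made concrete with a small sketch. It is not part of the original workflow: the function `slow_scale` and the counter `calls` are hypothetical helpers introduced only to count how often Dask runs a step, and the synchronous scheduler is chosen here just to keep that count deterministic.

```python
import dask
import dask.array as da
import xarray as xr

calls = {"n": 0}

def slow_scale(block):
    # Hypothetical processing step; counts one call per chunk it processes.
    calls["n"] += 1
    return block * 2.0

# A 4 x 4 array split into four 2 x 2 chunks, wrapped in an xarray Dataset.
scaled = da.ones((4, 4), chunks=(2, 2)).map_blocks(slow_scale, dtype="float64")
lazy = xr.Dataset({"scaled": (("y", "x"), scaled)})

calls["n"] = 0  # ignore any call Dask made while building the graph

with dask.config.set(scheduler="synchronous"):
    first = lazy.scaled.compute()   # runs slow_scale on all four chunks
    second = lazy.scaled.compute()  # runs it on all four chunks again

calls_after_two_computes = calls["n"]
print(calls_after_two_computes)
```

Because the dataset stays lazy, each `.compute()` call re-runs the whole task graph. If you need the values more than once, keep the returned result (`first` here) and reuse it rather than calling `.compute()` again.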

## Further Resources

For further reading on how Dask works, and how it is used by xarray, see these resources:

 * http://xarray.pydata.org/en/stable/dask.html
 * https://dask.readthedocs.io/en/latest/
 * http://stephanhoyer.com/2015/06/11/xray-dask-out-of-core-labeled-arrays/

### Other notebooks
This is the last notebook in the beginner's guide; if anything was unclear, we recommend revisiting the relevant notebook:

1. [Jupyter Notebooks](01_Jupyter_notebooks.ipynb)
2. [Products and Measurements](02_Products_and_measurements.ipynb)
3. [Loading data](03_Loading_data.ipynb)
4. [Plotting](04_Plotting.ipynb)
5. [Performing a basic analysis](05_Basic_analysis.ipynb)
6. [Introduction to numpy](06_Intro_to_numpy.ipynb)
7. [Introduction to xarray](07_Intro_to_xarray.ipynb)
8. **Parallel processing with Dask (this notebook)**

Once you have completed the above eight tutorials, join advanced users in exploring:

* The "Datasets" directory in the repository, where you can explore DE Africa products in depth.
* The "Frequently used code" directory, which contains a recipe book of common techniques and methods for analysing DE Africa data.
* The "Real-world examples" directory, which provides more complex workflows and analysis case studies.

***

## Additional information

**License:** The code in this notebook is licensed under the [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0). 
Digital Earth Africa data is licensed under the [Creative Commons by Attribution 4.0](https://creativecommons.org/licenses/by/4.0/) license.

**Contact:** If you need assistance, please post a question on the [Open Data Cube Slack channel](http://slack.opendatacube.org/) or on the [GIS Stack Exchange](https://gis.stackexchange.com/questions/ask?tags=open-data-cube) using the `open-data-cube` tag (you can view previously asked questions [here](https://gis.stackexchange.com/questions/tagged/open-data-cube)).
If you would like to report an issue with this notebook, you can file one on [Github](https://github.com/digitalearthafrica/deafrica-sandbox-notebooks).

**Compatible datacube version:** 

In [21]:
print(datacube.__version__)

1.8.15


**Last Tested:** 

In [22]:
from datetime import date
print(date.today())

2023-08-11
