# **Generating Data** with NumPy

In [12]:
import numpy as np

## **Non-Random** Values

### **np.empty** ( )

The fastest way to create an N-dimensional array. However, the values are not initialized: the array simply holds whatever was already in the allocated memory. They are often 0, but they can be anything (in the run below they happened to be leftover 1s).

* **shape** sets the dimensions of the array  
* **dtype** sets the data type *(optional)*

In [13]:
array_empty = np.empty(shape = (2,3))
array_empty

array([[1., 1., 1.],
       [1., 1., 1.]])
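Because `np.empty` does not initialize memory, its contents should only be read after every element has been written. A minimal sketch of that allocate-then-fill pattern (the name `buffer` is only illustrative):

```python
import numpy as np

# Allocate a 2x3 float array without initializing it; its contents are leftover memory
buffer = np.empty(shape=(2, 3), dtype=np.float64)

# Overwrite every element before reading, so the leftover values never leak out
buffer[:] = np.arange(6).reshape(2, 3)
print(buffer)
# [[0. 1. 2.]
#  [3. 4. 5.]]
```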

### **np.zeros** ( )

Creates an array filled entirely with 0 (as the name suggests). It takes the same arguments as np.empty.

In [14]:
# zeros
array_0s = np.zeros(shape  = (2,3))
array_0s

array([[0., 0., 0.],
       [0., 0., 0.]])

In [15]:
array_0s = np.zeros(shape = (2,3), dtype = np.int8) 
array_0s

array([[0, 0, 0],
       [0, 0, 0]], dtype=int8)
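The `dtype` argument changes more than how the values are printed: it also sets how much memory each element uses. A quick check with the `nbytes` attribute:

```python
import numpy as np

# Same shape, different dtypes: the dtype decides how many bytes each element takes
zeros_float = np.zeros(shape=(2, 3))                # default float64, 8 bytes per element
zeros_int8 = np.zeros(shape=(2, 3), dtype=np.int8)  # int8, 1 byte per element

print(zeros_float.dtype, zeros_float.nbytes)  # float64 48
print(zeros_int8.dtype, zeros_int8.nbytes)    # int8 6
```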

### **np.ones** ( )

Creates an array filled entirely with 1 (as the name suggests). It takes the same arguments as np.empty.

In [16]:
# ones
array_1s = np.ones(shape  = (2,3))
array_1s

array([[1., 1., 1.],
       [1., 1., 1.]])

### **np.full** ( )

Creates an array in which every element equals the value we pass to the **fill_value** parameter.

* **shape** sets the dimensions of the array  
* **fill_value** sets the value every element of the array takes

In [17]:
array_full = np.full(shape = (2,3), fill_value = 2) # One additional mandatory argument: fill_value (a scalar here)
array_full

array([[2, 2, 2],
       [2, 2, 2]])
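Unless `dtype` is given, `np.full` infers the dtype from `fill_value`, which is why the array above holds integers. The sketch below also checks that filling with a constant gives the same values as scaling an array of ones:

```python
import numpy as np

# np.full infers the dtype from fill_value unless dtype is passed explicitly
full_int = np.full(shape=(2, 3), fill_value=2)
full_float = np.full(shape=(2, 3), fill_value=2, dtype=np.float64)

# Filling with a constant matches multiplying an array of ones by that constant
scaled_ones = np.ones(shape=(2, 3)) * 2

print(full_int.dtype.kind)                      # i  (a signed integer type)
print(np.array_equal(full_float, scaled_ones))  # True
```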

In [18]:
array_full = np.full(shape = (2,3), fill_value = 'Three-Six-Five')
array_full

# It can also be filled with a string

array([['Three-Six-Five', 'Three-Six-Five', 'Three-Six-Five'],
       ['Three-Six-Five', 'Three-Six-Five', 'Three-Six-Five']],
      dtype='<U14')

### **np.arange** ( )

Creates an array with the values from a **range** that we define.

* **start** sets where the range begins *(optional, defaults to 0)*  
* **stop** sets where the range ends; this value itself is excluded. If only one value is passed, it is treated as stop  
* **step** sets the spacing between consecutive values *(optional, defaults to 1)*

In [19]:
array_rng = np.arange(30)
array_rng

# Creates an ndarray with the values in this range.
# Starts at 0 and stops at 29 (the stop value 30 is excluded)

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29])

In [20]:
array_rng = np.arange(stop = 30)
array_rng

# The only mandatory argument is "stop", rather than "start".
# If we specify only a stop, the function assumes the array starts from 0.

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29])

In [21]:
array_rng = np.arange(start = 0, stop =  30)
array_rng

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29])

In [22]:
array_rng = np.arange(start = 0, stop =  30, step = 2.5)
array_rng

# The step does not have to be an integer; a float step produces a float array.

array([ 0. ,  2.5,  5. ,  7.5, 10. , 12.5, 15. , 17.5, 20. , 22.5, 25. ,
       27.5])
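The number of values `np.arange` produces is `ceil((stop - start) / step)`. For non-integer steps, NumPy's own documentation suggests `np.linspace`, which takes a number of points instead of a step. A sketch reproducing the array above both ways:

```python
import math
import numpy as np

# np.arange yields ceil((stop - start) / step) values; stop itself is excluded
float_range = np.arange(0, 30, 2.5)
print(len(float_range), math.ceil((30 - 0) / 2.5))  # 12 12

# np.linspace takes the number of points and includes the endpoint by default
same_points = np.linspace(0, 27.5, num=12)
print(np.allclose(float_range, same_points))  # True
```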

If the dtype is set explicitly, the result may change as well.  
When using a **decimal step**, make sure the **dtype is a float type** (for example float32 or float64).

In [23]:
# array_rng = np.arange(start = 0, stop = 30, step = 2.5, dtype = np.float32)  # float dtype: same values as the previous cell
array_rng = np.arange(start = 0, stop =  30, step = 2.5, dtype = np.int32)
array_rng

# Why is the result different?
# NumPy first computes the length as ceil((30 - 0) / 2.5) = 12, exactly as with a float dtype.
# It then casts the first two values, 0 and 0 + 2.5, to int32 (giving 0 and 2)
# and uses their difference, 2, as the actual step. Hence 0, 2, 4, ..., 22.

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18, 20, 22])
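The sketch below contrasts passing `dtype` to `np.arange` with casting the float result afterwards. The first follows the "cast the first two values, use their difference as the step" behavior; the second truncates each float value individually:

```python
import numpy as np

start, stop, step = 0, 30, 2.5

# dtype applied inside arange: length is still ceil(30 / 2.5) = 12,
# but the step actually used is int32(start + step) - int32(start) = 2
during = np.arange(start, stop, step, dtype=np.int32)

# dtype applied afterwards: generate floats first, then truncate each one
after = np.arange(start, stop, step).astype(np.int32)

print(during)  # [ 0  2  4  6  8 10 12 14 16 18 20 22]
print(after)   # [ 0  2  5  7 10 12 15 17 20 22 25 27]
```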

## **"_like"** functions

The non-random functions above each have a **_like** counterpart: np.empty_like, np.zeros_like, np.ones_like and np.full_like.  
The array they generate takes its shape (and, unless overridden, its dtype) from the prototype array passed in.

In [24]:
matrix_A = np.array([[1,0,9,2,2],[3,23,4,5,1],[0,2,3,4,1]])
matrix_A

# matrix_A serves as the prototype to copy
# It has 3 rows and 5 columns

array([[ 1,  0,  9,  2,  2],
       [ 3, 23,  4,  5,  1],
       [ 0,  2,  3,  4,  1]])

In [25]:
array_empty_like = np.empty_like(matrix_A)    
array_empty_like

# The shape and dtype match the prototype. 
# We could override them by passing dtype or shape (but then there is little reason to use empty_like).
# np.empty_like is the "_like" version of np.empty, so the values are uninitialized memory

array([[3801155, 5570652, 6619251, 7536754, 4259932],
       [7667827, 6029427, 7340097, 4456560, 7602273],
       [6029409, 7274572, 6357091, 6029420, 6619220]])

In [26]:
array_0s_like = np.zeros_like(matrix_A)    
array_0s_like

# np.zeros_like is the "_like" version of np.zeros
# np.ones_like and np.full_like work the same way.

array([[0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0]])
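A short sketch of the other two `_like` functions, reusing the same `matrix_A` prototype. It shows a `dtype` override, and a subtle point: `full_like` keeps the prototype's integer dtype, so a float fill value gets truncated:

```python
import numpy as np

matrix_A = np.array([[1, 0, 9, 2, 2], [3, 23, 4, 5, 1], [0, 2, 3, 4, 1]])

# ones_like copies the shape; here the dtype is overridden to float
ones_float = np.ones_like(matrix_A, dtype=np.float64)

# full_like copies shape AND dtype, so the float 2.7 is cast to the integer 2
full_truncated = np.full_like(matrix_A, 2.7)

print(ones_float.shape, ones_float.dtype)  # (3, 5) float64
print(full_truncated[0])                   # [2 2 2 2 2]
```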

To make every value **negative**, simply put a minus sign in front of the function call.

In [49]:
array_negative_1 = -np.ones_like(matrix_A)
array_negative_1

array([[-1, -1, -1, -1, -1],
       [-1, -1, -1, -1, -1],
       [-1, -1, -1, -1, -1]])
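The minus-sign trick gives the same result as filling with -1 directly. One caveat worth knowing: unsigned integer dtypes cannot hold negative numbers, so negating them wraps around instead:

```python
import numpy as np

matrix_A = np.array([[1, 0, 9, 2, 2], [3, 23, 4, 5, 1], [0, 2, 3, 4, 1]])

# Negating ones_like matches filling with -1 directly
negated = -np.ones_like(matrix_A)
filled = np.full_like(matrix_A, -1)
print(np.array_equal(negated, filled))  # True

# Caveat: uint8 has no negative values, so -1 wraps around to 255
unsigned = -np.ones(3, dtype=np.uint8)
print(unsigned)  # [255 255 255]
```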

## **Random** Generators

### Defining Random Generators

In [27]:
from numpy.random import Generator as gen
from numpy.random import PCG64 as pcg

# We import two classes from numpy.random: Generator and the PCG64 bit generator.
# Together they build the random generator used throughout this notebook.
# The aliases gen and pcg save us from typing the long names such as numpy.random.Generator.
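NumPy also provides `np.random.default_rng`, which builds a `Generator` backed by `PCG64` in a single call. A sketch checking that both routes produce the same numbers for the same seed:

```python
import numpy as np
from numpy.random import Generator, PCG64

# Generator(PCG64(seed)) built by hand, as in the cell above
manual = Generator(PCG64(365))

# default_rng(seed) returns a Generator backed by PCG64
shortcut = np.random.default_rng(365)

print(np.array_equal(manual.normal(size=5), shortcut.normal(size=5)))  # True
```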

In [28]:
# First create the random generator
array_RG = gen(pcg())

# Then draw an n-dimensional array from it
# array_RG.normal()            -> a single value
# array_RG.normal(size = 5)    -> a 1-D array of 5 values
array_RG.normal(size = (5,5))

# RG is short for Random Generator.

array([[-1.5270306 ,  1.52938251, -0.9143592 , -0.16089776,  0.65360852],
       [-1.69416951, -0.77208921, -1.08430799,  0.70093662,  0.60493441],
       [ 0.30702595,  0.97680142, -0.05729642, -0.88432862, -1.69535749],
       [ 1.2190002 ,  1.77185489, -1.08122465,  0.03056569, -0.41505336],
       [-0.50966204, -0.99348025,  1.81049372,  0.47336698, -1.61141271]])
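The `size` argument decides the shape of what comes back, and `loc` and `scale` set the mean and standard deviation of the normal distribution. A sketch with a seeded generator:

```python
import numpy as np
from numpy.random import Generator, PCG64

rng = Generator(PCG64(365))

single = rng.normal()          # no size: a single scalar value
vector = rng.normal(size=5)    # integer size: a 1-D array
matrix = rng.normal(loc=10, scale=2, size=(5, 5))  # tuple size: array of that shape

print(np.ndim(single), vector.shape, matrix.shape)  # 0 (5,) (5, 5)
```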

### Set a Seed

We can **set a seed** so that our random values don't change every time we re-run the code.  
We'll set the seed equal to 365.  
**A seed works like a code**: if we generate random values with seed 365 and someone else also uses seed 365, both of us get the same results.

In [29]:
array_RG = gen(pcg(seed = 365)) 
array_RG.normal(size = (5,5))

# Re-running this cell provides a consistent output, since the seed (with fixed starting values) is set. 

array([[-0.13640899,  0.09414431, -0.06300442,  1.05391641, -0.6866818 ],
       [-0.50922173, -0.7999526 ,  0.73041825,  0.08825439, -2.1177576 ],
       [ 0.65526774, -0.48095012, -0.5519114 , -0.58578662, -0.98257896],
       [ 1.12378166, -1.30984316, -0.04703774,  0.955272  ,  0.26071745],
       [-0.20023668, -1.50172484, -1.4929163 ,  0.96535084,  1.18694633]])

In [30]:
array_RG.normal(size = (5,5))

# The generator's state has moved on since the previous cell, so these values differ.
# They are still reproducible: re-creating the generator with seed 365 and running both cells in order gives the same two arrays again.

array([[-0.76065577,  1.48158358,  0.01200258, -0.06846959,  0.25301664],
       [-0.52640788,  0.79613109,  0.28203421,  1.80238008,  0.93932117],
       [-0.53693283, -0.26317689, -1.77723035,  1.14900013, -2.20733915],
       [ 1.54116775, -0.5124932 , -2.14564563,  1.98878673,  0.32208907],
       [-1.2651495 ,  3.2714633 ,  1.78650493, -0.20233675,  0.20427467]])
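A seeded generator yields a fixed *sequence*: each call continues where the previous one stopped. This sketch re-creates the generator with the same seed to show that the whole sequence repeats:

```python
import numpy as np
from numpy.random import Generator, PCG64

first_run = Generator(PCG64(365))
a1 = first_run.normal(size=(2, 2))
a2 = first_run.normal(size=(2, 2))  # state has advanced, so a2 differs from a1

second_run = Generator(PCG64(365))  # same seed, fresh state
b1 = second_run.normal(size=(2, 2))
b2 = second_run.normal(size=(2, 2))

print(np.array_equal(a1, b1), np.array_equal(a2, b2))  # True True
print(np.array_equal(a1, a2))                          # False
```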

### Generating Random **Integers**

* **low**: lowest possible value (inclusive). If **high** is omitted, this single value becomes the upper bound and the lower bound is 0  
* **high**: upper bound, excluded from the results unless endpoint = True is passed *(optional)*

In [31]:
array_RG = gen(pcg(seed = 365)) 
array_RG.integers(10, size = (5,5))

# Lower bound 0 (inclusive), upper bound 10 (exclusive)

array([[0, 7, 6, 7, 8],
       [6, 6, 2, 0, 6],
       [3, 0, 3, 7, 9],
       [1, 1, 8, 7, 4],
       [4, 8, 6, 4, 9]], dtype=int64)

In [32]:
array_RG = gen(pcg(seed = 365)) 
array_RG.integers(low = 10, high = 100, size = (5,5))

# Using both a lower bound (10, inclusive) and an upper bound (100, exclusive)

array([[18, 78, 64, 78, 84],
       [66, 67, 28, 10, 69],
       [45, 15, 37, 74, 96],
       [19, 21, 89, 73, 54],
       [53, 84, 66, 51, 92]], dtype=int64)
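Because `high` is excluded by default, simulating a six-sided die needs either `high=7` or `endpoint=True`. A sketch of both options, checked against the expected range:

```python
import numpy as np
from numpy.random import Generator, PCG64

rng = Generator(PCG64(365))

# Default: high is excluded, so high=7 is needed to include 6
die_exclusive = rng.integers(low=1, high=7, size=1000)

# endpoint=True makes high inclusive, so high=6 covers the same range
die_inclusive = rng.integers(low=1, high=6, size=1000, endpoint=True)

print(die_exclusive.min() >= 1, die_exclusive.max() <= 6)  # True True
print(die_inclusive.min() >= 1, die_inclusive.max() <= 6)  # True True
```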

### Generating Random **Probabilities**

The generated values lie between 0 and 1 (0 included, 1 excluded).  
The result is always of type **float**.

In [33]:
array_RG = gen(pcg(seed = 365)) 
array_RG.random(size = (5,5))

array([[0.75915734, 0.7662218 , 0.6291028 , 0.20336599, 0.66501486],
       [0.06559111, 0.71326309, 0.10812106, 0.87969046, 0.49405844],
       [0.82472673, 0.45652944, 0.07367232, 0.69628564, 0.36690736],
       [0.29787156, 0.4996155 , 0.4865245 , 0.62740703, 0.54952637],
       [0.64894629, 0.04411757, 0.7206516 , 0.84594003, 0.17159792]])
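Values from `random()` can be rescaled to any interval `[low, high)` with `low + (high - low) * u`. A sketch, with the bounds -5 and 5 chosen only for illustration:

```python
import numpy as np
from numpy.random import Generator, PCG64

rng = Generator(PCG64(365))

# random() samples uniformly from [0, 1); rescale to [low, high)
low, high = -5.0, 5.0
scaled = low + (high - low) * rng.random(size=(5, 5))

print(scaled.dtype, scaled.min() >= low, scaled.max() < high)  # float64 True True
```

The generator's `uniform(low, high, size)` method draws from the same kind of interval directly.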

### Generating Random **Choices**

The values are drawn from a list or tuple that we define.

* **a**: the list, tuple, or array whose elements fill the output of our random generator  
* **p** *(optional)*: probabilities between 0 and 1 that set how likely each value is to appear in the array; if omitted, every value is equally likely

In [34]:
array_RG.choice(matrix_A[0], size = (5,5))

# matrix_A[0] is the first row of matrix_A from the "_like" example: [1, 0, 9, 2, 2]
# Every generated value is drawn from that row

array([[2, 2, 2, 0, 2],
       [9, 0, 2, 2, 0],
       [0, 0, 1, 9, 9],
       [9, 1, 0, 9, 2],
       [1, 2, 9, 1, 9]])

In [35]:
array_RG = gen(pcg(seed = 365)) 
array_RG.choice([1,2,3,4,5], size = (5,5))

array([[1, 4, 4, 4, 5],
       [4, 4, 2, 1, 4],
       [2, 1, 2, 4, 5],
       [1, 1, 5, 4, 3],
       [3, 5, 4, 3, 5]])
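By default `choice` samples with replacement, so values can repeat. A sketch of `replace=False`, plus the shortcut of passing an integer `n` to sample from `np.arange(n)`:

```python
import numpy as np
from numpy.random import Generator, PCG64

rng = Generator(PCG64(365))

# replace=False: each element of the input can be picked at most once
draw = rng.choice([1, 2, 3, 4, 5], size=3, replace=False)
print(len(set(draw.tolist())))  # 3

# An integer n samples from np.arange(n), i.e. 0 to n - 1
digits = rng.choice(10, size=4, replace=False)
print(digits.min() >= 0, digits.max() <= 9)  # True True
```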

The number of **probabilities** must equal the number of values provided.  
In the example below there are 5 values (1 to 5), so **p** also has 5 entries.  
The entries of **p** must add up to 1.

In [36]:
array_RG = gen(pcg(seed = 365)) 
array_RG.choice((1,2,3,4,5), p = [0.1,0.1,0.1,0.1,0.6], size = (5,5))

# Read it as: value 1 appears with probability 10%
# and value 5 appears with probability 60%
# choice picks from a given set, optionally with weighted probabilities.

array([[5, 5, 5, 3, 5],
       [1, 5, 2, 5, 5],
       [5, 5, 1, 5, 4],
       [3, 5, 5, 5, 5],
       [5, 1, 5, 5, 2]])
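With many draws, the share of each value approaches its probability, and a `p` that does not sum to 1 is rejected with a `ValueError`. A sketch of both (100,000 draws is an arbitrary sample size):

```python
import numpy as np
from numpy.random import Generator, PCG64

rng = Generator(PCG64(365))
values = [1, 2, 3, 4, 5]
p = [0.1, 0.1, 0.1, 0.1, 0.6]

# The observed share of 5 should be close to its probability, 0.6
draws = rng.choice(values, p=p, size=100_000)
share_of_5 = np.mean(draws == 5)
print(abs(share_of_5 - 0.6) < 0.01)  # True

# Probabilities summing to 0.9 are rejected
rejected = False
try:
    rng.choice(values, p=[0.1, 0.1, 0.1, 0.1, 0.5], size=3)
except ValueError as err:
    rejected = True
    print("ValueError:", err)
```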

In [37]:
array_RG = gen(pcg(seed = 60)) 
array_RG.choice(("Heads","Tails"), p = [0.3,0.7], size = (3,3))

array([['Tails', 'Heads', 'Heads'],
       ['Tails', 'Tails', 'Tails'],
       ['Tails', 'Tails', 'Tails']], dtype='<U5')

### Generating Arrays From **Known Distributions**

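These draw from statistical distributions such as **Poisson**, **Binomial**, **Logistic**, **Exponential**, and others.  
Each distribution has its own parameters. Some have defaults (for example, poisson uses lam = 1.0 when none is given), while others, such as binomial's n and p, must be supplied.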

In [38]:
array_RG = gen(pcg(seed = 365)) 
array_RG.poisson(size = (5,5))

# The default Poisson distribution (lam = 1.0).

array([[2, 0, 1, 1, 2],
       [1, 1, 0, 1, 1],
       [1, 2, 1, 1, 0],
       [0, 1, 0, 2, 1],
       [0, 1, 0, 0, 2]], dtype=int64)

In [39]:
array_RG = gen(pcg(seed = 365)) 
array_RG.poisson(lam = 10,size = (5,5))

# Specifying lambda. 

array([[11, 12, 12, 14, 13],
       [ 9, 10, 11, 11,  8],
       [11,  8, 10,  9, 14],
       [ 7,  8,  9, 15, 15],
       [13,  8,  8,  7,  9]], dtype=int64)

In [40]:
array_RG = gen(pcg(seed = 365)) 
array_RG.binomial(n = 100, p = 0.4, size = (5,5))

# A binomial distribution with p = 0.4 and 100 trials. 

array([[42, 44, 30, 36, 45],
       [36, 41, 38, 42, 41],
       [35, 31, 35, 46, 29],
       [41, 41, 46, 34, 48],
       [45, 45, 45, 40, 43]], dtype=int64)

In [41]:
array_RG = gen(pcg(seed = 365)) 
array_RG.logistic(loc = 9, scale = 1.2, size = (5,5))

# A logistic distribution with a location = 9 and scale = 1.2.

array([[10.37767822, 10.42451863,  9.63404367,  7.36153427,  9.82286787],
       [ 5.81223125, 10.09354231,  6.46790532, 11.38740256,  8.97147918],
       [10.85844698,  8.79081317,  5.962079  ,  9.99560681,  8.34539118],
       [ 7.97105522,  8.9981544 ,  8.93530194,  9.6253307 ,  9.23850869],
       [ 9.73729284,  5.3090678 , 10.13723528, 11.04372782,  7.11078651]])
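A quick way to sanity-check these generators is to compare sample means with the theoretical means: `lam` for Poisson, `n * p` for binomial, and `loc` for logistic. A sketch with an arbitrary 100,000 draws each:

```python
import numpy as np
from numpy.random import Generator, PCG64

rng = Generator(PCG64(365))
n_draws = 100_000

poisson_mean = rng.poisson(lam=10, size=n_draws).mean()              # theory: lam = 10
binomial_mean = rng.binomial(n=100, p=0.4, size=n_draws).mean()      # theory: n * p = 40
logistic_mean = rng.logistic(loc=9, scale=1.2, size=n_draws).mean()  # theory: loc = 9

print(round(poisson_mean, 1), round(binomial_mean, 1), round(logistic_mean, 1))
```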

### **Applications** of Random Generators

#### Creating Tests

In [42]:
array_RG = gen(pcg(seed = 365)) 

array_column_1 = array_RG.normal(loc = 2, scale = 3, size = 1000)
array_column_2 = array_RG.normal(loc = 7, scale = 2, size = 1000)
array_column_3 = array_RG.logistic(loc = 11, scale = 3, size = 1000)
array_column_4 = array_RG.exponential(scale = 4, size = 1000)
array_column_5 = array_RG.geometric(p = 0.7, size = 1000)

# Create the individual columns of the dataset we're creating. 

In [43]:
random_test_data = np.array([array_column_1, array_column_2, array_column_3, array_column_4, array_column_5]).transpose()
random_test_data

# Combine the five arrays into one new array named random_test_data,
# then transpose it so it has 1000 rows and 5 columns
# (before transposing it has 5 rows and 1000 columns).

array([[ 1.59077303,  6.42174295, 10.14698427,  6.91500737,  1.        ],
       [ 2.28243293,  8.57902322, 15.93309953,  6.243605  ,  1.        ],
       [ 1.81098674,  5.17270135, -0.46878789,  2.44997251,  1.        ],
       ...,
       [ 0.1973629 ,  4.3465854 ,  2.66485989,  0.80935387,  1.        ],
       [-2.21015722,  8.2176402 , 12.69328115,  0.50644607,  2.        ],
       [ 2.91161235,  7.90337695, 11.79840961,  4.86816939,  1.        ]])

In [44]:
random_test_data.shape

(1000, 5)
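`np.column_stack` produces the same rows-by-columns layout without the explicit transpose. A sketch with two columns (the column choices are only illustrative):

```python
import numpy as np
from numpy.random import Generator, PCG64

rng = Generator(PCG64(365))
col_1 = rng.normal(loc=2, scale=3, size=1000)
col_2 = rng.exponential(scale=4, size=1000)

# Stack-then-transpose and column_stack both give one row per sample
via_transpose = np.array([col_1, col_2]).transpose()
via_column_stack = np.column_stack([col_1, col_2])

print(via_transpose.shape, np.array_equal(via_transpose, via_column_stack))  # (1000, 2) True
```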

#### **Saving** Tests to **CSV** file

In [45]:
np.savetxt("Random-Test-from-NumPy.csv", random_test_data, fmt = '%s', delimiter = ',')


# Save the array to an external file.

# "Random-Test-from-NumPy.csv" -> name of the file we create
# random_test_data -> the data we're exporting (saving to an external file)
# fmt = '%s' -> each value is written using its string representation
# delimiter = ',' -> values are separated by commas
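`fmt` can also fix the number of decimal places, and `header` adds a first line prefixed with `# `, which `np.loadtxt` skips by default. A round-trip sketch in a temporary directory (the file name `example.csv` and the small `data` array are hypothetical):

```python
import os
import tempfile
import numpy as np

data = np.array([[1.5, 2.25], [3.0, -4.125]])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example.csv")  # hypothetical file name
    # Six decimals per value; the header line is written as "# col_a,col_b"
    np.savetxt(path, data, fmt="%.6f", delimiter=",", header="col_a,col_b")
    # loadtxt ignores lines starting with "#" by default
    loaded = np.loadtxt(path, delimiter=",")

print(np.allclose(data, loaded))  # True
```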

#### **Import** Tests from **CSV** file

In [46]:
np.genfromtxt("Random-Test-from-NumPy.csv", delimiter = ',')

# Importing the data from the file we just created. 

array([[ 1.59077303,  6.42174295, 10.14698427,  6.91500737,  1.        ],
       [ 2.28243293,  8.57902322, 15.93309953,  6.243605  ,  1.        ],
       [ 1.81098674,  5.17270135, -0.46878789,  2.44997251,  1.        ],
       ...,
       [ 0.1973629 ,  4.3465854 ,  2.66485989,  0.80935387,  1.        ],
       [-2.21015722,  8.2176402 , 12.69328115,  0.50644607,  2.        ],
       [ 2.91161235,  7.90337695, 11.79840961,  4.86816939,  1.        ]])

In [47]:
rand_test_data = np.genfromtxt("Random-Test-from-NumPy.csv", delimiter = ',')
print(rand_test_data)

[[ 1.59077303  6.42174295 10.14698427  6.91500737  1.        ]
 [ 2.28243293  8.57902322 15.93309953  6.243605    1.        ]
 [ 1.81098674  5.17270135 -0.46878789  2.44997251  1.        ]
 ...
 [ 0.1973629   4.3465854   2.66485989  0.80935387  1.        ]
 [-2.21015722  8.2176402  12.69328115  0.50644607  2.        ]
 [ 2.91161235  7.90337695 11.79840961  4.86816939  1.        ]]
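One reason to prefer `np.genfromtxt` over `np.loadtxt` is missing data: by default, empty numeric fields become `nan`, and `filling_values` substitutes a value of our choice. A sketch using small in-memory CSV text:

```python
import io
import numpy as np

# The second row has an empty (missing) field
csv_text = "1.0,2.0\n3.0,\n"

# Missing numeric fields become nan by default
with_nan = np.genfromtxt(io.StringIO(csv_text), delimiter=",")
print(with_nan)

# filling_values replaces missing fields with a chosen value instead
filled = np.genfromtxt(io.StringIO(csv_text), delimiter=",", filling_values=0.0)
print(filled)
```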
