# Kobold names generator

In this notebook we will train a Gaussian Mixture Model to generate
kobold names. I am a kobold expert and I know what kobold names look
like, so I will make up the list of example names myself.

For this work, we will need to install the legendary `sklearn` machine
learning library. My hands literally shiver when I type this.

In [2]:
%pip install scikit-learn

Collecting scikit-learn
  Downloading scikit_learn-1.5.2-cp312-cp312-macosx_10_9_x86_64.whl.metadata (13 kB)
Collecting scipy>=1.6.0 (from scikit-learn)
  Downloading scipy-1.14.1-cp312-cp312-macosx_14_0_x86_64.whl.metadata (60 kB)
Collecting joblib>=1.2.0 (from scikit-learn)
  Downloading joblib-1.4.2-py3-none-any.whl.metadata (5.4 kB)
Collecting threadpoolctl>=3.1.0 (from scikit-learn)
  Downloading threadpoolctl-3.5.0-py3-none-any.whl.metadata (13 kB)
Downloading scikit_learn-1.5.2-cp312-cp312-macosx_10_9_x86_64.whl (12.1 MB)
Downloading joblib-1.4.2-py3-none-any.whl (301 kB)

## What are kobold names like?

Obviously, kobold names look something like this. If you've ever seen
a kobold's name, you will immediately know what I mean.

In [3]:
example_names = [
    'Kobbo', 'Alexandro', 'Raiko', 'Zabu',
    'Marvin', 'Ramuso', 'Meepe', 'Blah'
]

example_names

['Kobbo', 'Alexandro', 'Raiko', 'Zabu', 'Marvin', 'Ramuso', 'Meepe', 'Blah']

These are good, gender-neutral names, very fitting for our task.

## Training

Let's now set up the machine learning model.

In [19]:
from sklearn.feature_extraction.text import CountVectorizer
from scipy.sparse import csr_matrix
import numpy as np

# Step 1: Feature Extraction using character n-grams
vectorizer = CountVectorizer(analyzer='char', ngram_range=(2, 4))
X: csr_matrix = vectorizer.fit_transform(example_names)

X

<Compressed Sparse Row sparse matrix of dtype 'int64'
	with 84 stored elements and shape (8, 82)>

But what is a Compressed Sparse Row sparse matrix of dtype 'int64'?

In [20]:
type(X)

scipy.sparse._csr.csr_matrix

It's a [Compressed Sparse Row matrix](https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csr_matrix.html),
a data type from SciPy: a space-efficient representation of a matrix
that stores only the non-zero entries.
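To see what "space-efficient" means in practice, here is a minimal sketch. The small `dense` matrix is made up for illustration and has nothing to do with the kobold data; `data`, `indices` and `indptr` are the three arrays a SciPy CSR matrix keeps internally.

```python
import numpy as np
from scipy.sparse import csr_matrix

# A small, mostly-zero matrix (hypothetical, for illustration only).
dense = np.array([
    [0, 0, 3, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 2, 0, 4],
])

sparse = csr_matrix(dense)

# CSR keeps only the non-zero values, the column of each value,
# and a row pointer saying where each row starts in those arrays.
print(sparse.data)     # non-zero values, row by row
print(sparse.indices)  # column index of each stored value
print(sparse.indptr)   # row i lives in data[indptr[i]:indptr[i + 1]]

# Converting back recovers the original dense matrix.
roundtrip_ok = bool((sparse.toarray() == dense).all())
print(roundtrip_ok)
```

The empty third row costs nothing except a repeated entry in `indptr`, which is why this layout pays off when most cells are zero.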

In [21]:
for x in X:
    print(x)

<Compressed Sparse Row sparse matrix of dtype 'int64'
	with 9 stored elements and shape (1, 82)>
  Coords	Values
  (0, 38)	1
  (0, 58)	1
  (0, 18)	1
  (0, 23)	1
  (0, 39)	1
  (0, 59)	1
  (0, 19)	1
  (0, 40)	1
  (0, 60)	1
<Compressed Sparse Row sparse matrix of dtype 'int64'
	with 21 stored elements and shape (1, 82)>
  Coords	Values
  (0, 6)	1
  (0, 43)	1
  (0, 32)	1
  (0, 76)	1
  (0, 12)	1
  (0, 55)	1
  (0, 25)	1
  (0, 67)	1
  (0, 7)	1
  (0, 44)	1
  (0, 33)	1
  (0, 77)	1
  (0, 13)	1
  (0, 56)	1
  (0, 26)	1
  (0, 8)	1
  (0, 45)	1
  (0, 34)	1
  (0, 78)	1
  (0, 14)	1
  (0, 57)	1
<Compressed Sparse Row sparse matrix of dtype 'int64'
	with 9 stored elements and shape (1, 82)>
  Coords	Values
  (0, 38)	1
  (0, 62)	1
  (0, 3)	1
  (0, 35)	1
  (0, 63)	1
  (0, 4)	1
  (0, 36)	1
  (0, 64)	1
  (0, 5)	1
<Compressed Sparse Row sparse matrix of dtype 'int64'
	with 6 stored elements and shape (1, 82)>
  Coords	Values
  (0, 79)	1
  (0, 0)	1
  (0, 24)	1
  (0, 80)	1
  (0, 1)	1
  (0, 81)	1
<Compressed Spa

Most of the matrix is filled with zeroes: only 84 of its 8 × 82 = 656
cells are stored, and every stored value shown above is a one.
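Why so few ones? Each column stands for one character n-gram, and each name only contains a handful of them. The sketch below rebuilds the counts in plain Python. It assumes `CountVectorizer` lowercases the names and slides a window of 2 to 4 characters over each one; `char_ngrams` and `counts_per_name` are hypothetical helpers, not part of scikit-learn.

```python
from collections import Counter

example_names = [
    'Kobbo', 'Alexandro', 'Raiko', 'Zabu',
    'Marvin', 'Ramuso', 'Meepe', 'Blah'
]

def char_ngrams(name, min_n=2, max_n=4):
    """Return every character n-gram of the lowercased name."""
    text = name.lower()
    return [
        text[i:i + n]
        for n in range(min_n, max_n + 1)
        for i in range(len(text) - n + 1)
    ]

# One Counter per name: n-gram -> how often it occurs in that name.
counts_per_name = [Counter(char_ngrams(name)) for name in example_names]

# The columns of the matrix: every distinct n-gram, sorted alphabetically.
vocabulary = sorted(set().union(*counts_per_name))

stored = sum(len(counts) for counts in counts_per_name)
cells = len(example_names) * len(vocabulary)

print(len(vocabulary), stored, cells)  # 82 84 656
print(char_ngrams('Zabu'))  # ['za', 'ab', 'bu', 'zab', 'abu', 'zabu']
```

The numbers agree with the shape `(8, 82)` and the 84 stored elements reported above. Only two n-grams are shared between names (`ko` and `ra`), so nearly every column is switched on for exactly one name.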

In [22]:
from sklearn.mixture import GaussianMixture

# Step 2: Train a Gaussian Mixture Model
gmm = GaussianMixture(n_components=6, random_state=42)
gmm.fit(X.toarray())

> ***But what is Gaussian Mixture and why do I need it?!***
> 
> For now I have no idea..

Ahem, anyway, let's now run *inference* with our model, that is, draw some samples from it.

In [28]:
sample = gmm.sample(10)

sample

(array([[ 5.97720467e-04,  2.55948803e-03, -7.03343802e-04,
          1.00213962e+00,  1.00049018e+00,  1.00139488e+00,
          5.97548420e-04, -5.04414161e-04,  5.85878710e-04,
         -7.67321415e-04, -1.07812384e-03,  5.23438719e-04,
         -1.05118623e-04,  3.77562060e-04, -1.52161283e-03,
         -1.22280128e-03,  2.48549341e-04,  2.30587683e-04,
         -1.62207656e-04, -1.56466897e-03, -2.38136181e-03,
          5.37754949e-04,  4.68540672e-04,  1.76736024e-04,
         -8.90496245e-04, -4.45338493e-04, -1.66887075e-05,
         -2.36493588e-03,  8.07198758e-04,  1.26353626e-03,
          9.30494014e-04,  1.04720714e-04,  3.29330350e-05,
          3.39847263e-04,  1.38140990e-03,  9.99028044e-01,
          1.00148301e+00,  2.04588852e-04,  9.98638136e-01,
          2.00360776e-03,  2.65654375e-04, -2.66968804e-05,
          3.49630671e-06, -1.36904994e-03, -9.38091138e-04,
         -7.85457743e-04,  1.26419539e-03, -1.43113076e-03,
         -2.91113300e-04,  2.21551265e-0

I have no idea what this means. :c

## Conclusion

I haven't learned anything. I believe I lack some background knowledge.