# MNIST Dataset

The MNIST database of handwritten digits, available from this page: 

 	http://yann.lecun.com/exdb/mnist/ 
    
has a training set of 60,000 examples, and a test set of 10,000 examples. It is a subset of a larger set available from NIST. The digits have been size-normalized and centered in a fixed-size image.



# Converted MNIST Dataset 

The files X2.bin, Y2_int8.bin, and Y2.bin are binary files converted from the original MNIST image and label files:

1. X2.bin (219,520,000 bytes)

    70,000 grayscale digit images stored sequentially. Each image is stored as 28x28 Float32 values (little-endian), top-down and row by row.


2. Y2_int8.bin (70,000 bytes)

    70,000 Int8 labels of the corresponding digit images
    

3. Y2.bin (2,800,000 bytes)

    70,000 one-hot vectors, each stored as 10 Float32 values and encoding the label of the corresponding digit image. For example, digit 3 is represented as the following vector:

		(0, 0, 0, 1, 0, 0, 0, 0, 0, 0)
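
The byte counts listed above follow from the layout: number of images times values per image times bytes per value. The following is a minimal sketch that recomputes them and, if the files happen to be in the current directory, compares them with the actual file sizes. The names `n_images` and `expected` are introduced here for illustration only.

```julia
# Sizes implied by the layout described above
n_images = 70_000
expected = [
    "X2.bin"      => n_images * 28 * 28 * sizeof(Float32),  # one 28x28 Float32 image each
    "Y2_int8.bin" => n_images * sizeof(Int8),                # one Int8 label each
    "Y2.bin"      => n_images * 10 * sizeof(Float32),        # one 10-element one-hot vector each
]

for (name, nbytes) in expected
    if !isfile(name)
        println(name, ": expected ", nbytes, " bytes (file not found in ", pwd(), ")")
    elseif filesize(name) == nbytes
        println(name, ": size OK (", nbytes, " bytes)")
    else
        println(name, ": expected ", nbytes, " bytes, found ", filesize(name))
    end
end
```

A size mismatch usually means the file is truncated or was written with a different element type.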


In [None]:
using Images, Colors, FileIO

# Read MNIST binary data 

## Julia - Read MNIST binary Float32 image data

In [None]:
pwd()

In [None]:
cd("MNIST")

In [None]:
readdir()

## Read digit images

In [None]:
# One image per column: 784 pixels x 70,000 images (requires 219,520,000 bytes)
digits = Array{Float32, 2}(undef, 28*28, 70_000)
read!("X2.bin", digits);

In [None]:
typeof(digits), size(digits), sizeof(digits)

## Display digit images

In [None]:
digit = reshape(digits[:, 34], 28, 28);

In [None]:
# Convert to 8-bit grayscale for display (N0f8 represents values in [0, 1])
digit = Matrix{Gray{N0f8}}(digit)

In [None]:
# Transpose: the file stores pixels row by row, but reshape fills column by column
digit'
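
The transpose is needed because the file stores each image row by row, while Julia arrays are column-major, so `reshape` fills columns first. A small sketch with a made-up 2x3 "image" (the names `rows`, `flat`, `wrong`, and `right` are illustrative, not part of the dataset code) shows the effect:

```julia
# A 2x3 "image" and its row-by-row flattening, mimicking the file layout
rows = Float32[1 2 3;
               4 5 6]
flat = vec(permutedims(rows))        # 1, 2, 3, 4, 5, 6

# Reshaping directly to (height, width) scrambles the pixels
wrong = reshape(flat, 2, 3)          # [1 3 5; 2 4 6]

# Reshape to (width, height) instead, then transpose back
right = permutedims(reshape(flat, 3, 2))
right == rows                        # true
```

For the square 28x28 digits both dimensions are equal, so the only visible symptom of skipping the transpose is a digit mirrored along its diagonal.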

## Read labels

In [None]:
labels = Vector{Int8}(undef,70000)
read!("Y2_int8.bin", labels);

In [None]:
labels[34]
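
The Int8 labels and the one-hot vectors in Y2.bin carry the same information. As a rough sketch of how the two relate, the helpers below (hypothetical names `onehot` and `onehot_decode`, applied to made-up labels rather than the real file) convert in both directions:

```julia
# Encode digit labels (0-9) as columns of a 10 x N Float32 matrix
function onehot(labels::AbstractVector{<:Integer})
    Y = zeros(Float32, 10, length(labels))
    for (j, d) in enumerate(labels)
        Y[d + 1, j] = 1f0            # digit d goes to row d + 1 (Julia is 1-based)
    end
    return Y
end

# Decode: position of the largest entry in each column, shifted back to 0-9
onehot_decode(Y::AbstractMatrix) = [argmax(col) - 1 for col in eachcol(Y)]

sample = Int8[3, 0, 9]               # made-up labels for illustration
Y = onehot(sample)
Y[:, 1]'                             # digit 3 -> (0, 0, 0, 1, 0, 0, 0, 0, 0, 0)
onehot_decode(Y) == sample           # round trip recovers the labels
```

Assuming Y2.bin stores each 10-value vector contiguously, one image after another, it could be read with `read!("Y2.bin", Matrix{Float32}(undef, 10, 70_000))` and decoded the same way.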

## Plot histogram of digit distribution

In [None]:
using Plots

In [None]:
histogram(labels, label="digits", bins=11, color=:gray)
xlabel!("digit")

In [None]:
histogram(labels, label="digits", bins=11, normalize=true, color=:gray)

In [None]:
# For each digit 0-9, collect the indices of the images with that label
labels_digits = [Vector{Int32}(findall(==(i), labels)) for i in 0:9]

In [None]:
length.(labels_digits)

In [None]:
sum(length.(labels_digits))

## Different Binary IO Approaches


In [None]:
img = fill(0f0, 28*28*70_000) # flat Float32 vector; requires 219,520,000 bytes
open("X2.bin") do io
    read!(io, img)
end;

In [None]:
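# The following method (from ChatGPT) for reading a binary Float32 vector from a file doesn't work:
# read(io, T) reads a single value of type T, and Base defines no method for T = Vector{Float32}.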

# Specify the path to your binary file
file_path = "X2.bin"

# Open the file in read mode
file = open(file_path, "r")

# Read the binary data as Float32 values
data = read(file, Vector{Float32})

# Close the file
close(file)
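
Two alternatives that should work are sketched below: fill a preallocated vector with `read!`, or read the raw bytes and `reinterpret` them. To stay self-contained the sketch writes a small made-up vector to a temporary file instead of using X2.bin. The names `original`, `path`, `data1`, and `data2` are illustrative.

```julia
# Write a small Float32 vector to a temporary file
original = Float32[0.0, 0.5, 1.0, 2.5]
path = tempname()
write(path, original)

# Option 1: size the vector from the file length, then fill it in place
n = filesize(path) ÷ sizeof(Float32)
data1 = read!(path, Vector{Float32}(undef, n))

# Option 2: read the raw bytes and reinterpret them as Float32
data2 = collect(reinterpret(Float32, read(path)))

rm(path)
data1 == original && data2 == original   # both approaches recover the data
```

Both options read values in the machine's native byte order, which matches the little-endian files only on little-endian hardware. On a big-endian machine the values would need `ltoh.(data1)` afterwards.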

## Python - Read MNIST binary Float32 image data

*Python code:*

`import numpy as np`

Read the digit images, 28x28x65,000 gray images stored as binary data:

`digits = np.fromfile('X.bin', dtype=np.single)  # binary read into a one-dimensional array`

Read the digit image labels, 10x65,000 values (one one-hot vector per image):

`labels = np.fromfile('Y.bin', dtype=np.single)`
