# Generating the Dataset

## Introduction
We want to create a polynomial dataset of pairs `(polynomial, classified_roots)`, where `polynomial` is a degree-$n$ polynomial and `classified_roots` is a list of three counts: the number of integer roots, the number of non-integer real roots, and the number of complex (non-real) roots of the polynomial.

Before we start, let us import some useful libraries.

In [4]:
import random
import numpy
import numpy.polynomial
import scipy
import scipy.sparse

Now, we want to generate these polynomials randomly. The general idea is to first generate the polynomial's roots at random and then build the polynomial from those roots. Recall that given roots $r_1,r_2,\ldots,r_n$, we can form the monic polynomial $$f(x) = \prod_{i=1}^n (x - r_i),$$ which has exactly the roots $r_1,r_2,\ldots,r_n$.
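As a small worked example of this construction, the sketch below expands $(x-1)(x-2) = x^2 - 3x + 2$ with NumPy. The two roots are chosen purely for illustration and are not part of the dataset. Note that `polyfromroots` lists the coefficients from the constant term upward.

```python
import numpy as np
from numpy.polynomial import polynomial as P

# Illustrative roots, not taken from the dataset.
example_roots = [1, 2]

# Coefficients are ordered from lowest to highest degree.
example_coeffs = P.polyfromroots(example_roots)
print(example_coeffs)  # [ 2. -3.  1.]

# Evaluating the polynomial at each root should give (numerically) zero.
print(P.polyval(example_roots, example_coeffs))
```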

*Polynomial datasets are typically generated by sampling the polynomial first and then finding its roots. For the problem we are trying to solve, it makes sense, for computational reasons, to go the other way and sample the roots first, since we then never need to run a root finder. The trade-off is numerical stability: expanding a product of many factors with large roots yields coefficients of very different magnitudes, so in that respect generating the polynomial first is the better choice.*
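To get a feel for the stability concern, one possible check is a round trip: expand the roots into coefficients, recover the roots with a root finder, and measure how far they moved. The helper `round_trip_error` and both root sets below are hypothetical names and values introduced only for this sketch. The size of the error depends on the degree and the spread of the roots, so no particular values are claimed here.

```python
import numpy as np
from numpy.polynomial import polynomial as P

def round_trip_error(roots):
    """Largest distance from an original root to its nearest recovered root."""
    roots = np.asarray(roots, dtype=complex)
    coeffs = P.polyfromroots(roots)
    recovered = P.polyroots(coeffs)
    # For every original root, find the distance to the closest recovered root.
    distances = np.abs(roots[:, None] - recovered[None, :]).min(axis=1)
    return distances.max()

# Hypothetical root sets: a small one and a larger, more widely spread one.
small = [1.5, 2.0, 3.7]
large = np.linspace(1.0, 99.0, 11)

print(round_trip_error(small))
print(round_trip_error(large))
```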

Let us first generate random roots. We need to be careful about how we generate them. Recall that the problem we are trying to solve is to find the number of integer, non-integer real, and complex roots of a given polynomial. It would not make sense to have a dataset of polynomials that have only complex roots, only non-integer real roots, and so on. Instead, we need to make sure that the three types of root occur in roughly equal proportions.

One way to approach this is to generate random complex numbers. Recall that a complex number $z$ can be expressed as an ordered pair $z=(x,y)$ with $x,y\in\mathbb{R}$, where $\Re(z) = x$ and $\Im(z) = y$. We can randomly generate the real parts and the imaginary parts separately. As a design choice, we restrict the roots to at most one decimal place. Since we want a roughly even distribution of each type, we generate the array of imaginary parts from a sparse matrix with density $<1$: the entries that are not stored are exactly $0$, which ensures that some of the roots are real. To obtain integer roots, we round the real parts of a random subset of roots to $0$ decimal places.
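The effect of the density parameter can be seen in a short sketch. The value `num = 10` is an arbitrary illustration; with `density=0.4`, roughly 40% of the entries are stored as uniform samples and the remaining entries are exactly zero.

```python
import numpy as np
import scipy.sparse

num = 10  # illustrative number of roots

# Dense column: every entry is a uniform sample in [0, 1).
dense_column = scipy.sparse.rand(num, 1, density=1).toarray().ravel()

# Sparse column: only about 40% of the entries are stored; the rest are exactly 0.
sparse_column = scipy.sparse.rand(num, 1, density=0.4).toarray().ravel()

print(np.count_nonzero(dense_column))
print(np.count_nonzero(sparse_column))

# Scaled and rounded the same way as the imaginary parts of the roots.
print(np.round(100 * sparse_column, 1))
```

The zero entries are the ones that become real roots once the column is used for the imaginary parts.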

In [81]:
def generate_roots(num):
    """
    This function generates num random roots.
    
    Input(s):
    num := [Integer] the number of roots to generate, i.e. the degree of our polynomial
    
    Output:
    [List] A list of randomly generated complex roots.
    """
    
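    # Uniform samples scaled to [0, 100); ravel() flattens the column so each entry is a scalar.
    real_parts = 100 * scipy.sparse.rand(num, 1, density=1).toarray().ravel()
    # density=0.4 leaves most imaginary parts at exactly zero, which yields real roots.
    imaginary_parts = 100 * scipy.sparse.rand(num, 1, density=0.4).toarray().ravel()
    # Round each real part to 0 or 1 decimal places at random; 0 places gives an integer.
    roots = [complex(round(float(x), random.randint(0, 1)), round(float(y), 1))
             for x, y in zip(real_parts, imaginary_parts)]
    return roots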

Now that we have a function `generate_roots` that generates random roots, we can use the roots to compute a polynomial. To do this, we use a function from the NumPy Polynomial library, `numpy.polynomial.polynomial.polyfromroots`, which returns the coefficients of the monic polynomial ordered from lowest to highest degree.

In [82]:
def generate_polynomial(roots):
    """
    This function generates a polynomial given a list of roots
    
    Input(s):
    roots := [List] list of roots of a polynomial
    
    Output:
    [ndarray] An array of the polynomial's coefficients.
    """
    
    return numpy.polynomial.polynomial.polyfromroots(roots)
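Because `generate_roots` does not pair each complex root with its conjugate, the resulting coefficients are in general complex. The sketch below, using roots made up for illustration, contrasts an unpaired complex root with a conjugate pair.

```python
import numpy as np
from numpy.polynomial import polynomial as P

# A complex root without its conjugate: (x - 3)(x - (1 + 2j)).
unpaired = P.polyfromroots([3, 1 + 2j])
print(unpaired)  # the coefficients have nonzero imaginary parts

# The same root with its conjugate: (x - (1 + 2j))(x - (1 - 2j)) = x^2 - 2x + 5.
paired = P.polyfromroots([1 + 2j, 1 - 2j])
print(paired.real)                  # [ 5. -2.  1.]
print(np.allclose(paired.imag, 0))  # True
```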

The crux of our problem is to count the roots of each type. We now create a function that classifies each root into one of three categories: integer, non-integer real, and complex.

In [83]:
def classify_roots(roots):
    """
    This function classifies a list of roots into three classes: integers, non-integer reals, and complex.
    
    Input(s):
    roots := [List] list of roots, each given as a [real part, imaginary part] pair
    
    Output:
    [List] A list of length three of the form [num of integer roots, num of non-integer real roots, 
    num of complex roots]
    """
    
    classified_roots = [0, 0, 0]
    for real, imag in roots:
        if imag != 0:
            # Any nonzero imaginary part makes the root complex (non-real).
            classified_roots[2] += 1
        elif int(real) == real:
            classified_roots[0] += 1
        else:
            classified_roots[1] += 1
    return classified_roots
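The same three-way split can also be sketched directly on Python `complex` values. The function `classify_complex` and its tolerance `tol` are hypothetical additions for illustration and are not part of the code above; the tolerance only matters when roots come from floating-point computation rather than from rounded samples.

```python
def classify_complex(z, tol=1e-9):
    """Return 'integer', 'real' or 'complex' for a single number z.

    tol is a hypothetical tolerance, not part of the original code.
    """
    z = complex(z)
    if abs(z.imag) > tol:
        return "complex"
    if abs(z.real - round(z.real)) <= tol:
        return "integer"
    return "real"

# Illustrative roots: one of each type.
sample_roots = [complex(44.0, 0.0), complex(12.3, 0.0), complex(7.0, 5.5)]
labels = [classify_complex(z) for z in sample_roots]
print(labels)  # ['integer', 'real', 'complex']

# Same layout as classified_roots: [integer, non-integer real, complex].
sample_counts = [labels.count("integer"), labels.count("real"), labels.count("complex")]
print(sample_counts)  # [1, 1, 1]
```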

Now, using the functions we defined earlier, we will create our dataset.

In [84]:
def generate_dataset(n):
    """
    This function generates a dataset of n data points.
    
    Input(s):
    n := [Integer] number of data points to generate
    
    Output:
    [List] A list of tuples of the form (polynomial, classified_roots)
    """
    
    data = []
    for _ in range(n):
        deg = random.randint(2, 11)  # degree between 2 and 11, inclusive
        roots = generate_roots(deg)
        # Split each root into a [real, imaginary] pair for classification.
        roots_lst = [[round(x.real, 1), round(x.imag, 1)] for x in roots]
        poly = generate_polynomial(roots)
        classified_roots = classify_roots(roots_lst)
        data.append((poly, classified_roots))
    return data

In [85]:
DATA = generate_dataset(1000)
DATA

[(array([ 6.82579894e+12+1.88343309e+14j, -3.80889895e+12-3.32134495e+13j,
          5.14480148e+11+2.13932288e+12j, -2.85730749e+10-6.90860376e+10j,
          8.12999847e+08+1.24105488e+09j, -1.28716242e+07-1.26134980e+07j,
          1.14368480e+05+6.79552500e+04j, -5.31500000e+02-1.51000000e+02j,
          1.00000000e+00+0.00000000e+00j]), [0, 5, 3]),
 (array([-2.01344e+04-1.850816e+05j,  2.89800e+03+8.843000e+03j,
         -1.02900e+02-9.560000e+01j,  1.00000e+00+0.000000e+00j]), [1, 1, 1]),
 (array([-6.62719018e+10-2.12700642e+11j,  1.55044191e+10+3.91536550e+10j,
         -1.48985011e+09-2.84085450e+09j,  7.57406695e+07+1.03407412e+08j,
         -2.18443462e+06-1.98091284e+06j,  3.54760700e+04+1.88429500e+04j,
         -2.98400000e+02-6.93000000e+01j,  1.00000000e+00+0.00000000e+00j]),
  [3, 2, 2]),
 (array([228.+0.j, -44.+0.j,   1.+0.j]), [2, 0, 0]),
 (array([-9.57548797e+14+2.00252971e+15j,  4.79163291e+13-1.99111034e+14j,
         -9.05416601e+10+8.34519205e+12j, -4.45429459e+1

In [86]:
# Total number of integer, non-integer real, and complex roots across the dataset.
counts = [0, 0, 0]
for _, classified_roots in DATA:
    for k in range(3):
        counts[k] += classified_roots[k]
counts

[2349, 1868, 2133]
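To judge how balanced the three classes are, the totals can be converted to proportions. The sketch below simply reuses the counts printed above; the variable names are introduced here for illustration.

```python
# Totals reported in the output above: [integer, non-integer real, complex].
reported_counts = [2349, 1868, 2133]

total = sum(reported_counts)
proportions = [round(c / total, 3) for c in reported_counts]

print(total)        # 6350
print(proportions)  # [0.37, 0.294, 0.336]
```

For this run, each class accounts for roughly a third of the 6350 roots, with integer roots slightly over-represented.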