In [3]:
%pylab inline

Populating the interactive namespace from numpy and matplotlib


# Introduction

ulab is a C module for micropython. My goal was to implement a small subset of numpy. I chose those functions and methods that might be useful in the context of a microcontroller. This means low-level data processing of linear (array) and two-dimensional (matrix) data.

The main points of ulab are 

- compact, iterable and slicable container of numerical data in 1, and 2 dimensions (arrays and matrices). In addition, these containers support all the relevant unary and binary operators (e.g., `len`, ==, +, *, etc.)
- vectorised computations on micropython iterables and numerical arrays/matrices (universal functions)
- basic linear algebra routines (matrix inversion, matrix reshaping, and transposition)
- polynomial fits to numerical data
- fast Fourier transforms

The code itself is split into submodules. This should make exclusion of unnecessary functions, if storage space is a concern. Each section of the implementation part kicks out with a short discussion on what can be done with the particular submodule, and what are the tripping points at the C level. I hope that these musings can be used as a starting point for further discussion on the code.

The code and its documentation can be found under https://github.com/v923z/micropython-ulab/. The MIT licence applies to all material.

# Environmental settings and magic commands

The entire C source code, as well as the documentation (mainly verbose comments on certain aspects of the implementation) are contained in this notebook. The code is exported to separate C files in `/ulab/`, and then compiled from this notebook. However, I would like to stress that the compilation does not require a jupyter notebook. It can be done from the command line by invoking the command in the [make](#make), or [Compiling the module](#Compiling-the-module). After all, the ipython kernel simply passes the `make` commands to the underlying operating system.

Testing is done on the unix and stm32 ports of micropython, also directly from the notebook. This is why this section contains a couple of magic functions. But once again: the C module can be used without the notebook. 

In [1]:
%cd ../../micropython/ports/unix/

/home/v923z/sandbox/micropython/v1.11/micropython/ports/unix


In [2]:
from IPython.core.magic import Magics, magics_class, line_cell_magic
from IPython.core.magic import cell_magic, register_cell_magic, register_line_magic
from IPython.core.magic_arguments import argument, magic_arguments, parse_argstring
import subprocess
import os

In [804]:
def string_to_matrix(string):
    matrix = []
    string = string.replace("array(\'d\', ", '').replace(')', '').replace('[', '').replace(']', '')
    for _str in string.split('\r\n'):
        if len(_str) > 0:
            matrix.append([float(n) for n in _str.split(',')])
    return array(matrix)

## micropython magic command

The following magic class takes the content of a cell, and depending on the arguments, either passes it to the unix, or the stm32 implementation. In the latter case, a pyboard must be connected to the computer, and must be initialised beforehand. 

In [3]:
@magics_class
class PyboardMagic(Magics):
    @cell_magic
    @magic_arguments()
    @argument('-skip')
    @argument('-unix')
    @argument('-file')
    @argument('-data')
    @argument('-time')
    @argument('-memory')
    def micropython(self, line='', cell=None):
        args = parse_argstring(self.micropython, line)
        if args.skip: # doesn't care about the cell's content
            print('skipped execution')
            return None # do not parse the rest
        if args.unix: # tests the code on the unix port. Note that this works on unix only
            with open('/dev/shm/micropython.py', 'w') as fout:
                fout.write(cell)
            proc = subprocess.Popen(["./micropython", "/dev/shm/micropython.py"], 
                                    stdout=subprocess.PIPE, stderr=subprocess.PIPE)
            print(proc.stdout.read().decode("utf-8"))
            print(proc.stderr.read().decode("utf-8"))
            return None
        if args.file: # can be used to copy the cell content onto the pyboard's flash
            spaces = "    "
            try:
                with open(args.file, 'w') as fout:
                    fout.write(cell.replace('\t', spaces))
                    printf('written cell to {}'.format(args.file))
            except:
                print('Failed to write to disc!')
            return None # do not parse the rest
        if args.data: # can be used to load data from the pyboard directly into kernel space
            message = pyb.exec(cell)
            if len(message) == 0:
                print('pyboard >>>')
            else:
                print(message.decode('utf-8'))
                # register new variable in user namespace
                self.shell.user_ns[args.data] = string_to_matrix(message.decode("utf-8"))
        
        if args.time: # measures the time of executions
            pyb.exec('import utime')
            message = pyb.exec('t = utime.ticks_us()\n' + cell + '\ndelta = utime.ticks_diff(utime.ticks_us(), t)' + 
                               "\nprint('execution time: {:d} us'.format(delta))")
            print(message.decode('utf-8'))
        
        if args.memory: # prints out memory information 
            message = pyb.exec('from micropython import mem_info\nprint(mem_info())\n')
            print("memory before execution:\n========================\n", message.decode('utf-8'))
            message = pyb.exec(cell)
            print(">>> ", message.decode('utf-8'))
            message = pyb.exec('print(mem_info())')
            print("memory after execution:\n========================\n", message.decode('utf-8'))

        else:
            message = pyb.exec(cell)
            print(message.decode('utf-8'))

ip = get_ipython()
ip.register_magics(PyboardMagic)

### pyboard initialisation

In [None]:
import pyboard
pyb = pyboard.Pyboard('/dev/ttyACM0')
pyb.enter_raw_repl()

### pyboad detach

In [None]:
pyb.exit_raw_repl()
pyb.close()

In [5]:
import IPython

js = """
    (function () {
        var defaults = IPython.CodeCell.config_defaults || IPython.CodeCell.options_default;
        defaults.highlight_modes['magic_text/x-csrc'] = {'reg':[/^\\s*%%ccode/]};
    })();
    """
cjs = """
        IPython.CodeCell.options_default.highlight_modes['magic_text/x-csrc'] = {'reg':[/^\\s*%%ccode/]};
    """

IPython.core.display.display_javascript(cjs, raw=True)

js = """
    (function () {
        var defaults = IPython.CodeCell.config_defaults || IPython.CodeCell.options_default;
        defaults.highlight_modes['magic_text/x-csrc'] = {'reg':[/^\\s*%%makefile/]};
    })();
    """
IPython.core.display.display_javascript(js, raw=True)

## Code magic

The following cell magic simply writes a licence header, and the contents of the cell to the file given in the header of the cell. 

In [4]:
@magics_class
class MyMagics(Magics):
        
    @cell_magic
    def ccode(self, line, cell):
        copyright = """/*
 * This file is part of the micropython-ulab project, 
 *
 * https://github.com/v923z/micropython-ulab
 *
 * The MIT License (MIT)
 *
 * Copyright (c) 2019 Zoltán Vörös
*/
    """
        if line:
            with open('../../../ulab/code/'+line, 'w') as cout:
                cout.write(copyright)
                cout.write(cell)
            print('written %d bytes to %s'%(len(copyright) + len(cell), line))
            return None

ip = get_ipython()
ip.register_magics(MyMagics)

# Notebook conversion

In [1007]:
%cd ../../../ulab/docs/

/home/v923z/sandbox/micropython/v1.11/ulab/docs


In [1008]:
import nbformat as nb
import nbformat.v4.nbbase as nb4
from nbconvert import RSTExporter

def convert_notebook(nbfile, rstfile):
    (rst, resources) = rstexporter.from_filename(nbfile)
    with open(rstfile, 'w') as fout:
        fout.write(rst)
        
rstexporter = RSTExporter()
rstexporter.template_file = './templates/rst.tpl'

convert_notebook('ulab.ipynb', './source/ulab.rst')

  mimetypes=output.keys())


# Compiling the module

Detailed instructions on how to set up and compile a C module can be found in chapter 2 of https://micropython-usermod.readthedocs.io/en/latest/. 

First, on the command line, you should clone both the micropython, and the `ulab` repositories: 

```bash
git clone https://github.com/micropython/micropython.git
```
Then navigate to your micropython folder, and run 

```bash
git clone https://github.com/v923z/micropython-ulab.git ulab
```

Finally, in the `mpconfigport.h` header file of the port that you want to compile for, you have to define the variable `MODULE_ULAB_ENABLED`

```make
#define MODULE_ULAB_ENABLED (1)
```

At this point, you should be able to run make in the port's root folder:

```bash
make USER_C_MODULES=../../../ulab all
```
(unix port) or 
```bash
make BOARD=PYBV11 CROSS_COMPILE=<Path where you uncompressed the toolchain>/bin/arm-none-eabi-
```
(pyboard). When compiling for the pyboard (or any other hardware platform), you might or might not have to set the cross-compiler's path. If your installation of the cross-compiler is system-wide, you can drop the `make` argument `CROSS_COMPILE`.

# The ndarray type

## General comments

`ndarrays` are efficient containers of numerical data of the same type (i.e., signed/unsigned chars, signed/unsigned integers or floats). Beyond storing the actual data, the type definition has three additional members (on top of the `base` type). Namely, two `size_t` objects, `m`, and `n`, which give the dimensions of the matrix (obviously, if the `ndarray` is meant to be linear, either `m`, or `n` is equal to 1), as well as the byte size, `bytes`, i.e., the total number of bytes consumed by the data container. `bytes` is equal to `m*n` for `byte` types (`uint8`, and `int8`), to `2*m*n` for integers (`uint16`, and `int16`), and `4*m*n` for floats. 

The type definition is as follows:

```c
typedef struct _ndarray_obj_t {
    mp_obj_base_t base;
    size_t m, n;
    mp_obj_array_t *array;
    size_t bytes;
} ndarray_obj_t;
```

**NOTE: with a little bit of extra effort, mp_obj_array_t can be replaced by a single void array. We should, perhaps, consider the pros and cons of that. One patent advantage is that we could get rid of the verbatim copy of array_new function in ndarray.c. On the other hand, objarray.c facilities couldn't be used anymore.**

## Handling different types

In order to make the code type-agnostic, we will resort to macros, where necessary. This will inevitably add to the firmware size, because, in effect, we unroll the code for each possible case. However, the source will be much more readable. Also note that by unrolling, we no longer need intermediate containers and we no longer need to dispatch type-conversion functions, which means that we should be able to gain in speed.

### Additional structure members in numpy

Also note that, in addition, `numpy` defines the following members:

- `.ndim`: the number of dimensions of the array (in our case, it would be 1, or 2)
- `.size`: the number of elements in the array; it is the product of m, and n
- `.dtype`: the data type; in our case, it is basically stored in data->typecode
- `.itemsize`: the size of a single element in the array: this can be gotten by calling `mp_binary_get_size('@', data->typecode, NULL)`.

One should, perhaps, consider, whether these are necessary fields. E.g., if `ndim` were defined, then 

```c
if((myarray->m == 1) || (myarray->n == 1)) {
    ...
}
```

could be replaced by 

```c
if(myarray->ndim == 1) {
    ...
}
```
and 
```c
if((myarray->m > 1) && (myarray->n > 1)) {
    ...
}
```
would be equivalent to 
```c
if(myarray->ndim == 2) {
    ...
}
```

One could also save the extra function call `mp_binary_get_size('@', data->typecode, NULL)`, if `itemsize` is available. 

### Returning and accepting raw bytes

It might make sense to have a function that returns the raw content of the `ndarray`. The rationale for this is that this would make direct use of calculation results a piece of cake. E.g., the DAC could be fed as 

```python
length = 100
amp = 127

x = linspace(0, 2*pi, length)
y = ndarray(128 + amp*sin(x), dtype=uint8)
buf = y.bytearray()

dac = DAC(1)
dac.write_timed(buf, 400*length, mode=DAC.CIRCULAR)
```

Likewise, having the option of writing raw data directly into the `ndarray` could simplify data analysis. E.g., ADC results could be processed as follows:

```python
length = 100
y = ndarray([0]*length, dtype=uint16)

adc = pyb.ADC(pyb.Pin.board.X19)
tim = pyb.Timer(6, freq=10)
buf = y.bytearray()
adc.read_timed(buf, tim)

y.reshape((10, 10)) # or whatever
```

### Exposed functions and properties

Most of the functions in ndarray.c are internal (i.e., not exposed to the python interpreter). Exception to this rule are the `shape`, `size`, and `rawsize` functions, and the `.unary_op`, `.binary_op`, and `.iter_next` class methods. Note that `rawsize` is is not standard in numpy, and is meant to return the total number of bytes used by the container. Since the RAM of a microcontroller is limited, I deemed this to be a reasonable addition for optimisation purposes, but could later be removed, if it turns out to be of no interest.

As mentioned above, `numpy` defines a number of extra members in its `ndarray`. It would be great, if we could return these members as properties of the `ndarray` instance. At the moment, `shape` is a function, as is `rawsize`. 

## Initialisation

An `ndarray` can be initialised by passing an iterable (linear arrays), or an iterable of iterables (matrices) into the constructor. In addition, the constructor can take a keyword argument, `dtype`, that will force type conversion. The default value is `float`.

In [124]:
%%micropython -unix 1

from ulab import ndarray

a = ndarray([1, 2, 3, 4])
print(a)
a = ndarray([[1, 2, 3, 4], [2, 3, 4, 5]])
print('\n', a)
a = ndarray([range(10), range(10)])
print('\n', a)

ndarray([1.0, 2.0, 3.0, 4.0], dtype=float)

 ndarray([[1.0, 2.0, 3.0, 4.0],
	 [2.0, 3.0, 4.0, 5.0]], dtype=float)

 ndarray([[0.0, 1.0, 2.0, ..., 7.0, 8.0, 9.0],
	 [0.0, 1.0, 2.0, ..., 7.0, 8.0, 9.0]], dtype=float)




## Slicing and subscriptions

Subscriptions must resolve the following cases:

1. The index is a single item (can be a slice, a list, or an integer). This case must be evaluated properly, even if the array is two-dimensional.
2. The index is a tuple: a tuple can contain integers, slices, and Boolean lists. 

In order to simplify the code, scalars will be turned into slices of length 1. 

Now, if the index is a scalar, we might have to distinguish two cases: once the array can be a row vector, in which case the method must return a single item as an `mp_obj_t` object. On the other hand, if the array is two-dimensional, the method must return a row as an ndarray. numpy returns an array, if the index was a slice with length one:

In [89]:
a = array([1, 2, 3, 4])
# this is a python object
print(a[0])

# this is an array
print(a[:1])

1
[1]


In [122]:
%%micropython -unix 1

from ulab import ndarray

# initialise a matrix
a = ndarray([[1, 2, 3, 4], [6, 7, 8, 9]])
print('2D array: \n', a)

# print out the second row
print('second row of matrix: ', a[1])

#initialise an array
a = ndarray([1, 2, 3, 4, 5, 6, 7, 8, 9])
print('\n1D array: ', a)
# slice the array
print('slize between 1, and 5: ', a[1:5])

2D array: 
 ndarray([[1.0, 2.0, 3.0, 4.0],
	 [6.0, 7.0, 8.0, 9.0]], dtype=float)
second row of matrix:  ndarray([6.0, 7.0, 8.0, 9.0], dtype=float)

1D array:  ndarray([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0], dtype=float)
slize between 1, and 5:  ndarray([2.0, 3.0, 4.0, 5.0], dtype=float)




## Iterators

`ndarray` objects can be iterated on, and just as in numpy, matrices are iterated along their first axis, and they return  `ndarray`s. 

In [121]:
%%micropython -unix 1

from ulab import ndarray

#  initialise a matrix
a = ndarray([[1, 2, 3, 4], [6, 7, 8, 9]])
print('2D array: \n', a)

# print out the matrix' rows, one by one
for i, _a in enumerate(a): 
    print('\nrow %d: '%i, _a)

2D array: 
 ndarray([[1.0, 2.0, 3.0, 4.0],
	 [6.0, 7.0, 8.0, 9.0]], dtype=float)

row 0:  ndarray([1.0, 2.0, 3.0, 4.0], dtype=float)

row 1:  ndarray([6.0, 7.0, 8.0, 9.0], dtype=float)




On the other hand, flat arrays return their elements:

In [34]:
%%micropython -unix 1

from ulab import ndarray, uint8

# initialise an array
a = ndarray(range(10), dtype=uint8)
print('1D array: ', a)

# print out the array's elements, one by one
for i, _a in enumerate(a): 
    print('element %d: '%i, _a)

1D array:  ndarray([0, 1, 2, ..., 7, 8, 9], dtype=uint8)
element 0:  0
element 1:  1
element 2:  2
element 3:  3
element 4:  4
element 5:  5
element 6:  6
element 7:  7
element 8:  8
element 9:  9




## Upcasting

The following section shows the upcasting rules of `numpy`, and immediately after each case, the test for `ulab`.

### uint8

In [60]:
a = array([100], dtype=uint8)
b = array([101], dtype=uint8)
a+b, a-b, a*b, a/b

(array([201], dtype=uint8),
 array([255], dtype=uint8),
 array([116], dtype=uint8),
 array([0.99009901]))

In [727]:
%%micropython -unix 1

import ulab

a = ulab.ndarray([100], dtype=ulab.uint8)
b = ulab.ndarray([101], dtype=ulab.uint8)
print(a+b)
print(a-b)
print(a*b)
print(a/b)

ndarray([201], dtype=uint8)
ndarray([255], dtype=uint8)
ndarray([116], dtype=uint8)
ndarray([0.9900990128517151], dtype=float)




In [63]:
a = array([100], dtype=uint8)
b = array([101], dtype=int8)
a+b, a-b, a*b, a/b

(array([201], dtype=int16),
 array([-1], dtype=int16),
 array([10100], dtype=int16),
 array([0.99009901]))

In [728]:
%%micropython -unix 1

import ulab

a = ulab.ndarray([100], dtype=ulab.uint8)
b = ulab.ndarray([101], dtype=ulab.int8)
print(a+b)
print(a-b)
print(a*b)
print(a/b)

ndarray([201], dtype=int16)
ndarray([-1], dtype=int16)
ndarray([10100], dtype=int16)
ndarray([0.9900990128517151], dtype=float)




In [75]:
a = array([100], dtype=uint8)
b = array([101], dtype=uint16)
a+b, a-b, a*b, a/b

(array([201], dtype=uint16),
 array([65535], dtype=uint16),
 array([10100], dtype=uint16),
 array([0.99009901]))

In [639]:
%%micropython -unix 1

import ulab

a = ulab.ndarray([100], dtype=ulab.uint8)
b = ulab.ndarray([101], dtype=ulab.uint16)
print(a+b)
print(a-b)
print(a*b)
print(a/b)

ndarray([201], dtype=uint16)
ndarray([65535], dtype=uint16)
ndarray([10100], dtype=uint16)
ndarray([0.9900990128517151], dtype=float)




In [83]:
a = array([100], dtype=uint8)
b = array([101], dtype=int16)
a+b, a-b, a*b, a/b

(array([201], dtype=int16),
 array([-1], dtype=int16),
 array([10100], dtype=int16),
 array([0.99009901]))

In [638]:
%%micropython -unix 1

import ulab

a = ulab.ndarray([100], dtype=ulab.uint8)
b = ulab.ndarray([101], dtype=ulab.int16)
print(a+b)
print(a-b)
print(a*b)
print(a/b)

ndarray([201], dtype=int16)
ndarray([-1], dtype=int16)
ndarray([10100], dtype=int16)
ndarray([0.9900990128517151], dtype=float)




In [93]:
a = array([100], dtype=uint8)
b = array([101], dtype=float)
a+b, a-b, a*b, a/b

(array([201.]), array([-1.]), array([10100.]), array([0.99009901]))

In [92]:
%%micropython -unix 1

import ulab

a = ulab.ndarray([100], dtype=ulab.uint8)
b = ulab.ndarray([101], dtype=ulab.float)
print(a+b)
print(a-b)
print(a*b)
print(a/b)

ndarray([201.0], dtype=float)
ndarray([-1.0], dtype=float)
ndarray([10100.0], dtype=float)
ndarray([0.9900990128517151], dtype=float)




### int8

In [56]:
a = array([100], dtype=int8)
b = array([101], dtype=uint8)
a + b, a-b, a*b, a/b

(array([201], dtype=int16),
 array([-1], dtype=int16),
 array([10100], dtype=int16),
 array([0.99009901]))

In [637]:
%%micropython -unix 1

import ulab

a = ulab.ndarray([100], dtype=ulab.int8)
b = ulab.ndarray([101], dtype=ulab.uint8)
print(a+b)
print(a-b)
print(a*b)
print(a/b)

ndarray([201], dtype=int16)
ndarray([-1], dtype=int16)
ndarray([10100], dtype=int16)
ndarray([0.9900990128517151], dtype=float)




In [97]:
a = array([100, 101], dtype=int8)
b = array([200, 101], dtype=int8)
a+b, a-b, a*b, a/b

(array([ 44, -54], dtype=int8),
 array([-100,    0], dtype=int8),
 array([ 32, -39], dtype=int8),
 array([-1.78571429,  1.        ]))

In [636]:
%%micropython -unix 1

import ulab

a = ulab.ndarray([100, 101], dtype=ulab.int8)
b = ulab.ndarray([200, 101], dtype=ulab.int8)
print(a+b)
print(a-b)
print(a*b)
print(a/b)

ndarray([44, -54], dtype=int8)
ndarray([-100, 0], dtype=int8)
ndarray([32, -39], dtype=int8)
ndarray([-1.785714268684387, 1.0], dtype=float)




In [99]:
a = array([100], dtype=int8)
b = array([200], dtype=uint16)
a+b, a-b, a*b, a/b

(array([300], dtype=int32),
 array([-100], dtype=int32),
 array([20000], dtype=int32),
 array([0.5]))

In [98]:
%%micropython -unix 1

import ulab

a = ulab.ndarray([100], dtype=ulab.int8)
b = ulab.ndarray([200], dtype=ulab.uint16)
print(a+b)
print(a-b)
print(a*b)
print(a/b)

ndarray([300], dtype=int16)
ndarray([-100], dtype=int16)
ndarray([20000], dtype=int16)
ndarray([0], dtype=int16)




In [101]:
a = array([100], dtype=int8)
b = array([200], dtype=int16)
a+b, a-b, a*b, a/b

(array([300], dtype=int16),
 array([-100], dtype=int16),
 array([20000], dtype=int16),
 array([0.5]))

In [635]:
%%micropython -unix 1

import ulab

a = ulab.ndarray([100], dtype=ulab.int8)
b = ulab.ndarray([200], dtype=ulab.int16)
print(a+b)
print(a-b)
print(a*b)
print(a/b)

ndarray([300], dtype=int16)
ndarray([-100], dtype=int16)
ndarray([20000], dtype=int16)
ndarray([0.5], dtype=float)




In [106]:
a = array([100, 101], dtype=int8)
b = array([200, 101], dtype=float)
a+b, a-b, a*b, a/b

(array([300., 202.]),
 array([-100.,    0.]),
 array([20000., 10201.]),
 array([0.5, 1. ]))

In [105]:
%%micropython -unix 1

import ulab

a = ulab.ndarray([100, 101], dtype=ulab.int8)
b = ulab.ndarray([200, 101], dtype=ulab.float)
print(a+b)
print(a-b)
print(a*b)
print(a/b)

ndarray([300.0, 202.0], dtype=float)
ndarray([-100.0, 0.0], dtype=float)
ndarray([20000.0, 10201.0], dtype=float)
ndarray([0.5, 1.0], dtype=float)




### uint16

In [110]:
a = array([100, 101], dtype=uint16)
b = array([200, 101], dtype=uint8)
a+b, a-b, a*b, a/b

(array([300, 202], dtype=uint16),
 array([65436,     0], dtype=uint16),
 array([20000, 10201], dtype=uint16),
 array([0.5, 1. ]))

In [634]:
%%micropython -unix 1

import ulab

a = ulab.ndarray([100, 101], dtype=ulab.uint16)
b = ulab.ndarray([200, 101], dtype=ulab.uint8)
print(a+b)
print(a-b)
print(a*b)
print(a/b)

ndarray([300, 202], dtype=uint16)
ndarray([65436, 0], dtype=uint16)
ndarray([20000, 10201], dtype=uint16)
ndarray([0.5, 1.0], dtype=float)




In [111]:
a = array([100, 101], dtype=uint16)
b = array([200, 101], dtype=int8)
a+b, a-b, a*b, a/b

(array([ 44, 202], dtype=int32),
 array([156,   0], dtype=int32),
 array([-5600, 10201], dtype=int32),
 array([-1.78571429,  1.        ]))

**This deviates from numpy's behaviour, because we don't have int32.**

In [633]:
%%micropython -unix 1

import ulab

a = ulab.ndarray([100, 101], dtype=ulab.uint16)
b = ulab.ndarray([200, 101], dtype=ulab.int8)
print(a+b)
print(a-b)
print(a*b)
print(a/b)

ndarray([44, 202], dtype=uint16)
ndarray([156, 0], dtype=uint16)
ndarray([59936, 10201], dtype=uint16)
ndarray([-1.785714268684387, 1.0], dtype=float)




In [113]:
a = array([100], dtype=uint16)
b = array([101], dtype=uint16)
a+b, a-b, a*b, a/b

(array([201], dtype=uint16),
 array([65535], dtype=uint16),
 array([10100], dtype=uint16),
 array([0.99009901]))

In [632]:
%%micropython -unix 1

import ulab

a = ulab.ndarray([100], dtype=ulab.uint16)
b = ulab.ndarray([101], dtype=ulab.uint16)
print(a+b)
print(a-b)
print(a*b)
print(a/b)

ndarray([201], dtype=uint16)
ndarray([65535], dtype=uint16)
ndarray([10100], dtype=uint16)
ndarray([0.9900990128517151], dtype=float)




In [114]:
a = array([100], dtype=uint16)
b = array([101], dtype=int16)
a+b, a-b, a*b, a/b

(array([201], dtype=int32),
 array([-1], dtype=int32),
 array([10100], dtype=int32),
 array([0.99009901]))

**Again, in numpy, the result is an int32**

In [631]:
%%micropython -unix 1

import ulab

a = ulab.ndarray([100], dtype=ulab.uint16)
b = ulab.ndarray([101], dtype=ulab.int16)
print(a+b)
print(a-b)
print(a*b)
print(a/b)

ndarray([201.0], dtype=float)
ndarray([-1.0], dtype=float)
ndarray([10100.0], dtype=float)
ndarray([0.9900990128517151], dtype=float)




In [115]:
a = array([100], dtype=uint16)
b = array([101], dtype=float)
a+b, a-b, a*b, a/b

(array([201.]), array([-1.]), array([10100.]), array([0.99009901]))

In [125]:
%%micropython -unix 1

import ulab

a = ulab.ndarray([100], dtype=ulab.uint16)
b = ulab.ndarray([101], dtype=ulab.float)
print(a+b)
print(a-b)
print(a*b)
print(a/b)

ndarray([201.0], dtype=float)
ndarray([-1.0], dtype=float)
ndarray([10100.0], dtype=float)
ndarray([0.9900990128517151], dtype=float)




### int16

In [116]:
a = array([100], dtype=int16)
b = array([101], dtype=uint8)
a+b, a-b, a*b, a/b

(array([201], dtype=int16),
 array([-1], dtype=int16),
 array([10100], dtype=int16),
 array([0.99009901]))

In [630]:
%%micropython -unix 1

import ulab

a = ulab.ndarray([100], dtype=ulab.int16)
b = ulab.ndarray([101], dtype=ulab.uint8)
print(a+b)
print(a-b)
print(a*b)
print(a/b)

ndarray([201], dtype=int16)
ndarray([-1], dtype=int16)
ndarray([10100], dtype=int16)
ndarray([0.9900990128517151], dtype=float)




In [117]:
a = array([100], dtype=int16)
b = array([101], dtype=int8)
a+b, a-b, a*b, a/b

(array([201], dtype=int16),
 array([-1], dtype=int16),
 array([10100], dtype=int16),
 array([0.99009901]))

In [629]:
%%micropython -unix 1

import ulab

a = ulab.ndarray([100], dtype=ulab.int16)
b = ulab.ndarray([101], dtype=ulab.int8)
print(a+b)
print(a-b)
print(a*b)
print(a/b)

ndarray([201], dtype=int16)
ndarray([-1], dtype=int16)
ndarray([10100], dtype=int16)
ndarray([0.9900990128517151], dtype=float)




In [118]:
a = array([100], dtype=int16)
b = array([101], dtype=uint16)
a+b, a-b, a*b, a/b

(array([201], dtype=int32),
 array([-1], dtype=int32),
 array([10100], dtype=int32),
 array([0.99009901]))

**While the results are correct, here we have float instead of int32**

In [132]:
%%micropython -unix 1

import ulab

a = ulab.ndarray([100], dtype=ulab.int16)
b = ulab.ndarray([101], dtype=ulab.uint16)
print(a+b)
print(a-b)
print(a*b)
print(a/b)

ndarray([201.0], dtype=float)
ndarray([-1.0], dtype=float)
ndarray([10100.0], dtype=float)
ndarray([0.0], dtype=float)




In [119]:
a = array([100], dtype=int16)
b = array([101], dtype=int16)
a+b, a-b, a*b, a/b

(array([201], dtype=int16),
 array([-1], dtype=int16),
 array([10100], dtype=int16),
 array([0.99009901]))

In [133]:
%%micropython -unix 1

import ulab

a = ulab.ndarray([100], dtype=ulab.int16)
b = ulab.ndarray([101], dtype=ulab.int16)
print(a+b)
print(a-b)
print(a*b)
print(a/b)

ndarray([201], dtype=int16)
ndarray([-1], dtype=int16)
ndarray([10100], dtype=int16)
ndarray([0], dtype=int16)




In [120]:
a = array([100], dtype=int16)
b = array([101], dtype=float)
a+b, a-b, a*b, a/b

(array([201.]), array([-1.]), array([10100.]), array([0.99009901]))

In [134]:
%%micropython -unix 1

import ulab

a = ulab.ndarray([100], dtype=ulab.int16)
b = ulab.ndarray([101], dtype=ulab.float)
print(a+b)
print(a-b)
print(a*b)
print(a/b)

ndarray([201.0], dtype=float)
ndarray([-1.0], dtype=float)
ndarray([10100.0], dtype=float)
ndarray([0.9900990128517151], dtype=float)




When in an operation the `dtype` of two arrays is different, the result's `dtype` will be decided by the following upcasting rules: 

1. Operations with two `ndarray`s of the same `dtype` preserve their `dtype`, even when the results overflow.

2. if either of the operands is a float, the results is also a float

3. 
    - `uint8` + `int8` => `int16`, 
    - `uint8` + `int16` => `int16`
    - `uint8` + `uint16` => `uint16`
    
    - `int8` + `int16` => `int16`
    - `int8` + `uint16` => `uint16` (in numpy, it is a `int32`)

    - `uint16` + `int16` => `float` (in numpy, it is a `int32`)
    
4. When the right hand side of a binary operator is a micropython variable, `mp_obj_int`, or `mp_obj_float`, then the result will be promoted to `dtype` `float`. This is necessary, because a micropython integer can be 31 bites wide.

`numpy` is also inconsistent in how it represents `dtype`: as an argument, it is denoted by the constants `int8`, `uint8`, etc., while a string will be returned, if the user asks for the type of an array.

The upcasting rules are stipulated in a single C function, `upcasting()`. 

### upcasting rules with scalars

When a 

In [47]:
a = array([1, 2, 3], dtype=int8)
b = a * 555
a *= -555
b.dtype, b, a.dtype, a

(dtype('int16'),
 array([ 555, 1110, 1665], dtype=int16),
 dtype('int8'),
 array([-43, -86, 127], dtype=int8))

## Binary operations

In the standard binary operations, the operands are either two `ndarray`s or an `ndarray`, and a number. From the C standpoint, these operations are probably the most difficult: the problem is that two operands, each with 5 possible C types are added, multiplied, subtracted, or divided, hence making the number of possible combinations large. In order to mitigate the situation, we make use of macros: this would make most of the code type-agnostic. 

Also, when an operation involves a scalar, and an `ndarray`, we will turn the scalar into an `ndarray` of length 1. In this way, we can reduce the code size of the binary handler by almost a factor of two.

In [758]:
%%micropython -unix 1

from ulab import ndarray, float

a = ndarray([1, 2, 3], dtype=float)
print(a + a)
print(a * 5.0)
print(a / 2)
print(a - 10)

ndarray([2.0, 4.0, 6.0], dtype=float)
ndarray([5.0, 10.0, 15.0], dtype=float)
ndarray([0.5, 1.0, 1.5], dtype=float)
ndarray([-9.0, -8.0, -7.0], dtype=float)




### in-place operators

In-place operators preserve the type of the array's type. Here are a couple of caveats:

1. overflow obviously occurs
2. float can be added only to float type
3. true divide fails, except when the array is of type float

In [397]:
a = array([1, 2, 3, 40], dtype=uint8)
a += 220
a

array([221, 222, 223,   4], dtype=uint8)

In [400]:
a = array([1, 2, 3, 40], dtype=int8)
a += 220
a

array([-35, -34, -33,   4], dtype=int8)

In [403]:
a = array([1, 2, 3, 40], dtype=uint16)
a += 220
a

array([221, 222, 223, 260], dtype=uint16)

In [404]:
a = array([1, 2, 3, 40], dtype=int16)
a += 220
a

array([221, 222, 223, 260], dtype=int16)

In [405]:
a = array([1, 2, 3, 40], dtype=float)
a += 220
a

array([221., 222., 223., 260.])

In [406]:
a = array([1, 2, 3, 40], dtype=uint8)
a += 220.0
a

TypeError: Cannot cast ufunc add output from dtype('float64') to dtype('uint8') with casting rule 'same_kind'

In [407]:
a = array([1, 2, 3, 40], dtype=uint8)
a /= 22
a

TypeError: No loop matching the specified signature and casting
was found for ufunc true_divide

In [408]:
a = array([1, 2, 3, 40], dtype=float)
a += 220
a

array([221., 222., 223., 260.])

In [413]:
a = array([1, 2, 3, 4], dtype=int8)
b = array([5, 10, 15, 20], dtype=float)
a /= b
a

TypeError: ufunc 'true_divide' output (typecode 'd') could not be coerced to provided output parameter (typecode 'b') according to the casting rule ''same_kind''

In [414]:
a = array([1, 2, 3, 4], dtype=int8)
b = array([5, 10, 15, 20], dtype=int8)
a /= b
a

TypeError: No loop matching the specified signature and casting
was found for ufunc true_divide

In [419]:
a = array([1, 2, 3, 4], dtype=int8)
b = array([5, 10, 15, 100], dtype=int16)
a *= b
a

array([   5,   20,   45, -112], dtype=int8)

In [424]:
a = array([1, 2, 3, 4], dtype=int8)
a **= 2
a

array([ 1,  4,  9, 16], dtype=int8)

### Comparison operators

These return list(s) of Booleans.

In [762]:
%%micropython -unix 1

import ulab

a = ulab.ndarray([1, 2, 3, 4, 5, 6, 7, 8])
a.reshape((1, 8))
print(a < 4)
print(a <= 4)
print(a > 4)
print(a >= 4)

[True, True, True, False, False, False, False, False]
[True, True, True, True, False, False, False, False]
[False, False, False, False, True, True, True, True]
[False, False, False, True, True, True, True, True]




In [763]:
%%micropython -unix 1

import ulab

a = ulab.ndarray([[1, 2, 3, 4], [5, 6, 7, 8]])
print(a < 4)
print(a <= 4)
print(a > 4)
print(a >= 4)

[[True, True, True, False], [False, False, False, False]]
[[True, True, True, True], [False, False, False, False]]
[[False, False, False, False], [True, True, True, True]]
[[False, False, False, True], [True, True, True, True]]




In [764]:
%%micropython -unix 1

import ulab

a = ulab.ndarray([1, 2, 3, 4, 4, 6, 7, 8])
a.transpose()
b = ulab.ndarray([8, 7, 6, 5, 4, 3, 2, 1])
b.transpose()
print(a < b)
print(a <= b)
print(a > b)
print(a >= b)

[True, True, True, True, False, False, False, False]
[True, True, True, True, True, False, False, False]
[False, False, False, False, False, True, True, True]
[False, False, False, False, True, True, True, True]




In [765]:
%%micropython -unix 1

import ulab

a = ulab.ndarray([[1, 2, 3, 4], [5, 6, 7, 8]])
b = ulab.ndarray([[8, 7, 1, 1], [4, 3, 2, 1]], dtype=ulab.int8)
print(a < b)
print(a <= b)
print(a > b)
print(a >= b)

[[True, True, False, False], [False, False, False, False]]
[[True, True, False, False], [False, False, False, False]]
[[False, False, True, True], [True, True, True, True]]
[[False, False, True, True], [True, True, True, True]]




### Simple running weighted average

With the subscription tools, a weighted running average can very easily be implemented as follows:

In [36]:
%%micropython -unix 1

from ulab import ndarray, mean, roll

# These are the weights; the last entry is the most dominant
weight = ndarray([1, 2, 3, 4, 5]) 

# initial array of samples
samples = ndarray([0]*5)

for i in range(5):
    # a new datum is inserted on the right hand side. This simply overwrites whatever was in the last slot
    samples[-1] = 2
    print(mean(samples*weight))
    # the data are shifted by one position to the left
    roll(samples, 1)

2.0
3.6
4.8
5.6
6.0




## Unary operators

At the moment, only `len` is implemented, which returns the number of elements for one-dimensional arrays, and the length of the first axis for matrices. One should consider other possibilities.

In [39]:
%%micropython -unix 1

from ulab import ndarray

# initialise an array
a = ndarray(range(10))
print('1D array: ', a)

# print out the array's length
print('length of array: ', len(a))

1D array:  ndarray([0.0, 1.0, 2.0, ..., 7.0, 8.0, 9.0], dtype=float)
length of array:  10




In [119]:
%%micropython -unix 1

from ulab import ndarray

# initialise a matrix
a = ndarray([range(10), range(10), range(10)])
print('2D array: \n', a)

# print out the array's elements, one by one
print('length of array: ', len(a))

2D array: 
 ndarray([[0.0, 1.0, 2.0, ..., 7.0, 8.0, 9.0],
	 [0.0, 1.0, 2.0, ..., 7.0, 8.0, 9.0],
	 [0.0, 1.0, 2.0, ..., 7.0, 8.0, 9.0]], dtype=float)
length of array:  3




## Class methods: shape, size, rawsize, flatten

In [221]:
%%micropython -unix 1

from ulab import ndarray

# initialise an array
a = ndarray(range(10))
print('1D array: ', a)

# print out the shape
print('shape: ', a.shape())

#print out the size
print('size 0: ', a.size(0), '\nsize 1: ', a.size(1), '\nsize 2: ', a.size(2))

#print out the raw size
print('raw size: ', a.rawsize())

# initialise a matrix
a = ndarray([range(10), range(10), range(10)])
print('\n2D array: \n', a)

# print out the shape
print('shape: ', a.shape())

#print out the size
print('size 0: ', a.size(0), '\nsize 1: ', a.size(1), '\nsize 2: ', a.size(2))

#print out the raw size
print('raw size: ', a.rawsize())

#flattened array
a = ndarray([range(3), range(3), range(3)])
print('\n2D array: \n', a)
print('flattened array: (C)', a.flatten(order='C'))
print('flattened array: (F)', a.flatten(order='F'))

1D array:  ndarray([0.0, 1.0, 2.0, ..., 7.0, 8.0, 9.0], dtype=float)
shape:  (10, 1)
size 0:  10 
size 1:  10 
size 2:  1
raw size:  (10, 1, 40, 10, 4)

2D array: 
 ndarray([[0.0, 1.0, 2.0, ..., 7.0, 8.0, 9.0],
	 [0.0, 1.0, 2.0, ..., 7.0, 8.0, 9.0],
	 [0.0, 1.0, 2.0, ..., 7.0, 8.0, 9.0]], dtype=float)
shape:  (3, 10)
size 0:  30 
size 1:  3 
size 2:  10
raw size:  (3, 10, 120, 30, 4)

2D array: 
 ndarray([[0.0, 1.0, 2.0],
	 [0.0, 1.0, 2.0],
	 [0.0, 1.0, 2.0]], dtype=float)
flattened array: (C) ndarray([0.0, 1.0, 2.0, 0.0, 1.0, 2.0, 0.0, 1.0, 2.0], dtype=float)
flattened array: (F) ndarray([0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0], dtype=float)




## ndarray.h

In [5]:
%%ccode ndarray.h

#ifndef _NDARRAY_
#define _NDARRAY_

#include "py/objarray.h"
#include "py/binary.h"
#include "py/objstr.h"
#include "py/objlist.h"

#define PRINT_MAX  10

#if MICROPY_FLOAT_IMPL == MICROPY_FLOAT_IMPL_FLOAT
#define FLOAT_TYPECODE 'f'
#elif MICROPY_FLOAT_IMPL == MICROPY_FLOAT_IMPL_DOUBLE
#define FLOAT_TYPECODE 'd'
#endif

extern const mp_obj_type_t ulab_ndarray_type;

enum NDARRAY_TYPE {
    NDARRAY_UINT8 = 'B',
    NDARRAY_INT8 = 'b',
    NDARRAY_UINT16 = 'H', 
    NDARRAY_INT16 = 'h',
    NDARRAY_FLOAT = FLOAT_TYPECODE,
};

typedef struct _ndarray_obj_t {
    mp_obj_base_t base;
    size_t m, n;
    size_t len;
    mp_obj_array_t *array;
    size_t bytes;
} ndarray_obj_t;

mp_obj_t mp_obj_new_ndarray_iterator(mp_obj_t , size_t , mp_obj_iter_buf_t *);

mp_float_t ndarray_get_float_value(void *, uint8_t , size_t );
void fill_array_iterable(mp_float_t *, mp_obj_t );

void ndarray_print_row(const mp_print_t *, mp_obj_array_t *, size_t , size_t );
void ndarray_print(const mp_print_t *, mp_obj_t , mp_print_kind_t );
void ndarray_assign_elements(mp_obj_array_t *, mp_obj_t , uint8_t , size_t *);
ndarray_obj_t *create_new_ndarray(size_t , size_t , uint8_t );

mp_obj_t ndarray_copy(mp_obj_t );
mp_obj_t ndarray_make_new(const mp_obj_type_t *, size_t , size_t , const mp_obj_t *);
mp_obj_t ndarray_subscr(mp_obj_t , mp_obj_t , mp_obj_t );
mp_obj_t ndarray_getiter(mp_obj_t , mp_obj_iter_buf_t *);
mp_obj_t ndarray_binary_op(mp_binary_op_t , mp_obj_t , mp_obj_t );
mp_obj_t ndarray_unary_op(mp_unary_op_t , mp_obj_t );

mp_obj_t ndarray_shape(mp_obj_t );
mp_obj_t ndarray_rawsize(mp_obj_t );
mp_obj_t ndarray_flatten(size_t , const mp_obj_t *, mp_map_t *);
mp_obj_t ndarray_asbytearray(mp_obj_t );

#define CREATE_SINGLE_ITEM(outarray, type, typecode, value) do {\
    ndarray_obj_t *tmp = create_new_ndarray(1, 1, (typecode));\
    type *tmparr = (type *)tmp->array->items;\
    tmparr[0] = (type)(value);\
    (outarray) = MP_OBJ_FROM_PTR(tmp);\
} while(0)

/*  
    mp_obj_t row = mp_obj_new_list(n, NULL);
    mp_obj_list_t *row_ptr = MP_OBJ_TO_PTR(row);
    
    should work outside the loop, but it doesn't. Go figure! 
*/

#define RUN_BINARY_LOOP(typecode, type_out, type_left, type_right, ol, or, op) do {\
    type_left *left = (type_left *)(ol)->array->items;\
    type_right *right = (type_right *)(or)->array->items;\
    uint8_t inc = 0;\
    if((or)->array->len > 1) inc = 1;\
    if(((op) == MP_BINARY_OP_ADD) || ((op) == MP_BINARY_OP_SUBTRACT) || ((op) == MP_BINARY_OP_MULTIPLY)) {\
        ndarray_obj_t *out = create_new_ndarray(ol->m, ol->n, typecode);\
        type_out *(odata) = (type_out *)out->array->items;\
        if((op) == MP_BINARY_OP_ADD) { for(size_t i=0, j=0; i < (ol)->array->len; i++, j+=inc) odata[i] = left[i] + right[j];}\
        if((op) == MP_BINARY_OP_SUBTRACT) { for(size_t i=0, j=0; i < (ol)->array->len; i++, j+=inc) odata[i] = left[i] - right[j];}\
        if((op) == MP_BINARY_OP_MULTIPLY) { for(size_t i=0, j=0; i < (ol)->array->len; i++, j+=inc) odata[i] = left[i] * right[j];}\
        return MP_OBJ_FROM_PTR(out);\
    } else if((op) == MP_BINARY_OP_TRUE_DIVIDE) {\
        ndarray_obj_t *out = create_new_ndarray(ol->m, ol->n, NDARRAY_FLOAT);\
        mp_float_t *odata = (mp_float_t *)out->array->items;\
        for(size_t i=0, j=0; i < (ol)->array->len; i++, j+=inc) odata[i] = (mp_float_t)left[i]/(mp_float_t)right[j];\
        return MP_OBJ_FROM_PTR(out);\
    } else if(((op) == MP_BINARY_OP_LESS) || ((op) == MP_BINARY_OP_LESS_EQUAL) ||  \
             ((op) == MP_BINARY_OP_MORE) || ((op) == MP_BINARY_OP_MORE_EQUAL)) {\
        mp_obj_t out_list = mp_obj_new_list(0, NULL);\
        size_t m = (ol)->m, n = (ol)->n;\
        for(size_t i=0, r=0; i < m; i++, r+=inc) {\
            mp_obj_t row = mp_obj_new_list(n, NULL);\
            mp_obj_list_t *row_ptr = MP_OBJ_TO_PTR(row);\
            for(size_t j=0, s=0; j < n; j++, s+=inc) {\
                row_ptr->items[j] = mp_const_false;\
                if((op) == MP_BINARY_OP_LESS) {\
                    if(left[i*n+j] < right[r*n+s]) row_ptr->items[j] = mp_const_true;\
                } else if((op) == MP_BINARY_OP_LESS_EQUAL) {\
                    if(left[i*n+j] <= right[r*n+s]) row_ptr->items[j] = mp_const_true;\
                } else if((op) == MP_BINARY_OP_MORE) {\
                    if(left[i*n+j] > right[r*n+s]) row_ptr->items[j] = mp_const_true;\
                } else if((op) == MP_BINARY_OP_MORE_EQUAL) {\
                    if(left[i*n+j] >= right[r*n+s]) row_ptr->items[j] = mp_const_true;\
                }\
            }\
            if(m == 1) return row;\
            mp_obj_list_append(out_list, row);\
        }\
        return out_list;\
    }\
} while(0)

#endif

written 4898 bytes to ndarray.h


## ndarray.c

In [632]:
%%ccode ndarray.c

#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "py/runtime.h"
#include "py/binary.h"
#include "py/obj.h"
#include "py/objtuple.h"
#include "ndarray.h"

// This function is copied verbatim from objarray.c
STATIC mp_obj_array_t *array_new(char typecode, size_t n) {
    int typecode_size = mp_binary_get_size('@', typecode, NULL);
    mp_obj_array_t *o = m_new_obj(mp_obj_array_t);
    // this step could probably be skipped: we are never going to store a bytearray per se
    #if MICROPY_PY_BUILTINS_BYTEARRAY && MICROPY_PY_ARRAY
    o->base.type = (typecode == BYTEARRAY_TYPECODE) ? &mp_type_bytearray : &mp_type_array;
    #elif MICROPY_PY_BUILTINS_BYTEARRAY
    o->base.type = &mp_type_bytearray;
    #else
    o->base.type = &mp_type_array;
    #endif
    o->typecode = typecode;
    o->free = 0;
    o->len = n;
    o->items = m_new(byte, typecode_size * o->len);
    return o;
}

mp_float_t ndarray_get_float_value(void *data, uint8_t typecode, size_t index) {
    if(typecode == NDARRAY_UINT8) {
        return (mp_float_t)((uint8_t *)data)[index];
    } else if(typecode == NDARRAY_INT8) {
        return (mp_float_t)((int8_t *)data)[index];
    } else if(typecode == NDARRAY_UINT16) {
        return (mp_float_t)((uint16_t *)data)[index];
    } else if(typecode == NDARRAY_INT16) {
        return (mp_float_t)((int16_t *)data)[index];
    } else {
        return (mp_float_t)((mp_float_t *)data)[index];
    }
}

void fill_array_iterable(mp_float_t *array, mp_obj_t iterable) {
    mp_obj_iter_buf_t x_buf;
    mp_obj_t x_item, x_iterable = mp_getiter(iterable, &x_buf);
    size_t i=0;
    while ((x_item = mp_iternext(x_iterable)) != MP_OBJ_STOP_ITERATION) {
        array[i] = (mp_float_t)mp_obj_get_float(x_item);
        i++;
    }
}

void ndarray_print_row(const mp_print_t *print, mp_obj_array_t *data, size_t n0, size_t n) {
    mp_print_str(print, "[");
    size_t i;
    if(n < PRINT_MAX) { // if the array is short, print everything
        mp_obj_print_helper(print, mp_binary_get_val_array(data->typecode, data->items, n0), PRINT_REPR);
        for(i=1; i<n; i++) {
            mp_print_str(print, ", ");
            mp_obj_print_helper(print, mp_binary_get_val_array(data->typecode, data->items, n0+i), PRINT_REPR);
        }
    } else {
        mp_obj_print_helper(print, mp_binary_get_val_array(data->typecode, data->items, n0), PRINT_REPR);
        for(i=1; i<3; i++) {
            mp_print_str(print, ", ");
            mp_obj_print_helper(print, mp_binary_get_val_array(data->typecode, data->items, n0+i), PRINT_REPR);
        }
        mp_printf(print, ", ..., ");
        mp_obj_print_helper(print, mp_binary_get_val_array(data->typecode, data->items, n0+n-3), PRINT_REPR);
        for(size_t i=1; i<3; i++) {
            mp_print_str(print, ", ");
            mp_obj_print_helper(print, mp_binary_get_val_array(data->typecode, data->items, n0+n-3+i), PRINT_REPR);
        }
    }
    mp_print_str(print, "]");
}

void ndarray_print(const mp_print_t *print, mp_obj_t self_in, mp_print_kind_t kind) {
    (void)kind;
    ndarray_obj_t *self = MP_OBJ_TO_PTR(self_in);
    mp_print_str(print, "array(");
    
    if(self->array->len == 0) {
        mp_print_str(print, "[]");
    } else {
        if((self->m == 1) || (self->n == 1)) {
            ndarray_print_row(print, self->array, 0, self->array->len);
        } else {
            // TODO: add vertical ellipses for the case, when self->m > PRINT_MAX
            mp_print_str(print, "[");
            ndarray_print_row(print, self->array, 0, self->n);
            for(size_t i=1; i < self->m; i++) {
                mp_print_str(print, ",\n\t ");
                ndarray_print_row(print, self->array, i*self->n, self->n);
            }
            mp_print_str(print, "]");
        }
    }
    if(self->array->typecode == NDARRAY_UINT8) {
        mp_print_str(print, ", dtype=uint8)");
    } else if(self->array->typecode == NDARRAY_INT8) {
        mp_print_str(print, ", dtype=int8)");
    } else if(self->array->typecode == NDARRAY_UINT16) {
        mp_print_str(print, ", dtype=uint16)");
    } else if(self->array->typecode == NDARRAY_INT16) {
        mp_print_str(print, ", dtype=int16)");
    } else if(self->array->typecode == NDARRAY_FLOAT) {
        mp_print_str(print, ", dtype=float)");
    }
}

void ndarray_assign_elements(mp_obj_array_t *data, mp_obj_t iterable, uint8_t typecode, size_t *idx) {
    // assigns a single row in the matrix
    mp_obj_t item;
    while ((item = mp_iternext(iterable)) != MP_OBJ_STOP_ITERATION) {
        mp_binary_set_val_array(typecode, data->items, (*idx)++, item);
    }
}

ndarray_obj_t *create_new_ndarray(size_t m, size_t n, uint8_t typecode) {
    // Creates the base ndarray with shape (m, n), and initialises the values to straight 0s
    ndarray_obj_t *ndarray = m_new_obj(ndarray_obj_t);
    ndarray->base.type = &ulab_ndarray_type;
    ndarray->m = m;
    ndarray->n = n;
    mp_obj_array_t *array = array_new(typecode, m*n);
    ndarray->bytes = m * n * mp_binary_get_size('@', typecode, NULL);
    // this should set all elements to 0, irrespective of the of the typecode (all bits are zero)
    // we could, perhaps, leave this step out, and initialise the array only, when needed
    memset(array->items, 0, ndarray->bytes); 
    ndarray->array = array;
    return ndarray;
}

mp_obj_t ndarray_copy(mp_obj_t self_in) {
    // returns a verbatim (shape and typecode) copy of self_in
    ndarray_obj_t *self = MP_OBJ_TO_PTR(self_in);
    ndarray_obj_t *out = create_new_ndarray(self->m, self->n, self->array->typecode);
    memcpy(out->array->items, self->array->items, self->bytes);
    return MP_OBJ_FROM_PTR(out);
}

STATIC uint8_t ndarray_init_helper(size_t n_args, const mp_obj_t *pos_args, mp_map_t *kw_args) {
    static const mp_arg_t allowed_args[] = {
        { MP_QSTR_, MP_ARG_REQUIRED | MP_ARG_OBJ, {.u_rom_obj = MP_ROM_PTR(&mp_const_none_obj)} },
        { MP_QSTR_dtype, MP_ARG_KW_ONLY | MP_ARG_INT, {.u_int = NDARRAY_FLOAT } },
    };
    
    mp_arg_val_t args[MP_ARRAY_SIZE(allowed_args)];
    mp_arg_parse_all(1, pos_args, kw_args, MP_ARRAY_SIZE(allowed_args), allowed_args, args);
    
    uint8_t dtype = args[1].u_int;
    return dtype;
}

mp_obj_t ndarray_make_new(const mp_obj_type_t *type, size_t n_args, size_t n_kw, const mp_obj_t *args) {
    mp_arg_check_num(n_args, n_kw, 1, 2, true);
    mp_map_t kw_args;
    mp_map_init_fixed_table(&kw_args, n_kw, args + n_args);
    uint8_t dtype = ndarray_init_helper(n_args, args, &kw_args);

    size_t len1, len2=0, i=0;
    mp_obj_t len_in = mp_obj_len_maybe(args[0]);
    if (len_in == MP_OBJ_NULL) {
        mp_raise_ValueError("first argument must be an iterable");
    } else {
        // len1 is either the number of rows (for matrices), or the number of elements (row vectors)
        len1 = MP_OBJ_SMALL_INT_VALUE(len_in);
    }

    // We have to figure out, whether the first element of the iterable is an iterable itself
    // Perhaps, there is a more elegant way of handling this
    mp_obj_iter_buf_t iter_buf1;
    mp_obj_t item1, iterable1 = mp_getiter(args[0], &iter_buf1);
    while ((item1 = mp_iternext(iterable1)) != MP_OBJ_STOP_ITERATION) {
        len_in = mp_obj_len_maybe(item1);
        if(len_in != MP_OBJ_NULL) { // indeed, this seems to be an iterable
            // Next, we have to check, whether all elements in the outer loop have the same length
            if(i > 0) {
                if(len2 != MP_OBJ_SMALL_INT_VALUE(len_in)) {
                    mp_raise_ValueError("iterables are not of the same length");
                }
            }
            len2 = MP_OBJ_SMALL_INT_VALUE(len_in);
            i++;
        }
    }
    // By this time, it should be established, what the shape is, so we can now create the array
    ndarray_obj_t *self = create_new_ndarray((len2 == 0) ? 1 : len1, (len2 == 0) ? len1 : len2, dtype);
    iterable1 = mp_getiter(args[0], &iter_buf1);
    i = 0;
    if(len2 == 0) { // the first argument is a single iterable
        ndarray_assign_elements(self->array, iterable1, dtype, &i);
    } else {
        mp_obj_iter_buf_t iter_buf2;
        mp_obj_t iterable2; 

        while ((item1 = mp_iternext(iterable1)) != MP_OBJ_STOP_ITERATION) {
            iterable2 = mp_getiter(item1, &iter_buf2);
            ndarray_assign_elements(self->array, iterable2, dtype, &i);
        }
    }
    return MP_OBJ_FROM_PTR(self);
}

size_t slice_length(mp_bound_slice_t slice) {
    // TODO: check, whether this is true!
    if(slice.step < 0) {
        slice.step = -slice.step;
        return (slice.start - slice.stop) / slice.step;
    } else {
        return (slice.stop - slice.start) / slice.step;        
    }
}

size_t true_length(mp_obj_t bool_list) {
    // returns the number of Trues in a Boolean list
    // I wonder, wouldn't this be faster, if we looped through bool_list->items instead?
    mp_obj_iter_buf_t iter_buf;
    mp_obj_t item, iterable = mp_getiter(bool_list, &iter_buf);
    size_t trues = 0;
    while((item = mp_iternext(iterable)) != MP_OBJ_STOP_ITERATION) {
        if(!mp_obj_is_type(item, &mp_type_bool)) {
            // numpy seems to be a little bit inconsistent in when an index is considered
            // to be True/False. Bail out immediately, if the items are not True/False
            return 0;
        }
        if(mp_obj_is_true(item)) {
            trues++;
        }
    }
    return trues;
}

mp_bound_slice_t generate_slice(mp_uint_t n, mp_obj_t index) {
    // micropython seems to have difficulties with negative steps
    mp_bound_slice_t slice;
    if(MP_OBJ_IS_TYPE(index, &mp_type_slice)) {
        mp_seq_get_fast_slice_indexes(n, index, &slice);
    } else if(mp_obj_is_int(index)) {
        int32_t _index = mp_obj_get_int(index);
        if(_index < 0) {
            _index += n;
        } 
        if((_index >= n) || (_index < 0)) {
            mp_raise_msg(&mp_type_IndexError, "index is out of bounds");
        }
        slice.start = _index;
        slice.stop = _index + 1;
        slice.step = 1;
    } else {
        mp_raise_msg(&mp_type_IndexError, "indices must be integers, slices, or Boolean lists");
    }
    return slice;
}

mp_bound_slice_t simple_slice(int16_t start, int16_t stop, int16_t step) {
    mp_bound_slice_t slice;
    slice.start = start;
    slice.stop = stop;
    slice.step = step;
    return slice;
}

void insert_binary_value(ndarray_obj_t *ndarray, size_t nd_index, ndarray_obj_t *values, size_t value_index) {
    // there is probably a more elegant implementation...
    mp_obj_t tmp = mp_binary_get_val_array(values->array->typecode, values->array->items, value_index);
    if((values->array->typecode == NDARRAY_FLOAT) && (ndarray->array->typecode != NDARRAY_FLOAT)) {
        // workaround: rounding seems not to work in the arm compiler
        int32_t x = (int32_t)floorf(mp_obj_get_float(tmp)+0.5);
        tmp = mp_obj_new_int(x);
    }
    mp_binary_set_val_array(ndarray->array->typecode, ndarray->array->items, nd_index, tmp); 
}

mp_obj_t insert_slice_list(ndarray_obj_t *ndarray, size_t m, size_t n, 
                            mp_bound_slice_t row, mp_bound_slice_t column, 
                            mp_obj_t row_list, mp_obj_t column_list, 
                            ndarray_obj_t *values) {
    if((m != values->m) && (n != values->n)) {
        if((values->array->len != 1)) { // not a single item
            mp_raise_ValueError("could not broadast input array from shape");
        }
    }
    size_t cindex, rindex;
    // M, and N are used to manipulate how the source index is incremented in the loop
    uint8_t M = 1, N = 1;
    if(values->m == 1) {
        M = 0;
    }
    if(values->n == 1) {
        N = 0;
    }
    
    if(row_list == mp_const_none) { // rows are indexed by a slice
        rindex = row.start;
        if(column_list == mp_const_none) { // columns are indexed by a slice
            for(size_t i=0; i < m; i++) {
                cindex = column.start;
                for(size_t j=0; j < n; j++) {
                    insert_binary_value(ndarray, rindex*ndarray->n+cindex, values, i*M*n+j*N);
                    cindex += column.step;
                }
                rindex += row.step;
            }
        } else { // columns are indexed by a Boolean list
            mp_obj_iter_buf_t column_iter_buf;
            mp_obj_t column_item, column_iterable;
            for(size_t i=0; i < m; i++) {
                column_iterable = mp_getiter(column_list, &column_iter_buf);
                size_t j = 0;
                cindex = 0;
                while((column_item = mp_iternext(column_iterable)) != MP_OBJ_STOP_ITERATION) {
                    if(mp_obj_is_true(column_item)) {
                        insert_binary_value(ndarray, rindex*ndarray->n+cindex, values, i*M*n+j*N);
                        j++;
                    }
                    cindex++;
                }
                rindex += row.step;
            }
        }
    } else { // rows are indexed by a Boolean list
        mp_obj_iter_buf_t row_iter_buf;
        mp_obj_t row_item, row_iterable;
        row_iterable = mp_getiter(row_list, &row_iter_buf);
        size_t i = 0;
        rindex = 0;
        if(column_list == mp_const_none) { // columns are indexed by a slice
            while((row_item = mp_iternext(row_iterable)) != MP_OBJ_STOP_ITERATION) {
                if(mp_obj_is_true(row_item)) {
                    cindex = column.start;
                    for(size_t j=0; j < n; j++) {
                        insert_binary_value(ndarray, rindex*ndarray->n+cindex, values, i*M*n+j*N);
                        cindex += column.step;
                    }
                    i++;
                }
                rindex++;
            } 
        } else { // columns are indexed by a list
            mp_obj_iter_buf_t column_iter_buf;
            mp_obj_t column_item, column_iterable;
            size_t j = 0, cindex = 0;
            while((row_item = mp_iternext(row_iterable)) != MP_OBJ_STOP_ITERATION) {
                if(mp_obj_is_true(row_item)) {
                    column_iterable = mp_getiter(column_list, &column_iter_buf);                   
                    while((column_item = mp_iternext(column_iterable)) != MP_OBJ_STOP_ITERATION) {
                        if(mp_obj_is_true(column_item)) {
                            insert_binary_value(ndarray, rindex*ndarray->n+cindex, values, i*M*n+j*N);
                            j++;
                        }
                        cindex++;
                    }
                    i++;
                }
                rindex++;
            }
        }
    }
    return mp_const_none;
}

mp_obj_t iterate_slice_list(ndarray_obj_t *ndarray, size_t m, size_t n, 
                            mp_bound_slice_t row, mp_bound_slice_t column, 
                            mp_obj_t row_list, mp_obj_t column_list, 
                            ndarray_obj_t *values) {
    if((m == 0) || (n == 0)) {
        mp_raise_msg(&mp_type_IndexError, "empty index range");
    }

    if(values != NULL) {
        return insert_slice_list(ndarray, m, n, row, column, row_list, column_list, values);
    }
    uint8_t _sizeof = mp_binary_get_size('@', ndarray->array->typecode, NULL);
    ndarray_obj_t *out = create_new_ndarray(m, n, ndarray->array->typecode);
    uint8_t *target = (uint8_t *)out->array->items;
    uint8_t *source = (uint8_t *)ndarray->array->items;
    size_t cindex, rindex;    
    if(row_list == mp_const_none) { // rows are indexed by a slice
        rindex = row.start;
        if(column_list == mp_const_none) { // columns are indexed by a slice
            for(size_t i=0; i < m; i++) {
                cindex = column.start;
                for(size_t j=0; j < n; j++) {
                    memcpy(target+(i*n+j)*_sizeof, source+(rindex*ndarray->n+cindex)*_sizeof, _sizeof);
                    cindex += column.step;
                }
                rindex += row.step;
            }
        } else { // columns are indexed by a Boolean list
            // TODO: the list must be exactly as long as the axis
            mp_obj_iter_buf_t column_iter_buf;
            mp_obj_t column_item, column_iterable;
            for(size_t i=0; i < m; i++) {
                column_iterable = mp_getiter(column_list, &column_iter_buf);
                size_t j = 0;
                cindex = 0;
                while((column_item = mp_iternext(column_iterable)) != MP_OBJ_STOP_ITERATION) {
                    if(mp_obj_is_true(column_item)) {
                        memcpy(target+(i*n+j)*_sizeof, source+(rindex*ndarray->n+cindex)*_sizeof, _sizeof);
                        j++;
                    }
                    cindex++;
                }
                rindex += row.step;
            }
        }
    } else { // rows are indexed by a Boolean list
        mp_obj_iter_buf_t row_iter_buf;
        mp_obj_t row_item, row_iterable;
        row_iterable = mp_getiter(row_list, &row_iter_buf);
        size_t i = 0;
        rindex = 0;
        if(column_list == mp_const_none) { // columns are indexed by a slice
            while((row_item = mp_iternext(row_iterable)) != MP_OBJ_STOP_ITERATION) {
                if(mp_obj_is_true(row_item)) {
                    cindex = column.start;
                    for(size_t j=0; j < n; j++) {
                        memcpy(target+(i*n+j)*_sizeof, source+(rindex*ndarray->n+cindex)*_sizeof, _sizeof);
                        cindex += column.step;
                    }
                    i++;
                }
                rindex++;
            } 
        } else { // columns are indexed by a list
            mp_obj_iter_buf_t column_iter_buf;
            mp_obj_t column_item, column_iterable;
            size_t j = 0, cindex = 0;
            while((row_item = mp_iternext(row_iterable)) != MP_OBJ_STOP_ITERATION) {
                if(mp_obj_is_true(row_item)) {
                    column_iterable = mp_getiter(column_list, &column_iter_buf);                   
                    while((column_item = mp_iternext(column_iterable)) != MP_OBJ_STOP_ITERATION) {
                        if(mp_obj_is_true(column_item)) {
                            memcpy(target+(i*n+j)*_sizeof, source+(rindex*ndarray->n+cindex)*_sizeof, _sizeof);
                            j++;
                        }
                        cindex++;
                    }
                    i++;
                }
                rindex++;
            }
        }
    }
    return MP_OBJ_FROM_PTR(out);
}

mp_obj_t ndarray_get_slice(ndarray_obj_t *ndarray, mp_obj_t index, ndarray_obj_t *values) {
    mp_bound_slice_t row_slice = simple_slice(0, 0, 1), column_slice = simple_slice(0, 0, 1);

    size_t m = 0, n = 0;
    if(mp_obj_is_int(index) && (ndarray->m == 1) && (values == NULL)) { 
        // we have a row vector, and don't want to assign
        column_slice = generate_slice(ndarray->n, index);
        if(slice_length(column_slice) == 1) { // we were asked for a single item
            // subscribe returns an mp_obj_t, if and only, if the index is an integer, and we have a row vector
            return mp_binary_get_val_array(ndarray->array->typecode, ndarray->array->items, column_slice.start);
        }
    }
    
    if(mp_obj_is_int(index) || MP_OBJ_IS_TYPE(index, &mp_type_slice)) {
        if(ndarray->m == 1) { // we have a row vector
            column_slice = generate_slice(ndarray->n, index);
            row_slice = simple_slice(0, 1, 1);
        } else { // we have a matrix
            row_slice = generate_slice(ndarray->m, index);
            column_slice = simple_slice(0, ndarray->n, 1); // take all columns
        }
        m = slice_length(row_slice);
        n = slice_length(column_slice);
        return iterate_slice_list(ndarray, m, n, row_slice, column_slice, mp_const_none, mp_const_none, values);
    } else if(MP_OBJ_IS_TYPE(index, &mp_type_list)) {
        n = true_length(index);
        if(ndarray->m == 1) { // we have a flat array
            // we might have to separate the n == 1 case
            row_slice = simple_slice(0, 1, 1);
            return iterate_slice_list(ndarray, 1, n, row_slice, column_slice, mp_const_none, index, values);
        } else { // we have a matrix
            return iterate_slice_list(ndarray, 1, n, row_slice, column_slice, mp_const_none, index, values);
        }
    }
    else { // we certainly have a tuple, so let us deal with it
        mp_obj_tuple_t *tuple = MP_OBJ_TO_PTR(index);
        if(tuple->len != 2) {
            mp_raise_msg(&mp_type_IndexError, "too many indices");
        }
        if(!(MP_OBJ_IS_TYPE(tuple->items[0], &mp_type_list) || 
            MP_OBJ_IS_TYPE(tuple->items[0], &mp_type_slice) || 
            mp_obj_is_int(tuple->items[0])) || 
           !(MP_OBJ_IS_TYPE(tuple->items[1], &mp_type_list) || 
            MP_OBJ_IS_TYPE(tuple->items[1], &mp_type_slice) || 
            mp_obj_is_int(tuple->items[1]))) {
                mp_raise_msg(&mp_type_IndexError, "indices must be integers, slices, or Boolean lists");
        }
        if(MP_OBJ_IS_TYPE(tuple->items[0], &mp_type_list)) { // rows are indexed by Boolean list
            m = true_length(tuple->items[0]);
            if(MP_OBJ_IS_TYPE(tuple->items[1], &mp_type_list)) {
                n = true_length(tuple->items[1]);
                return iterate_slice_list(ndarray, m, n, row_slice, column_slice, 
                                          tuple->items[0], tuple->items[1], values);
            } else { // the column is indexed by an integer, or a slice
                column_slice = generate_slice(ndarray->n, tuple->items[1]);
                n = slice_length(column_slice);
                return iterate_slice_list(ndarray, m, n, row_slice, column_slice, 
                                          tuple->items[0], mp_const_none, values);
            }
            
        } else { // rows are indexed by a slice, or an integer
            row_slice = generate_slice(ndarray->m, tuple->items[0]);
            m = slice_length(row_slice);
            if(MP_OBJ_IS_TYPE(tuple->items[1], &mp_type_list)) { // columns are indexed by a Boolean list
                n = true_length(tuple->items[1]);
                return iterate_slice_list(ndarray, m, n, row_slice, column_slice, 
                                         mp_const_none, tuple->items[1], values);
            } else { // columns are indexed by an integer, or a slice
                column_slice = generate_slice(ndarray->n, tuple->items[1]);
                n = slice_length(column_slice);
                return iterate_slice_list(ndarray, m, n, row_slice, column_slice, 
                                          mp_const_none, mp_const_none, values);             
                
            }
        }
    }
}

mp_obj_t ndarray_subscr(mp_obj_t self_in, mp_obj_t index, mp_obj_t value) {
    ndarray_obj_t *self = MP_OBJ_TO_PTR(self_in);
    
    if (value == MP_OBJ_SENTINEL) { // return value(s)
        return ndarray_get_slice(self, index, NULL);    
    } else { // assignment to slices; the value must be an ndarray, or a scalar
        if(!MP_OBJ_IS_TYPE(value, &ulab_ndarray_type) && 
          !mp_obj_is_int(value) && !mp_obj_is_float(value)) {
            mp_raise_ValueError("right hand side must be an ndarray, or a scalar");
        } else {
            ndarray_obj_t *values = NULL;
            if(mp_obj_is_int(value)) {
                values = create_new_ndarray(1, 1, self->array->typecode);
                mp_binary_set_val_array(values->array->typecode, values->array->items, 0, value);   
            } else if(mp_obj_is_float(value)) {
                values = create_new_ndarray(1, 1, NDARRAY_FLOAT);
                mp_binary_set_val_array(NDARRAY_FLOAT, values->array->items, 0, value);
            } else {
                values = MP_OBJ_TO_PTR(value);
            }
            return ndarray_get_slice(self, index, values);
        }
    }      
    return mp_const_none;
}

// itarray iterator

mp_obj_t ndarray_getiter(mp_obj_t o_in, mp_obj_iter_buf_t *iter_buf) {
    return mp_obj_new_ndarray_iterator(o_in, 0, iter_buf);
}

typedef struct _mp_obj_ndarray_it_t {
    mp_obj_base_t base;
    mp_fun_1_t iternext;
    mp_obj_t ndarray;
    size_t cur;
} mp_obj_ndarray_it_t;

mp_obj_t ndarray_iternext(mp_obj_t self_in) {
    mp_obj_ndarray_it_t *self = MP_OBJ_TO_PTR(self_in);
    ndarray_obj_t *ndarray = MP_OBJ_TO_PTR(self->ndarray);
    // TODO: in numpy, ndarrays are iterated with respect to the first axis. 
    size_t iter_end = 0;
    if((ndarray->m == 1)) {
        iter_end = ndarray->array->len;
    } else {
        iter_end = ndarray->m;
    }
    if(self->cur < iter_end) {
        if(ndarray->n == ndarray->array->len) { // we have a linear array
            // read the current value
            mp_obj_t value;
            value = mp_binary_get_val_array(ndarray->array->typecode, ndarray->array->items, self->cur);
            self->cur++;
            return value;
        } else { // we have a matrix, return the number of rows
            ndarray_obj_t *value = create_new_ndarray(1, ndarray->n, ndarray->array->typecode);
            // copy the memory content here
            uint8_t *tmp = (uint8_t *)ndarray->array->items;
            size_t strip_size = ndarray->n * mp_binary_get_size('@', ndarray->array->typecode, NULL);
            memcpy(value->array->items, &tmp[self->cur*strip_size], strip_size);
            self->cur++;
            return value;
        }
    } else {
        return MP_OBJ_STOP_ITERATION;
    }
}

mp_obj_t mp_obj_new_ndarray_iterator(mp_obj_t ndarray, size_t cur, mp_obj_iter_buf_t *iter_buf) {
    assert(sizeof(mp_obj_ndarray_it_t) <= sizeof(mp_obj_iter_buf_t));
    mp_obj_ndarray_it_t *o = (mp_obj_ndarray_it_t*)iter_buf;
    o->base.type = &mp_type_polymorph_iter;
    o->iternext = ndarray_iternext;
    o->ndarray = ndarray;
    o->cur = cur;
    return MP_OBJ_FROM_PTR(o);
}

mp_obj_t ndarray_shape(mp_obj_t self_in) {
    ndarray_obj_t *self = MP_OBJ_TO_PTR(self_in);
    mp_obj_t tuple[2] = {
        mp_obj_new_int(self->m),
        mp_obj_new_int(self->n)
    };
    return mp_obj_new_tuple(2, tuple);
}

mp_obj_t ndarray_rawsize(mp_obj_t self_in) {
    // returns a 5-tuple with the 
    // 
    // 0. number of rows
    // 1. number of columns
    // 2. length of the storage (should be equal to the product of 1. and 2.)
    // 3. length of the data storage in bytes
    // 4. datum size in bytes
    ndarray_obj_t *self = MP_OBJ_TO_PTR(self_in);
    mp_obj_tuple_t *tuple = MP_OBJ_TO_PTR(mp_obj_new_tuple(5, NULL));
    tuple->items[0] = MP_OBJ_NEW_SMALL_INT(self->m);
    tuple->items[1] = MP_OBJ_NEW_SMALL_INT(self->n);
    tuple->items[2] = MP_OBJ_NEW_SMALL_INT(self->array->len);
    tuple->items[3] = MP_OBJ_NEW_SMALL_INT(self->bytes);
    tuple->items[4] = MP_OBJ_NEW_SMALL_INT(mp_binary_get_size('@', self->array->typecode, NULL));
    return tuple;
}

mp_obj_t ndarray_flatten(size_t n_args, const mp_obj_t *pos_args, mp_map_t *kw_args) {
    static const mp_arg_t allowed_args[] = {
        { MP_QSTR_order, MP_ARG_KW_ONLY | MP_ARG_OBJ, {.u_rom_obj = MP_ROM_QSTR(MP_QSTR_C)} },
    };

    mp_arg_val_t args[MP_ARRAY_SIZE(allowed_args)];
    mp_arg_parse_all(n_args - 1, pos_args + 1, kw_args, MP_ARRAY_SIZE(allowed_args), allowed_args, args);
    mp_obj_t self_copy = ndarray_copy(pos_args[0]);
    ndarray_obj_t *ndarray = MP_OBJ_TO_PTR(self_copy);
    
    GET_STR_DATA_LEN(args[0].u_obj, order, len);    
    if((len != 1) || ((memcmp(order, "C", 1) != 0) && (memcmp(order, "F", 1) != 0))) {
        mp_raise_ValueError("flattening order must be either 'C', or 'F'");        
    }

    // if order == 'C', we simply have to set m, and n, there is nothing else to do
    if(memcmp(order, "F", 1) == 0) {
        ndarray_obj_t *self = MP_OBJ_TO_PTR(pos_args[0]);
        uint8_t _sizeof = mp_binary_get_size('@', self->array->typecode, NULL);
        // get the data of self_in: we won't need a temporary buffer for the transposition
        uint8_t *self_array = (uint8_t *)self->array->items;
        uint8_t *array = (uint8_t *)ndarray->array->items;
        size_t i=0;
        for(size_t n=0; n < self->n; n++) {
            for(size_t m=0; m < self->m; m++) {
                memcpy(array+_sizeof*i, self_array+_sizeof*(m*self->n + n), _sizeof);
                i++;
            }
        }        
    }
    ndarray->n = ndarray->array->len;
    ndarray->m = 1;
    return self_copy;
}

mp_obj_t ndarray_asbytearray(mp_obj_t self_in) {
    ndarray_obj_t *self = MP_OBJ_TO_PTR(self_in);
    return MP_OBJ_FROM_PTR(self->array);
}

// Binary operations

mp_obj_t ndarray_binary_op(mp_binary_op_t op, mp_obj_t lhs, mp_obj_t rhs) {
//    if(op == MP_BINARY_OP_REVERSE_ADD) {
 //       return ndarray_binary_op(MP_BINARY_OP_ADD, rhs, lhs);
  //  }    
    // One of the operands is a scalar
    // TODO: conform to numpy with the upcasting
    // TODO: implement in-place operators
    mp_obj_t RHS = MP_OBJ_NULL;
    bool rhs_is_scalar = true;
    if(mp_obj_is_int(rhs)) {
        int32_t ivalue = mp_obj_get_int(rhs);
        if((ivalue > 0) && (ivalue < 256)) {
            CREATE_SINGLE_ITEM(RHS, uint8_t, NDARRAY_UINT8, ivalue);
        } else if((ivalue > 255) && (ivalue < 65535)) {
            CREATE_SINGLE_ITEM(RHS, uint16_t, NDARRAY_UINT16, ivalue);
        } else if((ivalue < 0) && (ivalue > -128)) {
            CREATE_SINGLE_ITEM(RHS, int8_t, NDARRAY_INT8, ivalue);
        } else if((ivalue < -127) && (ivalue > -32767)) {
            CREATE_SINGLE_ITEM(RHS, int16_t, NDARRAY_INT16, ivalue);
        } else { // the integer value clearly does not fit the ulab types, so move on to float
            CREATE_SINGLE_ITEM(RHS, mp_float_t, NDARRAY_FLOAT, ivalue);
        }
    } else if(mp_obj_is_float(rhs)) {
        mp_float_t fvalue = mp_obj_get_float(rhs);        
        CREATE_SINGLE_ITEM(RHS, mp_float_t, NDARRAY_FLOAT, fvalue);
    } else {
        RHS = rhs;
        rhs_is_scalar = false;
    }
    //else 
    if(mp_obj_is_type(lhs, &ulab_ndarray_type) && mp_obj_is_type(RHS, &ulab_ndarray_type)) { 
        // next, the ndarray stuff
        ndarray_obj_t *ol = MP_OBJ_TO_PTR(lhs);
        ndarray_obj_t *or = MP_OBJ_TO_PTR(RHS);
        if(!rhs_is_scalar && ((ol->m != or->m) || (ol->n != or->n))) {
            mp_raise_ValueError("operands could not be broadcast together");
        }
        // At this point, the operands should have the same shape
        switch(op) {
            case MP_BINARY_OP_EQUAL:
                // Two arrays are equal, if their shape, typecode, and elements are equal
                if((ol->m != or->m) || (ol->n != or->n) || (ol->array->typecode != or->array->typecode)) {
                    return mp_const_false;
                } else {
                    size_t i = ol->bytes;
                    uint8_t *l = (uint8_t *)ol->array->items;
                    uint8_t *r = (uint8_t *)or->array->items;
                    while(i) { // At this point, we can simply compare the bytes, the type is irrelevant
                        if(*l++ != *r++) {
                            return mp_const_false;
                        }
                        i--;
                    }
                    return mp_const_true;
                }
                break;
            case MP_BINARY_OP_LESS:
            case MP_BINARY_OP_LESS_EQUAL:
            case MP_BINARY_OP_MORE:
            case MP_BINARY_OP_MORE_EQUAL:
            case MP_BINARY_OP_ADD:
            case MP_BINARY_OP_SUBTRACT:
            case MP_BINARY_OP_TRUE_DIVIDE:
            case MP_BINARY_OP_MULTIPLY:
                // TODO: I believe, this part can be made significantly smaller (compiled size)
                // by doing only the typecasting in the large ifs, and moving the loops outside
                // These are the upcasting rules
                // float always becomes float
                // operation on identical types preserves type
                // uint8 + int8 => int16
                // uint8 + int16 => int16
                // uint8 + uint16 => uint16
                // int8 + int16 => int16
                // int8 + uint16 => uint16
                // uint16 + int16 => float
                // The parameters of RUN_BINARY_LOOP are 
                // typecode of result, type_out, type_left, type_right, lhs operand, rhs operand, operator
                if(ol->array->typecode == NDARRAY_UINT8) {
                    if(or->array->typecode == NDARRAY_UINT8) {
                        RUN_BINARY_LOOP(NDARRAY_UINT8, uint8_t, uint8_t, uint8_t, ol, or, op);
                    } else if(or->array->typecode == NDARRAY_INT8) {
                        RUN_BINARY_LOOP(NDARRAY_INT16, int16_t, uint8_t, int8_t, ol, or, op);
                    } else if(or->array->typecode == NDARRAY_UINT16) {
                        RUN_BINARY_LOOP(NDARRAY_UINT16, uint16_t, uint8_t, uint16_t, ol, or, op);
                    } else if(or->array->typecode == NDARRAY_INT16) {
                        RUN_BINARY_LOOP(NDARRAY_INT16, int16_t, uint8_t, int16_t, ol, or, op);
                    } else if(or->array->typecode == NDARRAY_FLOAT) {
                        RUN_BINARY_LOOP(NDARRAY_FLOAT, mp_float_t, uint8_t, mp_float_t, ol, or, op);
                    }
                } else if(ol->array->typecode == NDARRAY_INT8) {
                    if(or->array->typecode == NDARRAY_UINT8) {
                        RUN_BINARY_LOOP(NDARRAY_INT16, int16_t, int8_t, uint8_t, ol, or, op);
                    } else if(or->array->typecode == NDARRAY_INT8) {
                        RUN_BINARY_LOOP(NDARRAY_INT8, int8_t, int8_t, int8_t, ol, or, op);
                    } else if(or->array->typecode == NDARRAY_UINT16) {
                        RUN_BINARY_LOOP(NDARRAY_INT16, int16_t, int8_t, uint16_t, ol, or, op);
                    } else if(or->array->typecode == NDARRAY_INT16) {
                        RUN_BINARY_LOOP(NDARRAY_INT16, int16_t, int8_t, int16_t, ol, or, op);
                    } else if(or->array->typecode == NDARRAY_FLOAT) {
                        RUN_BINARY_LOOP(NDARRAY_FLOAT, mp_float_t, int8_t, mp_float_t, ol, or, op);
                    }                
                } else if(ol->array->typecode == NDARRAY_UINT16) {
                    if(or->array->typecode == NDARRAY_UINT8) {
                        RUN_BINARY_LOOP(NDARRAY_UINT16, uint16_t, uint16_t, uint8_t, ol, or, op);
                    } else if(or->array->typecode == NDARRAY_INT8) {
                        RUN_BINARY_LOOP(NDARRAY_UINT16, uint16_t, uint16_t, int8_t, ol, or, op);
                    } else if(or->array->typecode == NDARRAY_UINT16) {
                        RUN_BINARY_LOOP(NDARRAY_UINT16, uint16_t, uint16_t, uint16_t, ol, or, op);
                    } else if(or->array->typecode == NDARRAY_INT16) {
                        RUN_BINARY_LOOP(NDARRAY_FLOAT, mp_float_t, uint16_t, int16_t, ol, or, op);
                    } else if(or->array->typecode == NDARRAY_FLOAT) {
                        RUN_BINARY_LOOP(NDARRAY_FLOAT, mp_float_t, uint8_t, mp_float_t, ol, or, op);
                    }
                } else if(ol->array->typecode == NDARRAY_INT16) {
                    if(or->array->typecode == NDARRAY_UINT8) {
                        RUN_BINARY_LOOP(NDARRAY_INT16, int16_t, int16_t, uint8_t, ol, or, op);
                    } else if(or->array->typecode == NDARRAY_INT8) {
                        RUN_BINARY_LOOP(NDARRAY_INT16, int16_t, int16_t, int8_t, ol, or, op);
                    } else if(or->array->typecode == NDARRAY_UINT16) {
                        RUN_BINARY_LOOP(NDARRAY_FLOAT, mp_float_t, int16_t, uint16_t, ol, or, op);
                    } else if(or->array->typecode == NDARRAY_INT16) {
                        RUN_BINARY_LOOP(NDARRAY_INT16, int16_t, int16_t, int16_t, ol, or, op);
                    } else if(or->array->typecode == NDARRAY_FLOAT) {
                        RUN_BINARY_LOOP(NDARRAY_FLOAT, mp_float_t, uint16_t, mp_float_t, ol, or, op);
                    }
                } else if(ol->array->typecode == NDARRAY_FLOAT) {
                    if(or->array->typecode == NDARRAY_UINT8) {
                        RUN_BINARY_LOOP(NDARRAY_FLOAT, mp_float_t, mp_float_t, uint8_t, ol, or, op);
                    } else if(or->array->typecode == NDARRAY_INT8) {
                        RUN_BINARY_LOOP(NDARRAY_FLOAT, mp_float_t, mp_float_t, int8_t, ol, or, op);
                    } else if(or->array->typecode == NDARRAY_UINT16) {
                        RUN_BINARY_LOOP(NDARRAY_FLOAT, mp_float_t, mp_float_t, uint16_t, ol, or, op);
                    } else if(or->array->typecode == NDARRAY_INT16) {
                        RUN_BINARY_LOOP(NDARRAY_FLOAT, mp_float_t, mp_float_t, int16_t, ol, or, op);
                    } else if(or->array->typecode == NDARRAY_FLOAT) {
                        RUN_BINARY_LOOP(NDARRAY_FLOAT, mp_float_t, mp_float_t, mp_float_t, ol, or, op);
                    }
                } else { // this should never happen
                    mp_raise_TypeError("wrong input type");
                }
                // this instruction should never be reached, but we have to make the compiler happy
                return MP_OBJ_NULL; 
            default:
                return MP_OBJ_NULL; // op not supported                                                        
        }
    } else {
        mp_raise_TypeError("wrong operand type on the right hand side");
    }
}

mp_obj_t ndarray_unary_op(mp_unary_op_t op, mp_obj_t self_in) {
    ndarray_obj_t *self = MP_OBJ_TO_PTR(self_in);
    ndarray_obj_t *ndarray = NULL;
    switch (op) {
        case MP_UNARY_OP_LEN: 
            if(self->m > 1) {
                return mp_obj_new_int(self->m);
            } else {
                return mp_obj_new_int(self->n);
            }
            break;
        
        case MP_UNARY_OP_INVERT:
            if(self->array->typecode == NDARRAY_FLOAT) {
                mp_raise_ValueError("operation is not supported for given type");
            }
            // we can invert the content byte by byte, there is no need to distinguish 
            // between different typecodes
            ndarray = MP_OBJ_TO_PTR(ndarray_copy(self_in));
            uint8_t *array = (uint8_t *)ndarray->array->items;
            for(size_t i=0; i < self->bytes; i++) array[i] = ~array[i];
            return MP_OBJ_FROM_PTR(ndarray);
            break;
        
        case MP_UNARY_OP_NEGATIVE:
            ndarray = MP_OBJ_TO_PTR(ndarray_copy(self_in));
            if(self->array->typecode == NDARRAY_UINT8) {
                uint8_t *array = (uint8_t *)ndarray->array->items;
                for(size_t i=0; i < self->array->len; i++) array[i] = -array[i];
            } else if(self->array->typecode == NDARRAY_INT8) {
                int8_t *array = (int8_t *)ndarray->array->items;
                for(size_t i=0; i < self->array->len; i++) array[i] = -array[i];
            } else if(self->array->typecode == NDARRAY_UINT16) {                
                uint16_t *array = (uint16_t *)ndarray->array->items;
                for(size_t i=0; i < self->array->len; i++) array[i] = -array[i];
            } else if(self->array->typecode == NDARRAY_INT16) {
                int16_t *array = (int16_t *)ndarray->array->items;
                for(size_t i=0; i < self->array->len; i++) array[i] = -array[i];
            } else {
                mp_float_t *array = (mp_float_t *)ndarray->array->items;
                for(size_t i=0; i < self->array->len; i++) array[i] = -array[i];
            }
            return MP_OBJ_FROM_PTR(ndarray);
            break;

        case MP_UNARY_OP_POSITIVE:
            return ndarray_copy(self_in);

        case MP_UNARY_OP_ABS:
            if((self->array->typecode == NDARRAY_UINT8) || (self->array->typecode == NDARRAY_UINT16)) {
                return ndarray_copy(self_in);
            }
            ndarray = MP_OBJ_TO_PTR(ndarray_copy(self_in));
            if((self->array->typecode == NDARRAY_INT8)) {
                int8_t *array = (int8_t *)ndarray->array->items;
                for(size_t i=0; i < self->array->len; i++) {
                    if(array[i] < 0) array[i] = -array[i];
                }
            } else if((self->array->typecode == NDARRAY_INT16)) {
                int16_t *array = (int16_t *)ndarray->array->items;
                for(size_t i=0; i < self->array->len; i++) {
                    if(array[i] < 0) array[i] = -array[i];
                }
            } else {
                mp_float_t *array = (mp_float_t *)ndarray->array->items;
                for(size_t i=0; i < self->array->len; i++) {
                    if(array[i] < 0) array[i] = -array[i];
                }                
            }
            return MP_OBJ_FROM_PTR(ndarray);
            break;
        default: return MP_OBJ_NULL; // operator not supported
    }
}

written 41199 bytes to ndarray.c


In [92]:
%%micropython -unix 1

import ulab as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]], dtype=np.uint8)
b = a[:1:-2]
print(a[-2])
print(b.shape())
print(b[-1])

array([9, 10, 11, 12], dtype=uint8)
(1, 4)
16




In [100]:
%%micropython -unix 1

import ulab as np

a = np.zeros((5, 5))
a[1:3, 1:3] = np.array([3.0, 4.0])
print(a, a[1])

array([[0.0, 0.0, 0.0, 0.0, 0.0],
	 [0.0, 3.0, 4.0, 0.0, 0.0],
	 [0.0, 3.0, 4.0, 0.0, 0.0],
	 [0.0, 0.0, 0.0, 0.0, 0.0],
	 [0.0, 0.0, 0.0, 0.0, 0.0]], dtype=float) array([0.0, 3.0, 4.0, 0.0, 0.0], dtype=float)




In [94]:
%%micropython -unix 1

import ulab as np

a = np.zeros(8)
print(a)

for i in range(8):
    a[i] = i
    
# a[0] = 123
print(a, a[3])

array([0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], dtype=float)
array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0], dtype=float) 3.0




In [None]:
a = ones((10,3))
a[2:7:2] = zeros(3)
print(a)
a = ones((4, 4))
a[1] = 0
print(a)

In [1272]:
%%micropython -unix 1


import ulab as np

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=np.float)
print(a[2:7:2])
a[2:7:2] = 13.0
print(a)

print(a[a > 3])
a = np.ones((10,3))
a[2:7:2] = np.zeros(3)
print(a)

a = np.ones((10,3))
a[2:7:2] = 2.0
print(a)

array([3.0, 5.0, 7.0], dtype=float)
array([1.0, 2.0, 13.0, 4.0, 13.0, 6.0, 13.0, 8.0, 9.0], dtype=float)
array([13.0, 4.0, 13.0, 6.0, 13.0, 8.0, 9.0], dtype=float)
array([[1.0, 1.0, 1.0],
	 [1.0, 1.0, 1.0],
	 [0.0, 0.0, 0.0],
	 [1.0, 1.0, 1.0],
	 [0.0, 0.0, 0.0],
	 [1.0, 1.0, 1.0],
	 [0.0, 0.0, 0.0],
	 [1.0, 1.0, 1.0],
	 [1.0, 1.0, 1.0],
	 [1.0, 1.0, 1.0]], dtype=float)
array([[1.0, 1.0, 1.0],
	 [1.0, 1.0, 1.0],
	 [2.0, 2.0, 2.0],
	 [1.0, 1.0, 1.0],
	 [2.0, 2.0, 2.0],
	 [1.0, 1.0, 1.0],
	 [2.0, 2.0, 2.0],
	 [1.0, 1.0, 1.0],
	 [1.0, 1.0, 1.0],
	 [1.0, 1.0, 1.0]], dtype=float)




In [1135]:
a = array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int8)
a[:8:2] = array([10, 20, 30, 40], dtype=float)
a

array([10,  1, 20,  3, 30,  5, 40,  7,  8,  9], dtype=int8)

In [1138]:
a = array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
a[1, :2]

array([5, 6])

In [438]:
a = array([1, 2, 3, 4])
a < a[2]

array([ True,  True, False, False])

In [973]:
%%micropython -unix 1

import ulab

a = ulab.ndarray([1, -2, 3], dtype=ulab.int8)

print(abs(a))

a = ulab.ndarray([1, 2, 3], dtype=ulab.uint8)
print(~a)

a = ulab.ndarray([1, 2, 3], dtype=ulab.int8)
print(~a)

ndarray([1, 2, 3], dtype=int8)
ndarray([254, 253, 252], dtype=uint8)
ndarray([-2, -3, -4], dtype=int8)




In [999]:
a = array([1, -2, 3], dtype=int8)
print(-a, +a)

a = array([1, 2, 3], dtype=uint8)
print(-a, +a)

a = array([1, 2, -3], dtype=float)
print(-a, +a)

[-1  2 -3] [ 1 -2  3]
[255 254 253] [1 2 3]
[-1. -2.  3.] [ 1.  2. -3.]


In [1003]:
%%micropython -unix 1

import ulab

a = ulab.ndarray([1, -2, 3], dtype=ulab.int8)
print(-a, +a)

a = ulab.ndarray([1, 2, 3], dtype=ulab.uint8)
print(-a, +a)

a = ulab.ndarray([1, 2, -3], dtype=ulab.float)
print(-a, +a)

ndarray([-1, 2, -3], dtype=int8) ndarray([1, -2, 3], dtype=int8)
ndarray([255, 254, 253], dtype=uint8) ndarray([1, 2, 3], dtype=uint8)
ndarray([-1.0, -2.0, 3.0], dtype=float) ndarray([1.0, 2.0, -3.0], dtype=float)




In [917]:
a = array([1, 2, 3], dtype=int8)

print(~a)

[-2 -3 -4]


In [151]:
a = array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

In [164]:
a[0, 1]

2

# Linear algebra

This module contains very basic matrix operators, such as transposing, reshaping, inverting, and matrix multiplication. The actual inversion is factored out into a helper function, so that the routine can be re-used in other modules. (The `polyfit` function in `poly.c` uses that.) Also note that inversion is based on the notion of a *small number* (epsilon). During the computation of the inverse, a number is treated as 0, if its absolute value is smaller than epsilon. This precaution is required, otherwise, one might run into singular matrices. 

As in `numpy`, `inv` is not a class method, but it should be applied only on `ndarray`s. This is why one has to check the argument type at the beginning of the functions.

## Examples

### Transpose of one- and two-dimensional arrays, .transpose()

In [117]:
%%micropython -unix 1

from ulab import ndarray

a = ndarray(range(10))
print('1D array: ', a)
print('shape of a: ', a.shape())

a.transpose()
print('\ntranspose of array: ', a)
print('shape of a: ', a.shape())


a = ndarray([[1, 2, 3, 4], [5, 6, 7, 8]])
print('\n2D array: \n', a)
print('shape of a: ', a.shape())

a.transpose()
print('\ntranspose of array: \n', a)
print('shape of a: ', a.shape())

1D array:  ndarray([0.0, 1.0, 2.0, ..., 7.0, 8.0, 9.0], dtype=float)
shape of a:  (10, 1)

transpose of array:  ndarray([0.0, 1.0, 2.0, ..., 7.0, 8.0, 9.0], dtype=float)
shape of a:  (1, 10)

2D array: 
 ndarray([[1.0, 2.0, 3.0, 4.0],
	 [5.0, 6.0, 7.0, 8.0]], dtype=float)
shape of a:  (2, 4)

transpose of array: 
 ndarray([[1.0, 5.0],
	 [2.0, 6.0],
	 [3.0, 7.0],
	 [4.0, 8.0]], dtype=float)
shape of a:  (4, 2)




### .reshape()

In [116]:
%%micropython -unix 1

from ulab import ndarray

a = ndarray(range(15))
print('1D array: ', a)
print('shape of a: ', a.shape())

a.reshape((3, 5))
print('\n2D array: \n', a)
print('shape of a: ', a.shape())

1D array:  ndarray([0.0, 1.0, 2.0, ..., 12.0, 13.0, 14.0], dtype=float)
shape of a:  (15, 1)

2D array: 
 ndarray([[0.0, 1.0, 2.0, 3.0, 4.0],
	 [5.0, 6.0, 7.0, 8.0, 9.0],
	 [10.0, 11.0, 12.0, 13.0, 14.0]], dtype=float)
shape of a:  (3, 5)




### inverse of a matrix (inv)

In [192]:
%%micropython -unix 1

from ulab import ndarray, inv

a = ndarray([[1, 2], [3, 4]])
print('2D matrix (a): \n', a)
b = inv(a)
print('\ninverse of a: \n', b)

2D matrix (a): 
 ndarray([[1.0, 2.0],
	 [3.0, 4.0]], dtype=float)

inverse of a: 
 ndarray([[-2.0, 1.0],
	 [1.5, -0.5]], dtype=float)




### matrix multiplication (dot)

With the `dot` function, we can now check, whether the inverse of the matrix was correct:

In [216]:
%%micropython -unix 1

from ulab import ndarray, inv, dot


a = ndarray([[1, 2], [3, 4]])
print('2D matrix (a): \n', a)
b = inv(a)
print('\ninverse of a: \n', b)

c = dot(a, b)
print('\na multiplied by its inverse: \n', c)

2D matrix (a): 
 ndarray([[1.0, 2.0],
	 [3.0, 4.0]], dtype=float)

inverse of a: 
 ndarray([[-2.0, 1.0],
	 [1.5, -0.5]], dtype=float)

a multiplied by its inverse: 
 ndarray([[1.0, 0.0],
	 [0.0, 1.0]], dtype=float)




### zeros, ones, eye

In [366]:
%%micropython -unix 1

import ulab

print(ulab.zeros(3, dtype=ulab.int16))
print(ulab.zeros((5, 3), dtype=ulab.float))

print("\n====================\n");
print(ulab.ones(3, dtype=ulab.int16))
print(ulab.ones((5, 3), dtype=ulab.float))

print("\n====================\n");
print(ulab.eye(5, dtype=ulab.int16))
print(ulab.eye(5, M=3, dtype=ulab.float))

print(ulab.eye(5, k=1, dtype=ulab.uint8))
print(ulab.eye(5, k=-3, dtype=ulab.uint8))

ndarray([0, 0, 0], dtype=int16)
ndarray([[0.0, 0.0, 0.0],
	 [0.0, 0.0, 0.0],
	 [0.0, 0.0, 0.0],
	 [0.0, 0.0, 0.0],
	 [0.0, 0.0, 0.0]], dtype=float)


ndarray([1, 1, 1], dtype=int16)
ndarray([[1.0, 1.0, 1.0],
	 [1.0, 1.0, 1.0],
	 [1.0, 1.0, 1.0],
	 [1.0, 1.0, 1.0],
	 [1.0, 1.0, 1.0]], dtype=float)


ndarray([[1, 0, 0, 0, 0],
	 [0, 1, 0, 0, 0],
	 [0, 0, 1, 0, 0],
	 [0, 0, 0, 1, 0],
	 [0, 0, 0, 0, 1]], dtype=int16)
ndarray([[1.0, 0.0, 0.0, 0.0, 0.0],
	 [0.0, 1.0, 0.0, 0.0, 0.0],
	 [0.0, 0.0, 1.0, 0.0, 0.0]], dtype=float)
ndarray([[0, 1, 0, 0, 0],
	 [0, 0, 1, 0, 0],
	 [0, 0, 0, 1, 0],
	 [0, 0, 0, 0, 1],
	 [0, 0, 0, 0, 0]], dtype=uint8)
ndarray([[0, 0, 0, 0, 0],
	 [0, 0, 0, 0, 0],
	 [0, 0, 0, 0, 0],
	 [1, 0, 0, 0, 0],
	 [0, 1, 0, 0, 0]], dtype=uint8)




## eig

A decent description of the Jacobi method can be found in http://fourier.eng.hmc.edu/e176/lectures/ch1/node1.html, otherwise, https://en.wikipedia.org/wiki/Jacobi_eigenvalue_algorithm is also useful.

In [1989]:
%%micropython -unix 1

import ulab as np

a = np.array([[1, 2, 1, 4], [2, 5, 3, 5], [1, 3, 6, 1], [4, 5, 1, 7]], dtype=np.uint8)
# a = np.array([[1, 1, 1, 1], [1, 5, 5, 5], [1, 5, 3, 2], [1, 5, 2, 3]], dtype=np.uint8)
# a = np.array([[1, 5.5, 1], [5.5, 16, 1], [1, 1, 5.5]])
# a = np.array([[3, 2], [2, 1]])
x, y = np.eig(a)
print(x)
print(y)

array([-1.165288686752319, 0.8029366731643677, 5.585625648498535, 13.77672672271729], dtype=float)
array([[0.8151530027389526, -0.4499613046646118, -0.1644472032785416, 0.3256030678749084],
	 [0.2211322486400604, 0.7846922278404236, 0.08364589512348175, 0.5730286836624146],
	 [-0.1339886337518692, -0.3100103437900543, 0.8743090033531189, 0.3486031591892242],
	 [-0.5183368921279907, -0.292722225189209, -0.4489364922046661, 0.6664056777954102]], dtype=float)




In [1983]:
a = np.array([[1, 2, 1, 4], [2, 5, 3, 5], [1, 3, 6, 1], [4, 5, 1, 7]], dtype=np.uint8)
# a = np.array([[1, 1, 1, 1], [1, 5, 5, 5], [1, 5, 3, 2], [1, 5, 2, 3]], dtype=np.uint8)

# a = array([[1, 5], [5, 1]])
# a = np.array([[1, 5.5, 1], [5.5, 16, 1], [1, 1, 5.5]])

print(a)
eig(a)

[[1 2 1 4]
 [2 5 3 5]
 [1 3 6 1]
 [4 5 1 7]]


(array([13.77672606, -1.16528837,  0.80293655,  5.58562576]),
 array([[ 0.32561419,  0.815156  ,  0.44994112, -0.16446602],
        [ 0.57300777,  0.22113342, -0.78469926,  0.08372081],
        [ 0.34861093, -0.13401142,  0.31007764,  0.87427868],
        [ 0.66641421, -0.51832581,  0.29266348, -0.44897499]]))

## linalg.h

In [362]:
%%ccode linalg.h

#ifndef _LINALG_
#define _LINALG_

#include "ndarray.h"

#define SWAP(t, a, b) { t tmp = a; a = b; b = tmp; }

#if MICROPY_FLOAT_IMPL == MICROPY_FLOAT_IMPL_FLOAT
#define epsilon        1.2e-7
#elif MICROPY_FLOAT_IMPL == MICROPY_FLOAT_IMPL_DOUBLE
#define epsilon        2.3e-16
#endif

#define JACOBI_MAX     20

mp_obj_t linalg_transpose(mp_obj_t );
mp_obj_t linalg_reshape(mp_obj_t , mp_obj_t );
mp_obj_t linalg_size(size_t , const mp_obj_t *, mp_map_t *);
bool linalg_invert_matrix(mp_float_t *, size_t );
mp_obj_t linalg_inv(mp_obj_t );
mp_obj_t linalg_dot(mp_obj_t , mp_obj_t );
mp_obj_t linalg_zeros(size_t , const mp_obj_t *, mp_map_t *);
mp_obj_t linalg_ones(size_t , const mp_obj_t *, mp_map_t *);
mp_obj_t linalg_eye(size_t , const mp_obj_t *, mp_map_t *);

mp_obj_t linalg_det(mp_obj_t );
mp_obj_t linalg_eig(mp_obj_t );

#endif

written 1019 bytes to linalg.h


## linalg.c

In [22]:
%%ccode linalg.c

#include <stdlib.h>
#include <string.h>
#include <math.h>
#include "py/obj.h"
#include "py/runtime.h"
#include "py/misc.h"
#include "linalg.h"

mp_obj_t linalg_transpose(mp_obj_t self_in) {
    ndarray_obj_t *self = MP_OBJ_TO_PTR(self_in);
    // the size of a single item in the array
    uint8_t _sizeof = mp_binary_get_size('@', self->array->typecode, NULL);
    
    // NOTE: 
    // if the matrices are square, we can simply swap items, but 
    // generic matrices can't be transposed in place, so we have to 
    // declare a temporary variable
    
    // NOTE: 
    //  In the old matrix, the coordinate (m, n) is m*self->n + n
    //  We have to assign this to the coordinate (n, m) in the new 
    //  matrix, i.e., to n*self->m + m (since the new matrix has self->m columns)
    
    // one-dimensional arrays can be transposed by simply swapping the dimensions
    if((self->m != 1) && (self->n != 1)) {
        uint8_t *c = (uint8_t *)self->array->items;
        // self->bytes is the size of the bytearray, irrespective of the typecode
        uint8_t *tmp = m_new(uint8_t, self->bytes);
        for(size_t m=0; m < self->m; m++) {
            for(size_t n=0; n < self->n; n++) {
                memcpy(tmp+_sizeof*(n*self->m + m), c+_sizeof*(m*self->n + n), _sizeof);
            }
        }
        memcpy(self->array->items, tmp, self->bytes);
        m_del(uint8_t, tmp, self->bytes);
    } 
    SWAP(size_t, self->m, self->n);
    return mp_const_none;
}

mp_obj_t linalg_reshape(mp_obj_t self_in, mp_obj_t shape) {
    ndarray_obj_t *self = MP_OBJ_TO_PTR(self_in);
    if(!MP_OBJ_IS_TYPE(shape, &mp_type_tuple) || (MP_OBJ_SMALL_INT_VALUE(mp_obj_len_maybe(shape)) != 2)) {
        mp_raise_ValueError("shape must be a 2-tuple");
    }

    mp_obj_iter_buf_t iter_buf;
    mp_obj_t item, iterable = mp_getiter(shape, &iter_buf);
    uint16_t m, n;
    item = mp_iternext(iterable);
    m = mp_obj_get_int(item);
    item = mp_iternext(iterable);
    n = mp_obj_get_int(item);
    if(m*n != self->m*self->n) {
        // TODO: the proper error message would be "cannot reshape array of size %d into shape (%d, %d)"
        mp_raise_ValueError("cannot reshape array (incompatible input/output shape)");
    }
    self->m = m;
    self->n = n;
    return MP_OBJ_FROM_PTR(self);
}

mp_obj_t linalg_size(size_t n_args, const mp_obj_t *pos_args, mp_map_t *kw_args) {
    static const mp_arg_t allowed_args[] = {
        { MP_QSTR_, MP_ARG_REQUIRED | MP_ARG_OBJ, {.u_rom_obj = MP_ROM_PTR(&mp_const_none_obj) } },
        { MP_QSTR_axis, MP_ARG_KW_ONLY | MP_ARG_OBJ, {.u_rom_obj = MP_ROM_PTR(&mp_const_none_obj)} },
    };

    mp_arg_val_t args[MP_ARRAY_SIZE(allowed_args)];
    mp_arg_parse_all(1, pos_args, kw_args, MP_ARRAY_SIZE(allowed_args), allowed_args, args);

    if(!mp_obj_is_type(args[0].u_obj, &ulab_ndarray_type)) {
        mp_raise_TypeError("size is defined for ndarrays only");
    } else {
        ndarray_obj_t *ndarray = MP_OBJ_TO_PTR(args[0].u_obj);
        if(args[1].u_obj == mp_const_none) {
            return mp_obj_new_int(ndarray->array->len);
        } else if(mp_obj_is_int(args[1].u_obj)) {
            uint8_t ax = mp_obj_get_int(args[1].u_obj);
            if(ax == 0) {
                if(ndarray->m == 1) {
                    return mp_obj_new_int(ndarray->n);
                } else {
                    return mp_obj_new_int(ndarray->m);                    
                }
            } else if(ax == 1) {
                if(ndarray->m == 1) {
                    mp_raise_ValueError("tuple index out of range");
                } else {
                    return mp_obj_new_int(ndarray->n);
                }
            } else {
                    mp_raise_ValueError("tuple index out of range");                
            }
        } else {
            mp_raise_TypeError("wrong argument type");
        }
    }
}

bool linalg_invert_matrix(mp_float_t *data, size_t N) {
    // returns true, of the inversion was successful, 
    // false, if the matrix is singular
    
    // initially, this is the unit matrix: the contents of this matrix is what 
    // will be returned after all the transformations
    mp_float_t *unit = m_new(mp_float_t, N*N);

    mp_float_t elem = 1.0;
    // initialise the unit matrix
    memset(unit, 0, sizeof(mp_float_t)*N*N);
    for(size_t m=0; m < N; m++) {
        memcpy(&unit[m*(N+1)], &elem, sizeof(mp_float_t));
    }
    for(size_t m=0; m < N; m++){
        // this could be faster with ((c < epsilon) && (c > -epsilon))
        if(MICROPY_FLOAT_C_FUN(fabs)(data[m*(N+1)]) < epsilon) {
            m_del(mp_float_t, unit, N*N);
            return false;
        }
        for(size_t n=0; n < N; n++){
            if(m != n){
                elem = data[N*n+m] / data[m*(N+1)];
                for(size_t k=0; k < N; k++){
                    data[N*n+k] -= elem * data[N*m+k];
                    unit[N*n+k] -= elem * unit[N*m+k];
                }
            }
        }
    }
    for(size_t m=0; m < N; m++){ 
        elem = data[m*(N+1)];
        for(size_t n=0; n < N; n++){
            data[N*m+n] /= elem;
            unit[N*m+n] /= elem;
        }
    }
    memcpy(data, unit, sizeof(mp_float_t)*N*N);
    m_del(mp_float_t, unit, N*N);
    return true;
}

mp_obj_t linalg_inv(mp_obj_t o_in) {
    // since inv is not a class method, we have to inspect the input argument first
    if(!MP_OBJ_IS_TYPE(o_in, &ulab_ndarray_type)) {
        mp_raise_TypeError("only ndarrays can be inverted");
    }
    ndarray_obj_t *o = MP_OBJ_TO_PTR(o_in);
    if(!MP_OBJ_IS_TYPE(o_in, &ulab_ndarray_type)) {
        mp_raise_TypeError("only ndarray objects can be inverted");
    }
    if(o->m != o->n) {
        mp_raise_ValueError("only square matrices can be inverted");
    }
    ndarray_obj_t *inverted = create_new_ndarray(o->m, o->n, NDARRAY_FLOAT);
    mp_float_t *data = (mp_float_t *)inverted->array->items;
    mp_obj_t elem;
    for(size_t m=0; m < o->m; m++) { // rows first
        for(size_t n=0; n < o->n; n++) { // columns next
            // this could, perhaps, be done in single line... 
            // On the other hand, we probably spend little time here
            elem = mp_binary_get_val_array(o->array->typecode, o->array->items, m*o->n+n);
            data[m*o->n+n] = (mp_float_t)mp_obj_get_float(elem);
        }
    }
    
    if(!linalg_invert_matrix(data, o->m)) {
        // TODO: I am not sure this is needed here. Otherwise, 
        // how should we free up the unused RAM of inverted?
        m_del(mp_float_t, inverted->array->items, o->n*o->n);
        mp_raise_ValueError("input matrix is singular");
    }
    return MP_OBJ_FROM_PTR(inverted);
}

mp_obj_t linalg_dot(mp_obj_t _m1, mp_obj_t _m2) {
    // TODO: should the results be upcast?
    ndarray_obj_t *m1 = MP_OBJ_TO_PTR(_m1);
    ndarray_obj_t *m2 = MP_OBJ_TO_PTR(_m2);    
    if(m1->n != m2->m) {
        mp_raise_ValueError("matrix dimensions do not match");
    }
    // TODO: numpy uses upcasting here
    ndarray_obj_t *out = create_new_ndarray(m1->m, m2->n, NDARRAY_FLOAT);
    mp_float_t *outdata = (mp_float_t *)out->array->items;
    mp_float_t sum, v1, v2;
    for(size_t i=0; i < m1->m; i++) { // rows of m1
        for(size_t j=0; j < m2->n; j++) { // columns of m2
            sum = 0.0;
            for(size_t k=0; k < m2->m; k++) {
                // (i, k) * (k, j)
                v1 = ndarray_get_float_value(m1->array->items, m1->array->typecode, i*m1->n+k);
                v2 = ndarray_get_float_value(m2->array->items, m2->array->typecode, k*m2->n+j);
                sum += v1 * v2;
            }
            outdata[i*m1->m+j] = sum;
        }
    }
    return MP_OBJ_FROM_PTR(out);
}

mp_obj_t linalg_zeros_ones(size_t n_args, const mp_obj_t *pos_args, mp_map_t *kw_args, uint8_t kind) {
    static const mp_arg_t allowed_args[] = {
        { MP_QSTR_, MP_ARG_REQUIRED | MP_ARG_OBJ, {.u_obj = MP_OBJ_NULL} } ,
        { MP_QSTR_dtype, MP_ARG_KW_ONLY | MP_ARG_INT, {.u_int = NDARRAY_FLOAT} },
    };

    mp_arg_val_t args[MP_ARRAY_SIZE(allowed_args)];
    mp_arg_parse_all(n_args, pos_args, kw_args, MP_ARRAY_SIZE(allowed_args), allowed_args, args);
    
    uint8_t dtype = args[1].u_int;
    if(!mp_obj_is_int(args[0].u_obj) && !mp_obj_is_type(args[0].u_obj, &mp_type_tuple)) {
        mp_raise_TypeError("input argument must be an integer or a 2-tuple");
    }
    ndarray_obj_t *ndarray = NULL;
    if(mp_obj_is_int(args[0].u_obj)) {
        size_t n = mp_obj_get_int(args[0].u_obj);
        ndarray = create_new_ndarray(1, n, dtype);
    } else if(mp_obj_is_type(args[0].u_obj, &mp_type_tuple)) {
        mp_obj_tuple_t *tuple = MP_OBJ_TO_PTR(args[0].u_obj);
        if(tuple->len != 2) {
            mp_raise_TypeError("input argument must be an integer or a 2-tuple");            
        }
        ndarray = create_new_ndarray(mp_obj_get_int(tuple->items[0]), 
                                                  mp_obj_get_int(tuple->items[1]), dtype);
    }
    if(kind == 1) {
        mp_obj_t one = mp_obj_new_int(1);
        for(size_t i=0; i < ndarray->array->len; i++) {
            mp_binary_set_val_array(dtype, ndarray->array->items, i, one);
        }
    }
    return MP_OBJ_FROM_PTR(ndarray);
}

mp_obj_t linalg_zeros(size_t n_args, const mp_obj_t *pos_args, mp_map_t *kw_args) {
    return linalg_zeros_ones(n_args, pos_args, kw_args, 0);
}

mp_obj_t linalg_ones(size_t n_args, const mp_obj_t *pos_args, mp_map_t *kw_args) {
    return linalg_zeros_ones(n_args, pos_args, kw_args, 1);
}

mp_obj_t linalg_eye(size_t n_args, const mp_obj_t *pos_args, mp_map_t *kw_args) {
    static const mp_arg_t allowed_args[] = {
        { MP_QSTR_, MP_ARG_REQUIRED | MP_ARG_INT, {.u_int = 0} },
        { MP_QSTR_M, MP_ARG_KW_ONLY | MP_ARG_OBJ, {.u_rom_obj = MP_ROM_PTR(&mp_const_none_obj) } },
        { MP_QSTR_k, MP_ARG_KW_ONLY | MP_ARG_INT, {.u_int = 0} },        
        { MP_QSTR_dtype, MP_ARG_KW_ONLY | MP_ARG_INT, {.u_int = NDARRAY_FLOAT} },
    };

    mp_arg_val_t args[MP_ARRAY_SIZE(allowed_args)];
    mp_arg_parse_all(n_args, pos_args, kw_args, MP_ARRAY_SIZE(allowed_args), allowed_args, args);

    size_t n = args[0].u_int, m;
    int16_t k = args[2].u_int;
    uint8_t dtype = args[3].u_int;
    if(args[1].u_rom_obj == mp_const_none) {
        m = n;
    } else {
        m = mp_obj_get_int(args[1].u_rom_obj);
    }
    
    ndarray_obj_t *ndarray = create_new_ndarray(m, n, dtype);
    mp_obj_t one = mp_obj_new_int(1);
    size_t i = 0;
    if((k >= 0) && (k < n)) {
        while(k < n) {
            mp_binary_set_val_array(dtype, ndarray->array->items, i*n+k, one);
            k++;
            i++;
        }
    } else if((k < 0) && (-k < m)) {
        k = -k;
        i = 0;
        while(k < m) {
            mp_binary_set_val_array(dtype, ndarray->array->items, k*n+i, one);
            k++;
            i++;
        }
    }
    return MP_OBJ_FROM_PTR(ndarray);
}

mp_obj_t linalg_det(mp_obj_t oin) {
    if(!mp_obj_is_type(oin, &ulab_ndarray_type)) {
        mp_raise_TypeError("function defined for ndarrays only");
    }
    ndarray_obj_t *in = MP_OBJ_TO_PTR(oin);
    if(in->m != in->n) {
        mp_raise_ValueError("input must be square matrix");
    }
    
    mp_float_t *tmp = m_new(mp_float_t, in->n*in->n);
    for(size_t i=0; i < in->array->len; i++){
        tmp[i] = ndarray_get_float_value(in->array->items, in->array->typecode, i);
    }
    mp_float_t c;
    for(size_t m=0; m < in->m-1; m++){
        if(MICROPY_FLOAT_C_FUN(fabs)(tmp[m*(in->n+1)]) < epsilon) {
            m_del(mp_float_t, tmp, in->n*in->n);
            return mp_obj_new_float(0.0);
        }
        for(size_t n=0; n < in->n; n++){
            if(m != n) {
                c = tmp[in->n*n+m] / tmp[m*(in->n+1)];
                for(size_t k=0; k < in->n; k++){
                    tmp[in->n*n+k] -= c * tmp[in->n*m+k];
                }
            }
        }
    }
    mp_float_t det = 1.0;
                            
    for(size_t m=0; m < in->m; m++){ 
        det *= tmp[m*(in->n+1)];
    }
    m_del(mp_float_t, tmp, in->n*in->n);
    return mp_obj_new_float(det);
}

mp_obj_t linalg_eig(mp_obj_t oin) {
    if(!mp_obj_is_type(oin, &ulab_ndarray_type)) {
        mp_raise_TypeError("function defined for ndarrays only");
    }
    ndarray_obj_t *in = MP_OBJ_TO_PTR(oin);
    if(in->m != in->n) {
        mp_raise_ValueError("input must be square matrix");
    }
    mp_float_t *array = m_new(mp_float_t, in->array->len);
    for(size_t i=0; i < in->array->len; i++) {
        array[i] = ndarray_get_float_value(in->array->items, in->array->typecode, i);
    }
    // make sure the matrix is symmetric
    for(size_t m=0; m < in->m; m++) {
        for(size_t n=m+1; n < in->n; n++) {
            // compare entry (m, n) to (n, m)
            // TODO: this must probably be scaled!
            if(epsilon < MICROPY_FLOAT_C_FUN(fabs)(array[m*in->n + n] - array[n*in->n + m])) {
                mp_raise_ValueError("input matrix is asymmetric");
            }
        }
    }
    
    // if we got this far, then the matrix will be symmetric
    
    ndarray_obj_t *eigenvectors = create_new_ndarray(in->m, in->n, NDARRAY_FLOAT);
    mp_float_t *eigvectors = (mp_float_t *)eigenvectors->array->items;
    // start out with the unit matrix
    for(size_t m=0; m < in->m; m++) {
        eigvectors[m*(in->n+1)] = 1.0;
    }
    mp_float_t largest, w, t, c, s, tau, aMk, aNk, vm, vn;
    size_t M, N;
    size_t iterations = JACOBI_MAX*in->n*in->n;
    do {
        iterations--;
        // find the pivot here
        M = 0;
        N = 0;
        largest = 0.0;
        for(size_t m=0; m < in->m-1; m++) { // -1: no need to inspect last row
            for(size_t n=m+1; n < in->n; n++) {
                w = MICROPY_FLOAT_C_FUN(fabs)(array[m*in->n + n]);
                if((largest < w) && (epsilon < w)) {
                    M = m;
                    N = n;
                    largest = w;
                }
            }
        }
        if(M+N == 0) { // all entries are smaller than epsilon, there is not much we can do...
            break;
        }
        // at this point, we have the pivot, and it is the entry (M, N)
        // now we have to find the rotation angle
        w = (array[N*in->n + N] - array[M*in->n + M]) / (2.0*array[M*in->n + N]);
        // The following if/else chooses the smaller absolute value for the tangent 
        // of the rotation angle. Going with the smaller should be numerically stabler.
        if(w > 0) {
            t = MICROPY_FLOAT_C_FUN(sqrt)(w*w + 1.0) - w;
        } else {
            t = -1.0*(MICROPY_FLOAT_C_FUN(sqrt)(w*w + 1.0) + w);
        }
        s = t / MICROPY_FLOAT_C_FUN(sqrt)(t*t + 1.0); // the sine of the rotation angle
        c = 1.0 / MICROPY_FLOAT_C_FUN(sqrt)(t*t + 1.0); // the cosine of the rotation angle
        tau = (1.0-c)/s; // this is equal to the tangent of the half of the rotation angle
        
        // at this point, we have the rotation angles, so we can transform the matrix
        // first the two diagonal elements
        // a(M, M) = a(M, M) - t*a(M, N)
        array[M*in->n + M] = array[M*in->n + M] - t * array[M*in->n + N];
        // a(N, N) = a(N, N) + t*a(M, N)
        array[N*in->n + N] = array[N*in->n + N] + t * array[M*in->n + N];
        // after the rotation, the a(M, N), and a(N, M) entries should become zero
        array[M*in->n + N] = array[N*in->n + M] = 0.0;
        // then all other elements in the column
        for(size_t k=0; k < in->m; k++) {
            if((k == M) || (k == N)) {
                continue;
            }
            aMk = array[M*in->n + k];
            aNk = array[N*in->n + k];
            // a(M, k) = a(M, k) - s*(a(N, k) + tau*a(M, k))
            array[M*in->n + k] -= s*(aNk + tau*aMk);
            // a(N, k) = a(N, k) + s*(a(M, k) - tau*a(N, k))
            array[N*in->n + k] += s*(aMk - tau*aNk);
            // a(k, M) = a(M, k)
            array[k*in->n + M] = array[M*in->n + k];
            // a(k, N) = a(N, k)
            array[k*in->n + N] = array[N*in->n + k];
        }
        // now we have to update the eigenvectors
        // the rotation matrix, R, multiplies from the right
        // R is the unit matrix, except for the 
        // R(M,M) = R(N, N) = c
        // R(N, M) = s
        // (M, N) = -s
        // entries. This means that only the Mth, and Nth columns will change
        for(size_t m=0; m < in->m; m++) {
            vm = eigvectors[m*in->n+M];
            vn = eigvectors[m*in->n+N];
            // the new value of eigvectors(m, M)
            eigvectors[m*in->n+M] = c * vm - s * vn;
            // the new value of eigvectors(m, N)
            eigvectors[m*in->n+N] = s * vm + c * vn;
        }
    } while(iterations > 0);
    
    if(iterations == 0) { 
        // the computation did not converge; numpy raises LinAlgError
        m_del(mp_float_t, array, in->array->len);
        mp_raise_ValueError("iterations did not converge");
    }
    ndarray_obj_t *eigenvalues = create_new_ndarray(1, in->n, NDARRAY_FLOAT);
    mp_float_t *eigvalues = (mp_float_t *)eigenvalues->array->items;
    for(size_t i=0; i < in->n; i++) {
        eigvalues[i] = array[i*(in->n+1)];
    }
    m_del(mp_float_t, array, in->array->len);
    
    mp_obj_tuple_t *tuple = MP_OBJ_TO_PTR(mp_obj_new_tuple(2, NULL));
    tuple->items[0] = MP_OBJ_FROM_PTR(eigenvalues);
    tuple->items[1] = MP_OBJ_FROM_PTR(eigenvectors);
    return tuple;
    return MP_OBJ_FROM_PTR(eigenvalues);
}

written 17691 bytes to linalg.c


# Vectorising mathematical operations

## General comments

The following module implements the common mathematical functions for scalars, ndarrays (linear or matrix), and iterables. If the input argument is a scalar, a scalar is returned (i.e., for such arguments, these functions are identical to the functions in the `math` module), while for ndarrays, and iterables, the return value is an ndarray of type `float`. 

## Examples

In [73]:
%%micropython -unix 1

import ulab

# initialise an array
a = ulab.ndarray([1, 2, 3, 4, 5])
print('1D array: ', a)

print('\nexponent of an array (range(5)): ', ulab.exp(range(5)))

print('\nexponent of a scalar (2.0): ', ulab.exp(2.0))

print('\n exponent of a 1D ndarray (a): ', ulab.exp(a))

# initialise a matrix
b = ulab.ndarray([[1, 2, 3], [4, 5, 6]])
print('\n2D matrix: ', b)
print('exponent of a 2D matrix (b): ', ulab.exp(b))

1D array:  ndarray([1.0, 2.0, 3.0, 4.0, 5.0], dtype=float)

exponent of an array (range(5)):  ndarray([1.0, 2.718281745910645, 7.389056205749512, 20.08553695678711, 54.59814834594727], dtype=float)

exponent of a scalar (2.0):  7.38905609893065

 exponent of a 1D ndarray (a):  ndarray([2.718281745910645, 7.389056205749512, 20.08553695678711, 54.59814834594727, 148.4131622314453], dtype=float)

2D matrix:  ndarray([[1.0, 2.0, 3.0],
	 [4.0, 5.0, 6.0]], dtype=float)
exponent of a 2D matrix (b):  ndarray([[2.718281745910645, 7.389056205749512, 20.08553695678711],
	 [54.59814834594727, 148.4131622314453, 403.4288024902343]], dtype=float)




Note that ndarrays are linear arrays in memory, even if the `shape` of the ndarray is a matrix. This means that we can treat both cases in a *single* loop.

Since `ndarray`s are iterable, we could treat `ndarray`s, `list`s, `tuples`, and `range`s on the same footing. However, that would mean extra trips to a lot of functions, therefore, reading out the values of the `ndarray` directly is probably significantly faster. 

## vectorise.h

In [29]:
%%ccode vectorise.h

#ifndef _VECTORISE_
#define _VECTORISE_

#include "ndarray.h"

mp_obj_t vectorise_acos(mp_obj_t );
mp_obj_t vectorise_acosh(mp_obj_t );
mp_obj_t vectorise_asin(mp_obj_t );
mp_obj_t vectorise_asinh(mp_obj_t );
mp_obj_t vectorise_atan(mp_obj_t );
mp_obj_t vectorise_atanh(mp_obj_t );
mp_obj_t vectorise_ceil(mp_obj_t );
mp_obj_t vectorise_cos(mp_obj_t );
mp_obj_t vectorise_erf(mp_obj_t );
mp_obj_t vectorise_erfc(mp_obj_t );
mp_obj_t vectorise_exp(mp_obj_t );
mp_obj_t vectorise_expm1(mp_obj_t );
mp_obj_t vectorise_floor(mp_obj_t );
mp_obj_t vectorise_gamma(mp_obj_t );
mp_obj_t vectorise_lgamma(mp_obj_t );
mp_obj_t vectorise_log(mp_obj_t );
mp_obj_t vectorise_log10(mp_obj_t );
mp_obj_t vectorise_log2(mp_obj_t );
mp_obj_t vectorise_sin(mp_obj_t );
mp_obj_t vectorise_sinh(mp_obj_t );
mp_obj_t vectorise_sqrt(mp_obj_t );
mp_obj_t vectorise_tan(mp_obj_t );
mp_obj_t vectorise_tanh(mp_obj_t );

#define ITERATE_VECTOR(type, source, out) do {\
    type *input = (type *)(source)->array->items;\
    for(size_t i=0; i < (source)->array->len; i++) {\
                (out)[i] = f(input[i]);\
    }\
} while(0)

#define MATH_FUN_1(py_name, c_name) \
    mp_obj_t vectorise_ ## py_name(mp_obj_t x_obj) { \
        return vectorise_generic_vector(x_obj, MICROPY_FLOAT_C_FUN(c_name)); \
    }
    
#endif

written 1478 bytes to vectorise.h


## vectorise.c

In [168]:
%%ccode vectorise.c

#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include "py/runtime.h"
#include "py/binary.h"
#include "py/obj.h"
#include "py/objarray.h"
#include "vectorise.h"

#ifndef MP_PI
#define MP_PI MICROPY_FLOAT_CONST(3.14159265358979323846)
#endif
    
mp_obj_t vectorise_generic_vector(mp_obj_t o_in, mp_float_t (*f)(mp_float_t)) {
    // Return a single value, if o_in is not iterable
    if(mp_obj_is_float(o_in) || mp_obj_is_integer(o_in)) {
            return mp_obj_new_float(f(mp_obj_get_float(o_in)));
    }
    mp_float_t x;
    if(MP_OBJ_IS_TYPE(o_in, &ulab_ndarray_type)) {
        ndarray_obj_t *source = MP_OBJ_TO_PTR(o_in);
        ndarray_obj_t *ndarray = create_new_ndarray(source->m, source->n, NDARRAY_FLOAT);
        mp_float_t *dataout = (mp_float_t *)ndarray->array->items;
        if(source->array->typecode == NDARRAY_UINT8) {
            ITERATE_VECTOR(uint8_t, source, dataout);
        } else if(source->array->typecode == NDARRAY_INT8) {
            ITERATE_VECTOR(int8_t, source, dataout);
        } else if(source->array->typecode == NDARRAY_UINT16) {
            ITERATE_VECTOR(uint16_t, source, dataout);
        } else if(source->array->typecode == NDARRAY_INT16) {
            ITERATE_VECTOR(int16_t, source, dataout);
        } else {
            ITERATE_VECTOR(mp_float_t, source, dataout);
        }
        return MP_OBJ_FROM_PTR(ndarray);
    } else if(MP_OBJ_IS_TYPE(o_in, &mp_type_tuple) || MP_OBJ_IS_TYPE(o_in, &mp_type_list) || 
        MP_OBJ_IS_TYPE(o_in, &mp_type_range)) { // i.e., the input is a generic iterable
            mp_obj_array_t *o = MP_OBJ_TO_PTR(o_in);
            ndarray_obj_t *out = create_new_ndarray(1, o->len, NDARRAY_FLOAT);
            mp_float_t *dataout = (mp_float_t *)out->array->items;
            mp_obj_iter_buf_t iter_buf;
            mp_obj_t item, iterable = mp_getiter(o_in, &iter_buf);
            size_t i=0;
            while ((item = mp_iternext(iterable)) != MP_OBJ_STOP_ITERATION) {
                x = mp_obj_get_float(item);
                dataout[i++] = f(x);
            }
        return MP_OBJ_FROM_PTR(out);
    }
    return mp_const_none;
}

MATH_FUN_1(acos, acos);
MATH_FUN_1(acosh, acosh);
MATH_FUN_1(asin, asin);
MATH_FUN_1(asinh, asinh);
MATH_FUN_1(atan, atan);
MATH_FUN_1(atanh, atanh);
MATH_FUN_1(ceil, ceil);
MATH_FUN_1(cos, cos);
MATH_FUN_1(erf, erf);
MATH_FUN_1(erfc, erfc);
MATH_FUN_1(exp, exp);
MATH_FUN_1(expm1, expm1);
MATH_FUN_1(floor, floor);
MATH_FUN_1(gamma, tgamma);
MATH_FUN_1(lgamma, lgamma);
MATH_FUN_1(log, log);
MATH_FUN_1(log10, log10);
MATH_FUN_1(log2, log2);
MATH_FUN_1(sin, sin);
MATH_FUN_1(sinh, sinh);
MATH_FUN_1(sqrt, sqrt);
MATH_FUN_1(tan, tan);
MATH_FUN_1(tanh, tanh);

written 2878 bytes to vectorise.c


# Polynomials

This module has two functions, `polyval`, and `polyfit`. The background for `polyfit` can be found under https://en.wikipedia.org/wiki/Polynomial_regression, and one can take the matrix inversion function from `linalg`. 

## Background 

An estimate, $\beta$, for the coefficients of a polynomial fit can be gotten from
\begin{equation}
\vec{\beta} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T \vec{y}
\end{equation}
where $\vec{y}$ are the dependent values, and the matrix $X$ is constructed from the independent values as 
\begin{equation}
X = \begin{pmatrix}
1 & x_1^2 & x_1^2 & ... & x_1^m 
\\
1 & x_2^2 & x_2^2 & ... & x_2^m 
\\
\vdots & \vdots & \vdots & \ddots & \vdots 
\\
1 & x_n^2 & x_n^2 & ... & x_n^m 
\end{pmatrix}
\end{equation}

Note that the calculation of $X^T$ is trivial, and we need $X$ only once, namely in the product $X^TX$. We will save RAM by storing only $X^T$, and expressing $X$ from $X^T$, when we need it. The routine calculates the coefficients in increasing order, therefore, before returning, we have to reverse the array.

## Examples

### polyval

In [416]:
%%micropython -unix 1

import ulab

p = [1, 1, 1, 0]
x = [0, 1, 2, 3, 4]
print('coefficients: ', p)
print('independent values: ', x)
print('\nvalues of p(x): ', ulab.polyval(p, x))

# the same works with ndarrays
a = ulab.ndarray(x)
print('\nndarray (a): ', a)
print('value of p(a): ', ulab.polyval(p, a))

coefficients:  [1, 1, 1, 0]
independent values:  [0, 1, 2, 3, 4]

values of p(x):  ndarray([0.0, 3.0, 14.0, 39.0, 84.0], dtype=float)

ndarray (a):  ndarray([0.0, 1.0, 2.0, 3.0, 4.0], dtype=float)
value of p(a):  ndarray([0.0, 3.0, 14.0, 39.0, 84.0], dtype=float)




### polyfit

First a perfect parabola with zero shift, and leading coefficient of 1. 

In [422]:
%%micropython -unix 1

import ulab

x = ulab.ndarray([-3, -2, -1, 0, 1, 2, 3])
y = ulab.ndarray([9, 4, 1, 0, 1, 4, 9])
print('independent values: ', x)
print('dependent values: ', y)

print('fit values', ulab.polyfit(x, y, 2))

independent values:  ndarray([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0], dtype=float)
dependent values:  ndarray([9.0, 4.0, 1.0, 0.0, 1.0, 4.0, 9.0], dtype=float)
fit values ndarray([1.00000011920929, 0.0, 0.0], dtype=float)




We can now take a more meaningful example: the data points scatter here:

In [423]:
%%micropython -unix 1

import ulab

x = ulab.ndarray([-3, -2, -1, 0, 1, 2, 3])
y = ulab.ndarray([10, 5, 1, 0, 1, 4.2, 9.1])
print('independent values: ', x)
print('dependent values: ', y)

print('fit values', ulab.polyfit(x, y, 2))

independent values:  ndarray([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0], dtype=float)
dependent values:  ndarray([10.0, 5.0, 1.0, 0.0, 1.0, 4.199999809265137, 9.100000381469727], dtype=float)
fit values ndarray([1.065476179122925, -0.1535714119672775, 0.06666660308837891], dtype=float)




Finally, let us see, what this looks like in numpy:

In [419]:
x = array([-3, -2, -1, 0, 1, 2, 3])
y = array([10, 5, 1, 0, 1, 4.2, 9.1])
print('independent values: ', x)
print('dependent values: ', y)

print('fit values: ', polyfit(x, y, 2))

independent values:  [-3 -2 -1  0  1  2  3]
dependent values:  [10.   5.   1.   0.   1.   4.2  9.1]
fit values:  [ 1.06547619 -0.15357143  0.06666667]


Look at that! The difference to numpy is minuscule!

## poly.h

In [251]:
%%ccode poly.h

#ifndef _POLY_
#define _POLY_

mp_obj_t poly_polyval(mp_obj_t , mp_obj_t );
mp_obj_t poly_polyfit(size_t  , const mp_obj_t *);

#endif

written 315 bytes to poly.h


## poly.c

In [346]:
%%ccode poly.c

#include "py/obj.h"
#include "py/runtime.h"
#include "py/objarray.h"
#include "ndarray.h"
#include "linalg.h"
#include "poly.h"


bool object_is_nditerable(mp_obj_t o_in) {
    if(mp_obj_is_type(o_in, &ulab_ndarray_type) || 
      mp_obj_is_type(o_in, &mp_type_tuple) || 
      mp_obj_is_type(o_in, &mp_type_list) || 
      mp_obj_is_type(o_in, &mp_type_range)) {
        return true;
    }
    return false;
}

size_t get_nditerable_len(mp_obj_t o_in) {
    if(mp_obj_is_type(o_in, &ulab_ndarray_type)) {
        ndarray_obj_t *in = MP_OBJ_TO_PTR(o_in);
        return in->array->len;
    } else {
        return (size_t)mp_obj_get_int(mp_obj_len_maybe(o_in));
    }
}

mp_obj_t poly_polyval(mp_obj_t o_p, mp_obj_t o_x) {
    // TODO: return immediately, if o_p is not an iterable
    // TODO: there is a bug here: matrices won't work, 
    // because there is a single iteration loop
    size_t m, n;
    if(MP_OBJ_IS_TYPE(o_x, &ulab_ndarray_type)) {
        ndarray_obj_t *ndx = MP_OBJ_TO_PTR(o_x);
        m = ndx->m;
        n = ndx->n;
    } else {
        mp_obj_array_t *ix = MP_OBJ_TO_PTR(o_x);
        m = 1;
        n = ix->len;
    }
    // polynomials are going to be of type float, except, when both 
    // the coefficients and the independent variable are integers
    ndarray_obj_t *out = create_new_ndarray(m, n, NDARRAY_FLOAT);
    mp_obj_iter_buf_t x_buf;
    mp_obj_t x_item, x_iterable = mp_getiter(o_x, &x_buf);

    mp_obj_iter_buf_t p_buf;
    mp_obj_t p_item, p_iterable;

    mp_float_t x, y;
    mp_float_t *outf = (mp_float_t *)out->array->items;
    uint8_t plen = mp_obj_get_int(mp_obj_len_maybe(o_p));
    mp_float_t *p = m_new(mp_float_t, plen);
    p_iterable = mp_getiter(o_p, &p_buf);
    uint16_t i = 0;    
    while((p_item = mp_iternext(p_iterable)) != MP_OBJ_STOP_ITERATION) {
        p[i] = mp_obj_get_float(p_item);
        i++;
    }
    i = 0;
    while ((x_item = mp_iternext(x_iterable)) != MP_OBJ_STOP_ITERATION) {
        x = mp_obj_get_float(x_item);
        y = p[0];
        for(uint8_t j=0; j < plen-1; j++) {
            y *= x;
            y += p[j+1];
        }
        outf[i++] = y;
    }
    m_del(mp_float_t, p, plen);
    return MP_OBJ_FROM_PTR(out);
}

mp_obj_t poly_polyfit(size_t  n_args, const mp_obj_t *args) {
    if((n_args != 2) && (n_args != 3)) {
        mp_raise_ValueError("number of arguments must be 2, or 3");
    }
    if(!object_is_nditerable(args[0])) {
        mp_raise_ValueError("input data must be an iterable");
    }
    uint16_t lenx, leny;
    uint8_t deg;
    mp_float_t *x, *XT, *y, *prod;

    if(n_args == 2) { // only the y values are supplied
        // TODO: this is actually not enough: the first argument can very well be a matrix, 
        // in which case we are between the rock and a hard place
        leny = (uint16_t)mp_obj_get_int(mp_obj_len_maybe(args[0]));
        deg = (uint8_t)mp_obj_get_int(args[1]);
        if(leny < deg) {
            mp_raise_ValueError("more degrees of freedom than data points");
        }
        lenx = leny;
        x = m_new(mp_float_t, lenx); // assume uniformly spaced data points
        for(size_t i=0; i < lenx; i++) {
            x[i] = i;
        }
        y = m_new(mp_float_t, leny);
        fill_array_iterable(y, args[0]);
    } else if(n_args == 3) {
        lenx = (uint16_t)mp_obj_get_int(mp_obj_len_maybe(args[0]));
        leny = (uint16_t)mp_obj_get_int(mp_obj_len_maybe(args[0]));
        if(lenx != leny) {
            mp_raise_ValueError("input vectors must be of equal length");
        }
        deg = (uint8_t)mp_obj_get_int(args[2]);
        if(leny < deg) {
            mp_raise_ValueError("more degrees of freedom than data points");
        }
        x = m_new(mp_float_t, lenx);
        fill_array_iterable(x, args[0]);
        y = m_new(mp_float_t, leny);
        fill_array_iterable(y, args[1]);
    }
    
    // one could probably express X as a function of XT, 
    // and thereby save RAM, because X is used only in the product
    XT = m_new(mp_float_t, (deg+1)*leny); // XT is a matrix of shape (deg+1, len) (rows, columns)
    for(uint8_t i=0; i < leny; i++) { // column index
        XT[i+0*lenx] = 1.0; // top row
        for(uint8_t j=1; j < deg+1; j++) { // row index
            XT[i+j*leny] = XT[i+(j-1)*leny]*x[i];
        }
    }
    
    prod = m_new(mp_float_t, (deg+1)*(deg+1)); // the product matrix is of shape (deg+1, deg+1)
    mp_float_t sum;
    for(uint16_t i=0; i < deg+1; i++) { // column index
        for(uint16_t j=0; j < deg+1; j++) { // row index
            sum = 0.0;
            for(size_t k=0; k < lenx; k++) {
                // (j, k) * (k, i) 
                // Note that the second matrix is simply the transpose of the first: 
                // X(k, i) = XT(i, k) = XT[k*lenx+i]
                sum += XT[j*lenx+k]*XT[i*lenx+k]; // X[k*(deg+1)+i];
            }
            prod[j*(deg+1)+i] = sum;
        }
    }
    if(!linalg_invert_matrix(prod, deg+1)) {
        // Although X was a Vandermonde matrix, whose inverse is guaranteed to exist, 
        // we bail out here, if prod couldn't be inverted: if the values in x are not all 
        // distinct, prod is singular
        m_del(mp_float_t, XT, (deg+1)*lenx);
        m_del(mp_float_t, x, lenx);
        m_del(mp_float_t, y, lenx);
        m_del(mp_float_t, prod, (deg+1)*(deg+1));
        mp_raise_ValueError("could not invert Vandermonde matrix");
    } 
    // at this point, we have the inverse of X^T * X
    // y is a column vector; x is free now, we can use it for storing intermediate values
    for(uint16_t i=0; i < deg+1; i++) { // row index
        sum = 0.0;
        for(uint16_t j=0; j < lenx; j++) { // column index
            sum += XT[i*lenx+j]*y[j];
        }
        x[i] = sum;
    }
    // XT is no longer needed
    m_del(mp_float_t, XT, (deg+1)*leny);
    
    ndarray_obj_t *beta = create_new_ndarray(deg+1, 1, NDARRAY_FLOAT);
    mp_float_t *betav = (mp_float_t *)beta->array->items;
    // x[0..(deg+1)] contains now the product X^T * y; we can get rid of y
    m_del(float, y, leny);
    
    // now, we calculate beta, i.e., we apply prod = (X^T * X)^(-1) on x = X^T * y; x is a column vector now
    for(uint16_t i=0; i < deg+1; i++) {
        sum = 0.0;
        for(uint16_t j=0; j < deg+1; j++) {
            sum += prod[i*(deg+1)+j]*x[j];
        }
        betav[i] = sum;
    }
    m_del(mp_float_t, x, lenx);
    m_del(mp_float_t, prod, (deg+1)*(deg+1));
    for(uint8_t i=0; i < (deg+1)/2; i++) {
        // We have to reverse the array, for the leading coefficient comes first. 
        SWAP(mp_float_t, betav[i], betav[deg-i]);
    }
    return MP_OBJ_FROM_PTR(beta);
}

written 6859 bytes to poly.c


# Fast Fourier transform

The original idea of the implementation of the fast Fourier transform is taken from Numerical recipes. The main modification is that the present FFT kernel requires two input vectors of float type: one for the real part, and one for the imaginary part, while in Numerical recipes, the real and imaginary parts occupy alternating positions in the same array. 

However, since `ndarray` cannot hold complex types, it makes sense to starts with two separate vectors. This is especially true for our particular case, since the data are most probably real, coming from an ADC or similar. By separating the real and imaginary parts at the very beginning, we can process *real* data by not providing the imaginary part. If only one argument is supplied, it is assumed to be real, and the imaginary part is automatically filled in.

Now, the implementation computes the transform in place. This means that RAM space could be saved, if the old data are not required anymore. The problem, however, is that the results are of type float, irrespective of the input type. If one can somehow guarantee that the input type is also float, then the old data can be overwritten. This is what happens in the `spectrum` function that overwrites the input array.

## Examples

### Full FFT

In [435]:
%%micropython -unix 1

import ulab

a = ulab.ndarray([0, 1, 2, 3, 0, 1, 2, 3])
re, im = ulab.fft(a)
print('real part: ', re)
print('imag part: ', im)

real part:  ndarray([12.0, 0.0, -3.999999761581421, 0.0, -4.0, 0.0, -4.0, 0.0], dtype=float)
imag part:  ndarray([0.0, 0.0, 3.999999523162842, 0.0, 0.0, 0.0, -3.999999523162842, 0.0], dtype=float)




The same Fourier transform on numpy:

In [436]:
fft.fft([0, 1, 2, 3, 0, 1, 2, 3])

array([12.+0.j,  0.+0.j, -4.+4.j,  0.+0.j, -4.+0.j,  0.+0.j, -4.-4.j,
        0.+0.j])

### Spectrum

In [447]:
%%micropython -unix 1

import ulab

a = ulab.ndarray([0, 1, 2, 3, 0, 1, 2, 3])
ulab.spectrum(a)
print(a)

ndarray([12.0, 0.0, 5.656853675842285, 0.0, 4.0, 0.0, 5.656853675842285, 0.0], dtype=float)




And watch this: if you need the spectrum, but do not want to overwrite your data, you can do the following

In [440]:
%%micropython -unix 1

import ulab

a = ulab.ndarray([0, 1, 2, 3, 0, 1, 2, 3])
re, im = ulab.fft(a)
print('spectrum: ', ulab.sqrt(re*re+im*im))

spectrum:  ndarray([12.0, 0.0, 5.656853675842285, 0.0, 4.0, 0.0, 5.656853675842285, 0.0], dtype=float)




## fft.h

In [1672]:
%%ccode fft.h

#ifndef _FFT_
#define _FFT_

#ifndef MP_PI
#define MP_PI MICROPY_FLOAT_CONST(3.14159265358979323846)
#endif

#define SWAP(t, a, b) { t tmp = a; a = b; b = tmp; }

mp_obj_t fft_fft(size_t , const mp_obj_t *);
mp_obj_t fft_ifft(size_t , const mp_obj_t *);
mp_obj_t fft_spectrum(size_t , const mp_obj_t *);
#endif

written 491 bytes to fft.h


## fft.c

In [169]:
%%ccode fft.c

#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "py/runtime.h"
#include "py/binary.h"
#include "py/obj.h"
#include "py/objarray.h"
#include "ndarray.h"
#include "fft.h"

enum FFT_TYPE {
    FFT_FFT,
    FFT_IFFT,
    FFT_SPECTRUM,
};

void fft_kernel(mp_float_t *real, mp_float_t *imag, int n, int isign) {
    // This is basically a modification of four1 from Numerical Recipes
    // The main difference is that this function takes two arrays, one 
    // for the real, and one for the imaginary parts. 
    int j, m, mmax, istep;
    mp_float_t tempr, tempi;
    mp_float_t wtemp, wr, wpr, wpi, wi, theta;

    j = 0;
    for(int i = 0; i < n; i++) {
        if (j > i) {
            SWAP(mp_float_t, real[i], real[j]);
            SWAP(mp_float_t, imag[i], imag[j]);
        }
        m = n >> 1;
        while (j >= m && m > 0) {
            j -= m;
            m >>= 1;
        }
        j += m;
    }

    mmax = 1;
    while (n > mmax) {
        istep = mmax << 1;
        theta = -2.0*isign*MP_PI/istep;
        wtemp = MICROPY_FLOAT_C_FUN(sin)(0.5 * theta);
        wpr = -2.0 * wtemp * wtemp;
        wpi = MICROPY_FLOAT_C_FUN(sin)(theta);
        wr = 1.0;
        wi = 0.0;
        for(m = 0; m < mmax; m++) {
            for(int i = m; i < n; i += istep) {
                j = i + mmax;
                tempr = wr * real[j] - wi * imag[j];
                tempi = wr * imag[j] + wi * real[j];
                real[j] = real[i] - tempr;
                imag[j] = imag[i] - tempi;
                real[i] += tempr;
                imag[i] += tempi;
            }
            wtemp = wr;
            wr = wr*wpr - wi*wpi + wr;
            wi = wi*wpr + wtemp*wpi + wi;
        }
        mmax = istep;
    }
}

mp_obj_t fft_fft_ifft_spectrum(size_t n_args, mp_obj_t arg_re, mp_obj_t arg_im, uint8_t type) {
    if(!MP_OBJ_IS_TYPE(arg_re, &ulab_ndarray_type)) {
        mp_raise_NotImplementedError("FFT is defined for ndarrays only");
    } 
    if(n_args == 2) {
        if(!MP_OBJ_IS_TYPE(arg_im, &ulab_ndarray_type)) {
            mp_raise_NotImplementedError("FFT is defined for ndarrays only");
        }
    }
    // Check if input is of length of power of 2
    ndarray_obj_t *re = MP_OBJ_TO_PTR(arg_re);
    uint16_t len = re->array->len;
    if((len & (len-1)) != 0) {
        mp_raise_ValueError("input array length must be power of 2");
    }
    
    ndarray_obj_t *out_re = create_new_ndarray(1, len, NDARRAY_FLOAT);
    mp_float_t *data_re = (mp_float_t *)out_re->array->items;
    
    if(re->array->typecode == NDARRAY_FLOAT) { 
        // By treating this case separately, we can save a bit of time.
        // I don't know if it is worthwhile, though...
        memcpy((mp_float_t *)out_re->array->items, (mp_float_t *)re->array->items, re->bytes);
    } else {
        for(size_t i=0; i < len; i++) {
            data_re[i] = ndarray_get_float_value(re->array->items, re->array->typecode, i);
        }
    }
    ndarray_obj_t *out_im = create_new_ndarray(1, len, NDARRAY_FLOAT);
    mp_float_t *data_im = (mp_float_t *)out_im->array->items;

    if(n_args == 2) {
        ndarray_obj_t *im = MP_OBJ_TO_PTR(arg_im);
        if (re->array->len != im->array->len) {
            mp_raise_ValueError("real and imaginary parts must be of equal length");
        }
        if(im->array->typecode == NDARRAY_FLOAT) {
            memcpy((mp_float_t *)out_im->array->items, (mp_float_t *)im->array->items, im->bytes);
        } else {
            for(size_t i=0; i < len; i++) {
                data_im[i] = ndarray_get_float_value(im->array->items, im->array->typecode, i);
            }
        }
    }
    if((type == FFT_FFT) || (type == FFT_SPECTRUM)) {
        fft_kernel(data_re, data_im, len, 1);
        if(type == FFT_SPECTRUM) {
            for(size_t i=0; i < len; i++) {
                data_re[i] = MICROPY_FLOAT_C_FUN(sqrt)(data_re[i]*data_re[i] + data_im[i]*data_im[i]);
            }
        }
    } else { // inverse transform
        fft_kernel(data_re, data_im, len, -1);
        // TODO: numpy accepts the norm keyword argument
        for(size_t i=0; i < len; i++) {
            data_re[i] /= len;
            data_im[i] /= len;
        }
    }
    if(type == FFT_SPECTRUM) {
        return MP_OBJ_TO_PTR(out_re);
    } else {
        mp_obj_t tuple[2];
        tuple[0] = out_re;
        tuple[1] = out_im;
        return mp_obj_new_tuple(2, tuple);
    }
}

mp_obj_t fft_fft(size_t n_args, const mp_obj_t *args) {
    if(n_args == 2) {
        return fft_fft_ifft_spectrum(n_args, args[0], args[1], FFT_FFT);
    } else {
        return fft_fft_ifft_spectrum(n_args, args[0], mp_const_none, FFT_FFT);        
    }
}

mp_obj_t fft_ifft(size_t n_args, const mp_obj_t *args) {
    if(n_args == 2) {
        return fft_fft_ifft_spectrum(n_args, args[0], args[1], FFT_IFFT);
    } else {
        return fft_fft_ifft_spectrum(n_args, args[0], mp_const_none, FFT_IFFT);
    }
}

mp_obj_t fft_spectrum(size_t n_args, const mp_obj_t *args) {
    if(n_args == 2) {
        return fft_fft_ifft_spectrum(n_args, args[0], args[1], FFT_SPECTRUM);
    } else {
        return fft_fft_ifft_spectrum(n_args, args[0], mp_const_none, FFT_SPECTRUM);
    }
}

written 5401 bytes to fft.c


# Numerical

## General comments

This section contains miscellaneous functions that did not fit in the other submodules. These include `linspace`, `min/max`, `argmin/argmax`, `sum`, `mean`, `std`. These latter functions work with iterables, or ndarrays. When the ndarray is two-dimensional, an `axis` keyword can be supplied, in which case, the function returns a vector, otherwise a scalar.

Since the return values of `mean`, and `std` are most probably floats, these functions return ndarrays of type float, while `min/max` and `clip` do not change the type, and `argmin/argmax` return `uint8`, if the values are smaller than 255, otherwise, `uint16`.

### roll

Note that at present, arrays are always rolled to the left, even when the user specifies right. The reason for that is inner working of `memcpy`: one can shift contiguous chunks to the left only. If one tries to shift to the right, then the same value will be written into the new array over and over again.

## Examples

In [22]:
%%micropython -unix 1

import ulab

print(ulab.linspace(0, 10, 11))
print(ulab.sum([1, 2, 3]))

a = ulab.ndarray([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [55, 66, 77, 88, 99]], dtype=ulab.int8)
print(a)
ulab.roll(a, -1, axis=0)
print(a)
ulab.roll(a, 1, axis=1)
print(a)

ndarray([0.0, 1.0, 2.0, ..., 8.0, 9.0, 10.0], dtype=float)
6.0
ndarray([[1, 2, 3, 4, 5],
	 [6, 7, 8, 9, 10],
	 [55, 66, 77, 88, 99]], dtype=int8)
ndarray([[55, 66, 77, 88, 99],
	 [1, 2, 3, 4, 5],
	 [6, 7, 8, 9, 10]], dtype=int8)
ndarray([[66, 77, 88, 99, 55],
	 [2, 3, 4, 5, 1],
	 [7, 8, 9, 10, 6]], dtype=int8)




In [819]:
linspace(0, 10, 11, endpoint=False, dtype=int8, retstep=True)

(array([0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int8), 0.9090909090909091)

In [863]:
a = array([0, 1, 2, 3], dtype=float)
abs(a), -a

(array([0., 1., 2., 3.]), array([-0., -1., -2., -3.]))

In [834]:
%%micropython -unix 1

import ulab

print(ulab.linspace(0, 10, num=11, endpoint=True, retstep=True, dtype=ulab.int8))
print(ulab.linspace(0, 10, num=11, endpoint=False, retstep=True, dtype=ulab.int16))

(ndarray([0, 1, 2, ..., 8, 9, 10], dtype=int8), 1.0)
(ndarray([0, 0, 1, ..., 7, 8, 9], dtype=int16), 0.9090909361839294)




In [971]:
%%micropython -unix 1

import ulab

a = ulab.ndarray([0, 1, 2, -3], dtype=ulab.float)
print(abs(a))

ndarray([0.0, 1.0, 2.0, 3.0], dtype=float)




## numerical.h

In [157]:
%%ccode numerical.h

#ifndef _NUMERICAL_
#define _NUMERICAL_

#include "ndarray.h"

mp_obj_t numerical_linspace(size_t , const mp_obj_t *, mp_map_t *);
mp_obj_t numerical_sum(size_t , const mp_obj_t *, mp_map_t *);
mp_obj_t numerical_mean(size_t , const mp_obj_t *, mp_map_t *);
mp_obj_t numerical_std(size_t , const mp_obj_t *, mp_map_t *);
mp_obj_t numerical_min(size_t , const mp_obj_t *, mp_map_t *);
mp_obj_t numerical_max(size_t , const mp_obj_t *, mp_map_t *);
mp_obj_t numerical_argmin(size_t , const mp_obj_t *, mp_map_t *);
mp_obj_t numerical_argmax(size_t , const mp_obj_t *, mp_map_t *);
mp_obj_t numerical_roll(size_t , const mp_obj_t *, mp_map_t *);

// TODO: implement minimum/maximum, and cumsum
mp_obj_t numerical_minimum(mp_obj_t , mp_obj_t );
mp_obj_t numerical_maximum(mp_obj_t , mp_obj_t );
mp_obj_t numerical_cumsum(size_t , const mp_obj_t *, mp_map_t *);
mp_obj_t numerical_flip(size_t , const mp_obj_t *, mp_map_t *);
mp_obj_t numerical_diff(size_t , const mp_obj_t *, mp_map_t *);
mp_obj_t numerical_sort(size_t , const mp_obj_t *, mp_map_t *);
mp_obj_t numerical_sort_inplace(size_t , const mp_obj_t *, mp_map_t *);
mp_obj_t numerical_argsort(size_t , const mp_obj_t *, mp_map_t *);

// this macro could be tighter, if we moved the ifs to the argmin function, assigned <, as well as >
#define ARG_MIN_LOOP(in, type, start, stop, stride, op) do {\
    type *array = (type *)(in)->array->items;\
    if(((op) == NUMERICAL_MAX) || ((op) == NUMERICAL_ARGMAX)) {\
        for(size_t i=(start)+(stride); i < (stop); i+=(stride)) {\
            if((array)[i] > (array)[best_idx]) {\
                best_idx = i;\
            }\
        }\
    } else{\
        for(size_t i=(start)+(stride); i < (stop); i+=(stride)) {\
            if((array)[i] < (array)[best_idx]) best_idx = i;\
        }\
    }\
} while(0)

#define CALCULATE_DIFF(in, out, type, M, N, inn, increment) do {\
    type *source = (type *)(in)->array->items;\
    type *target = (type *)(out)->array->items;\
    for(size_t i=0; i < (M); i++) {\
        for(size_t j=0; j < (N); j++) {\
            for(uint8_t k=0; k < n+1; k++) {\
                target[i*(N)+j] -= stencil[k]*source[i*(inn)+j+k*(increment)];\
            }\
        }\
    }\
} while(0)

#define HEAPSORT(type, ndarray) do {\
    type *array = (type *)(ndarray)->array->items;\
    type tmp;\
    for (;;) {\
        if (k > 0) {\
            tmp = array[start+(--k)*increment];\
        } else {\
            q--;\
            if(q == 0) {\
                break;\
            }\
            tmp = array[start+q*increment];\
            array[start+q*increment] = array[start];\
        }\
        p = k;\
        c = k + k + 1;\
        while (c < q) {\
            if((c + 1 < q)  &&  (array[start+(c+1)*increment] > array[start+c*increment])) {\
                c++;\
            }\
            if(array[start+c*increment] > tmp) {\
                array[start+p*increment] = array[start+c*increment];\
                p = c;\
                c = p + p + 1;\
            } else {\
                break;\
            }\
        }\
        array[start+p*increment] = tmp;\
    }\
} while(0)

// This is pretty similar to HEAPSORT above; perhaps, the two could be combined somehow
// On the other hand, since this is a macro, it doesn't really matter
// Keep in mind that initially, index_array[start+s*increment] = s
#define HEAP_ARGSORT(type, ndarray, index_array) do {\
    type *array = (type *)(ndarray)->array->items;\
    type tmp;\
    uint16_t itmp;\
    for (;;) {\
        if (k > 0) {\
            k--;\
            tmp = array[start+index_array[start+k*increment]*increment];\
            itmp = index_array[start+k*increment];\
        } else {\
            q--;\
            if(q == 0) {\
                break;\
            }\
            tmp = array[start+index_array[start+q*increment]*increment];\
            itmp = index_array[start+q*increment];\
            index_array[start+q*increment] = index_array[start];\
        }\
        p = k;\
        c = k + k + 1;\
        while (c < q) {\
            if((c + 1 < q)  &&  (array[start+index_array[start+(c+1)*increment]*increment] > array[start+index_array[start+c*increment]*increment])) {\
                c++;\
            }\
            if(array[start+index_array[start+c*increment]*increment] > tmp) {\
                index_array[start+p*increment] = index_array[start+c*increment];\
                p = c;\
                c = p + p + 1;\
            } else {\
                break;\
            }\
        }\
        index_array[start+p*increment] = itmp;\
    }\
} while(0)

#endif

written 4779 bytes to numerical.h


## numerical.c

### Argument parsing

Since most of these functions operate on matrices along an axis, it might make sense to factor out the parsing of arguments and keyword arguments. The void function `numerical_parse_args` fills in the pointer for the matrix/array, and the axis.

In [172]:
%%ccode numerical.c

#include <math.h>
#include <stdlib.h>
#include <string.h>
#include "py/obj.h"
#include "py/objint.h"
#include "py/runtime.h"
#include "py/builtin.h"
#include "py/misc.h"
#include "numerical.h"

enum NUMERICAL_FUNCTION_TYPE {
    NUMERICAL_MIN,
    NUMERICAL_MAX,
    NUMERICAL_ARGMIN,
    NUMERICAL_ARGMAX,
    NUMERICAL_SUM,
    NUMERICAL_MEAN,
    NUMERICAL_STD,
};

mp_obj_t numerical_linspace(size_t n_args, const mp_obj_t *pos_args, mp_map_t *kw_args) {
    static const mp_arg_t allowed_args[] = {
        { MP_QSTR_, MP_ARG_REQUIRED | MP_ARG_OBJ, {.u_rom_obj = MP_ROM_PTR(&mp_const_none_obj) } },
        { MP_QSTR_, MP_ARG_REQUIRED | MP_ARG_OBJ, {.u_rom_obj = MP_ROM_PTR(&mp_const_none_obj) } },
        { MP_QSTR_num, MP_ARG_INT, {.u_int = 50} },
        { MP_QSTR_endpoint, MP_ARG_KW_ONLY | MP_ARG_OBJ, {.u_rom_obj = MP_ROM_PTR(&mp_const_true_obj)} },
        { MP_QSTR_retstep, MP_ARG_KW_ONLY | MP_ARG_OBJ, {.u_rom_obj = MP_ROM_PTR(&mp_const_false_obj)} },
        { MP_QSTR_dtype, MP_ARG_KW_ONLY | MP_ARG_INT, {.u_int = NDARRAY_FLOAT} },
    };

    mp_arg_val_t args[MP_ARRAY_SIZE(allowed_args)];
    mp_arg_parse_all(2, pos_args, kw_args, MP_ARRAY_SIZE(allowed_args), allowed_args, args);

    uint16_t len = args[2].u_int;
    if(len < 2) {
        mp_raise_ValueError("number of points must be at least 2");
    }
    mp_float_t value, step;
    value = mp_obj_get_float(args[0].u_obj);
    uint8_t typecode = args[5].u_int;
    if(args[3].u_obj == mp_const_true) step = (mp_obj_get_float(args[1].u_obj)-value)/(len-1);
    else step = (mp_obj_get_float(args[1].u_obj)-value)/len;
    ndarray_obj_t *ndarray = create_new_ndarray(1, len, typecode);
    if(typecode == NDARRAY_UINT8) {
        uint8_t *array = (uint8_t *)ndarray->array->items;
        for(size_t i=0; i < len; i++, value += step) array[i] = (uint8_t)value;
    } else if(typecode == NDARRAY_INT8) {
        int8_t *array = (int8_t *)ndarray->array->items;
        for(size_t i=0; i < len; i++, value += step) array[i] = (int8_t)value;
    } else if(typecode == NDARRAY_UINT16) {
        uint16_t *array = (uint16_t *)ndarray->array->items;
        for(size_t i=0; i < len; i++, value += step) array[i] = (uint16_t)value;
    } else if(typecode == NDARRAY_INT16) {
        int16_t *array = (int16_t *)ndarray->array->items;
        for(size_t i=0; i < len; i++, value += step) array[i] = (int16_t)value;
    } else {
        mp_float_t *array = (mp_float_t *)ndarray->array->items;
        for(size_t i=0; i < len; i++, value += step) array[i] = value;
    }
    if(args[4].u_obj == mp_const_false) {
        return MP_OBJ_FROM_PTR(ndarray);
    } else {
        mp_obj_t tuple[2];
        tuple[0] = ndarray;
        tuple[1] = mp_obj_new_float(step);
        return mp_obj_new_tuple(2, tuple);
    }
}

mp_obj_t numerical_sum_mean_std_array(mp_obj_t oin, uint8_t optype) {
    mp_float_t value, sum = 0.0, sq_sum = 0.0;
    mp_obj_iter_buf_t iter_buf;
    mp_obj_t item, iterable = mp_getiter(oin, &iter_buf);
    mp_int_t len = mp_obj_get_int(mp_obj_len(oin));
    while ((item = mp_iternext(iterable)) != MP_OBJ_STOP_ITERATION) {
        value = mp_obj_get_float(item);
        sum += value;
        if(optype == NUMERICAL_STD) {
            sq_sum += value*value;
        }
    }
    if(optype ==  NUMERICAL_SUM) {
        return mp_obj_new_float(sum);
    } else if(optype == NUMERICAL_MEAN) {
        return mp_obj_new_float(sum/len);
    } else {
        sum /= len; // this is now the mean!
        return mp_obj_new_float(MICROPY_FLOAT_C_FUN(sqrt)(sq_sum/len-sum*sum));
    }
}

STATIC mp_float_t numerical_sum_mean_std_single_line(void *data, size_t start, size_t stop, 
                                                  size_t stride, uint8_t typecode, uint8_t optype) {
    
    mp_float_t sum = 0.0, sq_sum = 0.0, value;
    size_t len = 0;
    for(size_t i=start; i < stop; i+=stride, len++) {
        value = ndarray_get_float_value(data, typecode, i);        
        sum += value;
        if(optype == NUMERICAL_STD) {
            sq_sum += value*value;
        }
    }
    if(len == 0) {
        mp_raise_ValueError("data length is 0!");
    }
    if(optype ==  NUMERICAL_SUM) {
        return sum;
    } else if(optype == NUMERICAL_MEAN) {
        return sum/len;
    } else {
        sum /= len; // this is now the mean!
        return MICROPY_FLOAT_C_FUN(sqrt)(sq_sum/len-sum*sum);
    }
}

STATIC mp_obj_t numerical_sum_mean_std_matrix(mp_obj_t oin, mp_obj_t axis, uint8_t optype) {
    ndarray_obj_t *in = MP_OBJ_TO_PTR(oin);
    if((axis == mp_const_none) || (in->m == 1) || (in->n == 1)) { 
        // return the value for the flattened array
        return mp_obj_new_float(numerical_sum_mean_std_single_line(in->array->items, 0, 
                                                      in->array->len, 1, in->array->typecode, optype));
    } else {
        uint8_t _axis = mp_obj_get_int(axis);
        size_t m = (_axis == 0) ? 1 : in->m;
        size_t n = (_axis == 0) ? in->n : 1;
        size_t len = in->array->len;
        mp_float_t sms;
        // TODO: pass in->array->typcode to create_new_ndarray
        ndarray_obj_t *out = create_new_ndarray(m, n, NDARRAY_FLOAT);

        // TODO: these two cases could probably be combined in a more elegant fashion...
        if(_axis == 0) { // vertical
            for(size_t i=0; i < n; i++) {
                sms = numerical_sum_mean_std_single_line(in->array->items, i, len, 
                                                               n, in->array->typecode, optype);
                ((float_t *)out->array->items)[i] = sms;
            }
        } else { // horizontal
            for(size_t i=0; i < m; i++) {
                sms = numerical_sum_mean_std_single_line(in->array->items, i*in->n, 
                                                               (i+1)*in->n, 1, in->array->typecode, optype);
                ((float_t *)out->array->items)[i] = sms;
            }
        }
    return MP_OBJ_FROM_PTR(out);
    }
}

size_t numerical_argmin_argmax_array(ndarray_obj_t *in, size_t start, 
                                       size_t stop, size_t stride, uint8_t op) {
    size_t best_idx = start;
    if(in->array->typecode == NDARRAY_UINT8) {
        ARG_MIN_LOOP(in, uint8_t, start, stop, stride, op);
    } else if(in->array->typecode == NDARRAY_INT8) {
        ARG_MIN_LOOP(in, int8_t, start, stop, stride, op);
    } else if(in->array->typecode == NDARRAY_UINT16) {
        ARG_MIN_LOOP(in, uint16_t, start, stop, stride, op);
    } else if(in->array->typecode == NDARRAY_INT16) {
        ARG_MIN_LOOP(in, uint16_t, start, stop, stride, op);
    } else if(in->array->typecode == NDARRAY_FLOAT) {
        ARG_MIN_LOOP(in, mp_float_t, start, stop, stride, op);
    }
    return best_idx;
}

void copy_value_into_ndarray(ndarray_obj_t *target, ndarray_obj_t *source, size_t target_idx, size_t source_idx) {
    // since we are simply copying, it doesn't matter, whether the arrays are signed or unsigned, 
    // we can cast them in any way we like
    // This could also be done with byte copies. I don't know, whether that would have any benefits
    if((target->array->typecode == NDARRAY_UINT8) || (target->array->typecode == NDARRAY_INT8)) {
        ((uint8_t *)target->array->items)[target_idx] = ((uint8_t *)source->array->items)[source_idx];
    } else if((target->array->typecode == NDARRAY_UINT16) || (target->array->typecode == NDARRAY_INT16)) {
        ((uint16_t *)target->array->items)[target_idx] = ((uint16_t *)source->array->items)[source_idx];
    } else { 
        ((float *)target->array->items)[target_idx] = ((float *)source->array->items)[source_idx];
    }
}
 
STATIC mp_obj_t numerical_argmin_argmax(mp_obj_t oin, mp_obj_t axis, uint8_t optype) {
    if(MP_OBJ_IS_TYPE(oin, &mp_type_tuple) || MP_OBJ_IS_TYPE(oin, &mp_type_list) || 
        MP_OBJ_IS_TYPE(oin, &mp_type_range)) {
        // This case will work for single iterables only 
        size_t idx = 0, best_idx = 0;
        mp_obj_iter_buf_t iter_buf;
        mp_obj_t iterable = mp_getiter(oin, &iter_buf);
        mp_obj_t best_obj = MP_OBJ_NULL;
        mp_obj_t item;
        mp_uint_t op = MP_BINARY_OP_LESS;
        if((optype == NUMERICAL_ARGMAX) || (optype == NUMERICAL_MAX)) op = MP_BINARY_OP_MORE;
        while ((item = mp_iternext(iterable)) != MP_OBJ_STOP_ITERATION) {
            if ((best_obj == MP_OBJ_NULL) || (mp_binary_op(op, item, best_obj) == mp_const_true)) {
                best_obj = item;
                best_idx = idx;
            }
            idx++;
        }
        if((optype == NUMERICAL_ARGMIN) || (optype == NUMERICAL_ARGMAX)) {
            return MP_OBJ_NEW_SMALL_INT(best_idx);
        } else {
            return best_obj;
        }
    } else if(mp_obj_is_type(oin, &ulab_ndarray_type)) {
            ndarray_obj_t *in = MP_OBJ_TO_PTR(oin);
            size_t best_idx;
            if((axis == mp_const_none) || (in->m == 1) || (in->n == 1)) {
                // return the value for the flattened array                
                best_idx = numerical_argmin_argmax_array(in, 0, in->array->len, 1, optype);
                if((optype == NUMERICAL_ARGMIN) || (optype == NUMERICAL_ARGMAX)) {
                    return MP_OBJ_NEW_SMALL_INT(best_idx);
                } else {
                    if(in->array->typecode == NDARRAY_FLOAT) {
                        return mp_obj_new_float(ndarray_get_float_value(in->array->items, in->array->typecode, best_idx));
                    } else {
                        return mp_binary_get_val_array(in->array->typecode, in->array->items, best_idx);
                    }
                }
            } else { // we have to work with a full matrix here
                uint8_t _axis = mp_obj_get_int(axis);
                size_t m = (_axis == 0) ? 1 : in->m;
                size_t n = (_axis == 0) ? in->n : 1;
                size_t len = in->array->len;
                ndarray_obj_t *ndarray = NULL;
                if((optype == NUMERICAL_MAX) || (optype == NUMERICAL_MIN)) {
                    ndarray = create_new_ndarray(m, n, in->array->typecode);
                } else { // argmin/argmax
                    // TODO: one might get away with uint8_t, if both m, and n < 255
                    ndarray = create_new_ndarray(m, n, NDARRAY_UINT16);
                }

                // TODO: these two cases could probably be combined in a more elegant fashion...
                if(_axis == 0) { // vertical
                    for(size_t i=0; i < n; i++) {
                        best_idx = numerical_argmin_argmax_array(in, i, len, n, optype);
                        if((optype == NUMERICAL_MIN) || (optype == NUMERICAL_MAX)) {
                            copy_value_into_ndarray(ndarray, in, i, best_idx);
                        } else {
                            ((uint16_t *)ndarray->array->items)[i] = (uint16_t)(best_idx / n);
                        }
                    }
                } else { // horizontal
                    for(size_t i=0; i < m; i++) {
                        best_idx = numerical_argmin_argmax_array(in, i*in->n, (i+1)*in->n, 1, optype);
                        if((optype == NUMERICAL_MIN) || (optype == NUMERICAL_MAX)) {
                             copy_value_into_ndarray(ndarray, in, i, best_idx);
                        } else {
                            ((uint16_t *)ndarray->array->items)[i] = (uint16_t)(best_idx - i*in->n);
                        }
                    }
                }
                return MP_OBJ_FROM_PTR(ndarray);
            }
            return mp_const_none;
        }
    mp_raise_TypeError("input type is not supported");
}

STATIC mp_obj_t numerical_function(size_t n_args, const mp_obj_t *pos_args, mp_map_t *kw_args, uint8_t type) {
    static const mp_arg_t allowed_args[] = {
        { MP_QSTR_, MP_ARG_REQUIRED | MP_ARG_OBJ, {.u_rom_obj = MP_ROM_PTR(&mp_const_none_obj)} } ,
        { MP_QSTR_axis, MP_ARG_KW_ONLY | MP_ARG_OBJ, {.u_rom_obj = MP_ROM_PTR(&mp_const_none_obj)} },
    };

    mp_arg_val_t args[MP_ARRAY_SIZE(allowed_args)];
    mp_arg_parse_all(1, pos_args, kw_args, MP_ARRAY_SIZE(allowed_args), allowed_args, args);
    
    mp_obj_t oin = args[0].u_obj;
    mp_obj_t axis = args[1].u_obj;
    if((axis != mp_const_none) && (mp_obj_get_int(axis) != 0) && (mp_obj_get_int(axis) != 1)) {
        // this seems to pass with False, and True...
        mp_raise_ValueError("axis must be None, 0, or 1");
    }
    
    if(MP_OBJ_IS_TYPE(oin, &mp_type_tuple) || MP_OBJ_IS_TYPE(oin, &mp_type_list) || 
        MP_OBJ_IS_TYPE(oin, &mp_type_range)) {
        switch(type) {
            case NUMERICAL_MIN:
            case NUMERICAL_ARGMIN:
            case NUMERICAL_MAX:
            case NUMERICAL_ARGMAX:
                return numerical_argmin_argmax(oin, axis, type);
            case NUMERICAL_SUM:
            case NUMERICAL_MEAN:
            case NUMERICAL_STD:
                return numerical_sum_mean_std_array(oin, type);
            default: // we should never reach this point, but whatever
                return mp_const_none;
        }
    } else if(MP_OBJ_IS_TYPE(oin, &ulab_ndarray_type)) {
        switch(type) {
            case NUMERICAL_MIN:
            case NUMERICAL_MAX:
            case NUMERICAL_ARGMIN:
            case NUMERICAL_ARGMAX:
                return numerical_argmin_argmax(oin, axis, type);
            case NUMERICAL_SUM:
            case NUMERICAL_MEAN:
            case NUMERICAL_STD:
                return numerical_sum_mean_std_matrix(oin, axis, type);            
            default:
                mp_raise_NotImplementedError("operation is not implemented on ndarrays");
        }
    } else {
        mp_raise_TypeError("input must be tuple, list, range, or ndarray");
    }
    return mp_const_none;
}

mp_obj_t numerical_min(size_t n_args, const mp_obj_t *pos_args, mp_map_t *kw_args) {
    return numerical_function(n_args, pos_args, kw_args, NUMERICAL_MIN);
}

mp_obj_t numerical_max(size_t n_args, const mp_obj_t *pos_args, mp_map_t *kw_args) {
    return numerical_function(n_args, pos_args, kw_args, NUMERICAL_MAX);
}

mp_obj_t numerical_argmin(size_t n_args, const mp_obj_t *pos_args, mp_map_t *kw_args) {
    return numerical_function(n_args, pos_args, kw_args, NUMERICAL_ARGMIN);
}

mp_obj_t numerical_argmax(size_t n_args, const mp_obj_t *pos_args, mp_map_t *kw_args) {
    return numerical_function(n_args, pos_args, kw_args, NUMERICAL_ARGMAX);
}

mp_obj_t numerical_sum(size_t n_args, const mp_obj_t *pos_args, mp_map_t *kw_args) {
    return numerical_function(n_args, pos_args, kw_args, NUMERICAL_SUM);
}

mp_obj_t numerical_mean(size_t n_args, const mp_obj_t *pos_args, mp_map_t *kw_args) {
    return numerical_function(n_args, pos_args, kw_args, NUMERICAL_MEAN);
}

mp_obj_t numerical_std(size_t n_args, const mp_obj_t *pos_args, mp_map_t *kw_args) {
    return numerical_function(n_args, pos_args, kw_args, NUMERICAL_STD);
}

mp_obj_t numerical_roll(size_t n_args, const mp_obj_t *pos_args, mp_map_t *kw_args) {
    static const mp_arg_t allowed_args[] = {
        { MP_QSTR_, MP_ARG_REQUIRED | MP_ARG_OBJ, {.u_rom_obj = MP_ROM_PTR(&mp_const_none_obj) } },
        { MP_QSTR_, MP_ARG_REQUIRED | MP_ARG_OBJ, {.u_rom_obj = MP_ROM_PTR(&mp_const_none_obj) } },
        { MP_QSTR_axis, MP_ARG_KW_ONLY | MP_ARG_OBJ, {.u_rom_obj = MP_ROM_PTR(&mp_const_none_obj)} },
    };

    mp_arg_val_t args[MP_ARRAY_SIZE(allowed_args)];
    mp_arg_parse_all(2, pos_args, kw_args, MP_ARRAY_SIZE(allowed_args), allowed_args, args);
    
    mp_obj_t oin = args[0].u_obj;
    int16_t shift = mp_obj_get_int(args[1].u_obj);
    if((args[2].u_obj != mp_const_none) && 
           (mp_obj_get_int(args[2].u_obj) != 0) && 
           (mp_obj_get_int(args[2].u_obj) != 1)) {
        mp_raise_ValueError("axis must be None, 0, or 1");
    }

    ndarray_obj_t *in = MP_OBJ_TO_PTR(oin);
    uint8_t _sizeof = mp_binary_get_size('@', in->array->typecode, NULL);
    size_t len;
    int16_t _shift;
    uint8_t *array = (uint8_t *)in->array->items;
    // TODO: transpose the matrix, if axis == 0. Though, that is hard on the RAM...
    if(shift < 0) {
        _shift = -shift;
    } else {
        _shift = shift;
    }
    if((args[2].u_obj == mp_const_none) || (mp_obj_get_int(args[2].u_obj) == 1)) { // shift horizontally
        uint16_t M;
        if(args[2].u_obj == mp_const_none) {
            len = in->array->len;
            M = 1;
        } else {
            len = in->n;
            M = in->m;
        }
        _shift = _shift % len;
        if(shift < 0) _shift = len - _shift;
        // TODO: if(shift > len/2), we should move in the opposite direction. That would save RAM
        _shift *= _sizeof;
        uint8_t *tmp = m_new(uint8_t, _shift);
        for(size_t m=0; m < M; m++) {
            memmove(tmp, &array[m*len*_sizeof], _shift);
            memmove(&array[m*len*_sizeof], &array[m*len*_sizeof+_shift], len*_sizeof-_shift);
            memmove(&array[(m+1)*len*_sizeof-_shift], tmp, _shift);
        }
        m_del(uint8_t, tmp, _shift);
        return mp_const_none;
    } else {
        len = in->m;
        // temporary buffer
        uint8_t *_data = m_new(uint8_t, _sizeof*len);
        
        _shift = _shift % len;
        if(shift < 0) _shift = len - _shift;
        _shift *= _sizeof;
        uint8_t *tmp = m_new(uint8_t, _shift);

        for(size_t n=0; n < in->n; n++) {
            for(size_t m=0; m < len; m++) {
                // this loop should fill up the temporary buffer
                memmove(&_data[m*_sizeof], &array[(m*in->n+n)*_sizeof], _sizeof);
            }
            // now, the actual shift
            memmove(tmp, _data, _shift);
            memmove(_data, &_data[_shift], len*_sizeof-_shift);
            memmove(&_data[len*_sizeof-_shift], tmp, _shift);
            for(size_t m=0; m < len; m++) {
                // this loop should dump the content of the temporary buffer into data
                memmove(&array[(m*in->n+n)*_sizeof], &_data[m*_sizeof], _sizeof);
            }            
        }
        m_del(uint8_t, tmp, _shift);
        m_del(uint8_t, _data, _sizeof*len);
        return mp_const_none;
    }
}

mp_obj_t numerical_flip(size_t n_args, const mp_obj_t *pos_args, mp_map_t *kw_args) {
    static const mp_arg_t allowed_args[] = {
        { MP_QSTR_, MP_ARG_REQUIRED | MP_ARG_OBJ, {.u_rom_obj = MP_ROM_PTR(&mp_const_none_obj) } },
        { MP_QSTR_axis, MP_ARG_KW_ONLY | MP_ARG_OBJ, {.u_rom_obj = MP_ROM_PTR(&mp_const_none_obj)} },
    };

    mp_arg_val_t args[MP_ARRAY_SIZE(allowed_args)];
    mp_arg_parse_all(1, pos_args, kw_args, MP_ARRAY_SIZE(allowed_args), allowed_args, args);
    
    if(!mp_obj_is_type(args[0].u_obj, &ulab_ndarray_type)) {
        mp_raise_TypeError("flip argument must be an ndarray");
    }
    if((args[1].u_obj != mp_const_none) && 
           (mp_obj_get_int(args[1].u_obj) != 0) && 
           (mp_obj_get_int(args[1].u_obj) != 1)) {
        mp_raise_ValueError("axis must be None, 0, or 1");
    }

    ndarray_obj_t *in = MP_OBJ_TO_PTR(args[0].u_obj);
    mp_obj_t oout = ndarray_copy(args[0].u_obj);
    ndarray_obj_t *out = MP_OBJ_TO_PTR(oout);
    uint8_t _sizeof = mp_binary_get_size('@', in->array->typecode, NULL);
    uint8_t *array_in = (uint8_t *)in->array->items;
    uint8_t *array_out = (uint8_t *)out->array->items;    
    size_t len;
    if((args[1].u_obj == mp_const_none) || (mp_obj_get_int(args[1].u_obj) == 1)) { // flip horizontally
        uint16_t M = in->m;
        len = in->n;
        if(args[1].u_obj == mp_const_none) { // flip flattened array
            len = in->array->len;
            M = 1;
        }
        for(size_t m=0; m < M; m++) {
            for(size_t n=0; n < len; n++) {
                memcpy(array_out+_sizeof*(m*len+n), array_in+_sizeof*((m+1)*len-n-1), _sizeof);
            }
        }
    } else { // flip vertically
        for(size_t m=0; m < in->m; m++) {
            for(size_t n=0; n < in->n; n++) {
                memcpy(array_out+_sizeof*(m*in->n+n), array_in+_sizeof*((in->m-m-1)*in->n+n), _sizeof);
            }
        }
    }
    return out;
}

mp_obj_t numerical_diff(size_t n_args, const mp_obj_t *pos_args, mp_map_t *kw_args) {
    static const mp_arg_t allowed_args[] = {
        { MP_QSTR_, MP_ARG_REQUIRED | MP_ARG_OBJ, {.u_rom_obj = MP_ROM_PTR(&mp_const_none_obj) } },
        { MP_QSTR_n, MP_ARG_KW_ONLY | MP_ARG_INT, {.u_int = 1 } },
        { MP_QSTR_axis, MP_ARG_KW_ONLY | MP_ARG_INT, {.u_int = -1 } },
    };

    mp_arg_val_t args[MP_ARRAY_SIZE(allowed_args)];
    mp_arg_parse_all(1, pos_args, kw_args, MP_ARRAY_SIZE(allowed_args), allowed_args, args);
    
    if(!mp_obj_is_type(args[0].u_obj, &ulab_ndarray_type)) {
        mp_raise_TypeError("diff argument must be an ndarray");
    }
    
    ndarray_obj_t *in = MP_OBJ_TO_PTR(args[0].u_obj);
    size_t increment, N, M;
    if((args[2].u_int == -1) || (args[2].u_int == 1)) { // differentiate along the horizontal axis
        increment = 1;
    } else if(args[2].u_int == 0) { // differtiate along vertical axis
        increment = in->n;
    } else {
        mp_raise_ValueError("axis must be -1, 0, or 1");        
    }
    if((args[1].u_int < 0) || (args[1].u_int > 9)) {
        mp_raise_ValueError("n must be between 0, and 9");
    }
    uint8_t n = args[1].u_int;
    int8_t *stencil = m_new(int8_t, n+1);
    stencil[0] = 1;
    for(uint8_t i=1; i < n+1; i++) {
        stencil[i] = -stencil[i-1]*(n-i+1)/i;
    }

    ndarray_obj_t *out;
    
    if(increment == 1) { // differentiate along the horizontal axis 
        if(n >= in->n) {
            out = create_new_ndarray(in->m, 0, in->array->typecode);
            m_del(uint8_t, stencil, n);
            return MP_OBJ_FROM_PTR(out);
        }
        N = in->n - n;
        M = in->m;
    } else { // differentiate along vertical axis
        if(n >= in->m) {
            out = create_new_ndarray(0, in->n, in->array->typecode);
            m_del(uint8_t, stencil, n);
            return MP_OBJ_FROM_PTR(out);
        }
        M = in->m - n;
        N = in->n;
    }
    out = create_new_ndarray(M, N, in->array->typecode);
    if(in->array->typecode == NDARRAY_UINT8) {
        CALCULATE_DIFF(in, out, uint8_t, M, N, in->n, increment);
    } else if(in->array->typecode == NDARRAY_INT8) {
        CALCULATE_DIFF(in, out, int8_t, M, N, in->n, increment);
    }  else if(in->array->typecode == NDARRAY_UINT16) {
        CALCULATE_DIFF(in, out, uint16_t, M, N, in->n, increment);
    } else if(in->array->typecode == NDARRAY_INT16) {
        CALCULATE_DIFF(in, out, int16_t, M, N, in->n, increment);
    } else {
        CALCULATE_DIFF(in, out, mp_float_t, M, N, in->n, increment);
    }
    m_del(int8_t, stencil, n);
    return MP_OBJ_FROM_PTR(out);
}

mp_obj_t numerical_sort_helper(mp_obj_t oin, mp_obj_t axis, uint8_t inplace) {
    if(!mp_obj_is_type(oin, &ulab_ndarray_type)) {
        mp_raise_TypeError("sort argument must be an ndarray");
    }

    ndarray_obj_t *ndarray;
    mp_obj_t out;
    if(inplace == 1) {
        ndarray = MP_OBJ_TO_PTR(oin);
    } else {
        out = ndarray_copy(oin);
        ndarray = MP_OBJ_TO_PTR(out);
    }
    size_t increment, start_inc, end, N;
    if(axis == mp_const_none) { // flatten the array
        ndarray->m = 1;
        ndarray->n = ndarray->array->len;
        increment = 1;
        start_inc = ndarray->n;
        end = ndarray->n;
        N = ndarray->n;
    } else if((mp_obj_get_int(axis) == -1) || 
              (mp_obj_get_int(axis) == 1)) { // sort along the horizontal axis
        increment = 1;
        start_inc = ndarray->n;
        end = ndarray->array->len;
        N = ndarray->n;
    } else if(mp_obj_get_int(axis) == 0) { // sort along vertical axis
        increment = ndarray->n;
        start_inc = 1;
        end = ndarray->m;
        N = ndarray->m;
    } else {
        mp_raise_ValueError("axis must be -1, 0, None, or 1");        
    }
    
    size_t q, k, p, c;

    for(size_t start=0; start < end; start+=start_inc) {
        q = N; 
        k = (q >> 1);
        if((ndarray->array->typecode == NDARRAY_UINT8) || (ndarray->array->typecode == NDARRAY_INT8)) {
            HEAPSORT(uint8_t, ndarray);
        } else if((ndarray->array->typecode == NDARRAY_INT16) || (ndarray->array->typecode == NDARRAY_INT16)) {
            HEAPSORT(uint16_t, ndarray);
        } else {
            HEAPSORT(mp_float_t, ndarray);
        }
    }
    if(inplace == 1) {
        return mp_const_none;
    } else {
        return out;
    }
}

// numpy function
mp_obj_t numerical_sort(size_t n_args, const mp_obj_t *pos_args, mp_map_t *kw_args) {
    static const mp_arg_t allowed_args[] = {
        { MP_QSTR_, MP_ARG_REQUIRED | MP_ARG_OBJ, {.u_rom_obj = MP_ROM_PTR(&mp_const_none_obj) } },
        { MP_QSTR_axis, MP_ARG_KW_ONLY | MP_ARG_OBJ, {.u_int = -1 } },
    };

    mp_arg_val_t args[MP_ARRAY_SIZE(allowed_args)];
    mp_arg_parse_all(1, pos_args, kw_args, MP_ARRAY_SIZE(allowed_args), allowed_args, args);

    return numerical_sort_helper(args[0].u_obj, args[1].u_obj, 0);
}
// method of an ndarray
mp_obj_t numerical_sort_inplace(size_t n_args, const mp_obj_t *pos_args, mp_map_t *kw_args) {
    static const mp_arg_t allowed_args[] = {
        { MP_QSTR_, MP_ARG_REQUIRED | MP_ARG_OBJ, {.u_rom_obj = MP_ROM_PTR(&mp_const_none_obj) } },
        { MP_QSTR_axis, MP_ARG_KW_ONLY | MP_ARG_OBJ, {.u_int = -1 } },
    };

    mp_arg_val_t args[MP_ARRAY_SIZE(allowed_args)];
    mp_arg_parse_all(1, pos_args, kw_args, MP_ARRAY_SIZE(allowed_args), allowed_args, args);

    return numerical_sort_helper(args[0].u_obj, args[1].u_obj, 1);
}

mp_obj_t numerical_argsort(size_t n_args, const mp_obj_t *pos_args, mp_map_t *kw_args) {
    static const mp_arg_t allowed_args[] = {
        { MP_QSTR_, MP_ARG_REQUIRED | MP_ARG_OBJ, {.u_rom_obj = MP_ROM_PTR(&mp_const_none_obj) } },
        { MP_QSTR_axis, MP_ARG_KW_ONLY | MP_ARG_OBJ, {.u_int = -1 } },
    };
    mp_arg_val_t args[MP_ARRAY_SIZE(allowed_args)];
    mp_arg_parse_all(1, pos_args, kw_args, MP_ARRAY_SIZE(allowed_args), allowed_args, args);
    if(!mp_obj_is_type(args[0].u_obj, &ulab_ndarray_type)) {
        mp_raise_TypeError("argsort argument must be an ndarray");
    }

    ndarray_obj_t *ndarray = MP_OBJ_TO_PTR(args[0].u_obj);
    size_t increment, start_inc, end, N, m, n;
    if(args[1].u_obj == mp_const_none) { // flatten the array
        m = 1;
        n = ndarray->array->len;
        ndarray->m = m;
        ndarray->n = n;
        increment = 1;
        start_inc = ndarray->n;
        end = ndarray->n;
        N = n;
    } else if((mp_obj_get_int(args[1].u_obj) == -1) || 
              (mp_obj_get_int(args[1].u_obj) == 1)) { // sort along the horizontal axis
        m = ndarray->m;
        n = ndarray->n;
        increment = 1;
        start_inc = n;
        end = ndarray->array->len;
        N = n;
    } else if(mp_obj_get_int(args[1].u_obj) == 0) { // sort along vertical axis
        m = ndarray->m;
        n = ndarray->n;
        increment = n;
        start_inc = 1;
        end = m;
        N = m;
    } else {
        mp_raise_ValueError("axis must be -1, 0, None, or 1");
    }

    // at the expense of flash, we could save RAM by creating 
    // an NDARRAY_UINT16 ndarray only, if needed, otherwise, NDARRAY_UINT8
    ndarray_obj_t *indices = create_new_ndarray(m, n, NDARRAY_UINT16);
    uint16_t *index_array = (uint16_t *)indices->array->items;
    // initialise the index array
    // if array is flat: 0 to indices->n
    // if sorting vertically, identical indices are arranged row-wise
    // if sorting horizontally, identical indices are arranged colunn-wise
    for(uint16_t start=0; start < end; start+=start_inc) {
        for(uint16_t s=0; s < N; s++) {
            index_array[start+s*increment] = s;
        }
    }

    size_t q, k, p, c;
    for(size_t start=0; start < end; start+=start_inc) {
        q = N; 
        k = (q >> 1);
        if((ndarray->array->typecode == NDARRAY_UINT8) || (ndarray->array->typecode == NDARRAY_INT8)) {
            HEAP_ARGSORT(uint8_t, ndarray, index_array);
        } else if((ndarray->array->typecode == NDARRAY_INT16) || (ndarray->array->typecode == NDARRAY_INT16)) {
            HEAP_ARGSORT(uint16_t, ndarray, index_array);
        } else {
            HEAP_ARGSORT(mp_float_t, ndarray, index_array);
        }
    }
    return MP_OBJ_FROM_PTR(indices);
}

written 28599 bytes to numerical.c


# ulab module

This module simply brings all components together, and does not contain new function definitions.

## ulab.c

In [9]:
%%ccode ulab.c

#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "py/runtime.h"
#include "py/binary.h"
#include "py/obj.h"
#include "py/objarray.h"

#include "ndarray.h"
#include "linalg.h"
#include "vectorise.h"
#include "poly.h"
#include "fft.h"
#include "numerical.h"

#define ULAB_VERSION 0.263

typedef struct _mp_obj_float_t {
    mp_obj_base_t base;
    mp_float_t value;
} mp_obj_float_t;

mp_obj_float_t ulab_version = {{&mp_type_float}, ULAB_VERSION};

MP_DEFINE_CONST_FUN_OBJ_1(ndarray_shape_obj, ndarray_shape);
MP_DEFINE_CONST_FUN_OBJ_1(ndarray_rawsize_obj, ndarray_rawsize);
MP_DEFINE_CONST_FUN_OBJ_KW(ndarray_flatten_obj, 1, ndarray_flatten);
MP_DEFINE_CONST_FUN_OBJ_1(ndarray_asbytearray_obj, ndarray_asbytearray);

MP_DEFINE_CONST_FUN_OBJ_1(linalg_transpose_obj, linalg_transpose);
MP_DEFINE_CONST_FUN_OBJ_2(linalg_reshape_obj, linalg_reshape);
MP_DEFINE_CONST_FUN_OBJ_KW(linalg_size_obj, 1, linalg_size);
MP_DEFINE_CONST_FUN_OBJ_1(linalg_inv_obj, linalg_inv);
MP_DEFINE_CONST_FUN_OBJ_2(linalg_dot_obj, linalg_dot);
MP_DEFINE_CONST_FUN_OBJ_KW(linalg_zeros_obj, 0, linalg_zeros);
MP_DEFINE_CONST_FUN_OBJ_KW(linalg_ones_obj, 0, linalg_ones);
MP_DEFINE_CONST_FUN_OBJ_KW(linalg_eye_obj, 0, linalg_eye);
MP_DEFINE_CONST_FUN_OBJ_1(linalg_det_obj, linalg_det);
MP_DEFINE_CONST_FUN_OBJ_1(linalg_eig_obj, linalg_eig);

MP_DEFINE_CONST_FUN_OBJ_1(vectorise_acos_obj, vectorise_acos);
MP_DEFINE_CONST_FUN_OBJ_1(vectorise_acosh_obj, vectorise_acosh);
MP_DEFINE_CONST_FUN_OBJ_1(vectorise_asin_obj, vectorise_asin);
MP_DEFINE_CONST_FUN_OBJ_1(vectorise_asinh_obj, vectorise_asinh);
MP_DEFINE_CONST_FUN_OBJ_1(vectorise_atan_obj, vectorise_atan);
MP_DEFINE_CONST_FUN_OBJ_1(vectorise_atanh_obj, vectorise_atanh);
MP_DEFINE_CONST_FUN_OBJ_1(vectorise_ceil_obj, vectorise_ceil);
MP_DEFINE_CONST_FUN_OBJ_1(vectorise_cos_obj, vectorise_cos);
MP_DEFINE_CONST_FUN_OBJ_1(vectorise_erf_obj, vectorise_erf);
MP_DEFINE_CONST_FUN_OBJ_1(vectorise_erfc_obj, vectorise_erfc);
MP_DEFINE_CONST_FUN_OBJ_1(vectorise_exp_obj, vectorise_exp);
MP_DEFINE_CONST_FUN_OBJ_1(vectorise_expm1_obj, vectorise_expm1);
MP_DEFINE_CONST_FUN_OBJ_1(vectorise_floor_obj, vectorise_floor);
MP_DEFINE_CONST_FUN_OBJ_1(vectorise_gamma_obj, vectorise_gamma);
MP_DEFINE_CONST_FUN_OBJ_1(vectorise_lgamma_obj, vectorise_lgamma);
MP_DEFINE_CONST_FUN_OBJ_1(vectorise_log_obj, vectorise_log);
MP_DEFINE_CONST_FUN_OBJ_1(vectorise_log10_obj, vectorise_log10);
MP_DEFINE_CONST_FUN_OBJ_1(vectorise_log2_obj, vectorise_log2);
MP_DEFINE_CONST_FUN_OBJ_1(vectorise_sin_obj, vectorise_sin);
MP_DEFINE_CONST_FUN_OBJ_1(vectorise_sinh_obj, vectorise_sinh);
MP_DEFINE_CONST_FUN_OBJ_1(vectorise_sqrt_obj, vectorise_sqrt);
MP_DEFINE_CONST_FUN_OBJ_1(vectorise_tan_obj, vectorise_tan);
MP_DEFINE_CONST_FUN_OBJ_1(vectorise_tanh_obj, vectorise_tanh);

STATIC MP_DEFINE_CONST_FUN_OBJ_KW(numerical_linspace_obj, 2, numerical_linspace);
STATIC MP_DEFINE_CONST_FUN_OBJ_KW(numerical_sum_obj, 1, numerical_sum);
STATIC MP_DEFINE_CONST_FUN_OBJ_KW(numerical_mean_obj, 1, numerical_mean);
STATIC MP_DEFINE_CONST_FUN_OBJ_KW(numerical_std_obj, 1, numerical_std);
STATIC MP_DEFINE_CONST_FUN_OBJ_KW(numerical_min_obj, 1, numerical_min);
STATIC MP_DEFINE_CONST_FUN_OBJ_KW(numerical_max_obj, 1, numerical_max);
STATIC MP_DEFINE_CONST_FUN_OBJ_KW(numerical_argmin_obj, 1, numerical_argmin);
STATIC MP_DEFINE_CONST_FUN_OBJ_KW(numerical_argmax_obj, 1, numerical_argmax);
STATIC MP_DEFINE_CONST_FUN_OBJ_KW(numerical_roll_obj, 2, numerical_roll);
STATIC MP_DEFINE_CONST_FUN_OBJ_KW(numerical_flip_obj, 1, numerical_flip);
STATIC MP_DEFINE_CONST_FUN_OBJ_KW(numerical_diff_obj, 1, numerical_diff);
STATIC MP_DEFINE_CONST_FUN_OBJ_KW(numerical_sort_obj, 1, numerical_sort);
STATIC MP_DEFINE_CONST_FUN_OBJ_KW(numerical_sort_inplace_obj, 1, numerical_sort_inplace);
STATIC MP_DEFINE_CONST_FUN_OBJ_KW(numerical_argsort_obj, 1, numerical_argsort);

STATIC MP_DEFINE_CONST_FUN_OBJ_2(poly_polyval_obj, poly_polyval);
STATIC MP_DEFINE_CONST_FUN_OBJ_VAR_BETWEEN(poly_polyfit_obj, 2, 3, poly_polyfit);

STATIC MP_DEFINE_CONST_FUN_OBJ_VAR_BETWEEN(fft_fft_obj, 1, 2, fft_fft);
STATIC MP_DEFINE_CONST_FUN_OBJ_VAR_BETWEEN(fft_ifft_obj, 1, 2, fft_ifft);
STATIC MP_DEFINE_CONST_FUN_OBJ_VAR_BETWEEN(fft_spectrum_obj, 1, 2, fft_spectrum);

STATIC const mp_rom_map_elem_t ulab_ndarray_locals_dict_table[] = {
    { MP_ROM_QSTR(MP_QSTR_shape), MP_ROM_PTR(&ndarray_shape_obj) },
    { MP_ROM_QSTR(MP_QSTR_rawsize), MP_ROM_PTR(&ndarray_rawsize_obj) },
    { MP_ROM_QSTR(MP_QSTR_flatten), MP_ROM_PTR(&ndarray_flatten_obj) },    
    { MP_ROM_QSTR(MP_QSTR_asbytearray), MP_ROM_PTR(&ndarray_asbytearray_obj) },
    { MP_ROM_QSTR(MP_QSTR_transpose), MP_ROM_PTR(&linalg_transpose_obj) },
    { MP_ROM_QSTR(MP_QSTR_reshape), MP_ROM_PTR(&linalg_reshape_obj) },
    { MP_ROM_QSTR(MP_QSTR_sort), MP_ROM_PTR(&numerical_sort_inplace_obj) },
};

STATIC MP_DEFINE_CONST_DICT(ulab_ndarray_locals_dict, ulab_ndarray_locals_dict_table);

const mp_obj_type_t ulab_ndarray_type = {
    { &mp_type_type },
    .name = MP_QSTR_ndarray,
    .print = ndarray_print,
    .make_new = ndarray_make_new,
    .subscr = ndarray_subscr,
    .getiter = ndarray_getiter,
    .unary_op = ndarray_unary_op,
    .binary_op = ndarray_binary_op,
    .locals_dict = (mp_obj_dict_t*)&ulab_ndarray_locals_dict,
};

STATIC const mp_map_elem_t ulab_globals_table[] = {
    { MP_OBJ_NEW_QSTR(MP_QSTR___name__), MP_OBJ_NEW_QSTR(MP_QSTR_ulab) },
    { MP_ROM_QSTR(MP_QSTR___version__), MP_ROM_PTR(&ulab_version) },
    { MP_OBJ_NEW_QSTR(MP_QSTR_array), (mp_obj_t)&ulab_ndarray_type },
    { MP_OBJ_NEW_QSTR(MP_QSTR_size), (mp_obj_t)&linalg_size_obj },
    { MP_OBJ_NEW_QSTR(MP_QSTR_inv), (mp_obj_t)&linalg_inv_obj },
    { MP_ROM_QSTR(MP_QSTR_dot), (mp_obj_t)&linalg_dot_obj },
    { MP_ROM_QSTR(MP_QSTR_zeros), (mp_obj_t)&linalg_zeros_obj },
    { MP_ROM_QSTR(MP_QSTR_ones), (mp_obj_t)&linalg_ones_obj },
    { MP_ROM_QSTR(MP_QSTR_eye), (mp_obj_t)&linalg_eye_obj },
    { MP_ROM_QSTR(MP_QSTR_det), (mp_obj_t)&linalg_det_obj },
    { MP_ROM_QSTR(MP_QSTR_eig), (mp_obj_t)&linalg_eig_obj },    
    { MP_OBJ_NEW_QSTR(MP_QSTR_acos), (mp_obj_t)&vectorise_acos_obj },
    { MP_OBJ_NEW_QSTR(MP_QSTR_acosh), (mp_obj_t)&vectorise_acosh_obj },
    { MP_OBJ_NEW_QSTR(MP_QSTR_asin), (mp_obj_t)&vectorise_asin_obj },
    { MP_OBJ_NEW_QSTR(MP_QSTR_asinh), (mp_obj_t)&vectorise_asinh_obj },
    { MP_OBJ_NEW_QSTR(MP_QSTR_atan), (mp_obj_t)&vectorise_atan_obj },
    { MP_OBJ_NEW_QSTR(MP_QSTR_atanh), (mp_obj_t)&vectorise_atanh_obj },
    { MP_OBJ_NEW_QSTR(MP_QSTR_ceil), (mp_obj_t)&vectorise_ceil_obj },
    { MP_OBJ_NEW_QSTR(MP_QSTR_cos), (mp_obj_t)&vectorise_cos_obj },
    { MP_OBJ_NEW_QSTR(MP_QSTR_erf), (mp_obj_t)&vectorise_erf_obj },
    { MP_OBJ_NEW_QSTR(MP_QSTR_erfc), (mp_obj_t)&vectorise_erfc_obj },
    { MP_OBJ_NEW_QSTR(MP_QSTR_exp), (mp_obj_t)&vectorise_exp_obj },
    { MP_OBJ_NEW_QSTR(MP_QSTR_expm1), (mp_obj_t)&vectorise_expm1_obj },
    { MP_OBJ_NEW_QSTR(MP_QSTR_floor), (mp_obj_t)&vectorise_floor_obj },
    { MP_OBJ_NEW_QSTR(MP_QSTR_gamma), (mp_obj_t)&vectorise_gamma_obj },
    { MP_OBJ_NEW_QSTR(MP_QSTR_lgamma), (mp_obj_t)&vectorise_lgamma_obj },
    { MP_OBJ_NEW_QSTR(MP_QSTR_log), (mp_obj_t)&vectorise_log_obj },
    { MP_OBJ_NEW_QSTR(MP_QSTR_log10), (mp_obj_t)&vectorise_log10_obj },
    { MP_OBJ_NEW_QSTR(MP_QSTR_log2), (mp_obj_t)&vectorise_log2_obj },
    { MP_OBJ_NEW_QSTR(MP_QSTR_sin), (mp_obj_t)&vectorise_sin_obj },
    { MP_OBJ_NEW_QSTR(MP_QSTR_sinh), (mp_obj_t)&vectorise_sinh_obj },
    { MP_OBJ_NEW_QSTR(MP_QSTR_sqrt), (mp_obj_t)&vectorise_sqrt_obj },
    { MP_OBJ_NEW_QSTR(MP_QSTR_tan), (mp_obj_t)&vectorise_tan_obj },
    { MP_OBJ_NEW_QSTR(MP_QSTR_tanh), (mp_obj_t)&vectorise_tanh_obj },
    { MP_OBJ_NEW_QSTR(MP_QSTR_linspace), (mp_obj_t)&numerical_linspace_obj },
    { MP_OBJ_NEW_QSTR(MP_QSTR_sum), (mp_obj_t)&numerical_sum_obj },
    { MP_OBJ_NEW_QSTR(MP_QSTR_mean), (mp_obj_t)&numerical_mean_obj },
    { MP_OBJ_NEW_QSTR(MP_QSTR_std), (mp_obj_t)&numerical_std_obj },
    { MP_OBJ_NEW_QSTR(MP_QSTR_min), (mp_obj_t)&numerical_min_obj },
    { MP_OBJ_NEW_QSTR(MP_QSTR_max), (mp_obj_t)&numerical_max_obj },
    { MP_OBJ_NEW_QSTR(MP_QSTR_argmin), (mp_obj_t)&numerical_argmin_obj },
    { MP_OBJ_NEW_QSTR(MP_QSTR_argmax), (mp_obj_t)&numerical_argmax_obj },
    { MP_OBJ_NEW_QSTR(MP_QSTR_roll), (mp_obj_t)&numerical_roll_obj },
    { MP_OBJ_NEW_QSTR(MP_QSTR_flip), (mp_obj_t)&numerical_flip_obj },
    { MP_OBJ_NEW_QSTR(MP_QSTR_diff), (mp_obj_t)&numerical_diff_obj },
    { MP_OBJ_NEW_QSTR(MP_QSTR_sort), (mp_obj_t)&numerical_sort_obj },
    { MP_OBJ_NEW_QSTR(MP_QSTR_argsort), (mp_obj_t)&numerical_argsort_obj },
    { MP_OBJ_NEW_QSTR(MP_QSTR_polyval), (mp_obj_t)&poly_polyval_obj },
    { MP_OBJ_NEW_QSTR(MP_QSTR_polyfit), (mp_obj_t)&poly_polyfit_obj },
    { MP_OBJ_NEW_QSTR(MP_QSTR_fft), (mp_obj_t)&fft_fft_obj },
    { MP_OBJ_NEW_QSTR(MP_QSTR_ifft), (mp_obj_t)&fft_ifft_obj },
    { MP_OBJ_NEW_QSTR(MP_QSTR_spectrum), (mp_obj_t)&fft_spectrum_obj },
    // class constants
    { MP_ROM_QSTR(MP_QSTR_uint8), MP_ROM_INT(NDARRAY_UINT8) },
    { MP_ROM_QSTR(MP_QSTR_int8), MP_ROM_INT(NDARRAY_INT8) },
    { MP_ROM_QSTR(MP_QSTR_uint16), MP_ROM_INT(NDARRAY_UINT16) },
    { MP_ROM_QSTR(MP_QSTR_int16), MP_ROM_INT(NDARRAY_INT16) },
    { MP_ROM_QSTR(MP_QSTR_float), MP_ROM_INT(NDARRAY_FLOAT) },
};

STATIC MP_DEFINE_CONST_DICT (
    mp_module_ulab_globals,
    ulab_globals_table
);

const mp_obj_module_t ulab_user_cmodule = {
    .base = { &mp_type_module },
    .globals = (mp_obj_dict_t*)&mp_module_ulab_globals,
};

MP_REGISTER_MODULE(MP_QSTR_ulab, ulab_user_cmodule, MODULE_ULAB_ENABLED);

written 9752 bytes to ulab.c


## makefile

In [411]:
%%writefile ../../../ulab/code/micropython.mk

USERMODULES_DIR := $(USERMOD_DIR)

# Add all C files to SRC_USERMOD.
SRC_USERMOD += $(USERMODULES_DIR)/ndarray.c
SRC_USERMOD += $(USERMODULES_DIR)/linalg.c
SRC_USERMOD += $(USERMODULES_DIR)/vectorise.c
SRC_USERMOD += $(USERMODULES_DIR)/poly.c
SRC_USERMOD += $(USERMODULES_DIR)/fft.c
SRC_USERMOD += $(USERMODULES_DIR)/numerical.c
SRC_USERMOD += $(USERMODULES_DIR)/ulab.c

# We can add our module folder to include paths if needed
# This is not actually needed in this example.
CFLAGS_USERMOD += -I$(USERMODULES_DIR)

Overwriting ../../../ulab/code/micropython.mk


## make

### unix port

In [6]:
%cd ../../../micropython/ports/unix/

/home/v923z/sandbox/micropython/v1.11/micropython/ports/unix


In [7]:
!make clean

Use make V=1 or set BUILD_VERBOSE in your environment to increase build verbosity.
rm -f micropython
rm -f micropython.map
rm -rf build 


In [None]:
!make USER_C_MODULES=../../../ulab all

### stm32 port

In [162]:
%cd ../../../micropython/ports/stm32/

/home/v923z/sandbox/micropython/v1.11/micropython/ports/stm32


In [None]:
!make BOARD=PYBV11 USER_C_MODULES=../../../ulab all

# Change log

In [10]:
%%writefile ../../../ulab/docs/ulab-change-log.md

Tue, 31 Dec 2019

version 0.263

    changed declaration of ulab_ndarray_type to extern

Fri, 29 Nov 2019

version 0.262

    fixed error in macro in vectorise.h

Thu, 28 Nov 2019

version 0.261

    fixed bad indexing error in linalg.dot

Tue, 6 Nov 2019

version 0.26

    added in-place sorting (method of ndarray), and argsort
    
Mon, 4 Nov 2019

version 0.25

    added first implementation of sort, and fixed section on compiling the module in the manual

Thu, 31 Oct 2019

version 0.24

    added diff to numerical.c
    
Tue, 29 Oct 2019

version 0.23

    major revamp of subscription method

Sat, 19 Oct 2019

version 0.21

    fixed trivial bug in .rawsize()

Sat, 19 Oct 2019

version 0.22

    fixed small error in linalg_det, and implemented linalg_eig.


Thu, 17 Oct 2019

version 0.21

    implemented uniform interface for fft, and spectrum, and added ifft.

Wed, 16 Oct 2019

version 0.20

    Added flip function to numerical.c, and moved the size function to linalg. In addition, 
    size is a function now, and not a method.

Tue, 15 Oct 2019

version 0.19

    fixed roll in numerical.c: it can now accept the axis=None keyword argument, added determinant to linalg.c

Mon, 14 Oct 2019

version 0.18

    fixed min/man function in numerical.c; it conforms to numpy behaviour

Fri, 11 Oct 2019

version 0.171

    found and fixed small bux in roll function

Fri, 11 Oct 2019

version 0.17

    universal function can now take arbitrary typecodes

Fri, 11 Oct 2019

version 0.161

    fixed bad error in iterator, and make_new_ndarray 
    
Thu, 10 Oct 2019

varsion 0.16

    changed ndarray to array in ulab.c, so as to conform to numpy's notation
    extended subscr method to include slices (partially works)
    
Tue, 8 Oct 2019

version 0.15

    added inv, neg, pos, and abs unary operators to ndarray.c
    
Mon, 7 Oct 2019

version 0.14

    made the internal binary_op function tighter, and added keyword arguments to linspace
    
Sat, 4 Oct 2019

version 0.13

    added the <, <=, >, >= binary operators to ndarray

Fri, 4 Oct 2019

version 0.12

    added .flatten to ndarray, ones, zeros, and eye to linalg

Thu, 3 Oct 2019

version 0.11
    
    binary operators are now based on macros

Overwriting ../../../ulab/docs/ulab-change-log.md
