# Using Fortran in a Jupyter Notebook


NumPy is a great tool, but sometimes it is not fast enough for performance-critical scientific computing code. For example:
- Vectorized operations and element-wise (looped) operations differ significantly in speed.
    - Consider the Jacobi and Gauss-Seidel methods: the latter has to be implemented element by element, because each update uses values already updated in the same sweep.
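
To make the Jacobi versus Gauss-Seidel point concrete, here is a minimal sketch of one sweep of each method for the 1D problem -u'' = f on a uniform grid. The model problem, the function names `jacobi_sweep` and `gauss_seidel_sweep`, and the grid size are assumptions chosen for illustration; they are not taken from the notebook.

```python
import numpy as np

def jacobi_sweep(u, f, h):
    """One Jacobi sweep for -u'' = f; boundary values stay fixed."""
    new = u.copy()
    # Every interior point uses only values from the previous iterate,
    # so the whole update is a single vectorized expression.
    new[1:-1] = 0.5 * (u[:-2] + u[2:] + h * h * f[1:-1])
    return new

def gauss_seidel_sweep(u, f, h):
    """One Gauss-Seidel sweep; each point uses the neighbour updated just before it."""
    new = u.copy()
    for i in range(1, len(new) - 1):
        # new[i - 1] was already overwritten in this sweep: a loop-carried
        # dependency that prevents a straightforward vectorized form.
        new[i] = 0.5 * (new[i - 1] + new[i + 1] + h * h * f[i])
    return new

n = 11                 # illustrative grid size (assumption)
h = 1.0 / (n - 1)
f = np.ones(n)
u0 = np.zeros(n)

u_jacobi = jacobi_sweep(u0, f, h)
u_gs = gauss_seidel_sweep(u0, f, h)
```

The Jacobi sweep maps directly onto NumPy slicing, while the Gauss-Seidel sweep needs an explicit Python loop, which is exactly the slow case measured later in this notebook.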

Of course, you can compile functions into a dynamic library and write Python wrappers, but there is a much more convenient way: write the functions in Fortran and call them directly from Python code.

First you'll need to install:
- pip install fortran-magic

In [1]:
import numpy as np
x = np.random.normal(size=1000)
y = np.random.normal(size=1000)

def py_add(x, y):
    z = np.empty(np.size(x))
    for i in range(np.size(x)):
        z[i] = x[i] + y[i]
    return z

## Numpy
### vector add operation

In [2]:
%%timeit
z = x+y

1.19 µs ± 19 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


### element-wise add operation

In [3]:
%%timeit
z = py_add(x, y)

389 µs ± 5.73 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


- One can see that in NumPy, the vector operation is much faster than the element-wise loop (about 1.19 µs versus 389 µs here, a factor of roughly 300).
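
Outside a notebook, the same comparison can be reproduced with the standard-library `timeit` module. This is a minimal sketch: the repeat counts `n_vec` and `n_loop` are arbitrary choices, and the `lambda` wrapper adds a small call overhead that `%%timeit` does not have, so the absolute numbers will differ somewhat from the cells above.

```python
import timeit
import numpy as np

x = np.random.normal(size=1000)
y = np.random.normal(size=1000)

def py_add(x, y):
    # Same explicit element-wise loop as in the notebook cell.
    z = np.empty(np.size(x))
    for i in range(np.size(x)):
        z[i] = x[i] + y[i]
    return z

# Both approaches must agree before their timings are worth comparing.
same_result = np.allclose(x + y, py_add(x, y))

# timeit.repeat returns the total seconds for each batch of `number` calls;
# dividing the best batch by `number` gives a per-call estimate.
n_vec, n_loop = 10000, 100
t_vec = min(timeit.repeat(lambda: x + y, number=n_vec, repeat=5)) / n_vec
t_loop = min(timeit.repeat(lambda: py_add(x, y), number=n_loop, repeat=5)) / n_loop

print(f"vectorized: {t_vec * 1e6:.2f} us per call")
print(f"loop:       {t_loop * 1e6:.2f} us per call")
```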

## Fortran 
### vector add operation

In [4]:
# This needs to be run in a separate cell before defining a Fortran function.
%load_ext fortranmagic


In [6]:
%%fortran
! The Fortran function also needs to be defined and run in its own cell before it is used.
subroutine add_function(x, y, z)
    real, intent(in) :: x(:), y(:)
    real, intent(out) :: z(size(x))
    z = x + y
end subroutine add_function

In [10]:
%%timeit
z = add_function(x, y)

3.6 µs ± 114 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


### element-wise add operation

In [7]:
%%fortran
subroutine add_function2(x, y, z)
    real, intent(in) :: x(:), y(:)
    real, intent(out) :: z(size(x))
    integer :: i
    do i = 1, size(x)
        z(i) = x(i) + y(i)
    end do
end subroutine add_function2

In [11]:
%%timeit
z = add_function2(x, y)

3.63 µs ± 272 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


- One can see that in Fortran, the vector operation runs at almost the same speed as the element-wise loop.
- Clearly, the element-wise operation is much faster in Fortran than the Python loop over NumPy arrays (about 3.6 µs versus 389 µs).
- However, why is the vector operation in Fortran slower than in NumPy (about 3.6 µs versus 1.19 µs)?
    - At the moment, I do not know exactly what is going on. Interesting!
    - OpenMP and `-O3` optimization do not seem to be the cause: the timings in the next section are essentially unchanged.
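
One possible contributor, offered here as an assumption rather than a measured conclusion: the Fortran routines declare their arguments as default `real`, which f2py normally maps to 32-bit floats, while `np.random.normal` returns 64-bit floats. The wrapper then has to convert the inputs on every call, on top of its fixed call overhead. The sketch below uses only NumPy to show that such a conversion allocates a new array; it does not call any Fortran code.

```python
import numpy as np

x = np.random.normal(size=1000)

# np.random.normal returns float64, while default Fortran `real` is
# normally 32-bit, which corresponds to np.float32 on the Python side.
print(x.dtype)  # float64

# Converting to float32 cannot reuse the float64 buffer, so a new array
# is allocated and every element is cast.
x32 = np.asarray(x, dtype=np.float32)
copied = not np.shares_memory(x, x32)

# If the array already has the target dtype, no copy is needed.
x32_again = np.asarray(x32, dtype=np.float32)
reused = np.shares_memory(x32, x32_again)

# Single precision also loses digits relative to the float64 input.
max_rounding_error = float(np.max(np.abs(x - x32)))
print(copied, reused, max_rounding_error)
```

A way to test this hypothesis would be to declare the Fortran arguments as `double precision`, or to create `x` and `y` as `np.float32` arrays, and rerun the timings.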

### More about Using Fortran in Python
- The `-vvv` option prints all compiler warnings and output.
- The `--opt='-O3'` option sets the `-O3` optimization level.
- If you want to use OpenMP with Fortran in Python, this is how to do it with gfortran:

In [12]:
%%fortran -vvv --f90flags='-fopenmp' --extra='-lgomp'
subroutine add_function3(x, y, z)
    use omp_lib
    real, intent(in) :: x(:), y(:)
    real, intent(out) :: z(size(x))
    ! Note: no OpenMP directive is used here, so this assignment still runs serially.
    z = x + y
end subroutine add_function3

Running...
   /Users/xiaozhouli/anaconda/bin/python -m numpy.f2py --f90flags='-fopenmp' -lgomp -m _fortran_magic_6defa99ba4d8df3103ce6b9c126d9153 -c /Users/xiaozhouli/.ipython/fortran/_fortran_magic_6defa99ba4d8df3103ce6b9c126d9153.f90
running build
running config_cc
unifing config_cc, config, build_clib, build_ext, build commands --compiler options
running config_fc
unifing config_fc, config, build_clib, build_ext, build commands --fcompiler options
running build_src
build_src
building extension "_fortran_magic_6defa99ba4d8df3103ce6b9c126d9153" sources
f2py options: []
f2py:> /var/folders/y8/ym0vp3qj65x661bsr96p3_600000gn/T/tmpc1zeo96z/src.macosx-10.7-x86_64-3.6/_fortran_magic_6defa99ba4d8df3103ce6b9c126d9153module.c
creating /var/folders/y8/ym0vp3qj65x661bsr96p3_600000gn/T/tmpc1zeo96z/src.macosx-10.7-x86_64-3.6
Reading fortran codes...
	Reading file '/Users/xiaozhouli/.ipython/fortran/_fortran_magic_6defa99ba4d8df3103ce6b9c126d9153.f90' (format:free)
Post-processing...
	Block: _fortr


Ok. The following fortran objects are ready to use: add_function3


In [13]:
%%timeit
z = add_function3(x, y)

3.54 µs ± 153 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


In [14]:
%%fortran --opt='-O3'
subroutine add_function4(x, y, z)
    use omp_lib
    real, intent(in) :: x(:), y(:)
    real, intent(out) :: z(size(x))
    z = x + y
end subroutine add_function4

In [15]:
%%timeit
z = add_function4(x, y)

3.77 µs ± 177 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
