# C Output and Parameter Interfaces

### NRPy+ Source Code for this module: 
+ ### [Coutput.py](../edit/Coutput.py)
+ ### [NRPy_param_funcs.py](../edit/NRPy_param_funcs.py)

## Useful reference material. This is required reading if you are unfamiliar with programming or [computer algebra systems](https://en.wikipedia.org/wiki/Computer_algebra_system). Otherwise, you should be able to pick up the syntax as you follow the tutorial.
+ ### [Python Tutorial](https://docs.python.org/3/tutorial/index.html)
+ ### [SymPy Tutorial](http://docs.sympy.org/latest/tutorial/intro.html)

## Common Subexpression Elimination in [SymPy](http://www.sympy.org/)

Let's start with a simple SymPy worksheet that makes use of SymPy's built-in C code generator function, [ccode](http://docs.sympy.org/dev/modules/utilities/codegen.html)(), to evaluate the expression $x = b \sin(2a) + \frac{c}{\sin(2a)}$.

In [73]:
# First import the SymPy package. 
#  Importing in this way enables us to access all of sympy's functions
#  provided the function call is prefixed by "sp."
import sympy as sp

# Declare some variables
a,b,c = sp.symbols("a b c")

# Set x = b*sin(2*a) + c/sin(2*a).
x = b*sp.sin(2*a) + c/(sp.sin(2*a))

# Convert the expression into C code
sp.ccode(x)

'b*sin(2*a) + c/sin(2*a)'

Notice that computing the above expression in C requires three multiplications, one division, two sin() function calls, and one addition. Multiplications, additions, and subtractions typically require one clock cycle per SIMD element on a modern CPU, while divisions can require ~3x longer, and transcendental functions ~20x longer than adds/multiplies (see, e.g., [this page](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#techs=AVX&expand=118), [this page](http://www.agner.org/optimize/microarchitecture.pdf), or [this page](http://nicolas.limare.net/pro/notes/2014/12/16_math_speed/) for more details).

Our goal in generating C codes is to minimize the number of floating point operations, and SymPy provides a means to do this, known as [common subexpression elimination](https://en.wikipedia.org/wiki/Common_subexpression_elimination), or CSE.

CSE algorithms search for common patterns within expressions and declare them as new variables, so they need not be computed again. To call SymPy's CSE algorithm, we need only pass the expression to $\texttt{sp.cse}$():

In [72]:
print(sp.cse(x))

([(x0, sin(2*a))], [b*x0 + c/x0])


Interpreting the above, SymPy returned a tuple with two elements. The first element is a list of (new variable, subexpression) pairs; here it holds the single pair $(\texttt{x0, sin(2*a)})$, which declares a new variable $\texttt{x0}$ that should be set to $\texttt{sin(2*a)}$. The second element is a list containing our original expression $x$, rewritten in terms of the original variables as well as the new variable $\texttt{x0}$.

That is, $$\texttt{x0} = \sin(2a)$$ is the common subexpression, so that the final expression $x$ is given by $$x = b\,\texttt{x0} + c/\texttt{x0}.$$
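As a minimal sketch of how the output of $\texttt{sp.cse}$() might be turned into C statements, the snippet below unpacks the returned pair and passes each piece through $\texttt{sp.ccode}$(). The `const double` declarations and the output variable name `x` are illustrative assumptions; this is not the format that NRPy+ itself emits.

```python
import sympy as sp

a, b, c = sp.symbols("a b c")
x = b*sp.sin(2*a) + c/sp.sin(2*a)

# sp.cse() returns (list of (symbol, subexpression) pairs, list of reduced expressions).
replacements, reduced = sp.cse(x)

lines = []
# First declare each common subexpression as its own C variable...
for sym, subexpr in replacements:
    lines.append("const double {} = {};".format(sym, sp.ccode(subexpr)))
# ...then write the original expression in terms of those variables.
# The name "x" for the result is chosen here purely for illustration.
lines.append("const double x = {};".format(sp.ccode(reduced[0])))

print("\n".join(lines))
```

The order matters: each new variable must be declared before the expression that uses it, which is why the replacements are emitted first.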

Thus, at the cost of a new variable assignment, SymPy's CSE has decreased the computational cost by one multiplication and one sin() function call.
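One way to check this kind of claim, sketched here under the assumption that a raw operation count is an adequate proxy, is SymPy's `count_ops()` function. It counts every operation equally, so it says nothing about the relative cost of a sin() call versus an addition.

```python
import sympy as sp

a, b, c = sp.symbols("a b c")
x = b*sp.sin(2*a) + c/sp.sin(2*a)

# Operation count of the original expression.
ops_before = sp.count_ops(x)

# Operation count after CSE: add up the operations in each new
# variable definition and in the rewritten expression(s).
replacements, reduced = sp.cse(x)
ops_after = (sum(sp.count_ops(subexpr) for _, subexpr in replacements)
             + sum(sp.count_ops(expr) for expr in reduced))

# visual=True lists the operations by type rather than as a single total.
print("before CSE:", ops_before, "=", sp.count_ops(x, visual=True))
print("after CSE: ", ops_after)
```

If the counts behave as expected, the difference between the two totals corresponds to the multiplication and sin() call that CSE removed.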

NRPy+ makes full use of SymPy's CSE algorithm in generating optimized C codes.

*Caveat: In order for a CSE algorithm to function optimally, it needs to know something about the cost of basic mathematical operations versus the cost of declaring a new variable. SymPy's CSE algorithm does not make any assumptions about cost, instead opting to declare new variables any time a common pattern is found more than once. The degree to which this is suboptimal is unclear.*
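To illustrate the caveat, the sketch below uses a hypothetical expression (not taken from NRPy+) in which the only repeated pattern is the single addition `a + b`. Since SymPy's CSE does not weigh costs, one would expect it to assign even this cheap subexpression to a new variable; the printed output shows what your SymPy version actually does.

```python
import sympy as sp

a, b, c, d = sp.symbols("a b c d")

# The only repeated pattern here is the cheap addition a + b.
expr = c*sp.sin(a + b) + d*sp.cos(a + b)

replacements, reduced = sp.cse(expr)
print(replacements)
print(reduced)

# Sanity check: substituting the new variables back in (last one first)
# should recover the original expression.
restored = reduced[0]
for sym, subexpr in reversed(replacements):
    restored = restored.subs(sym, subexpr)
print(sp.simplify(restored - expr) == 0)
```

Whether storing such a cheap result in a variable helps or hurts depends on the compiler and the target hardware, which is exactly the information the CSE algorithm lacks.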

## NRPy+'s C code output routine, Coutput()

NRPy+ is capable