#  Hand-compiling C programs


Enter your name and student ID.

 * Name:
 * Student ID:


* <font color="red">As this notebook is rather long, you may want to use the index shown by clicking the list icon on the left</font>

<a name="intro"> </a>
# 1. Introduction
* your final goal in this course is to build a compiler for a tiny subset of the C language
* a compiler reads a program, builds a parse tree, and finally translates it to _<font color="blue">machine code</font>_
* _<font color="blue">machine code</font>_ is a series of instructions that a processor can execute directly; it is formatted so that a machine can interpret it easily (e.g., each instruction has a fixed number of bits, a fixed part of which encodes the kind of instruction)
* _<font color="blue">assembly code</font>_ is practically the same as machine code (a single assembly instruction corresponds almost directly to a single machine instruction) but is formatted as text that is easier for humans to read
* a compiler commonly generates assembly code, which is then converted to real machine code by another program called _<font color="blue">an assembler</font>_ (you can consider it a stage of the compilation process)
* before actually building a compiler, you need to know the assembly code of the target machine and at least roughly know how expressions and statements in the source language are mapped to assembly code
* even if you are not building a compiler, such knowledge is useful in its own right to understand how programming languages work, to write efficient programs in them, and/or to diagnose your programs (especially in unsafe languages such as C or C++)
* to that end, the goal of this notebook is to look at and understand the assembly code generated from C programs and to translate some C functions into assembly by hand

<a name="let_the_compiler_generate_assembly"> </a>
# 2. Let the C compiler generate assembly code
* `gcc/g++` is a compiler for C/C++
* `-S` option instructs the compiler to emit assembly code and stop the compilation after that


In [None]:
%%writefile call.c
long bbb();

long aaa() {
  return bbb() + 1;
}

In [None]:
gcc -O3 -S call.c
cat call.s

* we choose the 64-bit ARM instruction set (called _<font color="blue">arm64</font>_ or _<font color="blue">aarch64</font>_) as the target assembly code
 * this notebook teaches you the necessary part of arm64 assembly; don't worry if you do not know it yet
 * this notebook requires basic knowledge of the C language; if you are new to C, study a C primer before working on this notebook
 * nobody can remember all instructions; the point is to understand the minimum set of instructions a compiler needs and to be able to examine the instructions that correspond to specific C expressions
 * [cheat sheet](https://taura.github.io/programming-languages/html/arm64_assembly_cheat_sheet.html)
 * [a quick reference](https://developer.arm.com/documentation/ddi0602/latest/)

* below, you are going to learn what kind of expressions or statements are converted to what kind of assembly code, gradually changing functions from trivial ones to more substantial ones

<a name="arithmetic"> </a>
# 3. Arithmetic

In [None]:
%%writefile arith.c
long add(long x, long y, long z) {
  return x + y + z + 150;
}

long sub(long x, long y, long z) {
  return x - y - z - 150;
}

long mul(long x, long y, long z) {
  return x * y * z;
}

long div(long x, long y, long z) {
  return x / y / z;
}

double fadd(double x, double y, double z) {
  return x + y + z + 1.25;
}

double fsub(double x, double y, double z) {
  return x - y - z - 1.25;
}

double fmul(double x, double y, double z) {
  return x * y * z * 1.25;
}

double fdiv(double x, double y, double z) {
  return x / y / z / 1.25;
}

* compile it by:


In [None]:
gcc -O3 -S arith.c
cat arith.s

* observe:
  * parameters are passed following the ARM64 ABI (first three integer parameters on `x0, x1,` and `x2` and first three floating-point number parameters on `d0, d1,` and `d2`)
  * return value is returned following the ARM64 ABI, too (integer on `x0` and floating-point number on `d0`)
  * there are instructions corresponding to integer/floating-point number arithmetic

<a name="problem_ax_by_cz"> </a>
# <font color="green"> Problem 1 :  ax+by+cz</font>
* write a function `ax_by_cz` that returns ax+by+cz for a given input a, x, b, y, c, and z (whose types are long) __in assembly__

* that is, the translation of the following C function
```
long ax_by_cz(long a, long x, long b, long y, long c, long z) {
  return a * x + b * y + c * z;
}
```

* fill in the assembly function below (after the line `------- write your answer here -------`) with instructions and execute the cell

* it is saved with the name `ax_by_cz.s`


In [None]:
BEGIN SOLUTION
END SOLUTION
%%writefile ax_by_cz.s
	.arch armv8-a
	.file	"ax_by_cz.c"
	.text
	.align	2
	.p2align 4,,11
	.global	ax_by_cz
	.type	ax_by_cz, %function
ax_by_cz:
.LFB0:
	.cfi_startproc
	// ------- write your answer here -------
	.cfi_endproc
.LFE0:
	.size	ax_by_cz, .-ax_by_cz
	.ident	"GCC: (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0"
	.section	.note.GNU-stack,"",@progbits

* compile it with the following C code that calls your `ax_by_cz` function


In [None]:
BEGIN SOLUTION
END SOLUTION
%%writefile check_ax_by_cz.c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
long ax_by_cz(long a, long x, long b, long y, long c, long z);

int main(int argc, char ** argv) {
  assert(argc == 7);
  long a = atol(argv[1]);
  long x = atol(argv[2]);
  long b = atol(argv[3]);
  long y = atol(argv[4]);
  long c = atol(argv[5]);
  long z = atol(argv[6]);
  long r = ax_by_cz(a, x, b, y, c, z);
  long rc = a * x + b * y + c * z;
  if (r == rc) {
    printf("OK %ld %ld\n", r, rc);
    return 0;
  } else {
    printf("NG %ld %ld\n", r, rc);
    return 1;
  }
}

In [None]:
BEGIN SOLUTION
END SOLUTION
gcc -o check_ax_by_cz -O3 check_ax_by_cz.c ax_by_cz.s

* execute the cell below; if you see two OK's and no errors, you are done


In [None]:
BEGIN SOLUTION
END SOLUTION
./check_ax_by_cz 1 1 1 1 1 1
./check_ax_by_cz 1 2 3 4 5 6

#  If things do not go well
* If your program compiles but does not produce the correct answer, run it within a debugger (gdb)
* To that end, first compile the program with -g


In [None]:
BEGIN SOLUTION
END SOLUTION
gcc -o check_ax_by_cz -O0 -g check_ax_by_cz.c ax_by_cz.s

* Go to terminal (login with SSH) and run the debugger
```
cd notebooks/pl07_compile_c
gdb check_ax_by_cz
(gdb) break ax_by_cz
(gdb) run 1 2 3 4 5 6
```

* This way, execution stops at the entry of the `ax_by_cz` function
* Repeat `stepi` (or `step`) to execute one instruction at a time
* At each instruction you can see the values in the registers by, e.g.,
```
(gdb) print $x0
```
or
```
(gdb) info registers
```

* Of course, you can use vscode or emacs for a better debugging experience

#  Notes on literal (immediate) values
* using literal (immediate) values in arbitrary expressions (e.g., `x + 1234567` or `x * 3.141592`) is trivial in any high-level programming language, but not in machine code
* machine languages generally have restrictions on using such values directly in instructions, because the number of bits used for an instruction is limited (32 bits in ARM64); there is no space to encode arbitrary 32 bit, let alone 64 bit, numbers
* you do not have to know the details in this exercise, but this becomes an issue when you build a compiler, which needs to represent arbitrary integers and floating-point numbers in assembly
* details are left to your investigation, but for now I just note that in ARM64,
  * a `mov` or `movz` (move-and-zero) instruction can set one of the four 16-bit words of an integer register to the specified 16-bit value, and zeros the remaining three 16-bit words
  * a `movk` (move-and-keep) instruction can set one of the four 16-bit words of an integer register to the specified 16-bit value, and leaves the remaining three 16-bit words unchanged
  * by combining them (a `mov` + up to three `movk`s), you can set an arbitrary 64 bit value to a register
  * `fmov` instruction can set a floating-point register to certain "simple" numbers --- numbers whose exponents and mantissa can be represented by a small number of bits; in my quick investigation, `fmov` can take numbers of the form: $\pm 1.xxxx \times 2^{(yyy-3)}$; i.e., numbers whose exponents are 3 bits and mantissa 4 bits (positive and negative)
  * each non-simple number is constructed first by constructing its bit representation on an integer register with `mov` and `movk` and then moving the value from the integer register to the floating-point register (which can be done using `fmov`)
* you may investigate this by changing the immediate values in the following functions and looking at the output of `gcc -S`
```
long imm() { return 1234567; }
double fimm() { return 1.234; }
```
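
* the `mov` + `movk` recipe above can be modeled in C; the sketch below only imitates what the instructions do to a 64-bit register (it is not code a compiler would emit), and the helper names `movz`, `movk`, `build_imm64` and `double_bits` are made up for this illustration

```c
#include <stdint.h>
#include <string.h>

/* model of movz: put a 16-bit value in word `pos` (0..3), zero the other words */
uint64_t movz(uint16_t imm, int pos) {
  return (uint64_t)imm << (16 * pos);
}

/* model of movk: replace word `pos` of `reg`, keep the other three words */
uint64_t movk(uint64_t reg, uint16_t imm, int pos) {
  uint64_t mask = (uint64_t)0xffff << (16 * pos);
  return (reg & ~mask) | ((uint64_t)imm << (16 * pos));
}

/* build an arbitrary 64-bit value with one movz and three movk's */
uint64_t build_imm64(uint64_t v) {
  uint64_t r = movz((uint16_t)(v & 0xffff), 0);
  for (int pos = 1; pos < 4; pos++) {
    r = movk(r, (uint16_t)((v >> (16 * pos)) & 0xffff), pos);
  }
  return r;
}

/* bit representation of a double, i.e., the integer you would build
   with mov/movk before moving it to a floating-point register */
uint64_t double_bits(double d) {
  uint64_t b;
  memcpy(&b, &d, sizeof b);
  return b;
}
```

* for example, 1234567 is 0x12d687, so `movk(movz(0xd687, 0), 0x12, 1)` already yields it; `build_imm64` always performs all four steps for simplicity, whereas a compiler can skip the words that are zero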

<a name="many_args"> </a>
#  Notes on passing many parameters
* there are only so many registers, so you cannot pass an arbitrary number of parameters in registers
* what happens if we pass more parameters than there are registers for?
* this is another thing you don't have to get into in this exercise but have to know when building a compiler


In [None]:
%%writefile args_many.c
long add_many(long a00, long a01, long a02, long a03, long a04, 
              long a05, long a06, long a07, long a08, long a09,
              long a10, long a11, long a12, long a13, long a14,
              long a15, long a16, long a17, long a18, long a19) {
  return (a00 + a01 + a02 + a03 + a04
          + a05 + a06 + a07 + a08 + a09
          + a10 + a11 + a12 + a13 + a14
          + a15 + a16 + a17 + a18 + a19);
}


In [None]:
gcc -O3 -S args_many.c
cat args_many.s

* it seems that the first eight parameters are passed in registers (`x0` to `x7`)
* the ninth and subsequent parameters are passed via the stack, specifically at addresses starting from the address in `sp` upon entry to the function


<a name="pointer"> </a>
# 4. Pointers
* if you have trouble understanding pointers in C language, read the following and the scales may fall from your eyes
* motto is "pointers are just integers and nothing more than that"

<a name="pointer_deref"></a>
## 4-1. pointer dereferencing

In [None]:
%%writefile ptr_deref.c 
long long_ptr_deref(long * p) {
  return *p;
}

In [None]:
gcc -O3 -S ptr_deref.c 
cat ptr_deref.s

* what is generated from `*p` appears to be
```
        ldr     x0, [x0]
```
which is a _<font color="blue">load instruction</font>_ that reads the eight bytes at the address in `x0` and puts the value in `x0`

* __points you learned:__
 1. a pointer parameter (`p`) is passed in `x0`, just like an integer parameter
 1. a pointer value in C is in fact an "address", which is merely an integer at the assembly code level
 1. dereferencing a pointer `p` (`*p`) refers to the value stored at the address held in `p`; a load instruction is therefore used to fetch the value
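
* the "pointers are just integers" view can be written down in C itself; the sketch below is an illustration only (the name `load_via_address` is made up), and it relies on pointers converting to `uintptr_t` and back unchanged, which holds on the platforms used in this notebook

```c
#include <stdint.h>

/* same result as `return *p;`, but with the address made explicit */
long load_via_address(long * p) {
  uintptr_t addr = (uintptr_t)p;  /* the pointer value, viewed as an integer */
  long * q = (long *)addr;        /* the same integer, viewed as a pointer again */
  return *q;                      /* the load: read sizeof(long) bytes at that address */
}
```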

## 4-2. accessing an array element = pointer dereferencing
* various superficially different expressions in C all end up reading certain addresses
* referencing an array element, for example


In [None]:
%%writefile array_index_long.c
long array_index_long(long * p) {
  return p[0] + p[10];
}

In [None]:
gcc -O3 -S array_index_long.c
cat array_index_long.s

* what is generated for
```
    p[0] + p[10];
```
appears 
```
        ldr     x1, [x0]
        ldr     x0, [x0, 80]
        add     x0, x1, x0
```
* the first instruction
```
        ldr     x1, [x0]
```
reads eight bytes from the address in `x0`, the second from the address in `x0 + 80`. the reason for `+ 80` is that a single long takes eight bytes
* the third instruction
```
        add     x0, x1, x0
```
adds the two numbers
* note that `*p` and `p[0]` end up using the same instruction
* this is where a famous narrative about C "arrays are pointers" comes from

* let's do the same thing for pointers to `int` (32 bit integer)


In [None]:
%%writefile array_index_int.c
int array_index_int(int * p) {
  return p[0] + p[10];
}

In [None]:
gcc -O3 -S array_index_int.c
cat array_index_int.s

* the only differences that arise by the size of `int`, which is four bytes, are that
 * the destination registers are `w` registers
 * the offset used for `p[10]` is 40 instead of 80
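
* the offsets 80 and 40 are simply index times element size; as a rough sketch (the function names are made up for illustration), array indexing can be spelled out in C as byte-address arithmetic

```c
#include <stddef.h>

/* p[i] for long: base address + i * sizeof(long) bytes */
long index_long(long * p, long i) {
  char * base = (char *)p;                          /* address counted in bytes */
  return *(long *)(base + i * (long)sizeof(long));  /* byte offset 80 for i = 10 when long is 8 bytes */
}

/* p[i] for int: base address + i * sizeof(int) bytes */
int index_int(int * p, long i) {
  char * base = (char *)p;
  return *(int *)(base + i * (long)sizeof(int));    /* byte offset 40 for i = 10 when int is 4 bytes */
}
```

* `index_long(p, 0) + index_long(p, 10)` computes the same value as `p[0] + p[10]` in `array_index_long`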

## 4-3. accessing a structure field $\approx$ pointer dereferencing
* accessing a structure field through a pointer, like `p->x` is another expression ending up with a similar instruction


In [None]:
%%writefile struct_field.c
typedef struct {
  long x;
  long y;
  long z;
} point;
  
long struct_field(point * p) {
  return p->x + p->y + p->z;
}

In [None]:
gcc -O3 -S struct_field.c
cat struct_field.s

* from the generated instructions, we can observe load instructions access elements as follows
 * `ldp x1, x3, [x0]` (load pair) for `p->x` and `p->y`
 * `ldr x2, [x0, 16]` for `p->z`

* the reason why these addresses are eight bytes apart is that each of `x` and `y` is eight bytes large

* as you can imagine, the same holds for an array of structures
* guess what kind of instructions are generated for the function below, then check your guess


In [None]:
%%writefile struct_array_field.c
typedef struct {
  long x;
  long y;
  long z;
} point;
  
long struct_array_field(point * p) {
  return p[10].x + p[10].y + p[10].z;
}

In [None]:
gcc -O3 -S struct_array_field.c
cat struct_array_field.s

* you can observe that load instructions access elements as follows
 * `ldr` accesses `x0+240` for `p[10].x`,
 * `ldp` accesses `x0+248` for `p[10].y` and `p[10].z`

* the result is expected, as the size of a structure element is (presumably) 8 bytes x 3 = 24 bytes
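
* the offsets can be double-checked with `sizeof` and `offsetof`; the following is a small sketch whose helper names (`offset_x`, `offset_y`, `offset_z`, `load_y`) are made up for illustration, and the values 240, 248 and 256 assume that `long` is eight bytes, as on arm64 Linux

```c
#include <stddef.h>

typedef struct {
  long x;
  long y;
  long z;
} point;

/* byte offset of p[i].x, p[i].y and p[i].z from the address in p */
size_t offset_x(size_t i) { return i * sizeof(point) + offsetof(point, x); }
size_t offset_y(size_t i) { return i * sizeof(point) + offsetof(point, y); }
size_t offset_z(size_t i) { return i * sizeof(point) + offsetof(point, z); }

/* p[i].y, computed from the byte offset like the load instruction does */
long load_y(point * p, size_t i) {
  return *(long *)((char *)p + offset_y(i));
}
```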

## 4-4. dereferencing pointers multiple times
* an example of dereferencing pointers multiple times (nothing conceptually new here)
* guess what kind of instructions are generated for the function below and see it


In [None]:
%%writefile ptr_ptr.c
typedef struct node {
  struct node * l;
  struct node * r;
} node;
  
node * left_right(node * n) {
  return n->l->r;
}

In [None]:
gcc -O3 -S ptr_ptr.c
cat ptr_ptr.s

<a name="pointer_deref_assign"> </a>
## 4-5. pointer dereferencing + assignment

In [None]:
%%writefile struct_array_field_assign.c
typedef struct {
  long x;
  long y;
  long z;
} point;
  
void struct_array_field(point * p) {
  p[10].x = 30;
  p[10].z = 50;
}

In [None]:
gcc -O3 -S struct_array_field_assign.c
cat struct_array_field_assign.s

* what corresponds to
```
    p[10].x = 30;
```
appears
```
        mov     x2, 30
        ...
        str     x2, [x0, 240]
```

<a name="problem_l2_norm_long"> </a>
# <font color="green"> Problem 2 :  square norm</font>
* write a function `l2_norm_long` in assembly that computes the squared norm of a three-element vector of longs

* in other words, write assembly code corresponding to the following C function

```
long l2_norm_long(long * x) {
    return x[0] * x[0] + x[1] * x[1] + x[2] * x[2];
}
```


In [None]:
BEGIN SOLUTION
END SOLUTION
%%writefile l2_norm_long.s
	.arch armv8-a
	.file	"l2_norm_long.c"
	.text
	.align	2
	.p2align 4,,11
	.global	l2_norm_long
	.type	l2_norm_long, %function
l2_norm_long:
.LFB0:
	.cfi_startproc
	// ------- write your answer here -------
	.cfi_endproc
.LFE0:
	.size	l2_norm_long, .-l2_norm_long
	.ident	"GCC: (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0"
	.section	.note.GNU-stack,"",@progbits

In [None]:
BEGIN SOLUTION
END SOLUTION
%%writefile check_l2_norm_long.c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
long l2_norm_long(long *);

int main(int argc, char ** argv) {
  assert(argc == 4);
  long x[3] = { atol(argv[1]), atol(argv[2]), atol(argv[3]) };
  long l2 = l2_norm_long(x);
  long l2c = x[0] * x[0] + x[1] * x[1] + x[2] * x[2];
  if (l2 == l2c) {
    printf("OK %ld %ld\n", l2, l2c);
    return 0;
  } else {
    printf("NG %ld %ld\n", l2, l2c);
    return 1;
  }
}

In [None]:
BEGIN SOLUTION
END SOLUTION
gcc -o check_l2_norm_long -O3 check_l2_norm_long.c l2_norm_long.s

In [None]:
BEGIN SOLUTION
END SOLUTION
./check_l2_norm_long 1 2 3
./check_l2_norm_long 3 4 5

<a name="function_calls"> </a>
# 5. Function calls
* if a function calls another function, the assembly code for the function becomes more complex
* this is because:
  * it overwrites `x30` (link register) to call a function with `bl` instruction,
  * which means it has to preserve `x30` on the stack before doing so,
  * which in turn means it has to extend the stack (`sp`) and set the frame pointer (`x29`) to the same address,
  * which then means it has to preserve `x29` on the stack, too
* in summary, it has to do something like
```
        stp     x29, x30, [sp, -16]!
```
to extend the stack and preserve `x29` and `x30` before making a function call and restore them before it returns
* observe this in the following simple example


In [None]:
%%writefile sigmoid.c
#include <math.h>

double sigmoid(double x) {
  return 1.0 / (1.0 + exp(-x));
}

In [None]:
gcc -O3 -S sigmoid.c
cat sigmoid.s

* for details, see how a function call works, as explained in the [How Programming Languages Work (Basics)](https://taura.github.io/programming-languages/slides/05-implementation-basics.pdf) slide deck

<a name="compilation_framework"> </a>
# 6. A general framework for hand-compilation
* the following problems will be too complex to tackle without a general _framework_ or tactic
* the main gaps between high level languages and assembly language are
  + assembly language does not have structured compound statements but has only branch instructions ($\approx$ goto statements)
  + assembly language does not allow nested expressions
  + assembly language does not let you introduce new variables; it has only a fixed number of variables with fixed names (i.e., registers)
* when hand-compiling, filling the three gaps simultaneously is overwhelming
* instead, convert the source program one step at a time by
  + converting loops and if statements into goto's,
  + breaking nested expressions into a series of simple assignments (`a * x + b * y` $\rightarrow$ `s = a * x; t = b * y; u = s + t`),
  + and assigning registers to variables
* also, when you call a function, save the values needed after the call on the stack
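
* as a rough illustration of these steps, here is a small made-up function `sum_sq1` (not one of the problems) and a flattened version `sum_sq1_flat` in which the loop has become goto's and the nested expression has become simple assignments; the register names in the comments are one possible assignment, chosen only for this sketch

```c
/* original: structured loop and nested expression */
long sum_sq1(long n) {
  long s = 0;
  for (long i = 0; i < n; i++) {
    s += i * i + 1;
  }
  return s;
}

/* after the first two steps: goto's only, one operation per assignment.
   a possible register assignment: n <-> x0, s <-> x1, i <-> x2, t <-> x3 */
long sum_sq1_flat(long n) {
  long s, i, t;
  s = 0;
  i = 0;
  goto LC;
LB:
  t = i * i;              /* nested expression, part 1 */
  t = t + 1;              /* nested expression, part 2 */
  s = s + t;
  i = i + 1;
LC:
  if (i < n) goto LB;     /* compare, then conditional branch */
  return s;
}
```

* once the code is in this shape, each line corresponds to roughly one instruction, so the final translation to assembly is mostly mechanical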

<a name="problem_normal_dist"> </a>
# <font color="green"> Problem 3 :  normal distribution</font>
* write a function that takes a floating-point (`double`) number $x$ and calculates the following value
$$ \mbox{normal}(x) \equiv \frac{1}{\sqrt{2\pi}}\exp(-x^2/2) $$

* to obtain the value of $\pi$, use $\pi/4 = \mbox{atan2}(1.0, 1.0)$
* in other words, 

$$ \mbox{normal}(x) = \frac{1}{\sqrt{8 \;\mbox{atan2}(1.0, 1.0)}} \exp(-x^2/2) $$

* see `normal_c` below for the equivalent C function (your job is to write `normal` in assembly language equivalent to it)


In [None]:
BEGIN SOLUTION
END SOLUTION
%%writefile normal.s
	.arch armv8-a
	.file	"normal.c"
	.text
	.align	2
	.p2align 4,,11
	.global	normal
	.type	normal, %function
normal:
.LFB0:
	.cfi_startproc
	// ------- write your answer here -------
	.cfi_endproc
.LFE0:
	.size	normal, .-normal
	.section	.rodata.cst8,"aM",@progbits,8
	.align	3
.LC0:
	.word	536225541
	.word	1074007443
	.ident	"GCC: (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0"
	.section	.note.GNU-stack,"",@progbits

In [None]:
BEGIN SOLUTION
END SOLUTION
%%writefile check_normal.c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
double normal(double x);

double normal_c(double x) {
  return exp(- x * x * 0.5) / sqrt(8.0 * atan2(1.0, 1.0));
}

int main(int argc, char ** argv) {
  assert(argc == 2);
  double x = atof(argv[1]);
  double y = normal(x);
  double yc = normal_c(x);
  if (fabs(y - yc) < 1.0e-6) {
    printf("OK %f %f\n", y, yc);
    return 0;
  } else {
    printf("NG %f %f\n", y, yc);
    return 1;
  }
}

In [None]:
BEGIN SOLUTION
END SOLUTION
gcc -o check_normal -O3 check_normal.c normal.s -lm

In [None]:
BEGIN SOLUTION
END SOLUTION
./check_normal 0.0
./check_normal 1.0
./check_normal 2.0

<a name="control_flows"> </a>
# 7. Control flows
* how are control flows such as if statements and loops (while or for) compiled?

<a name="if_stmt_conditional"> </a>
## 7-1. if statement (by conditional branch instruction)

In [None]:
%%writefile branch.c
long g(long x, long y) {
  if (y <= 5) {
    return 0;
  } else {
    return x / (y - 5);
  }
}

In [None]:
gcc -O3 -S branch.c
cat branch.s

* you can observe a plain compare-then-conditional-branch structure

```
        cmp     x1, 5   # y - 5
        ble     .L3     # if (y - 5 <= 0) goto .L3
        ...
.L3:
        ...
```
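
* the same structure can be written in C with goto; the sketch below (the name `g_goto` is made up) mirrors the `cmp` / `ble` pair shown above and is meant as a reading aid rather than the exact code the compiler produces

```c
/* g from branch.c, with the if statement turned into a conditional goto */
long g_goto(long x, long y) {
  long t;
  if (y <= 5) goto L3;    /* cmp x1, 5 ; ble .L3 */
  t = y - 5;              /* else-part: reached only when y > 5 */
  return x / t;
L3:
  return 0;               /* then-part */
}
```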

<a name="if_stmt_conditional"> </a>
## 7-2. if statement (without conditional branch instruction)
* superscalar processors speculatively decode instructions much ahead of currently executing instructions
* when they encounter a branch instruction, they "predict" the branch outcome (whether the branch is taken or not) and decode the instructions on the predicted path
* mispredicting the branch outcome results in rolling back the processor state to the state before the branch, which degrades performance
* as such, compilers try to avoid branch instructions where profitable and use _conditional instructions_ instead


In [None]:
%%writefile add_or_mul_long.c
long add_or_mul_long(long x, long y, long z) {
  if (x < y) {
    return y + z;
  } else {
    return y * z;
  }
}

In [None]:
gcc -O3 -S add_or_mul_long.c
cat add_or_mul_long.s

* the key is the `csel x0, x1, x3, ge` ("conditional select") instruction, which copies either `x1` or `x3` into `x0` depending on the condition flags (= the result of the last `cmp` instruction)
* more specifically
```
        cmp     x0, x1          # x0 - x1
        ...
        csel    x0, x1, x3, ge
```
the above `csel` effectively performs
```
x0 = (x0 - x1 >= 0 ? x1 : x3) 
```
* note that the generated code calculates _both_ `y + z` _and_ `y * z` to avoid branch instructions
* this is profitable as both expressions are cheap; if one branch incurs a large or unknown cost, it is not profitable to do so
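
* in C terms, the branch-free code behaves roughly like the sketch below (the name `add_or_mul_select` is made up); note that a C compiler is free to compile the `?:` operator either to a branch or to `csel`, so this shows the idea, not a way to force a particular instruction

```c
/* compute both candidates first, then select one of them */
long add_or_mul_select(long x, long y, long z) {
  long sum = y + z;               /* computed regardless of the condition */
  long prod = y * z;              /* computed regardless of the condition */
  return (x >= y) ? prod : sum;   /* the selection that csel performs */
}
```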

<a name="cmp_floats"> </a>
## 7-3. comparing floating-point numbers
* if you think you need different instructions to compare floating-point numbers, you are getting used to assembly languages
* you do not have to remember each of them; let `gcc -S` show you the instruction name and look it up


In [None]:
%%writefile add_or_mul_double.c
double add_or_mul_double(double x, double y, double z) {
  if (x < y) {
    return y + z;
  } else {
    return y * z;
  }
}

In [None]:
gcc -O3 -S add_or_mul_double.c
cat add_or_mul_double.s

* observe that `fcmpe` is the comparison instruction

<a name="loops"> </a>
## 7-4. loops (while and for)
* in assembly languages, loops are made of comparison and conditional branches just as if statements are


In [None]:
%%writefile fact.c
long fact (long n) {
  long i = 1;
  long p = 1;
  while (i <= n) {
    p = p * i;
    i = i + 1;
  }
  return p;
}

In [None]:
gcc -O3 -S fact.c
cat fact.s

* comments to generated instructions

```
fact:
.LFB0:
        .cfi_startproc
        cmp     x0, 0      # cc = n - 0
        ble     .L4        # if (n - 0 <= 0) goto .L4
        add     x2, x0, 1  # x2 = n + 1
        mov     x0, 1      # x0 = 1  (x0 <-> p)
        mov     x1, x0     # x1 = x0 (x1 <-> i)
        .p2align 3,,7
.L3:
        mul     x0, x0, x1 # x0 = x0 * x1 (p = p * i)
        add     x1, x1, 1  # x1 = x1 + 1  (i = i + 1)
        cmp     x1, x2     # cc = x1 - x2 (cc = i - (n + 1))
        bne     .L3        # if (i - (n + 1) != 0) goto .L3
        ret
        .p2align 2,,3
.L4:
        mov     x0, 1
        ret
```

* in general
```
while (condition) 
    S;
```
is equivalent to
```
    goto LC;
LB:
    S;
LC:
    c = condition;
    if (c) goto LB;
```
so what remains is to translate $S$ and _condition_

* the following code is more common (both are correct)

```
    c = condition;
    if (!c) goto LE;
LB:
    S;
    c = condition;
    if (c) goto LB;
LE:
```

* a for statement has a different syntax but is just a special case of a while statement

```
for (init ; condition; increment)
    S;
```

is equivalent to

```
init;
while (condition) {
    S;
    increment;
}
```
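
* putting the two rewrites together, a for loop can be turned into goto's mechanically; the sketch below uses a made-up pair of functions, `iterate_for` and `iterate_goto`, that repeat `x = a * x + b` for `n` times, and follows the second (guarded) loop shape shown above

```c
/* the for loop as written */
double iterate_for(double x0, double a, double b, long n) {
  double x = x0;
  for (long i = 0; i < n; i++) {
    x = a * x + b;
  }
  return x;
}

/* init; while (condition) { S; increment; } spelled out with goto's */
double iterate_goto(double x0, double a, double b, long n) {
  double x = x0;
  long i = 0;                 /* init */
  if (!(i < n)) goto LE;      /* condition, tested once before entering the loop */
LB:
  x = a * x + b;              /* S */
  i = i + 1;                  /* increment */
  if (i < n) goto LB;         /* condition, tested at the bottom of the loop */
LE:
  return x;
}
```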


In [None]:
%%writefile ax_b.c
double ax_b(double x0, double a, double b, long n) {
  double x = x0;
  for (long i = 0; i < n; i++) {
    x = a * x + b;
  }
  return x;
}

In [None]:
gcc -O3 -S ax_b.c
cat ax_b.s

<a name="problem_sum_array_long"> </a>
# <font color="green"> Problem 4 :  sum of long arrays</font>
* write a function `sum_array(a, n)` that computes the sum of an n-element array of longs `a` in assembly
* see `sum_array_c` below for the equivalent C function (your job is to write `sum_array` in assembly language equivalent to it)


In [None]:
BEGIN SOLUTION
END SOLUTION
%%writefile sum_array.s
	.arch armv8-a
	.file	"sum_array.c"
	.text
	.align	2
	.p2align 4,,11
	.global	sum_array
	.type	sum_array, %function
sum_array:
.LFB0:
	.cfi_startproc
	// ------- write your answer here -------
	.cfi_endproc
.LFE0:
	.size	sum_array, .-sum_array
	.ident	"GCC: (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0"
	.section	.note.GNU-stack,"",@progbits

In [None]:
BEGIN SOLUTION
END SOLUTION
%%writefile check_sum_array.c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
long sum_array(long *, long);

long sum_array_c(long * a, long n) {
  long s = 0;
  for (long i = 0; i < n; i++) {
    s += a[i];
  }
  return s;
}

int main(int argc, char ** argv) {
  long n = argc - 1;
  long * a = (long *)malloc(sizeof(long) * n);
  for (long i = 0; i < n; i++) {
    a[i] = atol(argv[i + 1]);
  }
  long sa = sum_array(a, n);
  long sa_c = sum_array_c(a, n);
  if (sa == sa_c) {
    printf("OK %ld %ld\n", sa, sa_c);
    return 0;
  } else {
    printf("NG %ld %ld\n", sa, sa_c);
    return 1;
  }
}

In [None]:
BEGIN SOLUTION
END SOLUTION
gcc -o check_sum_array -O3 check_sum_array.c sum_array.s

In [None]:
BEGIN SOLUTION
END SOLUTION
./check_sum_array 1 2 3 4 5
./check_sum_array 1 2 3 4 5 -6
./check_sum_array 1 -2 3 -4 5 -6 7

<a name="problem_max_array_double"> </a>
# <font color="green"> Problem 5 :  maximum value in double array</font>
* write a function `max_array(a, n)` that computes the maximum value of an n-element array of doubles `a` in assembly
* you may assume all elements are positive and return 0 if there are no elements (i.e., n = 0)
* see `max_array_c` for an equivalent C function


In [None]:
BEGIN SOLUTION
END SOLUTION
%%writefile max_array.s
	.arch armv8-a
	.file	"max_array.c"
	.text
	.align	2
	.p2align 4,,11
	.global	max_array
	.type	max_array, %function
max_array:
.LFB0:
	.cfi_startproc
	// ------- write your answer here -------

	.cfi_endproc
.LFE0:
	.size	max_array, .-max_array
	.ident	"GCC: (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0"
	.section	.note.GNU-stack,"",@progbits

In [None]:
BEGIN SOLUTION
END SOLUTION
%%writefile check_max_array.c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
double max_array(double *, long);

double max_array_c(double * a, long n) {
  double m = 0.0;
  for (long i = 0; i < n; i++) {
    if (a[i] > m) m = a[i];
  }
  return m;
}

int main(int argc, char ** argv) {
  long n = argc - 1;
  double * a = (double *)malloc(sizeof(double) * n);
  for (long i = 0; i < n; i++) {
    a[i] = atof(argv[i + 1]);
  }
  double ma = max_array(a, n);
  double ma_c = max_array_c(a, n);
  if (ma == ma_c) {
    printf("OK %f %f\n", ma, ma_c);
    return 0;
  } else {
    printf("NG %f %f\n", ma, ma_c);
    return 1;
  }
}

In [None]:
BEGIN SOLUTION
END SOLUTION
gcc -o check_max_array -O3 check_max_array.c max_array.s

In [None]:
BEGIN SOLUTION
END SOLUTION
./check_max_array 1.2 2.3 3.4
./check_max_array 1.2 3.4 2.3
./check_max_array 3.4 1.2 2.3
./check_max_array