# Fundamental engineering practices

Writing computer code is only a fraction of software engineering.  A large share of the effort goes into the infrastructure around the code.  The key to building that engineering system is automation.

1. Automation
2. Version control and regression
3. Work that cannot be automated

# Bash scripting

Shell scripts are the most common way to automate work.  A shell is responsible for taking commands from users.  Every operating system provides shells.  Because Linux is ubiquitous, `bash` has become the most popular shell.  A bash shell script should work on almost any system where `bash` is installed.

Here I'll introduce some useful tricks for bash scripting.

## Overview

Structure of the simplest script:

1. Shebang.
2. Comment/document.
3. Setup.
4. Action.

Example (`clone-python.sh`):

```bash
#!/bin/bash
#
# This script clones the cpython repository.

# setup environment variables.
root=${ROOT:-~/tmp}
pkgname=python
pkgbranch=${VERSION:-3.7}
pkgfull=$pkgname-$pkgbranch
pkgrepo=https://github.com/python/cpython.git

# clone.
mkdir -p "$root"
# stop if the directory cannot be entered, so that we never clone elsewhere.
cd "$root" || exit 1
pwd
if [ ! -d "$pkgfull" ] ; then
  git clone -q -b "$pkgbranch" "$pkgrepo" "$pkgfull"
fi
```

A shell script file contains commands for `bash`.  Executing the script file is almost the same as typing those commands directly in an interactive shell.  Bash shell scripts are the most common way to record commands and automate the work.

## Variables

Variables are essential in programming languages.  Variables in bash do not have types, but there are two kinds of variables distinguished by their scopes.  One is the _shell variable_, which lives only in the current shell.  The other is the _environment variable_, which is also visible in child processes.

```bash
shell_var="shell_value"

env_var="env_value"
export env_var
export env_var2="other_env_value"
```
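The difference in scope can be checked by asking a child process what it sees.  The following is a minimal sketch: the `child_*` variable names are made up for illustration, and `bash -c` is used here only as a convenient way to start a child process.

```bash
#!/bin/bash
# Start from a clean slate so that an inherited shell_var cannot interfere.
unset shell_var

shell_var="shell_value"        # shell variable: stays in this shell
export env_var="env_value"     # environment variable: inherited by children

# Single quotes defer the expansion to the child process started by bash -c.
child_shell_var=$(bash -c 'echo "${shell_var:-unset}"')
child_env_var=$(bash -c 'echo "${env_var:-unset}"')

echo "child sees shell_var: $child_shell_var"
echo "child sees env_var: $child_env_var"
```

The child reports `unset` for the shell variable and `env_value` for the exported one.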

Default value when variable isn't set:

In [None]:
# show the fallback value since the variable isn't set
!unset THISENVVAR; echo ${THISENVVAR:-no such thing}

In [None]:
# use the variable value
!THISENVVAR="some value"; echo ${THISENVVAR:-no such thing}
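A closely related form is `:=`, which substitutes the fallback and also assigns it to the variable.  The sketch below contrasts the two forms; the names `first`, `still_unset`, `second`, and `after_assign` are only for illustration.

```bash
#!/bin/bash
unset THISENVVAR

# ":-" substitutes the fallback but leaves the variable itself unset.
first=${THISENVVAR:-no such thing}
still_unset=${THISENVVAR:-still unset}

# ":=" substitutes the fallback and also assigns it to the variable.
second=${THISENVVAR:=assigned by default}
after_assign=${THISENVVAR:-still unset}

echo "$first / $still_unset / $second / $after_assign"
```

After the `:=` expansion the variable keeps the assigned value, so the last fallback is no longer used.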

## Sub-process vs source

A bash script may be run in two ways.  One is to run it like an ordinary program.  A new process will be created by the current shell, and the script will be run in that process.  The other way is to use `source` (or its POSIX-compatible synonym, `.`) to run it in the current shell.  The latter makes the shell script work like a replay of the command sequence in it.

Assume we have a bash script called `dosomething.sh`:

```bash
#!/bin/bash
export MYENVVAR="MYENVVAR is set to what I want"
echo "do something"
```

The variable isn't set in the calling shell:

In [None]:
!unset MYENVVAR; ./dosomething.sh; echo ${MYENVVAR:-"MYENVVAR is not set"}

When the script is sourced, it runs in the calling shell, so the variable is set there:

In [None]:
!unset MYENVVAR; source ./dosomething.sh; echo ${MYENVVAR:-"MYENVVAR is not set"}
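The two ways of running a script can also be compared inside one script.  This is a sketch under a few assumptions: the script file is a temporary file created with `mktemp` instead of `dosomething.sh`, and it is run with `bash <file>` so that no execute permission is needed.

```bash
#!/bin/bash
# Write a throwaway script that only exports a variable.
tmpscript=$(mktemp)
cat > "$tmpscript" <<'EOF'
export MYENVVAR="set by the script"
EOF

unset MYENVVAR

# Run as a sub-process: the export happens in the child and is lost.
bash "$tmpscript"
after_subprocess=${MYENVVAR:-"MYENVVAR is not set"}

# Run with source: the commands are replayed in the current shell.
source "$tmpscript"
after_source=${MYENVVAR:-"MYENVVAR is not set"}

echo "after sub-process: $after_subprocess"
echo "after source: $after_source"
rm -f "$tmpscript"
```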

## Redirection

When executing a command in a bash script it's commonplace to redirect the output to a file or another command.

In [None]:
!echo "a line output" > line.log ; cat line.log

Sometimes we want to redirect both stdout and stderr to a file.  The idiom is:

In [None]:
!echo "a line output" > line.log 2>&1 ; cat line.log

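Note that `2>&1` must be written after `> line.log`, because bash processes redirections from left to right.  In the first command below, stderr is pointed at the terminal before stdout is sent to `/dev/null`, so the error message is still shown.  In the second command, both streams are discarded.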

In [None]:
!cp nothisfile.txt another.txt 2>&1 > /dev/null

In [None]:
!cp nothisfile.txt another.txt > /dev/null 2>&1
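The effect of the ordering can be captured in variables.  In the sketch below, `emit` is a hypothetical helper that writes one line to each stream, and the log file is a temporary file.

```bash
#!/bin/bash
# Helper that writes one line to stdout and one line to stderr.
emit () {
  echo "to stdout"
  echo "to stderr" >&2
}

logfile=$(mktemp)

# Correct order: stdout goes to the file, then stderr follows stdout.
emit > "$logfile" 2>&1
both=$(cat "$logfile")

# Reversed order: stderr is tied to the original stdout (captured here by
# the command substitution) before stdout is redirected to the file.
leaked=$(emit 2>&1 > "$logfile")
only_stdout=$(cat "$logfile")

echo "file with correct order:"
echo "$both"
echo "file with reversed order: $only_stdout"
echo "escaped with reversed order: $leaked"
rm -f "$logfile"
```

With the reversed order only the stdout line reaches the file, and the stderr line escapes to wherever stdout pointed before the redirection.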

## Branching

To write smart scripts we need the `if`-`else` branching construct.  The following example detects the OS and runs different commands to obtain the number of (logical) processors on the machine:

```bash
#!/bin/bash
if [[ "$(uname)" == "Darwin" ]] ; then
  NP=${NP:-$(sysctl -n hw.ncpu)}
elif [[ "$(uname)" == "Linux" ]] ; then
  # count the lines that start with "processor"; no need for cat or wc.
  NP=${NP:-$(grep -c ^processor /proc/cpuinfo)}
else
  NP=${NP:-1}
fi
echo "NP may be set to $NP"
```

In [None]:
!uname; ./shownp.sh
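The same construct works with file tests such as `-d` (is a directory) and `-f` (is a regular file).  The following is a small sketch; `CHECK_PATH` and `kind` are hypothetical names, and the path defaults to `/` so that the example has a known outcome.

```bash
#!/bin/bash
# Path to inspect; falls back to the root directory when CHECK_PATH is unset.
target=${CHECK_PATH:-/}

if [[ -d "$target" ]] ; then
  kind="directory"
elif [[ -f "$target" ]] ; then
  kind="file"
else
  kind="missing"
fi
echo "$target is: $kind"
```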

## Function

`bash` allows us to write functions that collect commands, so that we can rerun them over and over in a script.

```bash
#!/bin/bash
runcmd () {
  # $1 is the log file; the remaining arguments form the command to run.
  echo "run command: ${@:2}"
  { time "${@:2}" ; } > "$1" 2>&1
  echo "done; log file: $(cd "$(dirname "$1")" && pwd)/$(basename "$1")"
}
runcmd line.log echo "information shown"
```

In [None]:
!./bashfunction.sh ; cat line.log
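Inside a function, `$1` is the first argument and `"${@:2}"` expands to all arguments from the second one on, with word boundaries preserved.  The sketch below makes that visible; `describe` and `summary` are hypothetical names used only for this illustration.

```bash
#!/bin/bash
describe () {
  local logname=$1        # first argument
  local cmd=("${@:2}")    # remaining arguments, kept as separate words
  echo "log=$logname words=${#cmd[@]} first=${cmd[0]}"
}

# The quoted string stays one word, so the command has two words in total.
summary=$(describe line.log echo "information shown")
echo "$summary"
```

Quoting `"${@:2}"` is what keeps `"information shown"` as a single argument when the command is run later.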

# Makefile

`Makefile` is the input file of a tool called `make`.  `make` has spawned many derived implementations since its creation in 1976 at Bell Labs.  The most popular implementation is GNU `make`, which is also required for building the Linux kernel.  We will be focusing on GNU `make`.

A Makefile consists of rules in the following format:

```make
target : prerequisites [...]
        recipe (1)
        recipe (2)
        ...
```

Note that a tab is **required** at the beginning of each recipe line.  Rules and recipes are line-based: each recipe command must fit on a single line, or use `\` for line continuation.  The same applies to the rule line.

## `make`: Automating Your Recipes

`make` keeps track of the file timestamps.
* If the source file is older than its object file, `make` knows that it doesn't need to invoke the compiler.
* If, the other way around, the source file is newer than its object file, or the executable is older than its object and library files, `make` will run the build tools according to the recipes written in the `Makefile`.
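The timestamp comparison can be imitated in bash with the `-nt` ("newer than") file test.  This is only a sketch of the idea, not how `make` is implemented: `needs_rebuild` is a hypothetical helper, and the file names and timestamps are made up with `touch -t` in a temporary directory.

```bash
#!/bin/bash
# Succeeds when the target ($2) is missing or older than its prerequisite ($1).
needs_rebuild () {
  [ ! -e "$2" ] || [ "$1" -nt "$2" ]
}

workdir=$(mktemp -d)
src="$workdir/hello.cpp"
obj="$workdir/hello.o"

# The object file is one day older than the source file.
touch -t 202001010000 "$obj"
touch -t 202001020000 "$src"
if needs_rebuild "$src" "$obj" ; then first_check="rebuild" ; else first_check="up to date" ; fi

# Pretend the compiler ran and produced a fresh object file.
touch -t 202001030000 "$obj"
if needs_rebuild "$src" "$obj" ; then second_check="rebuild" ; else second_check="up to date" ; fi

echo "before compiling: $first_check"
echo "after compiling: $second_check"
rm -rf "$workdir"
```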

> Make originated with a visit from Steve Johnson (author of yacc, etc.), storming into my office, cursing the Fates that had caused him to waste a morning debugging a correct program (bug had been fixed, file hadn't been compiled, cc \*.o was therefore unaffected). As I had spent a part of the previous evening coping with the same disaster on a project I was working on, the idea of a tool to solve it came up. It began with an elaborate idea of a dependency analyzer, boiled down to something much simpler, and turned into Make that weekend. Use of tools that were still wet was part of the culture. Makefiles were text files, not magically encoded binaries, because that was the Unix ethos: printable, debuggable, understandable stuff.
>
> _Stuart Feldman_

## Makefile format

We use a simple hello world program as an example of writing a `Makefile`.  First we set a variable `CXX` to designate the compiler command to be used:

```
CXX = g++
```

Write the first rule for linking the executable.  The first rule is the default rule that `make` will use when it is invoked without a target.

```
hello: hello.o hellomain.o
	$(CXX) hello.o hellomain.o -o hello
```

Then write two rules for the object files.  First `hello.o`:

```
hello.o: hello.cpp hello.hpp
	$(CXX) -c hello.cpp -o hello.o
```

Second `hellomain.o`:

```
hellomain.o: hellomain.cpp hello.hpp
	$(CXX) -c hellomain.cpp -o hellomain.o
```

Now we can use a single command to run all the recipes for building `hello`:

In [None]:
!cd make1; rm -f hello *.o; make

`make` the second time.  Nothing needs to be done:

In [None]:
!cd make1; make

If we change one of the source files (say, `hello.cpp`), `make` knows from the prerequisites (dependencies) that the object file of the other one (`hellomain.o`) doesn't need to be rebuilt.

In [None]:
!cd make1; touch hello.cpp; make

Change the shared prerequisites (the header file `hello.hpp`).  Everything needs to be rebuilt:

In [None]:
!cd make1; touch hello.hpp; make

## Automatic variables

We found some duplicated file names in the recipes in the above example.  `make` provides _automatic variables_ that allow us to remove them.

* `$@` is the file name of the target of the rule.
* `$^` is the file names of all the prerequisites.
* `$<` is the file name of the first prerequisite.

Aided by the automatic variables, we can simplify the recipes:

```
hello: hello.o hellomain.o
	$(CXX) $^ -o $@

hello.o: hello.cpp hello.hpp
	$(CXX) -c $< -o $@

hellomain.o: hellomain.cpp hello.hpp
	$(CXX) -c $< -o $@
```

The new `Makefile` works exactly the same as the previous one, but doesn't have the duplicated file names.

In [None]:
!cd make2; rm -f hello *.o; make

## Implicit rule

Even with the automatic variables, we see duplicated recipes for the two object file targets.  They can be removed by writing a pattern rule, which redefines the *implicit rule* for `.o` files:

```
%.o: %.cpp hello.hpp
	$(CXX) -c $< -o $@
```

`%` in the target matches any non-empty string, and the matched string is substituted for `%` in the prerequisites.  Thus, the `Makefile` becomes much simpler, and there are fewer places for mistakes:

```
CXX = g++

hello: hello.o hellomain.o
	$(CXX) $^ -o $@

%.o: %.cpp hello.hpp
	$(CXX) -c $< -o $@
```

In [None]:
!cd make3; rm -f hello *.o; make

## Popular phony target

It is handy to have some targets that are not files, and use them to accomplish some pre-defined operations.  For example, almost all practical `Makefile`s have a target called `clean`, which removes all the built files.

In [None]:
!cd make4; make clean

These targets are called _phony targets_ (not real files).  The above operation is accomplished by the following rule:

```
.PHONY: clean
clean:
	rm -rf hello *.o
```

Another common use of phony targets is to redirect the default rule:

```
# If the following two lines are commented out, the default target becomes hello.o.
.PHONY: default
default: hello

# Implicit rules will be skipped when searching for default.
#%.o: %.cpp hello.hpp
#	$(CXX) -c $< -o $@

hello.o: hello.cpp hello.hpp
	$(CXX) -c $< -o $@

hellomain.o: hellomain.cpp hello.hpp
	$(CXX) -c $< -o $@

hello: hello.o hellomain.o
	$(CXX) $^ -o $@
```

In [None]:
!cd make4; make clean; make

# Cmake

Automation is needed to simplify entangled operations that invite human errors.  Cross-platform building is a common example of such operations.  We've seen in a previous example (a bash shell script) how the problem shows up:

```
#!/bin/bash
if [[ "$(uname)" == "Darwin" ]] ; then
  NP=${NP:-$(sysctl -n hw.ncpu)}
elif [[ "$(uname)" == "Linux" ]] ; then
  # count the lines that start with "processor"; no need for cat or wc.
  NP=${NP:-$(grep -c ^processor /proc/cpuinfo)}
else
  NP=${NP:-1}
fi
echo "NP may be set to $NP"
```

As the software grows, such simple conditional statements fail to handle the complexity.  This applies to both shell scripts and make files.  We need a dedicated tool for orchestrating the build process.  Cmake is such a tool.

Although it has "make" in the name, cmake is _not_ a variant of make.  It requires its own configuration file, called `CMakeLists.txt`.  On Linux, we usually let cmake generate GNU make files, and then run make to build the software.  This is a so-called two-stage building process.  Cmake provides many helpers so that we may relatively easily configure the real build commands to deal with compiler flags, library and executable file names, and third-party libraries (dependencies).

It is easy to let cmake use a separate build directory (an out-of-source build): cmake writes the build files to the directory in which it is invoked, so the built files end up in a different directory from the source tree.  In this way, a single source tree may easily produce multiple binary trees.

Since cmake is meant for complex configuration, a simple example cannot really show why it is useful.  Instead, high-level information about what it does will be provided.

## How to run cmake

Cmake is usually run in a separate build directory.  Assume the current working directory is the project root.  The common way to invoke cmake for building the project is:

```
$ mkdir -p build/dev
$ cd build/dev
$ cmake ../.. -DCMAKE_BUILD_TYPE=Release
-- The C compiler identification is AppleClang 10.0.1.10010046
-- The CXX compiler identification is AppleClang 10.0.1.10010046
...
-- Configuring done
-- Generating done
-- Build files have been written to: /absolute/path/to/build/dev
```
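Because each build directory is independent, one source tree can be configured several times with different settings.  The following bash sketch plans one build directory per build type.  The `DRY_RUN` switch, the `planned` array, and the `build/<type>` layout are assumptions made for this illustration; by default the script only prints the commands and does not call cmake.

```bash
#!/bin/bash
# DRY_RUN=1 (the default here) only prints the commands; DRY_RUN=0 runs them.
DRY_RUN=${DRY_RUN:-1}
srcroot=$(pwd)      # assume the script is started in the project root
planned=()

for buildtype in Debug Release ; do
  builddir="build/$buildtype"
  cmd="cmake $srcroot -DCMAKE_BUILD_TYPE=$buildtype"
  planned+=("$builddir: $cmd")
  echo "in $builddir run: $cmd"
  if [ "$DRY_RUN" = "0" ] ; then
    # Configure inside the build directory; the source tree stays untouched.
    mkdir -p "$builddir" && ( cd "$builddir" && cmake "$srcroot" -DCMAKE_BUILD_TYPE="$buildtype" )
  fi
done
```

Running it with `DRY_RUN=0` in a real project root would leave two independent binary trees under `build/`.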

## Select C++ standards

We may use cmake to pick which standard the C++ compiler should use:

```
set(CMAKE_CXX_STANDARD 14)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
```

Different compilers may have different options for the C++ standard.  clang and gcc use `-std=`, while msvc uses `/std:`.  Cmake knows which one to use for each of the supported compilers.  The generated make file will result in a recipe like:

```
c++ -O3 -DNDEBUG -fPIC -flto -std=c++14 -o CMakeFiles/_libst.dir/src/python/libst.cpp.o -c /absolute/path/to/src/python/libst.cpp
```

## Add a custom option

Cmake allows us to add custom options that are set from the command line.  For example, a new `DEBUG_SYMBOL` option can be added by the following cmake list code:

```
option(DEBUG_SYMBOL "add debug information" ON)

if(DEBUG_SYMBOL)
    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -g")
endif()
```

The option is supplied to cmake as such:

```
cmake root -DDEBUG_SYMBOL=ON
```

# Git version control system

# Automatic testing

# Wrap to Python: pybind11

# Continuous integration

# Code review

# Timing

# Homework

1. Write a bash shell script to build all of the example programs in the previous lectures.
2. Write a Makefile to build all of the example programs in the previous lecture.

# References

* https://www.gnu.org/software/bash/manual/bash.html
* https://www.gnu.org/software/make/manual/make.html