Hannes Datta edited this page Mar 21, 2022 · 6 revisions

The main principles we follow in writing code are summarized in this Building Block and the document Code and Data for the Social Sciences. See especially the Code Style appendix and chapters 1, 2, 6, and 7.

Language-specific style guides can be found in the following pages from the TSH website:

A few key principles to remember when working with GitHub are:

  1. Repositories are organized into self-contained modules
  2. Each module can be run independently, declares its inputs and external dependencies, and clearly separates inputs from outputs
  3. Each module has a build script that runs its steps from beginning to end
  4. The build script must be run to completion before any commit or merge to master
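As a rough illustration of principles 2 and 3, a module's build script might look like the Python sketch below. The file names (`raw.txt`, `summary.txt`), the step functions (`clean`, `summarize`), and the `build` entry point are hypothetical and are not taken from any actual repository; the point is only that inputs and outputs are declared in one place and the steps run in order from beginning to end.

```python
from pathlib import Path


def clean(raw_lines):
    """Step 1 (hypothetical): strip whitespace and drop blank lines."""
    return [line.strip() for line in raw_lines if line.strip()]


def summarize(rows):
    """Step 2 (hypothetical): report how many cleaned rows remain."""
    return f"rows={len(rows)}\n"


def build(input_dir, output_dir):
    """Run every step of the module from beginning to end."""
    source = Path(input_dir) / "raw.txt"       # declared input (assumed name)
    target = Path(output_dir) / "summary.txt"  # declared output (assumed name)

    # Fail early if a declared input is missing, rather than midway through.
    if not source.exists():
        raise FileNotFoundError(f"missing declared input: {source}")

    # Outputs go to their own directory, separate from the inputs.
    Path(output_dir).mkdir(parents=True, exist_ok=True)

    rows = clean(source.read_text().splitlines())
    target.write_text(summarize(rows))
    return target
```

Calling `build("input", "output")` before a commit would then rerun the whole module, which is one way to satisfy principle 4.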

Be a good code citizen

Team members should take the time to improve the code they are modifying or extending even if they did not write it themselves. A core of good code plus a long series of edits and accretions equals bad code. The problem is that the logical structure that made sense for the program when it was small no longer makes sense as it grows. It is critical to regularly look at the program as a whole and improve the logical structure through reorganization and abstraction. Programmers call this “refactoring.” Even if your immediate task only requires modifying a small part of a program, we encourage you to take the time to improve the program more broadly. At a minimum, you should guarantee that the code quality of the program overall is at least as good as it was when you started. A resource on how to implement refactoring can be found here.
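A small sketch of what refactoring through abstraction can look like, written in Python for illustration. All function and variable names here are hypothetical. The "before" version repeats the same logic for each variable; the "after" version moves it into one helper, so a later change has to be made in only one place.

```python
# Before: the centering logic is written out once per variable.
def summarize_before(prices, quantities):
    price_mean = sum(prices) / len(prices)
    centered_prices = [p - price_mean for p in prices]
    quantity_mean = sum(quantities) / len(quantities)
    centered_quantities = [q - quantity_mean for q in quantities]
    return centered_prices, centered_quantities


# After: the repeated logic lives in a single helper function.
def center(values):
    """Subtract the mean of `values` from each element."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def summarize_after(prices, quantities):
    return center(prices), center(quantities)
```

Both versions return the same result for the same inputs, which is the property a refactoring should preserve.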

Keep it short

No line of code should be more than 100 characters long. All languages we work in allow you to break a logical line across multiple lines on the page (e.g., using /// in Stata or ... in MATLAB). You may want to set your editor to show a “margin” at 100 characters.
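The same idea in Python, as a minimal sketch: a logical line can be continued inside parentheses, and a short helper can flag lines that exceed the limit. The helper `find_long_lines` and the cost figures are hypothetical, introduced only for illustration.

```python
MAX_LINE_LENGTH = 100  # the limit stated in this guide


def find_long_lines(source, limit=MAX_LINE_LENGTH):
    """Return the 1-based numbers of lines longer than `limit` characters."""
    return [
        number
        for number, line in enumerate(source.splitlines(), start=1)
        if len(line) > limit
    ]


# Inside parentheses, Python lets one logical line span several physical
# lines, so no explicit continuation marker is needed.
total_cost = (
    1200.50     # hypothetical fixed cost
    + 3 * 45.0  # hypothetical variable cost
)
```

Running `find_long_lines` over a file's text before committing gives a quick list of the lines that still need to be broken up.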

Functions should typically be no longer than 200 lines.
