# Troubleshooting runtime errors

There are several places to look for information when a job fails. The log files will usually give you a hint.

## Looking at the log files

Log file names follow the pattern ``$model.log.*`` (for example, ``cpl.log.*`` for the coupler).
- While the model is running, it writes the log files in the **run directory**: ``RUNDIR``.
- When the run completes successfully, the model moves the log files into the **archive** directory: ``DOUT_S_ROOT``.
- When the model fails, the log files remain in the run directory ``RUNDIR``.

![CESM directories and namelists](../../images/troubleshooting/CESM_directories_and_log_files.png)

*<p style="text-align: center;"> Figure: Overview of the CESM directories and the log files. </p>*



First, check the latest ``cpl.log.*``, which will often tell you when the model failed. If a run completed successfully, the last several lines of the ``cpl.log.*`` file will have a string like ``SUCCESSFUL TERMINATION OF CESM``. 
If you don't see this message, it means the run has failed. 
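
The check described above can be sketched as a small shell function. This is a minimal sketch, not part of CESM: the name `check_cpl_log` is hypothetical, the directory to inspect is passed as an argument, and only plain-text logs are handled (archived logs may be compressed). It matches only the words `SUCCESSFUL TERMINATION`, on the assumption that the rest of the message may differ between versions.

```shell
# Hypothetical helper (not part of CESM): report whether the newest
# cpl.log.* file in a directory contains the success message.
check_cpl_log() {
    local dir="$1"
    local latest
    # Newest cpl.log.* file by modification time; empty if none exist.
    latest=$(ls -t "$dir"/cpl.log.* 2>/dev/null | head -n 1)
    if [ -z "$latest" ]; then
        echo "no cpl.log.* file found in $dir"
        return 2
    fi
    # The message appears near the end of the file; 20 lines is an
    # arbitrary window chosen for this sketch.
    if tail -n 20 "$latest" | grep -q "SUCCESSFUL TERMINATION"; then
        echo "run completed: $latest"
    else
        echo "no success message: $latest"
        return 1
    fi
}
```

From a case directory, it could be called as `check_cpl_log "$(./xmlquery RUNDIR --value)"`, assuming `xmlquery` prints the bare path with `--value`.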

Check these things first when a job fails:
- Did the model time out?
- Was a disk quota limit hit?
- Did a machine go down?
- Did a file system become full?

If any of those things happened, take appropriate corrective action and resubmit the job.
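
For the full file system question, a quick check is to ask `df` how full the file system holding the run directory is. The sketch below assumes a POSIX `df` and `awk`; the function name `fs_usage` is hypothetical. It does not cover quota limits, which are usually reported by site-specific tools.

```shell
# Hypothetical helper (not part of CESM): print how full the file system
# holding a directory is, for example the run directory.
fs_usage() {
    # df -P prints a header line followed by one line for the file system;
    # column 5 of that line is the percentage of capacity in use.
    df -P "$1" | awk 'NR == 2 { print $5 }'
}
```

A value close to `100%` for the run directory suggests the file system filled up during the run.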

If it is not clear that any of the above caused the case to fail, check the rest of the component log files ``$model.log.*`` for error messages. It takes a bit of practice to interpret error messages. We will look at an example in this chapter's exercises.
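
As a starting point for that search, the component logs can be scanned for common failure keywords. This is a minimal sketch: the function name `scan_logs` is hypothetical, and the keyword list (`error`, `abort`, `fail`) is an assumption, since each component words its messages differently. Expect both false positives and misses.

```shell
# Hypothetical helper (not part of CESM): list lines in the log files of
# a directory that match common failure keywords.
scan_logs() {
    local dir="$1"
    # -H prints the file name, -n the line number, -i ignores case.
    # The keyword list is an assumption, not an official set of markers.
    grep -H -n -i -E 'error|abort|fail' "$dir"/*.log.* 2>/dev/null
}
```

Each output line has the form `file:line:text`, which makes it easy to open the right log at the right place.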

## Running with more debugging information

If you cannot find the reason for the crash in the **log** files, there are two ways to add more debugging information.
- Increase the value of the run-time xml variable ``INFO_DBUG`` (This **does NOT require rebuilding**): 
```
./xmlchange INFO_DBUG=2
```
This adds more information to the ``cpl.log`` file, which can be useful if you can’t tell which component is aborting the run or where bad coupling fields are originating.

- Try rebuilding and rerunning with the variable ``DEBUG`` set to ``TRUE`` (This **requires rebuilding**):
```
./xmlchange DEBUG=TRUE
```
This adds various runtime checks that trap conditions such as out-of-bounds array indexing, divide by 0, and other floating point exceptions.
Before running again, you must rebuild the model:
```
./case.build --clean-all
qcmd -- ./case.build
```
Note that the model will run **significantly slower** in ``DEBUG`` mode, so this may not be feasible if the model has to run for a long time before producing the error.

<div class="alert alert-info" style="text-align: center;">

More information about troubleshooting can be found in the [CIME documentation](https://esmci.github.io/cime/versions/master/html/users_guide/troubleshooting.html).

</div>