# Architectures

## Impact of choice of Architecture

The choice of model architecture can have a large impact on results. For example, ignoring the time axis and comparing only scores, `resnet` and `convnext_in22` come in at 83 and 87, respectively. `resnet` should no longer be the automatic default, as the community has moved on from it.

Beyond raw scores, another important factor is how well an architecture lends itself to transfer learning. [Have a look](https://www.kaggle.com/code/jhoward/the-best-vision-models-for-fine-tuning?scriptVersionId=102071666&cellId=16) at `convnext_large_in22k`, for example.

## GPU Impact

Larger architectures have more parameters and therefore require a GPU with more memory. There is, however, a way around this; see [Practical Deep Learning for Coders - Lesson 7](https://youtu.be/p4ZZq0736Po?t=445).
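
To make the link between parameter count and memory concrete, here is a back-of-the-envelope sketch. The helper `param_memory_mib` and the parameter counts are hypothetical and chosen purely for illustration; they are not measurements of any specific architecture. The estimate covers the weights only: gradients, optimizer state and activations need additional memory on top.

```python
def param_memory_mib(n_params, bytes_per_param=4):
    """Memory needed just to store the weights, in MiB (4 bytes = float32)."""
    return n_params * bytes_per_param / 2**20

# Hypothetical, round parameter counts chosen for illustration only
for name, n_params in [("small model", 25_000_000), ("large model", 200_000_000)]:
    print(f"{name}: {param_memory_mib(n_params):.0f} MiB (fp32), "
          f"{param_memory_mib(n_params, bytes_per_param=2):.0f} MiB (fp16)")
```

Weight storage grows linearly with the number of parameters, which is why moving to a much larger architecture can exhaust the memory of a GPU that handled the smaller one comfortably.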

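### Listing GPU Memory

Jeremy reports the GPU memory in use, and then frees whatever can be freed, with:

```python
import gc
import torch

def report_gpu():
    # Show per-process GPU memory usage, then release unused memory
    print(torch.cuda.list_gpu_processes())
    gc.collect()               # drop unreachable Python objects
    torch.cuda.empty_cache()   # release PyTorch's cached, unused GPU memory
```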

### Gradient Accumulation

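Using a neat trick called [gradient accumulation](https://youtu.be/p4ZZq0736Po), one can train larger models even on a modest GPU. Instead of updating the weights after every batch, the gradients of several small batches are added up before each optimizer step, so the effective batch size stays the same while less memory is needed at any one time. fastai has this built in.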
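
The idea can be checked with a little arithmetic. The sketch below is not fastai code: it uses a toy one-parameter model (`y = w * x`) and made-up data, and the helper names `grad_mse` and `accumulated_grad` are hypothetical. It shows that accumulating the gradients of several micro-batches, each weighted by its share of the full batch, gives the same gradient as processing the full batch at once.

```python
def grad_mse(w, xs, ys):
    """Gradient of mean((w*x - y)**2) with respect to w."""
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n

def accumulated_grad(w, xs, ys, micro_batch_size):
    """Sum micro-batch gradients, each weighted by its share of the full batch."""
    total = len(xs)
    acc = 0.0
    for start in range(0, total, micro_batch_size):
        mx = xs[start:start + micro_batch_size]
        my = ys[start:start + micro_batch_size]
        # Only one micro-batch needs to be "in memory" at a time
        acc += grad_mse(w, mx, my) * len(mx) / total
    return acc

# Made-up data for illustration
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0, 4.1, 5.9, 8.2, 9.8, 12.1, 14.0, 15.9]
w = 0.5

full = grad_mse(w, xs, ys)
accum = accumulated_grad(w, xs, ys, micro_batch_size=2)
print(full, accum)  # the two values agree up to floating-point rounding
```

In fastai, the built-in version is, as far as I know, the `GradientAccumulation` callback passed to the learner via `cbs`. Note that layers which depend on batch statistics, such as batch normalization, still see only the small micro-batches, so results are not always bit-for-bit identical to true large-batch training.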


## References

- [Practical Deep Learning for Coders - Lesson 6 (evaluating different architectures)](https://youtu.be/AdhG64NF76E?t=5201)
- [Which image models are best (Notebook from Jeremy on Kaggle)](https://www.kaggle.com/code/jhoward/which-image-models-are-best)
- [The best vision models for fine tuning (Notebook from Jeremy on Kaggle)](https://www.kaggle.com/code/jhoward/the-best-vision-models-for-fine-tuning/notebook)