# Assignment 1: Introduction to histopathology image analysis

The goal of this assignment is to define and analyze the problem that you will be working on. In the main project, you will develop a deep neural network model that can identify metastases in image patches from histological preparations of lymph node tissue from breast cancer patients. Then, you will submit the results of the developed model to an open competition, where it will be compared to models developed by other researchers.

The following three subsections give an introduction to deep learning, medical image analysis challenges/competitions, and detection of lymph node metastases. This very brief text, along with the provided references, is a good starting point for further exploration of these topics.

### Deep learning for medical image analysis 

In the coming decade, artificial intelligence will affect many aspects of human life and society, ranging from home automation and self-driving cars to algorithmic detection of “fake news” on social media. This technology will also inevitably influence healthcare and, in turn, medical imaging. One of the most active fields of research in artificial intelligence is deep machine learning [1][2]. Deep learning methods work by learning abstract, hierarchical feature representations of the input data that are predictive of a certain target (e.g. a class label in the case of classification). In contrast to classical machine learning methods, deep learning eliminates the step of “manual” feature engineering, which is tedious and often sub-optimal compared with learned representations.

In the field of medical image analysis, the use of deep learning methods has already resulted in notable progress for a variety of tasks. Some of the first applications where automatic methods were shown to outperform medical experts have been in histopathology image analysis [3]. 


### Medical image analysis challenges

One of the developments that brought about the success of deep learning methods in computer vision was the availability of large, public annotated image datasets. In medical image analysis, this was mirrored in the form of medical image analysis challenges – friendly competitions in which researchers worldwide evaluate their solutions on the same data with the same criteria, in a blinded manner [4]. Organizing a challenge involves collecting and publicly distributing a training dataset that can be used to address a specific clinical task (such as detection of lymph node metastases). The ground truth for the test set is retained by the organizers of the challenge, which facilitates fair and independent evaluation and comparison of methods. The organization of medical image analysis challenges has had a tremendous impact on the advancement of the state of the art in the field. In the coming decade, such challenges will continue to push the field forward.

For the main project work, you will use a dataset of image patches [9] derived from the CAMELYON16 challenge [3]. This Patch-CAMELYON dataset is itself published as a challenge on the Kaggle platform [10]. You will submit the results of your project work to this platform for independent evaluation. 
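The blinded evaluation described above can be sketched in a few lines of Python. This is only an illustration, not the code used by any challenge platform: the function names `roc_auc` and `evaluate_submission` are hypothetical, and the choice of the area under the ROC curve (AUC) as the metric is an assumption that you should verify on the challenge page [10]. The point is that participants only provide predictions per test sample, while the labels stay with the organizers.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve, computed from the rank sum of the positives.

    labels: sequence of 0/1 ground-truth labels
    scores: sequence of predicted scores (higher means "more likely positive")
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)

    # Assign 1-based ranks; tied scores share the average of their ranks.
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1

    n_pos = sum(1 for _, label in pairs if label == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC is undefined when only one class is present")

    rank_sum_pos = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def evaluate_submission(ground_truth, submission):
    """Score a submission against labels that only the organizers hold.

    ground_truth: dict mapping sample id -> 0/1 label (kept private)
    submission:   dict mapping sample id -> predicted score (from a participant)
    """
    missing = set(ground_truth) - set(submission)
    if missing:
        raise ValueError(f"{len(missing)} test samples have no prediction")
    ids = sorted(ground_truth)
    return roc_auc([ground_truth[i] for i in ids], [submission[i] for i in ids])


if __name__ == "__main__":
    # Toy data: four test patches with made-up ids, labels and predictions.
    hidden_labels = {"a": 0, "b": 0, "c": 1, "d": 1}
    predictions = {"a": 0.1, "b": 0.4, "c": 0.35, "d": 0.8}
    print(evaluate_submission(hidden_labels, predictions))
```

Because every participant is scored by the same function on the same hidden labels, the resulting numbers are directly comparable across methods.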

Some other examples of histopathology image analysis challenges organized by our group are AMIDA13 [5] and TUPAC16 [6].

### Detection of lymph node metastases

The presence of metastases in sentinel axillary lymph nodes is an important component of breast cancer staging [7]. This is evaluated by pathologists through examination of histological preparations of lymph node tissue. Although manual assessment by a pathologist is the current standard practice, this procedure is notoriously tedious to perform and can result in missed metastases.

The move towards digitalization of histopathology labs with the introduction of whole-slide image scanners [8] has opened the possibility of automating some of the tasks performed by pathologists with the use of image analysis methods.

<img src="images/camelyon16_example.png" width=300>


### Setting up the working environment

Although you are free to use a programming language and environment of your own choice for the main project work, the examples in the assignments are in Python. If you need to refresh your Python programming skills, we recommend the following educational modules that we have prepared: https://github.com/tueimage/essential-skills 

The Python essentials module describes how to install the Anaconda Python distribution. The neural network examples in the assignments use the Keras neural networks API. We recommend installing Keras with Tensorflow as the underlying framework. Instructions on how to install the Tensorflow and Keras Python packages can be found at https://www.tensorflow.org/install and https://keras.io/#installation, but it should be as simple as typing the following commands in your command prompt:

`pip install tensorflow`

`pip install keras`

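Note that if you have a GPU that is supported by Tensorflow and you are using an older Tensorflow release, you should install the Tensorflow GPU package instead (since Tensorflow 2.1, the standard `tensorflow` package already includes GPU support and the separate GPU package is no longer needed):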

`pip install tensorflow-gpu`

This will move the training of the neural networks to the GPU, which is much faster than on the CPU. 
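After installation, it can be useful to check that the packages can be imported and whether Tensorflow sees a GPU. The snippet below is a minimal sketch of such a check; the helper names `check_environment` and `list_gpus` are our own and not part of any library, and `list_gpus` assumes Tensorflow 2.1 or newer, where `tf.config.list_physical_devices` is available.

```python
import importlib
import sys


def check_environment(packages=("tensorflow", "keras")):
    """Return a dict with the Python version and the version of each package.

    A package that cannot be imported is reported as None.
    """
    report = {"python": sys.version.split()[0]}
    for name in packages:
        try:
            module = importlib.import_module(name)
            report[name] = getattr(module, "__version__", "unknown")
        except ImportError:
            report[name] = None
    return report


def list_gpus():
    """Return the names of the GPUs visible to Tensorflow (empty list if none).

    Assumes Tensorflow 2.1 or newer; returns an empty list if Tensorflow
    is not installed.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return []
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    for name, version in check_environment().items():
        print(f"{name}: {version if version is not None else 'NOT INSTALLED'}")
    print("GPUs visible to Tensorflow:", list_gpus() or "none")
```

If a package is reported as not installed, or no GPU is listed on a machine that has one, revisit the installation instructions linked above before starting the project work.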


### References

[1] LeCun, Y., Bengio, Y., & Hinton, G., 2015. Deep learning. Nature, 521.

[2] Litjens, G., Kooi, T., Bejnordi, B.E., Setio, A.A.A., Ciompi, F., Ghafoorian, M., Van Der Laak, J.A., Van Ginneken, B. and Sánchez, C.I., 2017. A survey on deep learning in medical image analysis. Medical image analysis, 42.

[3] Bejnordi, B. E., Veta, M., Van Diest, P. J., Van Ginneken, B., Karssemeijer, N., Litjens, G., van der Laak, J. A. W. M., the CAMELYON16 Consortium, 2017. Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer. JAMA, 318.

[4] Van Ginneken, B. Why Challenges? https://grand-challenge.org/Why_Challenges

[5] Veta, M., et al., 2015. Assessment of algorithms for mitosis detection in breast cancer histopathology images. Medical image analysis, 20.

[6] Veta, M., et al., 2018. Predicting breast tumor proliferation from whole-slide images: the TUPAC16 challenge. arXiv preprint.

[7] Donegan, W.L., 1997. Tumor‐related prognostic factors for breast cancer. CA: a cancer journal for clinicians, 47.

[8] Madabhushi, A. and Lee, G., 2016. Image analysis and machine learning in digital pathology: Challenges and opportunities. Medical Image Analysis, 33.

[9] Veeling, B., The PatchCamelyon (PCam) deep learning classification benchmark. https://github.com/basveeling/pcam

[10] Histopathologic Cancer Detection. https://www.kaggle.com/c/histopathologic-cancer-detection


## Exercise 1

What is the clinical utility of evaluating the presence of metastases in sentinel lymph nodes in breast cancer patients? In other words, how is this information used in the clinical decision-making process for breast cancer patients?

## Exercise 2

The PatchCamelyon dataset is derived from the CAMELYON16 dataset of whole-slide images. Describe how a neural network classification model trained on small image patches can be applied to larger, whole-slide images with the goal of detecting metastases.

## Exercise 3

TODO Dataset explorations

## Submission checklist

- Exercise 1: Answers to the questions
- Exercise 2: Answers to the questions
- Exercise 3: Answers to the questions