The development of Artificial Intelligence (AI) techniques and AI-powered tools in healthcare has advanced significantly over the years, whether in image analysis, robotics-assisted surgery, patient monitoring, medical device automation, personalized medicine or drug identification (see references 1 & 2).
During my nine-month training in data science, I worked on a project to develop an AI for image-based disease classification, built on a structured ophthalmic database of "real-life" patients.
The dataset was provided for the "Peking University International Competition on Ocular Disease Intelligent Recognition (ODIR-2019)" (source: https://odir2019.grand-challenge.org/) and consists of fundoscopy images of 5,000 patients from different hospitals and medical centers across China.
Fundoscopy is a cost-effective examination of the inner anatomical details of the eyeball that allows for the diagnosis of eye diseases and the identification of risk factors associated with vision loss.
Indeed, through the image-based analysis of the retina, it is possible to detect a broad range of alterations of the eye anatomy (see https://www.exetereye.co.uk/the-eye/eye-anatomy/ for further info).
As light enters the eye, it passes through the crystalline lens, which refracts it. With age, this lens can opacify, a condition named cataract, in which patients experience blurry vision and eventually blindness.
Once refracted, the light is focused onto the macula, a specific spot of the retina. This spot can also be altered with age, causing Age-related Macular Degeneration (AMD), a disease marked by vision impairment or loss.
Then, as it hits the retina, the light is converted into information transmitted to the brain through the optic nerve, which can be damaged in diseases such as glaucoma.
The retina is also home to the retinal arteries that supply it with blood. Diseases affecting the shape or diameter of the blood vessels, such as hypertension or diabetes, will affect these arteries and can eventually lead to blindness.
Moreover, any modification in the shape of the eyeball can prevent the light from being focused onto the retina, causing blurry vision. This happens in myopia, which is marked by choroidal thinning and morphologic changes in the retina.
The dataset provided for the ODIR-2019 challenge covers 5,000 patients and includes their sex, age, and the diagnostic keywords given by medical doctors for each of the left and right eye-fundi.
Based on these diagnostic keywords, labels are assigned to each patient.
The aim here is to develop an AI-based classification of the diseases from the dataset provided, using the patients' labels and color eye-fundus images.
1. Data organization
The dataset is patient-based, meaning each row contains all the data for one patient and thus includes both eye-fundi and their related information.
However, a disease may not affect both eyes at the same time or with the same intensity, if at all.
Moreover, pooling both of a patient's eye-fundi increases the number of possible combinations of diagnostic keywords, and thus of labels, which may in turn affect the efficiency of our classification model.
--> As we seek to classify the eye-fundi based on specific diseases, we will need a fundus-based dataset to allow our model to train on each fundus individually.
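A minimal sketch of that reshaping, using only the Python standard library. The column names (`ID`, `Patient Age`, `Patient Sex`, `Left-Fundus`, `Left-Diagnostic Keywords`, ...) and the sample row are assumptions made for illustration and should be checked against the actual annotation file.

```python
def patient_to_fundus_rows(patient):
    """Split one patient-level row into two fundus-level rows (left, right)."""
    rows = []
    for side in ("Left", "Right"):
        rows.append({
            "patient_id": patient["ID"],
            "age": patient["Patient Age"],
            "sex": patient["Patient Sex"],
            "eye": side.lower(),
            # Assumed column names: "<Side>-Fundus" and "<Side>-Diagnostic Keywords"
            "image": patient[f"{side}-Fundus"],
            "keywords": patient[f"{side}-Diagnostic Keywords"],
        })
    return rows


# Hypothetical patient row, for illustration only
patient = {
    "ID": 1,
    "Patient Age": 57,
    "Patient Sex": "Male",
    "Left-Fundus": "1_left.jpg",
    "Right-Fundus": "1_right.jpg",
    "Left-Diagnostic Keywords": "normal fundus",
    "Right-Diagnostic Keywords": "cataract",
}

fundus_rows = patient_to_fundus_rows(patient)
for row in fundus_rows:
    print(row["image"], "->", row["keywords"])
```

Each fundus keeps a reference to its patient (`patient_id`), so both eyes of one patient can later be kept in the same train or test split.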
2. Data labeling
Within the dataset are 8 columns related to labels: N, D, G, C, A, H, M and O, corresponding to the terms Normal, Diabetes, Glaucoma, Cataract, AMD, Hypertension, Myopia and Other diseases/abnormalities, respectively.
In the current dataset, the annotated labels are assigned to each patient based on the following rules:
(1) labels are determined based on the diagnostic keywords given by medical doctors;
(2) the Normal label is assigned to a given patient if and only if both his/her left and right diagnostic keywords are "normal fundus";
(3) disease-related labels are assigned whenever at least one of the two fundi is not diagnosed as "normal fundus";
(4) all suspected diseases or abnormalities are considered as fully diagnosed diseases or abnormalities;
(5) one patient may be assigned one or multiple labels.
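Rules (2), (3) and (5) can be turned into a quick consistency check on each patient-level row. The sketch below is illustrative only: the `left_keywords` / `right_keywords` field names, the `make_row` helper and the sample rows are hypothetical, and the label columns are assumed to hold 0/1 values.

```python
NORMAL = "normal fundus"
LABELS = ["N", "D", "G", "C", "A", "H", "M", "O"]


def rule_violations(row):
    """Return the list of labeling rules that a patient-level row breaks."""
    left_normal = row["left_keywords"].strip().lower() == NORMAL
    right_normal = row["right_keywords"].strip().lower() == NORMAL
    both_normal = left_normal and right_normal
    has_disease = any(row[label] for label in LABELS if label != "N")

    problems = []
    # Rule 2: Normal if and only if both fundi are "normal fundus"
    if bool(row["N"]) != both_normal:
        problems.append("rule 2")
    # Rule 3: a disease label is expected when at least one fundus is not normal
    if not both_normal and not has_disease:
        problems.append("rule 3")
    # Rule 5: every patient carries at least one label
    if not any(row[label] for label in LABELS):
        problems.append("rule 5")
    return problems


def make_row(left, right, **labels):
    """Hypothetical helper: build a row with all labels at 0 unless given."""
    row = dict.fromkeys(LABELS, 0)
    row.update(labels, left_keywords=left, right_keywords=right)
    return row


consistent = make_row("normal fundus", "cataract", C=1)
inconsistent = make_row("normal fundus", "cataract", N=1, C=1)

print(rule_violations(consistent))
print(rule_violations(inconsistent))
```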
--> 1) It seems important to look at the correspondence between the labels and their diagnostic keywords to assess the homogeneity of the terms and their assignment to a given label (especially with different hospitals and medical centers being involved).
--> 2/3) Since disease-related labels are assigned even if the diagnostic keywords of one of the two fundi are 'normal fundus', it seems important to treat each fundus separately.
--> 4) Finally, given that a disease-related label can be assigned even when the disease is only 'suspected', the dataset may contain images of different stages of the disease. Such disparity could create a bias when running our classification model.
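These checks can be sketched with a small keyword audit: a tally of which diagnostic keywords appear under each label, and a flag for keywords marked as suspected. The field names, the comma separator between keywords and the sample patients are assumptions for illustration; the real annotation file may use a different separator.

```python
from collections import Counter, defaultdict

LABELS = ["N", "D", "G", "C", "A", "H", "M", "O"]


def split_keywords(text):
    """Split a keyword string into cleaned, lower-cased keywords (comma assumed)."""
    return [k.strip().lower() for k in text.split(",") if k.strip()]


def keywords_per_label(patients):
    """Count, for each assigned label, the keywords seen on either fundus."""
    tally = defaultdict(Counter)
    for p in patients:
        keywords = split_keywords(p["left_keywords"]) + split_keywords(p["right_keywords"])
        for label in LABELS:
            if p[label]:
                tally[label].update(keywords)
    return tally


def has_suspected(text):
    """True if any keyword of a fundus is only a suspected diagnosis."""
    return any("suspected" in k for k in split_keywords(text))


# Hypothetical patients, for illustration only
patients = [
    {**dict.fromkeys(LABELS, 0), "G": 1,
     "left_keywords": "normal fundus", "right_keywords": "suspected glaucoma"},
    {**dict.fromkeys(LABELS, 0), "C": 1,
     "left_keywords": "cataract", "right_keywords": "cataract"},
    {**dict.fromkeys(LABELS, 0), "G": 1,
     "left_keywords": "glaucoma", "right_keywords": "normal fundus"},
]

tally = keywords_per_label(patients)
for label, counts in tally.items():
    print(label, counts.most_common())
```

In this toy example, "normal fundus" is counted under the G label because the label is attached to the patient rather than to the eye, which illustrates why each fundus is better treated on its own.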
3. Image quality
Finally, because the images come from different medical centers and hospitals, there are differences in their quality and size.
In addition, the organizers of the challenge indicate that the images listed below present background issues.
--> The decision was made to discard them from the dataset: 2174_right.jpg, 2175_left.jpg, 2176_left.jpg, 2177_left.jpg, 2177_right.jpg, 2178_right.jpg, 2179_left.jpg, 2179_right.jpg, 2180_left.jpg, 2180_right.jpg, 2181_left.jpg, 2181_right.jpg, 2182_left.jpg, 2182_right.jpg, 2957_left.jpg, 2957_right.jpg.
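The exclusion can be sketched as a simple filename filter. The excluded names are the ones listed above; the `keep_image` helper and the sample `filenames` list are hypothetical.

```python
# Images reported by the challenge organizers as having background issues
EXCLUDED_IMAGES = {
    "2174_right.jpg", "2175_left.jpg", "2176_left.jpg", "2177_left.jpg",
    "2177_right.jpg", "2178_right.jpg", "2179_left.jpg", "2179_right.jpg",
    "2180_left.jpg", "2180_right.jpg", "2181_left.jpg", "2181_right.jpg",
    "2182_left.jpg", "2182_right.jpg", "2957_left.jpg", "2957_right.jpg",
}


def keep_image(filename):
    """Return True if the image is not on the exclusion list."""
    return filename not in EXCLUDED_IMAGES


# Hypothetical sample of filenames, for illustration only
filenames = ["2173_left.jpg", "2174_left.jpg", "2174_right.jpg", "2957_left.jpg"]
kept = [name for name in filenames if keep_image(name)]
print(kept)
```

Filtering per image rather than per patient keeps the usable eye of a patient when only one of the two images is affected (for example `2174_left.jpg`).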
4. Image analysis
Given the broad range of diseases fundoscopy can detect, and knowing that their identification may rely on different key structural details of the retina, a "simple" analysis of non-segmented fundus images might be limiting.
--> To improve the classification performed by our future model, image segmentation may be required.
- Bajwa J. et al. Artificial intelligence in healthcare: transforming the practice of medicine. Future Healthcare Journal, 2021; 8(2): e188-e194.
- Bohr A. & Memarzadeh K. The rise of artificial intelligence in healthcare applications. Artificial Intelligence in Healthcare (book chapter), 2020, pages 25-60.