The UK Biobank is a large-scale, population-based cohort study involving over 500,000 UK residents aged 40–69 years. Recruitment took place between 2006 and 2010, during which participants completed extensive health questionnaires, underwent detailed physical assessments, and provided biological samples. The study tracked health-related events through data linkage to hospital admission records and mortality registries. In 2009, the UK Biobank introduced ophthalmic examinations to further expand the scope of the collected data[1].
Homepage of the dataset: https://www.ukbiobank.ac.uk/
The data we currently possess mainly includes: Retinal images, and health-realted data.
Dataset path: /scratch/users/nus/hongyu.h
Dataset path: /scratch/users/nus/hongyu.h/SERI_UKbiobank/1_Fundus_Images
UKBB_FP_01 : 46514
UKBB_FP_02 : 40826
UKBB_FP_03 : 38786
UKBB_FP_04 : 49705
UKB_new_2024: 1169 + 949 + 1154 + 1 +923 = 4196
Total count of retinal images:180027
Interpretation of the naming format: Participants ID_Data field_Instances_Array
Detailed information is available on the official website.
1017168_21016_2_1.png |
1021021_21015_2_1.png |
Dataset path: /scratch/users/nus/hongyu.h/SERI_UKbiobank/2_Phenptype
The raw data we have is stored in the "/0_Header_Tab" folder, and the other folders contain the organized data required for other projects.
In the "/0_Header_Tab" folder, we have header.txt file and its corresponding xxx.tab file. The .tab file serves as the main data file, and the header file is created by extracting the header from the main data file.
Therefore, if you want to find the data needed for your project, you can search for the field ID on the UK Biobank website and then check if the header files contain the field ID.
If we want to extract ethnic data, we need to go to the UKBB website and find the field ID for ethnic data.
It can be found that the field ID for ethnic data is 21000, and detailed information about the ethnic data can also be obtained.
Next, let's find the location of the ethnic data within the header file.
grep -n '21000' *.txt
We can obtain the search result
D3_ukb41252_tab_8330_header.txt:4737: 4737 f.20160.0.0
D3_ukb41252_tab_8330_header.txt:4738: 4738 f.20160.1.0
D3_ukb41252_tab_8330_header.txt:4739: 4739 f.20160.2.0
D3_ukb41252_tab_8330_header.txt:4740: 4740 f.20160.3.0
According to the ethnic data detailed information, we select 'f.20160.0.0.', which is located at row 4737 in D3_ukb41252_tab_8330_header.txt.
Then, we can extract the ethnic data from the corresponding tab file.
cut -f 1,4737 /scratch/users/nus/hongyu.h/SERI_UKBiobank/2_Phenotype/ukb41252.tab > xxx/ethnic.txt
Through the simple steps mentioned above, we can locate and extract the necessary data for the project.
I've created a "3_Project" folder to help keep our project data organized. Feel free to create your own subfolders within it to store any data you've prepared for your specific needs.
To make things easier, I've also set up a "basic" folder that has essential data like gender and ethnic, since most of our projects rely on this info. Having it all in one place should save us some time and effort.
Let me know if you have any thoughts or suggestions!
[1] Zhu, Zhuoting, et al. "Retinal age gap as a predictive biomarker for mortality risk." British Journal of Ophthalmology 107.4 (2023): 547-554.


