Category_Comparison_Algorithim

This little algorithm finds pairs of individuals in a data frame (generated from the main file) and compares them according to a scoring system. The algorithm itself came about as my solution to the following question.

Assume a cohort of 1000 individuals with ages ranging from 20 to 40, M/F gender, and 8 chromosomes that may be of African/Asian/European ancestry (e.g., 25-M-Af-Eu-Af-As-Eu-Eu-As-As). Your goal is to develop Matlab/R code that will optimize the pairing of the most similar individuals. The scoring scheme is as follows: every category/chromosome where the individuals match is worth one point (an age difference of <= 5 years is considered a match). An identical pair of individuals is worth 10 points and the maximum score you can get is 10,000. Leaving individuals unpaired has a penalty of 5 points per individual. Include code to generate this dataset and some plot showing the scoring distribution of your code for multiple datasets.
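To make the question concrete, here is a minimal R sketch of the dataset and the per-pair score. It is not taken from "main.R": the function names `generate_cohort` and `pair_score`, the column names `age`, `gender` and `chr1` to `chr8`, and the uniform random sampling are all assumptions made for illustration.

```r
# Hypothetical helper (not from main.R): simulate a cohort as described in the question.
# Assumes ages, genders and ancestries are drawn uniformly at random.
generate_cohort <- function(n = 1000, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  ancestries <- c("Af", "As", "Eu")
  # n x 8 character matrix, one column per chromosome
  chromosomes <- matrix(sample(ancestries, n * 8, replace = TRUE), nrow = n,
                        dimnames = list(NULL, paste0("chr", 1:8)))
  data.frame(age = sample(20:40, n, replace = TRUE),
             gender = sample(c("M", "F"), n, replace = TRUE),
             chromosomes,
             stringsAsFactors = FALSE)
}

# Hypothetical helper: score two one-row data frames.
# One point if the ages are within 5 years, one point for matching gender,
# and one point for each of the 8 chromosomes with the same ancestry.
pair_score <- function(a, b) {
  chr_cols <- paste0("chr", 1:8)
  age_match <- abs(a$age - b$age) <= 5
  gender_match <- a$gender == b$gender
  chr_matches <- sum(unlist(a[chr_cols]) == unlist(b[chr_cols]))
  as.integer(age_match) + as.integer(gender_match) + chr_matches
}

cohort <- generate_cohort(n = 1000, seed = 42)
head(cohort)
pair_score(cohort[1, ], cohort[2, ])
pair_score(cohort[1, ], cohort[1, ])  # an individual compared with itself scores 10
```

With ten categories in total (age, gender and 8 chromosomes), a pair that matches on everything reaches the 10 points mentioned in the question.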

Steps carried out by the R script for the comparison algorithm

1. Generate the data frame.
2. Create a distance matrix for all pairs of individuals according to the age-difference metric.
3. Create a function to calculate the score of two matched rows.
4. Make a 1000 x 1000 matrix marking where the age difference is greater than 5.
5. Make a 1000 x 1000 matrix of all normalised scores between pairs.
6. Add the two 1000 x 1000 matrices together and remove all entries greater than 1.
7. Convert the matrix of scores back to unnormalised form.
8. Use this matrix to collect all pairs corresponding to the scores 10, 7, 6, 5, 4, 3, 2, 1 and -5.
9. Remove duplicates from the 10s data frame, remove duplicates from the 7s data frame, remove individuals already in the 10s from the 7s, combine both data frames, and remove any remaining duplicates.
10. Repeat step 9 for the 6s, 5s, 4s, 3s, 2s, 1s and -5s data frames.
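The general idea behind these steps (score every pair, then fill the pairing tier by tier from the highest score down) can be sketched as below. This is a simplified illustration under stated assumptions, not the code in "main.R": it skips the normalisation and masking steps, uses a reduced cohort of 200 so it runs in seconds, and all variable names are made up for the example. The total is computed as the sum of pair scores minus 5 per unpaired individual, which is one possible reading of the scoring scheme.

```r
set.seed(1)
n <- 200  # assumption: reduced cohort for a quick run; the README describes 1000
ancestries <- c("Af", "As", "Eu")
age <- sample(20:40, n, replace = TRUE)
gender <- sample(c("M", "F"), n, replace = TRUE)
chrom <- matrix(sample(ancestries, n * 8, replace = TRUE), nrow = n)

# n x n matrix of raw scores: age within 5 years, same gender, same ancestry per chromosome
score <- (abs(outer(age, age, "-")) <= 5) + outer(gender, gender, "==")
for (k in 1:8) {
  score <- score + outer(chrom[, k], chrom[, k], "==")
}
# Keep each unordered pair once and drop self-pairs
score[lower.tri(score, diag = TRUE)] <- NA

# Greedy tiered pairing: take score-10 pairs first, then 9, and so on down to 1,
# skipping any individual that has already been paired in a higher tier
paired <- rep(FALSE, n)
pair_i <- integer(0)
pair_j <- integer(0)
pair_s <- integer(0)
for (s in 10:1) {
  idx <- which(score == s, arr.ind = TRUE)
  for (r in seq_len(nrow(idx))) {
    i <- unname(idx[r, 1])
    j <- unname(idx[r, 2])
    if (!paired[i] && !paired[j]) {
      paired[c(i, j)] <- TRUE
      pair_i <- c(pair_i, i)
      pair_j <- c(pair_j, j)
      pair_s <- c(pair_s, s)
    }
  }
}
pairs <- data.frame(i = pair_i, j = pair_j, score = pair_s)

# Assumed total: pair scores minus a 5-point penalty per unpaired individual
total <- sum(pairs$score) - 5 * sum(!paired)
table(pairs$score)  # scoring distribution of the matched pairs
total
```

A greedy pass like this is not guaranteed to find the optimal pairing. It trades optimality for simplicity, since each high-scoring pair that is accepted may block a better combination later on.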

The R script "main.R" takes about 10 minutes to run. It then generates a .xlsx file showing all the matched pairs and their corresponding score. The "run_results_summary.xlsx" file shows the number of matched pairs, total score, median and mean of 15 different runs of the "main.R" script.

