Imputation of Missing Genotypes with Intelligent K-Nearest Neighbor Algorithm

Vanaei, Fatemeh; Ghafouri-Kesbi, Farhad; Zamani, Pouya; Ahmadi, Ahmad

doi:10.52547/rap.13.35.130

Volume 13, Issue 35 (3-2022) Res Anim Prod 2022, 13(35): 130-138

DOR: 20.1001.1.22518622.1401.13.35.9.5

Vanaei F, Ghafouri-Kesbi F, Zamani P, Ahmadi A. (2022). Imputation of Missing Genotypes with Intelligent K-Nearest Neighbor Algorithm. Res Anim Prod. 13(35), 130-138. doi:10.52547/rap.13.35.130
URL: http://rap.sanru.ac.ir/article-1-1199-en.html

Fatemeh Vanaei¹, Farhad Ghafouri-Kesbi*¹, Pouya Zamani¹, Ahmad Ahmadi¹

1- Department of Animal Science, Faculty of Agriculture, Bu-Ali Sina University, Hamedan

Abstract:

Extended Abstract
Introduction and Objective: Genotype imputation has attracted attention in genomic selection schemes in recent years because it can reduce the cost of genomic selection without reducing its accuracy. In genotype imputation, markers whose genotypic information is missing for any reason are imputed using statistical methods.
Material and Methods: To construct the genotypic matrix, a one-Morgan genome consisting of one chromosome was simulated for populations of 250 and 1000 individuals, on which, in different scenarios, 250, 500, 750, 1000, 1500 and 2000 single nucleotide polymorphisms (SNPs) were distributed. To create genomic matrices with missing genotypes, the genotypic information of 5%, 10%, 25%, 50%, 75% and 90% of SNPs, respectively, was masked and then imputed with KNN. The percentage of correctly imputed genotypes (the ratio of correctly imputed genotypes to total masked genotypes) and the correlation between the original genotypic matrix (no missing genotypes) and the imputed genotypic matrix were used as measures of imputation accuracy.
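The mask-impute-score workflow described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the abstract does not describe the simulation software, the distance measure, the value of k, or what makes the KNN variant "intelligent". The toy population generator, the mean absolute allele-count distance, the majority vote among neighbors, and all function names and parameter values below are assumptions made for the example.

```python
import random
from collections import Counter

MISSING = None  # marker for a masked genotype (assumption: any sentinel would do)


def simulate_genotypes(n_ind, n_snp, n_founders, rng):
    """Toy stand-in for the simulated genome: noisy copies of a few founders.
    Genotypes are coded 0/1/2 (copies of one allele). NOT the paper's simulation."""
    founders = [[rng.choice((0, 1, 2)) for _ in range(n_snp)] for _ in range(n_founders)]
    population = []
    for _ in range(n_ind):
        base = rng.choice(founders)
        # Re-draw roughly 5% of markers so individuals are similar, not identical.
        population.append([g if rng.random() > 0.05 else rng.choice((0, 1, 2)) for g in base])
    return population


def mask_genotypes(geno, missing_rate, rng):
    """Hide a random fraction of cells (assumption: masking is cell-wise at random)."""
    masked = [row[:] for row in geno]
    cells = [(i, j) for i in range(len(geno)) for j in range(len(geno[0]))]
    hidden = rng.sample(cells, int(round(missing_rate * len(cells))))
    for i, j in hidden:
        masked[i][j] = MISSING
    return masked, hidden


def distance(a, b):
    """Mean absolute allele-count difference over markers observed in both rows."""
    shared = [(x, y) for x, y in zip(a, b) if x is not MISSING and y is not MISSING]
    if not shared:
        return float("inf")
    return sum(abs(x - y) for x, y in shared) / len(shared)


def knn_impute(masked, k):
    """Fill each missing cell with the most common genotype among the k nearest
    individuals that have that marker observed."""
    imputed = [row[:] for row in masked]
    for i, row in enumerate(masked):
        if all(g is not MISSING for g in row):
            continue
        # Rank every other individual by distance to individual i.
        order = sorted((distance(row, other), idx)
                       for idx, other in enumerate(masked) if idx != i)
        for j, g in enumerate(row):
            if g is not MISSING:
                continue
            votes = [masked[idx][j] for _, idx in order if masked[idx][j] is not MISSING][:k]
            # Fallback of 1 (heterozygote) if nobody has the marker: an arbitrary choice.
            imputed[i][j] = Counter(votes).most_common(1)[0][0] if votes else 1
    return imputed


def proportion_correct(truth, imputed, hidden):
    """Ratio of correctly imputed genotypes to total masked genotypes."""
    return sum(truth[i][j] == imputed[i][j] for i, j in hidden) / len(hidden)


def matrix_correlation(truth, imputed):
    """Pearson correlation between the flattened original and imputed matrices."""
    xs = [g for row in truth for g in row]
    ys = [g for row in imputed for g in row]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


rng = random.Random(1)
truth = simulate_genotypes(n_ind=60, n_snp=40, n_founders=4, rng=rng)
masked, hidden = mask_genotypes(truth, missing_rate=0.25, rng=rng)
imputed = knn_impute(masked, k=5)
accuracy = proportion_correct(truth, imputed, hidden)
correlation = matrix_correlation(truth, imputed)
print(f"proportion correct: {accuracy:.2f}, matrix correlation: {correlation:.2f}")
```

The numbers printed by this toy example are not comparable with the accuracies reported in the study, because the toy population has a much simpler structure than a simulated genome with linkage between markers.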
Results: In the population of 250 individuals, imputation accuracy in the scenarios with 5%, 10%, 25%, 50%, 75% and 90% missing genotypes was 0.82, 0.82, 0.80, 0.76, 0.62 and 0.40, respectively. When the population size was increased to 1000 individuals, the corresponding accuracies were 0.83, 0.83, 0.82, 0.82, 0.71 and 0.54; the gain was most noticeable in the scenarios with 75% and 90% missing genotypes. The correlation between the original genotypic matrix and the imputed genotypic matrix also decreased as the percentage of missing genotypes increased. At a fixed population size, increasing the number of SNPs from 250 to 2000 raised imputation accuracy from 0.67 to 0.84. In addition, an inverse relationship was observed between minor allele frequency (MAF) and imputation accuracy: as MAF increased from 0.01 to 0.5, imputation accuracy decreased by 15%. Computation time increased with the dimensions of the genotypic matrix. As the percentage of missing genotypes increased, the accuracy of predicted genomic breeding values decreased. In the scenarios with 5% and 10% missing genotypes, no change in accuracy was observed, but in the scenarios with 75% and 90% missing genotypes, the accuracy of predicted breeding values decreased by 16% and 32%, respectively.
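To make the effect of population size easier to read, the reported accuracies for the two population sizes can be placed side by side and the gain computed per scenario. The short Python sketch below uses only the values stated in the Results; the variable names are chosen for this example.

```python
# Imputation accuracies as reported in the Results, by percentage of missing genotypes.
missing_rates = [5, 10, 25, 50, 75, 90]
acc_250 = [0.82, 0.82, 0.80, 0.76, 0.62, 0.40]   # population of 250 individuals
acc_1000 = [0.83, 0.83, 0.82, 0.82, 0.71, 0.54]  # population of 1000 individuals

# Absolute gain in accuracy from enlarging the population, rounded to two decimals.
gains = [round(big - small, 2) for small, big in zip(acc_250, acc_1000)]

for rate, small, big, gain in zip(missing_rates, acc_250, acc_1000, gains):
    print(f"{rate:>2}% missing: n=250 {small:.2f} | n=1000 {big:.2f} | gain {gain:+.2f}")
```

The gain is at most 0.02 up to 25% missing genotypes and grows to 0.09 and 0.14 in the 75% and 90% scenarios, which matches the statement that the larger population helps most when the majority of genotypes are missing.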
Conclusion: In general, the imputation accuracy of KNN was acceptable: with up to 50% missing genotypes, KNN imputed missing genotypes with approximately 80% accuracy. This algorithm can therefore be recommended for genomic selection schemes.

Keywords: Genotype imputation, K-nearest neighbor, Minor allele frequency, Single nucleotide polymorphism

Type of Study: Research | Subject: Animal Genetics and Breeding
Received: 2021/05/5 | Accepted: 2021/12/7

Rights and permissions
	This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Journal: Research On Animal Production