How Many Differentially Expressed Genes: A Perspective from the Comparison of Genotypic and Phenotypic Distances

Document Type


Publication Date



Microarray data analysis, Differentially expressed genes, Genetic distance, Genotypically and phenotypically significant DEGs, GPS-DEGs

Digital Object Identifier (DOI)



Identifying differentially expressed genes (DEGs) is critical in microarray data analysis. Many methods combine p-values, fold-change, and various statistical models to identify these genes, and using them requires setting various pre-determined cutoff values. Many of these cutoffs are somewhat arbitrary and may have no clear connection to biology. In this study, a genetic distance method based on gene expression levels was developed and used to analyze eight sets of microarray data extracted from the GEO database. Because the genes used in the distance calculation are ranked by fold-change, the genetic distance becomes more stable as more genes are added to the calculation, indicating that there is an optimal set of genes sufficient to characterize the stable difference between samples. This set constitutes the differentially expressed genes that represent both the genotypic and phenotypic differences between samples.
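The idea of a distance that stabilizes as fold-change-ranked genes are added can be illustrated with a minimal sketch. The abstract does not give the distance formula or the stability criterion, so everything below is an assumption made for illustration: the Euclidean distance on log2 expression values, the hypothetical function names `cumulative_distance` and `stable_gene_count`, and the rule that the distance is "stable" once it reaches a fraction `1 - tol` of its value over all genes. It should not be read as the authors' method.

```python
import math


def cumulative_distance(sample_a, sample_b):
    """Distance between two samples as genes are added in fold-change order.

    sample_a, sample_b: lists of log2 expression values, one entry per gene.
    Returns a list d where d[k-1] is the Euclidean distance computed over
    the k genes with the largest absolute log2 fold-change.
    ASSUMPTION: Euclidean distance; the source does not state the formula.
    """
    # Absolute log2 fold-change per gene, ranked from largest to smallest.
    diffs = sorted((abs(a - b) for a, b in zip(sample_a, sample_b)), reverse=True)
    distances = []
    running_sum = 0.0
    for d in diffs:
        running_sum += d * d
        distances.append(math.sqrt(running_sum))
    return distances


def stable_gene_count(distances, tol=0.05):
    """Smallest number of top-ranked genes whose distance is within a
    relative tolerance `tol` of the distance over all genes.

    ASSUMPTION: this plateau rule is a hypothetical stand-in for the
    stability criterion, which the source does not specify.
    """
    if not distances:
        return 0
    threshold = (1.0 - tol) * distances[-1]
    for k, d in enumerate(distances, start=1):
        if d >= threshold:
            return k
    return len(distances)


# Toy data: log2 expression of four genes in two samples (made up for the demo).
sample_a = [8.0, 5.0, 3.0, 7.0]
sample_b = [4.0, 5.1, 3.0, 6.0]
curve = cumulative_distance(sample_a, sample_b)
n_genes = stable_gene_count(curve, tol=0.01)
```

Because genes enter in descending order of fold-change, each added gene contributes less than the one before it, so the curve flattens; the number of genes at which it flattens is what the sketch treats as the candidate DEG set. On the toy data, a tolerance of 0.01 selects the top two genes.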

Citation / Publisher Attribution

Genomics, v. 110, issue 1, p. 67-73