Molecular Medicine Faculty Publications

Predictive Power Estimation Algorithm (PPEA) - A New Algorithm to Reduce Overfitting for Genomic Biomarker Discovery

Jiangang Liu, Indiana University School of Medicine
Robert A. Jolly, Lilly Research Laboratories
Aaron T. Smith, Lilly Research Laboratories
George H. Searfoss, Lilly Research Laboratories
Keith M. Goldstein, Lilly Research Laboratories
Vladimir N. Uversky, University of South Florida
Keith Dunker, Indiana University School of Medicine
Shuyu Li, Indiana University School of Medicine
Craig E. Thomas, Lilly Research Laboratories
Tao Wei, Lilly Research Laboratories

Document Type

Article

Publication Date

2011

Keywords

Necrosis, Predictive Toxicology, Toxicity, Microarrays, Algorithms, Inflammation, Biomarkers, Drug Interactions

Digital Object Identifier (DOI)

https://doi.org/10.1371/journal.pone.0024233

Abstract

Toxicogenomics promises to aid in predicting adverse effects, understanding the mechanisms of drug action or toxicity, and uncovering unexpected or secondary pharmacology. However, modeling adverse effects using high-dimensional, high-noise genomic data is prone to overfitting. Models constructed from such data sets often include a large number of genes with no obvious functional relevance to the biological effect they are intended to predict, which can make the modeling results difficult to interpret. To address these issues, we developed a novel algorithm, the Predictive Power Estimation Algorithm (PPEA), which estimates the predictive power of each individual transcript through an iterative two-way bootstrapping procedure. By enforcing, in every iteration of modeling and testing, that the number of samples is larger than the number of transcripts, PPEA reduces the risk of overfitting. We show with three different case studies that: (1) PPEA can quickly derive a reliable rank order of the predictive power of individual transcripts in a relatively small number of iterations, (2) the top-ranked transcripts tend to be functionally related to the phenotype they are intended to predict, (3) using only the most predictive top-ranked transcripts greatly facilitates the development of multiplex assays, such as qRT-PCR, as biomarkers, and (4) most importantly, we demonstrate that a small number of genes identified from the top-ranked transcripts are highly predictive of phenotype, as their expression changes distinguished adverse from nonadverse effects of compounds in completely independent tests. Thus, we believe that the PPEA model effectively addresses the overfitting problem and can be used to facilitate genomic biomarker discovery for predictive toxicology and drug responses.
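
The abstract does not give the full procedure, so the following is only a minimal Python sketch of the general idea it describes, not the authors' implementation. It assumes that "two-way bootstrapping" means resampling samples (with replacement, keeping the unselected ones as an out-of-bag test set) and drawing a random subset of transcripts in each iteration, with the subset capped below the number of distinct training samples. The nearest-centroid classifier, the scoring rule (a transcript's predictive power is the mean out-of-bag accuracy of the models it took part in), and all function and parameter names (`estimate_predictive_power`, `n_iter`, `n_features`) are illustrative assumptions.

```python
import random
from statistics import mean


def nearest_centroid_fit(X, y, features):
    """Per-class mean of the selected transcripts (ASSUMPTION: a deliberately simple classifier)."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(r[f] for r in rows) for f in features]
    return centroids


def nearest_centroid_predict(centroids, x, features):
    """Assign x to the class whose centroid is closest (squared Euclidean distance)."""
    def dist(label):
        return sum((x[f] - c) ** 2 for f, c in zip(features, centroids[label]))
    return min(centroids, key=dist)


def estimate_predictive_power(X, y, n_iter=200, n_features=5, seed=0):
    """Hypothetical PPEA-style sketch: returns (ranking, scores) for transcripts.

    X is a list of samples, each a list of transcript values; y holds class labels.
    """
    rng = random.Random(seed)
    n_samples, n_transcripts = len(X), len(X[0])
    credit = [[] for _ in range(n_transcripts)]
    for _ in range(n_iter):
        # Way 1: bootstrap the samples; unselected samples form the out-of-bag test set.
        train_idx = [rng.randrange(n_samples) for _ in range(n_samples)]
        oob_idx = sorted(set(range(n_samples)) - set(train_idx))
        train_y = [y[i] for i in train_idx]
        if not oob_idx or len(set(train_y)) < 2:
            continue  # cannot fit or test in this round
        # Way 2: draw a transcript subset, keeping #transcripts < #distinct training samples.
        k = min(n_features, len(set(train_idx)) - 1, n_transcripts)
        features = rng.sample(range(n_transcripts), k)
        model = nearest_centroid_fit([X[i] for i in train_idx], train_y, features)
        correct = sum(nearest_centroid_predict(model, X[i], features) == y[i] for i in oob_idx)
        accuracy = correct / len(oob_idx)
        for f in features:
            credit[f].append(accuracy)
    # ASSUMPTION: predictive power = mean out-of-bag accuracy of the models a transcript joined.
    scores = [mean(c) if c else 0.0 for c in credit]
    ranking = sorted(range(n_transcripts), key=lambda f: scores[f], reverse=True)
    return ranking, scores
```

For example, on synthetic data in which only transcript 0 separates the two classes and the other transcripts are noise, transcript 0 should rank first, because every model that includes it classifies the out-of-bag samples correctly while models built only from noise transcripts hover around chance. Capping `k` below the number of distinct training samples is how this sketch mirrors the abstract's requirement that samples outnumber transcripts in every iteration.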

Was this content written or created while at USF?

Yes

Citation / Publisher Attribution

PLoS ONE, v. 6, issue 9, art. e24233

© 2011 Liu et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Scholar Commons Citation

Liu, Jiangang; Jolly, Robert A.; Smith, Aaron T.; Searfoss, George H.; Goldstein, Keith M.; Uversky, Vladimir N.; Dunker, Keith; Li, Shuyu; Thomas, Craig E.; and Wei, Tao, "Predictive Power Estimation Algorithm (PPEA) - A New Algorithm to Reduce Overfitting for Genomic Biomarker Discovery" (2011). Molecular Medicine Faculty Publications. 454.
https://digitalcommons.usf.edu/mme_facpub/454

Included in

Medicine and Health Sciences Commons