Taxonomic Landscape of the Dark Proteomes: Whole-Proteome Scale Interplay Between Structural Darkness, Intrinsic Disorder, and Crystallization Propensity

Document Type


Publication Date



dark proteomes, intrinsic disorder, protein universe, structural darkness, X-ray crystallography

Digital Object Identifier (DOI)


Growth rate of the protein sequence universe dramatically exceeds the speed of expansion for the protein structure universe, generating an immense dark proteome that includes proteins with unknown structure. A whole-proteome scale analysis of 5.4 million proteins from 987 proteomes in the three domains of life and viruses to systematically dissect an interplay between structural coverage, degree of putative intrinsic disorder, and predicted propensity for structure determination is performed. It has been found that Archaean and Bacterial proteomes have relatively high structural coverage and low amounts of disorder, whereas Eukaryotic and Viral proteomes are characterized by a broad spread of structural coverage and higher disorder levels. The analysis reveals that dark proteomes (i.e., proteomes containing high fractions of proteins with unknown structure) have significantly elevated amounts of intrinsic disorder and are predicted to be difficult to solve structurally. Although the majority of dark proteomes are of viral origin, many dark viral proteomes have at least modest crystallization propensity and only a handful of them are enriched in the intrinsic disorder. The disorder, structural coverage, and propensity are mapped for structural determination onto a novel proteome-level sequence similarity network to analyze the interplay of these characteristics in the taxonomic landscape.

Was this content written or created while at USF?


Citation / Publisher Attribution

PROTEOMICS, v. 18, issue 21-22, art. 1800243