Cluster, PONDR, Intrinsically, Disordered, Kidera, Unfolded, Pfam, Protein, Cancer, Diabetes, Phylogenetic, Mammals, Eukaryota, Viruses, Bacteria, Archaea, APOC1, ANFB, DBND1, BAALC, PPR1A, ATTY, DSS1, TR13C, MYBB, LZTS2, HNF1A, NFM, APC, BRCA2

The Pfam database groups regions of proteins according to how well hidden Markov models (HMMs) can be trained to recognize the similarities among them. Conservation pressure is probably at play here. The Pfam seed training set, drawn largely from the PDB, includes both sequence and structure information. A long-standing hypothesis among intrinsically disordered protein (IDP) investigators holds that conservation pressures also act on the evolution of different kinds of intrinsic disorder, but we find that predicted intrinsic disorder (PID) is not always conserved across Pfam domains. Here we analyze the distributions and clusters of PID regions in 193,024 members of the Pfam version 23.0 seed database. To capture the maximum information available for proteins that remain unfolded in solution, we combine the 10 linearly independent Kidera factors [1–3] for the amino acids with PONDR [4] predictions of disorder tendency, transforming the sequence of each Pfam member into an 11-column matrix whose number of rows equals the length of the Pfam region. Cluster analyses of the set of all regions, including folded ones, show 6 groupings of domains. Cluster analyses of domains with mean PONDR VSL2b scores greater than 0.5 (mean predicted disorder above one half) show at least 3 separated groups. We hypothesize that grouping the sets into shorter sequences of more uniform length will reveal more about intrinsic disorder and lead to finer-grained, and perhaps more accurate, predictions. HMMs could be trained to incorporate this information.
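The feature construction described above can be sketched in Python. This is a minimal sketch, not the authors' code: it assumes the caller supplies a table of Kidera factors (one list of 10 values per one-letter amino-acid code) and precomputed per-residue PONDR VSL2b scores, since neither the factor values nor a PONDR interface are given here. The helper names `region_matrix`, `mostly_disordered` and `region_summary` are hypothetical, and collapsing each matrix to its column means is only one assumed way to obtain fixed-length vectors for clustering; the abstract does not say how variable-length regions were compared.

```python
# Sketch only: builds the 11-column per-region matrix described above.
# ASSUMPTIONS: Kidera factor values and per-residue PONDR VSL2b scores are
# supplied by the caller; all helper names here are hypothetical.
from typing import Dict, List, Sequence

NUM_KIDERA = 10            # ten Kidera factors per amino acid
NUM_COLS = NUM_KIDERA + 1  # plus one disorder-score column

Matrix = List[List[float]]


def region_matrix(seq: str,
                  kidera: Dict[str, Sequence[float]],
                  disorder: Sequence[float]) -> Matrix:
    """One row per residue: 10 Kidera factors followed by the disorder score."""
    if len(seq) != len(disorder):
        raise ValueError("need exactly one disorder score per residue")
    matrix: Matrix = []
    for aa, score in zip(seq, disorder):
        factors = kidera[aa]  # KeyError if the residue has no factor entry
        if len(factors) != NUM_KIDERA:
            raise ValueError(f"expected {NUM_KIDERA} Kidera factors for {aa!r}")
        matrix.append([float(f) for f in factors] + [float(score)])
    return matrix


def mostly_disordered(matrix: Matrix, threshold: float = 0.5) -> bool:
    """True when the mean of the disorder (last) column exceeds threshold."""
    if not matrix:
        raise ValueError("empty region")
    return sum(row[-1] for row in matrix) / len(matrix) > threshold


def region_summary(matrix: Matrix) -> List[float]:
    """Column means: one assumed way to get a fixed-length vector per region."""
    if not matrix:
        raise ValueError("empty region")
    n = len(matrix)
    return [sum(row[j] for row in matrix) / n for j in range(NUM_COLS)]


if __name__ == "__main__":
    # Placeholder numbers, NOT real Kidera factors.
    toy = {"A": [0.1] * 10, "G": [-0.2] * 10}
    m = region_matrix("AGA", toy, [0.9, 0.7, 0.2])
    print(len(m), len(m[0]), mostly_disordered(m))  # 3 11 True
```

With these placeholder values, the three-residue region yields a 3 × 11 matrix whose mean disorder score of 0.6 passes the 0.5 filter used for the disordered subset.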

Rights Information

This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 License

Citation / Publisher Attribution

Intrinsically Disordered Proteins, v. 1, issue 1, art. e25724