Predicting TF Proteins by Incorporating Evolution Information Through PSSM

Keywords

Proteins, Feature Extraction, Deep Learning, Databases, Convolution, Predictive Models, Neural Networks

Transcription factors (TFs) are DNA-binding proteins involved in the regulation of gene expression. They exist in all organisms and activate or repress transcription by binding to specific DNA sequences. Traditionally, TFs have been identified by experimental methods that are time-consuming and costly. In recent years, various computational methods have been developed to identify TFs and overcome these limitations. However, there is room for further improvement in the predictive accuracy of these tools. We report here a novel computational tool, TFnet, that provides accurate and comprehensive TF predictions from protein sequences. The accuracy of these predictions is substantially higher than that of existing TF predictors and methods. In particular, TFnet significantly outperforms comparable methods when sequence similarity to other known sequences in the database drops below 40%. Ablation tests reveal that the high predictive performance stems from the novel ways in which TFnet derives the sequence Position-Specific Scoring Matrix (PSSM) and encodes its inputs.
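For readers unfamiliar with PSSMs, the following is a minimal illustrative sketch of what such a matrix contains and how it might be turned into a fixed-size model input. It is not TFnet's actual procedure, which the abstract does not detail. In practice PSSMs are commonly generated by profile-search tools such as PSI-BLAST; here a toy matrix is built from a hand-made alignment. The function names (`pssm_from_alignment`, `encode_pssm`), the uniform background frequency, the pseudocount, the sigmoid scaling, and the padding length are all assumptions chosen for illustration.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def pssm_from_alignment(aligned_seqs, pseudocount=1.0):
    """Return an L x 20 matrix of log-odds scores from equal-length aligned sequences.

    Assumption: a uniform background frequency of 1/20, which is a simplification.
    """
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for pos in range(length):
        # Collect the residues observed at this position, skipping gap characters.
        column = [s[pos] for s in aligned_seqs if s[pos] in AMINO_ACIDS]
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        row = []
        for aa in AMINO_ACIDS:
            freq = (column.count(aa) + pseudocount) / total
            row.append(math.log2(freq / background))
        pssm.append(row)
    return pssm


def encode_pssm(pssm, max_len):
    """Squash scores into (0, 1) with a sigmoid, then truncate or zero-pad to max_len rows."""
    scaled = [[1.0 / (1.0 + math.exp(-x)) for x in row] for row in pssm[:max_len]]
    padding = [[0.0] * len(AMINO_ACIDS) for _ in range(max_len - len(scaled))]
    return scaled + padding


# Hypothetical toy alignment of four short sequences ("-" marks a gap).
alignment = ["MKRA", "MKKA", "MRRA", "M-RG"]
pssm = pssm_from_alignment(alignment)
features = encode_pssm(pssm, max_len=6)
print(len(features), len(features[0]))  # prints: 6 20
```

Each row of the matrix describes one sequence position and each column one amino acid, so conserved positions (such as the leading M in the toy alignment) receive high scores for the conserved residue and negative scores elsewhere. A fixed-size matrix of this kind is one common way to feed evolutionary information to a convolutional network.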

Citation / Publisher Attribution

IEEE/ACM Transactions on Computational Biology and Bioinformatics, v. 20, issue 2, p. 1319-1326