Start Date

5-12-2025 12:00 PM

End Date

5-12-2025 1:00 PM

Description

Protein function prediction is central to bioinformatics applications such as drug discovery and disease research. Experimental methods for determining protein function are accurate but typically cumbersome and expensive, which motivates computational approaches. This study explores deep learning methods that predict protein function from amino acid sequences, structural information, and protein interaction data. The proposed framework combines Convolutional Neural Networks, which capture local sequence patterns, with Graph Neural Networks, which model global structural relationships and protein interactions. Protein language models such as ESM-1b are also used to extract meaningful representations from protein sequences. The system predicts functions in the three Gene Ontology categories: Biological Process, Molecular Function, and Cellular Component. Model evaluation, both completed and planned, uses CAFA metrics including F-max, Area Under the Precision-Recall curve (AUPR), and S-min. These metrics assess prediction accuracy and the semantic distance between predicted and true annotations, providing a standardized way to compare predictive models. The study also considers integrating sequence-based and structure-based information to capture both local and global characteristics of proteins. By combining these complementary representations, the framework aims to better capture the complex biological patterns that influence protein function. Future work aims to extend these methods to intrinsically disordered proteins and regions (IDPs/IDRs), which lack stable structures but still play important biological roles in cellular signalling, gene regulation, and disease mechanisms. Understanding these proteins and regions remains a major challenge in computational biology, and improving predictive models for them may contribute to more effective biological analysis and biomedical applications.
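
As an illustration of two of the CAFA metrics named above, the following is a minimal sketch of protein-centric F-max and S-min in plain Python. It is not the study's evaluation code. The function name `fmax_smin`, the threshold grid (0.01 to 1.00), the placeholder term labels, and the information-content values are all assumptions made for this example; a real evaluation would also propagate annotations up the Gene Ontology graph before scoring, which is omitted here.

```python
from math import sqrt


def fmax_smin(predictions, truths, ic, thresholds=None):
    """Protein-centric F-max and S-min (illustrative sketch).

    predictions: {protein: {term: score in [0, 1]}}
    truths:      {protein: set of true terms}, each set assumed non-empty
    ic:          {term: information content}, assumed to be given
    """
    if thresholds is None:
        # Assumed grid of decision thresholds: 0.01, 0.02, ..., 1.00
        thresholds = [i / 100 for i in range(1, 101)]

    proteins = list(truths)
    n = len(proteins)
    best_f, best_s = 0.0, float("inf")

    for t in thresholds:
        prec_sum, rec_sum = 0.0, 0.0
        covered = 0          # proteins with at least one prediction at t
        ru, mi = 0.0, 0.0    # remaining uncertainty, misinformation

        for p in proteins:
            true_terms = truths[p]
            scores = predictions.get(p, {})
            pred_terms = {term for term, s in scores.items() if s >= t}
            hits = pred_terms & true_terms

            if pred_terms:
                covered += 1
                prec_sum += len(hits) / len(pred_terms)
            rec_sum += len(hits) / len(true_terms)

            # Information content of missed and of wrongly predicted terms
            ru += sum(ic[f] for f in true_terms - pred_terms)
            mi += sum(ic[f] for f in pred_terms - true_terms)

        # Recall averages over all proteins; precision only over covered ones
        recall = rec_sum / n
        if covered:
            precision = prec_sum / covered
            if precision + recall > 0:
                f_score = 2 * precision * recall / (precision + recall)
                best_f = max(best_f, f_score)

        best_s = min(best_s, sqrt((ru / n) ** 2 + (mi / n) ** 2))

    return best_f, best_s


# Toy data: term labels and IC values are placeholders, not real GO terms.
truths = {"P1": {"GO:A", "GO:B"}, "P2": {"GO:C"}}
predictions = {
    "P1": {"GO:A": 0.9, "GO:B": 0.4, "GO:C": 0.2},
    "P2": {"GO:C": 0.7, "GO:A": 0.6},
}
ic = {"GO:A": 1.0, "GO:B": 2.0, "GO:C": 1.5}

f_max, s_min = fmax_smin(predictions, truths, ic)
print(f"F-max={f_max:.3f} S-min={s_min:.3f}")  # prints F-max=0.857 S-min=0.500
```

Higher F-max is better, while lower S-min is better. S-min weights each error by how informative the term is, which is why it complements a purely count-based F-measure.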

Dec 5th, 12:00 PM Dec 5th, 1:00 PM

Evaluation of Deep Learning Approaches for Protein Function Prediction
