ABLE: Attention Based Learning for Enzyme Classification

Abstract

Classifying proteins into their respective enzyme classes is a question of interest to researchers for a variety of reasons. The open-source Protein Data Bank (PDB) contains more than 160,000 structures, with more being added every day. This paper proposes an attention-based bidirectional-LSTM model (ABLE), trained on oversampled data generated by SMOTE, to analyse and classify a protein into one of the six enzyme classes or a negative class, using only the protein's primary structure, described as a string by its FASTA sequence, as input. We achieve the highest F1-score of 0.834 using our proposed model on a dataset of proteins from the PDB. We baseline our model against eighteen other machine learning and deep learning models, including CNN, LSTM, Bi-LSTM, GRU, and the state-of-the-art DeepEC model. We conduct experiments with two different oversampling techniques, SMOTE and ADASYN. To corroborate the obtained results, we perform extensive experimentation and statistical testing.
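The SMOTE oversampling step mentioned above can be illustrated with a minimal sketch: for each synthetic point, SMOTE picks a minority-class sample, selects one of its k nearest minority neighbours, and interpolates between the two. This is an illustrative toy implementation, not the paper's pipeline (which would typically use a library such as imbalanced-learn); the function name and parameters here are hypothetical.

```python
import numpy as np

def smote_sketch(X, n_synthetic, k=3, rng=None):
    """Hypothetical minimal SMOTE sketch (illustrative only).

    For each synthetic point: pick a random minority sample, choose one
    of its k nearest minority neighbours, and interpolate at a random
    point on the line segment between the two samples.
    """
    rng = np.random.default_rng(rng)
    synth = []
    for _ in range(n_synthetic):
        i = rng.integers(len(X))
        # Euclidean distances from X[i] to every other minority sample
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                      # exclude the sample itself
        nn_idx = np.argsort(d)[:k]         # indices of k nearest neighbours
        j = rng.choice(nn_idx)
        lam = rng.random()                 # interpolation factor in [0, 1)
        synth.append(X[i] + lam * (X[j] - X[i]))
    return np.array(synth)

# Toy minority class: four 2-D feature vectors in the unit square
minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
new_points = smote_sketch(minority, n_synthetic=6, k=2, rng=0)
print(new_points.shape)  # (6, 2)
```

Because every synthetic point is a convex combination of two real minority samples, the generated points stay inside the region spanned by the minority class rather than being drawn from an assumed parametric distribution.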

Publication
In Computational Biology and Chemistry
Vamsi Nallapareddy
Research Assistant at University College London

Research Interests: Computational Biology, Bioinformatics, and Deep Learning