Name: Poster 019: Classifying Enzyme Substrates Using Machine Learning
Start: 2026-04-30T14:00:00-0500
End: 2026-04-30T16:00:00-0500

Poster 019: Classifying Enzyme Substrates Using Machine Learning

Thursday April 30, 2026 2:00pm - 4:00pm CDT

Davies Center: Ojibwe Ballroom (330)

Knowing which types of enzymes a molecule will interact with can aid drug development by minimizing side effects caused by unwanted interactions. In this project, we built and interpreted models for classifying enzyme substrates, aiming to address a gap in current understanding: the properties that distinguish the substrates of each major enzyme class. We used the machine learning technique XGBoost in Python to build a predictive model for each enzyme class, using molecular descriptor data together with the top linear combinations of those data obtained by principal component analysis (PCA). We will discuss the algorithm we developed to automatically tune the XGBoost parameters and optimize each model. We will also present examples of how to interpret these models, both by using graphs to visualize the impact of variables in each model and by identifying common factors among the top contributing variables of significant principal components to characterize each enzyme class. For example, we found that the probability of a molecule interacting with oxidoreductase enzymes is positively associated with the number of nonpolar regions. One relevant descriptor is NumHeteroAtoms, the number of atoms in the molecule other than carbon and hydrogen, which was negatively associated with the probability of interacting with oxidoreductases.
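
The interpretation step described above, finding the top contributing variables of a principal component, can be illustrated with a minimal, dependency-free sketch. It is not the presenters' code: the descriptor table is synthetic, the column names MolWt and NumRings are hypothetical (only NumHeteroAtoms appears in the abstract), and the first component is found by plain power iteration rather than a library PCA routine, which a real analysis would more likely use.

```python
import math

# Synthetic, hypothetical descriptor table: one row per molecule.
# Only NumHeteroAtoms is named in the abstract; the other columns
# and every value here are illustrative assumptions.
names = ["MolWt", "NumHeteroAtoms", "NumRings"]
rows = [
    [100.0, 1.0, 1.0],
    [150.0, 2.0, 0.0],
    [200.0, 3.0, 2.0],
    [250.0, 4.0, 2.0],
    [300.0, 6.0, 0.0],
    [350.0, 6.0, 1.0],
]


def standardize(data):
    """Center each column and scale it to unit sample variance."""
    n = len(data)
    cols = list(zip(*data))
    means = [sum(c) / n for c in cols]
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in c) / (n - 1))
        for c, m in zip(cols, means)
    ]
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in data]


def covariance(data):
    """Sample covariance matrix of already-centered data."""
    n, p = len(data), len(data[0])
    return [
        [sum(r[i] * r[j] for r in data) / (n - 1) for j in range(p)]
        for i in range(p)
    ]


def first_component(cov, iters=500):
    """Power iteration: approximates the eigenvector of the largest eigenvalue.

    Assumes the all-ones start vector is not orthogonal to that eigenvector,
    which holds for this toy table.
    """
    p = len(cov)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def top_loadings(labels, component, k=2):
    """Return the k descriptors with the largest absolute loadings."""
    ranked = sorted(zip(labels, component), key=lambda t: abs(t[1]), reverse=True)
    return ranked[:k]


# Covariance of standardized data is the correlation matrix.
component = first_component(covariance(standardize(rows)))
top = top_loadings(names, component)

for name, loading in top:
    print(f"{name}: {loading:+.3f}")
```

In this toy table the two size-related columns move together, so they dominate the first component while NumRings contributes little. Reading a shared theme off the top loadings in this way is the kind of "common factor" interpretation the abstract refers to.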

Presenters