Inspiration
Our motivation was the need to speed up sequence alignment and reduce its computational cost in bioinformatics. Traditional methods are time-consuming and expensive to implement.
What it does
Our AI focuses on improving multiple sequence alignment (MSA) for biological sequences to enhance tasks like detecting antimicrobial resistance and studying evolutionary relationships. Our model predicts optimal sequence alignment moves using features like sequence position, gap density, and mutation rate. By using a Deep Q-Network (DQN) and LSTM networks, the model captures sequential dependencies in biological data and optimizes alignment scores.
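As a rough illustration of the features mentioned above, the sketch below computes a relative position, a gap density, and a simple mismatch rate for each column of a toy alignment. The function name `column_features`, the toy sequences, and the use of "fraction of residues that differ from the most common one" as a stand-in for mutation rate are all assumptions made for this example, not the project's actual feature code.

```python
from collections import Counter

GAP = "-"  # assumed gap symbol


def column_features(alignment, col):
    """Return (relative position, gap density, mismatch rate) for one column.

    `alignment` is a list of equal-length strings. The mismatch rate is a
    simple proxy for mutation rate: the fraction of non-gap residues that
    differ from the most common residue in the column.
    """
    column = [seq[col] for seq in alignment]
    width = len(alignment[0])
    position = col / (width - 1) if width > 1 else 0.0
    gap_density = column.count(GAP) / len(column)
    residues = [c for c in column if c != GAP]
    if residues:
        most_common_count = Counter(residues).most_common(1)[0][1]
        mismatch_rate = 1.0 - most_common_count / len(residues)
    else:
        mismatch_rate = 0.0  # a column of only gaps has no residues to compare
    return position, gap_density, mismatch_rate


# Toy alignment (hypothetical data), one feature tuple per column.
alignment = ["ACG-T", "ACGAT", "A-GAT", "TCGAT"]
features = [column_features(alignment, c) for c in range(len(alignment[0]))]
```

Feature tuples like these could then be fed, column by column, into a sequence model such as an LSTM.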
How we built it
Using Jupyter Notebook, we created the AI based on the starter notebook. We started with a simple reinforcement learning model, but the need for more advanced feature extraction and optimization led to two improvements: a custom scoring system (Optimized Gearbox Score) and the combination of LSTM-based sequence prediction with reinforcement learning. This approach resulted in faster, more accurate alignments with higher scores, offering potential savings in time and computational resources.
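The Optimized Gearbox Score itself is not defined here, so the sketch below uses a plain sum-of-pairs score as a stand-in to show how an alignment score can drive a reinforcement learning reward. The match, mismatch, and gap values and the function names are hypothetical choices for illustration only.

```python
from itertools import combinations

# Hypothetical scoring values, not the project's actual parameters.
MATCH, MISMATCH, GAP_PENALTY = 1, -1, -2
GAP = "-"


def pair_score(a, b):
    """Score one pair of symbols from the same alignment column."""
    if a == GAP and b == GAP:
        return 0  # two gaps carry no information
    if a == GAP or b == GAP:
        return GAP_PENALTY
    return MATCH if a == b else MISMATCH


def sum_of_pairs(alignment):
    """Sum pair scores over every column and every pair of sequences."""
    total = 0
    for column in zip(*alignment):
        total += sum(pair_score(a, b) for a, b in combinations(column, 2))
    return total


# A reward for an alignment move can be the change in score it causes.
misaligned = ["ACGT-", "A-CGT"]
aligned = ["ACGT", "ACGT"]
reward = sum_of_pairs(aligned) - sum_of_pairs(misaligned)
```

Using the score difference as the reward means the agent is rewarded only for moves that improve the alignment, which is one common way to connect a scoring function to a DQN.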
Challenges we ran into
The most significant challenge was obtaining representative data that could be used to train the model. Specifically, confirmatory data often lacks detailed information on how well compounds align with their targets, which made it difficult to accurately predict the alignment scores. Additionally, finding appropriate features (such as mutation rates, gap density, or sequence entropy) that would reliably improve our predictions was not straightforward. Finally, the inherent complexity of biological data, especially sequence-based information, made the problem harder to model.
Accomplishments that we're proud of
Our team was able to build a functioning AI despite our lack of experience in machine learning.
What we learned
During the project, we learned a lot about machine learning and how it works. More specifically, we gained insight into using reinforcement learning to optimize sequence alignment. Some members also learned new concepts in medical and biological science and saw how those concepts, together with AI, are applied in real life.
What's next for CRICEPR
In the future, we plan to explore biological motifs, evolutionary conservation, and substitution matrices, and to integrate genetic algorithms to further refine the model. We also plan to test on larger datasets and apply the model to drug discovery pipelines.
Built With
- google-colab
- jupyter-notebook
- pytorch