torischerle/QSAR-Machine-Learning

QSAR Drug Discovery Project

This repository contains code and data for a QSAR (Quantitative Structure-Activity Relationship) drug discovery project. The goal of this project is to develop predictive models that can estimate the biological activity of chemical compounds based on their molecular structure.

Features

  • Data preprocessing and feature extraction
  • Machine learning model training and evaluation
  • Random Forest Regressor implementation
  • Support for various molecular descriptors
  • Visualization of results
  • Hyperparameter tuning
  • Cross-validation
  • Documentation and examples
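
As a rough illustration of how the training, hyperparameter tuning, and cross-validation features fit together, here is a minimal sketch using scikit-learn. It is not the repository's actual code: the data is synthetic (random numbers standing in for a descriptor matrix), and the parameter grid is an arbitrary small example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for a descriptor matrix (rows = compounds, columns = descriptors).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
# Hypothetical target: depends on the first two descriptors plus a little noise.
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

# Hold out 20% of the compounds for a final check.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Small illustrative grid; a real search would likely cover more values.
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 5]}

# 5-fold cross-validation over every grid combination, scored by R^2.
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    cv=5,
    scoring="r2",
)
search.fit(X_train, y_train)

# GridSearchCV refits the best model on the full training set by default.
test_r2 = r2_score(y_test, search.predict(X_test))
print("best params:", search.best_params_)
print("cross-validated R^2:", round(search.best_score_, 3))
print("held-out R^2:", round(test_r2, 3))
```

With real data, `X` would hold the molecular descriptors and `y` the target variable. The held-out score is the more honest estimate, since the cross-validated score was used to pick the hyperparameters.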

Requirements

  • Python 3.7+
  • pandas
  • scikit-learn
  • numpy
  • matplotlib
  • seaborn
  • RDKit (for cheminformatics tasks)

Dataset

The dataset used in this project is a collection of psychoactive compounds in CSV format, taken from https://www.kaggle.com/datasets/thedevastator/psychedelic-drug-database. The file was renamed to QSARDrugAnalysis.csv before import.

This example uses SlogP as the target variable, but in principle any other variable can be used. In general, biological activity data (e.g., IC50, EC50, or an energy in kcal/mol) serves as the target variable. My own research is not yet published, so I cannot share the actual dataset I am using. Any replacement dataset should contain molecular structures (e.g., SMILES strings) and their corresponding biological activity values.
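
To make the expected dataset layout concrete, the following sketch separates a table into descriptor features and a target column with pandas. The helper name `split_features_target`, the `SMILES` column name, and the descriptor columns `MolWt` and `TPSA` are assumptions for illustration; the actual column names in the Kaggle file may differ. The demo rows hold placeholder values only.

```python
import pandas as pd


def split_features_target(df, target="SlogP", smiles_col="SMILES"):
    """Return (X, y): numeric descriptor columns and the target column.

    Hypothetical helper; column names are assumptions, not taken from the dataset.
    """
    # Drop rows without a target value, since they cannot be used for training.
    df = df.dropna(subset=[target])
    y = df[target]
    # Keep only numeric columns as features; exclude the target and the SMILES text.
    X = df.drop(columns=[target, smiles_col], errors="ignore").select_dtypes(
        include="number"
    )
    return X, y


# Placeholder rows for illustration only; the values are not measurements.
demo = pd.DataFrame(
    {
        "SMILES": ["CCO", "c1ccccc1", "CC(=O)O"],
        "MolWt": [46.0, 78.0, 60.0],
        "TPSA": [20.0, 0.0, 37.0],
        "SlogP": [0.1, 1.7, None],
    }
)

X, y = split_features_target(demo)
print(X.columns.tolist())  # descriptor columns that remain as features
print(len(y))              # rows left after dropping the missing target
```

For the real file, the same function could be applied to `pd.read_csv("QSARDrugAnalysis.csv")`, passing whichever column holds the activity values as `target`.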
