Deep Learning-Based Imbalanced Data Classification for Drug Discovery

Korkmaz, Selcuk

dc.authorid	Korkmaz, Selçuk/0000-0003-4632-6850
dc.authorwosid	Korkmaz, Selçuk/AAU-4677-2020
dc.contributor.author	Korkmaz, Selcuk
dc.date.accessioned	2024-06-12T11:17:42Z
dc.date.available	2024-06-12T11:17:42Z
dc.date.issued	2020
dc.department	Trakya Üniversitesi	en_US
dc.description.abstract	Drug discovery studies have become increasingly expensive and time-consuming. In the early phase of drug discovery, an extensive search is performed to find drug-like compounds, which can then be optimized over time to become marketed drugs. One of the conventional ways of detecting active compounds is to perform an HTS (high-throughput screening) experiment. As of July 2019, the PubChem repository contained 1.3 million bioassays generated through HTS experiments. This makes PubChem a valuable resource for applying machine learning algorithms to develop classification models that detect active compounds for drug discovery studies. However, data sets obtained from PubChem are highly imbalanced, and this imbalance has a negative impact on the classification performance of machine learning algorithms. Here, we explored the classification performance of deep neural networks (DNN) on imbalanced compound data sets after applying various data balancing methods. We used five confirmatory HTS bioassays from the PubChem repository and applied one undersampling and three oversampling methods as data balancing methods. We used a fully connected, two-hidden-layer DNN model for the classification of active and inactive molecules. To evaluate the performance of the network, we calculated six performance metrics: balanced accuracy, precision, recall, F1 score, Matthews correlation coefficient, and area under the ROC curve. The study results showed that the effect of imbalanced data on network performance could be mitigated to a degree by applying the data balancing methods. The level of imbalance, however, has a negative effect on the performance of the network.	en_US
dc.identifier.doi	10.1021/acs.jcim.9b01162
dc.identifier.endpage	4190	en_US
dc.identifier.issn	1549-9596
dc.identifier.issn	1549-960X
dc.identifier.issue	9	en_US
dc.identifier.pmid	32573225	en_US
dc.identifier.scopus	2-s2.0-85091807294	en_US
dc.identifier.scopusquality	Q1	en_US
dc.identifier.startpage	4180	en_US
dc.identifier.uri	https://doi.org/10.1021/acs.jcim.9b01162
dc.identifier.uri	https://hdl.handle.net/20.500.14551/24792
dc.identifier.volume	60	en_US
dc.identifier.wos	WOS:000576675900011	en_US
dc.identifier.wosquality	Q1	en_US
dc.indekslendigikaynak	Web of Science	en_US
dc.indekslendigikaynak	Scopus	en_US
dc.indekslendigikaynak	PubMed	en_US
dc.language.iso	en	en_US
dc.publisher	Amer Chemical Soc	en_US
dc.relation.ispartof	Journal of Chemical Information and Modeling	en_US
dc.relation.publicationcategory	Article - International Peer-Reviewed Journal - Institutional Faculty Member	en_US
dc.rights	info:eu-repo/semantics/closedAccess	en_US
dc.subject	Available Python Package	en_US
dc.subject	Support Vector Machines	en_US
dc.subject	Nearest-Neighbor Rule	en_US
dc.subject	Neural-Networks	en_US
dc.title	Deep Learning-Based Imbalanced Data Classification for Drug Discovery	en_US
dc.type	Article	en_US
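
The abstract lists six evaluation metrics for the active/inactive classifier. As a rough illustration of why they are preferred over plain accuracy on imbalanced data, the following minimal sketch computes them from scratch in Python. It is not the paper's code: the function names, the label encoding (1 = active, 0 = inactive), and the toy data are assumptions made for this example only.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn); assumed encoding: 1 = active, 0 = inactive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def safe_div(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(y_true, y_pred):
    """Threshold-based metrics named in the abstract, from hard predictions."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    recall = safe_div(tp, tp + fn)        # sensitivity on the active class
    specificity = safe_div(tn, tn + fp)   # recall on the inactive class
    precision = safe_div(tp, tp + fp)
    f1 = safe_div(2 * precision * recall, precision + recall)
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = safe_div(tp * tn - fp * fn, mcc_denominator)
    return {
        "balanced_accuracy": (recall + specificity) / 2,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
    }


def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise ranking (ties count as 0.5)."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        return float("nan")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Toy, made-up data: 2 actives and 8 inactives (not taken from the paper).
toy_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
toy_pred = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.4, 0.8, 0.3, 0.2, 0.1, 0.1, 0.05, 0.3, 0.2]

toy_metrics = classification_metrics(toy_true, toy_pred)
toy_auc = roc_auc(toy_true, toy_scores)
```

On this toy set, plain accuracy would be 0.8 even though only one of the two actives is found; balanced accuracy (0.6875) and the Matthews correlation coefficient (0.375) reflect that gap more directly.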

Collection

WoS Indexed Publications Collection
PubMed Indexed Publications Collection
Scopus Indexed Publications Collection
