Classification of the activities of drug molecules using deep learning

Kanberiz, Hatice

Date

2020

Abstract

In the early stages of drug development, researchers try to reduce the time and cost of the process by identifying, among thousands of candidate molecules, those that show activity. To this end, high-throughput screening experiments are performed and molecules are classified as active or inactive. The resulting data are deposited in the PubChem database. These data can be used to build classification models with machine learning algorithms, so that active molecules can be identified faster and at lower cost. In this study, deep neural networks (DNN) were trained on 5 PubChem datasets with different degrees of class imbalance. The performance of the trained DNN was compared with that of support vector machines (SVM) and random forest (RF), two algorithms frequently used in the literature. The comparison was based on balanced accuracy, sensitivity, positive predictive value, F1 score, and the Matthews correlation coefficient (MCC). The DNN outperformed SVM and RF on every criterion except positive predictive value. This includes F1 score and MCC, which are among the most important criteria for evaluating performance on imbalanced datasets. In conclusion, the DNN performs better than the other machine learning algorithms on imbalanced data. It is therefore a good choice for reducing the time and cost spent on drug development.
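
The abstract compares the algorithms using balanced accuracy, sensitivity, positive predictive value, F1 score, and MCC. It also stresses F1 and MCC for imbalanced data. As an illustration only, the sketch below computes these metrics from a binary confusion matrix using their standard definitions, with "active" as the positive class. Several details are assumptions made here and are not taken from the thesis: the `Confusion` and `metrics` names, the convention of returning 0.0 when a denominator is zero, and the counts in the example.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Binary confusion matrix with "active" as the positive class."""
    tp: int  # active molecules predicted active
    fp: int  # inactive molecules predicted active
    tn: int  # inactive molecules predicted inactive
    fn: int  # active molecules predicted inactive


def _safe_div(num: float, den: float) -> float:
    # Assumed convention: report 0.0 when the denominator is zero
    return num / den if den else 0.0


def metrics(c: Confusion) -> dict[str, float]:
    sensitivity = _safe_div(c.tp, c.tp + c.fn)  # recall, true positive rate
    specificity = _safe_div(c.tn, c.tn + c.fp)  # true negative rate
    ppv = _safe_div(c.tp, c.tp + c.fp)          # positive predictive value (precision)
    f1 = _safe_div(2 * ppv * sensitivity, ppv + sensitivity)
    mcc_den = math.sqrt((c.tp + c.fp) * (c.tp + c.fn) * (c.tn + c.fp) * (c.tn + c.fn))
    mcc = _safe_div(c.tp * c.tn - c.fp * c.fn, mcc_den)
    return {
        "accuracy": _safe_div(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "sensitivity": sensitivity,
        "ppv": ppv,
        "f1": f1,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Hypothetical screen of 1,000 molecules, 20 of them active (2%); not data from the thesis.
    baseline = Confusion(tp=0, fp=0, tn=980, fn=20)  # predicts every molecule inactive
    model = Confusion(tp=15, fp=30, tn=950, fn=5)    # an invented classifier
    for name, c in [("all-inactive baseline", baseline), ("hypothetical model", model)]:
        print(name, {k: round(v, 3) for k, v in metrics(c).items()})
```

On these invented counts, the all-inactive baseline reaches 0.98 accuracy. Yet it gets only 0.5 balanced accuracy and 0 for sensitivity, F1, and MCC. The hypothetical model has lower plain accuracy (0.965) but a clearly positive F1 and MCC. In general terms, this is why plain accuracy is a poor guide on imbalanced screening data. Metrics such as F1 and MCC are preferred in that setting.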

URI

https://acikbilim.yok.gov.tr/handle/20.500.12812/399833

Collections

Theses

Except where otherwise noted, this item's license is described as info:eu-repo/semantics/openAccess