Machine learning and feature selection techniques in chemoinformatics

Software used for developing QSAR models:

  • SVM: Support vector machine (SVM) is a supervised learning technique used for classification and regression analysis. QSAR models can be optimized by tuning the SVM parameters and choosing among different kernels.
  • ANN: Artificial neural networks (ANNs) can be trained by supervised, unsupervised or reinforcement learning. SNNS (Stuttgart Neural Network Simulator) is a free software simulator for neural networks.
  • kNN: The k-nearest neighbor algorithm (k-NN) classifies an object by a majority vote among its closest training examples in the feature space. TiMBL is an open-source software package that implements k-nearest neighbor classification.
  • Weka: Weka is a collection of visualization tools and algorithms for data analysis and predictive modeling. It includes learning algorithms such as libSVM, SMO, NaiveBayes, LMT and Random Forest.
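The phrase "SVM parameters and kernels" refers to the kernel function an SVM uses and the hyperparameters of that kernel. The sketch below, in plain Python with no external libraries, writes out three commonly used kernels in the form used by LIBSVM-style tools. The function names and the default values of gamma, coef0 and degree are illustrative assumptions, not settings taken from any particular QSAR study.

```python
import math

# Common SVM kernels. gamma, coef0 and degree are the usual kernel
# hyperparameters; the defaults below are arbitrary illustrative values.

def linear_kernel(x, z):
    """K(x, z) = x . z"""
    return sum(a * b for a, b in zip(x, z))

def polynomial_kernel(x, z, gamma=1.0, coef0=1.0, degree=3):
    """K(x, z) = (gamma * x . z + coef0) ** degree"""
    return (gamma * linear_kernel(x, z) + coef0) ** degree

def rbf_kernel(x, z, gamma=0.5):
    """K(x, z) = exp(-gamma * ||x - z||^2), the Gaussian (RBF) kernel."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

def gram_matrix(X, kernel, **params):
    """Pairwise kernel values for a list of descriptor vectors."""
    return [[kernel(x, z, **params) for z in X] for x in X]
```

In practice these kernel hyperparameters, together with the SVM cost parameter C, are usually chosen by cross-validated search rather than fixed by hand; a larger gamma, for instance, makes the RBF kernel more local.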
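The k-NN idea can be illustrated with a minimal sketch in plain Python, assuming Euclidean distance on numeric descriptor vectors and an unweighted majority vote. The descriptor values and the "active"/"inactive" labels are hypothetical toy data, and this is not how TiMBL itself is implemented.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training examples."""
    # Sort training indices by distance to the query point (stable sort).
    order = sorted(range(len(train_X)), key=lambda i: euclidean(train_X[i], query))
    nearest_labels = [train_y[i] for i in order[:k]]
    # Counter.most_common orders equal counts by first occurrence, so a tied
    # vote goes to the label whose member is closest to the query.
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical toy data: two descriptors per compound and an activity class.
train_X = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9], [0.85, 0.85]]
train_y = ["inactive", "inactive", "active", "active", "active"]

print(knn_predict(train_X, train_y, [0.15, 0.15], k=3))  # inactive
```

Because k-NN stores the training set and defers all work to prediction time, the choice of k and of the distance measure (and the scaling of the descriptors) largely determines its behavior.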

 

Feature selection techniques

  • Weka: Weka (Waikato Environment for Knowledge Analysis) is a popular Java-based tool used for feature selection.
  • RapidMiner: RapidMiner is an open-source software platform widely used for machine learning, data mining and feature selection.
  • Orange: Orange is a data mining and machine learning tool used for feature selection and data analysis. Its Orange.feature.selection module provides feature selection facilities.
  • RRF: Regularized Random Forest (RRF) is an R package that implements a tree-based feature selection technique. In RRF, a set of non-redundant features can be selected without loss of predictive information.
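RRF's regularization acts on individual tree splits. As described by its authors, the gain of a feature that has not been used in earlier splits is multiplied by a penalty coefficient lambda in (0, 1], so a new feature enters the selected set F only if it is clearly more informative than the features already in F. The following is a minimal sketch of that node-level rule only, under that assumption: tree growing, bootstrapping and the forest loop are omitted, and the function names, descriptor names and gain values are hypothetical.

```python
def regularized_gain(feature, gain, selected, lam):
    """Penalize features not yet in the selected set F by the factor lam."""
    return gain if feature in selected else lam * gain

def choose_split(gains, selected, lam=0.8):
    """Pick the split feature at one tree node and update F in place.

    gains: dict mapping feature name -> ordinary split gain at this node.
    selected: set F of features already used in earlier splits.
    lam: penalty coefficient in (0, 1]; lam == 1 means no regularization.
    """
    best = max(gains, key=lambda f: regularized_gain(f, gains[f], selected, lam))
    selected.add(best)  # a newly chosen feature joins F
    return best

# Hypothetical node: logP was used in an earlier split, MW and TPSA were not.
selected = {"logP"}
node_gains = {"logP": 0.30, "MW": 0.35, "TPSA": 0.10}
print(choose_split(node_gains, selected, lam=0.8))  # logP
```

With lam = 0.8 the previously used descriptor wins even though MW has the larger raw gain (0.35 × 0.8 = 0.28 < 0.30); with lam = 1 the rule reduces to the ordinary, unregularized choice. Smaller values of lam therefore push the forest toward a smaller, less redundant feature set.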