Quorum sensing (QS) based on acyl-homoserine lactones (AHLs) has shown great application potential in wastewater treatment processes such as biofilm formation, membrane biofouling control, and sludge granulation. Identifying AHL-producing bacteria is necessary for in-depth research on QS mechanisms and their application in biological wastewater treatment.
In this work, we propose AHLS-pred, the first machine-learning-based prediction model for AHL synthase (AHLS) bacteria. We first establish a balanced training dataset, AHLS1400, and an independent testing dataset, AHLS132, for AHLS prediction. Three sequence-based feature extraction methods are then used to generate feature vectors. To use these feature descriptors effectively, we explore a feature representation learning strategy that automatically learns the most discriminative features in a supervised way. After comparing the prediction performance of five commonly used machine learning algorithms, we train the final prediction model with a support vector machine (SVM) classifier, which gives the best performance on AHLS1400 (ACC=99.43%, MCC=0.989, AUC=0.997). On the independent dataset AHLS132, the final prediction model achieves an ACC of 94.70%, an MCC of 0.894, and an AUC of 0.995.
These results demonstrate that AHLS-pred is a promising and powerful prediction method for accelerating the computational identification of AHLSs, for analyzing the distribution of quorum sensing microorganisms in sewage biological treatment systems, and for evaluating the feasibility of enhancing quorum sensing microorganisms to improve the efficiency of wastewater treatment.
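For readers less familiar with the reported metrics, the following is a minimal sketch of how accuracy (ACC) and the Matthews correlation coefficient (MCC) are computed from a binary confusion matrix. The function name `acc_and_mcc` and the example counts are hypothetical and chosen only for illustration; they are not the confusion matrix behind the figures reported above. AUC is not shown because it requires per-sequence scores rather than counts.

```python
import math


def acc_and_mcc(tp, tn, fp, fn):
    """Return (ACC, MCC) for a binary confusion matrix.

    tp/tn/fp/fn are the counts of true positives, true negatives,
    false positives, and false negatives.
    """
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    acc = (tp + tn) / total
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is set to 0 when any marginal sum is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, mcc


# Hypothetical counts, for illustration only.
acc, mcc = acc_and_mcc(tp=45, tn=40, fp=10, fn=5)
print(f"ACC={acc:.2%}, MCC={mcc:.3f}")
```

MCC is often reported alongside ACC because it accounts for all four cells of the confusion matrix.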
1. To predict AHLSs or Non-AHLSs with the classifier, choose one of the following input methods:
Note that the FASTA input must follow the same format as the sample; otherwise, the submission may fail.
2. Select a confidence threshold between 0.5 and 1; only results whose prediction confidence exceeds this threshold are output.
3. Press the "Submit" button to upload the sequences and start the classification.
4. Wait for the results page to load; you can also download the results in CSV format.
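The steps above can be mirrored offline with a short script. The sketch below is not the server's code. It assumes that the sample format is standard protein FASTA (a `>` header line followed by one or more lines of the 20 standard amino acid letters), that the threshold boundary is inclusive, and that the CSV has the columns `id`, `prediction`, and `confidence`. All function names, column names, and prediction values are hypothetical.

```python
import csv
import io

AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def _finish(header, chunks):
    """Validate one record and return it as (identifier, sequence)."""
    seq = "".join(chunks)
    if not header:
        raise ValueError("empty FASTA header")
    if not seq:
        raise ValueError(f"no sequence for {header!r}")
    bad = set(seq) - AMINO_ACIDS
    if bad:
        raise ValueError(f"invalid residues in {header!r}: {sorted(bad)}")
    return header, seq


def parse_fasta(text):
    """Parse FASTA text into (identifier, sequence) pairs."""
    records, header, chunks = [], None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append(_finish(header, chunks))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append(_finish(header, chunks))
    return records


def filter_by_confidence(predictions, threshold=0.5):
    """Keep (id, label, confidence) rows at or above the threshold.

    Treating the boundary as inclusive is an assumption.
    """
    if not 0.5 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.5 and 1")
    return [row for row in predictions if row[2] >= threshold]


def to_csv(rows):
    """Serialize prediction rows to CSV text (hypothetical column names)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["id", "prediction", "confidence"])
    writer.writerows(rows)
    return buf.getvalue()


records = parse_fasta(">seq1\nMKT\nLLA\n>seq2\nMAV\n")
# Hypothetical classifier output, for illustration only.
predictions = [("seq1", "AHLS", 0.97), ("seq2", "Non-AHLS", 0.62)]
kept = filter_by_confidence(predictions, threshold=0.9)
print(to_csv(kept))
```

Validating the input locally before submission catches problems such as a missing header or non-amino-acid characters, which are likely causes of the failure mentioned in step 1.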
Java 1.8.0_144, WEKA 3.8, Eclipse
Python 3.7.x, pandas, scikit-learn, numpy, matplotlib