
Journal Article

Citation

Gite S, Patil S, Dharrao D, Yadav M, Basak S, Rajendran A, Kotecha K. Big Data Cogn. Comput. 2023; 7(1): e45.

Copyright

(Copyright © 2023, MDPI: Multidisciplinary Digital Publications Institute)

DOI

10.3390/bdcc7010045

PMID

unavailable

Abstract

Feature selection and feature extraction have long been important because they remove redundant and irrelevant features, reduce the size of the vector space, limit computational time, and improve classification accuracy, especially in text categorization. These feature engineering techniques can be optimized further using optimization algorithms. This paper proposes such a framework: it applies one optimization algorithm, Ant Colony Optimization (ACO), together with different feature selection and feature extraction techniques, to textual and numerical datasets using four machine learning (ML) models: Logistic Regression (LR), K-Nearest Neighbor (KNN), Stochastic Gradient Descent (SGD), and Random Forest (RF). The aim is to show, through comparative analysis, how the results differ between the two datasets. The proposed feature selection and feature extraction techniques help enhance the performance of the machine learning models. The numerical dataset is used for stroke prediction and the text dataset for hate speech detection. The text dataset was prepared by extracting tweets with positive, negative, and neutral sentiments through the Twitter API. The maximum improvement in accuracy, 10.07%, is observed for Random Forest with the TF-IDF feature extraction technique when ACO is applied. The study also highlights the limitations of text data that inhibit the performance of machine learning models, which account for the difference of almost 18.43% in accuracy compared with the numerical data.
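The abstract does not describe how ACO is configured, so the following is only a minimal sketch of the general idea of ACO-based feature subset selection, not the authors' implementation. The function names (`aco_select`, `toy_fitness`), the parameter defaults, and the set of "useful" feature indices are all hypothetical. In a real pipeline the fitness function would more plausibly be the validation accuracy of a classifier, such as Random Forest, trained on the selected features; here it is a toy stand-in that is assumed to return a non-negative score.

```python
import random

def aco_select(n_features, subset_size, fitness, n_ants=10, n_iters=20,
               evaporation=0.2, seed=0):
    """Return (best_subset, best_score) found by a simple ant colony search.

    `fitness` is a caller-supplied function mapping a list of feature
    indices to a non-negative score (an assumption of this sketch).
    """
    rng = random.Random(seed)
    pheromone = [1.0] * n_features  # one trail value per feature
    best_subset, best_score = None, float("-inf")
    for _ in range(n_iters):
        for _ in range(n_ants):
            # Each ant samples features without replacement,
            # with probability proportional to the pheromone trail.
            candidates = list(range(n_features))
            subset = []
            for _ in range(subset_size):
                weights = [pheromone[j] for j in candidates]
                pick = rng.choices(candidates, weights=weights, k=1)[0]
                candidates.remove(pick)
                subset.append(pick)
            score = fitness(subset)
            if score > best_score:
                best_subset, best_score = sorted(subset), score
        # Evaporate every trail, then reinforce the best subset seen so far.
        pheromone = [(1.0 - evaporation) * p for p in pheromone]
        for j in best_subset:
            pheromone[j] += max(best_score, 0.0)
    return best_subset, best_score

# Hypothetical toy problem: pretend features 1, 4, and 7 are the informative ones.
USEFUL = {1, 4, 7}

def toy_fitness(subset):
    """Count how many selected features are in the hypothetical useful set."""
    return float(sum(1 for j in subset if j in USEFUL))

subset, score = aco_select(n_features=10, subset_size=3, fitness=toy_fitness)
print(subset, score)
```

Reinforcing only the best subset found so far is one of several possible pheromone update rules; it is chosen here for brevity and makes no claim about the variant used in the paper.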


Language: en

Keywords

Ant Colony Optimization (ACO); Bag of Words (BoW); Chi-square test; feature engineering; machine learning; Term Frequency–Inverse Document Frequency (TF-IDF)
