Dissertations, Theses, and Capstone Projects
Date of Degree
2-2021
Document Type
Dissertation
Degree Name
Ph.D.
Program
Computer Science
Advisor
Abdullah Uz Tansel
Committee Members
William Sakas
XiangDong Li
Reda Alhajj
Subject Categories
Artificial Intelligence and Robotics | Databases and Information Systems | Data Science
Keywords
Data Mining, Machine Learning, Feature Selection Methods
Abstract
Feature selection is a key step for supervised learning algorithms. It involves discarding irrelevant attributes from the training dataset from which the models are derived. One of the vital feature selection approaches is Filtering, which typically uses a mathematical model to compute a relevance score for each feature in the training dataset and then sorts the features in descending order of their scores. However, most Filtering methods face several challenges, including, but not limited to, considering only feature-class correlation when defining a feature’s relevance and not recommending which subset of features to retain. Leaving this decision to the end-user may be impractical, since it demands experience in the application domain, care, accuracy, and time. In this research, we propose a new hybrid Filtering method called Class Association Rule Filter (CARF) that addresses these issues by identifying relevant features through Class Association Rule Mining and then using the mined rules to define weights for the features in the training dataset. More crucially, we propose a new procedure within CARF, based on mutual information, that suggests the subset of features to be retained, hence reducing the end-user’s time and effort. Empirical evaluation on small, medium, and large datasets from dissimilar domains reveals that CARF was able to reduce the dimensionality of the search space when compared with other common Filtering methods. More importantly, the classification models that different machine learning algorithms derived from the feature subsets selected by CARF were highly competitive in terms of various performance measures. These results reflect the quality of the feature subsets selected by CARF and show the impact of the proposed cut-off procedure.
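As a rough illustration of the generic Filtering workflow the abstract describes (score each feature, sort in descending order, then choose a cut-off), the following minimal sketch ranks discrete features by their mutual information with the class label. It is not the CARF method: the toy data, the function names, and the `keep_above_fraction` cut-off rule are all assumptions made for illustration, and the abstract does not specify how CARF's own cut-off procedure works.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information I(X; Y) in bits for two equal-length discrete sequences."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * log2(c * n / (px[x] * py[y]))
    return mi


def rank_features(columns, labels):
    """Score every feature against the class and sort in descending order."""
    scores = {name: mutual_information(vals, labels) for name, vals in columns.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def keep_above_fraction(ranked, fraction=0.5):
    """Hypothetical cut-off (an assumption, not CARF's procedure):
    keep features scoring at least `fraction` of the top score."""
    if not ranked:
        return []
    threshold = fraction * ranked[0][1]
    return [name for name, score in ranked if score > 0 and score >= threshold]


# Toy dataset, invented for this example.
columns = {
    "outlook": ["sunny", "sunny", "rain", "rain", "sunny", "rain"],
    "windy": ["yes", "no", "yes", "no", "yes", "no"],
    "constant": ["x", "x", "x", "x", "x", "x"],
}
labels = ["no", "no", "yes", "yes", "no", "yes"]

ranked = rank_features(columns, labels)
selected = keep_above_fraction(ranked, fraction=0.5)
print(ranked)
print(selected)  # ['outlook']
```

A fixed-fraction threshold like this still leaves a parameter for the end-user to tune, which is the kind of manual decision the abstract says the proposed cut-off procedure is meant to reduce.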
Recommended Citation
Al-Dhaheri, Sami A., "A New Feature Selection Method Based on Class Association Rule" (2021). CUNY Academic Works.
https://academicworks.cuny.edu/gc_etds/4141
Included in
Artificial Intelligence and Robotics Commons, Databases and Information Systems Commons, Data Science Commons