Big data classification is a challenging task because most well-known classification methods require considerable time and processing resources to carry it out and to exploit the vast amount of available data. In this paper, we propose a novel big data classification method that combines the power of the KNN classifier with the efficiency of ensemble learning to perform classification tasks on big data efficiently. The proposed method randomly draws tiny data chunks from a big dataset, each chunk containing random examples described by a small number of randomly selected features. A weak KNN classifier is applied to each data chunk to classify new (unseen) data, and the majority voting rule combines the outcomes of the weak classifiers into the final classification decision. According to the time complexity analysis, the proposed method has a constant classification time. Furthermore, the proposed method running on a single node was found to be more efficient than existing methods, some of which run on large clusters of nodes. Because of its speed and enhanced performance, the proposed method can be considered an ideal classifier for handling complex data types such as geospatial data, big trajectory data, and big data in general.
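The chunk-and-vote procedure described above can be illustrated with a short Python sketch. It is not the authors' implementation: the function names (`ensemble_knn_predict`, `weak_knn`, `majority_vote`), the hyperparameters `n_chunks`, `chunk_size`, `n_features`, `k`, `seed` and their default values, the use of Euclidean distance, sampling rows without replacement, drawing fresh chunks for every query, and the tie-breaking rule are all assumptions made for illustration. Each weak KNN only ever sees `chunk_size` rows and `n_features` columns, so in this sketch the per-query work is governed by those hyperparameters rather than by the size of the dataset, which is one way to read the constant-classification-time claim.

```python
import math
import random
from collections import Counter
from typing import Hashable, List, Sequence


def majority_vote(labels: Sequence[Hashable]) -> Hashable:
    """Return the most frequent label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def weak_knn(chunk_X: List[List[float]], chunk_y: List[Hashable],
             query: Sequence[float], features: List[int], k: int) -> Hashable:
    """Plain KNN on one chunk, using only the chunk's feature subset."""
    scored = []
    for row, label in zip(chunk_X, chunk_y):
        # Euclidean distance restricted to the selected features (assumption).
        dist = math.sqrt(sum((row[f] - query[f]) ** 2 for f in features))
        scored.append((dist, label))
    scored.sort(key=lambda pair: pair[0])
    return majority_vote([label for _, label in scored[:k]])


def ensemble_knn_predict(X: List[List[float]], y: List[Hashable],
                         query: Sequence[float], n_chunks: int = 25,
                         chunk_size: int = 50, n_features: int = 3,
                         k: int = 3, seed: int = 0) -> Hashable:
    """Classify one query by majority vote over weak KNNs on random chunks.

    All parameter names and defaults are hypothetical illustration values.
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(X), len(X[0])
    votes = []
    for _ in range(n_chunks):
        # Draw a tiny chunk: random rows (sampled without replacement) ...
        rows = rng.sample(range(n_rows), min(chunk_size, n_rows))
        # ... described by a small random subset of the features.
        feats = rng.sample(range(n_cols), min(n_features, n_cols))
        chunk_X = [X[i] for i in rows]
        chunk_y = [y[i] for i in rows]
        votes.append(weak_knn(chunk_X, chunk_y, query, feats, k))
    # Final decision: majority vote over the weak classifiers' outputs.
    return majority_vote(votes)


if __name__ == "__main__":
    # Toy data: class "a" near 0, class "b" near 10 on every feature.
    a_rows = [[i % 3, (2 * i) % 3, i % 2, (i // 2) % 3] for i in range(60)]
    b_rows = [[v + 10 for v in row] for row in a_rows]
    X = a_rows + b_rows
    y = ["a"] * 60 + ["b"] * 60
    print(ensemble_knn_predict(X, y, [1, 1, 1, 1]))      # a
    print(ensemble_knn_predict(X, y, [11, 11, 11, 11]))  # b
```

Restricting each weak learner to a random feature subset is what diversifies the ensemble members in this sketch, in the spirit of the random subspace method; raising `n_chunks` or `chunk_size` tends to make the vote more stable at a proportionally higher per-query cost.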
CTELC: A Constant-Time Ensemble Learning Classifier Based on KNN for Big Data
- Written by Ahmad S Tarawneh, Eman S Alamri, Najah Noori Al-Saedi, Mohammad Alauthman, Ahmad B Hassanat
- Category: Computer Science