This paper examines how useful oversampling is for class-imbalanced structured medical datasets. Specifically, we investigate the assumption underlying oversampling that synthesized instances belong to the minority class. We tested this assumption with an off-the-shelf oversampling validation system. According to the experimental results, the oversampling methods validated produced newly generated samples that did not belong to the minority class in at least one of the three medical datasets used. The error rate also varied with the dataset and the oversampling method tested. We therefore argue that synthesizing new instances without first confirming that they align with the minority class is risky, especially in medical fields, where misdiagnosis can have serious repercussions. As alternatives to oversampling, we advise ensemble, data-partitioning, and method-level approaches, since they do not depend on this unverified assumption.
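To make the assumption under test concrete, the following is a minimal sketch of one way such a check could look. It is not the validation system used in the paper, which the abstract does not describe. The function names (`smote_like`, `flag_suspects`, `suspect_rate`), the SMOTE-style interpolation, and the nearest-real-neighbour criterion are all assumptions chosen purely for illustration.

```python
import math
import random


def smote_like(minority, n_new, k=3, seed=0):
    """Hypothetical SMOTE-style generator: interpolate between a minority
    point and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority points to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest minority neighbours of the chosen base point (excluding itself).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        t = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + t * (o - b) for b, o in zip(base, other)))
    return synthetic


def flag_suspects(synthetic, minority, majority):
    """Assumed criterion: a synthetic point is suspect when its nearest real
    point belongs to the majority class rather than the minority class."""
    suspects = []
    for s in synthetic:
        d_min = min(math.dist(s, p) for p in minority)
        d_maj = min(math.dist(s, p) for p in majority)
        if d_maj < d_min:
            suspects.append(s)
    return suspects


def suspect_rate(synthetic, minority, majority):
    """Fraction of synthetic points flagged as suspect."""
    if not synthetic:
        return 0.0
    return len(flag_suspects(synthetic, minority, majority)) / len(synthetic)


# Toy data (made up): two minority points with a majority point between them,
# so interpolating across the gap can land inside majority territory.
minority = [(0.0, 0.0), (10.0, 0.0)]
majority = [(5.0, 0.0)]
generated = smote_like(minority, n_new=20, k=1)
print(f"suspect rate on toy data: {suspect_rate(generated, minority, majority):.2f}")
```

In this toy layout, any interpolated point that falls nearer the middle of the segment than either end is closer to the majority point and gets flagged. This is the kind of failure the abstract warns about. A real check would need a criterion suited to the dataset's feature types and scaling.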
The Jeopardy of Learning from Over-Sampled Class-Imbalanced Medical Datasets
- Written by Ahmad Hassanat, Ghada Altarawneh, Ibraheem M Alkhawaldeh, Yasmeen Jamal Alabdallat, Amir F Atiya, Ahmad Abujaber, Ahmad S Tarawneh
- Category: Computer Science