A Strategy for Improved Data Classification using Advanced clustering with Incomplete Datasets

Authors: P.S. Deshmukh, M. Sivakkumar and Varshaha Namdeo

Journal Name: International Journal on Emerging Technologies

Abstract

This study explores the application of machine learning techniques, including clustering, in real-world domains such as cyber security, healthcare, and agriculture. It emphasizes the importance of understanding the main learning paradigms: supervised, unsupervised, semi-supervised, and reinforcement learning. Clustering algorithms are particularly useful for analyzing large volumes of data because they group similar objects into clusters. Subspace clustering extends this concept by identifying clusters in different subspaces of high-dimensional data. The study aims to address challenges such as determining optimal initial cluster positions and to identify research gaps in unsupervised learning. Its findings will aid researchers in exploring new directions and comparing the effectiveness of different algorithms. Implementing improved data classification using advanced clustering with incomplete datasets poses several challenges: handling missing data effectively, the potential biases introduced by incomplete information, and the need for robust algorithms that adapt to diverse data patterns while still producing accurate classification results.
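As a rough illustration of clustering an incomplete dataset, the sketch below fills missing values with column means and then runs plain k-means. This is not the authors' method: the mean-imputation step, the choice of the first k rows as initial centroids, the function names `impute_column_means` and `kmeans`, and the toy data are all assumptions made for this example.

```python
import math

def impute_column_means(rows):
    """Replace None entries with the mean of the observed values in that column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        # Assumption: every column has at least one observed value.
        means.append(sum(observed) / len(observed))
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)]
            for r in rows]

def kmeans(points, k, n_iter=20):
    """Plain Lloyd's k-means; the first k points are the initial centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical toy data: two groups, with None marking a missing value.
data = [
    [1.0, 1.2],
    [8.0, 8.1],
    [1.1, None],
    [7.9, 8.3],
    [None, 0.9],
    [8.2, None],
]
filled = impute_column_means(data)
labels, centroids = kmeans(filled, k=2)
print(labels)
```

Mean imputation pulls incomplete rows toward the center of the data, which is one simple way that missing values can bias cluster assignments. The fixed initial centroids also make the result depend on row order, which relates to the initial-position problem mentioned above.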

Keywords

Data Mining Tools, Machine Learning, Supervised Learning, Unsupervised Learning, Clustering Algorithms, Artificial Intelligence, Time Complexity, Big Data, Similarity Measure

Conclusion

In conclusion, this paper presents a novel multilayer data clustering framework that integrates feature selection with a modified K-Means algorithm and outperforms existing methods on gene data. It also highlights the significance of addressing noisy or uncertain information in clustering and classification tasks, as evidenced by the improved classification accuracy achieved with quadratic discriminant analysis. Future research should explore additional databases, algorithms, and statistical distributions to further improve clustering and classification outcomes. Comparative studies among diverse algorithms and investigations into semi-supervised classification techniques could provide valuable insights. Comparing the stability and accuracy of ensembles built from a single clustering algorithm with those built from multiple clustering algorithms is another promising avenue. Together, these efforts aim to enhance the robustness and efficacy of unsupervised learning techniques in handling complex, real-world datasets.

How to cite this article

P.S. Deshmukh, M. Sivakkumar and Varshaha Namdeo (2020). A Strategy for Improved Data Classification using Advanced clustering with Incomplete Datasets. International Journal on Emerging Technologies, 11(5): 746–755.