A Comprehensive Literature Review on Advance Language Toxicity Detection using Deep Learning
Author: Shaina Chaudhary
Abstract
A deep learning model with a natural language processing (NLP) component is proposed for moderating toxic content in Hindi text, addressing the need for toxic-language moderation on online platforms. The growth of Hindi communication on the internet calls for automated systems that can recognize abusive and harmful text. The model employs the transformer-based models BERT, RoBERTa, and XLM-R, which offer multilingual understanding and contextual representations. It is trained on Hindi text to classify input as either toxic or non-toxic, even in the presence of slang, code-mixed sentences, and culture-specific expressions. The established deep learning architectures LSTM and BiLSTM improve sequence modeling and contextual accuracy. The low-complexity modification achieved 85.76% precision, 83.76% recall, and an 84.25% F1 score, which makes it suitable for moderating online forums and posts in Hindi.
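As a reminder of how the reported metrics are defined for a binary toxic/non-toxic classifier, the following minimal sketch computes precision, recall, and F1 from confusion counts. The function name and the counts are hypothetical and are not taken from the article. The abstract does not state how its scores were averaged; a macro-averaged F1 need not equal the harmonic mean of the macro-averaged precision and recall.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Compute precision, recall, and F1 for the positive ("toxic") class.

    tp: toxic texts correctly flagged
    fp: non-toxic texts wrongly flagged as toxic
    fn: toxic texts the classifier missed
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts, for illustration only (not the article's data).
p, r, f = precision_recall_f1(tp=80, fp=20, fn=20)
print(f"precision={p:.4f} recall={r:.4f} f1={f:.4f}")
```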
Keywords
NLP, Deep Learning, BERT, RoBERTa, XLM-R, BiLSTM
Conclusion
Hindi toxicity detection has advanced considerably because deep learning models can now identify hate speech, offensive language, and abusive content. Classification is becoming more accurate with LSTM, BiLSTM, BERT, and XLM-RoBERTa. Unlike classical methods, transformer-based approaches deliver higher accuracy because they capture context within the text. Multilingual BERT and XLM-R have shown strong results on Hindi text, which makes them well suited to real-world use. Nevertheless, many problems persist. The lack of large and varied datasets makes it hard for models to predict reliably, as does code-mixed Hindi-English text that is transliterated. Most models in use also fail to account for implicit hate speech, sarcasm, and context-dependent abuse, which are sadly very prevalent. Current methods focus heavily on written text and ignore the abuse present in speech, video, and memes, all of which are integral to the internet. This suggests that a holistic approach has not yet been adopted. Future work includes expanding datasets, developing embeddings for additional languages, and advancing explainable AI (XAI) to improve transparency and limit discrimination and unfair treatment. In addition, deploying monitoring tools that are faster and more lightweight in real time would broaden their use and ease practical application.
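To illustrate why transliterated, code-mixed text is hard to handle, the sketch below labels a string by the scripts it contains, using the Unicode Devanagari block (U+0900 to U+097F). The function name and the sample sentences are hypothetical and do not come from the surveyed work. Note that Hindi written in Latin letters is indistinguishable from English at the script level, so a check like this is only a first preprocessing step and not a language identifier.

```python
def script_profile(text: str) -> str:
    """Label text as 'devanagari', 'latin', 'mixed', or 'other' by script."""
    # Devanagari characters occupy the Unicode block U+0900..U+097F.
    has_devanagari = any("\u0900" <= ch <= "\u097f" for ch in text)
    # ASCII letters stand in for Latin script in this simplified sketch.
    has_latin = any(ch.isascii() and ch.isalpha() for ch in text)
    if has_devanagari and has_latin:
        return "mixed"
    if has_devanagari:
        return "devanagari"
    if has_latin:
        return "latin"
    return "other"


# Hypothetical sample sentences, for illustration only.
samples = [
    "यह फिल्म अच्छी है",          # Hindi in Devanagari
    "यह movie बहुत अच्छी है",     # code-mixed, two scripts
    "yeh movie bahut acchi hai",  # transliterated Hindi: looks like Latin text
]
for s in samples:
    print(script_profile(s), "|", s)
```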