Supervised Bernoulli Text Topic Identification Model using Naïve Bayes

Authors: Suresh Kumar Sharma, Kanchan Jain* and Gurpreet Singh Bawa

Journal Name: International Journal on Emerging Technologies

Abstract

In this paper, document models are discussed with respect to the Bernoulli document approach, which is based on the presence or absence of the primary building blocks of documents, namely tokens. The research primarily deals with how an unstructured dataset of text documents is converted to structured content with a mathematical and statistical foundation, after which the topic of conversation is predicted (or estimated) under Bernoulli assumptions. The application of the Naïve Bayes approach to this model is discussed. Examples and sample code snippets in R and Python are included for the Bernoulli document model.
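The pipeline summarized above (binary token features, followed by Naïve Bayes classification) can be illustrated with a minimal Python sketch. This is not the authors' code: the toy documents, the labels, the helper names (`tokenize`, `train_bernoulli_nb`, `predict`) and the add-one (Laplace) smoothing are all assumptions made for illustration.

```python
import math

def tokenize(text):
    # Bernoulli model: only presence/absence matters, so keep a set of tokens.
    return set(text.lower().split())

def train_bernoulli_nb(docs, labels):
    # Vocabulary is taken from the labeled training documents only.
    vocab = sorted(set().union(*(tokenize(d) for d in docs)))
    priors, cond = {}, {}
    for c in sorted(set(labels)):
        class_docs = [tokenize(d) for d, y in zip(docs, labels) if y == c]
        n_c = len(class_docs)
        priors[c] = n_c / len(docs)
        # P(token present | class), with add-one smoothing (an assumption here).
        cond[c] = {w: (sum(w in d for d in class_docs) + 1) / (n_c + 2)
                   for w in vocab}
    return vocab, priors, cond

def predict(text, vocab, priors, cond):
    present = tokenize(text)
    scores = {}
    for c in priors:
        score = math.log(priors[c])
        for w in vocab:
            p = cond[c][w]
            # Absent tokens also contribute, through the factor (1 - p).
            score += math.log(p if w in present else 1.0 - p)
        scores[c] = score
    return max(scores, key=scores.get), scores

# Hypothetical toy training set with two topics.
train_docs = [
    "goal match team score",
    "team player match win",
    "election vote party policy",
    "party minister vote debate",
]
train_labels = ["sports", "sports", "politics", "politics"]

vocab, priors, cond = train_bernoulli_nb(train_docs, train_labels)
label, scores = predict("the team won the match", vocab, priors, cond)
print(label)
```

Tokens that never occur in the training data (here "the" and "won") are ignored, since the features come from the labeled training set only.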

Keywords

Text classification, Naïve Bayes, Topic Modelling, Bernoulli Distribution

Conclusion

This research shows how unlabeled documents can be categorized by assuming an underlying Bernoulli distribution for words, using posterior probabilities obtained from a pre-labeled training dataset. The approach, which is lexical in nature, uses only the features obtained from the labeled training dataset and considers only the presence or absence of words across documents. In this way, the supervised approach assigns each unlabeled document to one of the categories under study.
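For reference, the standard form of the Bernoulli Naïve Bayes decision rule can be written as follows. The notation is an assumption and may differ from the paper's own: $b_t \in \{0, 1\}$ indicates whether vocabulary token $w_t$ is present in document $d$, $V$ is the vocabulary of the training set, $N_c$ is the number of training documents in category $c$, and $n_c(w_t)$ is the number of those documents containing $w_t$.

```latex
P(c \mid d) \;\propto\; P(c) \prod_{t=1}^{|V|}
  \Bigl[ b_t \, P(w_t \mid c) + (1 - b_t)\bigl(1 - P(w_t \mid c)\bigr) \Bigr],
\qquad
\hat{P}(w_t \mid c) = \frac{n_c(w_t)}{N_c},
\qquad
\hat{c} = \arg\max_{c} P(c \mid d)
```

In practice a smoothed estimate such as $(n_c(w_t) + 1)/(N_c + 2)$ is often used so that no factor in the product is exactly zero.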

How to cite this article

Suresh Kumar Sharma, Kanchan Jain and Gurpreet Singh Bawa (2022). Supervised Bernoulli Text Topic Identification Model using Naïve Bayes. International Journal on Emerging Technologies, 13(1): 15–21.