Sentiment analysis of Uzbek texts using NER: a comparative study of SVM, LSTM, and BERT models
https://doi.org/10.55648/1998-6920-2025-19-4-3-16
Abstract
This paper presents a comparative analysis of machine learning (SVM), deep learning
(LSTM), and transformer-based (BERT) models for sentiment classification in Uzbek texts,
enhanced by Named Entity Recognition (NER). The study addresses the challenge of accurately
detecting sentiment in morphologically complex languages with limited resources, focusing on
Uzbek, a Turkic language with rich agglutinative structures. A dataset of 10,000 user-generated
comments from social platforms was annotated using a hybrid approach: manual labeling for
sentiment (positive, negative, neutral) and a CRF-based NER system to identify entities (e.g.,
brands, locations, public figures). The integration of NER features aimed to resolve contextual
ambiguities, such as distinguishing between "I love Samarkand’s history" (positive) and
"Samarkand’s traffic is unbearable" (negative). Experimental results demonstrate that BERT,
fine-tuned on Uzbek text, achieved the highest accuracy (90.2%) by leveraging contextualized
embeddings to align entities with sentiment. LSTM showed competitive performance (85.1%)
in sequential pattern learning but required extensive training data. SVM, while computationally
efficient, lagged at 78.3% accuracy due to its inability to capture nuanced linguistic
dependencies. The findings emphasize the critical role of NER in low-resource languages for
disambiguating sentiment triggers, and practical guidelines are proposed for deploying BERT in
real-world applications, such as customer feedback analysis. Limitations, including data scarcity and
computational costs, are discussed to inform future research on optimizing lightweight models
for Uzbek NLP tasks.
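The abstract describes a pipeline in which entities found by a NER step are combined with a BERT-based sentiment classifier. The sketch below is a minimal illustration of one way such a combination could look in code; it is not the authors' implementation. The multilingual checkpoint name, the entity-marker scheme, and the toy Uzbek sentences with their character spans are all illustrative assumptions.

```python
# Minimal sketch (not the authors' released code): NER spans are wrapped in
# marker tokens before the text is passed to a BERT-family model fine-tuned
# for 3-class sentiment. Checkpoint, markers, and examples are assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

CHECKPOINT = "bert-base-multilingual-cased"  # assumed stand-in for a BERT model fine-tuned on Uzbek
LABELS = {"negative": 0, "neutral": 1, "positive": 2}

def mark_entities(text, entities):
    """Wrap NER spans (start, end, type) in marker tokens, splicing right to left."""
    for start, end, etype in sorted(entities, key=lambda e: e[0], reverse=True):
        text = f"{text[:start]}[{etype}] {text[start:end]} [/{etype}]{text[end:]}"
    return text

# Toy examples mirroring the ambiguity discussed in the abstract
# ("I love Samarkand's history" vs. "Samarkand's traffic is unbearable").
samples = [
    ("Samarqand tarixini juda yaxshi ko'raman.", [(0, 9, "LOC")], "positive"),
    ("Samarqanddagi tirbandlik chidab bo'lmas darajada.", [(0, 13, "LOC")], "negative"),
]

tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModelForSequenceClassification.from_pretrained(CHECKPOINT, num_labels=len(LABELS))

texts = [mark_entities(t, ents) for t, ents, _ in samples]
labels = torch.tensor([LABELS[y] for *_, y in samples])
enc = tokenizer(texts, padding=True, truncation=True, max_length=128, return_tensors="pt")

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for epoch in range(3):                     # tiny loop only to show the fine-tuning step
    optimizer.zero_grad()
    out = model(**enc, labels=labels)      # cross-entropy loss over the three sentiment classes
    out.loss.backward()
    optimizer.step()
```

Wrapping entity spans in marker tokens is only one way to expose NER features to the classifier; alternatives include concatenating entity-type embeddings or feeding NER tags as an extra input channel, and the abstract does not specify which variant the study used.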
About the Authors
Bobur Rashidovich Saidov
Uzbekistan
PhD student, Department of Mathematical Modeling, Numerical Methods and Software Packages
Vladimir Borisovich Barakhnin
Russian Federation
Doctor of Technical Sciences, Associate Professor; Novosibirsk State University, Novosibirsk, Russia; Federal Research Center for Information and Computational Technologies, Novosibirsk, Russia
For citations:
Saidov B.R., Barakhnin V.B. Sentiment analysis of Uzbek texts using NER: a comparative study of SVM, LSTM, and BERT models. The Herald of the Siberian State University of Telecommunications and Information Science. 2025;19(4):3-17. https://doi.org/10.55648/1998-6920-2025-19-4-3-16

















