
The Herald of the Siberian State University of Telecommunications and Information Science


Sentiment analysis of Uzbek texts using NER: a comparative study of SVM, LSTM, and BERT models

https://doi.org/10.55648/1998-6920-2025-19-4-3-16

Abstract

This paper presents a comparative analysis of machine learning (SVM), deep learning (LSTM), and transformer-based (BERT) models for sentiment classification in Uzbek texts, enhanced by Named Entity Recognition (NER). The study addresses the challenge of accurately detecting sentiment in morphologically complex languages with limited resources, focusing on Uzbek, a Turkic language with rich agglutinative structures. A dataset of 10,000 user-generated comments from social platforms was annotated using a hybrid approach: manual labeling for sentiment (positive, negative, neutral) and a CRF-based NER system to identify entities (e.g., brands, locations, public figures). The integration of NER features aimed to resolve contextual ambiguities, such as distinguishing between "I love Samarkand's history" (positive) and "Samarkand's traffic is unbearable" (negative). Experimental results demonstrate that BERT, fine-tuned on Uzbek text, achieved the highest accuracy (90.2%) by leveraging contextualized embeddings to align entities with sentiment. LSTM showed competitive performance (85.1%) in sequential pattern learning but required extensive training data. SVM, while computationally efficient, lagged at 78.3% accuracy due to its inability to capture nuanced linguistic dependencies. The findings emphasize the critical role of NER in low-resource languages for disambiguating sentiment triggers and propose practical guidelines for deploying BERT in real-world applications, such as customer feedback analysis. Limitations, including data scarcity and computational costs, are discussed to inform future research on optimizing lightweight models for Uzbek NLP tasks.
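The idea of pairing NER output with sentiment cues, as in the Samarkand examples above, can be illustrated with a deliberately simplified sketch. This is not the paper's pipeline: a toy entity lexicon stands in for the CRF-based NER system, and a toy polarity lexicon stands in for the trained SVM/LSTM/BERT classifiers; the point is only to show how entity tags and sentiment signals are combined per comment.

```python
# Toy stand-ins (assumptions, not from the paper): a lexicon-based "NER"
# and a word-polarity score replace the CRF tagger and trained classifiers.
ENTITY_LEXICON = {"samarkand": "LOC", "bukhara": "LOC"}
SENTIMENT_LEXICON = {"love": 1, "unbearable": -1}

def extract_features(text: str) -> dict:
    """Attach entity tags and a coarse sentiment label to one comment."""
    tokens = text.lower().replace("'s", "").split()
    # "NER" step: collect entity types found in the comment.
    entities = [ENTITY_LEXICON[t] for t in tokens if t in ENTITY_LEXICON]
    # "Sentiment" step: sum word polarities and threshold into three classes.
    score = sum(SENTIMENT_LEXICON.get(t, 0) for t in tokens)
    label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    return {"entities": entities, "sentiment": label}

# The same entity ("Samarkand", a LOC) receives opposite sentiment
# depending on context, which is the ambiguity NER features help resolve.
print(extract_features("I love Samarkand's history"))
print(extract_features("Samarkand's traffic is unbearable"))
```

In the full system described above, the entity tags would enter the model as features (for SVM), as tag embeddings (for LSTM), or implicitly via contextualized embeddings (for fine-tuned BERT), rather than through a hand-built lexicon.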

About the Authors

Bobur Rashidovich Saidov
Novosibirsk State University
Uzbekistan

PhD student, Department of Mathematical Modeling, Numerical Methods and Software Packages



Vladimir Borisovich Barakhnin
Novosibirsk State University, Federal Research Center for Information and Computational Technologies
Russian Federation

Doctor of Technical Sciences, Associate Professor; Novosibirsk State University, Novosibirsk, Russia; Federal Research Center for Information and Computational Technologies, Novosibirsk, Russia



References

1. Lample G., Ballesteros M., Subramanian S., Kawakami K., Dyer C. Neural Architectures for Named Entity Recognition // Proceedings of NAACL-HLT. – 2016. – P. 260–270. DOI: 10.18653/v1/N16-1030

2. Bojanowski P., Grave E., Joulin A., Mikolov T. Enriching Word Vectors with Subword Information // arXiv preprint arXiv:1607.04606. – 2016. – 12 p. URL: https://arxiv.org/abs/1607.04606

3. Liu Y., Ott M., Goyal N., Du J., Joshi M. RoBERTa: A Robustly Optimized BERT Pretraining Approach // arXiv preprint arXiv:1907.11692. – 2019. – 13 p. URL: https://arxiv.org/abs/1907.11692

4. Xolmirzayev A., Yusupov S. Rule-Based Sentiment Analysis for Uzbek Texts // Proceedings of the International Conference on Information Science and Communications Technologies (ICISCT). – 2021. – P. 1–4.

5. Hochreiter S., Schmidhuber J. Long Short-Term Memory // Neural Computation. – 1997. – Vol. 9(8). – P. 1735–1780. DOI: 10.1162/neco.1997.9.8.1735

6. Kuriyozov Z., Muhamediev R. Uzbek Language Processing: Challenges and Opportunities // International Journal of Advanced Computer Science and Applications. – 2020. – Vol. 11(6). – P. 123–130. DOI: 10.14569/IJACSA.2020.0110616

7. Tjong Kim Sang E., De Meulder F. Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition // Proceedings of CoNLL-2003. – 2003. – P. 142–147. URL: https://aclanthology.org/W03-0419

8. Mikolov T., Chen K., Corrado G., Dean J. Efficient Estimation of Word Representations in Vector Space // arXiv preprint arXiv:1301.3781. – 2013. – 12 p. URL: https://arxiv.org/abs/1301.3781

9. Devlin J., Chang M., Lee K., Toutanova K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding // arXiv preprint arXiv:1810.04805. – 2018. – 16 p. URL: https://arxiv.org/abs/1810.04805

10. Vaswani A., Shazeer N., Parmar N., Uszkoreit J., Jones L. Attention Is All You Need // Advances in Neural Information Processing Systems (NIPS). – 2017. – P. 5998–6008. URL: https://proceedings.neurips.cc/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html

11. Pang B., Lee L. Opinion Mining and Sentiment Analysis // Foundations and Trends in Information Retrieval. – 2008. – Vol. 2. – P. 1–135.

12. Yusupov F., Abdullaev S. Named Entity Recognition for Uzbek Using Conditional Random Fields // Proceedings of AINL-ISMW. – 2019. – P. 45–52. URL: https://ceur-ws.org/Vol-2499/paper11.pdf

13. Abidov A., Mirzaev T. UzBERT: A Pretrained Language Model for Uzbek // Technical Report, Tashkent University of Information Technologies. – 2022. – 25 p. URL: https://archive.org/details/uzbert-report

14. Sutton C., McCallum A. An Introduction to Conditional Random Fields // Foundations and Trends in Machine Learning. – 2012. – Vol. 4(4). – P. 267–373. DOI: 10.1561/2200000013

15. Jiao X., Yin Y., Shang L., Jiang X., Chen X., Li L., Wang F., Liu Q. TinyBERT: Distilling BERT for Natural Language Understanding // arXiv preprint arXiv:1909.10351. – 2019. URL: https://arxiv.org/abs/1909.10351

16. Sanh V., Debut L., Chaumond J., Wolf T. DistilBERT, a Distilled Version of BERT: Smaller, Faster, Cheaper and Lighter // arXiv preprint arXiv:1910.01108. – 2019. URL: https://arxiv.org/abs/1910.01108

17. Rakhimov S., Khamidov J. Development of a Morphological Analyzer for Uzbek // Journal of Natural Language Engineering. – 2021. – Vol. 27(3). – P. 311–328. DOI: 10.1017/S1351324921000047

18. Rasulov A., Karimov J. Building a Corpus for Low-Resource Languages: A Case Study on Uzbek // Proceedings of LREC. – 2022. – P. 112–119. URL: https://aclanthology.org/2022.lrec-1.12



For citations:


Saidov B.R., Barakhnin V.B. Sentiment analysis of Uzbek texts using NER: a comparative study of SVM, LSTM, and BERT models. The Herald of the Siberian State University of Telecommunications and Information Science. 2025;19(4):3-17. https://doi.org/10.55648/1998-6920-2025-19-4-3-16



This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 1998-6920 (Print)