The integrated approach to text lexical characteristics study
Abstract
This paper considers an integrated approach and a software environment for the multi-aspect study of the lexical characteristics of text. The work lies at the junction of corpus linguistics and lexicographic research and is based on a text corpus and a problem-oriented dictionary. The proposed environment supports the researcher with tools and interfaces for developing vocabularies and a system of domain features, marking up terms, automatically generating lexical content, accumulating statistical information, and so on. Terms are extracted by morphological analysis followed by the construction of phrases according to rules for matching the grammatical characteristics of words. Concordance construction tools are provided for studying the contexts in which terms are used; concordances allow the researcher to test a hypothesis about how a particular lexical unit functions. Because the environment integrates various tools for language research and supports customization of vocabularies to a problem area, it can be used to solve a variety of text analysis tasks.
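To make the notion of a concordance concrete, the following is a minimal keyword-in-context sketch in Python. It is not the environment described in the paper: the function name `concordance`, the `width` parameter, the regular-expression tokenizer, and the sample text are all assumptions made for illustration, and matching is done on exact lowercase word forms rather than on lemmas obtained by morphological analysis.

```python
import re


def concordance(text, term, width=3):
    """Return (left, keyword, right) context triples for each occurrence of term.

    Hypothetical helper: tokenization is a plain word regex, and matching is
    on exact lowercase forms, not on morphologically normalized lemmas.
    """
    tokens = re.findall(r"\w+", text.lower())
    term = term.lower()
    rows = []
    for i, tok in enumerate(tokens):
        if tok == term:
            # Up to `width` tokens on each side of the keyword.
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, tok, right))
    return rows


# Illustrative sample text (invented, not taken from the paper's corpus).
sample = ("The corpus stores each term with its context. "
          "A term may occur in several texts.")

for left, keyword, right in concordance(sample, "term", width=2):
    print(f"{left:>20} | {keyword} | {right}")
```

Aligning the keyword in a fixed column lets the researcher scan the left and right contexts side by side, which is what makes a concordance useful for checking a hypothesis about a term's usage.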
For citations:
Sidorova E. The integrated approach to text lexical characteristics study. The Herald of the Siberian State University of Telecommunications and Information Science. 2019;(3):80-88. (In Russ.)