Title: Homonym Sense Disambiguation in the Georgian Language

Abstract: This research proposes a novel approach to the Word Sense Disambiguation (WSD) task in the Georgian language, based on supervised fine-tuning of a pre-trained Large Language Model (LLM) on a dataset formed by filtering the Georgian Common Crawl corpus. The dataset is used to train a classifier for words with multiple senses. Additionally, we present experimental results of using an LSTM for WSD. Accurately disambiguating homonyms is crucial in natural language processing. Georgian, an agglutinative language belonging to the Kartvelian language family, presents unique challenges in this context. The aim of this paper is to highlight the specific problems concerning homonym disambiguation in the Georgian language and to present our approach to solving them. The techniques discussed in the article achieve 95% accuracy in predicting the lexical meanings of homonyms, using a hand-classified dataset of over 7,500 sentences.
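As an illustration only, a corpus-filtering step like the one the abstract mentions could be sketched in Python as below. The abstract does not state the actual filtering criteria. This sketch assumes a plain list of sentences and a hypothetical `filter_sentences` helper that keeps any sentence containing a token that begins with a chosen stem. Prefix matching is a crude stand-in for catching inflected forms in an agglutinative language: it misses forms that carry prefixes and may over-match unrelated words. The selected sentences would then still need to be labeled by hand.

```python
import re

# Python 3 str patterns are Unicode-aware, so \w also matches Georgian (Mkhedruli) letters.
TOKEN_RE = re.compile(r"\w+")


def filter_sentences(sentences, stem):
    """Hypothetical helper: keep sentences with a token starting with `stem`.

    Assumption: prefix matching on a stem approximates finding inflected
    forms of a homonym; it is not the paper's documented method.
    """
    selected = []
    for sentence in sentences:
        tokens = TOKEN_RE.findall(sentence)
        if any(token.startswith(stem) for token in tokens):
            selected.append(sentence)
    return selected


if __name__ == "__main__":
    # Toy corpus; the stem "ბარ" is an arbitrary illustrative choice.
    corpus = ["ის ბარში შევიდა.", "ეს სხვა წინადადებაა.", "ბარი ახლოსაა."]
    for s in filter_sentences(corpus, "ბარ"):
        print(s)
```

Matching on stems keeps the sketch dependency-free. A real pipeline for Georgian would more plausibly use a morphological analyzer or lemmatizer to decide which tokens are forms of the target word.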
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as: arXiv:2405.00710 [cs.CL]
  (or arXiv:2405.00710v1 [cs.CL] for this version)

Submission history

From: Davit Melikidze
[v1] Wed, 24 Apr 2024 21:48:43 GMT (67kb)