A Tulu Resource for Machine Translation

Narayanan, Manu; Aepli, Noëmi

(Submitted on 28 Mar 2024)

Abstract: We present the first parallel dataset for English-Tulu translation. Tulu, a South Dravidian language, is spoken by approximately 2.5 million people, predominantly in southwestern India. We construct the dataset by adding human translations to the multilingual machine translation resource FLORES-200, and we use it to evaluate our English-Tulu machine translation model. To train the model, we leverage resources available for related South Dravidian languages, adopting a transfer learning approach that exploits similarities between high-resource and low-resource languages. This method makes it possible to train a machine translation system even when no parallel data exists between the source and target languages, thereby overcoming a significant obstacle in machine translation development for low-resource languages. Our English-Tulu system, trained without any parallel English-Tulu data, outperformed Google Translate by 19 BLEU points as of September 2023. The dataset and code are available here: this https URL
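The 19-point margin in the abstract is expressed in BLEU, a corpus-level metric that combines clipped n-gram precision with a brevity penalty. The following is a minimal sketch of that computation, not the authors' evaluation code: the function names `ngrams` and `corpus_bleu` are hypothetical, and it assumes whitespace tokenization, a single reference per sentence, and no smoothing, so its scores will generally differ from those of standard evaluation tools.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU (0-100) with one reference per hypothesis.

    Assumption: whitespace tokenization and no smoothing (illustrative only).
    """
    matches = [0] * max_n  # clipped n-gram matches, per n-gram order
    totals = [0] * max_n   # hypothesis n-gram counts, per n-gram order
    hyp_len = ref_len = 0

    for hyp, ref in zip(hypotheses, references):
        hyp_tok, ref_tok = hyp.split(), ref.split()
        hyp_len += len(hyp_tok)
        ref_len += len(ref_tok)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp_tok, n)
            ref_counts = ngrams(ref_tok, n)
            # Counter intersection clips each hypothesis n-gram count
            # by its count in the reference.
            matches[n - 1] += sum((hyp_counts & ref_counts).values())
            totals[n - 1] += sum(hyp_counts.values())

    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # unsmoothed BLEU is zero if any order has no match

    # Geometric mean of the n-gram precisions, computed in log space.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # The brevity penalty lowers the score when hypotheses are shorter
    # than the references.
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity * math.exp(log_precision)


if __name__ == "__main__":
    hyps = ["the cat sat on the mat"]
    refs = ["the cat sat on a mat"]
    print(round(corpus_bleu(hyps, refs), 2))
```

Comparing two systems then amounts to scoring each system's output against the same reference translations and taking the difference between the two scores.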

Comments:	Accepted at LREC-COLING 2024
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2403.19142 [cs.CL]
	(or arXiv:2403.19142v1 [cs.CL] for this version)

Submission history

From: Noëmi Aepli
[v1] Thu, 28 Mar 2024 04:30:07 GMT (4149kb,D)
