ÌròyìnSpeech: A multi-purpose Yorùbá Speech Corpus

Ogunremi, Tolulope; Tubosun, Kola; Aremu, Anuoluwapo; Orife, Iroro; Adelani, David Ifeoluwa

Computer Science > Computation and Language

(Submitted on 29 Jul 2023 (this version), latest version 27 Mar 2024 (v2))

Abstract: We introduce the ÌròyìnSpeech corpus, a new dataset motivated by a desire to increase the amount of high-quality, freely available, contemporary Yorùbá speech. We release a multi-purpose dataset that can be used for both text-to-speech (TTS) and automatic speech recognition (ASR) tasks. We curated text sentences from the news and creative writing domains under an open license (CC-BY-4.0) and had multiple speakers record each sentence. We provide 5,000 of our utterances to the Common Voice platform to crowdsource transcriptions online. The dataset has 38.5 hours of data in total, recorded by 80 volunteers.

Comments:	working paper
Subjects:	Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2307.16071 [cs.CL]
	(or arXiv:2307.16071v1 [cs.CL] for this version)

Submission history

From: David Adelani
[v1] Sat, 29 Jul 2023 20:42:50 GMT (18kb)
[v2] Wed, 27 Mar 2024 08:56:01 GMT (822kb,D)
