Computer Science > Computation and Language
Title: Russian-Language Multimodal Dataset for Automatic Summarization of Scientific Papers
(Submitted on 13 May 2024)
Abstract: The paper describes the creation of a multimodal dataset of Russian-language scientific papers and the evaluation of existing language models on the task of automatic text summarization. A distinguishing feature of the dataset is that it is multimodal: it includes texts, tables and figures. The paper presents the results of experiments with two language models: GigaChat from Sber and YandexGPT from Yandex. The dataset consists of 420 papers and is publicly available at this https URL