We gratefully acknowledge support from
the Simons Foundation and member institutions.
Full-text links:

Download:

Current browse context:

cs.CL

Change to browse by:

References & Citations

DBLP - CS Bibliography

Bookmark

(what is this?)
CiteULike logo BibSonomy logo Mendeley logo del.icio.us logo Digg logo Reddit logo

Computer Science > Computation and Language

Title: SPACE-IDEAS: A Dataset for Salient Information Detection in Space Innovation

Abstract: Detecting salient parts in text using natural language processing has been widely used to mitigate the effects of information overflow. Nevertheless, most of the datasets available for this task are derived mainly from academic publications. We introduce SPACE-IDEAS, a dataset for salient information detection from innovation ideas related to the Space domain. The text in SPACE-IDEAS varies greatly and includes informal, technical, academic and business-oriented writing styles. In addition to a manually annotated dataset we release an extended version that is annotated using a large generative language model. We train different sentence and sequential sentence classifiers, and show that the automatically annotated dataset can be leveraged using multitask learning to train better classifiers.
Comments: Accepted in LREC-COLING 2024
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Digital Libraries (cs.DL)
Cite as: arXiv:2403.16941 [cs.CL]
  (or arXiv:2403.16941v1 [cs.CL] for this version)

Submission history

From: Jose Manuel Gomez-Perez Dr. [view email]
[v1] Mon, 25 Mar 2024 17:04:02 GMT (2439kb,D)

Link back to: arXiv, form interface, contact.