We gratefully acknowledge support from
the Simons Foundation and member institutions.
Full-text links:

Download:

Current browse context:

cs.DB

Change to browse by:

References & Citations

DBLP - CS Bibliography

Bookmark

(what is this?)
CiteULike logo BibSonomy logo Mendeley logo del.icio.us logo Digg logo Reddit logo

Computer Science > Databases

Title: The 4+1 Model of Data Science

Abstract: Data Science is a complex and evolving field, but most agree that it can be defined as a combination of expertise drawn from three broad areascomputer science and technology, math and statistics, and domain knowledge -- with the purpose of extracting knowledge and value from data. Beyond this, the field is often defined as a series of practical activities ranging from the cleaning and wrangling of data, to its analysis and use to infer models, to the visual and rhetorical representation of results to stakeholders and decision-makers. This essay proposes a model of data science that goes beyond laundry-list definitions to get at the specific nature of data science and help distinguish it from adjacent fields such as computer science and statistics. We define data science as an interdisciplinary field comprising four broad areas of expertise: value, design, systems, and analytics. A fifth area, practice, integrates the other four in specific contexts of domain knowledge. We call this the 4+1 model of data science. Together, these areas belong to every data science project, even if they are often unconnected and siloed in the academy.
Comments: 28 pages
Subjects: Databases (cs.DB); General Literature (cs.GL)
ACM classes: E.m; K.2
Cite as: arXiv:2311.07631 [cs.DB]
  (or arXiv:2311.07631v1 [cs.DB] for this version)

Submission history

From: ontoligent [view email] [via RAFAEL proxy]
[v1] Mon, 13 Nov 2023 12:12:32 GMT (413kb)

Link back to: arXiv, form interface, contact.