Databases

New submissions

New submissions for Fri, 10 May 24

[1]  arXiv:2405.05413 [pdf, ps, other]
Title: Digital Evolution: Novo Nordisk's Shift to Ontology-Based Data Management
Comments: 14 pages, 2 figures
Subjects: Databases (cs.DB)

Biomedical data is growing exponentially, and managing it is increasingly challenging. While the Findable, Accessible, Interoperable and Reusable (FAIR) data principles provide guidance, their adoption has proven difficult, especially in larger enterprises such as pharmaceutical companies. In this manuscript, we describe how we leverage an Ontology-Based Data Management (OBDM) strategy for digital transformation in Novo Nordisk Research & Early Development. We include both our technical blueprint and our approach to organizational change management. We further discuss how such an OBDM ecosystem plays a pivotal role in the organization's digital aspirations for data federation and discovery fuelled by artificial intelligence. Our aim for this paper is to share the lessons learned in order to foster dialogue with parties navigating similar waters while collectively advancing efforts in the fields of data management, semantics and data-driven drug discovery.

[2]  arXiv:2405.05536 [pdf, other]
Title: How Good Are Multi-dimensional Learned Indices? An Experimental Survey
Subjects: Databases (cs.DB)

Efficient indexing is fundamental for multi-dimensional data management and analytics. An emerging trend is to directly learn the storage layout of multi-dimensional data with simple machine learning models, yielding the concept of the Learned Index. Compared with the conventional indices used for decades (e.g., kd-tree and R-tree variants), learned indices are empirically shown to be both space- and time-efficient on modern architectures. However, a comprehensive evaluation of existing multi-dimensional learned indices under a unified benchmark is lacking, which makes it difficult to choose a suitable index for specific data and queries and further prevents the deployment of learned indices in real application scenarios. In this paper, we present the first in-depth empirical study to answer the question of how good multi-dimensional learned indices are. Six recently published indices are evaluated under a unified experimental configuration including index implementation, datasets, query workloads, and evaluation metrics. We thoroughly investigate the evaluation results and discuss the findings that may provide insights for future learned index design.
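
To illustrate the general idea of a learned index, here is a minimal one-dimensional sketch. It is not one of the six indices evaluated in the paper, and it does not handle multi-dimensional data; the class name `LinearLearnedIndex` and its methods are hypothetical. A linear model predicts where a key sits in a sorted array, and the largest prediction error seen at build time bounds a short binary search around the prediction.

```python
import bisect


class LinearLearnedIndex:
    """Toy 1-D learned index (illustrative assumption, not from the paper)."""

    def __init__(self, keys):
        if not keys:
            raise ValueError("keys must be non-empty")
        self.keys = sorted(keys)
        n = len(self.keys)
        # Least-squares fit of position ~ slope * key + intercept.
        mean_key = sum(self.keys) / n
        mean_pos = (n - 1) / 2
        var = sum((k - mean_key) ** 2 for k in self.keys)
        cov = sum((k - mean_key) * (i - mean_pos)
                  for i, k in enumerate(self.keys))
        self.slope = cov / var if var else 0.0
        self.intercept = mean_pos - self.slope * mean_key
        # The worst prediction error over the stored keys bounds the search window.
        self.max_err = max(abs(i - self._predict(k))
                           for i, k in enumerate(self.keys))

    def _predict(self, key):
        # Predicted position, clamped to a valid array index.
        pos = round(self.slope * key + self.intercept)
        return min(max(pos, 0), len(self.keys) - 1)

    def lookup(self, key):
        """Return the first position of key in the sorted array, or None."""
        n = len(self.keys)
        guess = self._predict(key)
        lo = max(0, guess - self.max_err)
        hi = min(n, guess + self.max_err + 1)
        # Binary search only inside the error-bounded window.
        i = bisect.bisect_left(self.keys, key, lo, hi)
        return i if i < n and self.keys[i] == key else None
```

For example, after `idx = LinearLearnedIndex([3, 8, 15, 16, 23, 42, 108])`, calling `idx.lookup(16)` returns the position of 16 in the sorted array, and `idx.lookup(17)` returns `None`. The multi-dimensional indices surveyed in the paper must additionally decide how to lay out or order points in several dimensions, which this sketch leaves out.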

[3]  arXiv:2405.05601 [pdf, other]
Title: Efficient Algorithms for Top-k Stabbing Queries on Weighted Interval Data (Full Version)
Comments: Full version of our DEXA2024 paper
Subjects: Databases (cs.DB)

Intervals are generated in many applications (e.g., temporal databases), and they are often associated with weights, such as prices. This paper addresses the problem of processing top-k weighted stabbing queries on interval data. Given a set of weighted intervals, a query value, and a result size k, this problem finds the k intervals that are stabbed by the query value and have the largest weights. Although this problem finds practical applications (e.g., purchase, vehicle, and cryptocurrency analysis), it has not been well studied. A state-of-the-art algorithm for this problem incurs O(n log k) time, where n is the number of intervals, so it does not scale to large n. We address this inefficiency and propose an algorithm that runs in O(sqrt(n) log n + k) time. Furthermore, we propose an O(log n + k) algorithm to further accelerate the search. Experiments on two large real datasets demonstrate that our algorithms are faster than existing algorithms.
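
The problem statement above can be made concrete with a small sketch. This is a plain linear scan with a size-k min-heap, which runs in O(n log k) time; it is not one of the algorithms proposed in the paper, and it is not necessarily the state-of-the-art algorithm the abstract refers to. The function name `topk_stabbing`, the `(left, right, weight)` tuple layout, and the assumption that intervals are closed are illustrative choices.

```python
import heapq


def topk_stabbing(intervals, q, k):
    """Return up to k intervals containing q, ordered by descending weight.

    intervals: iterable of (left, right, weight) tuples, treated as closed
    intervals [left, right] (an assumption made for this sketch).
    """
    if k <= 0:
        return []
    heap = []  # min-heap of (weight, left, right), never larger than k
    for left, right, weight in intervals:
        if left <= q <= right:  # the query value stabs this interval
            item = (weight, left, right)
            if len(heap) < k:
                heapq.heappush(heap, item)
            elif item > heap[0]:
                # Replace the current smallest of the k best seen so far.
                heapq.heapreplace(heap, item)
    return [(l, r, w) for w, l, r in sorted(heap, reverse=True)]
```

With `intervals = [(1, 5, 10), (2, 8, 30), (6, 9, 50), (0, 10, 20), (4, 4, 5)]`, the call `topk_stabbing(intervals, 4, 2)` returns `[(2, 8, 30), (0, 10, 20)]`: the interval with weight 50 is the heaviest overall but does not contain the query value 4. Because every query touches all n intervals, this approach shows the scalability problem that the paper's sublinear algorithms aim to remove.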

Replacements for Fri, 10 May 24

[4]  arXiv:2405.02506 (replaced) [pdf, ps, other]
Title: Big Data, Big Decisions Choosing the Right Database
Authors: Mohamed Hassan
Subjects: Databases (cs.DB)

[5]  arXiv:2405.03708 (replaced) [pdf, ps, other]
Title: Delta Tensor: Efficient Vector and Tensor Storage in Delta Lake
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Databases (cs.DB); Machine Learning (cs.LG)
