
Title: Delta Tensor: Efficient Vector and Tensor Storage in Delta Lake

Abstract: The exponential growth of artificial intelligence (AI) and machine learning (ML) applications has created a need for efficient storage of vector and tensor data. This paper presents a novel approach to tensor storage in a Lakehouse architecture using Delta Lake. The approach applies the multidimensional array storage strategy of array databases, together with sparse encoding methods, to Delta Lake tables. Experiments show notable improvements in both space and time efficiency compared with traditional serialization of tensors. These results offer insights for developing and implementing optimized vector and tensor storage in data-intensive applications, and contribute to more efficient data management for AI and ML in cloud-native environments.
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Databases (cs.DB); Machine Learning (cs.LG)
Cite as: arXiv:2405.03708 [cs.DC]
  (or arXiv:2405.03708v3 [cs.DC] for this version)
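
The abstract mentions sparse encoding of tensors but does not say which scheme is used. As a rough illustration only, the sketch below assumes the common coordinate (COO) format, in which each non-zero entry becomes an (index, value) pair that could be stored as a table row. The function names `coo_encode` and `coo_decode` are hypothetical and are not taken from the paper, and nothing here touches Delta Lake itself.

```python
from itertools import product


def coo_encode(tensor, shape):
    """Return (indices, values) for the non-zero entries of a nested-list tensor.

    Assumption: COO is used purely for illustration; the paper's actual
    encoding is not specified in the abstract.
    """
    indices, values = [], []
    # Visit every coordinate in row-major order.
    for idx in product(*(range(n) for n in shape)):
        cell = tensor
        for i in idx:
            cell = cell[i]
        if cell != 0:
            indices.append(idx)
            values.append(cell)
    return indices, values


def coo_decode(indices, values, shape):
    """Rebuild the dense nested-list tensor from its COO form."""
    def zeros(dims):
        if len(dims) == 1:
            return [0] * dims[0]
        return [zeros(dims[1:]) for _ in range(dims[0])]

    dense = zeros(shape)
    for idx, value in zip(indices, values):
        cell = dense
        # Walk down to the innermost list, then assign the value.
        for i in idx[:-1]:
            cell = cell[i]
        cell[idx[-1]] = value
    return dense


dense = [[0, 0, 3], [0, 0, 0], [7, 0, 0]]
shape = (3, 3)
indices, values = coo_encode(dense, shape)
restored = coo_decode(indices, values, shape)
```

In this toy case only two of the nine entries are stored, which hints at where the space savings for sparse tensors could come from. For dense tensors the per-entry index overhead would outweigh any benefit.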

Submission history

From: Liaoliao Liu
[v1] Fri, 3 May 2024 21:48:23 GMT (1314kb)
[v2] Wed, 8 May 2024 19:45:46 GMT (1315kb)
[v3] Mon, 13 May 2024 15:30:42 GMT (1315kb)
