Computer Science > Machine Learning

Title: Towards a Flexible and High-Fidelity Approach to Distributed DNN Training Emulation

Abstract: We propose NeuronaBox, a flexible, user-friendly, and high-fidelity approach to emulate DNN training workloads. We argue that to accurately observe performance, it is possible to execute the training workload on a subset of real nodes and emulate the networked execution environment along with the collective communication operations. Initial results from a proof-of-concept implementation show that NeuronaBox replicates the behavior of actual systems with high accuracy, achieving an error margin of less than 1% between the emulated measurements and the real system.
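The core idea in the abstract, running the real training computation on a subset of nodes while emulating the collective communication with the rest of the (virtual) cluster, can be illustrated with a minimal sketch. This is not NeuronaBox's implementation; the class names, the ring all-reduce cost model, and the latency/bandwidth parameters below are illustrative assumptions only.

```python
# Hypothetical sketch of the emulation idea: run the real computation
# locally, but replace the collective communication (e.g., all-reduce)
# with an emulated operation that models the time the networked
# exchange would have taken. All names and constants are assumptions,
# not NeuronaBox's actual design.

import time
import numpy as np


class EmulatedAllReduce:
    """Stand-in for a real all-reduce across `world_size` workers."""

    def __init__(self, world_size: int, latency_s: float = 50e-6,
                 bandwidth_bytes_per_s: float = 12.5e9):
        self.world_size = world_size
        self.latency_s = latency_s
        self.bandwidth = bandwidth_bytes_per_s

    def __call__(self, grad: np.ndarray) -> np.ndarray:
        # Simple ring all-reduce cost model: roughly 2*(n-1)/n of the
        # gradient volume crosses the wire, plus a per-step latency term.
        n = self.world_size
        volume = grad.nbytes * 2 * (n - 1) / n
        steps = 2 * (n - 1)
        time.sleep(steps * self.latency_s + volume / self.bandwidth)
        # No real peers exist, so the "reduced" gradient is just the local
        # one; a higher-fidelity emulator could also perturb the values.
        return grad


def train_step(weights, grads, allreduce, lr=0.01):
    """One emulated data-parallel step: local compute + emulated all-reduce."""
    reduced = allreduce(grads)
    return weights - lr * reduced


if __name__ == "__main__":
    allreduce = EmulatedAllReduce(world_size=8)
    w = np.zeros(1_000_000, dtype=np.float32)
    g = np.ones_like(w)
    t0 = time.perf_counter()
    w = train_step(w, g, allreduce)
    print(f"emulated step took {time.perf_counter() - t0:.4f}s")
```

In a real system such an emulator would presumably sit behind the training framework's communication backend so the workload code runs unmodified; the sketch only shows the principle of swapping the collective for a timing model.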
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
Cite as: arXiv:2405.02969 [cs.LG]
  (or arXiv:2405.02969v1 [cs.LG] for this version)

Submission history

From: Banruo Liu
[v1] Sun, 5 May 2024 15:27:56 GMT (238kb,D)
