We gratefully acknowledge support from
the Simons Foundation and member institutions.
Full-text links:

Download:

Current browse context:

cs.DC

Change to browse by:

cs

References & Citations

DBLP - CS Bibliography

Bookmark

(what is this?)
CiteULike logo BibSonomy logo Mendeley logo del.icio.us logo Digg logo Reddit logo

Computer Science > Distributed, Parallel, and Cluster Computing

Title: Efficient Multi-Processor Scheduling in Increasingly Realistic Models

Abstract: We study the problem of efficiently scheduling a computational DAG on multiple processors. The majority of previous works have developed and compared algorithms for this problem in relatively simple models; in contrast to this, we analyze this problem in a more realistic model that captures many real-world aspects, such as communication costs, synchronization costs, and the hierarchical structure of modern processing architectures. For this we extend the well-established BSP model of parallel computing with non-uniform memory access (NUMA) effects. We then develop a range of new scheduling algorithms to minimize the scheduling cost in this more complex setting: several initialization heuristics, a hill-climbing local search method, and several approaches that formulate (and solve) the scheduling problem as an Integer Linear Program (ILP). We combine these algorithms into a single framework, and conduct experiments on a diverse set of real-world computational DAGs to show that the resulting scheduler significantly outperforms both academic and practical baselines. In particular, even without NUMA effects, our scheduler finds solutions of 24%-44% smaller cost on average than the baselines, and in case of NUMA effects, it achieves up to a factor $2.5\times$ improvement compared to the baselines. Finally, we also develop a multilevel scheduling algorithm, which provides up to almost a factor $5\times$ improvement in the special case when the problem is dominated by very high communication costs.
Comments: Published in the 36th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 2024)
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
MSC classes: 68W10, 68M20, 90C10
ACM classes: C.1.4
DOI: 10.1145/3626183.3659972
Cite as: arXiv:2404.15246 [cs.DC]
  (or arXiv:2404.15246v1 [cs.DC] for this version)

Submission history

From: Pál András Papp [view email]
[v1] Tue, 23 Apr 2024 17:28:52 GMT (52kb,D)

Link back to: arXiv, form interface, contact.