We gratefully acknowledge support from
the Simons Foundation and member institutions.
Full-text links:

Download:

Current browse context:

cs.CE

Change to browse by:

cs

References & Citations

DBLP - CS Bibliography

Bookmark

(what is this?)
CiteULike logo BibSonomy logo Mendeley logo del.icio.us logo Digg logo Reddit logo

Computer Science > Computational Engineering, Finance, and Science

Title: Code Generation and Performance Engineering for Matrix-Free Finite Element Methods on Hybrid Tetrahedral Grids

Abstract: This paper introduces a code generator designed for node-level optimized, extreme-scalable, matrix-free finite element operators on hybrid tetrahedral grids. It optimizes the local evaluation of bilinear forms through various techniques including tabulation, relocation of loop invariants, and inter-element vectorization - implemented as transformations of an abstract syntax tree. A key contribution is the development, analysis, and generation of efficient loop patterns that leverage the local structure of the underlying tetrahedral grid. These significantly enhance cache locality and arithmetic intensity, mitigating bandwidth-pressure associated with compute-sparse, low-order operators. The paper demonstrates the generator's capabilities through a comprehensive educational cycle of performance analysis, bottleneck identification, and emission of dedicated optimizations. For three differential operators ($-\Delta$, $-\nabla \cdot (k(\mathbf{x})\, \nabla\,)$, $\alpha(\mathbf{x})\, \mathbf{curl}\ \mathbf{curl} + \beta(\mathbf{x}) $), we determine the set of most effective optimizations. Applied by the generator, they result in speed-ups of up to 58$\times$ compared to reference implementations. Detailed node-level performance analysis yields matrix-free operators with a throughput of 1.3 to 2.1 GDoF/s, achieving up to 62% peak performance on a 36-core Intel Ice Lake socket. Finally, the solution of the curl-curl problem with more than a trillion ($ 10^{12}$) degrees of freedom on 21504 processes in less than 50 seconds demonstrates the generated operators' performance and extreme-scalability as part of a full multigrid solver.
Comments: 22 pages
Subjects: Computational Engineering, Finance, and Science (cs.CE)
MSC classes: 65F50, 65N30, 65N55, 65Y20, 65F10
Cite as: arXiv:2404.08371 [cs.CE]
  (or arXiv:2404.08371v1 [cs.CE] for this version)

Submission history

From: Fabian Böhm [view email]
[v1] Fri, 12 Apr 2024 10:15:46 GMT (443kb,D)

Link back to: arXiv, form interface, contact.