Join us at the annual Chapel Programming Language
event to talk about the language, libraries, and applications! ChapelCon is
free to attend and will be held virtually.
ChapelCon ‘25 welcomes anyone with computing challenges that demand performance, particularly through parallelism and scalability. It brings together Chapel users, enthusiasts, researchers, and developers to exchange ideas, present their work, and forge new collaborations. Anyone interested in parallel programming, programming languages, or high-performance computing is encouraged to attend. Sessions support all levels of experience: Tutorial and Free Coding sessions for those looking to hone their skills, Office Hours for those seeking help from Chapel developers, and Conference sessions for those looking to share and discuss their work.
Keynote: Christopher Rackauckas, MIT
The Software Engineering of Julia’s Scientific Machine Learning (SciML)
Scientific machine learning (SciML) is the integration of machine learning into scientific computing. While it has become an academic discipline in its own right, one of the key drivers of SciML's adoption has been the ongoing creation of readily available software. In this talk I will introduce the key tenets of SciML with a focus on the implications for HPC software development. Showcases of methods such as universal differential equations, and their successes in generating more accurate physical models from data, will be intertwined with stories about the software architecture that has enabled the sustainable development of the open-source SciML software ecosystem. The audience should leave with a deep understanding of how the trade-off between research software and reusable open-source development can be managed in a way that benefits both the research community and the broader scientific computing community.
LUMI: The Supercomputer of the North
Emanuele Vitali, CSC; Jorik van Kemenade, SURF
In this talk we will introduce LUMI, starting with an overview of the consortium and how you can request resources, then focusing on its hardware (in particular, the LUMI-C and LUMI-G node architectures and the network architecture). We will then introduce the LUMI user support team, its way of working, and its duties. Finally, we will give a short demo of how to install and run a simple Chapel program.
Spack: The Community’s Road to the HPSF and Version 1.0
Todd Gamblin, LLNL & HPSF, Invited Talk
The past year has been transformative for the twelve-year-old Spack project, starting with its inclusion in the High Performance Software Foundation (HPSF) and culminating in its 1.0 release. Spack v1.0, released in July, is the first version to offer a stable package API and to integrate true compiler dependencies into its core model—features developed over many years. This talk will cover how the Spack community evolved to this point and detail the decision-making process behind joining the HPSF and finally taking the plunge and going 1.0.
Description
This session introduces the Chapel programming language, its key features, and its place within the HPC ecosystem. The focus is on how Chapel's design philosophy makes it unique among programming languages. No prior experience is necessary.
Description
This session demos key I/O features in Chapel and offers a short coding exercise that uses the demonstrated features.
Description
This session demos Chapel's key parallel looping constructs and offers a short coding exercise that uses the demonstrated features.
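As a minimal taste of these constructs, here is a sketch written against a recent Chapel release (it is illustrative only, not part of the session materials):

```chapel
// 'forall' runs loop iterations in parallel across available cores.
var A: [1..8] int;
forall i in 1..8 do
  A[i] = i * i;

// Promotion: applying a scalar operation to a whole array
// is implicitly parallel as well.
var B = A + 1;

writeln(A);  // 1 4 9 16 25 36 49 64
writeln(B);
```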
Description
This session demos distributions, a key abstraction in Chapel for reasoning about data locality and movement. It also offers a short coding exercise that uses these features.
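As a hedged taste of what this session covers, a sketch using the standard `BlockDist` module might look like the following (the `blockDist.createDomain` API shown here is from recent Chapel releases and may differ in older ones):

```chapel
use BlockDist;

// Block-distribute the indices 1..16 across all available locales.
const D = blockDist.createDomain({1..16});
var A: [D] int;

// Each element is computed on the locale that owns it,
// so 'here.id' records where the work actually ran.
forall i in D do
  A[i] = here.id;

writeln(A);
```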
Description
This session demos Chapel's aggregate data structures (classes and records) and Chapel's approach to memory management. It also offers a short coding exercise that uses these features.
Description
The free-code session is unstructured time for participants to work on their own Chapel projects in the company of other Chapel users and developers. Participants are encouraged to apply any lessons from the day's demo sessions and to discuss and ask questions about their projects.
Description
This session introduces Chapel language features that play a role in the rest of the day's tutorial topics. Key topics include generic types and compile- and run-time expressions.
Description
This session covers common causes of performance issues in Chapel applications and how to diagnose and resolve them. It includes a short coding exercise for participants.
Description
This session demos Chapel's serializer and deserializer support for reading and writing custom data structures. It also includes a short coding exercise that uses these features.
Description
The free-code session is unstructured time for participants to work on their own Chapel projects in the company of other Chapel users and developers. Participants are encouraged to apply any lessons from the day's demo sessions and to discuss and ask questions about their projects.
Description
This session demos Chapel's parallel iterators, which can be used to define custom iteration schemes for user-defined data structures. It includes a short exercise for participants to practice using these features.
Description
Modern hardware is parallel in a multitude of ways, including multi-core, multi-GPU, and multi-node parallelism. The Chapel language provides a unified toolbox for making use of these varying kinds of parallelism, and thus effectively leveraging computing hardware. One area that can benefit from parallelization is machine learning (ML), which has grown in popularity in recent years. To explore Chapel's suitability for ML, over the course of a summer internship project, the Chapel team developed ChAI, the first machine learning framework written in pure Chapel. ChAI is capable of both training and inference, and can load pretrained models from PyTorch and apply them to workloads distributed over any number of nodes, CPU cores, and GPUs. The framework has shown promising scaling results: using ChAI to load the MNIST model and classify 10,000 images, we measured a 105x speedup when scaling from a single node to 128 nodes on a Cray XC machine. We plan to integrate ChAI into the Chapel-powered Arkouda data science framework, enabling interactive, ML-enabled data science over massive datasets.
Description
Transformer models drive much of today's AI but require substantial computational resources to train. Chapel, a programming language designed for high-performance computing, offers an opportunity to explore efficient implementations of such models. In this talk, I present an implementation of a transformer model from scratch in Chapel and compare its performance with an equivalent from-scratch C++ implementation. PyTorch, a widely used framework for deep learning, is included as a reference. This talk highlights the strengths and limitations of Chapel for implementing modern AI models and its potential as a programming language for high-performance research.
Mohammad Dindoost, Bartosz Bryg, Ioannis Koutis, David Bader and Oliver Alvarado Rodriguez
Description
We present HiPerMotif, a hybrid parallel algorithm for subgraph isomorphism that addresses scalability limitations in large-scale property graphs. Traditional vertex-by-vertex algorithms struggle with extensive early-stage exploration and limited parallelization. HiPerMotif shifts search initialization through: (1) structural reordering that prioritizes high-degree vertices, (2) systematic first-edge mapping identification, (3) efficient validation, and (4) state injection at depth 2. Implemented in Chapel within the Arachne framework, HiPerMotif achieves up to 66× speedup over state-of-the-art baselines and processes massive datasets, such as the H01 connectome (150M edges), that existing methods cannot handle.
Description
This presentation begins with a brief introduction to probabilistic inference for solving hypergraph algorithms, along with its apparent strengths and implementation challenges. PGAS, as natively provided by Chapel, is then presented as a way to accelerate probabilistic inference.
Description
Scientific machine learning (SciML) is the integration of machine learning into scientific computing. While it has become an academic discipline in its own right, one of the key drivers of SciML's adoption has been the ongoing creation of readily available software. In this talk I will introduce the key tenets of SciML with a focus on the implications for HPC software development. Showcases of methods such as universal differential equations, and their successes in generating more accurate physical models from data, will be intertwined with stories about the software architecture that has enabled the sustainable development of the open-source SciML software ecosystem. The audience should leave with a deep understanding of how the trade-off between research software and reusable open-source development can be managed in a way that benefits both the research community and the broader scientific computing community.
Description
Arkouda (https://arkouda-www.github.io/) is an open-source, NumPy-like framework for distributed exploratory data analysis (EDA), built on a Chapel backend, with growing support for pandas-like data structures and operations. Over the past year, the project has matured substantially through major architectural improvements, expanded data type support, and closer alignment with the evolving NumPy 2.0 ecosystem. These updates enhance expressiveness, performance, and reliability for large-scale, interactive data science workflows.

We will begin with a general introduction to Arkouda and highlight recent use cases and success stories. Given the number of Arkouda-related talks at past ChapelCon events, the remainder of the session will focus on key improvements made since the last ChapelCon.
Anthony Chrun, Baptiste Arnould, Karim Zayni, Guillaume Auger, Maxime Blanchet, Eric Laurendeau and Justin Rigal
Description
CHAMPS (CHapel MultiPhysics Software) is a multiphysics computational framework built around an aerodynamic flow solver based on the Euler and Reynolds-Averaged Navier–Stokes (RANS) equations, currently under development at Polytechnique Montreal. Since its early development, multiple research efforts have contributed to enhancing its capabilities by incorporating a range of turbulence and transition models. Additional physics modules include solvers for droplet trajectory prediction, ice accretion, condensation trail formation, structural deformation, and fluid–structure interaction. This paper presents some of the most recent and impactful advancements achieved within CHAMPS, in order to share them with the Chapel community.
Description
Automatic differentiation is the secret sauce that allows neural networks, and machine learning applications in general, to be more than just math and actually work!

This talk will start with a brief general overview of automatic differentiation, then dive into two concrete workstreams.

ForwardModeAD is a Chapel library for forward-mode automatic differentiation, built using operator overloading. In the first half of the talk, I'll share the story behind its development: the design choices, the Chapel language features that made it possible, and the trade-offs along the way. I will also cover performance bottlenecks, current limitations, and where the library is headed next.

Enzyme is a library for automatic differentiation at the LLVM level. It has already been integrated into languages like Julia and Rust, often achieving higher performance than native-language frameworks. In the second half of the talk, I'll present my work on integrating Enzyme with Chapel, showing the current status, limitations, challenges encountered, and next steps.
Ivan Tagliaferro de Oliveira Tezoto, Guillaume Helbecque, Ezhilmathi Krishnasamy, Nouredine Melab and Gregoire Danoy
Description
Modern high-performance computing systems increasingly rely on heterogeneous architectures combining CPUs and GPUs from multiple vendors, such as Nvidia and AMD. Ensuring both performance and portability in this context remains a key challenge. This work investigates two distinct programming approaches for parallel tree-based exact optimization, focusing on the Branch-and-Bound algorithm. The first is a low-level, performance-oriented implementation in C, combining OpenMP with CUDA and HIP for multi-GPU acceleration within a single compute node. The second leverages the PGAS-based Chapel language, which offers a unified and portable high-level framework for threaded and GPU programming. We revisit the design of a portable multi-GPU Chapel implementation and propose an optimized low-level counterpart featuring a collegial multi-pool data structure, dynamic load balancing through Work Stealing, and GPU thread-indexing optimizations. Both implementations are evaluated on the Permutation Flowshop Scheduling Problem using up to eight GPUs on Nvidia A100 and AMD MI250x architectures. Experimental results demonstrate that while CUDA and HIP versions consistently outperform Chapel in terms of raw performance, Chapel achieves comparable or superior scalability when considering absolute speedups. These findings suggest that Chapel represents a promising option for prototyping GPU-accelerated parallel applications, allowing developers to evaluate feasibility and design choices before transitioning to performance-tuned, low-level implementations.
Description
In this talk we will introduce LUMI, starting with an overview of the consortium and how you can request resources, then focusing on its hardware (in particular, the LUMI-C and LUMI-G node architectures and the network architecture). We will then introduce the LUMI user support team, its way of working, and its duties. Finally, we will give a short demo of how to install and run a simple Chapel program.
Description
A simple (serial) recursive summation over a 1-dimensional array (a la quicksort) was implemented in 3 languages (Chapel, C, Fortran) and 4 compiler variants (Chapel 2.5 with LLVM, gcc 13.3.0, clang 18.1.3, gfortran 13.3.0), and compared with a standard non-recursive summation. Two alternatives for the recursion were tested: (i) passing array indices explicitly in the recursion (possible in all three languages) and (ii) using array slicing (only possible in Chapel and Fortran). Performance varied widely. Clang and Chapel were faster for the standard non-recursive summation; C and Fortran were faster for recursion using indices; and Fortran was much faster than Chapel for recursion using array slices. The performance of Chapel slices is a known issue (https://chapel.discourse.group/t/new-issue-improve-the-performance-of-slices-and-rank-change-operations/30503). It appears that for the standard summation and for recursive summation using indices, the performance is related to the backend, i.e., GCC (gfortran and gcc) versus LLVM (Chapel and clang).
Description
When writing thread-parallel applications, users of Chapel can use high-level productivity features like ‘forall’ and promotion to succinctly express their algorithms, or use lower-level features like ‘begin’ to more directly control task creation. Chapel's GPU support is similar: high-level promoted statements can become kernels, but explicit ‘foreach’ loops can be used for greater control over the generated kernel. A missing piece is instruction-level parallelism and vectorization. The Chapel compiler usually does a great job of automatically vectorizing code, but when it fails there is no recourse. The next best option is to interoperate with C or Fortran code to write the low-level operations.
To solve this problem, I have created CVL: the chpl Vector Library. This library exposes a vector type as a first-class object that provides a unified set of operations across x86 and ARM, giving Chapel developers direct control over the vectorization in their applications. In this demo, I will showcase the design and implementation of the library, including the tools used to maintain it. I will then demonstrate several benchmarks where CVL beats the Chapel compiler's auto-vectorization. Lastly, I will discuss potential improvements for the library going forward.
Description
Chapel’s type system can be surprisingly powerful. In addition to “usual” features such as generics and polymorphism, Chapel provides the ability to manipulate types using functions: both taking types as arguments to functions and returning them from those functions. This enables powerful programming techniques that are typically confined to the domain of metaprogramming.

For example, although Chapel’s notion of compile-time values (‘param’s) is limited to primitive types such as integers, booleans, and strings, one can encode compile-time lists of these values as types. Such encodings can be used to create compile-time specializations of functions that would otherwise be tedious to write by hand. One use case for such specializations is the implementation of a family of functions for approximating differential equations, the Adams-Bashforth methods. Some variants of these methods can be encoded as lists of coefficients. Thus, it becomes possible to define a single function that accepts a type-level list of coefficients and produces a “stamped out” implementation of the corresponding method. This reduces the need to implement each method explicitly by hand. Another use case of function specialization is a type-safe ‘printf’ function that validates that users’ format specifiers match the types of arguments to the function.

More generally, Chapel’s types can be used to encode algebraic sums (disjoint unions) and products (Cartesian products) of types. This, in turn, makes it possible to build arbitrary data structures at the type level. The lists-of-values case above is an instance of this general principle. Functions can be defined on type-level data structures by relying on overloading and type arguments. Programming in this manner starts to resemble programming in a purely functional language such as Haskell.

Though this style of programming has not seen much use thus far, it can be a powerful technique for controlling the types of arguments or constructing highly customized functions with no runtime overhead.
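As a rough, hypothetical illustration of the list-encoding idea (the `nil`/`cons` names are invented here, not taken from the talk, and the details may differ from the presented code):

```chapel
// Hypothetical type-level list of compile-time integers.
record nil {}
record cons {
  param head: int;  // a compile-time value carried by the type
  type tail;        // the rest of the type-level list
}

// Compute over the list entirely at compile time via overloading:
// one overload per "shape" of the type-level structure.
proc total(type t: nil) param do return 0;
proc total(type t: cons) param do return t.head + total(t.tail);

param s = total(cons(3, cons(4, nil)));
writeln(s);  // 7, computed at compile time
```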
Description
Interoperability is a key tool for new languages to drive adoption and grow. Chapel has a rich set of interoperability features that allow users to write flexible applications. For example, the ability to write C code inline with Chapel code reaches the peak of interoperability: code from two languages side by side in the same file. When it comes to interoperability with Python, Chapel has taken a similar approach to other languages. Python is slow by its very nature, so to achieve good performance, modules are written in another language and then called from Python code. Chapel has been able to fill this role for some time. This takes a Python-first approach, working well for those who want to primarily write Python and a little bit of Chapel.

The ability to call Python code from Chapel allows a Chapel programmer to write the majority of their application in Chapel and use a little bit of Python. Recent work has resulted in a Python module for Chapel that allows developers to reach the gold standard of interoperability: Python and Chapel code side by side in the same file. In this demo, we will showcase how this module is put together and some of the key features that enable Python interoperability in multiple ways. We will also show some application areas where this can be useful with libraries like numpy, scipy, and pandas.
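For readers curious what this looks like in practice, here is a hedged sketch using Chapel's `Python` module; the API is still evolving, so check the documentation for your Chapel release before relying on these exact signatures:

```chapel
use Python;  // Chapel's Python interoperability module

// Start an embedded Python interpreter.
var interp = new Interpreter();

// Compile a small Python lambda and invoke it from Chapel,
// requesting the result as a Chapel 'int'.
var addOne = interp.compileLambda("lambda x,: x + 1");
writeln(addOne(int, 41));  // 42
```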
Description
Formal methods are a set of techniques used to validate the correctness of software. A particular category of these methods, model checking, uses the mathematical language of temporal logic to construct specifications of software’s behavior. A solver can then validate the constraints described in the formal language and ensure that undesirable states do not occur.

This talk will be an experience report on using formal methods, specifically the Alloy analyzer, to detect a bug in Chapel’s ‘Dyno’ compiler front-end library. The area in which the bug was discovered is currently used in production, as well as in editor tools such as chplcheck and chpl-language-server.

Specifically, Alloy was used to construct a formal specification of a part of Chapel’s use/import lookup algorithm. Chapel has a number of complicated scoping rules and possible edge cases in this area. By running this specification against a solver, a sequence of steps was discovered that could cause the algorithm to malfunction and produce incorrect results. A program that causes these steps to occur was constructed and served as a concrete reproducer for the bug. This reproducer was used to adjust the logic and fix the bug.

This talk will cover the fundamentals of temporal logic required for formal specifications, the necessary parts of Chapel’s use/import lookup algorithm, and the steps taken to encode and validate the compiler’s behavior.
Description
Chapel 2.5 includes a new distributed-memory sort implementation. This talk will describe the interface for radix sorting, the new distributed sort algorithm, and discuss the performance of the new implementation.
Description
Distributed-memory parallel processing addresses computational problems requiring significantly more memory or computational resources than can be found on one node. Software written for distributed-memory parallel processing typically uses a distributed-memory parallel programming framework to enhance productivity, scalability, and portability across supercomputers and cluster systems.

These frameworks vary in their capabilities and in their support for managing communication and synchronization overhead to achieve scalability. We implemented a communication-intensive distributed radix sort algorithm to examine and compare the performance, scalability, usability, and productivity differences between five distributed-memory parallel programming frameworks: Chapel, MPI, OpenSHMEM, Conveyors, and Lamellar.
Description
The past year has been transformative for the twelve-year-old Spack project, starting with its inclusion in the High Performance Software Foundation (HPSF) and culminating in its 1.0 release. Spack v1.0, released in July, is the first version to offer a stable package API and to integrate true compiler dependencies into its core model—features developed over many years. This talk will cover how the Spack community evolved to this point and detail the decision-making process behind joining the HPSF and finally taking the plunge and going 1.0.
Description
Many parallel algorithms depend on reshaping how data is distributed across locales to achieve efficient computation. In this talk, I’ll introduce the Repartition module, a custom module implemented in Chapel designed to simplify and generalize all-to-all communication patterns. It enables each locale to specify a destination locale for each list element, then automatically redistributes the data accordingly.

I’ll show how this module enables sharding patterns useful across a range of distributed algorithms. We’ll look at how Repartition integrates with Chapel’s parallelism features, explore implementation tradeoffs, and share benchmark results across various workloads. Whether you're writing high-performance algorithms or building reusable distributed libraries, Repartition offers a flexible and powerful tool for managing data layout.
Description
While Chapel’s GPU programming model simplifies multi-GPU programming, it does not fully exploit the advanced GPU-to-GPU communication capabilities provided by modern GPUs. We present an integration of NVSHMEM into Chapel to enable efficient GPU-to-GPU communication from within CUDA kernels. Our implementation modifies Chapel’s GPU build pipeline and runtime system to support NVSHMEM’s symmetric memory model. Performance evaluation on the Miyabi supercomputer shows up to 100x speedup for small transfers, and effective utilization of interconnect bandwidth for larger transfers, compared to Chapel’s native copy operations.
Oliver Alvarado Rodriguez, Engin Kayraklioglu, Bartosz Bryg, Mohammad Dindoost, David Bader and Brad Chamberlain
Description
Distributed applications with fine-grained communication often suffer from performance bottlenecks. Chapel's CopyAggregation module addresses this for distributed array operations but doesn't support arbitrary remote operations. Building on Arkouda's pioneering aggregation concepts, this 20-minute talk presents a prototype for generalized destination aggregation, particularly addressing sparse matrix construction challenges.
The first part analyzes Chapel's CopyAggregation module's capabilities and limitations, then introduces the generalized framework prototype. Implementation examples demonstrate aggregated array assignments and sparse matrix creation using CompressedSparseRow layouts with parallel safety mechanisms.
The second part presents experimental results on both random recursive matrix (RMAT) and uniformly-created sparse matrices. Comparative analysis across HPE Cray EX and Infiniband systems reveals performance impacts of fine-grained versus aggregated communication.
The third part outlines a roadmap for developing a generalized aggregation framework for Chapel, discussing ecosystem integration and applications beyond sparse matrices. This presentation targets Chapel users, HPC researchers, and practitioners working with distributed sparse data structures, providing practical insights for improving performance in irregular, communication-intensive applications.
The first two days of ChapelCon ‘25 (October 7 and October 8) will focus on hands-on learning. Each day will begin with a guided tutorial and group exercises, then shift to free coding sessions, where participants can work on their own applications or on provided project prompts.
Tutorials
Tutorial days will begin with in-depth tutorials covering a range of topics: building/installing Chapel, traditional programming language features (basic usage, classes/records, IO, standard modules), and HPC-focused topics (locality, parallelism, distributed data, synchronization). No prior knowledge or preparation needed.
Free Coding Sessions
Work on projects with other Chapel enthusiasts in the Free Coding session. We’ll begin with guided exercises to warm up, then shift to less structured work on personal projects or provided prompts. The Free Coding Sessions will be a relaxed working environment, with Chapel developers present to answer questions, and breakout rooms for short demo sessions focused on solving specific, common problems.
Office Hours
Book an Office Hour for an in-depth pair-programming session with a Chapel contributor. The team is here to help with just about anything: understanding features, resolving bugs, or diagnosing and resolving performance issues. To sign up for a session, fill out a short survey to help us understand your problem and best match you with a Chapel developer.
About Conference Days
The two conference days will feature a mix of talks and demos from the community, a State of the Project update, a Keynote address, and Community Discussions.
Talks
If you have research or applications involving Chapel, we want to hear about it! This track is an opportunity to showcase any study ranging from preliminary to already published work and get feedback from the Chapel community. Talk slots can run from 5 to 30 minutes.
Demos
If you have code or visualizations from Chapel-based work, this track is for you. You can demonstrate key parts of your implementation, show how it runs live, or advertise a new module or application you are working on. Demo slots can run from 5 to 30 minutes.
Posters and Extended Abstracts
ChapelCon ‘25 will accept submissions of posters and extended abstracts, with or without accompanying presentations. These contributions will be reviewed by the program committee and accepted work will be shared with attendees as part of the conference. These tracks are ideal for folks who are interested in sharing their work with the Chapel community but are unable to present on the day.
Community Discussions
As in previous years, conference days will include informal discussion periods to draw connections between different work presented each day.
Organization
General Chair: Brandon Neth, Hewlett Packard Enterprise
Program Committee Chair: Luca Ferranti, Aalto University
Tutorial Days Chair: Daniel Fedorin, Hewlett Packard Enterprise