The Chapel Parallel Programming Language

 

CHIUW 2018

The 5th Annual Chapel Implementers and Users Workshop

Friday May 25, 2018 (mini-conference day)
Saturday May 26, 2018 (ad hoc code camp)
 

 
32nd IEEE International Parallel & Distributed Processing Symposium
JW Marriott Parq Vancouver, Vancouver, British Columbia, Canada

Introduction: CHIUW 2018—the fifth annual Chapel Implementers and Users Workshop, to be held in conjunction with IEEE IPDPS 2018—will continue our annual series of workshops designed to bring developers and users of the Chapel language (chapel-lang.org) together to present and discuss work being done across the broad open-source community. Attendance is open to anyone interested in Chapel, from the most seasoned Chapel user or developer to someone simply curious to learn more.

Registration: Register for CHIUW 2018 via the IPDPS registration site. If you're only attending CHIUW, select a one-day registration. To attend other days at IPDPS, select from the other options, as appropriate.

 

Friday May 25, 2018 (Mini-Conference Day)

 
Pre-Workshop
 
8:30 - 9:00:  Chapel 101 (Optional) [slides]
Michael Ferguson (Cray Inc.)
This is a completely optional session held prior to the official start of the workshop for those who are new to Chapel and looking for a crash course, or for those who would simply like a refresher.
 
 
Introduction (session 1)
9:00 - 9:30:  Welcome, State of the Project [slides]
Brad Chamberlain (Cray Inc.)
 
 
Morning Break
9:30 - 10:00:  Break (catered by IPDPS)
 
 
Applications of Chapel (session 2)
Session chair: Elliot Ronaghan
 
10:00 - 10:20:  Parallel Sparse Tensor Decomposition in Chapel [paper | slides]
Thomas Rolinger (University of Maryland), Tyler Simon (Laboratory for Physical Sciences), Christopher Krieger (Laboratory for Physical Sciences)
Abstract: In big-data analytics, using tensor decomposition to extract patterns from large, sparse multivariate data is a popular technique. Many challenges exist for designing parallel, high-performance tensor decomposition algorithms due to irregular data accesses and the growing size of tensors that are processed. There have been many efforts at implementing shared-memory algorithms for tensor decomposition, most of which have focused on the traditional C/C++ with OpenMP framework. However, Chapel is becoming an increasingly popular programming language due to its expressiveness and simplicity for writing scalable parallel programs. In this work, we port a state-of-the-art C/OpenMP parallel sparse tensor decomposition tool, SPLATT, to Chapel. We present a performance study that investigates bottlenecks in our Chapel code and discuss approaches for improving its performance. We also discuss features in Chapel that would have been beneficial to our porting effort. We demonstrate that our Chapel code is competitive with the C/OpenMP code for both runtime and scalability, achieving 83%–96% of the performance of the original code and near-linear scalability up to 32 cores.
 
10:20 - 10:40: Iterator-Based Optimization of Imperfectly-Nested Loops [paper | slides]
Daniel Feshbach (Haverford College), Mary Glaser (Haverford College), Michelle Strout (University of Arizona), and David Wonnacott (Haverford College)
Abstract: Effective optimization of dense array codes often depends upon the selection of the appropriate execution order for the iterations of nested loops. Tools based on the Polyhedral Model have demonstrated dramatic success in performing such optimizations on many such codes, but others remain an area of active research, leaving programmers to optimize code in other ways.

Bertolacci et al. demonstrated that programmer-defined iterators can be used to explore iteration-space reorderings, and that Cray’s compiler for the Chapel language can optimize such codes to be competitive with polyhedral tools. This “iterator-based” approach allows programmers to explore iteration orderings not identified by automatic optimizers, but was only demonstrated for perfectly-nested loops, and lacked any system for warning about an iterator that would produce an incorrect result.

We have now addressed these shortcomings of iterator-based loop optimization, and explored the use of our improved techniques to optimize the imperfectly-nested loops that form the core of Nussinov’s algorithm for RNA secondary-structure prediction. Our C++ iterator matches the performance of the fastest C code, running several times faster than the same C compiler achieves on either the code with the original iteration ordering or the code produced by the Pluto loop optimizer. Our Chapel iterators produce run times that are competitive with the equivalent iterator-free Chapel code, though Chapel performance still does not equal that of the C/C++ code.

We have also implemented an iterator that produces an incorrect-but-fast version of Nussinov’s algorithm, and used this iterator to illustrate our approaches to error-detection. Manual application of our compile-time error-detection algorithm (which has yet to be integrated into a compiler) identifies this error, as does the run-time approach that we use for codes on which the static test proves inconclusive.

 
10:40 - 11:00:  Investigating Data Layout Transformations in Chapel [slides]
Apan Qasem (Texas State University), Ashwin Aji, and Mike Chu (AMD)
Abstract: Heterogeneous node architectures are quickly becoming the de facto choice in scalable supercomputers. Efficient layout and placement of shared data structures is critical in attaining desired performance on such systems. However, with most high-level programming languages, the programmer has to manually explore the optimal data organization strategy for their workloads. This paper explores automatic and semi-automatic data layout transformations for heterogeneous memory architectures using Chapel as a reference high-level language. We first identify computation and data access patterns that are problematic for hybrid nodes, then propose solutions to rectify these situations by converting inferior data layouts to efficient ones, and finally outline implementation strategies in Chapel. We demonstrate that the domain map feature in Chapel can be leveraged to implement sophisticated layout transforms for heterogeneous memory systems. Preliminary evaluation shows that the proposed transformations can make up to an order-of-magnitude difference in performance for GPU kernels with certain characteristics.
 
 
Quick Break
11:00 - 11:10:  Quick Break
 
 
Chapel Design and Evolution (session 3)
Session chair: Benjamin Robbins
 
11:10 - 11:30:  Transitioning from Constructors to Initializers in Chapel [extended abstract | slides]
Lydia Duncan and Michael Noakes (Cray Inc.)
 
11:30 - 11:50:  RCUArray: An RCU-like Parallel-Safe Distributed Resizable Array [paper | slides]
Louis Jenkins (Bloomsburg University)
Abstract: I present RCUArray, a parallel-safe distributed array that allows read and update operations to occur concurrently with a resize. As Chapel lacks thread-local and task-local storage, I also present a novel extension to the Read-Copy-Update synchronization strategy that functions without the need for either. At 32 nodes with 44 cores per node, RCUArray’s performance relative to an unsynchronized Chapel block-distributed array is as low as 20% for read and update operations, but with runtime support for zero-overhead RCU and thread-local or task-local storage it has the potential to be near-equivalent; relative performance for resize operations is as much as 3600% due to the novel design.
 
11:50 - 12:10:  Adding Lifetime Checking to Chapel [extended abstract | slides]
Michael Ferguson (Cray Inc.)
 
 
Lunch
12:10 - 1:40:  Lunch (in ad hoc groups or on your own)
 
 
Keynote Talk
Session chair: Brad Chamberlain
1:40 - 2:40:  Why Languages Matter More Than Ever [slides]
Katherine Yelick (UC Berkeley / Lawrence Berkeley National Laboratory)

Abstract: In the next few years, exascale computing systems will become available to the scientific community. These systems will require new levels of parallelization, new models of memory and storage, and a variety of node architectures for processors and accelerators. In the decade that follows, we can expect more of these changes, as well as increasing levels of hardware specialization. These systems will provide simulation and analysis capabilities at unprecedented scales, and when combined with advanced physical models, mathematical and statistical methods, and computer science and abstractions, they will lead to scientific breakthroughs. Yet the full power of these systems will only be realized if there is sufficient high-level programming support that will abstract details of the machines and give programmers a natural interface for writing new science applications. In this talk I will discuss the importance of programming languages, and the need for high-level abstractions and more powerful compilers to reach future science goals. And how does one move an existing community with large code bases into new languages? I will use my own experience from both general-purpose and special-purpose languages, and extrapolate to some of the key opportunities and challenges facing the Chapel language.

Bio: Katherine Yelick is a Professor of Electrical Engineering and Computer Sciences at the University of California at Berkeley and the Associate Laboratory Director for Computing Sciences at Lawrence Berkeley National Laboratory. Her research is in parallel programming languages, compilers, algorithms, and automatic performance tuning, as well as scientific applications. Yelick was Director of the National Energy Research Scientific Computing Center (NERSC) from 2008 to 2012 and currently leads the Computing Sciences Area at Berkeley Lab, which includes NERSC, the Energy Sciences Network (ESnet) and the Computational Research Division (CRD). Yelick is a member of the National Academy of Engineering (NAE) and the American Academy of Arts and Sciences, and is an ACM Fellow and recipient of the ACM/IEEE Ken Kennedy and ACM-W Athena awards.

A longer bio and CV can be found here.

 
 
Chapel Performance (session 4)
Session chair: Brad Chamberlain
 
2:40 - 3:00: Tales from the Trenches: Whipping Chapel Performance into Shape [extended abstract | slides]
Elliot Ronaghan, Ben Harshbarger, and Greg Titus (Cray Inc.)
 
 
Afternoon Break
 
3:00 - 3:30:  Break (catered by IPDPS)
 
 
Tools (session 5)
Session chair: Lydia Duncan
 
3:30 - 3:50: Purity: An Integrated, Fine-Grain, Data-Centric, Communication Profiler for the Chapel Language [paper | slides]
Richard Johnson and Jeffrey Hollingsworth (University of Maryland)
Abstract: We present Purity, a configurable, data-centric, communication profiler for the Chapel language that analyzes memory and communication access patterns in a multi-node PGAS environment. By integrating Purity into the compiler and runtime framework of Chapel, we can instrument Chapel programs to capture memory and communication operations and produce both online and fine-grain post-execution reporting. Our profiler is equipped with a sampling mechanism for reducing overhead, handles complex data structures, and generates detailed execution profiles that map data motion to the variable, field, loop, and node levels for both distributed and non-distributed instantiations. In a case study, Purity provided valuable insight into task and data locality, which allowed us to develop a programmatic solution that reduced remote operations in SSCA#2 by nearly 90%.
 
3:50 - 4:10:  ChplBlamer: A Data-centric and Code-centric Combined Profiler for Multi-locale Chapel Programs [extended abstract | slides]
Hui Zhang and Jeffrey Hollingsworth (University of Maryland)
 
4:10 - 4:30:  Mason, Chapel's Package Manager [extended abstract | slides]
Ben Albrecht (Cray Inc.), Sam Partee (Haverford College), Ben Harshbarger, and Preston Sahabu (Cray Inc.)
 
 
Lightning Talks and Flash Discussions (session 6)
Session chair: Michael Ferguson
4:30 - 5:30:  Lightning Talks and Flash Discussions
This final session will feature short time slots (5–10 minutes, depending on the number of participants) in which community members can give short talks, lead discussions on current hot topics of interest, do demos, etc. Sign up on-site or let us know of your interest beforehand.
 
5:30:  Adjourn for Dinner (in ad hoc groups or on your own)
   

 

Saturday May 26, 2018 (Code Camp Day)

 
ad hoc Chapel Code Camp
 
8:30 - ?:??: ad hoc Chapel Code Camp
The Chapel code camp is an annual chance to work cooperatively on coding problems or discussion topics while we're in one place. Members of the core Chapel team will be available to partner with members of the community on topics of interest. This year's code camp will consist of small ad hoc groups that find a space and time that works for them. If you would like to participate in a pair-programming or collaborative activity on the code-camp day, let whomever you hope to work with know. Alternatively, if you are not sure who to work with, send a short paragraph describing the activity to chapel_submissions@cray.com or bring it up with the CHIUW chairs at the workshop on Friday.
 

 

Committee

General Chairs:

  • Michael Ferguson, Cray Inc.
  • Nikhil Padmanabhan (co-chair), Yale University
Program Committee:
  • Brad Chamberlain (chair), Cray Inc.
  • Aparna Chandramowlishwaran (co-chair), University of California, Irvine
  • Mike Chu, AMD
  • Anshu Dubey, Argonne National Laboratory
  • Jonathan Dursi, The Hospital for Sick Children, Toronto
  • Hal Finkel, Argonne National Laboratory
  • Marta Garcia Gasulla, Barcelona Supercomputing Center
  • Clemens Grelck, University of Amsterdam
  • Jeff Hammond, Intel
  • Bryce Lelbach, Nvidia
  • Michelle Strout, University of Arizona
  • Kenjiro Taura, University of Tokyo
  • David Wonnacott, Haverford College

 

Call For Participation (for archival purposes)