Protocol Buffers Support in Chapel

This document describes a Google Protocol Buffers implementation for Chapel. Protocol Buffers is a language-neutral, platform-neutral, extensible mechanism for serializing structured data. The schema for the data is written in the protocol buffer language and then compiled into language-specific bindings.

This project consists of two components:

  • Code generator: The protoc-gen-chpl tool is a compiler plugin to protoc, the protocol buffer compiler. It enables the protoc compiler to generate Chapel code.

  • Chapel library: A module that has the runtime implementation of protobufs in Chapel and provides functionality to serialize messages in wire format.

Installation Instructions

You should have the protocol buffer compiler, v3.0.0 or higher, installed on your system. Please see the official installation guide.

Once you have the compiler installed, in $CHPL_HOME run the following:

make protoc-gen-chpl

This builds the protoc-gen-chpl binary and installs it in the same location as the Chapel compiler (chpl), so that it can be invoked from anywhere in the user’s file system.

To remove protobuf support, change directory to $CHPL_HOME/tools/protoc-gen-chpl and run:

make clean

Defining protocol format

To use the Chapel protobuf implementation, start by defining a .proto file. In it, you add a message for each data structure you want to serialize, then specify a name and a type for each field in the message.

Below is an example of an address book entry for a person. This section describes a simple .proto file and the corresponding generated Chapel code. For complete details on .proto files, see the links at the end of this document.

Note

The Chapel implementation only supports the proto3 syntax of protobuf.

The .proto file starts with a syntax declaration, followed by an optional package declaration, which helps to prevent naming conflicts between different projects.

syntax = "proto3";
package addressbook;

In Chapel, the generated records are placed in a module matching the package name. If the package name is not specified, the module takes the name of the proto file with all non-alphanumeric characters replaced by an underscore.

Next, you have to define your messages. A message is just an aggregate containing a set of typed fields. Many standard simple data types are available as field types, including int64, int32, float, double, and string.

message Person {
  string name = 1;
  int32 id = 2;  // Unique ID number for this person.
  string email = 3;

  enum PhoneType {
    MOBILE = 0;
    HOME = 1;
    WORK = 2;
  }

  message PhoneNumber {
    string number = 1;
    PhoneType phntype = 2;
  }

  repeated PhoneNumber phones = 4;

}

In the above example, the Person message contains a string field name with field number 1, an int32 field id with field number 2, and a string field email with field number 3. You can also define enum types if you want a field to take one of a predefined list of values; here, a phone type can be one of MOBILE, HOME, or WORK. You can also define nested messages and use them as field types, like the PhoneNumber message, which contains a string field number with field number 1 and an enum field phntype with field number 2.

The phones field, with field number 4, is an example of a repeated message field. A repeated field may occur any number of times (including zero), and the order of the repeated values is preserved in the protocol buffer. If a field is not set, Chapel assigns it a default value.
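The field numbers matter because they, rather than the field names, identify each field in the serialized bytes. The following is a minimal sketch of that part of the encoding, based on the Protocol Buffers encoding specification and not on the Chapel library's own code; the names fieldTag, varintBytes and the wireType constants are hypothetical helpers introduced only for illustration.

```chapel
// Wire types from the Protocol Buffers encoding specification.
param wireTypeVarint = 0;           // int32, int64, bool, enum, ...
param wireTypeLengthDelimited = 2;  // string, bytes, nested messages

// Tag that precedes each field on the wire: (field number << 3) | wire type.
proc fieldTag(fieldNumber: int, wireType: int): int {
  return (fieldNumber << 3) | wireType;
}

// Yield the base-128 varint encoding of `value`, least significant
// 7-bit group first; every byte except the last has its high bit set.
iter varintBytes(value: uint(64)): uint(8) {
  const continuationBit = 0x80: uint(64);
  const payloadMask = 0x7F: uint(64);
  var rest = value;
  while rest >= continuationBit {
    yield ((rest & payloadMask) | continuationBit): uint(8);
    rest >>= 7;
  }
  yield rest: uint(8);
}

// Person.name is field 1 (string), Person.id is field 2 (int32).
writeln(fieldTag(1, wireTypeLengthDelimited));  // 10
writeln(fieldTag(2, wireTypeVarint));           // 16

// An id of 300 takes two bytes as a varint.
const idValue: uint(64) = 300;
const encoded = varintBytes(idValue);
writeln(encoded);                               // 172 2
```

Because only the number and the wire type are written, renaming a field in the .proto file does not change the serialized form, whereas changing its number does.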

Compiling your protocol buffers

The code generator is integrated with the protoc compiler toolchain included in the default Protocol Buffers distribution. Use the protoc command with the --chpl_out flag to invoke the Chapel code generator and write the output .chpl file to a specific location.

protoc --chpl_out=$DST_DIR $SRC_DIR/addressbook.proto

This generates addressbook.chpl in your specified directory.

The generated file

The generated addressbook.chpl file will contain:

  • A wrapper module with the name addressbook.

  • A Person record with name, id, email and phones fields.

  • A Person_PhoneNumber record with number and phntype fields.

  • An enum with the name Person_PhoneType.

  • serialize and deserialize functions for serialization/parsing.

You can import this module into a .chpl file and create an instance of Person to populate with data:

use addressbook;
use IO;

var messageObj: Person;
messageObj.name = "John";
messageObj.id = 429496729;
messageObj.email = "John@a.com";

var phoneNumber1: Person_PhoneNumber;
phoneNumber1.number = "555-4321";
phoneNumber1.phntype = Person_PhoneType.HOME;
messageObj.phones.append(phoneNumber1);

var phoneNumber2: Person_PhoneNumber;
phoneNumber2.number = "555-4444";
phoneNumber2.phntype = Person_PhoneType.WORK;
messageObj.phones.append(phoneNumber2);

Serialization and parsing

The whole purpose of using protocol buffers is to serialize your data so that it can be parsed elsewhere. You can serialize your message object using the IO module and the serialize function.

var file = open("out", ioMode.cw);
var writingChannel = file.writer();

messageObj.serialize(writingChannel);

Parsing is similar: each generated record has a deserialize function. To parse the file we have just created, we can use:

use addressbook;
use IO;

var file = open("out", ioMode.r);
var readingChannel = file.reader();

var messageObj = new Person();
messageObj.deserialize(readingChannel);
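The two steps can also be combined into a single round-trip check. This is only a sketch: it assumes the generated addressbook.chpl from the earlier section is available to the compiler, it reuses the serialize and deserialize calls shown above, and the explicit close calls are an addition here to make sure the bytes are flushed to disk before the file is reopened.

```chapel
use addressbook;  // module generated from addressbook.proto
use IO;

// Write a Person with only some of its fields set.
{
  var original: Person;
  original.name = "John";
  original.id = 429496729;

  var outFile = open("out", ioMode.cw);
  var writingChannel = outFile.writer();
  original.serialize(writingChannel);
  writingChannel.close();  // flush buffered bytes before reading back
  outFile.close();
}

// Read the same file back into a fresh record.
var inFile = open("out", ioMode.r);
var readingChannel = inFile.reader();

var parsed = new Person();
parsed.deserialize(readingChannel);
readingChannel.close();
inFile.close();

// The fields that were set should come back unchanged; email was
// never set, so it is expected to hold its default value.
writeln(parsed.name);
writeln(parsed.id);
writeln(parsed.email == "");
```

Comparing the parsed fields against the values that were written is a quick way to confirm that the generated code and the runtime library agree on the format.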

Features

The following features are currently supported:

  1. Message definitions

  2. Scalar value types

  3. Unknown fields

  4. Packages

  5. Enumerations

  6. Repeated fields

  7. Nested types

  8. Using other message types

  9. Any message type

  10. Oneofs

  11. Maps