14 November 2014

Numerical Simulations and Data Passing: C++, Python, and Protocol Buffers

Problem statement & Requirements

I'm working with a complex C++ simulation that requires a large number of user-specified parameters. Both speed and readability are important. I'd like to define all possible parameters in one (and only one) place, and include sensible defaults that can be easily overridden. Finally, intelligent type-handling would be nice. For convenience, I decided to wrap the C++ simulation in Python setup/glue code. Python is a logical choice here as the "available everywhere" glue language that has nice standard libraries.

Available libraries

There aren't many data-passing options that work with both C++ and Python. Libconfig, JSON, XML, and Google Protocol Buffers (PB) appear to be the only reasonable options. Here are my thoughts on the first three:
  • Libconfig: Nice clean library, good language support. The big downside is that data structures must be defined both in a data file and in code - i.e. data is "moved" from a file into C++ variables. I feel like libconfig is best for a small number of complex variables, like lists and vectors.
  • JSON: No clear standard C++ library, the library docs I found were so-so, and some users complain about speed.
  • XML: Massive overkill.
That leaves PB, which has nice docs for both C++ and Python. All the variables, along with their types and defaults, are defined in a .proto file. The protoc tool auto-generates Python and C++ code from the .proto file. By adding it to my Makefile, C++ classes are autogenerated at compile time. This makes for fast and readable C++ code: each parameter becomes a typed accessor method, so it reads like using a named dict, but without the speed costs.

Solution / Workflow

I'm using Python to read user-supplied values into a set of PB messages, and then serializing the messages to files. C++ then reads the messages from those files at runtime. A Python script run by make synchronizes the locations of files between Python and C++. I also want to process command-line options for my Python wrapper script. Happily, I can hand a PB message to optparse's parser.parse_args() as its values argument and have it set PB message attributes with setattr(). The last Python step (aside from writing the message to disk) is reading "variable,value" pairs from a .csv file. If a variable has already been set by parse_args, I skip it: the command-line values override .csv file values.
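
The make-time synchronization step mentioned above could look something like the sketch below. It is only an illustration, not the script used for this project: the function name render_outputs and the path "data/setupSim.pb" are made up, and it assumes paths contain no quotes or backslashes. The idea is that one dict of locations is rendered twice, once as a Python module and once as a C++ header of #define lines.

```python
def render_outputs(paths):
    """Return (python_source, cpp_header) text for a dict of name -> path.

    Hypothetical helper: the same dict feeds both languages, so the
    Python wrapper and the C++ simulation cannot disagree about where
    a message file lives.
    """
    py_lines = ["# generated file, do not edit"]
    h_lines = ["// generated file, do not edit", "#pragma once"]
    for name, path in sorted(paths.items()):
        # Python side: a module-level string constant
        py_lines.append("%s = %r" % (name, path))
        # C++ side: a preprocessor constant with the same name
        h_lines.append('#define %s "%s"' % (name, path))
    return "\n".join(py_lines) + "\n", "\n".join(h_lines) + "\n"


# Example: one entry, using a made-up location for the setup message
py_src, header = render_outputs({"PbFile_setupSim": "data/setupSim.pb"})
```

A make rule would then write py_src to ProtoDataFiles.py and header to ProtoDataFiles.h before compiling.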


Overall, PB makes a very nice data coupler between an interpreted language like Python and a compiled language like C++. Python excels at text processing and is easy to prototype, while C++ is fast and beautiful. PB has a few side-benefits. On the C++ side, it provides some natural namespace encapsulation to manage variable-explosion. Runtime inspection with gdb is easy enough. Finally, storing all the option values used to run each simulation in a standard-format file is handy - it allows tests to re-run the simulation with exactly the same inputs.

Python Snippets

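import csv
import sys
from optparse import OptionParser
from google.protobuf.text_format import Merge

import ProtoBufInput_pb2   ## generated by protoc from the .proto file
import ProtoDataFiles      ## file locations, written by make

def main():
    ## initialize protobuf, fill with ParseArgs
    setupSim = ProtoBufInput_pb2.setupSim()
    setupSim = ParseArgs(sys.argv[1:], setupSim)
    ## merge in .csv values, write the message to disk, then run
    prepInput(setupSim)
    RunSim()

def ParseArgs(argv, setupSim):
    parser = OptionParser(usage=" [options]\nNote: commandline args over-ride values in files.", version=setupSim.version)
    ## these must be valid protocol buffer fields
    parser.add_option("-t", "--test", dest="testCLI",
        action='store_true', help="Run test suite")
    parser.add_option("-d", "--days", metavar='N',
        type='int', help="Number of days to simulate")
    ## parse! optparse calls setattr() on the message for each option given
    (setupSim, args) = parser.parse_args(argv, values=setupSim)
    return setupSim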

def prepInput(setupSim):
    ## options from ParseArgs
    inhandle = open(setupSim.file_options, 'r')
    outhandle = open(ProtoDataFiles.PbFile_setupSim, 'wb')
    reader = csv.reader(inhandle, delimiter=',')
    header = next(reader)
    if not (header == ['variable','value']):
        raise Exception('Incorrect header format')

    for row in reader:
        ## skip blank lines and comments, check for 2 fields per row
        if not row or row[0].startswith('#'):
            continue
        if not (len(row) == 2):
            raise Exception('Problem with value pair: %s' % row)
        if setupSim.HasField(row[0]):
            ## already set by ParseArgs: the commandline value wins
            print("Skipping config file, keeping commandline value: %s, %s" % (row[0], getattr(setupSim,row[0])))
            continue
        ## pack the message using text representation
        msgText = '%s : %s' % (row[0], row[1])
        ## Merge updates setupSim in place
        Merge(msgText, setupSim)
    ## write out to file for C++ to read
    outhandle.write(setupSim.SerializeToString())
    outhandle.close()
    inhandle.close()

def RunSim():
    pass  ## body not shown

if __name__ == "__main__":
    main()
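
The snippets above need the generated protobuf module to run. To try the "command line beats .csv" rule on its own, here is a minimal sketch that assumes a plain Python stand-in for the message. SetupStandIn, its fields (days, rate, seed), their defaults, and apply_overrides are all hypothetical; only the optparse and csv calls are real library APIs. The stand-in mimics two things a PB message provides: defaults for unset fields, and HasField() to tell whether a field was set explicitly.

```python
import csv
import io
from optparse import OptionParser


class SetupStandIn(object):
    """Hypothetical stand-in for a generated protobuf message."""

    _defaults = {"days": 30, "rate": 0.5, "seed": 1}

    def __init__(self):
        # bypass our own __setattr__ so the bookkeeping set is not a field
        object.__setattr__(self, "_explicit", set())

    def __getattr__(self, name):
        # only reached when the field was never set: return its default
        try:
            return self._defaults[name]
        except KeyError:
            raise AttributeError(name)

    def __setattr__(self, name, value):
        # like a PB message, refuse names that are not declared fields
        if name not in self._defaults:
            raise AttributeError("unknown field: %s" % name)
        object.__setattr__(self, name, value)
        self._explicit.add(name)

    def HasField(self, name):
        return name in self._explicit


def apply_overrides(argv, csv_text):
    setup = SetupStandIn()
    parser = OptionParser()
    parser.add_option("-d", "--days", type="int")
    parser.add_option("-s", "--seed", type="int")
    # optparse calls setattr(setup, dest, value) for each option given
    setup, _ = parser.parse_args(argv, values=setup)

    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the 'variable,value' header
    for name, text in reader:
        if setup.HasField(name):
            continue  # already set on the command line, which wins
        # convert the text using the type of the field's default
        caster = type(SetupStandIn._defaults[name])
        setattr(setup, name, caster(text))
    return setup


setup = apply_overrides(["--days", "7"],
                        "variable,value\ndays,90\nrate,0.25\n")
```

Here days keeps the command-line value, rate comes from the .csv text, and seed stays at its default because neither source set it.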

C++ Code Snippets

// PbRead.h
#include <fstream>
#include <stdexcept>
#include "proto/ProtoBufInput.pb.h"

// read a serialized message of any generated type from a binary file
template <typename Type>
void PbRead(Type &msg, const char *filename){
    std::fstream infile(filename, std::ios::in | std::ios::binary);
    if (!infile) {
       throw std::runtime_error("Setup message file not found");
    } else if (!msg.ParseFromIstream(&infile)) {
       throw std::runtime_error("Parse error in message file");
    }
}

// sim.cpp
#include "PbRead.h"
#include "ProtoDataFiles.h"

// protocol buffers get passed around, are globals
ProtoBufInput::setupSim PbSetupSim;

int main(int argc, char **argv) {
    // #define PbFile_setupSim "filename" in ProtoDataFiles.h, written by make
    PbRead(PbSetupSim, PbFile_setupSim);
    if (PbSetupSim.test_2()){