
14 November 2014

Numerical Simulations and Data passing: C++, Python, and Protocol Buffers

Problem statement & Requirements

I'm working with a complex C++ simulation that requires a large number of user-specified parameters. Both speed and readability are important. I'd like to define all possible parameters in one (and only one) place, and include sensible defaults that can be easily overridden. Finally, intelligent type handling would be nice. For convenience, I decided to wrap the C++ simulation in Python setup/glue code. Python is a logical choice here: it is the "available everywhere" glue language, and it has nice standard libraries.

Available libraries

There aren't many data-passing options that work with both C++ and Python. Libconfig, JSON, XML, and Google Protocol Buffers (PB) appear to be the only reasonable options. Here are my thoughts on the first three:
  • Libconfig: Nice clean library, good language support. The big downside is that data structures must be defined both in a data file and in code, i.e. data is "moved" from a file into C++ variables. I feel like libconfig is best for a small number of complex variables, like lists and vectors.
  • JSON: There is no clear standard C++ library, the docs for the libraries I looked at are so-so, and some users complain about speed.
  • XML: Massive overkill.
That leaves PB, which has nice docs for both C++ and Python. All the variables, along with their types and defaults, are defined in a .proto file. The protoc tool auto-generates Python and C++ code from the .proto file. Because protoc is called from my Makefile, the C++ classes are regenerated at compile time. This makes for fast and readable C++ code - like using a named dict, but without the speed costs.

Solution / Workflow

I'm using Python to read user-supplied values into a set of PB messages, and then serializing the messages to files. C++ then reads the messages from those files at runtime. A Python script run by make synchronizes the locations of those files between Python and C++. I also want to process command-line options for my Python wrapper script. Happily, I can hand a PB message to optparse's parser.parse_args() as its values object and have it set PB message attributes with setattr(). The last Python step (aside from writing the message to disk) is reading "variable,value" pairs from a .csv file. If a variable has already been set by parse_args, I skip it: the command-line values override .csv file values.
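The parse_args() trick works because optparse writes each option onto whatever object is passed as values, using plain setattr(). Here is a minimal sketch of that behavior that runs without protobuf installed. StandInMessage is a hypothetical stand-in for the generated message class, and its default values are assumptions made for illustration.

```python
from optparse import OptionParser


class StandInMessage(object):
    """Hypothetical stand-in for a protoc-generated message class."""

    def __init__(self):
        # Assumed defaults; in the real setup the .proto file supplies these.
        self.days = 10
        self.testCLI = False


def parse_args(argv, message):
    """Parse argv, letting optparse set attributes directly on message."""
    parser = OptionParser(usage="%prog [options]")
    # Each dest (explicit or derived from the long option name) must be
    # an attribute that the message accepts.
    parser.add_option("-t", "--test", dest="testCLI",
                      action="store_true", help="Run test suite")
    parser.add_option("-d", "--days", metavar="N",
                      type="int", help="Number of days to simulate")
    # values=message tells optparse to fill our object instead of
    # building its own Values instance.
    message, leftover = parser.parse_args(argv, values=message)
    return message, leftover


message, leftover = parse_args(["--days", "30"], StandInMessage())
print(message.days, message.testCLI, leftover)
```

optparse only applies its own option defaults when it creates the values object itself, so attributes that are not named on the command line keep whatever the message already held.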


Overall, PB makes a very nice data coupler between an interpreted language like Python and a compiled language like C++. Python excels at text processing and is easy to prototype in, while C++ is fast and beautiful. PB has a few side-benefits. On the C++ side, it provides some natural namespace encapsulation to manage variable-explosion. Runtime inspection with gdb is easy enough. Finally, storing all the option values used to run each simulation in a standard-format file is handy - it allows tests to re-run the simulation with exactly the same inputs.
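The make-driven script that keeps file locations in sync between Python and C++ is not shown in this post. A minimal sketch of one way to do it, assuming a single dictionary is the source of truth, might look like the following. The path data/setupSim.pb and the helper names are hypothetical; only the names ProtoDataFiles and PbFile_setupSim come from the snippets in this post.

```python
import os
import tempfile

# Hypothetical registry: constant name -> location of the serialized message.
PB_FILES = {"PbFile_setupSim": "data/setupSim.pb"}


def render_python(files):
    """Return the text of a Python module with one constant per file."""
    lines = ["# Generated by make; do not edit."]
    for name in sorted(files):
        lines.append("%s = %r" % (name, files[name]))
    return "\n".join(lines) + "\n"


def render_header(files):
    """Return the text of a C++ header with one #define per file."""
    lines = ["// Generated by make; do not edit.", "#pragma once"]
    for name in sorted(files):
        lines.append('#define %s "%s"' % (name, files[name]))
    return "\n".join(lines) + "\n"


def write_both(outdir, files=PB_FILES):
    """Write ProtoDataFiles.py and ProtoDataFiles.h into outdir."""
    with open(os.path.join(outdir, "ProtoDataFiles.py"), "w") as handle:
        handle.write(render_python(files))
    with open(os.path.join(outdir, "ProtoDataFiles.h"), "w") as handle:
        handle.write(render_header(files))


# Write into a scratch directory so the sketch has no side effects.
outdir = tempfile.mkdtemp()
write_both(outdir)
print(render_header(PB_FILES))
```

Because both files are rendered from the same dictionary, the Python writer and the C++ reader cannot disagree about where a message lives.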

Python Snippets

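import csv
import sys
from optparse import OptionParser

from google.protobuf.text_format import Merge

import ProtoBufInput_pb2   ## generated by protoc from the .proto file
import ProtoDataFiles      ## file locations, kept in sync by make

def main():
    ## initialize protobuf, fill with ParseArgs
    setupSim = ProtoBufInput_pb2.setupSim()
    setupSim = ParseArgs(sys.argv[1:], setupSim)
    prepInput(setupSim)
    RunSim()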

def ParseArgs(argv, setupSim):
    parser = OptionParser(usage=" [options]\nNote: commandline args over-ride values in files.", version=setupSim.version)
    ## each dest must be a valid protocol buffer field
    parser.add_option("-t", "--test", dest="testCLI",
        action='store_true', help="Run test suite")
    parser.add_option("-d", "--days", metavar='N',
        type='int', help="Number of days to simulate")
    ## parse! optparse sets each option on the message with setattr()
    (setupSim, args) = parser.parse_args(argv, values=setupSim)
    return setupSim

def prepInput(setupSim):
    ## options from ParseArgs
    inhandle = open(setupSim.file_options, 'r')
    outhandle = open(ProtoDataFiles.PbFile_setupSim, 'wb')
    reader = csv.reader(inhandle, delimiter=',')
    ## the first row must be the header
    header = next(reader)
    if not (header == ['variable','value']):
        raise Exception('Incorrect header format')

    for row in reader:
        ## skip blank lines and comments, check for 2 fields per row
        if not row or row[0].startswith('#'):
            continue
        if not (len(row) == 2):
            raise Exception('Problem with value pair: %s' % row)
        ## commandline values override .csv values
        if setupSim.HasField(row[0]):
            print("Skipping config file, keeping commandline value: %s, %s" % (row[0], getattr(setupSim,row[0])))
            continue
        ## pack the message using text representation
        msgText = '%s : %s' % (row[0], row[1])
        setupSim = Merge(msgText, setupSim)
    ## write out to file for C++ to read
    outhandle.write(setupSim.SerializeToString())
    outhandle.close()
    inhandle.close()

def RunSim():
    ## (body not shown in this post)
    pass

if __name__ == "__main__":
    main()
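The override rule in prepInput can be tried without protobuf installed. The sketch below is a simplified stand-in, not the code above: a plain dict plays the role of the message, a membership test plays the role of HasField(), and values stay as strings instead of being converted by the text-format parser. The merge_csv name and the seed variable are hypothetical.

```python
import csv
import io


def merge_csv(csv_text, cli_values):
    """Return cli_values plus any CSV pairs the command line did not set."""
    merged = dict(cli_values)
    reader = csv.reader(io.StringIO(csv_text), delimiter=",")
    # The first row must be the header.
    header = next(reader)
    if header != ["variable", "value"]:
        raise ValueError("Incorrect header format")
    for row in reader:
        # Skip blank lines and comment lines.
        if not row or row[0].startswith("#"):
            continue
        if len(row) != 2:
            raise ValueError("Problem with value pair: %s" % row)
        name, value = row
        if name in cli_values:
            # Already set on the command line, so the file value is ignored.
            continue
        merged[name] = value
    return merged


CSV_TEXT = "variable,value\n# comment line\ndays,365\nseed,42\n"
result = merge_csv(CSV_TEXT, {"days": "30"})
print(result)
```

Here days keeps its command-line value of 30, while seed, which was only given in the file, is filled in from the .csv text.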

C++ Code Snippets

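// PbRead.h
#include <fstream>
#include <stdexcept>

#include "proto/ProtoBufInput.pb.h"

// Read a serialized message from filename into msg.
// Type can be any generated protocol buffer message class.
template <typename Type>
void PbRead(Type &msg, const char *filename){
    std::fstream infile(filename, std::ios::in | std::ios::binary);
    if (!infile) {
       throw std::runtime_error("Setup message file not found");
    } else if (!msg.ParseFromIstream(&infile)) {
       throw std::runtime_error("Parse error in message file");
    }
}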

// sim.cpp
#include "PbRead.h"
#include "ProtoDataFiles.h"

// protocol buffer messages are used throughout the simulation, so they are globals
ProtoBufInput::setupSim PbSetupSim;

int main(int argc, char **argv) {
    // ProtoDataFiles.h, written by make, contains: #define PbFile_setupSim "filename"
    PbRead(PbSetupSim, PbFile_setupSim);
    if (PbSetupSim.test_2()){
        // ... (rest of the simulation not shown)
    }
    return 0;
}