
23 August 2013

Everyday revision control

This post has been a long time coming. Over the past year or so, I've gradually become familiar, even comfortable, with git. I've mainly used it for my own work, rather than as a collaborative tool. Most of the folks that I work with don't need to share code on a day-to-day basis, and there's a learning curve that few of my current colleagues seem interested in climbing at this point. This hasn't stopped me from *talking* to my colleagues about git as an important tool in reproducible research (henceforth referred to as RR).

I find the process of committing files and writing commit messages at the end of the day forces me to tidy up. It also allows me to more easily put a project on hold for weeks or months and to then return to it with a clear understanding of what I'd been working on, and what work remained. In short, I use my git commit messages very much like a lab notebook (a countervailing view on git and reproducible research is here, an interesting discussion of GNU make and RR here, and a nice post on RR from knitr author Yihui Xie here).
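As a rough illustration of the lab-notebook habit described above, here is a minimal sketch of an end-of-day commit followed by reading the history back. The scratch repository, the file name `analysis.R`, the commit message, and the placeholder author identity are all hypothetical, made up for this example; only standard git commands are used.

```shell
# Hypothetical scratch repository, created only for this illustration.
repo=$(mktemp -d)
cd "$repo"
git init -q

# Stand-in for a day's work on a project file (hypothetical content).
echo "x <- rnorm(100)" > analysis.R

# End-of-day routine: stage the work and record what was done and what remains.
git add analysis.R
git -c user.name="Example Author" -c user.email="author@example.com" \
    commit -q -m "Add simulation draft; TODO: plot the results"

# Weeks later: read the notebook back, one entry per commit,
# showing the date, the message, and the files each commit touched.
git log --date=short --format='%ad %s' --stat
```

The `-c user.name=... -c user.email=...` options only supply an identity for this throwaway repository; in a real project those values would normally already be set in the git configuration.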

Sidenote: I've hosted several projects at, and used their git archives, particularly for classes (I prefer the interface to github, though the two platforms are similar). I've also increasingly used Dropbox for collaborations, and I've struggled to integrate Dropbox and git. Placing the same file under the control of two very different synchronization tools strikes me as a Bad Idea (TM), and Dropbox's handling of symlinks isn't very sophisticated. On the other hand, maintaining two different file trees for one project is frustrating. I haven't found a good solution for this yet...

As far as tools go, most of the time I simply edit files with vim and commit from the command line. In this sense, git has barely changed my workflow, other than demanding a bit of much-needed attention to organization on my part. Lately, I've started using GUI tools to easily visualize repositories, e.g. to simultaneously see a commit's date, message, files, and diff. gitk and giggle have similar functionality; giggle is prettier, gitk is cleaner. Another interesting development is that RStudio now includes git tools (as well as LaTeX and knitr support in the native RStudio editor). This means that a default RStudio install has all the tools necessary for a collaborator to quickly and easily check out a repository and start playing with it.
