07 June 2013

Symmetric set differences in R

My .Rprofile contains a collection of convenience functions and function abbreviations. These are either functions I use dozens of times a day and prefer not to type in full:
## my abbreviation of head()
h <- function(x, n=10) head(x, n)
## and summary()
ss <- summary
Or problems that I'd rather figure out once, and only once:
## example:
## between( 1:10, 5.5, 6.5 )
between <- function(x, low, high, ineq=FALSE) {
    ## range test returning a logical index; ineq=TRUE includes the
    ## endpoints (as SQL BETWEEN does), the default excludes them
    if (ineq) {
        x >= low & x <= high
    } else {
        x > low & x < high
    }
}
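As a quick illustration of how such a helper gets used, here is a minimal sketch that subsets a vector with both settings of ineq. The function is repeated so the snippet runs on its own; the vector x and the bounds are made up for the example.

```r
## Same helper as above, repeated so this snippet is self-contained
between <- function(x, low, high, ineq = FALSE) {
    if (ineq) {
        x >= low & x <= high   # endpoints included
    } else {
        x > low & x < high     # endpoints excluded
    }
}

x <- 1:10                              # illustrative data
x[between(x, 3, 6)]                    # 4 5
x[between(x, 3, 6, ineq = TRUE)]       # 3 4 5 6
```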
One of these "problems" that's been rattling around in my head is that setdiff(x, y) is asymmetric and has no option to change this: it returns only the elements of x that are not in y. With some regularity, I want to know whether two sets are equal and, if not, which elements differ. setequal(x, y) gives me a boolean answer to the first question. It would *seem* that setdiff(x, y) would answer the second. However, I find the following result rather counter-intuitive:
> setdiff(1:5, 1:6) 
integer(0)
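To make the asymmetry concrete, here is a minimal sketch that runs setdiff in both directions on the same two vectors. The names x and y are only for illustration.

```r
x <- 1:5
y <- 1:6

setequal(x, y)   # FALSE: the sets differ
setdiff(x, y)    # integer(0): nothing in x is missing from y
setdiff(y, x)    # 6: the one element of y that is not in x
```

Only the second call reveals the differing element, so the answer depends on argument order.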
I dislike having to type both setdiff(x, y) and setdiff(y, x) to identify the differing elements, and having to remember which argument is the reference set (the second, which I find counterintuitive). With this in mind, here's a snappy little function that returns the symmetric set difference:
symdiff <- function(x, y) { setdiff(union(x, y), intersect(x, y)) }
> symdiff(1:5, 1:6) == symdiff(1:6, 1:5)
[1] TRUE
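One caveat about that check: == compares element by element and in order, so it is only a clean test here because each result has a single element. A minimal sketch of an order-insensitive check using setequal follows, on made-up character vectors a and b. The second definition, symdiff2, is a hypothetical alternative formulation added for comparison, not something from my .Rprofile.

```r
## Symmetric difference: elements in exactly one of the two sets
symdiff <- function(x, y) setdiff(union(x, y), intersect(x, y))

## Hypothetical equivalent: combine the two one-sided differences
symdiff2 <- function(x, y) union(setdiff(x, y), setdiff(y, x))

a <- c("apple", "banana", "cherry")   # illustrative data
b <- c("banana", "cherry", "date")

symdiff(a, b)                              # "apple" and "date"
setequal(symdiff(a, b), symdiff(b, a))     # TRUE, whatever the order
setequal(symdiff(a, b), symdiff2(a, b))    # TRUE, both give the same set
```

Because the results are sets, the order of the returned elements can change with the argument order, which is why setequal is the safer comparison.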

Tada! A new function for my .Rprofile!
