In search of bin counts
I look at histograms and density functions of my data in R on a regular basis. I have some idea of the algorithms behind these, but I've never had any reason to go under the hood until now. Lately, I've been using the bin counts for things like Shannon entropy in the very nice entropy package. I figured that binning and counting data would either be supported via a native, dedicated R package, or be quite simple to code. Not finding the former, I tried the latter.
Trouble in paradise
Truncate the data to a specified precision, and count how many values fall in each bin. Well, here's what I tried first:
myhist = function(x, dig=3) {
  ## meant to truncate x to `dig` decimal places, but for numeric x trunc()
  ## ignores the digits argument and truncates to a whole number
  x = trunc(x, digits=dig)
  ## x = round(x, digits=dig)
  aa = bb = seq(0, 1, 1/10^dig)   ## grid of bin values 0, 1/10^dig, ..., 1
  for (ii in 1:length(aa)) {
    ## aa[ii] is read as a bin value, then overwritten with its count;
    ## exact == on doubles can miss values that print identically
    aa[ii] = sum(x == aa[ii])
  }
  return(cbind(bin=bb, dens=aa/length(x)))
}
## random variates
test = sort(runif(1e4))
get1 = myhist(test)
Dear Google...
An hour of irritation and confusion later, I asked Google and, small wonder, the second search result linked to the ash package, which contains just such a tool. It also runs somewhere between 100 and 1,000 times faster. It doesn't return the bin boundaries by default, but it's good enough for a quick-and-dirty empirical probability mass distribution.
To be fair, there's something to be said for cooking up a simple solution to a simple problem and then realizing that, for one reason or another, the problem isn't quite as simple as one first thought. On the other hand, sometimes we just want answers. When that's the case, asking Google is a pretty good bet.
## their method
library(ash)
## counts in 1e3+1 equal-width bins spanning [0, 1]
get2 = bin1(test, c(0,1), 1e3+1)$nc
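For comparison, here is a base-R sketch of the same truncate-and-count idea, followed by the plug-in Shannon entropy of the resulting counts. It assumes the data lie in [0, 1], as with the runif() test data, and the helper names `bin_counts` and `plugin_entropy` are hypothetical, not functions from ash or entropy. Two details matter: for a numeric vector, trunc() ignores a digits argument and always truncates to a whole number, and comparing doubles with == can miss values that print identically. So the sketch computes an integer bin index with floor() and counts the indices with tabulate().

```r
# Hypothetical helper: counts values of x (assumed to lie in [0, 1]) on the
# grid 0, 1/10^dig, ..., 1 by truncating each value to dig decimal places.
bin_counts <- function(x, dig = 3) {
  scale <- 10^dig
  nbins <- scale + 1                       # one bin per grid point, including 1
  idx <- floor(x * scale) + 1              # truncation expressed as a bin index
  counts <- tabulate(idx, nbins = nbins)   # indices outside 1..nbins are ignored
  cbind(bin = (seq_len(nbins) - 1) / scale, count = counts)
}

# Hypothetical helper: plug-in (maximum-likelihood) Shannon entropy, in nats,
# of a vector of bin counts; empty bins contribute nothing (0 * log 0 = 0).
plugin_entropy <- function(counts) {
  p <- counts / sum(counts)
  p <- p[p > 0]
  -sum(p * log(p))
}

set.seed(1)
test <- sort(runif(1e4))
get3 <- bin_counts(test)
h <- plugin_entropy(get3[, "count"])
```

Like myhist(), this keeps one bin per grid point (1,001 bins for dig = 3), so a value such as 0.0015 is counted at 0.001. A value sitting exactly on a grid point can still land one bin low in rare floating-point edge cases. The same plugin_entropy() call should also accept the counts from bin1(), e.g. plugin_entropy(get2); the entropy package offers this estimator alongside other, bias-corrected ones.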
You should really be more productive on your caffeine buzzes.
Cause these long-winded musings are a waste of your processing power. Imagine how many whales you could have saved or whatever you do.
Is table(cut(x, bb)) doing what you want? The format is different, but the results are counts of x in bins.
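A sketch of the commenter's table(cut(x, bb)) suggestion, assuming bb is the same grid as in myhist() with dig = 3. Note that 1,001 breakpoints define 1,000 intervals, so the result has one entry fewer than myhist's grid-point layout.

```r
set.seed(1)
x  <- runif(1e4)
bb <- seq(0, 1, 1/10^3)          # same grid as myhist(dig = 3): 1001 breakpoints

# cut() maps each value to one of the 1000 intervals between consecutive
# breakpoints; intervals are right-closed by default, and include.lowest = TRUE
# makes the first one [0, 0.001] so that an exact 0 is not dropped as NA.
tab <- table(cut(x, bb, include.lowest = TRUE))

# table() keeps the interval labels as names; as.vector() drops them
counts <- as.vector(tab)
dens   <- counts / length(x)     # empirical probability per interval
```

Because cut() counts values in intervals, 0.0015 falls in (0.001, 0.002] here rather than being truncated to 0.001.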