Labels

R (15) Admin (12) programming (11) Rant (6) personal (6) parallelism (4) HPC (3) git (3) linux (3) rstudio (3) spectrum (3) C++ (2) Modeling (2) Rcpp (2) SQL (2) amazon (2) cloud (2) frequency (2) math (2) performance (2) plotting (2) postgresql (2) DNS (1) Egypt (1) Future (1) Knoxville (1) LVM (1) Music (1) Politics (1) Python (1) RAID (1) Reproducible Research (1) animation (1) audio (1) aws (1) data (1) economics (1) graphing (1) hardware (1)

25 June 2009

What my violin really sound (and look) like

I've been trying to spend some time with the violin on a semi-regular basis. When it's more regular, it sounds less bad. I have a vague idea when i'm playing whether i'm horrible or tolerable, but listening to one's own self is a whole other thing. Today it was raining, so i had to play indoors. I hooked up rosegarden and set a cheap but remarkably effective gooseneck headset mic in the middle of the room and started playing. You don't really want to hear it, trust me. I'm trying to play more longingly minor keys because i like the sounds of them. They sound like i feel a lot of the time. Still, my fingers don't really know where they are. While i'm playing, the "music in my head" cancels out many of these mistakes before they reach my ear. In the recording i can "see" where i'm trying to go, but i hear all the mistakes quite clearly, too! I was checking on the web to see if Rosegarden has a spectogram viewer inside it when i found this wonderful tool for audio visualizion - sonic visualizer. I haven't been this impressed with audio software since, oh, rosegarden? It's currently devouring 100% of one of the cores of my CPU rendering the full 10 minute spectrograph as i write this, but it updates to the current view very quickly. It's user controls are *smooooooth*. Here's a spectrograph view of 15-odd seconds (don't know why the time axis isn't labeled here...), with frequency and amplitude scales shown on the left. I never really thought about it, but the harmonics are more widely spaced for high notes, accounting for the perceived "pureness" of the tone? You can see the vibrato, too. Pretty cool. Click on the image to see it in decent-resolution.

19 June 2009

Lattice magic - top-to-bottom plotting order in xyplot

I've been wondering about this for a while - an alternative to as.table that fills from top-to-bottom first. Working with paired timeseries, it's much easier to compare vertically, with time-axis already aligned. Deepayan's method is fast and general, and that's awesome...

From R-help:
Hi, Using library(lattice), is there any way to tell xyplot to plot panels top to bottom, then left to right (i.e. panels are appended vertically, then horizontally). as.table changes the plot direction from left-to-right then top-to-bottom, to right-to-left then bottom- to-top, but that's not quite what I want to do. Thanks Yan

Deepayan says:
> tmp.tr3 <- xyplot(y ~ x | a, data=tmp, as.table=TRUE)  
Another high level option is to change the rule determining how packets are chosen for a given panel in the layout.
print(tmp.tr3,  packet.panel = function(layout, row, column, ...) { 
    layout <- layout[c(2, 1, 3)]
    packet.panel.default(layout = layout, row = column, column = row, ...) 
})
This effectively transposes the layout, which (along with as.table=TRUE) is what you want.
-Deepayan

15 June 2009

Obfuscated SQL

I've finally learned to create views early and often in postgres... They're like variables (with lazy evaluation). Only different. A lot different, in ways that make as much sense to me as conjugating weird tenses in french. Here i've used multiple views stacked atop each other. It's not the fastest way, but for hundreds of thousands of records, it's fast enough. I have multiple loggers in different wells. Every well has its own datum. Every file belongs to a logger, which in turn belongs to a well, and requires the correct datum. I've had to join and union several different subqueries and views. It's a god-aweful mess, but i think i finally got it.
--here's the magic: v_sourcedat
--matches source files to datums
create or replace view v_sourcedat as
        select  unit_id||well_num as well, datum_m, gw.sn, source
                from v_loggerdat wd,
                ( SELECT tmp.sn, min(tmp."time")::date AS start,
                        tmp.source FROM tmp_gwdata tmp
                        GROUP BY tmp.source, tmp.sn)
                gw
                where gw.sn = wd.sn::int
                and gw.source not in (select source from v_sourcetowell)
        union select unit_id||well_num as well, datum_m, gwo.sn, source
                from v_loggerdat wd,
                (select * from v_sourcetowell)
                gwo
                where gwo.well = wd.unit_id||well_num
        order by well;

-- using v_sourcedat, as above
-- the final data
create or replace view v_gwdata_4hr as
        SELECT datum.well, datum.sn, gw."time",
                gw.dtw::numeric(10,4) - datum.datum_m::numeric(10,4) AS dtw,
                gw.dtwsd, gw.temp, gw.tempsd
        FROM v_sourcedat datum
        JOIN gwdata_4hr gw using (source)
        WHERE gw.temp <> 3::numeric
        AND gw.dtw > (-0.5) AND gw.dtw < 4.5
        AND gw."time" < '2010-01-01 00:00:00'::timestamp without time zone
        ORDER BY datum.well, datum.sn, gw."time";

12 June 2009

flying standby

After a long and futile day at the airport yesterday watching summer afternoon thunderstorms make a hash of my travel plans, i booked for the 6am out of ABQ.

A brief list of miraculous events proceed:
1. Megan graciously and unexpectedly offers to drop me off at the god-aweful hour of 5am.

2. My alarm fails to wake me at the even more god-aweful hour of 4:30, but a gate-change triggers a call from Southwest at 5:20am. I'm obviously late.

3. There's leftovers in BAM's fridge, and megan spots me a V8.

4. At security, i'm shuttled into the pilot/express line. I fly through security despite an irrationally long 5:40am "normal" line.

5. I don't feel like a total zombie...

6. Pressure-induced condensation vortex from turbelence in the streamline off the wing during descent braking.

7. belgian waffle with real butter

8. getting on the plane from denver to baltimore.

it's shaping up to be areally nice day. i'm sure i'll keel over eventually, in a safe place...

06 June 2009

Postgresql Poetry? Aggregate median with PL/R

No, no poetry yet, but i think i'm getting closer to thinking like a query planner. Today i learned aggregates. Have i mentioned how much i love the postgresql documentation? Unknownst to myself before today, postgres's aggregates are smart - they process as a stream when at all possible - so average keeps track of the running sum and number, and calculates the average at the end. Which makes median tricky. It depends on the length of input. I have a median implimentation in PL/Perl that queries the table directly, asking it for length, and then extracts the needed records. While this is efficient, it's much closer to magic than i'm comfortable with. Armed with PL/R, I create a median function with the proper type, and then I create the aggregate using that function as the final calc, and a simple accumulator as the transistion function. This isn't very efficient when the "group by" clause give large groups to be passed into the aggregate function, since the whole group has to be stored in memory. But for lots of small groups (say, aggregating 4 samples per hour or 24 samples per day over a year or two), it gives ~20% performance gains. Throw a "limit 5" clause in, and the performance gain increases to ~100%. Here's a link to some truly amazing stuff folks are doing with PL/R. -- sql code follows -- first create the function, then the aggregate
CREATE OR REPLACE FUNCTION median(vals numeric[]) RETURNS float AS '
median(vals)
' LANGUAGE 'plr' STRICT;

CREATE AGGREGATE median (numeric)
(
sfunc = array_append,
stype = numeric[],
initcond = '{}',
finalfunc = median
);

--And usage
select sn, median(dtwm), count(dtwm), date_trunc('day', timestamp) as day from
gwdata where timestamp < '2003-01-01' group by sn, day limit 5;   
-- sn  | median | count |          day  -----+--------+-------+------------------------  
-- 5526 |   1.59 |    24 | 2002-12-26 00:00:00-07  
-- 5539 | 1.7605 |    24 | 2002-02-27 00:00:00-07  
-- 5522 |  0.737 |    24 | 2001-10-03 00:00:00-07  
-- 5517 |   0.96 |    24 | 2001-11-05 00:00:00-07  
-- 5513 | 1.3855 |    24 | 2001-09-07 00:00:00-07  
-- Time: 247.126 ms     

-- Here's the PL/Perl median() function.  
-- Much less straightforwards, included for historical interest BEGIN { strict->import(); }
my ($tname,$cname) = @_;
my $SQL = "SELECT count($cname) AS t FROM $tname";
my $rc = spi_exec_query($SQL);
my $total = $rc->{rows}[0]{'t'};
$total < offset =" ($total-1)/2;" sql = "SELECT $cname AS median FROM $tname     ORDER BY $cname OFFSET $offset LIMIT 1" sql = "SELECT avg($cname) AS median FROM   (SELECT $cname FROM $tname    ORDER BY $cname      OFFSET $offset LIMIT 2   ) AS foo" rc =" spi_exec_query($SQL);">{rows}[0]{median};

01 June 2009

Dear google - "a driver is not registered for the url sdbc:postgresql"

I have an openoffice database project built around a postgresql database using the postgresql sdbc driver, Everything worked really well on my old machine. Why doesn't it work on the new one? Does the driver not work for openoffice 3? Is it a jaunty issue? I followed the instructions and read the f@#$*'ing manual! The error message is the title of this post - google doesn't give a fix, per se, and shows just a few hits, but reveals that it's a 32/64 bit issue. Well then. There appears to be a jaunty package, openoffice.org-sdbc-postgresql. Does that work in and of itself (after i uninstall the bad 32 bit one)? Nope. But executing ### dpkg -L openoffice.org-sdbc-postgresql | grep zip shows me where the file i need is, and executing ### unopkg add `dpkg -L openoffice.org-sdbc-postgresql | grep zip` installed it for me. Everything works - hoorah!