2010-05-14

Easiness

They tell me Finnish girls are psychosexually active. Which is a scientific euphemism for "easy". It's also true, and as a Finnish male I simply love the fact that sex is so natural and mutually wanted around here.

Yet, the intrique is rarely to be seen. Right now I'm watching the first movie in the Twilight series, and one of the characters just kills me with an off-look. It simply takes one view, and erection it is; not building-wise either.

That sort of thing rarely happens to me around here. The buildup towards tension is absent, and since it is, why build upto anything, really? Just choose and act. So then you either get sex or you don't, and it's always totally ordinary. You might get the occasional triangle, hissy-fit and a ruined nose, but the passion is mostly lacking.

I'm reasonably certain I too like it that way, because I don't like the stress which comes along with the Other Way. But then, our way doesn't exactly stir passions either. It doesn't incentivize clerical rapism -- which truth be told it probably would, if it was good and passionate enough in comparison. It doesn't rock my mind, as in sex drugs and rock'n'roll. (The last one being one more euphemism for the thing itself as well.)

I really wonder, will I ever be able to find someone who really juices me up to a male frenzy, and then, while settled down, continues to do the same over the longer term? I mean if that could happen, I might eventually be able to procreate as well. If not, I'm never going to see kids of my own, which would be a real shame.

2010-05-09

Efficient lookahead peak-limiting

Okay, so you have your basic peek-ahead peak-limiter. Everybody knows how to do the compression part. But how do you track that peak, over a fixed window? Especially, how do you track it efficiently in the ideal case where you have to output a peak value over the window after each and every update to the window?

After a couple of comments on the music-dsp list, with one of them pointing to "something like mipmapping", I came up with this one.

Take a 2^n samples long recirculating delay buffer. As usual, round up to the next power of two for the maximum length of the delay/window, and then tone down from that via the lag between the read and write pointers. Now feed that buffer with what ever sort of maximum you'd like to track. Begin with all-zeroes (or identities), and consistently reset all of the values going out of your window to that as well. Just set all of the incoming values into the buffer, via the write pointer.

After that, you'll have a buffer which you can aggregate as a whole, at each point in time. No matter the lag between the read and write pointers: the data between them won't affect the final result because it's set to zero and/or unity.

Then suppose you want to do a maximum aggregate over your real window. Now it's the same as the aggregate over the whole of your buffer, regardless of whether there's a real value there or just the space between the read and write points. Updating something like that is a whole lot easier: as a first pass, build a full, balanced, binary tree over the whole buffer, and do every aggregation pairwise between adjacent values, then pairwise between the results, then...recurse right upto the top. That will always work because max(.,.) is associative, so it doesn't matter which order you do it in if you don't outright commute the values.

Then take that as an invariant: the whole pyramid on top of the buffer needs to reflect a symmetrical, binary, parenthesized order of calculating maxima from the base data. How do you update that sort of thing incrementally?

Well, you only have two changes per cycle: the one at the read pointer and the one at the write one. Each of those is independent, so you do the same thing to each. And the thing is to recurse upwards towards the root of the tree, compare the new value to the sibling of the shared parent, and then update the parent if the new maximum differs. If it doesn't, stop early. Do that to both pointers for each sample, and always return what is at the top of the pyramid afterwards.

What you then have is a worst-case guaranteed log(n) algorithm in n=window length.And if you then array all of the intermediate maxima in the pyramid after the primary buffer, you'll get a data structure with no explicit bookkeeping or varying datatypes. Just values and other values derived from them. A homogeneous, densely packed array in an order which you'll access linearly per cycle, and whose index arithmetic boils down to a couple of shifts and adds. Tottally neat both for code and cache efficiency.

If you then want to use SSE style vector operations, that is easy too: the data is already in a format where any 2^m wide vector op works at all levels except the top-most ones. They work in linearly ascending order as well, so you can apply them even when you don't really have to; that cuts down on needless conditionals and the resulting pipeline stalls. Unrolling, that's easy as well -- although your superscalar processor probably does that for you with this kind of code.

Extending the data structure is a matter of copying and 2^n interleaving. Not cheap as such, but very regular and as such linear in the size of the existing structure -- which is asymptotically as good as extensible arrays.

And finally, the biggest thing... In all of the construction, we never relied on anything beyond associativity, really.Not essentially at least. All of the stuff is being kept in order, so commutativity is only a problem when we wrap around the end of the buffer. Thus, utilizing associativity, we just need to special case the wrap and everything else works the same. We're also utilizing the existence of an identity value between the read and writer pointers -- but we can dispense with those via another if-clause as well which connects those two values to each other and jointly propagates the results upwards. It's still fully log(n).

In fact in the Starburst and Exodus sort of database research, they already implemented even the requisite operations to insert stuff in the middle and to remove it, willy-nilly. There the semantics they used were the ones which attach to concatenation. But aren't those pretty much the same we're using here? Insertion of subsequences, deletion of them, and the conceptual manipulation of null strings/epsilons. They did that on the higher level of aggregate indices, and our manipulations would then follow those fully (within the constraints of a cyclical buffer) because the higher level indices in both cases abstract back from similar basic invariants/axioms: associativity in the presence of identity (possibly forcibly introduced, but still).

2010-05-07

Yksityinen rasismi?

Eräs kanavatuttuni juuri mainitsi siitä kuinka MILF Kata Kärkkäinen edelleen on. Hyvää comebackiä etsiessäni tuli ekana mieleen Rakel Liekki, joten haeskelin siihen liittyvää juttua ja kuvaa sitten. Tuota kautta törmäsin Liekin ensimmäiseen kolumniin Demarille, jossa hän kritisoi Viivi Avellanin kommentteja siitä kuinka "mokkakikkeli" ei ole hänen mieleensä. Arvio oli, että kommentti on rasistinen.

Noin kielenkäytöltään se varmasti onkin sitä. Mutta kun olen joskus aiemminkin törmännyt kohuun enkä ole saanut sitä sitten kai oikein mietittyä loppuun, rupesin ajattelemaan... Jos yksilö on sitä mieltä, ettei halua harrastaa seksiä erivärisen kanssa, onko se automaattisesti pahaa ja tuomittavaa rasismia?

Tuo on hankala kysymys. Se on kaikille selvää, että poliittisesti värillä ei ole väliä. Sekin on selvää, että on epäsopivaa kannattaa poliittisia aatteita joille värillä on väliä. Ja niin taitaa olla sekin, jopa meikäläiselle äärivapausaatteen kannattajalle, että vaikka yksityinen kaupanomistaja saisi taatusti kieltää vääränvärisiä tulemasta sisälle kauppaansa, kyseessä olisi niin juntti teko että seuraavan kaupan omistajan luultavasti tulisi moraalisista syistä boikotoida ensimmäisen kaupan omistajaa, ihan vain tuosta hyvästä. Kaikki tuo on hirvittävän yksinkertaista, ja sivistynyttä, ja kaikkea sitä, kunnes mennään niihin asioihin jotka tunnustetaan yleisesti ottaen mielipide-, maku- ja yksityisasioiksi.

Eli siis vaikka seksiin: jos en tosiaan halua sitä mokkakikkeliä, olenko automaattisesti paha rasisti? Libertaarin vastaus on tietty tosi yksinkertainen: se on yksityisasia, eli sama mitä haluat. Paitsi että hieman sivistyneempi libertaarikin joutuu sitten palaamaan täsmälleen takaisin siihen moraaliseen kysymykseen: vaikka poliittisesti ehkä onkin aivan sama, onko se yksityisesti ajatellen oikein? Olenko itse hyvä ja moraalinen ihminen, jos ajattelen noin, tai en ajattele?

Tuo kysymys on todella äärimmäisen häiritsevä ja tahmainen; a can of worms, sanoisi anglosaksi, etenkin jollei halua ajatella asiaa pidemmälle. Meinaan kamalan selväähän se on, että jos noin ajattelee, on selvästi ongelman eikä ratkaisun puolella. Jos karsastaa mustaa, kyllä se heijastuu tavalla tai toisella ympäristöönkin. Libertaari pääsee tässä vielä helpolla, kun takana on aina ajatus jakamattomista, kokonaisista, poliittisista oikeuksista. Ne karsivat selvimmän syrjinnän toki pois. Mutta eivät kyllä todellakaan yksityistä. Ja heti kun kuvaan tuodaan vaikkapa demokratia, niin se sitten vain on että tuollainen asenne johtaa huonoon politiikkaankin.

Eli siis pitäisikö tummasta tykätä aiheesta riippumatta? Siksi että vaihtoehto aiheuttaisi yhteisöllisiä ongelmia? Toiselta kantilta tuo menee suttuiseksi koska meillä tosiaan on aiheita -- kuten sänkykaverien valinta -- jotka me tajuamme selvästi yksityisiksi. Tuskin voidaan edellyttää, että sen mustan miehen tulisi ihan vain tasa-arvoperustein olla kaikkien mieleen yhtä hyvännäköinen. Vai voidaanko, kun tuohan tietysti on jo sinänsä rasismia, ja vieläpä sillä syvimmällä mahdollisella, seksuaalisella tasolla? Sellaisella joka motivoi ihmistä yli lähes kaiken muun. Jos joku ajattelee noin, seuraava haaste olisi perustella mikseivät ikä, terveydentila (ml. raittiustila), rahallinen asema, rikostausta tai tuhannet muut täysin arkipäiväiset parinvalintaan vaikuttavat tekijät saisikaan oikeastaan vaikuttaa. "Jaa ettei tyttöä pappa kiinnosta? Ikärasisti on, ei kelpaa kunnialliseen yhteiskuntaan."

Yleensä osaan päättää moraaliset kiistakysymykset melko siististi omalta kohdaltani. Tässä en täysin. Epäilen, että periaate jolla tällaiset ratkaistaan on jollain tavalla tekemisissä sen kanssa kuinka välineellisesti toisia joutuu kohtelemaan tietyssä täsmällisessä tilanteessa, ja kuinka julkisesti, eli siis kuinka suurissa yhteisöissä/ihmisjoukoissa. Ei siis ehkä ole oikein syrjiä mustaa kaupassa, kun se on julkista, puolianonyymia ja henkilökohtaisesta hyvinvoinnista pitkälti erotettavaa toimintaa. Seksi taas on päinvastaista. Mutta tässä kohtaa silti jää niin järjettömän leveä harmaa alue, sekä huonot perusteet niin mustalle kuin valkoisellekin sen sivuilla, että kysymys ei tainnut oikein vielä ratketa noin yksinkertaisin perustein.

Tätä tullaan vielä miettimään pitkät pätkät. Enkä edes vielä mennyt siihen, saisiko niitä söpöjä japskityttöjä päin vastoin suosia...

2010-05-06

Optimizing multi-tier storage on the margin

It seems like all of the studies in multi-tier disk storage (secondary, tertiary, and ostensibly beyond/in-the-middle) do hard choices as to the categorization of data in one class or another.

Why so? Factually, when we go from purely technical optimization within a set of data to a multi-tiered architecture, we inevitably deal with an economical problem: we segregated the data for a reason, that reason derives from human concerns, and human concerns always have to do with economic cost. That much is also known: we do analyze our choices in terms of total cost of ownership, energy costs, the fixed costs of hardware, and so on. We even try to make those different kinds of costs commensurable when we discount future costs and so on.

But when do we ever apply the rest of the economic theory? Particularly, when do we ever translate any of that back into the physical level, so that the automated framework better reflects our intuitive reason, or what it should rationally speaking tell us? Almost never. A particular example of that is that we almost never directly apply automatic optimization over the current cost frontier, using marginals. Not the supply one that we're so hard trying to quantify anyway, and certainly not the demand one which a large corporation such as mine could ostensibly measure/try out.

My lauching point for this rant is about how to use disks right with large data. The rational way to go in today's environment is that most of the disks should contain rarely accessed data, and mostly be powered down. (If you're familiar with the idea, that's called "massive arrays of idle disks"/MAID, in contrast with the more well known "redundant arrays of inexpensive/independent disks"/RAID. There's a whole literature dealing with such concepts; just search for "PARAID" to get a hint of what can be done.) Yet people rarely try to optimize that sort of thing using expected extra benefit from topmost items. That is, deduction on the margin/efficiency frontier.

Even the nicest article I've seen, which was on "disk cooling" doesn't do that. Quite certainly it doesn't do what it does do in a generalized way. There is no real cost function to be seen even there. Plus the only real cost-aware things I know of -- the more developed relational database optimizers -- don't really let you input your values into the optimization process either. Much.

I'd really like to see multi-tier storage, which incorporates an optimizer which can be tuned declaratively, by declaring cost weights and/or entire cost functions. Over multiple variables, like expected average throughput, interconnect contention, maximum latency at a certain n-tile, expected total operating cost per time, and so on, over (roughly) comparable units. Systems which then utilize an intuitively sensible cost function over those parameters, and piecewise, slowly optimize over the efficiency frontier to yield adaptation towards a guaranteedly efficient eqiulibrium.

Markets do that, and they don't have nearly the computational capacity or rationality/consistency for an individual chooser that automated systems do. So why not leverage the market mechanics? Especially since, in a closed system, you can make the actors fully altruistic if really need be. There is absolutely no reason not to employ on-the-margin, incremental adaptation, except for algorithmic discontinuities in the cost landscape (which then should be avoided; cf. the parametric optimization literature within RDMBSs), equilibrium selection issues (which can mostly be automatically randomized away via purposely introduced mixed equilibria and optimization over the mixtures as a whole) and joint concavity of the optimands (most an issue with people, with limited time-to-personal-extinction).

As a simple and practical application, somebody already got this: "disk cooling" is all about moving individual pieces of data between "hot" and "cold" areas of the disk. At least when most efficiently implemented. Some hysteresis might be necessary to detract from the overall churn, true, but even then that's about economically quantifiable transaction costs. The model just works, and has already lead to very well founded models. Why not do it in the open, on the cloud, then, for real and with all of the additions over time?!?

2010-05-01

Nothing new here

Just now I had the impulse to write something, after awhile. But really, I don't have much of a new idea to report. That is embarrassing and frustrating.

I did do a relational data modeling workshop just now, true, but that's just about repeating knowledge somebody else produced and which I just happen to know. Phoney on all of the wrong levels. Something like that doesn't contribute anything new; it's about parroting the real thing from ages ago, which the listeners/attendees should already have read about fully by themselves.

Perhaps I should take solace in the knowledge that I now belong to the select crowd of bloggers who actually post negative results? And no, not even that: my post isn't even negative and funny enough. Sigh...