2020-09-29

A hypothetical feminist in the audience

I just used, in a certain context, childbearing and the classic puzzle attached to it as an example of a serial process, one that can help explain the qualitative shift in computing power from serial toward parallel over the last decade or two. Then I realized that if I lectured like that to an audience, these days someone might question how I exploit gender concepts in the example. So it occurred to me that perhaps I should have an answer ready for the newly awakened feminist in the audience.

The example, of course, is that although a woman generally delivers a conceived child into the world in about nine months, it does not follow that nine women could produce a child in one month. It only follows that in nine months, nine women can push nine children into the world.

That is a perfect analogy for what happens when, in a given algorithm, the end result is indivisible (a child, not just parts of one) and the algorithm contains serial dependencies that prevent parallelization (first the neural tube, then polarizing chemical gradients, and then local specialization of cells into a head and toes at the ends of the gradient). As long as the only known child algorithm is linear in that way, rather than one where eyes and toes and the rest come from somewhere and magically join into a working whole, the work cannot be parallelized across nine women; the latency from insemination to parturition cannot really be improved. If for some reason many children are wanted, the problem parallelizes only at the level of nine women being pregnant at the same time, for the same nine months. That models beautifully the difference between a CPU and a GPU in computing, and above all how some problems simply do not get any faster through parallelization than they would be serially; you can perhaps just barely nourish and pamper a pregnant woman so completely that she is capable of a new pregnancy right after the previous one ends, but the serial throughput of the child algorithm still tops out at one child per nine months, and even then you are very concretely stretching physical and emotional limits. Meanwhile, the child-work itself can of course be done by any size army of women in parallel, so plenty of children can arrive if for some strange reason they must.
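The latency-versus-throughput point above can be sketched in a few lines of code. The names and numbers are hypothetical toys of mine, not a real model of anything:

```python
GESTATION_MONTHS = 9  # the serial critical path: nothing inside it can overlap

def latency_months(workers: int) -> int:
    """Time from start to one finished result.
    Serial dependencies fix it, no matter how many workers you add."""
    return GESTATION_MONTHS

def results_after(workers: int, months: int) -> int:
    """The throughput side: independent instances do run in parallel."""
    return workers * (months // GESTATION_MONTHS)

print(latency_months(9))    # still 9: nine workers don't make one result faster
print(results_after(9, 9))  # 9: but they do produce nine results in that time
print(results_after(1, 9))  # 1: a lone worker gets one
```

The asymmetry between the two functions is the whole point: adding workers changes throughput, never latency, exactly as in the nine-women example.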

I suspect that were I to present this example, someone nowadays would fly off the handle. About assumptions of womanhood and the like. About gendering language, and insensitivity, say. But to my mind that misses the actual point, even viewed feministically.

After all, I am using a very narrow, stereotyped definition of sex and reproduction here. A rather low-level one, and by no means the only one. And only because, biologically speaking, that is at least for now the only thing that actually happens or is possible: biology defines sex at the level of gametes and/or gestation and reproductive investment, and is not interested in anything beyond that low level of its own. Then, precisely because it is so reductive, it yields examples with a clear mathematical structure, which can serve as analogies for a fundamentally mathematical field such as computer science.

So my answer to my feminist student would be that the example I use is deliberately narrow and therefore highly incomplete. The phenomena it refers to do not cover gender at the higher levels of explanation in any way, and especially in human society they define it impermissibly weakly often enough. One shouldn't read too much into them, for properly scientific reasons too: biology nowadays thankfully doesn't even try to claim anything about experience, identity or roles; those are the business of higher-level sciences such as psychology, semiotics or, say, sociology. There is a division of labour here which, in the name of epistemic humility, I am bound to respect.

And if someone dared to snicker at my or my student's thoughts in some populist or jackbooted spirit, I would point out to the whole lecture hall that I consider familiarity with those higher-level theories useful on the STEM side as well. Especially since I feel I have benefited a great deal from that knowledge myself, and since my own experience of gender has not always been quite so straightforward either. I would back the feminist in the crowd with the thought that a STEM nerd need not be only that, and I am not; I would note that, indeed, reductive gender roles at the higher level do cause a problem in the scientific community that I too think should be fixed; and at the same time I would take my space back as well.

I mean, it is precisely in a conversation like that that two very different epistemologies are set against each other, and I genuinely believe that mine, the more classical one, is the better. My picture of knowledge not only permits but demands active, bruising debate, so that the truth may come out. Believing that truth always benefits everyone, on average and in the long run. It often slips from its high-minded ideals, of course, but at its best it has already produced, for a century or two, what the feminist calls a "safe space"—a place where you need not be afraid, and where unpleasant things can be aired within a framework everyone accepts, without repercussions. Whether the other person gets it or not, in that situation I carry a bit of a "white man's burden" to put my foot down and defend my principle, since I hold it to be right, it being the Enlightenment-liberal one.

So my task in that situation is not to placate the other, but to take them seriously and tell them how I see things. To make clear, on behalf of the space, that in a space I run there is no discrimination, no bullying, not even gratuitous jabs, but at the same time good-natured off-color humor must also be allowed to fly, everything must be open to questioning, and by default every adult is responsible for their own feelings. My task there is even to teach that older conception of space to those who may not have run into it much—and above all to show that this space, too, is in its way extremely safe. Even safer than the purely emotional, volatile, weathervane space that nowadays often goes by the name of a "safe space".

Because a classical liberal's conception of right and propriety does not waver much; it approaches a community of law carved in stone, which you can then actually trust. It can be infuriating to an activist in how conservative and immovable it is, but at the same time, when violence is at issue, it never asks what friendships were in play or how things could be spun on social media. You can know more surely than usual what it is, and everyone already knows that my framework, our framework, condemns not even half of what the other socio-ethical frameworks of comparable stability do. Anything goes, as long as there is consent. If a framework like that could somehow be taught to the other shouter, I rather suspect they too would calm down, and would then get more done about the very grievances they were most worried about in the first place.

2020-09-06

The quaternionic sandwich product, redone

In the previous post I just ranted about how quaternions are used for rotational calculus. I don't usually rewrite history, but this time I thought it would be better to just do the whole thing over again. With some understanding and insight for a change.

Why quaternions?

Quaternions are a generalization of complex numbers. William Rowan Hamilton discovered them after a prolonged quest for a three dimensional number system which would contain both the reals and the complex numbers in a natural way. It turns out such a generalization can be made in multiple ways, but each loses some properties of its precursors which we would rather keep, and each necessarily assumes a form which can be unexpected. The details of a given generalization depend on which properties we're willing to leave behind.

In Hamilton's particular case what had to give were commutativity of multiplication, and the wish that the generalization would carry over to arbitrary dimension. His system, the quaternions, remains a real normed division algebra and a true generalization of the complex plane, but it only works out in dimension four, and its multiplication isn't commutative. In return, the algebra can indeed encode three dimensional rotations and embed the two and one dimensional ones of the complex circle and the real line. Yet it has to do so in a somewhat more intricate sense.
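As a concrete anchor for the algebra, here is a minimal sketch of the Hamilton product in Python. The helper name `qmul` and the (w, x, y, z) component order are my own conventions, not anything canonical:

```python
import numpy as np

def qmul(p, q):
    """Hamilton product of quaternions given as (w, x, y, z) arrays."""
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return np.array([
        pw*qw - px*qx - py*qy - pz*qz,   # real part
        pw*qx + px*qw + py*qz - pz*qy,   # i component
        pw*qy - px*qz + py*qw + pz*qx,   # j component
        pw*qz + px*qy - py*qx + pz*qw,   # k component
    ])

i = np.array([0.0, 1.0, 0.0, 0.0])
j = np.array([0.0, 0.0, 1.0, 0.0])
print(qmul(i, j))   # [0. 0. 0. 1.]  -> k
print(qmul(j, i))   # [0. 0. 0. -1.] -> -k: multiplication is not commutative
```

Being a normed division algebra also means the norm is multiplicative, `|pq| = |p||q|`, which is what makes the unit quaternions close under multiplication.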

A real division algebra has to incorporate at least one natural representation of the real line, and every element of the extended algebra has to commute with the reals under multiplication. Since the quaternions are meant to be an extension of the complex numbers as well, there has to be a complex plane embedded within them which is commutative within itself, over and above the reals inside it commuting with everything wider at the same time. Via some easy steps we are led to all of these algebras being normed, so that whatever happens after renorming any vector is somehow rotational: algebras are vector spaces de minimis, in these kinds of algebras the conjugate product a*a yields a natural quadratic norm (a symmetric bilinear form), and so we get a natural definition of a sphere: all those elements of unit norm.

Rotations in three and four dimensions

When you rotate things in 3D, Euler's theorem tells us every rotation leaves at least one direction completely fixed, and that the rotation can be described as a 2D rotation in the plane perpendicular to that axis: within that invariant plane, stuff moves around but never leaves the plane. Thus we need three parameters to describe a general rotation: two for a direction and one for an angle around it. It is a nice parametrization because the orientation entanglement is contained within the direction, while the angle is just cyclic, behaving exactly like multiplication on the complex unit circle.

The only trouble comes from the identity rotation and those near it, because there the axis of rotation can suddenly be chosen freely, or at least sufficiently close to freely that many mechanistic formulae for calculating the relevant numeric parameters become unstable. However, this is a problem only for the inverse problem, where you try to figure out the parameters; unlike with many other representations of the rotational algebra, there is no numerical trouble in doing forward calculations with compound rotations from there on. We will see that the quaternions inherit this property.
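The inverse-problem instability can be seen directly by recovering axis and angle from a rotation matrix. The helper below is a naive textbook sketch of my own, not production code:

```python
import numpy as np

def rot_z(t):
    """3x3 rotation by angle t around the z axis."""
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def axis_angle(R):
    """Naively recover (axis, angle) from a rotation matrix.
    Near the identity the formulae degrade: here the angle is lost first."""
    angle = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    v = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    n = np.linalg.norm(v)
    if n < 1e-12:                      # identity-ish: any axis will do
        return np.array([0.0, 0.0, 1.0]), angle
    return v / n, angle

axis, angle = axis_angle(rot_z(0.5))
print(axis, angle)                     # ~[0 0 1], ~0.5: fine away from identity

_, tiny = axis_angle(rot_z(1e-9))
print(tiny)                            # 0.0: the true angle 1e-9 is lost
```

The loss happens because cos(1e-9) rounds to exactly 1.0 in double precision, so the arccos-of-trace formula returns an angle of zero even though the off-diagonal entries still carry the axis.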

In four dimensions rotations factor a bit differently. Now they always have two orthogonal invariant planes, with two separate plane rotations in them, parametrized by two different cyclic angles. If both of the angles are zero, we have the identity rotation and both planes can be chosen freely. If just one of the angles is zero, we're reduced to a plane rotation, except that the other invariant plane stays pointwise fixed, so we don't have just a single fixed axis but a whole 2D plane of them. Projecting one dimension down along that fixed plane yields a 2D rotation with one fixed point.

In general the angles of rotation will be different. If you look at vectors off the invariant planes, they will rotate an amount intermediate between the extremes delimited by the rotations in the planes.

Finally, there is a special case within the more general 4D rotations where both invariant planes rotate: the case where the angles of rotation in the two planes are exactly equal (up to sign). These are called isoclinic rotations, and they lead to a kind of degeneracy which doesn't have an analogue in 3D: the synchronicity doesn't affect the general nature of the rotation, but it suddenly means that there is extra symmetry present, which makes it possible to choose the invariant planes used to analyze and parametrize the problem rather freely. These rotations don't have just "the two" invariant planes, but a broad continuum of them. Because of that they also compose more nicely than arbitrary 4D rotations. As we will see just below, isoclinic rotations come in two chiralities; a rotation of one chirality always shares a pair of invariant planes with any rotation of the other, and in those planes the two reduce to side-by-side 2D circle rotations. Since circle rotations commute, every isoclinic rotation of one chirality commutes with every isoclinic rotation of the other. (Isoclinic rotations of the same chirality stay isoclinic under composition, but do not commute in general.)

Above we talked about equal angles up to sign. More precisely, isoclinic rotations come in two separate varieties: left and right isoclinic ones. In the left variant the signs of the rotations in the two invariant planes are the same; in the right variant they are opposite. These families are represented in quaternionic algebra by left and right multiplication, respectively. The families overlap where the reals have to commute with everything and the complex numbers have to commute within themselves, but otherwise they are separate. It's also nice to note that for a left isoclinic rotation, since the angles over the invariant planes are equal and vectors off the invariant planes rotate by an amount intermediate between the two now equal amounts, every vector rotates by the same angle and every vector lies in some invariant plane; the invariant planes form a whole continuum, rather than the generic distinguished pair. The same goes for right isoclinic ones.
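The left/right split can be checked numerically by representing left and right quaternion multiplication as 4x4 matrices. This is a small sketch with my own helper names, using the (w, x, y, z) layout:

```python
import numpy as np

def qmul(p, q):
    """Hamilton product, components ordered (w, x, y, z)."""
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return np.array([pw*qw - px*qx - py*qy - pz*qz,
                     pw*qx + px*qw + py*qz - pz*qy,
                     pw*qy - px*qz + py*qw + pz*qx,
                     pw*qz + px*qy - py*qx + pz*qw])

def left_mat(a):
    """4x4 matrix of x -> a x: a left isoclinic rotation for unit a."""
    return np.column_stack([qmul(a, e) for e in np.eye(4)])

def right_mat(b):
    """4x4 matrix of x -> x b: a right isoclinic rotation for unit b."""
    return np.column_stack([qmul(e, b) for e in np.eye(4)])

rng = np.random.default_rng(0)
a, b = rng.normal(size=4), rng.normal(size=4)
a, b = a / np.linalg.norm(a), b / np.linalg.norm(b)

La, Lb, Rb = left_mat(a), left_mat(b), right_mat(b)
print(np.allclose(La @ Rb, Rb @ La))   # True: a left always commutes with a right
print(np.allclose(La @ Lb, Lb @ La))   # False: two random lefts generally don't
```

The first check is really just associativity in disguise, a(xb) = (ax)b, while the second fails because quaternion multiplication itself is not commutative.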

Rotations using quaternions

Just like the complex numbers on the unit circle rotate the complex plane under multiplication, the unit quaternions rotate the quaternionic space. However, there is a complication: each unit quaternion represents an isoclinic 4D rotation, with two invariant planes rotating at once. This is not a problem when we deal with 2D rotations, because one of the invariant planes can simply be projected away. This is how the embedding of the complex numbers into the quaternions works: the (1,i)-plane is an invariant plane of all the rotations its unit circle gives rise to, even in 4D.

3D rotations aren't as easy. If you project away just one dimension from a family of isoclinic rotations with varying angle, you won't get a closed system in 3D. Only the identity and the point reflection −1 map the 3D subspace into itself. So we have to be smarter than that.

The solution is to somehow get rid of the second axis of rotation, that is, to drive its angle of rotation to zero. Happily this is not too painful. Going from the general to the particular, we first multiply our source quaternion from the left by a quaternion a, producing some scaling based on the norms and some left isoclinic rotation. Then we also multiply by a from the right. Now the norm factor has squared and the rotation angle doubled in one invariant plane, but the rotation in the other invariant plane cancels, because we performed a matched right isoclinic rotation as well. This is promising!

To undo the remaining damage, we multiply from the right by the inverse of a instead of a itself. This nulls out the effect on the norm and reverses which plane is kept in place, but otherwise we now have a rotation which is no longer isoclinic: like a 3D rotation, it has a single rotation plane in 4D. Penultimately, we project down suitably to get just a 3D rotation. We note that the doubling of the angle never went away in our working plane, so we start working in half angles. And finally, for optimization's sake, we note that as long as we keep our a (now called a "rotor") at unit norm, its inverse is just its conjugate. We're left with the rotation formula x->axa*, minding that a is now calculated from half angles; algorithmic implementations usually keep track of just a and negate its three imaginary components on demand. Nice!
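Put together as code, the recipe looks like this. A minimal sketch with my own helper names (`rotor`, `rotate`), not any canonical API:

```python
import numpy as np

def qmul(p, q):
    """Hamilton product, components ordered (w, x, y, z)."""
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return np.array([pw*qw - px*qx - py*qy - pz*qz,
                     pw*qx + px*qw + py*qz - pz*qy,
                     pw*qy - px*qz + py*qw + pz*qx,
                     pw*qz + px*qy - py*qx + pz*qw])

def conj(a):
    """Conjugate: negate the three imaginary components on demand."""
    return a * np.array([1.0, -1.0, -1.0, -1.0])

def rotor(axis, angle):
    """Unit rotor for rotation by `angle` around unit vector `axis`.
    Note the half angle, inherited from the double-sided product."""
    h = angle / 2.0
    return np.concatenate([[np.cos(h)], np.sin(h) * np.asarray(axis, float)])

def rotate(a, v):
    """Rotate 3-vector v by rotor a via the sandwich a v a*."""
    x = np.concatenate([[0.0], v])          # embed v as a pure quaternion
    return qmul(qmul(a, x), conj(a))[1:]    # project back down to 3D

a = rotor([0.0, 0.0, 1.0], np.pi / 2)       # quarter turn around z
print(rotate(a, np.array([1.0, 0.0, 0.0]))) # ~[0, 1, 0]
```

For a unit rotor the conjugate really is the inverse, which is what lets an implementation store just a and never form an explicit inverse.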

The algebraic–topological tradeoff

We did end up glossing over a major mathematical detail above, though. Quaternionic representations of 3D rotation are not single- but double-valued. When we do things this way, a given rotor and its negation -a lead to the same rotation. In 3D geometry this is reflected in the fact that adding a full turn to the angle of a rotation changes nothing geometrically, yet negates the corresponding rotor. In topology it is reflected in the fact that covering the 3D rotation group by the unit quaternions necessarily leads to a double cover. In the arithmetic we traced through above, this is where the double-sided conjugation and the need for half angles ultimately come from, and prove necessary.

Depending on your viewpoint, this can be a strength or a weakness. The upside, and the reason we go to quaternions, is that working with the double cover, which admits a neat algebraic, topological and even differential description, makes the system better behaved than the alternatives. Not only can one-off rotations or limited series of them be described neatly without running into singularities such as gimbal lock; the whole family of 3D rigid rotations can be handled continuously and without much attention to detail. Interpolation between orientations works seamlessly, and is easy to reason through using ordinary arithmetic. If you actually need to do calculus, you can: there too the machinery shields you from the common mistakes, and you don't have to deal with the extraneous degrees of freedom the rotation matrix formalism forces on you.
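Interpolation is worth a sketch of its own. The sign flip below is exactly the double-valuedness at work: q and -q name the same orientation, so we pick the representative on the shorter arc. This is a standard slerp, here in my own minimal form:

```python
import numpy as np

def slerp(q0, q1, t):
    """Spherical linear interpolation between unit quaternions q0 and q1."""
    q0, q1 = np.asarray(q0, float), np.asarray(q1, float)
    d = np.dot(q0, q1)
    if d < 0.0:              # q1 and -q1 are the same rotation: take the short arc
        q1, d = -q1, -d
    d = min(d, 1.0)
    theta = np.arccos(d)     # angle between the two rotors on the 3-sphere
    if theta < 1e-9:         # nearly identical rotors: fall back to normalized lerp
        out = (1.0 - t) * q0 + t * q1
        return out / np.linalg.norm(out)
    s = np.sin(theta)
    return (np.sin((1.0 - t) * theta) / s) * q0 + (np.sin(t * theta) / s) * q1

identity = np.array([1.0, 0.0, 0.0, 0.0])
quarter_z = np.array([np.cos(np.pi / 4), 0.0, 0.0, np.sin(np.pi / 4)])
half = slerp(identity, quarter_z, 0.5)
print(half)  # the rotor of an eighth turn around z: [cos(pi/8), 0, 0, sin(pi/8)]
```

Because the interpolation happens at constant speed along a great circle of the 3-sphere, the interpolated orientations sweep at constant angular velocity, which is what makes quaternionic interpolation feel so well behaved.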

On the downside, what you're really dealing with here is a spinorial representation of rotation. It's notoriously hard to understand; one of its pioneers even called spinors a kind of square root of geometry, mysterious even to him and left as a difficult exercise to posterity. What the chosen polarity of a quaternionic representation means is not at all intuitive, or even clear, and even if it rarely matters when you deal with just rotations, sometimes the choice may still bite you in the ankle. As it does, for instance, when combining two trajectories of rotations whose choices of polarity differ. All of the black magic here also makes it a bit too easy to treat quaternions and their rotors as a black box which, while much better behaved in that role than many alternatives, can still make a programmer succumb. If not often, then all the harder when it happens.

The practical algorithmic tradeoff

Some people think quaternions are more efficient than matrices for rotation. This is hogwash. Of course it's almost impossible to say whether a well-optimized implementation of a rotational operator really ought to be called quaternionic, vectorial or whatnot. But if you implement your rotations using the generic library operations for each type, the vector and matrix operations will be faster than the corresponding quaternionic ones almost across the board. Not least because we have highly optimized linear algebra libraries available, starting with the BLAS-derived lineage.

One could then argue that quaternions take less space to represent a rotation, and this much is true. However, in most practical applications the number of rotations represented is low compared to the clouds and heaps of vectors they operate on, each of which then has to be expanded by a factor of 4/3 in the sandwich product formalism before it can be operated on.

And so on. The quaternionic framework cannot in good conscience be said to be conducive to numerical efficiency, at least for rotation alone. So why do we go there in the first place? I'd argue the reasons are three-fold:

  • Programmer productivity. Quaternions can be used as a black box which minimizes mistakes and debugging time. This follows from their tight topological, algebraic and geometrical properties.
  • Numerical stability, and computational efficiency in assuring it. Quaternionic representations of rotation can be continuously renormalized, with little extra noise, so that they always represent a true rotation instead of a linear transformation that is slightly off and no longer spherically symmetric.
  • Extensibility of analysis within the problem domain. Because of their inbuilt structure, quaternions lend themselves to straightforward analysis, and in more involved niches even to calculus.

So, in order: if you do it in quaternions, you can just treat them as a black box. A rotor here, another one there. You won't go amiss the way you can with matrix representations. You won't experience gimbal lock when you track whole trajectories of orientations, and you won't as often be presented with surprises if you try to interpolate between positions. In particular, orientation interpolation within the quaternionic framework is predictable, stable and, as a particular case, for once rather efficient.

Numerical stability is especially important nowadays in orientation integration and sensor fusion applications, earlier called dead reckoning and implemented by inertial navigation systems. Integrating local sensor values into a global picture is highly sensitive to noise and bias, leading to drift in the integrated global position. Given that every mobile phone with gyros, magnetic and linear acceleration sensors nowadays tries to solve this problem on lower-tier hardware, just to follow your position on Google Maps, we don't need any extra sources of noise or drift. The quaternionic representation helps you keep both to a minimum, and isn't too difficult to handle in real time, unlike the provably optimal representations which force you into a time-variant singular value decomposition. The benefit over plain vectorial representations in rotational problems is roughly that of the discrete cosine transform in image coding: the gap to the optimal but far costlier Karhunen-Loève transform is not much.
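The renormalization point deserves a concrete sketch: integrate a long stream of small incremental rotations, and one cheap normalization per step keeps the rotor an exact rotation throughout. Toy data and my own helpers below; a matrix representation would instead need a re-orthogonalization pass such as Gram-Schmidt or a polar decomposition:

```python
import numpy as np

def qmul(p, q):
    """Hamilton product, components ordered (w, x, y, z)."""
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return np.array([pw*qw - px*qx - py*qy - pz*qz,
                     pw*qx + px*qw + py*qz - pz*qy,
                     pw*qy - px*qz + py*qw + pz*qx,
                     pw*qz + px*qy - py*qx + pz*qw])

def rotor(axis, angle):
    """Unit rotor around unit `axis` by `angle`, built from the half angle."""
    h = angle / 2.0
    return np.concatenate([[np.cos(h)], np.sin(h) * np.asarray(axis, float)])

rng = np.random.default_rng(1)
q_raw = np.array([1.0, 0.0, 0.0, 0.0])
q_fix = q_raw.copy()
for _ in range(50_000):                  # many tiny gyro-like increments
    axis = rng.normal(size=3)
    step = rotor(axis / np.linalg.norm(axis), 1e-3)
    q_raw = qmul(step, q_raw)            # rounding error accumulates freely
    q_fix = qmul(step, q_fix)
    q_fix /= np.linalg.norm(q_fix)       # one cheap division: exact rotation again

print(abs(np.linalg.norm(q_raw) - 1.0))  # tiny drift off the unit sphere
print(abs(np.linalg.norm(q_fix) - 1.0))  # ~0: a unit rotor at every single step
```

The drift here stays small because the quaternion product is itself well conditioned; the point is that fixing it costs a single divide per step, with no decomposition in sight.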

My third point is much murkier. Very few people actually go with Hamiltonian calculus in real geometrical or physical problems, even where it could be useful. It is not well known, and it isn't the most general framework either. But if you start with just rotations, it is an easy continuation to do general calculus, which can't be done in more general frameworks, at least as simply or mechanically. For example, it is exceedingly difficult to derive a self-contained algebraic theory of "jerk", the third time derivative of position above acceleration, over the spherical domain. That analogue of linear jerk in a moving vehicle is the one you mostly feel twist-wise and have to consciously counteract, after your more basic instincts have nulled out their servo loops over the second derivatives, linear and perhaps even the first two rotational ones. It is the one which induces what is called vertigo, since we can't as easily anticipate it.

Summa summarum

Quaternions represent rotations rather well. They do so in a fashion which keeps the rotational algebra easy to deal with, and which can mostly be handled mechanically. Yet you should only utilize the representation when you're mostly dealing with rotations and their peculiarities. Immediately after you've done so, you should go back to a vectorial formalism, and rotate your usual cloud of thousands or millions of vectors using other primitives and libraries. That way there's a proper division of labour in place: quaternionic rotation for productivity and numerical stability on the slow path, and fast, parallelizable vectorial arithmetic on the GPU side. Reason with quaternions, while implementing mass 3D arithmetic vectorially.