The blog title has been brutally assaulted by alliteration, with the result that the metaphor limps. Knowledge is not some final destination in the distance, but rather something you pick up along the way. I want to share findings I stumble upon, for now only as small stubs, with the possibility of elaborating on them later. As a kind of manifesto I choose, in all modesty, to let Erwin Schrödinger's apology sketch out the framework for the blog. Comments are always welcome.
- Harald

Tuesday, 8 July 2008

Mathematica turns 20

It is 20 years since Mathematica 1.0 was launched. Wolfram Research has published some interesting retrospectives on the development of Stephen Wolfram and of Mathematica.

Monday, 17 March 2008

What have you changed your mind about? Why?

"What have you changed your mind about? Why?"

Edge put this question to a number of prominent Anglo-American thinkers and received 165 responses. Thought-provoking Easter reading.

Here in our own little duck pond, Magnhild Meltveit Kleppa has recently changed her view on the right of homosexuals to marry. It takes strength of character to change one's position on important questions of principle, and that is not common fare among politicians. In politics a change of position is often read negatively, either as weakness (unreliable wavering) or as calculated opportunism. In science one is, in principle, obliged to assess and reassess one's thinking in light of the facts. Even so, upheavals in one's basic understanding do not come easily.

Edge's panel is heavily dominated by scientists. Here are a few samples:

Freeman Dyson (theoretical physicist, self-declared climate heretic) explains why he has recently changed his view on the significance of the bombings of Hiroshima and Nagasaki:
"I changed my mind about an important historical question: did the nuclear bombings of Hiroshima and Nagasaki bring World War Two to an end? Until this year I used to say, perhaps. Now, because of new facts, I say no." Read the rest.

Helena Cronin (London School of Economics) has changed her view on the causes of sex differences. Interestingly, the core of her argument rests on a statistical line of reasoning:
"Talents, tastes and temperaments play fundamental roles. But they alone don't fully explain the differences. It is a fourth T that most decisively shapes the distinctive structure of male — female differences. That T is Tails — the tails of these statistical distributions. Females are much of a muchness, clustering round the mean. But, among males, the variance — the difference between the most and the least, the best and the worst — can be vast. So males are almost bound to be over-represented both at the bottom and at the top. I think of this as 'more dumbbells but more Nobels'." Read the rest.
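Cronin's "Tails" argument can be checked with a few lines of arithmetic. The sketch below assumes two normal distributions with the same mean but different standard deviations; the values 1.0 and 1.15 are purely hypothetical and not taken from Cronin or from any data. It computes how over-represented the wider distribution is beyond a given cut-off.

```python
from statistics import NormalDist


def tail_ratio(threshold: float, sd_narrow: float = 1.0, sd_wide: float = 1.15) -> float:
    """Ratio of upper-tail probabilities P(X > threshold) for two normal
    distributions with mean 0 and different (hypothetical) spreads."""
    narrow = NormalDist(mu=0.0, sigma=sd_narrow)
    wide = NormalDist(mu=0.0, sigma=sd_wide)
    return (1 - wide.cdf(threshold)) / (1 - narrow.cdf(threshold))


for t in (1.0, 2.0, 3.0, 4.0):
    # With equal means, the same ratio holds in the lower tail (below -t).
    print(f"beyond {t:.0f} SD: wider group over-represented {tail_ratio(t):.1f}x")
```

The means are identical here, so any over-representation at the extremes comes from the spread alone, and it grows the further out the cut-off is placed. That is the "more dumbbells but more Nobels" point in numerical form.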


Howard Gardner (psychologist at Harvard) settles accounts with his role model Jean Piaget:
"The giant at the time — the middle 1960s — was Jean Piaget. Though I met and interviewed him a few times, Piaget really functioned for me as a paragon. [...] I wrote my first books about Piaget; saw myself as carrying on the Piagetian tradition in my own studies; and even defended Piaget vigorously in print against those who would critique his approach and claims. [...] Yet, now forty years later, I have come to realize that the bulk of my scholarly career has been a critique of the principal claims that Piaget put forth." Read more.

Oliver Morton (editor at Nature) no longer believes in the value of human spaceflight:
"I have, falteringly and with various intermediary about-faces and caveats, changed my mind about human spaceflight. [...] And the crucial idea (crucial to me) that human exploration of Mars might answer great questions about life in the universe no longer seems as plausible or as likely to pay off in my lifetime as once it did. I increasingly think that life in a Martian deep biosphere if there is any, will be related to earth life and teach us relatively little that's new. [...] A world with a spartan $100 billion moonbase but no ability to measure spectra and lightcurves from earthlike planets around distant stars is not the world for me." Read more.

Stewart Brand (founder of The Whole Earth Catalog) has realized that good old things are much worse than good new things:
"[...] I bought a sequence of wooden sailboats. Their gaff rigs couldn't sail to windward. Their leaky wood hulls and decks were a maintenance nightmare. I learned that the fiberglass hulls we'd all sneered at were superior in every way to wood. [...] The message finally got through. Good old stuff sucks. Sticking with the fine old whatevers is like wearing 100% cotton in the mountains; it's just stupid. [...] Give me 100% not-cotton clothing, genetically modified food (from a farmers' market, preferably), this-year's laptop, cutting-edge dentistry and drugs." Read more.

Friday, 8 February 2008

Me, a climate sceptic?

I recently attended a seminar organized by the statistics community in Oslo, where Kjell Stordahl presented a critique of the IPCC's methods. In particular, he criticized the lack of any assessment of the uncertainty in the forecasts, or rather the lack of any assessment of the uncertainty in the projected climate development under different assumptions about CO2 emissions (since the climate researchers will not concede that they make forecasts).

Stordahl's presentation can be found in Tilfeldig Gang, September 2007, pp. 5-11 (published by Norsk Statistisk Forening, the Norwegian Statistical Association).

What does it take to come across as a credible 'climate sceptic'?

There is also a debate on forskning.no:
Is the uncertainty in the climate forecasts underestimated? (Kjell Stordahl)
Misconceptions about climate forecasts (Knut H Alfsen and Helge Drange)
Climate forecasts and uncertainty (Kjell Stordahl)
More on climate forecasts and uncertainty (Knut H Alfsen and Helge Drange)
Continued confusion about climate forecasts (Kjell Stordahl)

Wednesday, 6 February 2008

Lies, damned lies and confused statistics...


More to chew on for anyone trawling for statistical significance:

Toward Evidence-Based Medical Statistics. 1: The P Value Fallacy (pdf, commentary), Steven N Goodman (Johns Hopkins University)

Goodman serves up an exceptionally well-written and thorough account of the logical fallacies involved in drawing conclusions from statistical data. Here is a handful of samples:

The P value:
The P value is defined as the probability, under the assumption of no effect or no difference (the null hypothesis), of obtaining a result equal to or more extreme than what was actually observed. [...] It is worth noting one widely prevalent and particularly unfortunate misinterpretation of the P value. Most researchers and readers think that a P value of 0.05 means that the null hypothesis has a probability of only 5%.
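The definition above, and why a P value of 0.05 is not a 5% probability that the null hypothesis is true, can be illustrated with a short calculation. The sketch assumes a two-sided z-test, a hypothetical 50% prior probability that the null hypothesis is true, and a hypothetical alternative under which the z statistic is normally distributed around 2.0. None of these numbers come from Goodman; they only serve to show that the two quantities differ.

```python
from statistics import NormalDist

STD = NormalDist()  # distribution of the z statistic under the null hypothesis


def two_sided_p(z: float) -> float:
    """P value: probability under H0 of a |z| at least as large as the one observed."""
    return 2 * (1 - STD.cdf(abs(z)))


def share_null_among(p_lo: float, p_hi: float,
                     prior_null: float = 0.5, alt_mean: float = 2.0) -> float:
    """Among studies whose P value lands in [p_lo, p_hi], the fraction in which
    H0 is actually true (prior_null and alt_mean are hypothetical assumptions)."""
    z_lo = STD.inv_cdf(1 - p_hi / 2)  # larger P value means smaller |z|
    z_hi = STD.inv_cdf(1 - p_lo / 2)
    # Probability of |z| falling in [z_lo, z_hi] under each hypothesis.
    under_null = 2 * (STD.cdf(z_hi) - STD.cdf(z_lo))
    alt = NormalDist(mu=alt_mean, sigma=1.0)
    under_alt = (alt.cdf(z_hi) - alt.cdf(z_lo)) + (alt.cdf(-z_lo) - alt.cdf(-z_hi))
    weighted_null = prior_null * under_null
    return weighted_null / (weighted_null + (1 - prior_null) * under_alt)


print(round(two_sided_p(1.96), 3))  # about 0.05
print(share_null_among(0.04, 0.06))
```

Under these assumptions the share of true null hypotheses among "just significant" results comes out several times larger than 5%. The exact figure depends entirely on the assumed prior and alternative, which is the point: the P value alone cannot deliver it.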


A concrete example:

A recent randomized, controlled trial of hydrocortisone treatment for the chronic fatigue syndrome showed a treatment effect that neared the threshold for statistical significance, P = 0.06. The discussion section began, “. . . hydrocortisone treatment was associated with an improvement in symptoms . . . This is the first such study . . . to demonstrate improvement with a drug treatment of [the chronic fatigue syndrome]” . What is remarkable about this paper is how unremarkable it is. [...] a conclusion is stated before the actual discussion, as though it is derived directly from the results, a mere linguistic transformation of P = 0.06. This is a natural consequence of a statistical method that has almost eliminated our ability to distinguish between statistical results and scientific conclusions. We will see how this is a natural outgrowth of the “P value fallacy.”

On hypothesis testing:
Hypothesis tests are equivalent to a system of justice that is not concerned with which individual defendant is found guilty or innocent (that is, “whether each separate hypothesis is true or false”) but tries instead to control the overall number of incorrect verdicts (that is, “in the long run of experience, we shall not often be wrong”). Controlling mistakes in the long run is a laudable goal, but just as our sense of justice demands that individual persons be correctly judged, scientific intuition says that we should try to draw the proper conclusions from individual studies.

The hypothesis test approach offered scientists a Faustian bargain—a seemingly automatic way to limit the number of mistaken conclusions in the long run, but only by abandoning the ability to measure evidence and assess truth from a single experiment. It is doubtful that hypothesis tests would have achieved their current degree of acceptance if something had not been added that let scientists mistakenly think they could avoid that trade-off. That something turned out to be Fisher’s “P value,” much to the dismay of Fisher, Neyman, Pearson, and many experts on statistical inference who followed.
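The "long run of experience" guarantee can be seen in a small simulation: when the null hypothesis is true and we reject whenever |z| > 1.96, about 5% of many repeated experiments end in a wrong verdict, yet the procedure says nothing about whether any single verdict is right. The sketch assumes a one-sample z-test with known standard deviation 1; the sample size, number of repetitions and seed are arbitrary choices.

```python
import math
import random


def long_run_false_positive_rate(n_experiments: int = 20_000, n: int = 20,
                                 z_crit: float = 1.96, seed: int = 1) -> float:
    """Simulate repeated experiments in which H0 (mean 0) is true and return
    the fraction in which a two-sided z-test wrongly rejects H0."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_experiments):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = (sum(sample) / n) * math.sqrt(n)  # known sigma = 1
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_experiments


print(long_run_false_positive_rate())  # close to 0.05
```

The error rate is a property of the procedure over many repetitions; nothing in the simulation tells us which of the individual rejections were the mistaken ones.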

On the fallacy:
The idea that the P value can play both of these roles is based on a fallacy: that an event can be viewed simultaneously both from a long-run and a short-run perspective. In the long-run perspective, which is error-based and deductive, we group the observed result together with other outcomes that might have occurred in hypothetical repetitions of the experiment. In the “short run” perspective, which is evidential and inductive, we try to evaluate the meaning of the observed result from a single experiment. If we could combine these perspectives, it would mean that inductive ends (drawing scientific conclusions) could be served with purely deductive methods (objective probability calculations).

How did it end up this way?
It is a complex story, but the basic theme is that therapeutic reformers in academic medicine and in government, along with medical researchers and journal editors, found it enormously useful to have a quantitative methodology that ostensibly generated conclusions independent of the persons performing the experiment. It was believed that because the methods were “objective,” they necessarily produced reliable, “scientific” conclusions that could serve as the bases for therapeutic decisions and government policy. This method thus facilitated a subtle change in the balance of medical authority from those with knowledge of the biological basis of medicine toward those with knowledge of quantitative methods, or toward the quantitative results alone, as though the numbers somehow spoke for themselves.

Back to the example:
The statement that there was a relation between hydrocortisone treatment and improvement of the chronic fatigue syndrome was a knowledge claim, an inductive inference. To make such a claim, a bridge must be constructed between “P = 0.06” and “treatment was associated with improvement in symptoms.” That bridge consists of everything that the authors put into the latter part of their discussion: the magnitude of the change (small), the failure to change other end points, the absence of supporting studies, and the weak support for the proposed biological mechanism. Ideally, all of this other information should have been combined with the modest statistical evidence for the main end point to generate a conclusion about the likely presence or absence of a true hydrocortisone effect. The authors did recommend against the use of the treatment, primarily because the risk for adrenal suppression could outweigh the small beneficial effect, but the claim for the benefit of hydrocortisone remained.

What can be done?
Some of the strongest arguments in support of standard statistical methods is that they are a great improvement over the chaos that preceded them and that they have proved enormously useful in practice. Both of these are true, in part because statisticians, armed with an understanding of the limitations of traditional methods, interpret quantitative results, especially P values, very differently from how most nonstatisticians do. But in a world where medical researchers have access to increasingly sophisticated statistical software, the statistical complexity of published research is increasing, and more clinical care is being driven by the empirical evidence base, a deeper understanding of statistics has become too important to leave only to statisticians.

The second article will explore the use of Bayes factor—the Bayesian measure of evidence— and show how this approach can change not only the numbers we report but, more important, how we think about them.

Toward Evidence-Based Medical Statistics. 2: The Bayes Factor


Introduction to Bayesian methods I: measuring the strength of evidence
(Steven N Goodman, Clinical Trials 2005; 2: 282-290)

Bayesian inference is a formal method to combine evidence external to a study, represented by a prior probability curve, with the evidence generated by the study, represented by a likelihood function. Because Bayes theorem provides a proper way to measure and to combine study evidence, Bayesian methods can be viewed as a calculus of evidence, not just belief. In this introduction, we explore the properties and consequences of using the Bayesian measure of evidence, the Bayes factor (in its simplest form, the likelihood ratio). The Bayes factor compares the relative support given to two hypotheses by the data, in contrast to the P-value, which is calculated with reference only to the null hypothesis. This comparative property of the Bayes factor, combined with the need to explicitly predefine the alternative hypothesis, produces a different assessment of the strength of evidence against the null hypothesis than does the P-value, and it gives Bayesian procedures attractive frequency properties. However, the most important contribution of Bayesian methods is the way in which they affect both who participates in a scientific dialogue, and what is discussed. With the emphasis moved from "error rates" to evidence, content experts have an opportunity for their input to be meaningfully incorporated, making it easier for regulatory decisions to be made correctly.
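In its simplest form the Bayes factor is a likelihood ratio, and it turns prior odds into posterior odds: posterior odds = prior odds × Bayes factor. The sketch below shows that update together with the so-called minimum Bayes factor exp(-z²/2), which is the strongest evidence against the null hypothesis that a normally distributed result can provide over all simple alternatives (it is reached when the alternative sits exactly at the observed value). Applying it to the P = 0.06 of the hydrocortisone example, with a hypothetical 50% prior probability for the null hypothesis, is an illustration only and not a calculation from Goodman's papers.

```python
import math
from statistics import NormalDist


def bayes_factor(z: float, alt_mean: float) -> float:
    """Likelihood ratio P(data | H0) / P(data | H1) for an observed z statistic,
    with H0: mean 0 and a simple alternative H1: mean alt_mean (both sd 1)."""
    return NormalDist(0.0, 1.0).pdf(z) / NormalDist(alt_mean, 1.0).pdf(z)


def minimum_bayes_factor(z: float) -> float:
    """Smallest possible Bayes factor for H0: the alternative placed at z itself."""
    return math.exp(-z * z / 2)


def posterior_prob_null(prior_null: float, bf: float) -> float:
    """Posterior odds = prior odds * Bayes factor, converted back to a probability."""
    post_odds = prior_null / (1 - prior_null) * bf
    return post_odds / (1 + post_odds)


# Hydrocortisone example: a two-sided P = 0.06 corresponds to |z| of about 1.88.
z = NormalDist().inv_cdf(1 - 0.06 / 2)
bf_min = minimum_bayes_factor(z)
print(f"z = {z:.2f}, minimum Bayes factor = {bf_min:.2f}")
print(f"P(H0 | data) >= {posterior_prob_null(0.5, bf_min):.2f} with a 50% prior")
```

Even with the alternative most favourable to the treatment, the posterior probability of no effect stays at roughly 15% under these assumptions, which puts the step from "P = 0.06" to "treatment was associated with improvement" in perspective.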


Monday, 14 January 2008

Towards an integrated science curriculum?

I recently received an email with information about the course "Quantitative biology or Mathematics enters biology". It is a course in line with this blog's credo, though with ambitious goals for a course lasting only two days. The course is clearly inspired by the article "Back to the future: education for systems-level biologists" by David Botstein and Ned Wingreen, both professors at Princeton University.
We describe a graduate course in quantitative biology that is based on original path-breaking papers in diverse areas of biology; each of these papers depends on quantitative reasoning and theory as well as experiment. Close reading and discussion of these papers allows students with backgrounds in physics, computational sciences or biology to learn essential ideas and to communicate in the languages of disciplines other than their own.

[...]
It is clear that the future of biology will require combinations of skills that are rarely found in individual scientists today. The existing educational system teaches biologists very few mathematical or computational skills, and gives scientists with backgrounds in physics and informatics comparably limited exposure to even the most basic biological phenomena and principles. The problem begins early in undergraduate education, and by the doctoral level there are severe interdisciplinary communication difficulties that are encountered by even the most motivated of collaborators.

I very much like the course's emphasis on concrete problems, in the form of a handful of groundbreaking papers, as a starting point for learning.

The courses described above are mostly an attempt to patch together divergent academic backgrounds (different concepts, approaches and standard tools), a divergence that results from premature specialization within the natural sciences. Princeton has taken the idea all the way with a more fundamental approach, namely an integrated science curriculum at the undergraduate level:
Integrated Science is a revolutionary new introductory science curriculum developed at Princeton, intended for students considering a career in science. By breaking down traditional disciplinary barriers, a series of courses taken in the freshman and sophomore years provides students with first-rate preparation for a major in any of the core scientific disciplines, and in such a way that helps retain the connections to the other disciplines. The curriculum is founded on the expectation that much of the most important science of the future, though based on the classical disciplines, will lie in areas that span two or more of them.