Mayo’s 2014 Rutgers talk

Rutgers, Department of Statistics and Biostatistics Seminar
Deborah G. Mayo
December 3, 2014

Probing with Severity: Beyond Bayesian Probabilism and Frequentist Performance
Mayo’s 2014 Rutgers slides

Abstract: Getting beyond today’s most pressing controversies revolving around statistical methods, I argue, requires scrutinizing their underlying statistical philosophies. Two main philosophies about the roles of probability in statistical inference are probabilism and performance (in the long-run). The first assumes that we need a method of assigning probabilities to hypotheses; the second assumes that the main function of statistical method is to control long-run performance. I offer a third goal: controlling and evaluating the probativeness of methods. An inductive inference, in this conception, takes the form of inferring hypotheses to the extent that they have been well or severely tested. A report of poorly tested claims must also be part of an adequate inference. I develop a statistical philosophy in which error probabilities of methods may be used to evaluate and control the stringency or severity of tests. I then show how the “severe testing” philosophy clarifies and avoids familiar criticisms and abuses of significance tests and cognate methods (e.g., confidence intervals). Severity may be threatened in three main ways: fallacies of statistical tests, unwarranted links between statistical and substantive claims, and violations of model assumptions.

 

Categories: Uncategorized | Leave a comment

More ironies from the replicationistas: Bet on whether you/they will replicate a statistically significant result

For a group of researchers concerned with how the reward structure can bias results of significance tests, this has to be a joke or massively ironic:

Second Prediction Market Project for the Reproducibility of Psychological Science

The second prediction market project for the reproducibility project will soon be up and running – please participate!

There will be around 25 prediction markets, each representing a particular study that is currently being replicated. Each study (and thus market) can be summarized by a key hypothesis that is being tested, which you will get to bet on.

In each market in which you participate, you will bet on a binary outcome: whether the effect in the replication study is in the same direction as the original study, and is statistically significant with a p-value smaller than 0.05.

Everybody is eligible to participate in the prediction markets: it is open to all members of the Open Science Collaboration discussion group – you do not need to be part of a replication for the Reproducibility Project. However, you cannot bet on your own replications.

Each study/market will have a prospectus with all available information so that you can make informed decisions.

The prediction markets are subsidized. All participants will get about $50 on their prediction account to trade with. How much money you make depends on how you bet on different hypotheses (on average participants will earn about $50 on a Mastercard (or the equivalent) gift card that can be used anywhere Mastercard is used).

The prediction markets will open on October 21, 2014 and close on November 4.

If you are willing to participate in the prediction markets, please send an email to Siri Isaksson by October 19 and we will set up an account for you. Before we open up the prediction markets, we will send you a short survey.

The prediction markets are run in collaboration with Consensus Point.

If you have any questions, please do not hesitate to email Siri Isaksson.

Categories: rejected posts | Leave a comment

Msc. kvetch: Are you still fully dressed under your clothes?

Men have a constitutional right to take pictures under women’s skirts. Yup. That’s what the Massachusetts courts have determined after one Michael Robertson was caught routinely taking pictures and videos up the skirts of women. It even has a name: upskirting.

The Supreme Judicial Court overruled a lower court decision that had upheld charges against Michael Robertson, who was arrested in August 2010 by transit police who set up a sting after getting reports that he was using his cellphone to take photos and video up female riders’ skirts and dresses.

Robertson had argued that it was his constitutional right to do so…..

“A female passenger on a MBTA trolley who is wearing a skirt, dress or the like covering these parts of her body is not a person who is ‘partially nude,’ no matter what is or is not underneath the skirt..”

Link is here.

But this is absurd: she IS partially nude under her clothing, even if she isn’t when you don’t look up her skirt! The picture Robertson took is not of her fully clothed.

People are fully clothed when the TSA conducts whole body scans in airports (a practice that’s largely ended), and yet the pictures would be of the person naked. If you can be partially naked when an instrument sees through your clothes, then you can be partially naked when a cell phone is held under your skirt. Do we really have to get philosophical about these terms…?

Meanwhile, they’re busy trying to pass a law against upskirting in MA. So are guys in Boston busy getting all the constitutional shots they can in the meantime?

Chris Dearborn, a law professor at Suffolk University in Boston, said the court’s ruling served as a signal to the legislature to act fast, but also likely had Peeping Toms briefly “jumping for joy”. Link is here.

Jumping for joy at violating a woman’s privacy? What kind of Neanderthals are in Boston these days?

Categories: Misc Kvetching | 15 Comments

Msc Kvetch: Is “The Bayesian Kitchen” open for cookbook statistics?

I was sent a link to “The Bayesian Kitchen” http://www.bayesiancook.blogspot.fr/2014/02/blending-p-values-and-posterior.html and while I cannot tell for sure from the one post, I’m afraid the kitchen might be open for cookbook statistics. It is suggested (in this post) that real science is all about “science wise” error rates (as opposed to their capturing some early exploratory efforts to weed out associations possibly worth following up on, as in genomics). Here were my comments:

False discovery rates are frequentist but they have very little to do with how well warranted a given hypothesis or model is with data. Imagine the particle physicists trying to estimate the relative frequency with which discoveries in science are false, and using that to evaluate the evidence they had for a Standard Model Higgs on July 4, 2012. What number would they use? What reference class? And why would such a relative frequency be the slightest bit relevant to evaluating the evidential warrant for the Higgs particle, or to estimating its various properties, or to the further testing that is now ongoing? Instead physicists use sigma levels (and associated p-values)! They show that the probability is .9999999… that they would have discerned the fact that background alone was responsible for generating the pattern of bumps they repeatedly found (in two labs). This is an error probability. It was the basis for inferring that the SM Higgs hypothesis had passed with high severity, and they then moved on to determining what magnitudes had passed with severity. That’s what science is about! Not cookbooks, not mindless screening (which might be fine for early explorations of gene associations, but don’t use that as your model for science in general).

The newly popular attempt to apply false discovery rates to “science wise error rates” is a hybrid fashion that (inadvertently) institutionalizes cookbook statistics: dichotomous “up-down” tests, the highly artificial point against point hypotheses (a null and some alternative of interest—never mind everything else), identifying statistical and substantive hypotheses, and the supposition that alpha and power can be used as a quasi-Bayesian likelihood ratio. And finally, to top it all off, by plucking from thin air the assignments of “priors” to the null and alternative—on the order of .9 and .1—this hybrid animal reports that more than 50% of results in science are false! I talk about this more on my blog errorstatistics.com

(for just one example: http://errorstatistics.com/2013/11/09/beware-of-questionable-front-page-articles-warning-you-to-beware-of-questionable-front-page-articles-i/)
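To make the arithmetic behind that headline concrete, here is a minimal sketch (my own, with purely illustrative numbers, not estimates from any real study) of the hybrid computation criticized above: treat alpha and power as if they were likelihoods, pluck priors of .9 and .1 for the null and the alternative, and out comes a “science-wise false discovery rate”.

```python
# Sketch of the "science-wise false discovery rate" arithmetic criticized above.
# All numbers are illustrative assumptions, not estimates from any real study.

def science_wise_fdr(alpha, power, prior_null):
    """Proportion of 'significant' results coming from true nulls, treating
    alpha and power as if they were likelihoods in a Bayes-style computation."""
    prior_alt = 1 - prior_null
    false_positives = alpha * prior_null   # true null, yet declared significant
    true_positives = power * prior_alt     # real effect, detected
    return false_positives / (false_positives + true_positives)

# With priors of .9 (null) and .1 (alternative) and alpha = .05:
for power in (0.2, 0.5, 0.8):
    print(f"power={power:.1f}: science-wise FDR = "
          f"{science_wise_fdr(0.05, power, 0.9):.2f}")
```

With alpha = .05 and power around .2, the formula exceeds 50%, which is where the “more than half of results in science are false” headline comes from; with higher assumed power it drops well below that.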

Categories: Misc Kvetching, Uncategorized | 4 Comments

Msc Kvetch: comment to Kristof at 5a.m.

My comment follows his article

Bridging the Moat Around Universities

By NICHOLAS KRISTOF

My Sunday column is about the unfortunate way America has marginalized university professors–and, perhaps sadder still, the way they have marginalized themselves from public debate. When I was a kid, the Kennedy administration had its “brain trust” of Harvard faculty members, and university professors were often vital public intellectuals who served off and on in government. That’s still true to some degree of economists, but not of most other Ph.D programs. And we’re all the losers for that.

I’ve noticed this particularly with social media. Some professors are terrific on Twitter, but they’re the exceptions. Most have terrific insights that they then proceed to bury in obscure journals or turgid books. And when professors do lead the way in trying to engage the public, their colleagues sometimes regard them with suspicion. Academia has also become inflexible about credentials, disdaining real-world experience. So McGeorge Bundy became professor of government at Harvard and then dean of the faculty (at age 34!) despite having only a B.A.–something that would be impossible today. Indeed, some professors would oppose Bill Clinton getting a tenured professorship in government today because of his lack of a Ph.D, even though he arguably understands government today better than any other American.

In criticizing the drift toward unintelligible academic writing, my column notes that some professors have submitted meaningless articles to academic journals, as experiments, only to see them published. If I’d had more space, I would have gone through the example of Alan Sokal of NYU, who in 1996 published an article in “Social Text” that he described as: “a pastiche of left-wing cant, fawning references, grandiose quotations, and outright nonsense.” Not only was it published, but after the article was unveiled as gibberish, Social Text’s editors said it didn’t much matter: “Its status as parody does not alter, substantially, our interest in the piece, itself, as a symptomatic document.”

I hope people don’t think my column is a denunciation of academia. On the contrary, I think universities are an incredible national resource, with really smart thinking on vital national issues. I want the world to get the benefit of that thinking, not see it hidden in academic cloisters. Your thoughts on this issue?

 

Deborah Mayo Virginia 12 hours ago

In my own field of philosophy, the truth is that the serious work, the work that advances the ideas and research, takes place in “obscure journals or turgid books”. There are plenty of areas where this research can be directly relevant to public issues–it’s the public who should be a bit more prepared to engage with the real scholarship. Take my specialization of philosophy of statistical inference in science. Science writers appear to be only interested in repeating the popular, sexy, alarmist themes (e.g., most research is wrong, statistical significance is bogus, science fails to self-correct). Rather than research what some more careful thinkers have shown, or engage the arguments behind contrasting statistical philosophies–those semi-turgid books–these science writers call around to obtain superficial dramatic quips from the same cast of characters. They have a one-two recipe for producing apparently radical and popular articles this way. None of the issues ever gets clarified. I suggest the public move closer to the professional work rather than the other way around. Popular is generally pablum, at least in the U.S.

Categories: Misc Kvetching | Leave a comment

Notes (from Feb 6*) on the irrelevant conjunction problem for Bayesian epistemologists (i)


 

* refers to our seminar: Phil6334

I’m putting these notes under “rejected posts” awaiting feedback and corrections.

Contemporary Bayesian epistemology in philosophy appeals to formal probability to capture informal notions of “confirmation”, “support”, “evidence”, and the like, but it seems to create problems for itself by not being scrupulous about identifying the probability space, set of events, etc., and not distinguishing between events and statistical hypotheses. There is usually a presumed reference to games of chance, but even there things can be altered greatly depending on the partition of events. Still, we try to keep to that. The goal is just to give a sense of that research program. (Previous posts on the tacking paradox: Oct. 25, 2013: “Bayesian Confirmation Philosophy and the Tacking Paradox (iv)*” & Oct. 25.)

(0) Simple Bayes Boost R: 

H is “confirmed” or supported by x if P(H|x) > P(H) (equivalently, P(x|H) > P(x)).

H is disconfirmed (or undermined) by x if P(H|x) < P(H) (else x is confirmationally irrelevant to H).

Mayo: The error statistician would already get off at (0): probabilistic affirming the consequent is maximally unreliable, violating the minimal requirement for evidence. That could be altered with context-dependent information about how the data and hypotheses are arrived at, but this is not made explicit.

(a) Paradox of irrelevant conjunctions (‘tacking paradox’)

If x confirms H, then x also confirms (H & J), even if hypothesis J is just “tacked on” to H.[1]

Hawthorne and Fitelson (2004) define:

J is an irrelevant conjunct to H, with respect to evidence x just in case

P(x|H) = P(x|H & J).

(b) Example from earlier: For instance, x might be radioastronomic data in support of:

H: “the GTR deflection of light effect is 1.75″ and

J: “the radioactivity of the Fukushima water dumped in the Pacific ocean is within acceptable levels”.

(1) Bayesian (Confirmation) Conjunction: If x Bayesian confirms H, then x Bayesian-confirms:

(H & J), where P(xH & J ) = P(x|H) for any J consistent with H

(where J is an irrelevant conjunct to H, with respect to evidence x).

If you accept R, (1) goes through.

Mayo: We got off at (0) already.  Frankly I don’t  know why Bayesian epistemologists would allow adding an arbitrary statement or hypothesis not amongst those used in setting out priors. Maybe it’s assumed J is in there somehow (in the background K), but it seems open-ended, and they have not objected.

But let’s just talk about well-defined events in a probability experiment, and limit ourselves to talking about an event providing evidence of another event (e.g., making it more or less expected) in some sense. In one of Fitelson’s examples, P(black|ace of spades) > P(black), so “black” confirms that it’s an ace of spades (presumably in random drawings of a card from an ordinary deck)–despite “ace” being an “irrelevant conjunct” of sorts. Even so, if someone says data x (he’s a stock trader) is evidence that he’s an inside trader at a hedge firm, I think it would be assumed that something had been done to probe the added conjuncts.

(2) Using simple B-boost R: (H & J) gets just as much of a boost by x as does H—measuring confirmation as a simple B-boost: R.

CR(H, x) = CR((H & J), x) for irrelevant conjunct J.

R: P(H|x)/P(H) (or equivalently, P(x |H)/P(x))
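To see the tacking equality for R concretely, here is a small sketch in the card setting mentioned above; the choice of H = “spade”, J = “ace”, x = “black” is my own illustration (not Fitelson’s exact example). Since P(x|H) = P(x|H & J) = 1, J is an irrelevant conjunct, yet R gives (H & J) exactly the same boost as H:

```python
from fractions import Fraction
from itertools import product

# Standard 52-card deck as (rank, suit) pairs.
ranks = ['A'] + [str(n) for n in range(2, 11)] + ['J', 'Q', 'K']
suits = ['spades', 'hearts', 'diamonds', 'clubs']
deck = list(product(ranks, suits))

def pr(event):
    """Unconditional probability of an event (a predicate on cards), uniform draw."""
    return Fraction(sum(event(c) for c in deck), len(deck))

def pr_given(event, given):
    """Exact conditional probability under a uniform draw from the deck."""
    return Fraction(sum(event(c) and given(c) for c in deck),
                    sum(given(c) for c in deck))

x = lambda c: c[1] in ('spades', 'clubs')        # black card
H = lambda c: c[1] == 'spades'                   # spade
HJ = lambda c: c[1] == 'spades' and c[0] == 'A'  # spade AND ace ("tacked on" J)

# R-measure of confirmation: P(hypothesis | x) / P(hypothesis)
R_H  = pr_given(H, x)  / pr(H)    # = 2
R_HJ = pr_given(HJ, x) / pr(HJ)   # = 2 as well: the conjunction gets the same boost
print(R_H, R_HJ)
# J is irrelevant in the technical sense: P(x|H) = P(x|H & J) = 1
print(pr_given(x, H), pr_given(x, HJ))
```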

(a) They accept (1) but (2) is found counterintuitive (by many or most Bayesian epistemologists). But if you’ve defined confirmation as a B-boost, why run away from the implications? (A point Maher makes.) It seems they implicitly slide into thinking of what many of us want:

some kind of an assessment of how warranted or well-tested H is (with x and background).

(not merely a ratio which, even if we can get it, won’t mean anything in particular; it might be any old thing, 2, 22, even with H scarcely making x expected).

(b) The intuitive objection according to Crupi and Tentori (2010) is this (e.g., p. 3): “In order to claim the same amount of positive support from x to a more committal theory “H and J” as from x to H alone, …adding J should contribute by raising further how strongly x is expected assuming H by itself. Otherwise, what would be the specific relevance of J?” (using my letters, emphasis added)

But the point is that it’s given that J is irrelevant. Now if one reports all the relevant information for the inference, one might report something like: (H & J) makes x just as expected as H alone does. Why not object to the confirmation (H & J) is getting when nothing has been done to probe J? I think the objection is, or should be, that nothing has been done to show that J is the case rather than not: P(x|(H & J)) = P(x|(H & ~J)).

(c) Switch from R to LR: What Fitelson (Hawthorne and Fitelson 2004) do is employ, as a measure of the B-boost, what some call the likelihood ratio (LR).

CLR(H, x) = P(x | H)/P(x | ~H).

(3) Let x confirm H, then

(*) CLR(H, x) > CLR((H & J), x)

For J an irrelevant conjunct to H.

So even though x confirms (H & J) it doesn’t get as much as H does, at least if one uses LR. (It does get as much using R).

They see (*) as solving the irrelevant conjunct problem.

(4) Now let x disconfirm Q, and x confirm ~Q, then

(*) CLR(~Q, x) > CLR((~Q & J), x)

For J an irrelevant conjunct to Q: P(x|Q) = P(x|J & Q).

Crupi and Tentori (2010) notice an untoward consequence of using LR confirmation in the case of disconfirmation (substituting their Q for H above): if x disconfirms Q, then (Q & J) isn’t as badly disconfirmed as Q is, for J an irrelevant conjunct to Q. But this just follows from (*), doesn’t it? That is, from (*), we’d get (**) [possibly an equality goes somewhere].

(**) CLR(Q, x) < CLR((Q & J), x).

This says that if x disconfirms Q, (Q & J) isn’t as badly disconfirmed as Q is. This they find counterintuitive.

But if (**) is counterintuitive, then so is (*).

(5) Why (**) makes sense if you wish to use LR:

The numerators in the LR calculations are the same:

P(x|Q & J) = P(x|Q) and P(x|H & J) = P(x|H) since in both cases J is an irrelevant conjunct.

But P(x|~(Q & J)) < P(x|~Q)

Since x disconfirms Q, x is more probable given ~Q than it is given (~Q v ~J). This explains why

(**) CLR(Q, x) < CLR((Q & J), x)

So if (**) is counterintuitive then so is (*).

(a) Example. Q: unready for college.

If x = high scores on a battery of college readiness tests, then x disconfirms Q and confirms ~Q.

What should J be? Suppose having one’s favorite number be an even number (rather than an odd number) is found irrelevant to scores.

(i) P(x|~(Q & J)) = P(high scores| either college ready or ~J)

(ii) P(x|~Q ) = P(high scores| college ready)

(ii) might be ~1 (as in the earlier discussion), while (i) is considerably less.

The high scores can occur even among those whose favorite number is odd. This explains why

(**) CLR(Q, x) < CLR((Q & J), x)

In the case where x confirms H, it’s reversed

P(x |~(H & J)) > P(x |~H)

(b) Using one of Fitelson’s examples, but for ~Q:

e.g., Q: not-spades    x: black      J: ace

P(x |~Q) = 1.

P(x|Q) = 1/3

P(x |~(Q & J)) = 25/49

i.e., P(black|spade or not-ace) = 25/49

Note: CLR [(Q & J), x) = P(x |(Q & J))/P(x |~(Q & J))

Please share corrections, questions.

Previous slides are:

http://errorstatistics.com/2014/02/09/phil-6334-day-3-feb-6-2014/

http://errorstatistics.com/2014/01/31/phil-6334-day-2-slides/

REFERENCES:

Chalmers (1999). What Is This Thing Called Science?, 3rd ed. Indianapolis; Cambridge: Hackett.

Crupi & Tentori (2010). Irrelevant Conjunction: Statement and Solution of a New Paradox, Phil Sci, 77, 1–13.

Hawthorne & Fitelson (2004). Re-Solving Irrelevant Conjunction with Probabilistic Independence, Phil Sci 71: 505–514.

Maher (2004). Bayesianism and Irrelevant Conjunction, Phil Sci 71: 515–520.

Musgrave (2010). “Critical Rationalism, Explanation, and Severe Tests,” in Error and Inference (D. Mayo & A. Spanos, eds.). CUP: 88-112.


[1] Chalmers and Musgrave say I should make more of how simply severity solves it, notably for distinguishing which pieces of a larger theory rightfully receive evidence from a variety of “idle wheels” (Musgrave, p. 110).

Categories: phil6334 rough drafts | 3 Comments

Winner of the January 2014 Palindrome Contest


Karthik Durvasula
Visiting Assistant Professor in Phonology & Phonetics at Michigan State University

Palindrome: Test’s optimal? Agreed! Able to honor? O no! hot Elba deer gala. MIT-post set.

The requirement was: A palindrome with “optimal” and “Elba”.

Bio: I’m a Visiting Assistant Professor in Phonology & Phonetics at Michigan State University. My work primarily deals with probing people’s subconscious knowledge of (abstract) sound patterns. Recently, I have been working on auditory illusions that stem from the bias that such subconscious knowledge introduces.

Statement: “Trying to get a palindrome that was at least partially meaningful was fun and challenging. Plus I get an awesome book for my efforts. What more could a guy ask for! I also want to thank Mayo for being excellent about email correspondence, and answering my (sometimes silly) questions tirelessly.”

Book choice: EGEK 1996! :)
[i.e.,Mayo (1996): “Error and the Growth of Experimental Knowledge”]

CONGRATULATIONS! And thanks so much for your interest!

February contest: Elba plus deviate (deviation)

Categories: palindrome, rejected posts | 1 Comment

Sir Harold Jeffreys (tail area) howler: Sat night comedy (rejected post Jan 11, 2014)

You might not have thought there could be yet new material for 2014, but there is: for the first time Sir Harold Jeffreys himself is making an appearance, and his joke, I admit, is funny. So, since it’s Saturday night, let’s listen in on Sir Harold’s howler in criticizing p-values. However, even comics try out “new material” with a dry run, say at a neighborhood “open mike night”. So I’m placing it here under rejected posts, knowing maybe 2 or at most 3 people will drop by. I will return with a spiffed up version at my regular gig next Saturday.

Harold Jeffreys: Using p-values implies that “An hypothesis that may be true is rejected because it has failed to predict observable results that have not occurred.” (1939, 316)

I say it’s funny, so to see why I’ll strive to give it a generous interpretation.

We can view p-values in terms of rejecting H0, as in the joke, as follows: there’s a test statistic D such that H0 is rejected if the observed D, i.e., d0, reaches or exceeds a cut-off d* where Pr(D > d*; H0) is very small, say .025. Equivalently, in terms of the p-value:
Reject H0 if Pr(D > d0; H0) < .025.
The report might be: “reject H0 at level .025”.

Suppose we’d reject H0: The mean light deflection effect is 0, if we observe a 1.96 standard deviation difference (in one-sided Normal testing), reaching a p-value of .025. Had the observation been further into the rejection region, say 3 or 4 standard deviations, it too would have resulted in rejecting the null, and with an even smaller p-value. H0 “has not predicted” a 2, 3, 4, 5 etc. standard deviation difference. Why? Because differences that large are “far from” or improbable under the null. But wait a minute. What if we’ve only observed a 1 standard deviation difference (p-value = .16)? It is unfair to count it against the null that 1.96, 2, 3, 4 etc. standard deviation differences would have diverged seriously from the null, when we’ve only observed the 1 standard deviation difference. Yet the p-value tells you to compute Pr(D > 1; H0), which includes these more extreme outcomes. This is “a remarkable procedure” indeed! [i]
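For a minimal numerical check of the tail areas just cited (one-sided Normal testing; a sketch using only the standard library, with the cut-offs and differences from the example above):

```python
from math import erf, sqrt

def normal_tail(d):
    """One-sided p-value Pr(D >= d; H0) when D is standard Normal under H0."""
    return 0.5 * (1 - erf(d / sqrt(2)))

print(round(normal_tail(1.96), 3))  # 0.025: reaches the cut-off, reject H0
print(round(normal_tail(1.0), 2))   # 0.16: does not reach .025, no rejection
print(round(normal_tail(3.0), 4))   # 0.0013: further into the rejection region
```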

So much for making out the howler. The only problem is that significance tests do not do this, that is, they do not reject with, say, D = 1 because larger D values, further from the null, might have occurred (but did not). D = 1 does not reach the cut-off, and does not lead to rejecting H0. Moreover, looking at the tail area makes it harder, not easier, to reject the null (although this isn’t the only function of the tail area): since it requires not merely that Pr(D = d0; H0) be small, but that Pr(D > d0; H0) be small. And this is well justified because when this probability is not small, you should not regard it as evidence of discrepancy from the null. Before getting to this, a few comments:

1. The joke talks about outcomes the null does not predict–just what we wouldn’t know without an assumed test statistic, but the tail area consideration arises in Fisherian tests in order to determine what outcomes H0 “has not predicted”. That is, it arises to identify a sensible test statistic D (I’ll return to N-P tests in a moment).

In familiar scientific tests, we know the outcomes that are further away from a given hypothesis in the direction of interest, e.g., the more patients show side effects after taking drug Z, the less indicative the data are that the drug is benign, not the other way around. But that’s to assume the equivalent of a test statistic. In Fisher’s set-up, one needs to identify a suitable measure of closeness, fit, or directional departure. Any particular outcome can be very improbable in some respect. Improbability of outcomes (under H0) should not indicate discrepancy from H0 if even less probable outcomes would occur under discrepancies from H0. (Note: To avoid confusion, I always use “discrepancy” to refer to the parameter values used in describing the underlying data generation; values of D are “differences”.)

2. N-P tests and tail areas: Now N-P tests do not consider “tail areas” explicitly, but they fall out of the desiderata of good tests and sensible test statistics. N-P tests were developed to provide the tests that Fisher used with a rationale by making explicit alternatives of interest—even if just in terms of directions of departure.

In order to determine the appropriate test and compare alternative tests “Neyman and I introduced the notions of the class of admissible hypotheses and the power function of a test. The class of admissible alternatives is formally related to the direction of deviations—changes in mean, changes in variability, departure from linear regression, existence of interactions, or what you will.” (Pearson 1955, 207)

Under N-P test criteria, tests should rarely reject a null erroneously, and as discrepancies from the null increase, the probability of signaling discordance from the null should increase. In addition to ensuring Pr(D < d*; H0) is high, one wants Pr(D > d*; H’: μ = μ0 + γ) to increase as γ increases. Any sensible distance measure D must track discrepancies from H0. If you’re going to reason “the larger the D value, the worse the fit with H0,” then observed differences must occur because of the falsity of H0 (in this connection consider Kadane’s howler).
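A short sketch of those desiderata in the one-sided Normal case (my own illustration: cut-off d* = 1.96, discrepancies γ in standard deviation units, purely illustrative values):

```python
from math import erf, sqrt

def pr_exceeds(cutoff, mean_shift=0.0):
    """Pr(D > cutoff) when D ~ Normal(mean_shift, 1); mean_shift = 0 is H0."""
    return 0.5 * (1 - erf((cutoff - mean_shift) / sqrt(2)))

d_star = 1.96
print(f"Type I error Pr(D > d*; H0) = {pr_exceeds(d_star):.3f}")   # ~0.025
for gamma in (0.5, 1.0, 2.0, 3.0):
    print(f"gamma = {gamma}: power = {pr_exceeds(d_star, gamma):.3f}")
# The power rises monotonically with gamma, as a sensible distance measure D requires.
```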

3. But Fisher, strictly speaking, has only the null distribution, and an implicit interest in tests with sensitivity of a given type. To find out if H0 has or has not predicted observed results, we need a sensible distance measure.

Suppose I take an observed difference d0 as grounds to reject H0 on account of its being improbable under H0, when in fact larger differences (larger D values) are more probable under H0. Then, as Fisher rightly notes, the improbability of the observed difference was a poor indication of underlying discrepancy. This fallacy would be revealed by looking at the tail area; whereas it is readily committed, Fisher notes, with accounts that only look at the improbability of the observed outcome d0 under H0.

4. Even if you have a sensible distance measure D (tracking the discrepancy relevant for the inference), and observe D = d, the improbability of d under H0 should not be indicative of a genuine discrepancy, if it’s rather easy to bring about differences even greater than observed, under H0. Equivalently, we want a high probability of inferring H0 when H0 is true. In my terms, considering Pr(D < d*; H0) is what’s needed to block rejecting the null and inferring H’ when you haven’t rejected it with severity. In order to say that we have “sincerely tried”, to use Popper’s expression, to reject H’ when it is false and H0 is correct, we need Pr(D < d*; H0) to be high.

5. Concluding remarks:

The rationale for the tail area is twofold: to get the right direction of departure, but also to ensure Pr(test T does not reject null; H0 ) is high.

If we don’t have a sensible distance measure D, then we don’t know which outcomes we should regard as those H0 does or does not predict. That’s why we look at the tail area associated with D. Neyman and Pearson make alternatives explicit in order to arrive at relevant test statistics. If we have a sensible D, then Jeffreys’ criticism is equally puzzling because considering the tail area does not make it easier to reject H0 but harder. Harder because it’s not enough that the outcome be improbable under the null, outcomes even greater must be improbable under the null. And it makes it a lot harder (leading to blocking a rejection) just when it should: because the data could readily be produced by H0 [ii].

Either way, Jeffreys’ criticism, funny as it is, collapses.

When an observation does lead to rejecting the null, it is because of that outcome—not because of any unobserved outcomes. Considering other possible outcomes that could have arisen is essential for determining (and controlling) the capabilities of the given testing method. In fact, understanding the properties of our testing tool just is to understand what it would do under different outcomes, under different conjectures about what’s producing the data.


[i] Jeffreys’ next sentence, remarkably, is: “On the face of it, the evidence might more reasonably be taken as evidence for the hypothesis, not against it.” This further supports my reading, as if we’d reject a fair coin null because it would not predict 100% heads, even though we only observed 51% heads. But the allegation has no relation to significance tests of the Fisherian or N-P varieties.

[ii] One may argue it should be even harder, but that is tantamount to arguing the purported error probabilities are close to the actual ones. Anyway, this is a distinct issue.

Categories: rejected posts, Uncategorized | 1 Comment

Winner of the December 2013 palindrome book contest

WINNER: Zachary David
PALINDROME:
Ableton Live: ya procedure plaid, yo. Oy, dial Peru decor. Pay evil, not Elba.

Zachary notes: “Ableton Live is a popular DJ software all the hipster kids use.”

MINIMUM REQUIREMENT**: A palindrome that includes Elba plus procedure.

BIO: Zachary David is a quantitative software developer at a Chicago-based proprietary trading firm and a student at Northwestern University. He infrequently blogs at http://zacharydavid.com.

BOOK SELECTION: “I’d love to get Error and Inference* off of my wish list and onto my desk.”

EDITOR: It’s yours!

STATEMENT: “Finally, after years of living in Wicker Park, my knowledge of hipsters has found its way into poetry and paid out in prizes. I would like to give a special thank you to professor Mayo for being very welcoming to this first time palindromist. I will definitely participate again… I enjoyed the mental work out. Perhaps the competition will pick up in the future.”

*Full title of book choice:

Error and Inference: Recent Exchanges on Experimental Reasoning, Reliability and the Objectivity and Rationality of Science (by D. G. Mayo and A. Spanos, eds.,CUP 2010), 

Note: The word for January 2014 is “optimal” (plus Elba). See January palindrome page.

Congratulations Zachary!

**Nor can it repeat or be close to one that Mayo posts. Joint submissions are permitted (1 book); no age requirements. Professional palindromists not permitted to enter. Note: The rules became much easier starting May 2013, because no one was winning, or even severely trying. The requirements had been Elba + two selected words, rather than only one. I hope we can go back to the more severe requirements once people get astute at palindromes—it will increase your verbal IQ, improve mental muscles, and win you free books. (The book selection changes slightly each month).

_________

Categories: Uncategorized | 1 Comment

Mascots of Bayesneon statistics (rejected post)

Bayes-neon Mascots (desperately seeking): a neon sign! puppies! a wigless religious figure–probably the reverend!

I have always thought that the neon sign (of the definition of conditional probability)–first spotted on a truly impressive cult blog–is the fitting emblem for the contemporary Bayes-neon. Politically, epistemologically, and commercially–it says it all!

(My “proper” blog (compared to this one) has a stock mascot, Diamond Offshore. Unfortunately, it’s at like a year low. Search rejected posts, if interested in the story. The insignia or pictorial emblem for that blog is the exiled one, casting about for inductive insights):


Categories: rejected posts | 2 Comments
