Too swamped to read about ‘the swamping problem’ in epistemology, but…



I was sent an interesting paper that is a quintessential exemplar of analytic epistemology. It’s called “What’s the Swamping Problem?” (by Duncan Prichard), and was tweeted to me by a philosophy graduate student, George Shiber. I’m too tired and swamped to read the fascinating ins and outs of the story. Still, here are some thoughts off-the-top of my head that couldn’t be squeezed into a tweet. I realize I’m not explaining the problem, that’s why this is in “rejected posts”–I didn’t accept it for the main blog. (Feel free to comment. Don’t worry, absolutely no one comes here unless I direct them through the swamps.)

1.Firstly, it deals with a case where the truth of some claim is given whereas we’d rarely know this. The issue should be relevant to the more typical case. Even then, it’s important to be able to demonstrate and check why a claim is true, and be able to communicate the reasons to others. In this connection, one wants information for finding out more things and without the method you don’t get this.

  1. Second, the goal isn’t merely knowing isolated factoids but methods. But that reminds me that nothing is said about learning the method in the paper. There’s a huge gap here. If knowing, is understood as true belief PLUS something, then we’ve got to hear what that something is. If it’s merely reliability without explanation of the method,(as is typical in reliabilist discussions) no wonder it doesn’t add much, at least wrt that one fact. It’s hard even to see the difference, unless the reliable method is spelled out. In particular, in my account, one always wants to know how to recognize and avoid errors in ranges we don’t yet know how to probe reliably. Knowing the method should help extend knowledge into unknown territory.
  2. We don’t want trivial truths. This is what’s wrong with standard confirmation theories, and where Popper was right. We want bold, fruitful, theories that interconnect areas in order to learn more things. I’d rather know how to spin-off fabulous coffee makers using my 3-D printer, say, then have a single good coffee now. The person who doesn’t care how a truth was arrived at is not a wise person. The issue of “understanding” comes up (one of my favorite notions), but little is said as what it amounts to.
  1. Also overlooked on philosophical accounts is the crucial importance of moving from unreliable claims to reliable claims (e.g., by averaging, in statistics.) . I don’t happen to think knowing merely that the method is reliable is of much use, w/o knowing why, w/o learning how specific mistakes were checked, errors are made to ramify to permit triangulation, etc.
  1. Finally, one wants an epistemic account that is relevant for the most interesting and actual cases, namely when one doesn’t know X or is not told X is a true belief. Since we are not given that here (unless I missed it) it doesn’t go very far.
  1. Extraneous: On my account, x is evidence for H only to the extent that H is well tested by x. That is, if x accords with H, it is only evidence for H to the extent that it’s improbable the method would have resulted in so good accordance if H is false. This goes over into entirely informal cases. One still wants to know how capable and incapable the method was to discern flaws.
  1. Related issues, though it might not be obvious at first, concerns the greater weight given to a data set that results from randomization, as opposed to the same data x arrived at through deliberate selection.

Or consider my favorite example: the relevance of stopping rules. People often say that if data x on 1009 trials achieves statistical significance at the .05 level, then it shouldn’t matter if x arose from a method that planned on doing 1009 trials all along, or one that first sought significance after the first 10, and still not getting it went on to 20, then 10 more and 10 more until finally at trial 1009 significance was found. The latter case involves what’s called optional stopping. In the case of, say, testing or estimating the mean of a Normal distribution the optional stopping method is unreliable, at any rate, the probability it erroneously infers significance is much higher than .05. It can be shown that this stopping rule is guaranteed to stop in finitely trials and reject the null hypothesis, even though it is true. (Search optional stopping on

I may add to this later…You can read it: What Is The Swamping Problem

Categories: Misc Kvetching, Uncategorized | 1 Comment

Potti Update: “I suspect that we likely disagree with what constitutes validation” (Nevins and Potti)

PottiSo there was an internal whistleblower after all (despite denials by the Duke people involved): a med student Brad Perez. It’s in the Jan. 9, 2015 Cancer Letter. I haven’t studied this update yet, but thought I’d post the letter here on Rejected Posts. (Since my first post on Potti last May, I’ve received various e-mails and phone calls from people wanting to share the inside scoop, but I felt I should wait for some published item.)
          Here we have a great example of something I am increasingly seeing: Challenges to the scientific credentials of data analysis are dismissed as mere differences in “statistical philosophies” or as understandable disagreements about stringency of data validation.
         If so, then statistical philosophy is of crucial practical importance. While Potti and Nevins concur (with Perez) that data points in disagreement with their model are conveniently removed, they claim the cherry-picked data that do support their model give grounds for ignoring the anomalies. Since the model checks out in the cases it checks out, it is reasonable to ignore those annoying anomalous cases that refuse to get in line with their model. After all it’s only going to be the basis of your very own “personalized” cancer treatment!
Jan 9, 2015
 Extracts from their letter:
Nevins and Potti Respond To Perez’s Questions and Worries

Dear Brad,

We regret the fact that you have decided to terminate your fellowship in the group here and that your research experience did not tum out in a way that you found to be positive. We also appreciate your concerns about the nature of the work and the approaches taken to the problems. While we disagree with some of the measures you suggest should be taken to address the issues raised, we do recognize that there are some areas of the work that were less than perfect and need to be rectified.


 I suspect that we likely disagree with what constitutes validation.


We recognize that you are concerned about some of the methods used to develop predictors. As we have discussed, the reality is that there are often challenges in generating a predictor that necessitates trying various methods to explore the potential. Clearly, some instances arc very straightforward such as the pathway predictors since we have complete control of the characteristics of the training samples. But, other instances are not so clear and require various approaches to explore the potential of creating a useful signature including in some cases using information from initial cross validations to select samples. If that was all that was done in each instance, there is certainly a danger of overfitting and getting overly optimistic prediction results. We have tried in all instances to make use of independent samples for validation of which then puts the predictor to a real test. This has been done in most such cases but we do recognize that there are a few instances where there was no such opportunity. It was our judgment that since the methods used were essentially the same as in other cases that were validated, that it was then reasonable move forward. You clearly disagree and we respect that view but we do believe that our approach is reasonable as a method of investigation.

……We don’t ask you to condone an approach that you disagree with but do hope that you can understand that others might have a different point of view that is not necessarily wrong.

Finally, we would like to once again say that we regret this circumstance. We wish that this would have worked out differently but at this point, it is important to move forward.

Sincerely yours,

Joseph Nevins

Anil Potti

The Med Student’s Memo

Bradford Perez Submits His Research Concerns

Nevins and Potti Respond To Perez’s Questions and Worries

A Timeline of The Duke Scandal

The Cancer Letter’s Previous Coverage


I’ll put this up in my regular blog shortly

Categories: junk science, Potti and Duke controversy | 1 Comment

Why are hypothesis tests (often) poorly explained as an “idiot’s guide”?

From Aris Spanos:

“Inadequate knowledge by textbook writers who often do not have the technical skills to read and understand the original sources, and have to rely on second hand accounts of previous textbook writers that are often misleading or just outright erroneous. In most of these textbooks hypothesis testing is poorly explained as an idiot’s guide to combining off-the-shelf formulae with statistical table.

“A deliberate attempt to distort and cannibalize frequentist testing for certain Bayesian statisticians who revel in (unfairly) maligning frequentist inference in their misguided attempt to motivate their preferred viewpoint of statistical inference.” (Aris Spanos)


Categories: frequentists tests | 1 Comment

Gelman’s error statistical critique of data-dependent selections–they vitiate P-values: an extended comment

The nice thing about having a “rejected posts” blog, which I rarely utilize, is that it enables me to park something too long for a comment, but not polished enough to be “accepted” for the main blog. The thing is, I don’t have time to do more now, but would like to share my meanderings after yesterday’s exchange of comments with Gelman.

I entirely agree with Gelman that in studies with wide latitude for data-dependent choices in analyzing the data, we cannot say the study was stringently probing for the relevant error (erroneous interpretation) or giving its inferred hypothesis a hard time.

One should specify what the relevant error is. If it’s merely inferring some genuine statistical discrepancy from a null, that would differ from inferring a causal claim. Weakest of all would be merely reporting an observed association. I will assume the nulls are like those in the examples in the “The Garden of Forking Paths” paper, only I was using his (2013) version. I think they are all mere reports of observed associations except for the BEM ESP study. (That they make causal, or predictive, claims already discredits them).

They fall into the soothsayer’s trick of, in effect, issuing such vague predictions that they are guaranteed not to fail.

Here’s a link to Gelman and Loken’s “The Garden of Forking Paths”

I agree entirely: “Once we recognize that analysis is contingent on data, the p-value argument disappears–one can no longer argue that, if nothing were going on, that something as extreme as what was observed would occur less than 5% of the time.” (Gelman 2013, p. 10). The nominal p-values does not reflect the improbability of such an extreme or more extreme result due to random noise or “nothing going on”.

A legitimate p-value of α must be such that

Pr(Test yields P-value < α; Ho: chance) ~ α.

With data dependent hypotheses, the probability the test outputs a small significance level can easily be HIGH, when it’s supposed to be LOW. See this post: “Capitalizing on Chance” reporting on Morrison and Henkel from the 1960s!![i]

Notice, statistical facts about p-values demonstrate the invalidity of taking these nominal p-values as actual. So statistical facts about p-values are self-correcting or error correcting.

So, just as in my first impression of the “Garden” paper, Gelman’s concern is error statistical: it involves appealing to data that didn’t occur, but might have occurred, in order to evaluate inferences from the data that did occur. There is an appeal to a type of sampling distribution over researcher “degrees of freedom” akin to literal multiple testing, cherry-picking, barn-hunting and so on.

One of Gelman’s suggestions is (or appears to be) to report the nominal p-value, and then consider the prior that would render the p-value = to the resulting posterior. If the prior doesn’t seem believable, I take it you are to replace it with one that does. Then, using whatever prior you have selected, report the posterior probability that the effect is real. (In a later version of the paper, there is only reference to using a “pessimistic prior”.) This is remindful of Greenland’s “dualistic” view. Please search on error

Here are some problems I see with this:

  1. The supposition is that for the p-value to be indicative of evidence for the alternative (say in a one-sided test of a 0 null), the p-value should be like a posterior probability for the null, (1- p) to the non-null. This is questionable.

Aside: Why even allow using the nominal p-value as a kind of likelihood to go into the Bayesian analysis if its illegitimate? Can we assume the probability model used to compute the likelihood from the nominal p-value?

  1. One may select the prior in such a way that one reports a low posterior that the effect is real. There’s wide latitude in the selection and it will depend on the framing of the “not-Ho” (non-null).Now one has “critics degrees of freedom” akin to researchers degrees of freedom.

One is not criticizing the study or pinpointing its flawed data dependencies, yet one is claiming to have grounds to criticize it.

Or suppose the effect inferred is entirely believable and now the original result is blessed—even though it should be criticized as having poorly tested the effect. Adjudicating between different assessments by different scientists will become a matter of defending one’s prior, when it should be a matter of identifying the methodological flaws in the study. The researcher will point to many other “replications” in a big field studying similar effects, etc.

There’s a crucial distinction between a poorly tested claim and an implausible claim. An adequate account of statistical testing needs to distinguish these.

I want to be able to say that the effect is quite plausible given all I know, etc., but this was a terrible test of it, and supplies poor grounds for the reality of the effect.

Gelman’s other suggestion, that these experimenters distinguish exploratory from confirmatory experiments, and that they be required to replicate their results is, on the face of it, more plausible. But the only way this would be convincing, as I see it, is if the data analysts were appropriately blinded. Else, they’ll do the same thing with the replication.

I agree of course that a mere nominal p-value “should not be taken literally” (in the sense that it’s not an actual p-value)—but I deny that this is equal to assigning p as a posterior probability to the null.

There are many other cases in which data-dependent hypotheses are well-tested by the same data used in their construction/selection  Distinguishing cases has been the major goal of much of my general work in philosophy of science (and it carries over into PhilStat).

 One last thing: Gelman is concerned that the p-values based on these data-dependent associations are misleading the journals and misrepresenting the results. This may be so in the “experimental” cases. But if the entire field knows that this is a data-dependent search for associations that seem indicative of supporting one or another conjecture, and that the p-value is merely a nominal or computed measure of fit, then it’s not clear there is misinterpretation. It’s just a reported pattern.

[i] When the hypotheses are tested on the same data that suggested them and when tests of significance are based on such data, then a spurious impression of validity may result. The computed level of significance may have almost no relation to the true level. . . . Suppose that twenty sets of differences have been examined, that one difference seems large enough to test and that this difference turns out to be “significant at the 5 percent level.” Does this mean that differences as large as the one tested would occur by chance only 5 percent of the time when the true difference is zero? The answer is no, because the difference tested has been selected from the twenty differences that were examined. The actual level of significance is not 5 percent, but 64 percent! (Selvin 1970, 104)



Categories: rejected posts | Tags: | 3 Comments

A SANTA LIVED AS A DEVIL AT NASA! Create an easy peasy Palindrome for December, win terrific books for free!

imagesTo avoid boredom, win a free book, and fulfill my birthday request, please ponder coming up with a palindrome for December. If created by anyone younger than 18, they get to select two books. All it needs to have is one word: math (aside from Elba, but we all know able/Elba). Now here’s a tip: consider words with “ight”: fight, light, sight, might. Then just add some words around as needed. (See rules, they cannot be identical to mine.)

Night am…. math gin

fit sight am ….math gist if

sat fight am…math gift as

You can search “palindrome” on my regular blog for past winners, and some on this blog too.

Thanx, Mayo

Categories: palindrome, rejected posts | 6 Comments

More ironies from the replicationistas: Bet on whether you/they will replicate a statistically significant result

For a group of researchers concerned wit how the reward structure can bias results of significance tests, this has to be a joke or massively ironic:

Second Prediction Market Project for the Reproducibility of Psychological Science

The second prediction market project for the reproducibility project will soon be up and running – please participate!

There will be around 25 prediction markets, each representing a particular study that is currently being replicated. Each study (and thus market) can be summarized by a key hypothesis that is being tested, which you will get to bet on.

In each market that you participate, you will bet on a binary outcome: whether the effect in the replication study is in the same direction as the original study, and is statistically significant with a p-value smaller than 0.05.

Everybody is eligible to participate in the prediction markets: it is open to all members of the Open Science Collaboration discussion group – you do not need to be part of a replication for the Reproducibility Project. However, you cannot bet on your own replications.

Each study/market will have a prospectus with all available information so that you can make informed decisions.

The prediction markets are subsidized. All participants will get about $50 on their prediction account to trade with. How much money you make depends on how you bet on different hypotheses (on average participants will earn about $50 on a Mastercard (or the equivalent) gift card that can be used anywhere Mastercard is used).

The prediction markets will open on October 21, 2014 and close on November 4.

If you are willing to participate in the prediction markets, please send an email to Siri Isaksson by October 19 and we will set up an account for you. Before we open up the prediction markets, we will send you a short survey.

The prediction markets are run in collaboration with Consensus Point.

If you have any questions, please do not hesitate to email Siri Isaksson.

Categories: rejected posts | 3 Comments

Msc. kvetch: Are you still fully dressed under your clothes?

UnknownMen have a constitutional right to take pictures under women’s skirts. Yup. That’s what the Massachusetts courts have determine after one Michael Robertson was caught routinely taking pictures and videos up the skirts of women. It even has a name: upskirting.

The Supreme Judicial Court overruled a lower court decision that had upheld charges against Michael Robertson, who was arrested in August 2010 by transit police who set up a sting after getting reports that he was using his cellphone to take photos and video up female riders’ skirts and dresses.

Robertson had argued that it was his constitutional right to do so…..

“A female passenger on a MBTA trolley who is wearing a skirt, dress or the like covering these parts of her body is not a person who is ‘partially nude,’ no matter what is or is not underneath the skirt..”

Link is here.

But this is absurd: she IS partially nude under her clothing, even if she isn’t when you don’t look up her skirt! The picture Robertson took is not of her fully clothed.

People are fully clothed when the TSA conducts whole body scans in airports (a practice that’s largely ended), and yet the pictures would be of the person naked. If you can be partially naked when an instrument sees through your clothes, then you can be partially naked when a cell phone is held under your skirt. Do we really have to get philosophical about these terms…?

Meanwhile, they’re busy trying to pass a law against upskirting in MA. So are guys in Boston  busy getting all the constitutional shots they can in the mean time?

Chris Dearborn, a law professor at Suffolk University in Boston, said the court’s ruling served as a signal to the legislature to act fast, but also likely had Peeping Toms briefly “jumping for joy”. Link is here.

Jumping for joy at violating a woman’s privacy? What kind of Neanderthals are in Boston these days?


Seems like upskirting is back in the news, this time in Georgia. Under your skirt is not really a private place, after all.

Categories: Misc Kvetching | 15 Comments

Msc Kvetch: Is “The Bayesian Kitchen” open for cookbook statistics?

I was sent a link to “The Bayesian Kitchen” and while I cannot tell for sure from from the one post, I’m afraid the kitchen might be open for cookbook statistics. It is suggested (in this post) that real science is all about “science wise” error rates (as opposed to it capturing some early exploratory efforts to weed out associations possibly worth following up on, as in genomics). Here were my comments:

False discovery rates are frequentist but they have very little to do with how well warranted a given hypothesis or model is with data. Imagine the particle physicists trying to estimate the relative frequency with which discoveries in science are false, and using that to evaluate the evidence they had for a Standard Model Higgs on July 4, 2012. What number would they use? What reference class? And why would such a relative frequency be the slightest bit relevant to evaluating the evidential warrant for the Higgs particle, nor for estimating its various properties, nor for the further testing that is now ongoing. Instead physicists use sigma levels (and associated p-values)! They show that the probability is .9999999… that they would have discerned the fact that background alone was responsible for generating the pattern of bumps they repeatedly found (in two labs). This is an error probability. It was the basis for inferring that the SM Higgs hypothesis had passed with high severity, and they then moved on to determining what magnitudes had passed with severity. That’s what science is about! Not cookbooks, not mindless screening (which might be fine for early explorations of gene associations, but don’t use that as your model for science in general).

The newly popular attempt to apply false discovery rates to “science wise error rates” is a hybrid fashion that (inadvertently) institutionalizes cookbook statistics: dichotomous “up-down” tests, the highly artificial point against point hypotheses (a null and some alternative of interest—never mind everything else), identifying statistical and substantive hypotheses, and the supposition that alpha and power can be used as a quasi-Bayesian likelihood ratio. And finally, to top it all off, by plucking from thin air the assignments of “priors” to the null and alternative—on the order of .9 and .1—this hybrid animal reports that more than 50% of results in science are false! I talk about this more on my blog

(for just one example:

Categories: Misc Kvetching, Uncategorized | 4 Comments

Msc Kvetch: comment to Kristof at 5a.m.

My comment follows his article

Bridging the Moat Around Universities


My Sunday column is about the unfortunate way America has marginalized university professors–and, perhaps sadder still, the way they have marginalized themselves from public debate. When I was a kid, the Kennedy administration had its “brain trust” of Harvard faculty members, and university professors were often vital public intellectuals who served off and on in government. That’s still true to some degree of economists, but not of most other Ph.D programs. And we’re all the losers for that.

I’ve noticed this particularly with social media. Some professors are terrific on Twitter, but they’re the exceptions. Most have terrific insights that they then proceed to bury in obscure journals or turgid books. And when professors do lead the way in trying to engage the public, their colleagues sometimes regard them with suspicion. Academia has also become inflexible about credentials, disdaining real-world experience. So McGeorge Bundy became professor of government at Harvard and then dean of the faculty (at age 34!) despite having only a B.A.–something that would be impossible today. Indeed, some professors would oppose Bill Clinton getting a tenured professorship in government today because of his lack of a Ph.D, even though he arguably understands government today better than any other American.

In criticizing the drift toward unintelligible academic writing, my column notes that some professors have submitted meaningless articles to academic journals, as experiments, only to see them published. If I’d had more space, I would have gone through the example of Alan Sokal of NYU, who in 1996 published an article in “Social Text” that he described as: “a pastiche of left-wing cant, fawning references, grandiose quotations, and outright nonsense.” Not only was it published, but after the article was unveiled as gibberish, Social Text’s editors said it didn’t much matter: “Its status as parody does not alter, substantially, our interest in the piece, itself, as a symptomatic document.”

I hope people don’t think my column is a denunciation of academia. On the contrary, I think universities are an incredible national resource, with really smart thinking on vital national issues. I want the world to get the benefit of that thinking, not see it hidden in academic cloisters. Your thoughts on this issue?


Deborah Mayo Virginia 12 hours ago

In my own field of philosophy, the truth is that the serious work, the work that advances the ideas and research, takes place in “obscure journals or turgid books”. There are plenty of areas where this research can be directly relevant to public issues–it’s the public who should be a bit more prepared to engage with the real scholarship. Take my specialization of philosophy of statistical inference in science. Science writers appear to be only interested in repeating the popular, sexy, alarmist themes (e.g., most research is wrong, statistical significance is bogus,science fails to self-correct). Rather than research what some more careful thinkers have shown, or engage the arguments behind contrasting statistical philosophies–those semi-turgid books–, these science writers call around to obtain superficial dramatic quips from the same cast of characters. They have a one-two recipe for producing apparently radical and popular articles this way. None of the issues ever get clarified this way. I suggest the public move closer to the professional work rather than the other way around. Popular is generally pablam, at least in the U.S.

Categories: Misc Kvetching | Leave a comment

Notes (from Feb 6*) on the irrelevant conjunction problem for Bayesian epistemologists (i)



* refers to our seminar: Phil6334

I’m putting these notes under “rejected posts” awaiting feedback and corrections.

Contemporary Bayesian epistemology in philosophy appeals to formal probability to capture informal notions of “confirmation”, “support”, “evidence”, and the like, but it seems to create problems for itself by not being scrupulous about identifying the probability space, set of events, etc. and not distinguishing between events and statistical hypotheses. There is usually a presumed reference to games of chance, but even there things can be altered greatly depending on the partition of events. Still, we try to keep to that. The goal is just to give a sense of that research program. (Previous posts on the tacking paradox: Oct. 25, 2013: “Bayesian Confirmation Philosophy and the Tacking Paradox (iv)*” &  Oct 25.

(0) Simple Bayes Boost R: 

H is “confirmed” or supported by x if P(H|x) > P(H) (P(x |H)) > P(x).

H is disconfirmed (or undermined) by x if P(H|x) < P(H), (else x is confirmationally irrelevant to H).

Mayo: The error statistician would already get off at (0): probabilistic affirming the consequent is maximally unreliable, violating the minimal requirement for evidence. That could be altered with context-depedent information about how the data and hypotheses are arrived at, but this is not made explicit.

(a) Paradox of irrelevant conjunctions (‘tacking paradox’)

If x confirms H, then x also confirms (H & J), even if hypothesis J is just “tacked on” to H.[1]

Hawthorne and Fitelson (2004) define:

J is an irrelevant conjunct to H, with respect to evidence x just in case

P(x|H) = P(x|H & J).

(b) Example from earlier: For instance, x might be radioastronomic data in support of:

H: “the GTR deflection of light effect is 1.75” and

J: “the radioactivity of the Fukushima water dumped in the Pacific ocean is within acceptable levels”.

(1) Bayesian (Confirmation) Conjunction: If x Bayesian confirms H, then x Bayesian-confirms:

(H & J), where P(xH & J ) = P(x|H) for any J consistent with H

(where J is an irrelevant conjunct to H, with respect to evidence x).

If you accept R, (1) goes through.

Mayo: We got off at (0) already.  Frankly I don’t  know why Bayesian epistemologists would allow adding an arbitrary statement or hypothesis not amongst those used in setting out priors. Maybe it’s assumed J is in there somehow (in the background K), but it seems open-ended, and they have not objected.

But let’s just talk about well-defined events in a probability experiment; and limit ourselves to talking about an event providing evidence of another event (e.g., making it more or less expected) in some sense. In one of Fitelson’s examples,   P(black|ace of spade) > P(black), so “black” confirms it’s an ace of spades (presumably in random drawings of card color from an ordinary deck)–despite “ace” being an “irrelevant conjunct” of sorts. Even so, if someone says data x (it’s a stock trader) is evidence it’s an inside trader in a hedge firm, I think it would be assumed that something had been done to probe the added conjuncts.

(2) Using simple B-boost R: (H & J) gets just as much of a boost by x as does H—measuring confirmation as a simple B-boost: R.

CR(H, x) = CR((HJ), x) for irrelevant conjunct J.

R: P(H|x)/P(H) (or equivalently, P(x |H)/P(x))

(a) They accept (1) but (2) is found counterintuitive (by many or most Bayesian epistemologists). But if you’ve defined confirmation as a B-boost, why run away from the implications? (A point Maher makes.) It seems they implicitly slide into thinking of what many of us want:

some kind of an assessment of how warranted or well-tested H is (with x and background).

(not merely a ratio which, even if we can get it, won’t mean anything in particular. It might be any old thing, 2, 22, even with H scarcely making x expected.).

(b) The intuitive objection according to Crupi and Tentori (2010) is this (e.g., p. 3): “In order to claim the same amount of positive support from x to a more committal theory “H and J” as from x to H alone, …adding J should contribute by raising further how strongly x is expected assuming H by itself. Otherwise, what would be the specific relevance of J?” (using my letters, emphasis added)

But the point is that it’s given that J is irrelevant. Now if one reports all the relevant information for the inference, one might report something like (H & J) makes x just as expected as H alone. Why not object to the “too was” confirmation (H & J) is getting when nothing has been done to probe J? I think the objection is, or should be, that nothing has been done to show J is the case rather than not. P(x |(H & J)) = P(x |(H & ~J)).

(c) Switch from R to LR: What Fitelson (Hawthorne and Fitelson 2004) do is employ, as a measure of the B-boost, what some call the likelihood ratio (LR).

CLR(H, x) = P(x | H)/P(x | ~H).

(3) Let x confirm H, then

(*) CLR(H, x) > CLR((HJ), x)

For J an irrelevant conjunct to H.

So even though x confirms (H & J) it doesn’t get as much as H does, at least if one uses LR. (It does get as much using R).

They see (*) as solving the irrelevant conjunct problem.

(4) Now let x disconfirm Q, and x confirm ~Q, then

(*) CLR(~Q, x) > CLR((~Q & J), x)

For J an irrelevant conjunct to Q: P(x|Q) = P(x|J & Q).

Crupi and Tentori (2010) notice an untoward consequence of using LR confirmation in the case of disconfirmation (substituting their Q for H above): If x disconfirms Q, then (Q & J) isn’t as badly disconfirmed as Q is, for J an irrelevant conjunct to Q. But this just follows from (*), doesn’t it? That is, from (*), we’d get (**) [possibly an equality goes somewhere.)

(**) CLR(Q, x) < CLR((Q & J), x).

This says that if x disconfirms Q, (Q & J) isn’t as badly disconfirmed as Q is. This they find counterintuitive.

But if (**) is counterintuitive, then so is (*).

(5) Why (**) makes sense if you wish to use LR:

The numerators in the LR calculations are the same:

P(x|Q & J) = P(x|Q) and P(x|H & J) = P(x|H) since in both cases J is an irrelevant conjunct.

But P(x|~(Q & J)) < P(x|~Q)

Since x disconfirms Q, x is more probable given ~Q than it is given (~Q v ~J). This explains why

(**) CLR(Q, x) < CLR((Q & J), x)

So if (**) is counterintuitive then so is (*).

(a) Example Qunready for college.

If x = high scores on a battery of college readiness tests, then x disconfirms Q and confirms ~Q.

What should J be? Suppose having one’s favorite number be an even number (rather than an odd number) is found irrelevant to scores.

(i) P(x|~(Q & J)) = P(high scores| either college ready or ~J)

(ii) P(x|~Q ) = P(high scores| college ready)

(ii) might be ~1 (as in the earlier discussion), while (i) considerable less.

The high scores can occur even among those whose favorite number is odd.This explains why

(**) CLR(Q, x) < CLR((Q & J), x)

In the case where x confirms H, it’s reversed

P(x |~(H & J)) > P(x |~H)

(b) Using one of Fitelson’s examples, but for ~Q:

e.g., Q: not-spades    x: black      J: ace

P(x |~Q) = 1.

P(x | Q)= 1/3

P(x |~(Q & J)) = 25/49

i.e., P(black|spade or not ace)=25/49

Note: CLR [(Q & J), x) = P(x |(Q & J))/P(x |~(Q & J))

Please share corrections, questions.

Previous slides are:


Chalmers (1999). What Is This Thing Called Science, 3rd ed. Indianapolis; Cambridge: Hacking.

Crupi & Tentori (2010). Irrelevant Conjunction: Statement and Solution of a New Paradox, Phil Sci, 77, 1–13.

Hawthorne & Fitelson (2004). Re-Solving Irrelevant Conjunction with Probabilistic Independence, Phil Sci 71: 505–514.

Maher (2004). Bayesianism and Irrelevant Conjunction, Phil Sci 71: 515–520.

Musgrave (2010) “Critical Rationalism, Explanation, and Severe Tests,” in Error and Inference (D.Mayo & A. Spanos eds). CUP: 88-112.

[1] Chalmers and Musgrave say I should make more of how simply severity solves it, notably for distinguishing which pieces of a larger theory rightfully receive evidence, and a variety of “idle wheels” (Musgrave, p. 110.)

Categories: phil6334 rough drafts | 3 Comments

Blog at