Phil/Stat/Law: 50 Shades of gray between error and fraud

An update on the Diederik Stapel case: July 2, 2013, The Scientist, “Dutch Fraudster Scientist Avoids Jail”.

Two years after being exposed by colleagues for making up data in at least 30 published journal articles, former Tilburg University professor Diederik Stapel will avoid a trial for fraud. Once one of the Netherlands’ leading social psychologists, Stapel has agreed to a pre-trial settlement with Dutch prosecutors to perform 120 hours of community service.

According to Dutch newspaper NRC Handelsblad, the Dutch Organization for Scientific Research awarded Stapel $2.8 million in grants for research that was ultimately tarnished by misconduct. However, the Dutch Public Prosecution Service and the Fiscal Information and Investigation Service said on Friday (June 28) that because Stapel used the grant money for student and staff salaries to perform research, he had not misused public funds. …

In addition to the community service he will perform, Stapel has agreed not to make a claim on 18 months’ worth of illness and disability compensation that he was due under his terms of employment with Tilburg University. Stapel also voluntarily returned his doctorate from the University of Amsterdam and, according to Retraction Watch, retracted 53 of the more than 150 papers he has co-authored.

“I very much regret the mistakes I have made,” Stapel told ScienceInsider. “I am happy for my colleagues as well as for my family that with this settlement, a court case has been avoided.”

No surprise he’s not doing jail time, but 120 hours of community service?  After over a decade of fraud, and tainting 14 of 21 of the PhD theses he supervised?  Perhaps the “community service” should be to actually run the experiments he had designed?  What about his innocence of misusing public funds?

From Retraction Watch:

The FIOD (Dutch organisation that pursues tax, benefit and other frauds) states that Stapel has not misused public funds, because they were given for doing research that was performed. Most of it was spent on salaries and the personnel paid performed research. Although this was with made-up data, the implicated personnel did not know this.

True, the personnel did not know, but Stapel did. See the Tilburg Report in my “statistical dirty laundry” post. You might say, well, there is no hope of recouping the money anyway, but that scarcely prevents many fraud cases from being pursued. What do you think?

****************

In pondering this, consider another blog post where I asked: “even if reporting spurious statistical results is considered ‘bad statistics,’ is it criminal behavior?” Many comments to that (12/13/13) post were tending to No.  (One comment raised the fear that doing statistics might require having a lawyer!)  That case concerned a biotech company, InterMune, and its previous CEO, Dr. Harkonen. I gave an excerpt from lawyer Nathan Schachtman’s blog:

In August 2002, Dr. Harkonen approved a press release, which carried a headline, “phase III data demonstrating survival benefit of Actimmune in IPF.” A subtitle announced the 70% relative reduction in patients with mild to moderate disease. ……

….The prosecution asserted that Dr. Harkonen engaged in data dredging, grasping for the right non-prespecified end point that had a low p-value attached. Such data dredging implicates the problem of multiple comparisons or tests, with the result of increasing the risk of a false-positive finding, notwithstanding the p-value below 0.05.

Supported by the testimony of Professor Thomas Fleming, who chaired the Data Safety Monitoring Board for the clinical trial in question, the government claimed that the trial results were “negative” because the p-values for all the pre-specified endpoints exceeded 0.05.  Shortly after the press release, Fleming sent InterMune a letter that strongly dissented from the language of the press release, which he characterized as misleading.  Because the primary and secondary end points were not statistically significant, and because the reported mortality benefit was found in a non-prespecified subgroup, the interpretation of the trial data required “greater caution,” and the press release was a “serious misrepresentation of results obtained from exploratory data subgroup analyses.”

The district court sentenced Harkonen to six months of home confinement, three years of probation, 200 hours of community service, and a fine of $20,000. Dr. Harkonen appealed on grounds that the federal fraud statutes do not permit the government to prosecute persons for expressing scientific opinions about which reasonable minds can differ.  Unless no reasonable expert could find the defendant’s statement to be true, the trial court should dismiss the prosecution.  .… The government cross-appealed to complain about the leniency of the sentence….. (my emphasis)

Read the rest of Schachtman’s post. [For the infamous press release to investors, and a link to a time line of this case, see: https://errorstatistics.com/2012/12/14/philstatlaw-bad-statistics-cont/ .]
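To make the multiple-comparisons worry in the excerpt concrete, here is a minimal sketch of how the chance of at least one nominally significant result grows with the number of endpoints or subgroups examined. It assumes every null hypothesis is true and that the tests are independent, which real subgroup analyses generally are not. The function names and the numbers of tests are hypothetical illustrations and are not taken from the InterMune trial.

```python
import random


def familywise_error_rate(num_tests: int, alpha: float = 0.05) -> float:
    """Probability of at least one p-value below alpha among num_tests
    independent tests when every null hypothesis is true."""
    if num_tests < 0:
        raise ValueError("num_tests must be non-negative")
    return 1.0 - (1.0 - alpha) ** num_tests


def simulate_familywise_error_rate(
    num_tests: int,
    alpha: float = 0.05,
    num_trials: int = 20000,
    seed: int = 0,
) -> float:
    """Monte Carlo check of the formula above.

    Assumption: under a true null with a continuous test statistic,
    each p-value is uniform on [0, 1], so it is drawn directly.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_trials):
        # A "hit" is a trial in which some test looks significant by chance.
        if any(rng.random() < alpha for _ in range(num_tests)):
            hits += 1
    return hits / num_trials


if __name__ == "__main__":
    # Illustrative numbers of looks at the data (hypothetical).
    for k in (1, 5, 10, 20):
        exact = familywise_error_rate(k)
        approx = simulate_familywise_error_rate(k)
        print(f"{k:2d} tests: exact {exact:.3f}, simulated {approx:.3f}")
```

Under these idealized assumptions, searching ten independent endpoints at the 0.05 level gives roughly a 40% chance of at least one “significant” finding even when nothing is going on, which is why a low nominal p-value from a non-prespecified subgroup cannot be read at face value.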

Harkonen is awaiting his last appeal: the Supreme Court in August. Stay tuned. [Update: he lost his final appeal!]

****************

Can the Stapel and Harkonen cases be compared? (Not obviously.)  I’m no lawyer, and have only read some published reports.[i] Some “two handed” reflections:  On the one hand, it might be said that scouring subgroups for nominally significant endpoints is done all the time, and there’s nothing like data fabrication going on here. True, but on the other hand, unlike Stapel, Harkonen was in a position to (and did) influence doctors, not to mention manipulate the company’s share price.[ii] The company itself agreed to pay some millions (their own scientists did not approve Harkonen’s report). Also unlike Stapel, Harkonen is ensconced in some other drug company. Still, the latter’s punishment might make the former case appear too lenient. Or not?

Any comparison will hinge on one’s philosophy of punishment. It might be argued that little good would result from further punishing Stapel, but what about from making an example out of Harkonen? Using the latter measure, I might be inclined to agree with the government’s complaint about leniency…maybe substitute “jail” for “home confinement” in the sentence? (I know what Schachtman thinks and has legalistically argued.)  What say you?


[i] In a discussion of his most recent, failed, appeal:

“At trial,” the appellate court observed, “nearly everybody actually involved in the … clinical trial testified that the Press Release misrepresented [the trial’s] results,” noting that the evidence showed that “even Harkonen himself was ‘very apologetic’ about the Press Release’s misleading nature.” The appellate court then determined, as did the trial court, that the evidence at trial supported the jury’s findings that Harkonen was aware that the release was misleading and that he acted with the specific intent to defraud. http://www.jdsupra.com/legalnews/ninth-circuit-affirms-conviction-in-hark-69966/

[ii] I never held shares of InterMune.

Categories: PhilStatLaw, spurious p values, Statistics | 13 Comments

13 thoughts on “Phil/Stat/Law: 50 Shades of gray between error and fraud”

  1. Nathan Schachtman

    Mayo,

    Me neither (never held shares of Actimmune). I think it is important to understand that the gov’t was mostly concerned about prosecuting off-label promotion by Dr. Harkonen. I won’t comment on that count of the indictment other than to say that the jury acquitted on that count, and thus removed the issue from the case. The jury convicted on the wire fraud count, which is why the case is still with us.

    There are important things to keep in mind in comparing Harkonen’s case with Stapel’s. First, Harkonen’s clinical trial was, by everyone’s account, including the government and its witnesses, a well-conducted study. There was no “funny business” with loss to follow up on the intent to treat analysis. Second, Harkonen did not fabricate or falsify any data. Third, the statistical analyses were accurately done and reported. Everything in the gov’t’s case turns on the use of the word “demonstrate” to describe the inference from a subgroup (p = 0.004) not pre-specified. Not best practice, but also not a crime. This was, after all, a press release, not a government submission for a new drug application. The press release promised that there would be a full presentation of data within a couple of weeks, at a pulmonary medicine conference in Europe, and that was indeed done.

    Why has the gov’t pursued this case? The U.S. has taken the extreme position that there is no first amendment right to speak of off-label uses (a position it lost in U.S. v. Caronia, before the Second Circuit). Having lost that issue with the jury, the prosecutor is trying to do with the wire fraud statute what it failed to do with the FDCA criminalization of misbranding.

    Nathan Schachtman

  2. Nathan: The legal beagle is sure quick! Just on the last para of your comment, I thought the distinction was (is becoming?) that there are first amendment rights for true off-label speech, and that it was the determination of “intent to defraud” that made the difference. I’ll have to look up what I read some weeks ago on this. Actually, perhaps this issue is in flux at the moment. I agree it would be extreme to bar communicating valid results about an off-label use of a treatment. Some things are never “on label”.

  3. Tax inspections are very useful not so much for the money the I.R.S. sucks out of them, but because everybody else is scared of being the next one… and that’s a lot of money.

    It just might work as well in the scientific community. How about mandatory audits where the researchers must show to a jury the step by step process in the making of their papers?

    And we do not need many audits for this to work actually… Paranoia works great.

  4. Fran: I’m not sure I get your point. Stapel faced no jury, but of course already admitted to fabricating the data. Harkonen faced a jury but already admitted to data-dependent subgroup analysis. His position is that the latter is not fraud but free speech, to put it very coarsely.

  5. rv

    I

  6. rv

    A subgroup analysis that wasn’t part of the experimental design certainly shouldn’t be interpreted as confirmatory evidence.

    However, doing such an analysis is no worse than any analysis of available data outside the context of a study design. If subgroup analyses are going to be disallowed, then one might as well disallow analyses of observational data as well.

    I think a better approach is not to avert our eyes from available information, but to use methods which don’t result in underpowered comparisons (e.g. shrinkage-based approaches) and to report them for the convenience sample analyses that they are – don’t pretend like one has a confirmatory result just because p<.05.

    • rv: Thanks for your comment. I fail to see the argument behind your claim that to deny that certain kinds of data-dependent agreements count as good evidence is tantamount to disallowing all observational data.
      With respect to moving from a .05 stat sig result to a real effect, I always repeat that Fisher insisted that to assert a phenomenon is experimentally demonstrable: “[W]e need, not an isolated record, but a reliable method of procedure. In relation to the test of significance, we may say that a phenomenon is experimentally demonstrable when we know how to conduct an experiment which will rarely fail to give us a statistically significant result.” (Fisher Design of Experiments 1947, 14).

  7. Mayo:

    Certainly the line between error and fraud can be difficult to see sometimes.

    But as long as all the steps are published, the analysis, however outrageously wrong, has to be considered as a point of view of the researcher.

    For example, if I flip a coin 100 times and I get 100 heads and my opinion is “no evidence for bias in the coin” this is not fraud. On the other hand, if I flip a coin 100 times and I get 49 heads but I publish that I got 50 heads and my conclusion is “no evidence for bias in the coin” this is fraud.

    So Harkonen should never have been punished if he did not alter data in any way and published all the steps taken in the study. What happened to him is scary. If being wrong is going to be a crime we should all invest in the Private Penitentiary business.

    What’s next? Jailing Frequentists or Bayesians on the grounds their methodology is flawed for one or the other side?

    A possible solution for Harkonen like cases would be to publish the raw data, and how it was obtained, and let everyone else decide if it demonstrates something or not.

    I am more concerned, though, about the blatant fraud cases; these are the ones that I believe could be reduced with an international institution auditing studies.

    I understand human weakness, and since several famous scientists have committed confirmation-bias fraud by dismissing as bogus results that went against their favored theory, it is understandable that many other scientists will do likewise. Hence the audit proposal.

    If professional sport men and women can get tested any moment for doping I see no reason why professional scientists should not be subject to the same strict standards when possible.

  8. Fran: You may wish to distinguish fraud this way*, but we can identify an irresponsible report if, for example, it purports to have done a good job ruling out a systematic pattern when it has not (as in your case of the 100 coin tosses). If your job is to report on patterns of disease, say, and you report “nothing going on here” even though all and only those exposed show signs of a toxic effect, then you haven’t done your job.

    That of course was to give an extreme example like yours. Ordinary cases are not so extreme, but we still know what it is to have done a terrible job at interpreting data, a questionable job needing further checks, or even, a much better job than such and such analysis would indicate, and so on. I don’t think there’s reason to resort to something like: just report all the data and let the doctors or patients or investors figure it out.

    The rule book in statistics and law seems quite sophisticated actually.
    http://www.fjc.gov/public/pdf.nsf/lookup/SciMan3D01.pdf/$file/SciMan3D01.pdf
    This came up in a previous post:
    https://errorstatistics.com/2012/07/10/philstatlaw-reference-manual-on-scientific-evidence-3d-ed-on-statistical-significance-schachtman/

    *though I don’t think it’s in sync with what “fraudbusters” have in mind.

  9. Mayo:

    but we can identify an irresponsible report if, for example…

    Well, exactly! You see, as long as we can see it, as long as it is not hidden, then we can criticize it, or we can choose not to read the authors’ papers again, or maybe recommend them some reading.

    For example, in the climate warming show there is a majority that considers highly irresponsible and unscientific the work of a minority… Jail time for the dissidents?

    About the rule book, fair enough: if there are rules for the game we want to play and we break those legal rules, then we pay for it, but then let’s not call the game science but bureaucracy.

    Nobody in science should ever be legally worried to be wrong, or outlandish in his/her theories. Lysenko comes to my mind

    http://en.wikipedia.org/wiki/Trofim_Lysenko

    This is an extreme case where being right was a crime. I guess that I am just saying that the 1st amendment should hold for scientific thought as well, so if someone pleases to say 2+3=4 in their study then so be it as long as they are honest.

    Well, I guess I am very tolerant with earnest ignorance and error yet ruthless with fraud.

    I don’t think there’s reason to resort to something like: just report all the data and let the doctors or patients or investors figure it out.

    If you are going to buy a car, will you trust the car dealer’s advice, or will you take along that friend who knows a lot about cars?

    Patients, obviously, have to rely on doctors, but doctors and investors had better rely on someone other than the statistician working for the company selling the product.

    I am not saying they should publish only the raw data, but by doing so they will have more reason to be accurate in their analysis, since otherwise they would be a few clicks away from being publicly shamed.

    But my real worry, again, is fraud; Stapel’s case shows how easy it is to do a career in science based on lies, 53 papers retracted for God’s sake! How can this even be possible? How crooked is the peer review process? Well, at least we know the problem is there, which is the first step to solving it.

  10. “Stapel’s case shows how easy it is to do a career in science based on lies” … If you believe his field is “science”. Think up a cute social observation or guess. E.g., vegetarians are nicer people than non-vegetarians. Do a little experiment with 50 psychology students who gain credit from participating, confirm your theory, and publish. Make sure the results are reported in the newspapers! The next generation of psychology students earned their degrees by merely answering questionnaires for “scientific” purposes!

    • Richard: I find it very interesting that you say this. Of course that is what I’d more or less alleged in my “M&M’s, limbo stick, ovulation, and Dale Carnegie” post.
      https://errorstatistics.com/2013/06/22/what-do-these-share-in-common-mms-limbo-stick-ovulation-dale-carnegie-sat-night-potpourri/
      But I assumed my position was somewhat extreme. You seemed more generous. But if we do discount the field as non-science, should we describe it as junk, or as simply offering potentially interesting narratives of factors that might be behind human behavior. Or do we shut it down? I would really like to know what you think, and what other people think.

      I have been speculating as to why I think (or fear) this stuff will continue, at least in marketing-related & political fields–aside from the interest current researchers have in surviving, and aside from human/interest/entertainment. Perhaps it is on the fringe in its own right. It has to do with people’s superstitions mixed with placebo effects. Say I know darn well there is nothing scientific about a study purporting to show that being primed with a color or sound or whatever, makes people want to buy my product or vote for my candidate. If it is believed, it may work like a charm (or a hex). And nothing too terrible is likely to happen if it doesn’t work. Better than having the competitor “use” it to advantage. Wacky I suppose. What do you think?

I welcome constructive comments for 14-21 days. If you wish to have a comment of yours removed during that time, send me an e-mail.
