20 Comments

I'm a bit concerned by the suggestion that what one does is look at the most plausibly manipulated pieces of data.

Consider a perfectly honest study that reaches some desired, statistically significant result. Since clearly no one would introduce fake data unless it moved the result from insignificant/non-existent to significant, we can infer that the subset of data most likely to have been manipulated is always one that switches the result from insignificant to significant, even when the study is honest.

The test can't be what's the most plausibly manipulated subset of the data, but something a bit more complicated that considers how plausible/sus those data points happen to be.

> Suppose observations with low outcome values are switched from condition 1 to condition 2. Dropping these observations still leaves the mean outcome for condition 1 artificially lower.

This should say "artificially higher", right?

Yes. Thank you for catching this.
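
A toy illustration of the corrected sentence, for anyone following along. The numbers below are made up and are not taken from the study; the sketch only shows that if low-outcome rows are relabeled from condition 1 to condition 2, dropping those rows still leaves condition 1 with a higher mean than its honest value.

```python
from statistics import mean


def condition1_means(original, switched):
    """Return (honest mean, mean after dropping the switched rows) for condition 1.

    Assumes outcome values are distinct, so values can stand in for row identity.
    """
    remaining = [y for y in original if y not in switched]
    return mean(original), mean(remaining)


# Hypothetical outcomes honestly recorded in condition 1.
original_c1 = [1, 2, 3, 4, 5, 6]
# Hypothetical low-outcome rows that were relabeled as condition 2.
switched = [1, 2]

honest, after_drop = condition1_means(original_c1, switched)
print(f"honest mean: {honest}, after dropping switched rows: {after_drop}")
```

Dropping the suspect rows removes them from condition 2, but it does not put them back in condition 1, so the condition 1 mean stays inflated.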

Nice work here, Matt! I enjoyed the footnote joke about 1980s data management practices. 😄

The punny title is a nice touch as well.

The gold is always hidden in the footnotes: "Strictly speaking, there are all sorts of tricks a competent fraudster might undertake, to both produce the desired result and cover their tracks. But at no point here does Data Colada appear to be alleging the existence of competent fraud." Sick burn!

There's an old saying, "If you have to explain statistics to a jury, you've already lost."

Gino's fabrication is sufficiently egregious that I don't expect her to win.

I actually don't know which way I'd expect a jury to default regarding burden of proof here (irrespective of whatever a judge instructs). If they default to "DC committed defamation unless they can show the research was probably fraudulent" then they have to explain statistics... If they default to "Gino has to show the research wasn't fraudulent" it's reversed.

Burden of proof is on the plaintiff in the United States. If the plaintiff is a public figure, which might be iffy in this case, it’s even harder.

I believe Gino is playing the long game here. The point of the lawsuit is not to win. The point of the lawsuit is to extract settlements from HBS and Data Colada (complete with NDAs, naturally).

She'll then be able to wave the settlements around as evidence of her innocence. Of course the settlements won't prove anything like that, but the point here is to sufficiently confuse people, and to give some air cover to any institution that wants to hire her.

I suspect that keeping Data Colada in the mix is part of Option 3. Once you have the July 16 spreadsheet to compare with the OSF data, it seems the Data Falsificada #1 analysis becomes irrelevant. But "HBS used the wrong dataset" kind of reintroduces it and thus raises the need to explain away calcChain.xml. At this point, enough heads may be spinning to serve the goals of Option 3.

I agree - once you have an original datafile you trust to be true and unmanipulated, then everything Data Colada are doing in #1 becomes largely superfluous. (Of course this creates an incentive to deny that datafile is the correct original). Clever forensic tricks are what you have to resort to when you have limited information.

"Data Colada observe this and concludes that the six observations from 51 to 52 are all out-of-sequence, while under Gino’s rule, only 51, 12, and 5 follow higher valued IDs within the same condition and thus are deemed out-of-sequence. She also appears to erroneously categorize ID 91 as out-of-sequence, in violation of her stated rule."

Doubly ironically, you missed that under Gino's rule 52 also classifies as out-of-sequence, because it follows 91.

Thanks, fixed. Both Gino and Data Colada flag observation 52.
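
For readers trying to apply the rule by hand, here is a minimal sketch of one reading of Gino's rule as paraphrased above: an observation is out-of-sequence if it directly follows a higher-valued ID within the same condition. The function name and the rows are hypothetical and are not the study's data, and the sketch does not attempt to reproduce Data Colada's own criterion.

```python
def out_of_sequence(rows):
    """Flag IDs that directly follow a higher ID within the same condition.

    rows: list of (obs_id, condition) tuples in file order.
    Returns the flagged IDs in the order they appear.
    """
    last_seen = {}  # condition -> ID of the previous row in that condition
    flagged = []
    for obs_id, cond in rows:
        prev = last_seen.get(cond)
        if prev is not None and obs_id < prev:
            flagged.append(obs_id)
        last_seen[cond] = obs_id
    return flagged


# Hypothetical file order, made up for illustration only.
rows = [(3, 1), (7, 1), (9, 1), (4, 1), (10, 1), (2, 2), (8, 2), (6, 2)]
flagged = out_of_sequence(rows)
print(flagged)
```

Under this reading only the row that breaks the ascending run is flagged. The row after it is compared against its immediate predecessor, which is why a predecessor-based rule can flag fewer rows than a rule that treats a whole displaced block as out-of-sequence.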

The arguments made by Data Colada are pretty compelling. It seems to me that if there were an honest explanation for their findings, Gino would not have sued them.

Also I don't think I've seen this point addressed head-on, but Harvard has access to the original data. Now I think in her lawsuit, Gino muddies the water around whether the data Harvard calls 'original' is in fact original, but it seems reasonable to think so.

If the data in the Excel sheets hadn't been manipulated, that would have been clear from the original data. Harvard had access to the original data, and still fired her. That says a lot.

I think Gino is about to truly understand the Streisand effect.

Why wouldn't you sue if there is an honest explanation? I mean, the voting systems people sued Fox despite having very good responses to the absurd concerns raised in that situation.

I don't think Gino has a good defense, but I don't think suing tells us much other than that Gino doesn't think she will convince Data Colada to admit they were wrong voluntarily, and that she's been advised by lawyers about preserving legal avenues.

If you're a researcher and you come to a conclusion based on the data, and I come to a different conclusion, I'll share that conclusion with you. We can then have a discussion about facts, ideas, and argument.

Suing is just an end-run around this process and demonstrates she has no ideas worth debating, which is consistent with her being a liar and a cheat.

You are right, however, that it is still possible she is correct; suing doesn't rule it out completely. It is suggestive, however.

I think it's more fair to say that suing happens when at least one party is acting unreasonably or unethically. So if your opinion about Data Colada suggests they are being reasonable, then I see how it looks bad.

I really should have said: fair in this case, though I've seen cases in academia where the person doing the criticism really is on some kind of unjustified crusade to tear someone down, and it makes sense to sue then too.

Fair. I think if she had offered some reasonable explanations of why Data Colada was wrong, I'd be more sympathetic (even if I disagreed with her reasons).

In her lawsuit she accuses DC of coming after her because she's a woman, yet offers zero support/reasoning for that accusation.

I find her despicable so I might have a tough time thinking rationally about what's happening here.

What I find most persuasive is that Harvard acted. They've basically never done that before, and they wouldn't start with a woman and act so quickly unless they had no doubts.
