“On the Importance of testing a random sample (for Covid)”, an article from Significance magazine


Nearly 3 months ago I tweeted “Stat people: shouldn’t they be testing a largish random sample of people [w/o symptoms] to assess rates, alert those infected, rather than only high risk, symptomatic people, in the U.S.?” I was surprised that nearly all the stat and medical people I know expressed the view that it wouldn’t be feasible or even very informative. Really? Granted, testing was and is limited, but had it been made a priority, it could have been done. In the new issue of Significance (June 2020) that I just received, James J. Cochran writes “on the importance of testing a random sample.” [1] 

In the United States (as of 9 April 2020), President Donald Trump has said that testing for novel coronavirus infection will be limited to people who believe they may be infected. But if we only test people who believe they may be infected, we cannot understand how deep the virus has reached into the population. The only way this could work is if those who believe they may be infected are representative of the population with respect to novel coronavirus infection. Does anyone believe this is so? The common characteristic of those who believe they may be infected is that they all show some outward symptoms of infection by the virus. In other words, people who are being tested for the novel coronavirus are disproportionately showing severe symptoms. This would not be a problem if someone who is infected by the novel coronavirus immediately shows symptoms, but this is not the case. We have strong evidence that some people develop mild cases, show no symptoms, and carry the virus without knowing it because they are asymptomatic. Thus, efforts to understand the virus’s penetration into the population must include observation of the asymptomatic.

Indeed, a recent review in the Annals of Internal Medicine estimates that at least 40% of people with COVID-19 are (and remain) asymptomatic. (An overview is in Time.) Oddly, some of these asymptomatic people still show damage to the lungs or other organs.

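The estimate of the proportion of the population who are infected can be calculated as:

$$\text{estimated proportion infected} = \frac{\text{infected with symptoms} + \text{infected without symptoms}}{\text{infected with symptoms} + \text{infected without symptoms} + \text{not infected}}$$
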
So, we need data from a random sample of the entire population in order to gather data from infected people who are showing symptoms, infected people who are asymptomatic, and people who are not infected. All have some probability of being included in a true random sample of the population.
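To see why symptomatic-only testing misleads, here is a minimal Python simulation (not from Cochran's article). Every number in it is a made-up assumption chosen for illustration: a population of 200,000, a true prevalence of 2%, 40% of infections asymptomatic (echoing the Annals estimate mentioned above), 3% of uninfected people having similar symptoms from other causes, and a perfectly accurate test. It compares the share of positives in a simple random sample of 5,000 people with the share of positives among people tested only because they have symptoms.

```python
import random


def simulate(pop_size=200_000, prevalence=0.02, p_asymptomatic=0.4,
             p_other_symptoms=0.03, sample_size=5_000, seed=1):
    """Compare a random-sample prevalence estimate with symptomatic-only testing.

    All default values are hypothetical, chosen only for illustration.
    The test is assumed to be perfectly accurate.
    """
    rng = random.Random(seed)
    people = []  # one (infected, symptomatic) pair per person
    for _ in range(pop_size):
        infected = rng.random() < prevalence
        if infected:
            # Infected people show symptoms unless they are asymptomatic.
            symptomatic = rng.random() >= p_asymptomatic
        else:
            # Uninfected people may have similar symptoms from other causes.
            symptomatic = rng.random() < p_other_symptoms
        people.append((infected, symptomatic))

    total_infected = sum(inf for inf, _ in people)
    true_prevalence = total_infected / pop_size

    # Random sample: every person has the same chance of being tested.
    sample = rng.sample(people, sample_size)
    random_estimate = sum(inf for inf, _ in sample) / sample_size

    # Symptomatic-only testing: only people with symptoms are tested.
    tested = [inf for inf, sym in people if sym]
    positivity = sum(tested) / len(tested)
    share_found = sum(tested) / total_infected

    return true_prevalence, random_estimate, positivity, share_found


if __name__ == "__main__":
    truth, estimate, positivity, found = simulate()
    print(f"true prevalence:              {truth:.3f}")
    print(f"random-sample estimate:       {estimate:.3f}")
    print(f"positivity among symptomatic: {positivity:.3f}")
    print(f"share of infections found:    {found:.3f}")
```

Under these assumptions the random-sample estimate should land near the true 2%, while positivity among the symptomatic is far higher (about 29%, since roughly 1.2% of people are infected and symptomatic versus roughly 2.9% who are uninfected but symptomatic), and symptomatic-only testing finds only about 60% of infections. Real surveys must also contend with imperfect tests and non-response, which this sketch ignores.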

As of 23 April, leaders in Germany and New York State (see bit.ly/2Kp2iXd and dailym.ai/3bxZ5Au) had moved to implement random testing to assess how widespread the virus is, but there has been resistance from leaders elsewhere. This could be due to ignorance, disregard, or lack of appreciation of statistical principles – a consequence of the lack of statistical literacy that pervades the general population. (If the general population insisted on the use of random sampling to assess how widespread the virus is, leaders would not likely resist.) Or it could reflect concern over the limited availability of tests and a desire to devote all of these limited tests to those who show symptoms of novel coronavirus infection.

Unfortunately, this might be inadvertently helping the novel coronavirus spread. If a society does not understand the extent of infection in the general population or the virus’s infectivity, how can it prepare and optimally devote its resources to slow the spread of the virus? How does it decide what preventive measures are appropriate or necessary? How does it minimise the likelihood that the virus spreads to the point that the capacity of the hospital system is overwhelmed? Most crucially, how does it know if it is making progress or if conditions are deteriorating?

Without the evidence that a random sample of the general population would provide, we are operating in the dark. While we operate in the dark, preventable deaths will accumulate, and we will continue to take measures that are not only ineffective, but also unnecessarily costly.

Most of the world still lacks the ability to test a large number of people, and this understandably makes even those leaders who appreciate sampling hesitant to test a random sample of the general population. But the bottom line is, we need more coronavirus tests than we think we need.
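How many tests does a random sample need? A standard back-of-the-envelope answer (not given in Cochran's article) is the sample-size formula for estimating a proportion, n = z²p(1 − p)/d², where d is the desired margin of error, p is a guess at the prevalence, and z ≈ 1.96 for about 95% confidence. The Python sketch below assumes simple random sampling, a perfectly accurate test, and the normal approximation; the function name and the example margins are illustrative choices.

```python
import math


def required_sample_size(margin, expected_prevalence=0.5, z=1.96):
    """Size of a simple random sample whose ~95% confidence interval for a
    proportion has half-width `margin` (normal approximation, no finite
    population correction). expected_prevalence=0.5 gives the most
    conservative (largest) answer."""
    p = expected_prevalence
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


if __name__ == "__main__":
    # Conservative case: +/- 1 percentage point, prevalence unknown.
    print(required_sample_size(0.01))
    # Low prevalence (~1%): +/- 1 point is nearly useless...
    print(required_sample_size(0.01, expected_prevalence=0.01))
    # ...so ask for +/- 0.2 points instead.
    print(required_sample_size(0.002, expected_prevalence=0.01))
```

The first call gives roughly 9,600 people. At a prevalence near 1%, a margin of ±1 percentage point needs only about 380 people but says little, since the interval would run from 0% to 2%; pinning the estimate down to ±0.2 points brings the requirement back to roughly 9,500. Low prevalence is one reason a prevalence survey can need more tests than intuition suggests.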

We should add to this the need for antibody testing of a random sample. Perhaps we'll have some better numbers now that states are opening up and employees are having to be tested.
[1] The journal comes out every other month; this is the first with a large section devoted to coronavirus. 
Categories: random sample

13 thoughts on “‘On the Importance of testing a random sample (for Covid)’, an article from Significance magazine”

  1. Stanley Young

    Many leaders knew. There be crooks. I will send you things later. I travel now. Stan


  2. Miguel

    In Spain, a prevalence study based on more than 60,000 antibody tests has been carried out. I have read comments complaining that this number of tests is ridiculously small and that the whole population should be tested, but I do not think the issue is statistical literacy; it is political bigotry.

  3. Christian Hennig

    If a country like Italy can test 70,000 people on any normal weekday, how is it a problem for the US to test a random sample of size, say, 10,000? Bizarre, bizarre, bizarre.

  4. Given the stakes involved, I would be surprised (and hugely disappointed) if random sample testing hasn’t been done yet!

    • F. E. Guerra-Pujol: So are you saying that you presume that random sample testing has been done in the U.S.? Do you have a link? The article in Significance just came out, although there’s always a lag of a few weeks. I haven’t found any, but maybe it’s currently in the works. Now they would need to do both diagnostic & antibody testing.

      • Hello! No, what I meant to say is that it is absolutely insane that random sample testing has not been done anywhere in the U.S. How can we truly know the scope of the problem if we don’t even know the base rate of infection? Isn’t that what the CDC and the various state public health departments are for? If these agencies have not conducted any random sample testing, they have completely abandoned their mission and any pretense that they are “doing science”!

  5. Universities are opening, mostly with a hybrid of in-class and remote teaching. I know that’s so at Virginia Tech & U. VA. But we don’t hear anything about tests being done on students and faculty, at least those who will be on campus.

  6. Maybe sewage systems can provide some kind of measurement of spread: https://www.theverge.com/21283825/sewer-systems-coronavirus-seawge-data-warning-signs-cities.

  7. David Chorlian

    1. There seems to be a clinical, as opposed to epidemiological, bias against tests which are not diagnostic. This is based on casual reading of the news, not any detailed evidence. I get the sense that this is much more an FDA emphasis than a CDC emphasis, based on other casual reading of the news.

    2. I also got the impression that the FDA shut many researchers out of the loop; the most notorious case was the shutdown of the investigation in Washington state at the beginning of the outbreak.

    3. Many weeks ago, on Andrew Gelman’s blog, Daniel Lakeland argued for large-scale pooled testing to estimate prevalence. (Sorry, I am not going to attempt to locate this now.) There was some interesting discussion of both the science and the practical issues involved. I can’t believe that others did not have this idea, but it never surfaced in public discussion, as you note.

    • David: Yes, I read that post on pooled testing on Gelman’s blog, and I’m fairly sure that I read somewhere that this had been done some place.
      On your point #2, I had felt, early on, that private individuals and universities (backed by individuals keen to have this happen) should have formed a testing network. I don’t think it’s too late.

      Another thing is antibody testing. From what I’ve read (and all the articles I’ve seen refer back to the early study in China), someone could have covid and, by the time they are tested for antibodies, no longer have them. Lost data. (Learning that they had had it, and that the case was mild or asymptomatic, might well influence their decision to be vaccinated at some point.)

  8. Steven McKinney

    Finally! This is the first population-level random testing I have found, and I’ve been on the lookout for weeks.

    The IUPUI (Indiana University–Purdue University Indianapolis) School of Public Health is currently running a statewide population testing effort, the “COVID-19 Random Sample Study”. Thank you IUPUI!

    Their reports are well worth reading. They are not huge reports, and many news outlets have described the findings.

    In mid-May, they had tested about 4,600 state residents, of whom 1.7% tested positive for COVID-19. Of the positive cases, 44.8% were asymptomatic.

    In mid-June, they announced a second round of findings, covering 3,619 state residents: 0.6% tested positive, and 43% of the positive cases were asymptomatic.

    These data are in line with the findings of the census testing done in the Italian village of Vo, where all of its roughly 3,000 residents were tested and about half of the cases found were asymptomatic.

    I’ll be on the lookout for more population random testing efforts, but at this point random and census population surveys show that about half of COVID-19 cases are asymptomatic, not 90% or higher as some other non-random convenience samples have announced, including one report from the CDC.


  9. Steven McKinney

    Mayo: I see that the IUPUI study is reported in the Annals of Internal Medicine review you describe above.

    I need to read your blog more often for useful COVID-19 information, I’m too busy scouring news sources and you had the goods right here.

    I am stunned to hear that you had statistical and medical people questioning the value of population-level surveys of COVID-19 penetrance. Population-level studies are the only way we will truly understand how this virus spreads in the wild.
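The pooled-testing idea in David Chorlian’s comment can be sketched with the standard group-testing estimator; this is not necessarily what was proposed in the discussion on Gelman’s blog, and the function names and numbers below are hypothetical. Swabs from several randomly chosen people are combined and tested once, so 1,000 tests can cover 10,000 people. The sketch assumes a perfectly accurate test (real assays can lose sensitivity as pools grow) and independent infections.

```python
import random


def pooled_prevalence_estimate(positive_pools, total_pools, pool_size):
    """Estimate individual prevalence p from pooled results.

    A pool is negative only if all `pool_size` members are uninfected,
    which has probability (1 - p) ** pool_size; inverting the observed
    share of negative pools gives the maximum-likelihood estimate of p,
    assuming a perfect test and independent, randomly chosen members.
    """
    if positive_pools >= total_pools:
        raise ValueError("every pool tested positive; use smaller pools")
    share_negative = 1 - positive_pools / total_pools
    return 1 - share_negative ** (1 / pool_size)


def simulate_pooled_survey(prevalence, num_pools, pool_size, seed=0):
    """Count positive pools when each pool mixes `pool_size` random people
    (hypothetical simulation; the true prevalence is an input)."""
    rng = random.Random(seed)
    positive = 0
    for _ in range(num_pools):
        # The pool is positive if at least one member is infected.
        if any(rng.random() < prevalence for _ in range(pool_size)):
            positive += 1
    return positive


if __name__ == "__main__":
    # 1,000 pooled tests covering 10,000 people at an assumed 1% prevalence.
    k = simulate_pooled_survey(prevalence=0.01, num_pools=1_000, pool_size=10)
    print(k, pooled_prevalence_estimate(k, total_pools=1_000, pool_size=10))
```

With the assumed 1% prevalence, about 9.6% of pools should come back positive (1 − 0.99¹⁰), and the estimate should land near 1%. The trade-off is that larger pools save more tests but make it likelier that nearly every pool is positive, at which point the estimate breaks down.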

Blog at WordPress.com.