“On the importance of testing a random sample (for Covid)”, an article from Significance magazine


Nearly three months ago I tweeted: “Stat people: shouldn’t they be testing a largish random sample of people [w/o symptoms] to assess rates, alert those infected, rather than only high risk, symptomatic people, in the U.S.?” I was surprised that nearly all the stat and medical people I know expressed the view that it wouldn’t be feasible or even very informative. Really? Granted, testing was and is limited, but had it been made a priority, it could have been done. In the new issue of Significance (June 2020) that I just received, James J. Cochran writes “on the importance of testing a random sample” [1]. Here is an excerpt:

In the United States (as of 9 April 2020), President Donald Trump has said that testing for novel coronavirus infection will be limited to people who believe they may be infected. But if we only test people who believe they may be infected, we cannot understand how deep the virus has reached into the population. The only way this could work is if those who believe they may be infected are representative of the population with respect to novel coronavirus infection. Does anyone believe this is so? The common characteristic of those who believe they may be infected is that they all show some outward symptoms of infection by the virus. In other words, people who are being tested for the novel coronavirus are disproportionately showing severe symptoms. This would not be a problem if someone who is infected by the novel coronavirus immediately shows symptoms, but this is not the case. We have strong evidence that some people develop mild cases, show no symptoms, and carry the virus without knowing it because they are asymptomatic. Thus, efforts to understand the virus’s penetration into the population must include observation of the asymptomatic.

Indeed, a recent assessment (in the Annals of Internal Medicine) is that at least 40% of people with Covid-19 are (and remain) asymptomatic. (An overview is in Time.) Oddly, while remaining asymptomatic, some still show damage to the lungs or other organs.

The estimate of the proportion of the population who are infected can be calculated as:
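
Read against the three groups named in the next paragraph, this estimate is presumably the share of a random sample found to be infected. As a sketch, with symbols that are my own labels rather than Cochran's notation:

```latex
\hat{p} = \frac{n_{IS} + n_{IA}}{n_{IS} + n_{IA} + n_{N}}
```

Here $n_{IS}$ is the number of people in the random sample who are infected and showing symptoms, $n_{IA}$ the number infected but asymptomatic, and $n_{N}$ the number not infected. Testing only the symptomatic gives no handle on $n_{IA}$ or $n_{N}$.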


So, we need data from a random sample of the entire population in order to gather data from infected people who are showing symptoms, infected people who are asymptomatic, and people who are not infected. All have some probability of being included in a true random sample of the population.
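
To make the arithmetic concrete, here is a minimal Python sketch of how the infected share might be estimated once a random sample has been tested. The function name `prevalence_estimate`, the Wilson score interval, and the counts are all my own assumptions for illustration; none of them come from Cochran's article, and the numbers are made up.

```python
import math


def prevalence_estimate(infected_symptomatic, infected_asymptomatic, not_infected, z=1.96):
    """Estimate the infected share of the population from a random sample.

    Returns (point estimate, lower bound, upper bound), using a Wilson
    score interval; z=1.96 corresponds to roughly 95% confidence.
    """
    counts = (infected_symptomatic, infected_asymptomatic, not_infected)
    if any(c < 0 for c in counts) or sum(counts) == 0:
        raise ValueError("counts must be non-negative and not all zero")

    n = sum(counts)
    p_hat = (infected_symptomatic + infected_asymptomatic) / n

    # Wilson score interval for a binomial proportion.
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return p_hat, max(0.0, centre - half_width), min(1.0, centre + half_width)


# Hypothetical random sample of 1,000 people (made-up counts, not real data).
p_hat, low, high = prevalence_estimate(12, 8, 980)
print(f"estimated share infected: {p_hat:.3f} (95% interval {low:.3f} to {high:.3f})")

# Share of the sampled infections that a symptoms-only testing policy would miss.
print(f"asymptomatic share of infections in the sample: {8 / (12 + 8):.0%}")
```

The sketch ignores test error (false positives and false negatives) and assumes everyone selected actually gets tested; a real survey would have to deal with both.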

As of 23 April, leaders in Germany and New York State (see bit.ly/2Kp2iXd and dailym.ai/3bxZ5Au) had moved to implement random testing to assess how widespread the virus is, but there has been resistance from leaders elsewhere. This could be due to ignorance, disregard, or lack of appreciation of statistical principles – a consequence of the lack of statistical literacy that pervades the general population. (If the general population insisted on the use of random sampling to assess how widespread the virus is, leaders would not likely resist.) Or it could reflect concern over the limited availability of tests and a desire to devote all of these limited tests to those who show symptoms of novel coronavirus infection.

Unfortunately, this might be inadvertently helping the novel coronavirus spread. If a society does not understand the extent of infection in the general population or the virus’s infectivity, how can it prepare and optimally devote its resources to slow the spread of the virus? How does it decide what preventive measures are appropriate or necessary? How does it minimise the likelihood that the virus spreads to the point that the capacity of the hospital system is overwhelmed? Most crucially, how does it know if it is making progress or if conditions are deteriorating?

Without the evidence that a random sample of the general population would provide, we are operating in the dark. While we operate in the dark, preventable deaths will accumulate, and we will continue to take measures that are not only ineffective, but also unnecessarily costly.

Most of the world still lacks the ability to test a large number of people, and this understandably makes even those leaders who appreciate sampling hesitant to test a random sample of the general population. But the bottom line is, we need more coronavirus tests than we think we need.

We should add to this the need for antibody tests on a random sample. Perhaps we’ll have some better numbers now that states are opening up and having to test employees.
[1] The journal comes out every other month; this is the first with a large section devoted to coronavirus. 