Diagnostic value of chest computed tomography imaging for COVID-19 based on reverse transcription-polymerase chain reaction: a meta-analysis

Background The computed tomography (CT) diagnostic value of COVID-19 is controversial. We summarized the value of chest CT in the diagnosis of COVID-19 through a meta-analysis based on the reference standard. Methods All Chinese and English studies related to the diagnostic value of CT for COVID-19 across multiple publication platforms, was searched for and collected. Studies quality evaluation and plotting the risk of bias were estimated. A heterogeneity test and meta-analysis, including plotting sensitivity (Sen), specificity (Spe) forest plots, pooled positive likelihood ratio (+LR), negative likelihood ratio (-LR), dignostic odds ratio (DOR) values and 95% confidence interval (CI), were estimated. If there was a threshold effect, summary receiver operating characteristic curves (SROC) was further plotted. Pooled area under the receiver operating characteristic curve (AUROC) and 95% CI were also calculated. Results Twenty diagnostic studies that represented a total of 9004 patients were included from 20 pieces of literatures after assessing all the aggregated studies. The reason for heterogeneity was caused by the threshold effect, so the AUROC = 0.91 (95% CI: 0.89–0.94) for chest CT of COVID-19. Pooled sensitivity, specificity, +LR, -LR from 20 studies were 0.91 (95% CI: 0.88–0.94), 0.71 (95% CI: 0.59–0.80), 3.1(95% CI: 2.2–4.4), 0.12 (95% CI: 0.09–0.17), separately. The I2 was 85.6% (P = 0.001) by Q-test. Conclusions The results of this study showed that CT diagnosis of COVID-19 was close to the reference standard. The diagnostic value of chest CT may be further enhanced if there is a unified COVID-19 diagnostic standard. However, please pay attention to rational use of CT. Graphic Abstract


Background
Coronavirus disease 2019 (COVID- 19) is an acute infectious disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and is primarily characterized by pulmonary inflammatory lesions. It can also cause damage to the digestive system [1][2][3], nervous system [4,5], urinary system [6] and other systemic organs, with corresponding clinical symptoms. COVID-19 was identified as the health emergency of national concern by World Health Organization on 30 January, 2020 since its initial detection in December 2019 [7]. COVID-19 has highly transmissible [8]. Current clinical studies show that there are no specific therapeutic drugs for treatment [9].
The reference standard for diagnosing COVID-19 is reverse transcription-polymerase chain reaction (RT-PCR) [10]; however, it suffers from numerous flaws, such as poor sensitivity [11], producing false negatives, etc., all of which have also been noted in the studies [12,13]. A study indicated that RT-PCR was not a perfect reference standard [14]. What's more, a succession of countries [15] had reported the discovery of COVID-19 variant strains, and the proportion is increasing. The most feared of the COVID-19 variants could accelerate the spread of the disease. Researchers at Vita's laboratory in Helsinki, Finland considered that RT-PCR may not be able to detect the mutated virus (http:// www. zhihu. com/ quest ion/ 44521 7928). Feng et al. pointed out that the abundance of genomic data made it possible to reassess the applicability of RT-PCR to ensure that it was applicable to mutant strain [16].
Computed tomography (CT) played an important role in the global fight against COVID-19 [11]. Some studies have shown that CT can be used as a primary auxiliary screening and diagnostic tool in pandemic areas [17][18][19]. The COVID-19 pandemic has put enormous pressure on global health resources, which creates an urgent need for researchers to find a reasonable diagnosis. In this metaanalysis, we aim to summarize the diagnostic value and existing problems of CT.

Search strategy
We

Selection criteria
Inclusion criterias were as follow-up: (1) retrospective or prospective diagnostic test study; (2)  Literatures were excluded if they were (1) duplicates, case reports, reviews, systematic reviews, meta-analyses, conference abstracts, letters, non-original studies, and unrelated studies; (2) failure to extract the 2 × 2 table data; (3) the full text was unavailable; (4) not confirmed by the reference standard.

Data extraction and quality assessment
Two authors independently screened the studies for quality assessment and data extraction according to the inclusion and exclusion criteria, and disagreements were resolved by reaching a consensus or consulting a third senior specialist. The extracted data included: (1) First author, study site, study time, publication year, age, gender, sample size, study type (prospective, retrospective), and case selection; (2) Diagnostic reference standard, i.e., RT-PCR; (3) Outcome indicators included the sensitivity (Sen), specificity (Spe), accuracy (Ac), negative predictive value (NPV), positive predictive value (PPV), TP, FP, FN, and TN of diagnosis by chest CT, which were used to build the 2 × 2 table.
A quality evaluation was done in the diagnostic test of Review Manager 5.3 (the Cochrane Collaboration, London, UK), and for the requirements of Quality Assessment of Diagnostic Studies-2 (QUADAS-2), the evaluation rules applicable to this study were developed by referring to relevant studies [20]. On this basis, if the opinions of the two authors were still not consistent, the evaluation rules would be renegotiated to reach a consensus and further improve and supplement the preestablished evaluation rules.

Statistical analysis
Meta-disc1.4.0.0 (Clinical Biostatistics Unit, Madrid, Spain) and Stata14.0 (Computer Resource Center, Texas, US) software were used for statistical analysis. The Q test was utilized to analyze the heterogeneity between the results of each study. If P < 0.01, the difference was considered statistically significant, and the heterogeneity was quantitatively judged based on I 2 value. If I 2 > 50%, heterogeneity was considered significant, and the random effects model was selected. Otherwise, the fixed effects model was selected; Spearman's rank correlation coefficient and summary receiver operating characteristic curve (SROC) between the logarithm of sensitivity and the logarithm of (1 − specificity) were used to determine whether there was a threshold effect. Forest plots of Sen and Spe were made to calculate the pooled positive likelihood ratio (+LR), negative likelihood ratio (-LR), dignostic odds ratio (DOR) values and 95% confidence interval (CI); if there was a threshold effect, SROC was further plotted to calculate the pooled area under the receiver operating characteristic curve (AUROC). Publication bias was assessed with Deeks' plots.

Literature search
The flow chart for the screening of included literatures is shown in Fig. 1. According to the search strategy, a total of 6537 pieces of literatures were initially screened; 3330 duplicates were excluded by the management application; 214 pieces of literatures of meta-analysis, systematic review and review were excluded; 134 pieces of literatures of case report were excluded; 1 piece of literature of animal experiment was excluded. In addition, 2791 pieces of literatures with irrelevant content were excluded after the titles and abstracts were reviewed; 42 pieces of literatures from which the 2 × 2 table could not be obtained were excluded after a review of the full text; 3 pieces of literatures were error in original data; 2 pieces of literatures without full text. Finally, 20 pieces of literatures were included, which were from China, Italy, France, United Kingdom, the Netherlands, Brazil and India.

Quality evaluation
The basic characteristics of the included 20 literatures are shown in Table 1. These literatures were produced by emergency departments or fever clinics in different regions.
The results of QUADAS-2 are shown in Fig. 2. In 9/20 pieces of literatures, the flow and timing were evaluated as high risk because these studies did not include all the cases, which might cause bias. In 8/20 pieces of literatures, the risk of bias with regards to the reference standard was unclear because these studies did not specifically describe whether repeat testing was performed for the first negative RT-PCR test. In 5/20 pieces of literatures, the risk of the index test was unclear because the literatures did not specifically show whether the film was read through the blind method. In 2/20 pieces of literatures, the high risk of patient selection was mainly because all the patients in the studies could be definitively diagnosed by the index test and the lack of patients that were difficult to diagnose. Furthermore, in 2/20 pieces of literatures, the risk of patient selection was unclear because of the absence of explanatory notes confirming the presence of difficult to diagnose patients.  Fig. 3. The positive likelihood ratio (+LR) and negative likelihood ratio (-LR), the likelihood ratio scatter plot, the pre-test probability/ post-test probability is shown in Fig. 4. It can be seen that the misdiagnosis rate and missed diagnosis rate of the index test for the included literatures used for this study are small. In this study, the pre-test probability by the physician was 20%, the post-test positive predictive value was 44%, and the post-test negative predictive value was 3%, indicating that the physician's judgment of the patient is important. This also indicates that if the pre-test probability is increased, then the index test can proficiently predict the disease.

Threshold effect analysis
The Spearman's correlation coefficient between the logarithm of sensitivity and the logarithm of (1 − specificity) was determined to be r = 0.681 (P = 0.001), indicating significant correlation and threshold effect heterogeneity. SROC was plotted, showing a "shoulder-arm"-like change. Therefore, Sen and Spe could not be combined alone in this study. In order to further to reflect the real diagnostic accuracy of CT, AUROC was calculated to be 0.91 (95% CI: 0.89-0.94). The results (AUROC > 90%, and P < 0.05) indicate that CT has a diagnostic accuracy close to the reference standard, and it is basically capable of making a correct diagnosis. SROC is shown in Fig. 5.

Sensitivity analysis and Deeks' publication bias test
It can be clearly seen from Fig. 6a to d that there is strong sensitivity in one original study (No. 14) [34].
It can be clearly seen from the results in Fig. 6e that P = 0.57, which indicates that the funnel plot is symmetrical. Therefore, it can be determined that there is no publication bias in this study, and the conclusions of this study are accurate and reliable.

Discussion
To investigate the value of CT imaging for the diagnosis of COVID-19, 20 diagnostic studies were included in this study. The results prove that the diagnostic value of CT is close to the reference standard. At present, there is disagreement about the role of CT in the diagnosis of COVID-19. Some scholars believed that [18] the chest CT, which served as the primary screening or diagnostic method, had a high false positive rate in areas with low incidences of COVID-19. Others have suggested that [17] chest CT scans could be used as a primary screening and diagnostic tool for COVID-19. COVID-19 Diagnosis and Treatment Protocol (Trial 5th Revised Version) [41] indicated that suspected cases possessing imaging features of pneumonia in Hubei Province as being clinically diagnosed COVID-19. In addition, the progress of COVID-19 is still unclear, and the continuous emergence of mutant strains has puzzled researchers. CT provides macroscopic imaging changes that are not affected by viral mutations, advances in CT thin-slice and reconstruction techniques have increased the detection rate of the disease. At present, asymptomatic infection is increasing and the harm is huge. Chen et al. [42] showed that chest CT was helpful for early detection of asymptomatic infected persons. At the same time, CT provides a visual image representation for the follow-up comparison of diseases. It is still necessary to pay attention to the amount of radiation from CT. Fortunately, it has been reported on the study that the use of isocenter can ensure the same image quality and reduce the radiation of CT [43]. In addition, infection prevention and control management of CT examination rooms is critical.
The results of this study indicate that the heterogeneity is caused by the threshold effect. Combined with the quality evaluation results of QUADAS-2 in this study, there are high risk of bias and applicability concerns in the flow and timing, patient selection. Analyzing the causes of heterogeneity, the authors believe that it depends on the disease prevalence, the characteristics of the study population, diagnostic criteria, etc. In this study, we conducted a CT screening examination of COVID-19 based on the patient population in the emergency department and fever clinic of various hospitals during the epidemic period, which also included some hospitals that isolated and treated suspected patients, so there was a certain patient selection bias. In the included studies, the guidelines or consensus for CT diagnosis of COVID-19 were inconsistent, as were the qualifications and specialty of the diagnosing physician. All these may be the reasons for the threshold effect (heterogeneity) in this study.
The COVID-19 pathogen is affected by various factors and has a low positive detection rate. This indicated that nucleic acid test may produce false negative results [44,45]. Among the studies, RT-PCR tests have been repeated on patients with negative results, which can reduce the probability of a false negative to a certain extent. We defined positive results and all negative results after the first or multiple RT-PCR tests as COVID-19 positive and COVID-19 negative, respectively. In particularly, multiple RT-PCR tests are necessary for highly suspected patients with negative RT-PCR. The bias of the reference  standard was minimized. The main problem at hand for countries with increasing case numbers is the lack of appropriate screening and diagnostic facilities [46]. A recent summary of clinical observations on COVID-19 in China revealed that some patients had positive RT-PCR later than the onset of the clinical symptoms themselves. However, they had CT imaging changes or related clinical symptoms when RT-PCR tests were negative, and due to the failure to produce a timely diagnosis, some patients progressed rapidly, which affected the prognosis. It has been more than a year since COVID-19 was first confirmed. Further standardization and unification of CT implementation by experts in related fields to improve the pre-test probability of COVID-19 by front-line physicians will benefit doctors and patients worldwide. At the same time, experts on countries need to cooperate in multiple ways. First, a unified diagnostic standard for CT is significant. Up to now, scholars are still exploring and studying. A meta-analysis on evaluating the diagnostic value of CT using methods such as the COVID-19 Reporting and Data System (CO-RADS) and the Radiological Society of North America (RSNA) Classification System [14]. Smet et al. studied the COVID-19 Reporting and Data System (CO-RADS) diagnostic power of CT [47]. China has also published expert consensus on COVID-19 imaging diagnosis [48]. Then, radiologists need to be professionally trained in these guidelines or consensus. Finally, we should pay attention to standardize the visiting procedures for clinically suspected patients and close contacts by screening the epidemiological history and collecting the history of relevant clinical symptoms, such as fever, cough, myalgia or fatigue, dyspnea, expectoration and chest tightness, diarrhea, nausea or vomiting, abdominal discomfort or pain as well as loss of appetite and olfactory dysgeusia [49]. In conclusion, we should conduct nasopharyngeal/oropharyngeal swab tests for suspected patients and CT examinations for RT-PCR-negative patients. Patients suspected of COVID-19 via the initial results of a CT diagnosis should be isolated and the diagnosis should be further confirmed by multiple RT-PCR tests or specifically RT-PCR tests for lower  respiratory tract specimens. Antibody results could also be used to provide further confirmation [45].
There are some limitations of this study. Firstly, there are differences in the regional and medical policies of COVID-19, and the studies we included may not represent the characteristics of the general population for the area in which the study was based. Secondly, most of the studies we included are retrospective studies, and prospective study designs have relatively small sample sizes. Thirdly, the initial inexperience of front-line physicians may have affected the diagnostic performance of CT. Fourthly, given that most of the studies are retrospective, we are concerned about the thoroughness of the blind diagnostic results of CT and RT-PCR. Finally, there may be bias in RT-PCR due to the different reference standards, kits used, sampling methods and target genes detected in each country.

Conclusions
In this study, we mainly summarized the value of CT in diagnosing COVID-19, but it also had some limitations. As a rapid and intuitive means of diagnosing chest disease, CT is recommended for rational use for PCR-negative patients with high suspicion of COVID-19. In the event of a global pandemic, the combination of CT and RT-PCR can quickly and accurately confirm a diagnosis and interrupt the transmission chain.