National report — Misuse of statistics is common in the dermatology literature, according to the findings of a study conducted by researchers from Wake Forest University School of Medicine.
To determine the prevalence of statistical errors in published papers reporting on dermatology research, the investigators undertook a review of original studies published in the Journal of the American Academy of Dermatology and the Archives of Dermatology between January and December 2003. Among 207 studies published, only 155 (75 percent) included statistical analysis. Within that latter group, 59 (38 percent) contained errors in statistical methods or presentation.
Most of the errors were considered to be minor. However, in more than one-third of the papers with an error (39 percent), the statistical test used was improperly chosen considering the dataset size or type, reports Julie A. Neville, M.D., resident, department of dermatology.
Dr. Neville's collaborators in this review were Alan B. Fleischer Jr., M.D., professor and chairman, department of dermatology, and Wei Lang, Ph.D., a statistician in the department of public health sciences. The group conducted the review because previous studies focusing on literature pertaining to other medical specialties had identified statistical errors in 45 percent to 95 percent of published papers.
Their review of the dermatology papers determined that the most commonly used statistical tests in the 155 articles were the chi-squared test (30 percent), unpaired t-test (19 percent), analysis of variance (ANOVA) (17 percent) and Fisher's exact test (15 percent). The tests most often applied incorrectly were the unpaired t-test, which was improperly chosen in 11 (38 percent) of the papers in which it was used, and the paired t-test, which was used incorrectly in five (31 percent) of 15 papers.
Use of a parametric test on potentially non-normal data, which occurred in 16 (10 percent) of the 155 papers, was by far the most common type of error. Use of the chi-squared test rather than Fisher's exact test when analyzing small sample sizes was identified as an error in three (2 percent) papers. Notably, both of those types of misuse have the potential to impact the study results, Dr. Neville says.
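To illustrate why the choice between the chi-squared test and Fisher's exact test can matter with small samples, the following is a minimal Python sketch. The 2x2 table is hypothetical and not taken from the study, the function names are illustrative, and the chi-squared calculation assumes no continuity correction. It uses only the standard library and assumes all row and column totals are nonzero.

```python
from math import comb, erfc, sqrt


def chi2_p_2x2(a, b, c, d):
    """Pearson chi-squared p-value (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, the chi-squared upper-tail probability is erfc(sqrt(x / 2)).
    return erfc(sqrt(chi2 / 2))


def fisher_p_2x2(a, b, c, d):
    """Two-sided Fisher's exact p-value: total probability of all tables with the
    same margins that are no more likely than the observed one."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # The small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


def min_expected_count(a, b, c, d):
    """Smallest expected cell count under independence."""
    n = a + b + c + d
    return min(r * k / n for r in (a + b, c + d) for k in (a + c, b + d))


# Hypothetical data: 6 of 8 patients improve on treatment, 2 of 8 on control.
table = (6, 2, 2, 6)
print("smallest expected count:", min_expected_count(*table))
print("chi-squared p:", round(chi2_p_2x2(*table), 4))
print("Fisher exact p:", round(fisher_p_2x2(*table), 4))
```

For this hypothetical table every expected count is 4, the chi-squared p-value is roughly 0.046, and the Fisher's exact p-value is roughly 0.13, so the two tests fall on opposite sides of the conventional 0.05 threshold.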
Missing information
A review of the statistical methods sections of the published papers found that the majority (59 percent) did not mention the statistical software package used for the analysis. Only a small minority of the studies (10 percent) included information on statistical power.
"We did not include those findings as errors in our study. However, in certain cases, the data analyses can be affected by the statistical package chosen. In addition, it is important for readers to know if the study was adequately powered to detect statistical significance, although in reality, most studies with negative findings are not published," Dr. Neville says.
To address the flaws and deficiencies they identified, Dr. Neville and her collaborators believe that including a knowledgeable statistical reviewer in the peer-review process might help. In addition, they support expanding the statistical guidelines in the author checklists that dermatology journals use for manuscript submission.
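As a rough illustration of what reporting statistical power involves, the sketch below estimates the power of a two-sided comparison of two group means. It is not the method used by the study authors. It assumes equal group sizes, a standardized effect size (difference in means divided by the common standard deviation), and a normal approximation that ignores the negligible opposite tail. The function name and the example numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample comparison of means.

    Normal approximation: power ~ Phi(d * sqrt(n / 2) - z_crit), where d is the
    standardized effect size and n is the number of subjects in each group.
    """
    z = NormalDist()  # standard normal distribution
    z_crit = z.inv_cdf(1 - alpha / 2)
    return z.cdf(effect_size * sqrt(n_per_group / 2) - z_crit)


# Hypothetical scenario: a medium effect (d = 0.5) at two sample sizes.
for n in (20, 64):
    print(n, "per group -> power", round(approx_power(0.5, n), 2))
```

Under these assumptions, 20 subjects per group gives power of roughly 0.35, while 64 per group gives roughly 0.81, which shows how a small study can easily miss a real effect.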