Pediatric Dermatologists Outperform Artificial Intelligence; ChatGPT Demonstrates Comparability in Some Aspects

In a comparison of pediatric dermatologists and AI tools, the dermatologists generally performed better.

Artificial intelligence-based tools (AITs) such as OpenAI's Chat Generative Pre-trained Transformer (ChatGPT) have taken on growing importance in medical applications. These tools have demonstrated the ability to predict patient outcomes and treatment-related adverse events, as well as to interpret imaging and laboratory results, among other capabilities.1

Aware of these capabilities and the ever-expanding role of AITs in medicine, Huang et al sought to assess the medical knowledge and clinical diagnostic capabilities of ChatGPT versions 3.5 and 4.0 by comparing them with pediatric dermatologists.

In the study, published in Pediatric Dermatology, researchers found that, on average, pediatric dermatologists outperformed AITs on multiple-choice, multiple-answer, and case-based questions.2 However, the results also demonstrated that ChatGPT, specifically version 4.0, performed comparably in some areas, including multiple-choice and multiple-answer questions.

[Image: Dermatologist examining a wart on the hand of a pediatric patient. Credit: © GRON777 - stock.adobe.com]

Background and Methods

Researchers developed a test of 24 text-based questions: 16 multiple-choice questions, 2 multiple-answer questions, and 6 free-response case-based questions.

Questions were drawn from the American Board of Dermatology's 2021 Certification Sample Test and the "Photoquiz" section of the journal Pediatric Dermatology, and all questions were submitted through ChatGPT's web interface as of October 2023.
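
The following is a minimal, hypothetical sketch of how such a question set might be posed programmatically using OpenAI's Python client; the study itself used the web interface, and the model name, prompt text, and variable names below are illustrative assumptions rather than the authors' protocol.

    # Illustrative only: the study used ChatGPT's web interface, not the API.
    from openai import OpenAI

    client = OpenAI()  # reads the OPENAI_API_KEY environment variable

    questions = [
        "A 6-year-old presents with an annular, scaly plaque on the scalp. "
        "Which diagnosis is most likely? A) ... B) ... C) ... D) ...",
        # ...remaining text-based questions
    ]

    for question in questions:
        response = client.chat.completions.create(
            model="gpt-4",  # the study evaluated versions 3.5 and 4.0
            messages=[{"role": "user", "content": question}],
        )
        print(response.choices[0].message.content)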

Researchers graded case-based responses on a 0-to-5 scale commonly used in the evaluation of AITs. Reviewers were blinded to respondents' identities.
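
To illustrate what blinded grading of free-response answers can look like in practice, here is a small hypothetical sketch; the data structures and scores are invented for demonstration and do not reflect the study's actual records.

    # Hypothetical illustration of blinded 0-to-5 grading; all values invented.
    import random
    from statistics import mean

    # Each case-based response is tagged with an anonymous ID before review,
    # so graders cannot tell whether a clinician or a chatbot wrote it.
    responses = [
        {"anon_id": "R1", "respondent": "dermatologist_1", "text": "..."},
        {"anon_id": "R2", "respondent": "chatgpt_3.5", "text": "..."},
    ]
    random.shuffle(responses)  # present answers to graders in random order

    # Each reviewer assigns a 0-5 score to each anonymous response.
    scores = {"R1": [4, 5, 4], "R2": [3, 4, 3]}  # anon_id -> reviewer scores

    # Scores are mapped back to respondents and averaged only after grading.
    for r in responses:
        print(r["respondent"], round(mean(scores[r["anon_id"]]), 2))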

Findings

A total of 5 pediatric dermatologists, with an average of 5.6 years of clinical experience, completed the questions posed by researchers.

On average, pediatric dermatologists scored 91.4% on multiple-choice and multiple-answer questions, significantly outperforming ChatGPT version 3.5, which averaged 76.2%. ChatGPT version 4.0, however, performed comparably, averaging 90.5%, just 0.9 percentage points below the clinicians' average.

On case-based questions, clinicians averaged a score of 3.81, while ChatGPT version 3.5 averaged 3.53. Pediatric dermatologists' case-based scores were not significantly greater than those of ChatGPT version 4.0.

Using these findings as a basis, Huang et al developed a best-practices list of "dos and don'ts" for clinicians.

They recommend that clinicians DO:

  • Use ChatGPT to brainstorm a differential diagnosis
  • Provide detailed and relevant information while maintaining patient privacy
  • Fact check ChatGPT's responses using reputable sources for medical information
  • Stay updated on legal and institutional policies surrounding the use of AI tools in health care

They recommend that clinicians DO NOT:

  • Rely on ChatGPT to provide the single, best diagnosis
  • Succumb to anchoring bias as a result of ChatGPT's responses
  • Immediately accept ChatGPT's responses as medical facts
  • Enter HIPAA-protected information into AI tools like ChatGPT that are not HIPAA-compliant (see the redaction sketch below)
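
As a concrete illustration of that last point, the sketch below shows one simple, hypothetical way to scrub obvious identifiers from a prompt before it reaches a non-HIPAA-compliant tool; the regex patterns are illustrative assumptions and are not a substitute for a proper de-identification process.

    # Hypothetical redaction sketch; these patterns catch only obvious
    # identifiers and do not constitute real HIPAA de-identification.
    import re

    PATTERNS = {
        r"\b\d{3}-\d{2}-\d{4}\b": "[SSN]",           # Social Security numbers
        r"\b\d{2}/\d{2}/\d{4}\b": "[DATE]",          # dates such as birth dates
        r"\b[\w.+-]+@[\w-]+\.[\w.]+\b": "[EMAIL]",   # email addresses
        r"\(?\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}\b": "[PHONE]",  # phone numbers
    }

    def redact(prompt: str) -> str:
        """Replace obvious identifiers with placeholder tags."""
        for pattern, tag in PATTERNS.items():
            prompt = re.sub(pattern, tag, prompt)
        return prompt

    print(redact("Patient DOB 04/12/2017, mother reachable at (555) 123-4567, "
                 "presents with an annular scaly plaque on the scalp."))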

Conclusions

Researchers recommended that dermatology clinicians become more familiar with AITs as their accuracy continues to improve, noting that they may prove useful for fact-based questions and case-based materials.

Though these results are promising, the researchers noted that further research is necessary to better understand the role of ChatGPT in clinical knowledge and reasoning.

Limitations of the study, as noted by the researchers, include the potential for results to change upon reproduction as the models evolve, and the possibility that the pediatric dermatologists had prior exposure to the questions and cases used in the study.

"While clinicians currently continue to outperform AITs, incremental advancements in the complexity of these AI algorithms for text and image interpretation offer pediatric dermatology clinicians a valuable addition to their toolbox," according to Huang et al. "In the present circumstance, generative AI is a useful tool but should not be relied upon to draw any final conclusions about diagnosis or therapy without appropriate supervision."

References

  1. Haug CJ, Drazen JM. Artificial intelligence and machine learning in clinical medicine, 2023. N Engl J Med. 2023;388(13):1201-1208. doi:10.1056/NEJMra2302038
  2. Huang CY, Zhang E, Caussade MC, Brown T, Stockton Hogrogian G, Yan AC. Pediatric dermatologists versus AI bots: evaluating the medical knowledge and diagnostic capabilities of ChatGPT. Pediatr Dermatol. Published online May 9, 2024. Accessed May 13, 2024. doi:10.1111/pde.15649
