Compared with the previous EczemaNet1, EczemaNet2 was more accurate in its automated detection of atopic dermatitis lesions.
The quality and robustness of eczema lesion detection increased by approximately 25% and 40%, respectively, when using EczemaNet2 compared with the previous model, EczemaNet1, according to a study published in JID Innovations.1
Objective atopic dermatitis (AD) severity assessments are being increasingly used to standardize comparisons, reduce detection biases, and determine eligibility for systemic medicines. Although a complete face-to-face skin examination is often desired for an objective AD severity assessment, the researchers noted that it is resource intensive for both the patient and trained assessor.
Because of this, they explained that an automated remote assessment of AD severity with digital images would “enable and standardize the remote assessment of AD severity and reduce the inter- and intra-observer variability.” On the other hand, researchers explained that past models are not fully automated, like the EczemaNet1 model.2
EczemaNet1’s 2 main components were the region of interest (RoI) detection model and the severity prediction model.1 They noted that EczemaNet1’s assessment performance was limited for 2 reasons: its RoI detection model was trained on crops obtained from nonexperts, and its outputs were rectangular bounding boxes of the detected AD regions, which did not provide information on their shapes.
Consequently, their study aimed “to develop more accurate and robust computer vision algorithms for AD region detection” to establish a reliable AD severity assessment. To do so, the researchers created EczemaNet2, an improved version of EczemaNet1, that used a modified RoI detection model with a standard pixel-level segmentation U-Net to produce skin and AD segmentation masks. They added 3 postprocessing steps to extract crops for the severity prediction model, which included “a border-following algorithm to generate rectangular crops, square cropping to extract square crops without distortion, and adding surrounding non-AD skin pixels to the AD skin pixels in the square crops.”
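The square-cropping step can be illustrated with a short Python sketch. It is a simplified stand-in rather than the authors' code: it derives a rectangle from the extent of a binary mask instead of using a border-following algorithm, then grows the shorter side so the crop is square and needs no stretching, which also pulls in surrounding non-AD pixels. The function names `bounding_box` and `square_box` and the toy mask are hypothetical.

```python
def bounding_box(mask):
    """Return (top, left, bottom, right) of the nonzero pixels; bottom/right are exclusive."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    if not rows:
        return None  # empty mask: nothing was detected
    return rows[0], cols[0], rows[-1] + 1, cols[-1] + 1


def square_box(box, height, width):
    """Grow the shorter side of a box so it becomes a square that stays inside the image."""
    top, left, bottom, right = box
    # Side length of the square: the longer box side, capped by the image size.
    side = min(max(bottom - top, right - left), height, width)
    center_r = (top + bottom) // 2
    center_c = (left + right) // 2
    # Center the square on the box, then clamp it to the image borders.
    new_top = min(max(center_r - side // 2, 0), height - side)
    new_left = min(max(center_c - side // 2, 0), width - side)
    return new_top, new_left, new_top + side, new_left + side


# Toy 6 x 8 mask with a wide, flat "lesion" (assumed data for illustration).
mask = [[0] * 8 for _ in range(6)]
for r in (2, 3):
    for c in range(1, 6):
        mask[r][c] = 1

rect = bounding_box(mask)          # tight rectangle around the lesion
square = square_box(rect, 6, 8)    # square crop that also includes nearby pixels
```

Because the crop is already square, it can be resized for a downstream model without changing the aspect ratio of the lesion.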
The researchers compared the new and old models using 1345 AD region photographs, previously used in the Softened Water Eczema Trial, from 287 children aged 6 months to 16 years.3 Four dermatologists segmented the AD regions in each photo at the pixel level, in finer resolution than those used in EczemaNet1.1 Based on the Fitzpatrick skin phototype, 226 patients (1031 photographs) were types 1 and 2, 41 (182 photographs) were types 3 and 4, and 20 (132 photographs) were types 5 and 6.
They evaluated the quality and robustness of each model’s RoI detection, as well as its downstream severity prediction performance. First, the researchers evaluated the quality of RoI detection for AD vs non-AD regions within each image. Detection quality was evaluated using precision and the F1 score, which is “the harmonic mean of precision and recall, whereas precision is the fraction of correctly classified pixels in the predicted mask, and recall is the fraction of correctly classified pixels in the ground truth mask.” The scores ranged from 0 to 1, with 1 indicating perfect segmentation accuracy.
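As a rough illustration of these pixel-level metrics, the following Python sketch computes precision, recall, and the F1 score for a pair of binary masks. It is a minimal example under stated assumptions, not the study's evaluation code: the masks are tiny nested lists of 0s and 1s, and the function name `mask_scores` is hypothetical.

```python
def mask_scores(pred, truth):
    """Return (precision, recall, f1) for two same-sized binary masks."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1      # pixel predicted as AD and labeled as AD
            elif p and not t:
                fp += 1      # predicted as AD, labeled as non-AD
            elif t and not p:
                fn += 1      # labeled as AD, but missed by the prediction
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy masks (assumed data): 1 marks an AD pixel, 0 marks a non-AD pixel.
pred = [[1, 1, 0],
        [0, 1, 0]]
truth = [[1, 0, 0],
         [0, 1, 1]]

precision, recall, f1 = mask_scores(pred, truth)
```

In this toy case, 2 of the 3 predicted AD pixels are correct and 2 of the 3 labeled AD pixels are found, so all 3 scores equal 2/3.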
The researchers found that both the EczemaNet2 RoI detection model trained on skin segmentation masks and the model trained on pixel-level AD segmentation masks achieved a better detection quality than that of EczemaNet1. They noted that the improvement in AD detection was statistically significant (with P = .021 in paired t test), but the detection performance remained moderate, with an average precision and F1 score of approximately 60% across images.
Next, the researchers evaluated the robustness of RoI detection using the intersection over union (IoU) metric, which reflects how robust the model prediction is under differing imaging conditions; it ranges between 0 and 1, with a score of 1 indicating a perfect overlap. The perturbations applied included blurring, noise, and brightness changes. Considering all perturbations, they found that RoI detection in EczemaNet2 was significantly more robust than that of EczemaNet1 (P = .014 in paired t test).
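One way to picture this robustness check is to compare the mask predicted for an original image with the mask predicted for a perturbed copy. The Python sketch below does so under loud assumptions: `toy_segment` is a hypothetical intensity threshold standing in for a real segmentation model, `brighten` is a simple brightness shift, and the image is a made-up grid of values between 0 and 1. It is not the study's protocol.

```python
def iou(mask_a, mask_b):
    """Intersection over union of two same-sized binary masks."""
    intersection = union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            if a and b:
                intersection += 1
            if a or b:
                union += 1
    # Two empty masks overlap perfectly by convention.
    return intersection / union if union else 1.0


def toy_segment(image, threshold=0.5):
    """Hypothetical stand-in for a segmentation model: threshold pixel intensity."""
    return [[1 if px > threshold else 0 for px in row] for row in image]


def brighten(image, delta):
    """Brightness perturbation: shift every pixel and clip to the range [0, 1]."""
    return [[min(1.0, max(0.0, px + delta)) for px in row] for row in image]


# Toy grayscale image (assumed data).
image = [[0.2, 0.6, 0.9],
         [0.45, 0.7, 0.1]]

baseline = toy_segment(image)
perturbed = toy_segment(brighten(image, 0.1))
robustness = iou(baseline, perturbed)
```

Here the brightness shift pushes 1 extra pixel over the threshold, so the 2 masks share 3 of 4 flagged pixels and the IoU is 0.75. A more robust model would keep this value closer to 1 across perturbations.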
Lastly, the researchers investigated the impact of the RoI model configurations on the downstream severity prediction task. To do so, they “conducted 10-fold cross-validation with a 90:10 train/test split, stratified on patients, and computed the root mean square error of the mean prediction across test images.” The researchers used this setup to evaluate each configuration’s performance in predicting the scores of the following AD severity assessments: Eczema Area and Severity Index, 3-Item Severity, and Six Area, Six Sign AD.
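The cross-validation setup can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: it reads "stratified on patients" as meaning that all images from one patient fall in the same fold, which is an assumption, and the helper names `patient_folds` and `rmse` and the toy patient IDs are hypothetical.

```python
import math
import random


def patient_folds(patient_ids, n_folds=10, seed=0):
    """Yield (train_idx, test_idx) pairs in which no patient appears on both sides."""
    patients = sorted(set(patient_ids))
    random.Random(seed).shuffle(patients)
    # Deal patients into folds round-robin, giving roughly a 90:10 split per fold.
    fold_of = {p: i % n_folds for i, p in enumerate(patients)}
    for k in range(n_folds):
        test = [i for i, p in enumerate(patient_ids) if fold_of[p] == k]
        train = [i for i, p in enumerate(patient_ids) if fold_of[p] != k]
        yield train, test


def rmse(predicted, observed):
    """Root mean square error between predicted and observed scores."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


# Toy data (assumed): 20 patients with 3 images each.
patient_ids = [f"patient_{n}" for n in range(20) for _ in range(3)]
folds = list(patient_folds(patient_ids))
```

Splitting by patient rather than by image keeps photographs of the same child out of both the training and test sets, so the test error is not flattered by near-duplicate images.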
The researchers found that the RoI detection models in EczemaNet2 trained with AD or skin segmentation masks performed better than EczemaNet1, as they resulted in a slightly lower root mean square error for AD severity assessment. They noted that merging AD segmentation masks with the neighboring skin pixels provided the downstream severity model with informative and discriminative pattern features between the lesion and healthy skin, leading to a marginal improvement in the accuracy of severity prediction (P = .091 in paired t test). However, data augmentation, either through traditional methods or Pix2Pix, did not impact the average predictive performance.
They acknowledged that their study had limitations, one being that AD segmentation masks may be unreliable because dermatologists showed poor agreement when detecting AD regions in digital images. The researchers noted that skin segmentation may be a reasonable alternative as “it may sufficiently restrict the inputs to the severity assessment model without excluding potentially informative regions in the images, assuming the images contain a priori representative sites of AD.”
Although EczemaNet2 detected eczema lesions more accurately and robustly than EczemaNet1, the researchers identified areas for further research.
“We believe that collecting more and better-quality data (images and labels) would surpass the gains in performance from using cleverer algorithms,” the authors wrote. “In particular, we emphasize the importance of ensuring that the training dataset covers images for various skin tones to limit skin color bias.”
Reference
[This article was originally published by our sister publication, AJMC.]