Artificial intelligence based automatic quantification of epicardial adipose tissue suitable for large scale population studies

Populations

SCAPIS is a general-population-based prospective study (www.scapis.org), to which 30,154 men and women aged 50–64 years were randomly recruited from the census register at six sites (Gothenburg, Linköping, Malmö/Lund, Stockholm, Umeå, and Uppsala) between 2013 and 2018. Participants gave written informed consent and were subjected to a comprehensive examination26. The study is approved (# 2010-228-31M) as a multi-center study by the ethical review board in Umeå, c/o Department of Medical Research, Umeå University, 901 87 Sweden. For the present work, only subjects enrolled at the Gothenburg site (n = 6256) were included. In total, 411 randomly selected image sets were used for training and testing the software, and a further 1400 randomly selected image sets were used to test the performance of the model in a larger population. To identify which factors are associated with EATV and EAT attenuation, we used the test population (n = 1400, see details below).

Study procedures and imaging

All procedures in this paper were carried out in accordance with relevant guidelines and regulations. The comprehensive study procedures in SCAPIS have been described in detail26. In the analyses we used data from non-contrast CT images, physical examinations and routine laboratory tests.

Briefly, all imaging in SCAPIS was performed using identical CT scanners and protocols (Siemens Somatom Definition Flash with a Stellar detector; Siemens Healthcare, Forchheim, Germany). Care Dose 4D was used for dose optimization. Image acquisition was ECG-gated, with a tube voltage of 120 kV and a reference mAs of 80. The images have a matrix of 512 × 512 voxels in the axial plane, with a square DFOV in the range of 170–200 mm. All images were reconstructed using the B35f HeartView medium CaScore kernel, generating a slice thickness of 3 mm with 50% overlap between slices.

Development of CNN models and datasets used

For the estimation of EATV we developed two CNN models that work in series. The first model, “EAT-Net”, outputs a segmentation of the EAT voxels inside the pericardium, enabling calculation of EATV and EAT attenuation. The second model, “Crop-Net”, estimates any missing EATV in cases where the heart is not fully represented in the image set. This problem is fairly common in the SCAPIS cohort, since the smallest possible scan volume was used in order to minimize radiation doses, increasing the risk of incomplete, cropped heart images due to patient- and radiographer-related issues. To develop EAT-Net, a total of 411 unique and randomly selected image sets were used (training: n = 308; validation: n = 78; testing: n = 25). In a further test of EAT-Net, another 1400 unique and randomly selected image sets were segmented by EAT-Net and visually evaluated to identify failed segmentations. To develop Crop-Net, a total of 866 image sets were selected from the dataset used for visual evaluation of EAT-Net. Crop-Net was then tested on a subset of the data used for developing EAT-Net (n = 55). The general design of the models is shown in Fig. 1.

Figure 1

Basic design of the two CNN models which work in series. EAT-Net outputs segmentations of the heart, from which epicardial adipose tissue (EAT) volume, and EAT attenuation values are calculated. Crop-Net outputs an estimation of the fraction of EAT volume that is missing in incomplete image sets.
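
The text does not state how the two model outputs are combined numerically. A natural reading, sketched below in Python, is that the EAT-Net volume is divided by the retained fraction; the function name and the exact formula are our assumptions, not details from the source:

    def corrected_eatv(measured_eatv_ml, missing_fraction):
        """Extrapolate total EAT volume (mL) when Crop-Net reports that a
        fraction of the EAT volume lies outside the scanned image stack.
        missing_fraction is the Crop-Net output in [0, 1); 0 means the
        heart is fully covered and no correction is applied."""
        if not 0.0 <= missing_fraction < 1.0:
            raise ValueError("missing_fraction must be in [0, 1)")
        return measured_eatv_ml / (1.0 - missing_fraction)

    # Example: 110 mL measured, Crop-Net estimates that 8% of EAT is missing
    print(round(corrected_eatv(110.0, 0.08), 1))  # 119.6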

EAT-Net

EAT-Net is a fully convolutional neural network trained on large patches of the image; it segments the full image volume in a sliding-window fashion. Training and inference were performed using the TensorFlow27 and Keras28 frameworks. For all training steps, an 80%/20% split between training and validation sets was used. All model development and hyperparameter tuning were done on the validation set, and the performance of the final model was evaluated on a separate test set.
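
As an illustration of patch-based, sliding-window inference, the sketch below tiles a volume with overlapping patches and averages the per-voxel class scores; the stride, the edge handling, and the score averaging are our assumptions rather than details given in the paper:

    import numpy as np

    def _starts(dim, patch, stride):
        """Patch origins along one axis; the last window is clamped so the
        volume edge is always covered."""
        starts = list(range(0, max(dim - patch, 0) + 1, stride))
        if starts[-1] != max(dim - patch, 0):
            starts.append(max(dim - patch, 0))
        return starts

    def sliding_window_segment(volume, predict_patch,
                               patch=(288, 288, 64), stride=(144, 144, 32),
                               n_classes=3):
        """Segment a full volume with a patch-based model. predict_patch maps
        a patch to softmax scores of shape patch + (n_classes,), e.g. a
        wrapped Keras model.predict. Assumes each volume dimension is at
        least the patch size."""
        scores = np.zeros(volume.shape + (n_classes,), dtype=np.float32)
        counts = np.zeros(volume.shape, dtype=np.float32)
        for z in _starts(volume.shape[0], patch[0], stride[0]):
            for y in _starts(volume.shape[1], patch[1], stride[1]):
                for x in _starts(volume.shape[2], patch[2], stride[2]):
                    sl = (slice(z, z + patch[0]), slice(y, y + patch[1]),
                          slice(x, x + patch[2]))
                    scores[sl] += predict_patch(volume[sl])
                    counts[sl] += 1.0
        # Average overlapping scores, then take the per-voxel argmax
        return np.argmax(scores / counts[..., None], axis=-1)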

All annotations were performed by the same expert thoracic radiologist (author DM), with more than five years of experience in thoracic radiology, whose reading has previously been benchmarked against another expert reader with excellent inter-reader reproducibility (Dice coefficient for EAT = 0.9)19. Annotations and visual evaluation were performed in the cloud-based platform RECOMIA29.

When performing the annotations, a continuous line representing the pericardium was drawn in each axial slice covering the heart. The reader was free to change the window settings and magnification. If the pericardium was not clearly visible in parts of a given slice, its most probable location was inferred from the neighboring slices. Training of pericardium segmentation was performed with two classes, “background” (all voxels outside the pericardium) and “heart” (all voxels inside the pericardium). EAT-Net was initially trained on 29 cases where the pericardium was annotated manually in all image slices (about 70 slices per heart) by a single expert reader (author DM). The first version of EAT-Net was used to segment novel cases, which were reviewed by an expert reader (DM), and image slices with significant segmentation errors were corrected by manual annotation. The manual annotations produced in this step were used as training data in the next training session, generating a new version of EAT-Net. This process of manual correction and retraining was iterated until the training set consisted of a total of 308 subject cases.

After finalizing training of the classes background and heart, we introduced a third class, “non-adipose tissue inside heart”, to train the model to recognize areas within the pericardium that certainly do not contain EAT. The following procedure was used: in 30 of the training image sets, a continuous line was drawn on all slices, enclosing parts of the heart muscle and most of the heart chambers, and EAT-Net was trained to segment this third class.

EAT-Net has an input size of 288 × 288 × 64 voxels and works with voxel dimensions of 0.33 × 0.33 × 1.5 mm. Data augmentation was used in all steps to generate additional training data for EAT-Net by artificially modifying the images in the following ways: (i) the HU values were shifted by between −100 and +100 (for the full patch), (ii) the patch was rotated by between −0.15 and +0.15 radians, and (iii) the patch was scaled by between −10 and +10% in size. During training of EAT-Net, input images were randomly cropped, from any direction and to any extent, until reaching the limits of the annotated pericardium, beyond which no further cropping was done. Categorical cross-entropy was used as the loss function, and optimization was performed using the Adam method with Nesterov momentum.
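
A minimal sketch of the three augmentations (NumPy/SciPy; linear interpolation, rotation in the axial plane, and padding with air at −1000 HU are our assumptions, as these details are not specified above):

    import numpy as np
    from scipy import ndimage

    rng = np.random.default_rng()

    def _fit_to_shape(arr, shape, fill=-1000.0):
        """Center-crop or center-pad arr to the target shape."""
        out = np.full(shape, fill, dtype=arr.dtype)
        src = tuple(slice(max((a - s) // 2, 0), max((a - s) // 2, 0) + min(a, s))
                    for a, s in zip(arr.shape, shape))
        dst = tuple(slice(max((s - a) // 2, 0), max((s - a) // 2, 0) + min(a, s))
                    for a, s in zip(arr.shape, shape))
        out[dst] = arr[src]
        return out

    def augment_patch(patch):
        """Randomly modify one training patch (3-D HU array, axes (z, y, x))."""
        out = patch.astype(np.float32)
        # (i) shift all HU values in the patch by a random offset in [-100, +100]
        out += rng.uniform(-100.0, 100.0)
        # (ii) rotate in the axial (y, x) plane by [-0.15, +0.15] radians
        angle_deg = np.degrees(rng.uniform(-0.15, 0.15))
        out = ndimage.rotate(out, angle_deg, axes=(1, 2), reshape=False,
                             order=1, mode="constant", cval=-1000.0)
        # (iii) scale by -10% to +10%, then crop/pad back to the input size
        out = ndimage.zoom(out, zoom=rng.uniform(0.9, 1.1), order=1)
        return _fit_to_shape(out, patch.shape)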

The last part of EAT-Net is a softmax activation, resulting in a score between 0 and 1 for each of the three classes. For each voxel, the class with the highest score was chosen.

A post-processing step was applied after segmentation by EAT-Net, in which the largest connected volume of the classes “heart” and “non-adipose tissue inside heart” was assumed to be the true heart. Voxels in direct contact with the index voxel, i.e. the 26 surrounding voxels in a 3 × 3 × 3 kernel, were defined as connected. Any smaller volumes of connected voxels were set to background to remove spurious voxels. The final prediction of EATV was made by selecting all voxels classified as “heart” within the Hounsfield range [−190, −30]; voxels classified as “non-adipose tissue inside heart” were excluded. For segmentation using EAT-Net, no cropping was done during prediction; the entire image was processed by the network in a sliding-window manner.
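
Combining the class assignment, the largest-connected-volume filter, and the Hounsfield threshold, the post-processing can be sketched as follows (NumPy/SciPy; the class indices, and the assumption that the masks are at EAT-Net's working resolution of 0.33 × 0.33 × 1.5 mm per voxel, are ours):

    import numpy as np
    from scipy import ndimage

    BACKGROUND, HEART, NON_ADIPOSE = 0, 1, 2   # class indices are assumed

    def segment_eat(class_scores, hu_volume):
        """Post-process EAT-Net softmax scores into an EAT mask and volume.
        class_scores: (Z, Y, X, 3) softmax scores; hu_volume: matching HU array."""
        labels = np.argmax(class_scores, axis=-1)

        # The largest 26-connected component of "heart" + "non-adipose tissue
        # inside heart" is taken as the true heart; smaller components are
        # set to background to remove spurious voxels.
        inside = labels != BACKGROUND
        structure = np.ones((3, 3, 3), dtype=bool)       # 26-connectivity
        components, n = ndimage.label(inside, structure=structure)
        if n > 1:
            sizes = np.bincount(components.ravel())[1:]  # skip label 0
            keep = 1 + int(np.argmax(sizes))
            labels[(components != keep) & inside] = BACKGROUND

        # EAT: "heart" voxels within [-190, -30] HU; class 2 is excluded.
        eat_mask = (labels == HEART) & (hu_volume >= -190) & (hu_volume <= -30)
        voxel_ml = 0.33 * 0.33 * 1.5 / 1000.0            # mm^3 -> mL
        return eat_mask, float(eat_mask.sum()) * voxel_ml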

Crop-Net

Crop-Net was developed to impute missing information in image sets due to cropping, which was almost exclusively seen as incomplete representation of the heart in the superior or inferior parts of the image stacks, attributable to improper positioning or selection of scan areas during image acquisition. We hypothesized that an individual’s total EATV can be estimated from the information contained in only a few axial slices covering the center of the heart. To generate the Crop-Net training set, we selected image sets with a complete representation of all aspects of the pericardium and close to perfect segmentation (n = 866) from the dataset used for visual evaluation of EAT-Net (n = 1400). To simulate missing slices in the input CT image, a random number of image slices were then cropped away from the inferior or superior part of the image stack and Crop-Net was trained to predict the fraction of EATV that was missing.

To develop Crop-Net we used a CNN structure inspired by ResNet1830, but with 3D convolutions, down-sampling layers, and valid padding for all layers. Crop-Net takes an input of size 230 × 230 × 116 voxels and outputs a single value. For each training sample, the image stack was resampled to 0.75 × 0.75 × 1.5 mm per voxel to ensure that the entire pericardial sac fits within the input volume of the model. In addition, the examination was cropped around the pericardium with margins of (± 10, ± 10, ± 5) voxels. Finally, a random number of image slices (ranging from 0 to 50% of all slices) were cropped away from either the inferior or the superior part of the heart. The fraction of EAT cropped away was calculated and used as the target value for training. The same data augmentation was used as for EAT-Net.
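
A minimal sketch of how one such training sample and its target could be generated (NumPy; the axis convention, the use of a precomputed EAT mask, and the uniform draw of the crop fraction are our assumptions):

    import numpy as np

    rng = np.random.default_rng()

    def make_cropnet_sample(hu_volume, eat_mask):
        """Simulate a cropped acquisition for one (already resampled and
        pericardium-cropped) image stack and return the training target:
        the fraction of EAT volume removed. Axis 0 is assumed to run
        superior -> inferior; eat_mask marks the EAT voxels."""
        total_eat = int(eat_mask.sum())
        n_slices = hu_volume.shape[0]
        n_drop = int(rng.integers(0, n_slices // 2 + 1))  # 0-50% of slices
        if rng.random() < 0.5:                            # crop superior end
            keep = slice(n_drop, None)
        else:                                             # crop inferior end
            keep = slice(None, n_slices - n_drop)
        cropped = hu_volume[keep]
        target = 1.0 - int(eat_mask[keep].sum()) / max(total_eat, 1)
        return cropped, target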

Crop-Net was tested using a separate image dataset previously used for development of EAT-Net (n = 55), in which the whole pericardium is represented and manually annotated. To simulate a misplaced scan volume for the CT images in the test set, a fraction of the superior or inferior part of the heart was cropped.

Evaluation of the model

The final version of the combined EAT-Net and Crop-Net model was tested in two ways. First, volume prediction was tested using 25 image sets that had manual annotation of the pericardium in all image slices as well as a complete manual annotation of “non-adipose tissue inside heart”. None of these image sets were included in the training or validation sets. Some of the image sets showed minimal superior or inferior cropping, with visually insignificant volumes of the heart not being represented in the image stack. Ground truth was defined as all voxels inside the pericardium with Hounsfield values within [−190, −30] and not belonging to the class “non-adipose tissue inside heart”, after the largest-connected-volume filter was applied.

In addition, to identify challenges with rare anatomical variations and errors introduced from suboptimal image acquisition, 1400 randomly selected image sets were analyzed by the model and all slices of the resulting segmentations were visually assessed. The segmentation quality was scored using the following criteria:

  1. Acceptable: the segmentation is perfect or has only small errors that are unlikely to affect the resulting EATV estimate. The segmentation quality of the 25 cases used for testing volume prediction was set as the benchmark for acceptable segmentations.

  2. Not acceptable: significant segmentation errors that would probably affect the resulting EATV estimate. These segmentations did not fulfil the quality parameters observed in the 25 cases used for testing volume prediction.

Published data on EATV

In order to compare our results with the literature, we tabulated a number of published studies that have estimated EATV. Data are presented from studies that have contributed significantly to knowledge in the area and/or represent important cohorts and/or specific segmentation techniques.

What are the main predictors of EATV and EATV attenuation in a population sample?

We used data from the same cohort that was used for visual evaluation of the combined EAT-Net and Crop-Net model (n = 1400) to address how the variation in EATV and EAT attenuation could be explained by variation in different anthropometric and cardiometabolic risk factors. The total explained variance for EATV and EAT attenuation based on the following 18 factors was analyzed using random forest regression: gender, age, weight, height, waist, hip, systolic and diastolic blood pressure, cholesterol, LDL, HDL, triglycerides, p-glucose, HbA1c, hsCRP, creatinine, active smoking, and antihypertensive or cholesterol-lowering medication.
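
For illustration, a Python analogue of this analysis using scikit-learn is sketched below. The original analysis was run in R (see Statistics); the predictor column names are placeholders, and scikit-learn's permutation importance computed on the training data only approximates randomForest's out-of-bag %IncMSE measure:

    import pandas as pd
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.inspection import permutation_importance

    # Placeholder names for the 18 risk factors listed above.
    PREDICTORS = ["gender", "age", "weight", "height", "waist", "hip",
                  "sbp", "dbp", "cholesterol", "ldl", "hdl", "triglycerides",
                  "p_glucose", "hba1c", "hscrp", "creatinine",
                  "smoking", "bp_or_lipid_medication"]

    def explained_variance(df: pd.DataFrame, outcome: str) -> None:
        X, y = df[PREDICTORS], df[outcome]
        rf = RandomForestRegressor(n_estimators=500, oob_score=True,
                                   random_state=0, n_jobs=-1)
        rf.fit(X, y)
        # Out-of-bag R^2, i.e. the total explained variance
        print(f"{outcome}: OOB explained variance = {rf.oob_score_:.2f}")
        # Permutation importance as a rough analogue of %IncMSE
        imp = permutation_importance(rf, X, y, n_repeats=10, random_state=0)
        for name, value in sorted(zip(PREDICTORS, imp.importances_mean),
                                  key=lambda t: -t[1]):
            print(f"  {name:24s} {value:.3f}")

    # explained_variance(df, "eatv")
    # explained_variance(df, "eat_attenuation")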

Statistics

Descriptive data were presented as percentages, medians, and interquartile ranges. Model performance was evaluated using the Dice coefficient, and differences in EATV between model estimates and ground truth were shown in a Bland–Altman plot. Using random forest regression (R version 4.0.2, package: randomForest), EATV and EAT attenuation were predicted from the set of 18 variables on anthropometrics and cardiometabolic risk. Total explained variance was calculated from the out-of-bag model predictions. Each variable’s percentage contribution to the explained variance was estimated from the increase in the mean squared error (MSE) of the model caused by permutation of that variable.
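
For reference, both evaluation quantities can be computed with a few lines of NumPy (a minimal sketch; the 1.96 SD limits of agreement are the conventional Bland–Altman choice and are not stated in the paper):

    import numpy as np

    def dice(pred, truth):
        """Dice coefficient between two boolean masks: 2|A & B| / (|A| + |B|)."""
        pred, truth = pred.astype(bool), truth.astype(bool)
        denom = pred.sum() + truth.sum()
        return 2.0 * np.logical_and(pred, truth).sum() / denom if denom else 1.0

    def bland_altman(model_ml, truth_ml):
        """Quantities for a Bland-Altman plot: per-case means and differences,
        the bias, and the 95% limits of agreement (bias +/- 1.96 SD)."""
        diff = model_ml - truth_ml
        bias, sd = diff.mean(), diff.std(ddof=1)
        return ((model_ml + truth_ml) / 2.0, diff, bias,
                (bias - 1.96 * sd, bias + 1.96 * sd))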

Source: https://www.nature.com/articles/s41598-021-03150-w