Reproducibility and uptake time dependency of volume-based parameters on FDG-PET for lung cancer

Background Volume-based parameters, such as metabolic tumor volume (MTV) and total lesion glycolysis (TLG), on F-18 fluorodeoxyglucose (FDG) positron emission tomography (PET) are useful for predicting treatment response in non-small cell lung cancer (NSCLC). We aimed to examine the intra- and inter-operator reproducibility of MTV and TLG measurements and to estimate their dependency on the uptake time. Methods Fifty NSCLC patients underwent preoperative FDG-PET. After an injection of FDG, the whole body was scanned twice: at the early phase (61.4 ± 2.8 min) and at the delayed phase (117.7 ± 1.6 min). Two operators independently defined the tumor boundary using three different delineation methods: (1) the absolute SUV threshold method (MTVp and TLGp; p = 2.0, 2.5, 3.0, 3.5), (2) the fixed% SUVmax threshold method (MTVq% and TLGq%; q = 35, 40, 45), and (3) the adaptive region-growing method (MTVARG and TLGARG). Parameters were compared between operators and between phases. Results Both intra- and inter-operator reproducibility were high for all parameters with every method (intra-class correlation > 0.99 each). MTV3.0 and MTV3.5 increased significantly from the early to the delayed phase (P < 0.05 for both), whereas MTV2.0 and MTV2.5 neither increased nor decreased (P = not significant). All of the MTVq% values decreased significantly over time (P < 0.01), whereas MTVARG and TLG with every delineation method increased significantly (P < 0.05). Conclusions High reproducibility of MTV and TLG was obtained with all of the methods used. MTV2.0 and MTV2.5 were the least sensitive to uptake time and may be good alternatives when comparing images acquired with different uptake times, although a constant uptake time remains important for volume measurement.


Background
Positron emission tomography (PET) using F-18 fluorodeoxyglucose (FDG) is an essential diagnostic tool in oncology [1][2][3]. FDG-PET generates functional images that complement anatomical modalities such as computed tomography (CT) and magnetic resonance imaging (MRI) in clinical diagnosis and treatment planning. PET is also characterized by high quantitative performance [4][5][6]. In most clinical settings, FDG-PET images are assessed semi-quantitatively using the standardized uptake value (SUV), which commonly represents the radioactivity concentration per unit volume of tissue normalized to the injected dose and body weight [7]. The maximum SUV (SUVmax) within the tumor has been used most frequently to express the intensity of tumor FDG uptake because of its simplicity and high reproducibility [8][9][10][11][12]. However, the SUVmax has several problems. Because it represents a single voxel (normally < 0.1 ml) rather than the metabolism of the entire tumor, it is sensitive to statistical noise in the image [13]. In recent years, the SUVpeak has been preferred instead [13]. The definition of SUVpeak has not yet been standardized, but it is usually calculated as the average SUV within a 1-ml sphere (12 mm in diameter) around the voxel with the highest intensity. The SUVpeak is less sensitive to image noise, but it shares a limitation of the SUVmax: it still reflects only a small part of the tumor [14,15].
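The two tumor-intensity indices described above can be sketched in a few lines. This is a minimal illustration that assumes the image is a 3-D nested list of voxel SUVs with a known voxel size in mm (both hypothetical, since the text does not describe the data structure), and it implements only the SUVpeak variant described here: a sphere centered on the SUVmax voxel, with voxels included when their centers fall inside. Other SUVpeak definitions exist.

```python
from itertools import product


def suv_max(volume):
    """Return (SUVmax, (z, y, x)) for a 3-D nested list of voxel SUVs."""
    best, best_idx = float("-inf"), None
    for z, plane in enumerate(volume):
        for y, row in enumerate(plane):
            for x, value in enumerate(row):
                if value > best:
                    best, best_idx = value, (z, y, x)
    return best, best_idx


def suv_peak(volume, voxel_mm, diameter_mm=12.0):
    """Mean SUV of voxels whose centers lie inside a sphere of the given
    diameter centered on the SUVmax voxel (one of several SUVpeak variants)."""
    _, (cz, cy, cx) = suv_max(volume)
    dz, dy, dx = voxel_mm
    radius_sq = (diameter_mm / 2.0) ** 2
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    inside = [
        volume[z][y][x]
        for z, y, x in product(range(nz), range(ny), range(nx))
        if ((z - cz) * dz) ** 2 + ((y - cy) * dy) ** 2 + ((x - cx) * dx) ** 2 <= radius_sq
    ]
    return sum(inside) / len(inside)


# Hypothetical 5x5x5 volume: background SUV 1.0 and one hot voxel of SUV 10.0.
vol = [[[1.0] * 5 for _ in range(5)] for _ in range(5)]
vol[2][2][2] = 10.0
print(suv_max(vol))                    # (10.0, (2, 2, 2))
print(suv_peak(vol, (4.0, 4.0, 4.0)))  # 19 voxels averaged: 28/19
```

With 4-mm voxels only 19 voxel centers fall inside the 12-mm sphere, so the SUVpeak value depends on the reconstruction grid as well as on the sphere definition.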
In this context, the metabolic tumor volume (MTV) and total lesion glycolysis (TLG) have recently been used as indices of whole-tumor FDG uptake. The MTV is defined as the tumor volume delineated on an FDG-PET image using a given threshold. Once the MTV is determined, the SUVmean can be defined as the average SUV within the MTV. TLG is the product of the MTV and the SUVmean. These indices reflect glucose metabolic activity in the entire tumor. Their clinical usefulness (e.g., for prognosis and treatment response) has been demonstrated in many cancers, such as lung [16,17], head-and-neck [18][19][20], and gynecological cancer [21,22].
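The definitions of MTV, SUVmean, and TLG translate directly into arithmetic on the voxels inside the delineated tumor. The following is a hedged sketch that assumes the tumor voxels have already been selected by some delineation method and that all voxels share the same volume; the helper names are hypothetical.

```python
import math


def voxel_volume_ml(voxel_mm):
    """Volume of one voxel in ml (1 ml = 1000 mm^3)."""
    dz, dy, dx = voxel_mm
    return dz * dy * dx / 1000.0


def volume_parameters(tumor_suvs, voxel_ml):
    """Return (MTV [ml], SUVmean, TLG) for the SUVs of voxels inside the tumor."""
    n = len(tumor_suvs)
    if n == 0:
        # A threshold above every voxel yields an empty region:
        # MTV = TLG = 0 and SUVmean is undefined.
        return 0.0, math.nan, 0.0
    mtv = n * voxel_ml
    suv_mean = sum(tumor_suvs) / n
    tlg = mtv * suv_mean  # equivalently voxel_ml * sum(tumor_suvs)
    return mtv, suv_mean, tlg


# Hypothetical tumor of four 4x4x4-mm voxels.
print(volume_parameters([2.7, 3.1, 3.6, 6.0], voxel_volume_ml((4.0, 4.0, 4.0))))
```

Because TLG equals the voxel volume times the summed SUVs, it grows whenever either the delineated volume or the uptake inside it grows.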
Calculating the MTV and TLG requires tumor contouring on the PET image. Many methods of determining the contour have been reported [23][24][25][26][27][28][29][30][31]; among them, manual contouring, the absolute SUV threshold method, and relative SUV threshold methods have been widely used. With manual contouring, the tumor boundary is determined by the operator's visual inspection. This operator-dependent method suffers from poor reproducibility and is affected by the window level and color scale. Applying it to every image containing a tumor is also time-consuming. Other methods have thus been developed to reduce the effects of display conditions and operators.
There is no doubt that the SUVmax has high intra- and inter-operator reproducibility, but the reproducibility of MTV and TLG still needs to be assessed. In the present study, we examined intra-operator reproducibility (i.e., the same operator analyzes the same image twice) and inter-operator reproducibility (i.e., two operators analyze the same image independently). In addition, considering the possible effects of the uptake time after FDG administration on the MTV and TLG, we acquired PET images twice after a single injection (at approximately 60 and 120 min) and compared the MTV and TLG between these images. We applied several widely used delineation methods. Thus, in this study, we aimed to evaluate (1) intra-operator reproducibility, (2) inter-operator reproducibility, and (3) the effect of uptake time differences on volume-based parameters.

Study subjects
All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards. The institutional ethics committee of Hokkaido Cancer Center approved this retrospective study and waived the requirement for informed consent from individual participants. Patient records and information were anonymized and de-identified prior to analysis. From our hospital information system, we identified a total of 52 patients who underwent FDG-PET to examine lung nodules before treatment at the National Hospital Organization Hokkaido Cancer Center between December 2010 and March 2012. One patient was suspected of having a metastatic lung tumor from breast cancer, and another did not complete the scanning because of severe pain. Thus, we included 50 patients (27 males; age, 70.2 ± 10.1 years) whose lung nodules were visualized by FDG-PET and pathologically confirmed as non-small cell lung cancer (NSCLC). The patient characteristics are shown in Table 1.

Image acquisition and reconstruction
All of the clinical FDG-PET studies were performed with an Eminence SET-3000G PET scanner (Shimadzu, Kyoto, Japan). All of the patients fasted for at least 6 h before the injection of FDG (224 ± 54 MBq, range 142-294 MBq; 4.0 ± 0.9 MBq/kg, range 2.5-6.4 MBq/kg). The blood glucose level was 100 ± 19 mg/dl. Each patient was scanned twice: early scanning at 61.4 ± 2.8 min (range 58-67 min) and delayed scanning at 117.7 ± 1.6 min (range 114-121 min) after injection. The transaxial field of view was 512 mm in diameter. Three-dimensional emission scanning was performed in a continuous bed-movement mode (0.8-0.9 mm/s). Transmission scanning was performed with an external 137Cs source for attenuation correction.
Images were reconstructed with a block-iterative algorithm, the dynamic row-action maximum likelihood algorithm (DRAMA), which is modified from the row-action maximum likelihood algorithm (RAMLA) [32].

Image processing
A total of 100 FDG-PET datasets (two datasets, i.e., early and delayed images, from each of 50 patients) were processed by two operators to delineate the tumor (Fig. 1). Operator-1 (T.K.) is an experienced radiologic technologist in nuclear medicine, and Operator-2 (K.H.) is an experienced nuclear medicine physician. Each operator independently defined the tumor boundary twice, with an interval of 30 days or longer (i.e., a total of four measurements). Each operator performed the delineations without viewing the results of the other. Hereinafter, we use four abbreviations: Op1Ob1 for the first observation by Operator-1, Op1Ob2 for the second observation by Operator-1, Op2Ob1 for the first observation by Operator-2, and Op2Ob2 for the second observation by Operator-2. The volume of interest (VOI) was defined by manually drawing polygonal regions of interest (ROIs) that enclosed the entire tumor with sufficient margins on every slice where the tumor was seen. During ROI definition, the PET images were displayed using a rainbow color bar with a fixed window of SUV 0-4. Physiological uptake was carefully avoided. Neither lymph nodes nor distant metastatic lesions were investigated in this study. All of the ROIs were combined to generate a three-dimensional VOI.
In this study, we used the following three delineation methods. (1) The absolute SUV threshold method, which defines the tumor as the region with SUVs above a predetermined threshold, such as an SUV of 2.5 or 3.0. (2) The fixed% SUVmax threshold method, which defines the tumor as the region with SUVs above a certain percentage of the SUVmax within the tumor (commonly 40-50 %). (3) The adaptive region-growing (ARG) method, which is relatively new [26]. ARG is essentially a region-growing method that examines the voxels neighboring the current region and determines whether each should be added to the tumor region: a neighboring voxel is added if its SUV ≥ (mean SUV of the current region) × (threshold). As the threshold is decreased from 100 % to 0 %, the region volume increases sharply at a certain point, and the tumor region is determined by this point. With this method, the tumor region can be extracted automatically once the highest-intensity voxel in the tumor is set as the starting point. Because the ARG method is relatively new, few studies have used it so far.
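The ARG procedure described above can be sketched as follows. This is a simplified illustration, not the implementation of reference [26] or of the in-house software used here. It assumes 6-connected neighbors, a start at the highest-SUV voxel, thresholds swept from 100 % to 0 % in 5 % steps, and it takes the "sharp volume increase point" to be the largest single-step increase in region size, returning the region just before that jump. All function names are hypothetical.

```python
from collections import deque
from itertools import product

# 6-connected neighborhood (face neighbors only): an assumption of this sketch
NEIGHBORS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def grow_region(volume, seed, fraction):
    """Grow from `seed`, adding a neighbor voxel whenever
    its SUV >= (mean SUV of the current region) * fraction."""
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    region = {seed}
    total = volume[seed[0]][seed[1]][seed[2]]
    queue = deque([seed])
    while queue:
        z, y, x = queue.popleft()
        for dz, dy, dx in NEIGHBORS:
            nb = (z + dz, y + dy, x + dx)
            inside = 0 <= nb[0] < nz and 0 <= nb[1] < ny and 0 <= nb[2] < nx
            if not inside or nb in region:
                continue
            value = volume[nb[0]][nb[1]][nb[2]]
            if value >= (total / len(region)) * fraction:
                region.add(nb)
                total += value  # the region mean is updated as it grows
                queue.append(nb)
    return region


def adaptive_region_growing(volume, step=0.05):
    """Sweep `fraction` from 1.0 down to 0.0 and return the region just before
    the largest single-step increase in size (taken here as the sharp
    volume increase point)."""
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    seed = max(product(range(nz), range(ny), range(nx)),
               key=lambda p: volume[p[0]][p[1]][p[2]])  # highest-SUV voxel
    n_steps = int(round(1.0 / step))
    fractions = [round(1.0 - i * step, 10) for i in range(n_steps + 1)]
    regions = [grow_region(volume, seed, f) for f in fractions]
    jumps = [len(regions[i + 1]) - len(regions[i]) for i in range(n_steps)]
    k = max(range(n_steps), key=jumps.__getitem__)
    return regions[k]


# Hypothetical test volume: background 1.0, a 3x3x3 "tumor" of SUV 7.5
# with a hot center of SUV 10.0.
vol = [[[1.0] * 5 for _ in range(5)] for _ in range(5)]
for z, y, x in product(range(1, 4), repeat=3):
    vol[z][y][x] = 7.5
vol[2][2][2] = 10.0
tumor = adaptive_region_growing(vol)
print(len(tumor))  # 27: the region leaks into the background only below ~13 %
```

This single-pass growth never re-tests a rejected voxel once all of its region neighbors have been processed; a fuller implementation would iterate until the region stops changing. The "largest jump" rule is also only one way to locate the sharp increase, which is consistent with the text's caveat that the method can fail on noisy images or indistinct boundaries.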
The tumor volume was automatically determined within the VOI using the different methods: MTVp, MTVq%, and MTVARG. MTVp is the MTV determined using the absolute SUV threshold method, where p = 2.0, 2.5, 3.0, or 3.5. MTVq% is the MTV determined using the fixed% SUVmax threshold method, where q = 35, 40, or 45 %. The values of p and q were chosen based on their frequency of appearance in the literature [13]. MTVARG is the MTV determined using the ARG method.
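The two threshold-based methods reduce to selecting VOI voxels above a cut-off; only the cut-off differs (a fixed SUV p, or q % of the SUVmax in the VOI). The following is a minimal sketch that assumes a flat list of the SUVs inside the manually drawn VOI and a common voxel volume in ml. Whether voxels exactly at the cut-off are included is not stated in the text, so the `>=` comparison is an assumption, and the function names are hypothetical.

```python
def mtv_tlg_absolute(voi_suvs, p, voxel_ml):
    """MTVp and TLGp: VOI voxels with SUV >= p (absolute threshold)."""
    inside = [v for v in voi_suvs if v >= p]
    mtv = len(inside) * voxel_ml
    return mtv, sum(inside) * voxel_ml  # TLG = MTV x SUVmean


def mtv_tlg_relative(voi_suvs, q, voxel_ml):
    """MTVq% and TLGq%: VOI voxels with SUV >= q % of the SUVmax in the VOI."""
    return mtv_tlg_absolute(voi_suvs, q / 100.0 * max(voi_suvs), voxel_ml)


# Hypothetical VOI of eight voxels, 0.1 ml each.
voi = [0.5, 1.0, 2.2, 2.7, 3.1, 3.5, 6.0, 9.0]
for p in (2.0, 2.5, 3.0, 3.5):
    print(f"MTV{p}", mtv_tlg_absolute(voi, p, 0.1))
for q in (35, 40, 45):
    print(f"MTV{q}%", mtv_tlg_relative(voi, q, 0.1))
```

Because the relative cut-off scales with the SUVmax, the same VOI yields a smaller MTVq% whenever the hottest voxel becomes hotter, whereas MTVp depends only on how many voxels exceed the fixed value.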
TLG was defined as the product of the corresponding MTV and the SUVmean within the tumor boundary. The SUVmax, the highest voxel SUV in the VOI, was also recorded. The SUV was calculated as [tissue radioactivity concentration (Bq/ml)] × [body weight (g)] / [injected radioactivity (Bq)].
For all image analyses, including manual ROI drawing, mathematical delineation, and parameter calculation, we used an in-house software package developed in C# with Visual Studio 2010 (Microsoft Corporation, Redmond, Washington, USA).
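The SUV formula above can be written as a small function. This sketch assumes concentration in Bq/ml, body weight in g, and injected activity in Bq, so the result is in g/ml and is read as dimensionless (assuming a tissue density of about 1 g/ml). The optional decay-correction argument is a hypothetical addition that the text does not describe: it is appropriate only if the image values are not already decay-corrected to the injection time. The F-18 half-life of about 109.8 min is the physical constant.

```python
F18_HALF_LIFE_MIN = 109.77  # physical half-life of fluorine-18, in minutes


def suv_bw(conc_bq_per_ml, body_weight_g, injected_bq, minutes_after_injection=None):
    """Body-weight SUV = concentration (Bq/ml) x body weight (g) / injected activity (Bq).

    If `minutes_after_injection` is given, the injected activity is first
    decay-corrected to the scan time (hypothetical option; use it only if the
    image is NOT already decay-corrected to the injection time).
    """
    activity = injected_bq
    if minutes_after_injection is not None:
        activity *= 0.5 ** (minutes_after_injection / F18_HALF_LIFE_MIN)
    return conc_bq_per_ml * body_weight_g / activity


# Hypothetical voxel: 5 kBq/ml in a 60-kg patient injected with 224 MBq.
print(suv_bw(5000.0, 60000.0, 224e6))  # about 1.34
```

Getting the decay handling right matters for this study in particular: without it, an image at about 120 min would carry roughly half the activity of one at about 60 min, which would swamp any genuine change between phases.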

Statistical analysis
Values are expressed as the mean ± SD. The free statistical package R version 3.2.5 (R Project, http://cran.r-project.org) was used for all statistical analyses. A paired t-test was used for paired values. The Holm method was used to adjust P-values for multiple comparisons. The intraclass correlation (ICC) was used to evaluate intra- and inter-operator reproducibility [33]. Intra-operator reproducibility was estimated from two combinations: (1) Op1Ob1 vs. Op1Ob2, and (2) Op2Ob1 vs. Op2Ob2.
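For readers reproducing the analysis outside R, the two less familiar procedures can be sketched in plain Python. The text does not state which ICC form was used, so the ICC(2,1) of Shrout and Fleiss (two-way random effects, absolute agreement, single measurement) is an assumption here. Other forms, such as the consistency-type ICC(3,1), give different values on the same data. The Holm function follows the step-down rule used by R's `p.adjust(..., method = "holm")`. Function names are hypothetical.

```python
def icc_2_1(ratings):
    """Shrout-Fleiss ICC(2,1): two-way random effects, absolute agreement,
    single measurement. `ratings` is a list of n subjects x k measurements."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def holm_adjust(pvalues):
    """Holm step-down adjusted P-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        running_max = max(running_max, min(1.0, (m - rank) * pvalues[i]))
        adjusted[i] = running_max
    return adjusted


# Worked example from Shrout and Fleiss (1979): 6 targets rated by 4 judges.
shrout_fleiss = [[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
                 [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]]
print(round(icc_2_1(shrout_fleiss), 2))  # 0.29
```

For intra-operator reproducibility, each row would hold one patient's pair of measurements, e.g. [Op1Ob1, Op1Ob2], so k = 2.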

Reproducibility
Both intra- and inter-operator reproducibility were extremely high at the early phase (Table 2) and the delayed phase (Table 3). The ICC between the first and second measurements by Operator-1 or Operator-2 was > 0.99 for all parameters. Similarly, the ICC between Operator-1 and Operator-2 was > 0.99 for all parameters. The SUVmax, MTVARG, and TLGARG showed no difference between measurements in any case (i.e., perfect agreement). Comparisons between methods revealed that most of the MTVq% values were lower than those of MTVp or MTVARG.

Parameter changes from the early phase to the delayed phase
Parameter changes from the early to the delayed phase are summarized in Table 4. The SUVmax increased in 49 of the 50 (98 %) cases at the delayed phase compared with the early phase (early, 9.1 ± 4.9; delayed, 11.1 ± 6.0; P < 0.0001). The MTV changes, evaluated on Operator-2's second observation, depended on the delineation method. MTV2.0 and MTV2.5 neither increased nor decreased from the early to the delayed phase, with mean delayed-to-early ratios of 1.02 and 1.06, respectively (P = not significant for both). The use of a higher threshold (i.e., MTV3.0 and MTV3.5) led to a significant increase from the early to the delayed phase (P < 0.05 for both). All of the MTVq% values (i.e., MTV35%, MTV40%, and MTV45%) decreased significantly (P < 0.001), whereas the MTVARG values increased significantly (P < 0.05) (Fig. 2). In contrast, the TLG obtained by every delineation method increased significantly at the delayed phase (Fig. 3).
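The per-parameter comparison between phases can be summarized as below: the mean delayed-to-early ratio, the paired t statistic, and the Bland-Altman 95 % limits of agreement (mean difference ± 1.96 SD). This is a sketch under stated assumptions. The function name is hypothetical, and cases with zero early volume are excluded from the ratio because it is undefined for them (the text does not say how such cases were handled). The P-value itself requires the t distribution, which in R is provided by `t.test(early, delayed, paired = TRUE)`.

```python
from math import sqrt
from statistics import mean, stdev


def compare_phases(early, delayed):
    """Summarize paired early/delayed values of one parameter."""
    diffs = [d - e for e, d in zip(early, delayed)]
    n = len(diffs)
    mean_diff, sd_diff = mean(diffs), stdev(diffs)
    # Delayed-to-early ratio; cases with zero early volume are skipped
    # (hypothetical choice: the ratio is undefined for them).
    ratios = [d / e for e, d in zip(early, delayed) if e > 0]
    return {
        "mean_ratio": mean(ratios),
        "mean_diff": mean_diff,
        "t": mean_diff / (sd_diff / sqrt(n)),  # paired t statistic, df = n - 1
        "df": n - 1,
        # Bland-Altman 95 % limits of agreement of the differences
        "limits_of_agreement": (mean_diff - 1.96 * sd_diff, mean_diff + 1.96 * sd_diff),
    }


# Hypothetical MTVs (ml) of four tumors at the two phases.
summary = compare_phases([1.0, 2.0, 3.0, 4.0], [1.5, 2.5, 4.0, 4.5])
print(summary)
```

Reporting both the ratio and the difference is useful here because MTVs span a wide range: a ratio near 1 (as for MTV2.0 and MTV2.5) indicates a small relative change even for large tumors.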

Discussion
In this study of volume-based parameters on FDG-PET for NSCLC, we found high intra- and inter-operator reproducibility for all parameters (ICC > 0.99 each). We also evaluated the sensitivity of the parameters to uptake time by comparing early-phase with delayed-phase images. Whereas the SUVmax increased significantly at the delayed phase, the MTV changes depended on the delineation method, and the TLG obtained by every delineation method increased significantly at the delayed phase (P < 0.05). Among the parameters examined, only MTV2.0 and MTV2.5 neither increased nor decreased at the delayed phase.

Intra-and inter-operator reproducibility
When the tumor has no adjacent non-tumor uptake (i.e., physiological or inflammatory), the semi-automated methods employed in this study should, in theory, cause no measurement variability. However, it is not uncommon for the tumor to be so close to the mediastinum that the manual ROIs include parts of the blood pool or lymph nodes. In such cases, even semi-automated methods are expected to cause some variation if the threshold is lower than the non-tumor uptake. In this study, both intra- and inter-operator reproducibility were high for all parameters. As expected, we observed minimal differences between the two measurements in some cases when a relatively low threshold (absolute or fixed% SUVmax) was used, but we consider that the high ICCs support the use of these methods. Shah et al. reported high reproducibility of MTV and TLG using a fixed% SUVmax threshold method, with ICCs between two measurements by one operator of > 0.98 for MTV and > 0.99 for TLG [33]. Frings et al. demonstrated high repeatability of two measurements performed within 1 week using FDG or 18F-fluorothymidine (FLT) [34]. Our results are in line with these previous reports, and the differences we observed may be small enough for clinical use. With the ARG method, in contrast, the two measurements of the tumor volume agreed completely, because this method delineates the tumor boundary without requiring a manual ROI [26]. Our results are consistent with this report in terms of high inter-operator reproducibility. However, as a shortcoming, this method does not always determine the tumor boundary successfully, especially when images are noisy or the boundaries are indistinct. In phantom experiments, Li et al. found that the ARG method generates a slightly larger volume than the actual tumor volume and that the degree of volume overestimation depends on the source-to-background ratio. They thus recommended that use of the ARG method be followed by an appropriate volume correction.
Fig. 2 Bland-Altman plots of the MTV changes between the early and delayed phases: MTV2.5 changed little (a), MTV40% decreased (b), and MTVARG increased (c) from the early to the delayed phase.

Early and delayed scans
MTV is the volume in which tumor cells are actively metabolizing glucose; note that it is not a quantification of uptake. In the absence of significant tumor growth, this volume should remain stable over a few hours. In fact, however, most MTV measurement methods yielded significant volume changes from the early to the delayed phase, the exceptions being MTV2.0 and MTV2.5. TLG, in contrast, reflects (in arbitrary units) the amount of glucose metabolized between injection and image acquisition, so it can theoretically change over time. Because we investigated only malignant tumors, FDG inflow is thought to continue beyond 1 h after injection, resulting in higher uptake at 2 h [35,36]. Among the MTVs measured by different methods, MTV2.0 and MTV2.5 neither increased nor decreased from the early to the delayed phase, probably because the increase in tumor uptake and the decrease in the surrounding background uptake (e.g., in the lung field or mediastinal blood pool) cancelled each other out. Conversely, the MTV35%, MTV40%, and MTV45% values all decreased significantly because the increase in the SUVmax raised the delineation cut-off value. MTVARG increased because of the higher tumor-to-background ratio at the delayed phase. TLG increased significantly with every delineation method, most likely because of the increase in the SUVmean within the region. To our knowledge, this is the first report to show these parameter changes from the early to the delayed phase.
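The cut-off mechanism can be made concrete with the cohort-mean SUVmax values reported in the Results (9.1 early, 11.1 delayed). The fixed% cut-off $c_q$ (notation introduced here for illustration) is proportional to the SUVmax, so for q = 40 it rises between phases, whereas the absolute cut-off of MTV2.5 stays at 2.5 in both. This is only an illustration at the mean; in practice the cut-off is computed per patient from that patient's own SUVmax.

$$c_q = \frac{q}{100}\,\mathrm{SUV}_{\max}$$

$$c_{40\%}^{\mathrm{early}} = 0.40 \times 9.1 = 3.64, \qquad c_{40\%}^{\mathrm{delayed}} = 0.40 \times 11.1 = 4.44$$

A voxel with an SUV between 3.64 and 4.44 at both phases would therefore be counted in MTV40% at the early phase but not at the delayed phase, even though the tumor itself has not changed in size.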
PERCIST (PET Response Criteria in Solid Tumors) requires that the baseline PET scan be obtained 50-70 min after injection and that the follow-up scan be obtained within 15 min of the baseline uptake time [13]. In our observation, almost all parameters changed from the early to the delayed phase, which further supports the importance of strict control of uptake time. However, such a strict protocol is not always easy to follow in many clinical settings. In particular, in retrospective analyses, the uptake time restriction would exclude a number of scans. We suggest that MTV2.0 or MTV2.5 could be used as an alternative to minimize the influence of uptake time variability.
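The two uptake-time conditions quoted above can be expressed as a simple check. This is only a sketch of the rule as summarized in this paragraph (it ignores every other PERCIST requirement), and the function name is hypothetical. Applied to this study's mean uptake times, the early and delayed scans fall far outside the 15-min window.

```python
def percist_uptake_ok(baseline_min, followup_min):
    """Check the two uptake-time conditions: baseline 50-70 min after
    injection, follow-up uptake time within 15 min of the baseline's."""
    baseline_in_window = 50.0 <= baseline_min <= 70.0
    followup_close = abs(followup_min - baseline_min) <= 15.0
    return baseline_in_window and followup_close


# The study's mean early (61.4 min) and delayed (117.7 min) uptake times,
# treated as if they were a baseline/follow-up pair:
print(percist_uptake_ok(61.4, 117.7))  # False: 56.3 min apart
print(percist_uptake_ok(61.4, 70.0))   # True
```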
It should be noted that MTV2.5 is the most commonly used MTV so far and is known to correlate well with patient outcomes in various cancers [27,28,37]. For instance, Kao et al. showed that MTV2.5 was the most appropriate parameter for predicting recurrence after radiotherapy in patients with pharyngeal cancer in comparison with MTV3.0, MTV40%, and MTV50% [28]. Based on our present findings, MTV3.0 or MTVs with higher thresholds may not be appropriate if the uptake time is not constant. Another reason to avoid higher thresholds is that a substantial number of cases showed zero volume with such thresholds.
MTVq% has also been used frequently. In phantom studies, MTVq% actually measures tumor volume better because it is relatively resistant to partial volume effects. However, this method may work appropriately when the tumor has an intermediate SUVmax (e.g., 5-10) but may under- or over-estimate the volume when the tumor SUVmax is considerably high or low, respectively. Therefore, it is difficult to fix a single relative threshold (%) in studies of a large number of patients. Considering the difficulty of fixing an absolute or relative SUV threshold, the ARG procedure is an attractive method that does not require manual interaction. Although the ARG method achieved very good intra- and inter-operator reproducibility in the present study, its high sensitivity to uptake time indicates that further improvement is needed. TLG seems to extract more information from PET than MTV does, because TLG quantifies uptake whereas MTV is just a volume. The superiority of TLG over MTV for assessing the treatment response of lung cancer has been reported recently [38,39]. As mentioned above, however, the TLG obtained by every delineation method increased significantly at the delayed phase. Therefore, when using datasets acquired with fluctuating uptake times, we recommend MTV2.5 as the best volume-based parameter among the MTVs and TLGs examined.
This study has several limitations. We investigated reproducibility and parameter changes with uptake time, but we did not evaluate prognostic value. Future studies will be needed to combine the present findings with prognostic information. In addition, cancers other than lung cancer need to be studied. In lung cancer, manual ROIs can be defined relatively easily because the tumor lies in the lung, which shows low FDG uptake. Reproducibility may be affected in regions with higher physiological uptake, such as the head-and-neck and pelvis.

Conclusions
The MTV and TLG of primary lesions in 50 NSCLC patients were measured with different tumor delineation methods and different uptake times. Both intra- and inter-operator reproducibility were extremely high for all parameters. Most of the MTV values and all of the TLG values were significantly affected by uptake time. Among the parameters studied, MTV2.0 and MTV2.5 were the least sensitive to uptake time and may be good alternatives when comparing images acquired with different uptake times, although a constant uptake time remains important for volume measurement.
Abbreviations
CT, computed tomography; DRAMA, dynamic row-action maximum likelihood algorithm; FDG, fluorodeoxyglucose; FLT, fluorothymidine; ICC, intraclass correlation; ICCop1, ICC between the first versus second measurement by operator-1; ICCop1op2, ICC between operator-1 versus operator-2; MRI, magnetic resonance imaging; MTV, metabolic tumor volume; NSCLC, non-small cell lung cancer; PET, positron emission tomography; RAMLA, row-action maximum likelihood algorithm; ROIs, regions of interest; SUV, standardized uptake value; SUVmax, maximum of SUV; TLG, total lesion glycolysis; VOI, volume of interest

Ethics approval and consent to participate
This retrospective study was approved by the institutional ethics committee of Hokkaido Cancer Center (Approval number: . The institutional ethics committee of Hokkaido Cancer Center waived the requirement for informed consent from individual participants in this retrospective study. Patient records and information were anonymized and de-identified prior to analysis.

Statement of human rights
All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.