Evaluation of 28 Biomarkers to Diagnose Endometriosis
Materials and Methods
Selection of Plasma Samples From the LUFC Endometriosis Research Biobank
Since 1999, a biobank has been developed based on the collection and storage of plasma samples, after signed informed consent, from women undergoing laparoscopic surgery at the Leuven University Fertility Center (LUFC). For each patient, detailed clinical information is available in the patient's electronic file, including age, cycle phase at surgery, a detailed surgery report with scoring and staging according to the ASRM classification (1997), medication use and preoperative ultrasound (US) data. All patients had signed a written informed consent form and the study protocol was approved by the Commission for Medical Ethics of the Leuven University Hospital, Belgium.
The electronic biobank database of the LUFC was searched for all plasma samples that had both the necessary minimal volume (2.5 ml) and the following essential clinical information on the patient at the time of sample collection: age, indication for surgery (infertility and/or pain), stage and score of endometriosis (ASRM, 1997) and menstrual cycle phase. Plasma samples from patients using hormonal medication (combined oral contraceptive pill, progestins or GnRH analogues) and from patients operated on within the 6 months before sample collection were excluded.
The first comparison was of controls (in whom endometriosis was excluded laparoscopically by an experienced endometriosis surgeon) versus all stages of endometriosis. Endometriosis patients were then divided into three groups: minimal–mild endometriosis, moderate–severe endometriosis and US-negative endometriosis. Histological confirmation of endometriosis was available for the majority (202/232, 87.1%) of the endometriosis patients included in our study. A total of 353 plasma samples met our inclusion criteria and were randomly divided into an independent training set and test set, with equal distribution of controls (34%) and endometriosis patients (66%) in both data sets, using a stratified random sample selection step. The training set population included samples from 235 patients: 80 controls and 155 endometriosis patients (102 minimal–mild endometriosis; 53 moderate–severe endometriosis). The independent test set population included samples from 118 patients: 41 controls and 77 endometriosis patients (46 minimal–mild endometriosis; 31 moderate–severe endometriosis). Training and test set plasma samples were collected during the menstrual (n = 57; n = 26), luteal (n = 92; n = 43) and follicular (n = 86; n = 49) phases of the cycle, respectively.
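The stratified random split described above can be illustrated with a minimal sketch. The function name `stratified_split`, the seed and the label strings are assumptions made for illustration; the source does not state which software performed the split, and per-class counts obtained this way may differ by one from the reported ones because of rounding.

```python
import random
from collections import defaultdict

def stratified_split(labels, train_fraction, seed=0):
    """Return (train_idx, test_idx) with each class split at train_fraction."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)  # random order within the class
        n_train = round(len(members) * train_fraction)
        train_idx.extend(members[:n_train])
        test_idx.extend(members[n_train:])
    return sorted(train_idx), sorted(test_idx)

# Group sizes from the text: 121 controls and 232 endometriosis patients.
labels = ["control"] * 121 + ["endometriosis"] * 232
train_idx, test_idx = stratified_split(labels, train_fraction=235 / 353)
```

Splitting each class separately is what keeps the proportion of controls nearly identical in both sets.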
As a non-invasive diagnostic test would be especially useful in women with endometriosis that is not diagnosed by TVU, as mentioned in the Introduction, a subset analysis was done on samples collected from the 175 women with laparoscopically confirmed endometriosis without evidence of endometriosis on a preoperative gynaecological US. For this subset, the training set population included 117 US-negative endometriosis patients (99 minimal–mild endometriosis, 18 moderate–severe endometriosis) and 81 controls. The independent test set included 58 US-negative endometriosis patients (47 minimal–mild endometriosis, 11 moderate–severe endometriosis) and 40 controls. For this subset analysis, training and test set plasma samples were collected during the menstrual (n = 40; n = 27), luteal (n = 78; n = 33) and follicular (n = 80; n = 38) phases of the cycle, respectively.
Plasma samples had been collected at the time of surgery (according to the standard operating procedure) in EDTA tubes, centrifuged at 3000 rpm for 10 min at 4°C, aliquoted, labelled and stored at −80°C until analysis. The time interval between sample collection and storage in the −80°C freezer was at most 1 h.
Selection and Measurement of Target Biomarkers
After an extensive literature search, 28 plasma biomarkers were selected based on their potential role in the pathogenesis of endometriosis, differential expression in endometriosis patients compared with controls (reviewed by May et al., 2010) and commercial availability of the assays. Table I shows the complete list of biomarkers analysed in this study according to their biological function (glycoproteins, inflammatory and non-inflammatory markers, adhesion molecules, angiogenic and growth factors).
The following multiplex and single immunoassay technologies were used. The Bio-Plex Protein Array System (Bio-Rad Laboratories, Hercules, CA, USA) was used for the measurement of IL-1beta, IL-4, IL-6, IL-8, IL-10, IL-17, TNF-alpha, RANTES, NGF, b-FGF, IFN-gamma, MIF, MCP-1, VCAM, VEGF, M-CSF and HGF. The multiplexing sandwich-ELISA system of Aushon Biosystems Search Light Assay Services (Woburn, USA) was used for the measurement of osteopontin, IGFBP-3 and leptin. Single ELISAs were used for the measurement of sICAM-1 and follistatin (R&D Systems, Minneapolis, USA), annexin V (American Diagnostica, Inc., Stamford, USA), IL-21 (Bender Med Systems, Vienna, Austria) and glycodelin (Bioserv Diagnostics, Rostock, Germany). Plasma concentrations of CA-125, CA-19-9 and hsCRP were measured by automated immunoassays (Roche, Vilvoorde, Belgium).
Since the commercially available glycodelin ELISA kit (Bioserv Diagnostics, Rostock, Germany) has been validated only in serum samples, an additional analytical validation step was performed to confirm that the kit can be used in plasma. The intra-assay variation was between 12.6 and 15.3%. The inter-assay coefficient of variation was between 6.8 and 18.8%. The recovery of 10 samples in the spike-recovery experiment ranged between 82 and 120%. The glycodelin ELISA showed good linearity [a slope of 0.96 and a Spearman correlation coefficient of 0.92 (P = 0.0013)] between the observed and expected levels of glycodelin in plasma. These analytical validation data showed that the glycodelin ELISA kit is accurate for EDTA plasma.
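For readers less familiar with these validation measures, the sketch below shows how a coefficient of variation and a spike recovery are conventionally computed. The function names and the replicate readings are hypothetical and are not data from this study.

```python
import statistics

def coefficient_of_variation(replicates):
    """CV in percent: sample standard deviation divided by the mean."""
    return 100.0 * statistics.stdev(replicates) / statistics.mean(replicates)

def spike_recovery(observed, expected):
    """Recovery in percent: observed concentration relative to expected."""
    return 100.0 * observed / expected

# Hypothetical replicate readings in arbitrary units (illustration only).
replicates = [10.0, 12.0, 11.0, 13.0]
cv_percent = coefficient_of_variation(replicates)
recovery_percent = spike_recovery(observed=9.0, expected=10.0)
```

Intra-assay variation would use replicates measured within one plate, inter-assay variation replicates measured across plates.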
In an additional methodology study (Vodolazkaia et al., 2011) we confirmed that the hsCRP assay was superior to the classical CRP assay for the detection of low CRP levels (indicating subclinical inflammation in the plasma of endometriosis patients) and for the diagnosis of moderate–severe endometriosis. The hsCRP assay was used for the measurement of CRP in the entire study population.
Statistical Analysis
As mentioned above, all samples were randomly divided into a training set (70%) and a test set (30%), and data were analysed separately for each set using univariate and multivariate statistical analyses. For statistical analysis, undetectable amounts of a target molecule were set to one-half the limit of quantification. IL-4, NGF-beta and M-CSF were not detectable in >90% of the samples and were excluded from the statistical analysis.
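The two preprocessing rules in this paragraph can be sketched as follows. Representing an undetectable reading as `None`, and the function names, are assumptions made for illustration only.

```python
def impute_undetectable(values, loq):
    """Replace undetectable readings (None) with half the limit of quantification."""
    return [loq / 2.0 if v is None else v for v in values]

def undetectable_fraction(values):
    """Fraction of readings that are undetectable (None)."""
    return sum(v is None for v in values) / len(values)

def prepare_marker(values, loq, max_undetectable=0.90):
    """Return imputed values, or None if the marker is mostly undetectable."""
    if undetectable_fraction(values) > max_undetectable:
        return None  # marker excluded from the statistical analysis
    return impute_undetectable(values, loq)
```

A marker such as IL-4, undetectable in more than 90% of samples, would return `None` here and be dropped before any testing.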
Univariate Statistical Analysis
Data are presented as median and interquartile range. A P-value of <0.05 was considered statistically significant. Differences in biomarker levels were evaluated using the Mann–Whitney test and the Kruskal–Wallis test with post hoc Dunn analysis, in the training and test data sets separately.
A receiver operating characteristic (ROC) curve analysis was performed to determine the diagnostic performance of each biomarker separately. The optimal cut-off levels, defined as those giving the highest sensitivity at an acceptable specificity (>50%) in the training set, were validated on the independent test set.
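One way to read this cut-off rule is sketched below. It assumes that higher marker values indicate disease and that ties in sensitivity are broken by higher specificity; neither assumption is stated in the text, and the function names are hypothetical.

```python
def sens_spec(cases, controls, cutoff):
    """Sensitivity and specificity when value >= cutoff counts as positive."""
    sensitivity = sum(v >= cutoff for v in cases) / len(cases)
    specificity = sum(v < cutoff for v in controls) / len(controls)
    return sensitivity, specificity

def choose_cutoff(cases, controls, min_specificity=0.5):
    """Pick the cut-off with the highest sensitivity whose specificity exceeds the floor."""
    best = None
    for cutoff in sorted(set(cases) | set(controls)):
        sensitivity, specificity = sens_spec(cases, controls, cutoff)
        if specificity > min_specificity:
            candidate = (sensitivity, specificity, cutoff)
            if best is None or candidate[:2] > best[:2]:
                best = candidate
    return best  # (sensitivity, specificity, cutoff), or None if no cut-off qualifies
```

The cut-off chosen on the training set would then be passed unchanged to `sens_spec` with the test set values, which is what validation on independent data amounts to.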
In our study, the area under the ROC curve (AUC) was calculated and evaluated based on previously published guidelines (Akobeng, 2007; Bossuyt, 2009). The clinical value of a laboratory test with AUC values between 0 and 0.5, 0.5–0.7, 0.7–0.9 or >0.9 can be defined as zero, limited, moderate and high, respectively (Bossuyt, 2009). Taking into account our clinical perspective on the requirements for a diagnostic test for endometriosis, as explained in the section Introduction and published before (D'Hooghe et al., 2006), our data analysis focused on the need for a diagnostic test with a high sensitivity (>80%) and an acceptable specificity (>50%).
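As an illustration of these categories, the sketch below computes the AUC as the probability that a randomly chosen case has a higher value than a randomly chosen control (ties counting one-half) and maps it to the labels quoted above. How values exactly on a boundary (0.5, 0.7, 0.9) are assigned is an assumption, since the quoted ranges share their end points.

```python
def auc(cases, controls):
    """AUC as the probability that a random case scores above a random control."""
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5  # ties count one-half
    return wins / (len(cases) * len(controls))

def clinical_value(auc_value):
    """Map an AUC to the categories quoted in the text (boundary handling assumed)."""
    if auc_value > 0.9:
        return "high"
    if auc_value > 0.7:
        return "moderate"
    if auc_value > 0.5:
        return "limited"
    return "zero"
```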
Multivariate Statistical Analysis
Multivariate analysis was carried out to identify whether a panel of biomarkers could increase the sensitivity and specificity of the non-invasive test for endometriosis when compared with univariate analysis. We implemented and applied univariate and multivariate biomarker selection methods and used the selected biomarkers in multivariate classification to assess their performance. Two fundamentally different classifiers, multivariate logistic regression and least squares support vector machines (LS-SVM), were used, as published before (Mihalyi et al., 2010). Compared with multivariate logistic regression, LS-SVM is less sensitive to the influence of irrelevant features, as it has an internal mechanism to minimize their effect. Agreement between these two classifiers strongly indicates robustness of the selected biomarker panel (Pochet and Suykens, 2006).
Selection of Biomarkers Based on the Training Data Set
Three biomarker selection methods were used to obtain the most accurate biomarker panel. For both univariate and multivariate biomarker selection, bootstrap samples (70% of the training data set, selected in a stratified manner) were repeatedly drawn from the training data set over 500 iterations, with the whole training set randomized before every iteration (François et al., 2007). In each run, the biomarker selection method was applied to the bootstrap sample to collect the corresponding statistics, and only biomarkers that were significant across repetitions were kept.
When the univariate biomarker selection scheme was applied, only biomarkers that were significant according to the Mann–Whitney test in at least 70% of randomizations were selected (univariate approach; Supplementary data, Tables SI and SII).
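The univariate selection loop can be sketched as below, under several stated assumptions: each 70% subsample is drawn without replacement within each class, the Mann–Whitney P-value uses a normal approximation without tie or continuity correction, and the data layout (a dict of marker name to per-sample values, with 0/1 labels) and all function names are hypothetical. The authors' actual implementation is not described at this level of detail.

```python
import random
from statistics import NormalDist

def mann_whitney_p(x, y):
    """Two-sided Mann-Whitney P-value (normal approximation, no tie correction)."""
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    n1, n2 = len(x), len(y)
    mean_u = n1 * n2 / 2.0
    sd_u = (n1 * n2 * (n1 + n2 + 1) / 12.0) ** 0.5
    z = (u - mean_u) / sd_u
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

def select_stable_markers(data, labels, n_runs=500, fraction=0.7,
                          alpha=0.05, min_frequency=0.7, seed=0):
    """Keep markers significant in at least min_frequency of stratified subsamples."""
    rng = random.Random(seed)
    cases = [i for i, lab in enumerate(labels) if lab == 1]
    controls = [i for i, lab in enumerate(labels) if lab == 0]
    hits = {name: 0 for name in data}
    for _ in range(n_runs):
        # Stratified subsample: the same fraction from cases and from controls.
        sub_cases = rng.sample(cases, round(fraction * len(cases)))
        sub_controls = rng.sample(controls, round(fraction * len(controls)))
        for name, values in data.items():
            p = mann_whitney_p([values[i] for i in sub_cases],
                               [values[i] for i in sub_controls])
            if p < alpha:
                hits[name] += 1
    return [name for name, count in hits.items() if count / n_runs >= min_frequency]
```

Requiring significance in most subsamples, rather than once on the full training set, filters out markers whose apparent effect depends on a handful of patients.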
When the multivariate biomarker selection scheme was applied, two approaches based on multivariate stepwise logistic regression with the Akaike information criterion (AIC) were used to account for possible correlation between biomarkers. The AIC was chosen for its robustness in prediction (Agresti, 2002). In the first approach, only the biomarkers that appeared frequently in the regression models across all runs (in at least 70% of randomizations) were passed to the classification step (Multivariate approach 1; Supplementary data, Tables SI and SII).
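For reference, the standard definition of the Akaike information criterion for a fitted model with $k$ estimated parameters and maximized likelihood $\hat{L}$ is given below. The symbols $k$ and $\hat{L}$ are introduced here for illustration and do not appear in the source.

```latex
\mathrm{AIC} = 2k - 2\ln\hat{L}
```

In a stepwise procedure, a biomarker is typically added to or removed from the model when that change lowers the AIC, so each extra biomarker has to improve the fit enough to offset the penalty of 2 per parameter.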
In the second approach, all multivariate logistic regression models containing the most frequent biomarkers, as determined in the first approach, were selected (Multivariate approach 2; Table VI; Supplementary data, Tables SI and SII). All biomarkers appearing in the best of these models were then considered informative.
Classification and Validation
Using the biomarkers selected in the previous step, we applied two classification algorithms (multivariate logistic regression and LS-SVM) to the training set and the independent test set separately to estimate several measures of performance: accuracy, area under the ROC curve, sensitivity, specificity, positive and negative predictive values (PPV and NPV), positive and negative likelihood ratios (LR) and diagnostic odds ratio (DOR).
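All of these measures except the area under the ROC curve follow from the four cells of a 2×2 confusion table. The sketch below uses the standard definitions; the function name and the example counts are hypothetical and are not results from this study.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard diagnostic accuracy measures from confusion-table counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    lr_pos = sensitivity / (1 - specificity)  # undefined if specificity is 1
    lr_neg = (1 - sensitivity) / specificity  # undefined if specificity is 0
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "lr_pos": lr_pos,
        "lr_neg": lr_neg,
        "dor": lr_pos / lr_neg,  # undefined if sensitivity is 1
    }

# Hypothetical counts for illustration only.
metrics = diagnostic_metrics(tp=80, fp=40, fn=20, tn=60)
```

Unlike sensitivity and specificity, PPV and NPV depend on the proportion of cases in the sample, so they apply to the 66% prevalence of this study population and not to a general population.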