Mutational profile through next-generation sequencing analysis
A total of 1240 candidate variants were identified in 46 genes (Supplementary Table S3). Of 342 cases, 330 (96.5%) harbored at least one mutation. First, among de novo DLBCL, NOS patients, the most frequently mutated genes were TP53 (28.9%), PIM1 (27.2%), KMT2D (25.9%), MYD88 (23.0%), and CD79B (22.6%) (Supplementary Fig. S1), and MYD88 mutations were significantly more frequent than in relapsed DLBCL cases (P = 0.014). In relapsed DLBCL cases, KMT2D and KRAS mutations were significantly more frequent than in de novo cases (P = 0.031 and P = 0.012, respectively). Second, with respect to functional gene groups, mutations in genes associated with chromatin remodeling, including KMT2D and CREBBP, and in genes associated with apoptosis resistance, including BCL2, were significantly enriched in transformed follicular lymphoma (tFL) compared with de novo cases (P = 2.20 × 10⁻⁴, P = 3.51 × 10⁻⁴, and P = 1.93 × 10⁻⁵, respectively). In contrast, mutations in immune evasion-related genes, including B2M, CD58, and CD70, were frequently detected in de novo cases (16.3%, 12.1%, and 9.6%, respectively) but not in tFL cases in our cohort (P = 0.039, P = 0.095, and P = 0.158, respectively). Third, by comparing mutation patterns between the GCB and non-GCB DLBCL subgroups across all de novo, relapsed, and transformed cases, we determined that the BCL2 translocation, the MYC translocation, and the CREBBP, TNFRSF14, BCL2, EZH2, SGK1, and ID3 mutations were significantly more frequent in GCB DLBCL (P < 0.001), whereas CD79B mutations, the MYD88 L265P mutation, and the BCL6 translocation were more common in non-GCB DLBCL (P < 0.001) (Fig. 1a).
In addition, we specifically analyzed the CD79B mutation pattern. Overall, the variant frequency of CD79B was relatively higher than that reported in recent related studies [3,4,5, 23, 24]. For the typical hotspot variant CD79B Y196, the most frequent co-occurring mutation in our de novo DLBCL, NOS cases was MYD88 L265P (n = 37; adjusted P value 1.09 × 10⁻⁹), which was mainly identified in the non-GCB subtype (4.5% of GCB cases vs. 22.6% of non-GCB cases, P = 5.87 × 10⁻⁷) (Fig. 1b). We also found that mutations outside codon Y196 accounted for 41% (33/80) of all CD79B mutations. Moreover, through validation by Sanger sequencing of tumor and paired normal tissue DNA, we identified several novel hotspot intronic splice site mutations, including c.550-1G>A, c.550-1G>C, c.550-3_552del, c.549+1G>A, c.549+1G>C, and c.540_549+1del (Fig. 1d). To examine the molecular impact of the c.550-1G>A mutation, we performed RNA sequencing, which revealed that this mutation exposed two novel potential splice acceptor sites, thereby producing two truncated proteins (Fig. 1e). Furthermore, CD79B truncating mutations were mutually exclusive with CD79B Y196 (adjusted P value 0.011). A similar tendency was observed between CD79B truncating mutations and MYD88 L265P, although it did not reach statistical significance (adjusted P value 0.14) (Fig. 1b, Fig. 1c).
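Co-occurrence and mutual exclusivity between two alterations are commonly assessed on a 2 × 2 contingency table. The following is a minimal sketch, assuming a two-sided Fisher's exact test (this section does not state which test produced the adjusted P values, and no multiple-testing adjustment is applied here). The function names and the counts in the usage lines are hypothetical and are not taken from the cohort.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    a: both alterations present, b: only the first,
    c: only the second,         d: neither.
    """
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)

def direction(a, b, c, d):
    """Sign of the association, from the cross-product (odds) ratio."""
    if a * d > b * c:
        return "co-occurrence"
    if a * d < b * c:
        return "mutual exclusivity"
    return "no association"

# Hypothetical counts for illustration only.
table = (12, 8, 10, 70)
p_value = fisher_exact_two_sided(*table)
print(direction(*table), p_value)
```

When many gene pairs are tested, the raw P values would still need an adjustment for multiple comparisons before being reported as adjusted P values.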
Identification of genetic signatures via iterative random forest (RF) algorithm
Based on the targeted sequencing results and FISH findings, we sought to identify several representative genetic signatures that were not mutually exclusive, rather than categorizing subjects into mutually exclusive, distinct subgroups. We therefore seeded our analysis with five genetic alterations involved in the most important cellular signaling pathways in DLBCL pathogenesis, i.e. cellular proliferation (MYC translocation), apoptosis resistance (BCL2 translocation), disruption of immune cell differentiation (BCL6 translocation), and activation of the inflammation pathway (CD79B Y196 and MYD88 L265P). All five genetic alterations were specifically enriched in either GCB or non-GCB subtype DLBCL patients (> 20% positive in GCB or non-GCB DLBCL patients). In addition, these alterations exhibited the most distinctive frequencies between the GCB and non-GCB DLBCL subtypes by Fisher’s exact test (Fig. 1a). Thus, using these five features, we initially defined four non-mutually exclusive genetic signatures: 1) the MYC-trans signature, with MYC translocation (n = 54); 2) the BCL2-trans signature, with BCL2 translocation (n = 59); 3) the BCL6-trans signature, with BCL6 translocation (n = 91); and 4) the MC signature, with MYD88 L265P and/or CD79B Y196 mutations (n = 72) (Fig. 1f).
Among these four signatures, the MC signature combined the CD79B Y196 and MYD88 L265P variants because they not only presented as hotspot mutations in DLBCL patients but also showed a statistically significant tendency to co-occur (adjusted P value 1.09 × 10⁻⁹). In addition, previous studies have shown that both variants result in constitutive activation of the NF-κB signaling pathway [5]. Inspired by the study conducted by R. Schmitz et al., we aimed to extend each genetic signature as far as our set of genetic features allowed while maintaining the pattern suggested by the initial genetic signature. To address this semi-supervised problem, we developed an iterative random forest (RF) algorithm (Supplementary Appendix). The label of each genetic signature gradually propagated among cases until convergence was reached (Supplementary Table S4; Fig. 1g). An additional 8 (14.8%), 10 (16.9%), 17 (18.7%), and 43 (59.7%) cases were predicted to exhibit the MYC-trans, BCL2-trans, BCL6-trans, and MC signatures, respectively (percentages relative to the initial case count of each signature), suggesting that the initial definition of the MC signature might be conservative. As a result, 252 of 342 cases (73.7%) were finally confirmed to be associated with at least one genetic signature.
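The label propagation idea can be illustrated with a generic self-training loop. The sketch below is not the algorithm from the Supplementary Appendix, which is not reproduced in this section. It assumes binary mutation features, treats cases outside the seed definition as provisional negatives, and uses a toy random forest written with the standard library only. All function names, the threshold, the tree depth, and the example data are hypothetical.

```python
import random

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def build_tree(X, y, rng, depth=3):
    """Grow a small tree on binary features; a leaf stores the positive fraction."""
    if depth == 0 or len(set(y)) <= 1:
        return sum(y) / len(y)
    n_feat = len(X[0])
    best = None
    # Consider a random subset of about sqrt(n_feat) features at each node.
    for j in rng.sample(range(n_feat), max(1, int(n_feat ** 0.5))):
        left = [i for i in range(len(X)) if X[i][j] == 0]
        right = [i for i in range(len(X)) if X[i][j] == 1]
        if not left or not right:
            continue
        score = (len(left) * gini([y[i] for i in left])
                 + len(right) * gini([y[i] for i in right])) / len(y)
        if best is None or score < best[0]:
            best = (score, j, left, right)
    if best is None:
        return sum(y) / len(y)
    _, j, left, right = best
    return (j,
            build_tree([X[i] for i in left], [y[i] for i in left], rng, depth - 1),
            build_tree([X[i] for i in right], [y[i] for i in right], rng, depth - 1))

def tree_predict(tree, x):
    """Walk from the root to a leaf and return its positive fraction."""
    while isinstance(tree, tuple):
        j, lo, hi = tree
        tree = hi if x[j] else lo
    return tree

def forest_fit(X, y, rng, n_trees=50):
    """Fit each tree on a bootstrap resample of the cases."""
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]
        trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx], rng))
    return trees

def forest_proba(trees, x):
    return sum(tree_predict(t, x) for t in trees) / len(trees)

def iterative_rf(X, seed_labels, threshold=0.8, max_iter=10, seed=0):
    """Propagate a signature label from seed cases until the labels stop changing."""
    rng = random.Random(seed)
    labels = list(seed_labels)
    for _ in range(max_iter):
        trees = forest_fit(X, labels, rng)
        # Seed and previously propagated positives are kept; others join if confident.
        new = [1 if lab == 1 or forest_proba(trees, x) >= threshold else 0
               for x, lab in zip(X, labels)]
        if new == labels:
            break
        labels = new
    return labels

# Hypothetical data: rows are cases, columns are binary mutation features.
X = [[1, 1, 0]] * 6 + [[0, 0, 1]] * 6
seed_labels = [1, 1, 1, 1, 1, 0] + [0] * 6
print(iterative_rf(X, seed_labels, threshold=0.5))
```

In this toy input, the sixth case shares its mutation pattern with the seed cases, so it is the only candidate for propagation. A real analysis would run one such loop per signature, which keeps the signatures non-mutually exclusive.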
Next, we investigated other genetic mutations statistically associated with each of these genetic signatures. As illustrated in Fig. 2, the genetic mutations of each case were combined, clustered within the different genetic signatures, and displayed in a factorized mutational heatmap. First, MYC and ID3 mutations were associated with the MYC-trans signature (P < 0.001), and 40% (8/20) of cases with an isolated MYC-trans signature harbored mutations in the ID3-TCF3-CCND3 pathway. All MYC hypermutations were identified in cases with the MYC-trans signature (20/20, 100%), whereas MYC non-hypermutations were common in cases with either the MYC-trans signature (10/25, 40%) or the BCL6-trans signature (15/25, 60%). Second, BCL2, EZH2, CREBBP, STAT6, and KMT2D mutations were significantly related to the BCL2-trans signature (P < 0.001). Although the BCL2 mutation was associated with the BCL2-trans signature, cases harboring the BCL2 hypermutation usually carried a combined MYC-trans and BCL2-trans signature (6/6, 100%). For chromatin modification-associated genes such as KMT2D and CREBBP, co-occurring mutations in both genes generally indicated a BCL2-trans signature (21/24, 87.5%). Third, for the BCL6-trans signature, the CD70, KLF2, NOTCH2, and RRAGC mutations were specifically identified (P < 0.001). Although the CCND3 mutation was more specifically associated with the MYC-trans signature (P = 0.001), it was also frequent in cases with the BCL6-trans signature (16/108, 14.8%). A vast majority of KLF2 zinc finger mutations (15/22, 68.2%) were identified in cases with BCL6 translocation (or BCL6-trans signature, 21/22, 95.5%), an association that has not been previously reported. Finally, for the MC signature, in addition to the CD79B Y196 and MYD88 L265P mutations, other mutations, such as PIM1 and PRDM1, were also significantly related to the MC signature (P < 0.001). XPO1 E571K, a hotspot mutation in chronic lymphocytic leukemia (CLL) and PMBL, was also frequently identified in cases with the MC signature and was usually accompanied by the BCL6-trans signature [25, 26].
Model comparison with classical DLBCL subtype classifier and its prognostic significance
To validate our genetic classification algorithm, we compared our model with the classical DLBCL genetic classifier developed by Schmitz et al. [4] in the 239 de novo DLBCL, NOS cases of our study cohort. As illustrated in Fig. 3a, the Schmitz classifier assigned 65% of all cases (n = 155) to four genetic subtypes (MCD n = 66, BN2 n = 55, EZB n = 30, N1 n = 4). In comparison, our model assigned 75% of all cases (n = 175) to at least one signature. COO classification also showed a similar distribution of GCB and non-GCB types between the two models (Fig. 3b). For each genetic subtype of Schmitz et al. (Fig. 3c), the majority of cases within the MCD subtype carried the MC signature (63 of 66, 95.4%), and consistent results were seen for the BCL6-trans signature within the BN2 subtype (54 of 55, 98.2%) and the BCL2-trans signature within the EZB subtype (29 of 30, 96.7%). Despite this consistency between the two models, a portion of the DLBCL cases within each subtype of Schmitz’s model carried two or more signatures. Within the MCD and BN2 subtypes, 15 of 66 (22.7%) and 19 of 55 (34.5%) patients, respectively, carried both the MC and BCL6-trans signatures. Within the EZB subtype, 6 of 30 (20.0%) patients carried both the BCL2-trans and BCL6-trans signatures.
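The comparison above amounts to cross-tabulating a single-label subtype against a multi-label set of signatures. A minimal sketch of that tabulation follows, assuming each case is recorded as a subtype plus the set of signatures it carries. The subtype-to-signature pairing is the one described in the text; the function name and the example cases are hypothetical.

```python
# Signature expected for each subtype, as paired in the text above.
EXPECTED = {"MCD": "MC", "BN2": "BCL6-trans", "EZB": "BCL2-trans"}

def summarize(cases):
    """cases: iterable of (subtype, set of signatures carried by the case).

    Returns, per subtype, the number of cases, the number carrying the
    expected signature, and the number carrying two or more signatures.
    """
    summary = {}
    for subtype, signatures in cases:
        row = summary.setdefault(subtype, {"n": 0, "concordant": 0, "multi": 0})
        row["n"] += 1
        if EXPECTED.get(subtype) in signatures:
            row["concordant"] += 1
        if len(signatures) >= 2:
            row["multi"] += 1
    return summary

# Hypothetical cases for illustration only.
example_cases = [
    ("MCD", {"MC"}),
    ("MCD", {"MC", "BCL6-trans"}),
    ("BN2", {"BCL6-trans"}),
    ("EZB", {"BCL2-trans", "BCL6-trans"}),
]
for subtype, row in summarize(example_cases).items():
    print(subtype, row)
```

Keeping signatures as a set per case is what allows a case to be concordant with its subtype and still be counted as carrying multiple signatures.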
To evaluate the prognostic value of our genetic subtype model, we selected all de novo patients with large B-cell lymphoma who received R-CHOP or R-CHOP-like chemotherapy (n = 280, maximum follow-up 60 months, median follow-up 26 months). We then constructed a multivariate Cox proportional hazards regression model with both genetic signatures and IPI scores as variables. The MYC-trans signature was the most unfavorable genetic signature, with a hazard ratio (HR) of 2.00 for OS relative to cases without the MYC-trans signature (P = 0.006) (Supplementary Table S5). Patients with a BCL2-trans signature had a relatively favorable 5-year PFS, with borderline significance (P = 0.087). Given the non-mutually exclusive nature of our four genetic signatures and several recent studies [3, 5, 11, 12, 27], we explored how prognosis in de novo DLBCL cases differed with the number of genetic signatures carried. First, to exclude the potential influence of confounding factors, especially the IPI score, we examined the distribution of IPI score groups (low 0–1, intermediate 2–3, high 4–5) between groups of patients with different numbers of genetic signatures (0-sig, 1-sig, 2-sig, 3-sig). No statistical differences in IPI level distribution were identified between the 0-sig, 1-sig, 2-sig, and 3-sig patient groups (P > 0.05, Chi-square and Fisher exact tests with Bonferroni adjustment). As reflected by the 5-year OS and PFS (Fig. 4a-b), individuals carrying three signatures had a much worse prognosis than individuals without any genetic signature (OS: P = 0.0084; PFS: P = 0.3274), whereas patients with only one genetic signature showed no significant difference in prognosis compared with those without any signature (Fig. 4c-d). In addition, further subgroup survival analysis indicated that within the EZB subtype of the Schmitz model, patients carrying the BCL2-trans signature plus the BCL6-trans or MC signature had a significantly inferior prognosis compared with patients carrying the BCL2-trans signature only (OS: P = 0.002; PFS: P = 0.039) (Fig. 4e-f). However, no prognostic differences were identified between patients carrying different numbers of signatures within the MCD, BN2, or N1 subgroups. These findings provide evidence that the non-mutually exclusive genetic signatures have cumulative prognostic influence and that patient heterogeneity persists within the traditional mutually exclusive classification model for DLBCL patients in our cohort, which requires further confirmation in larger multi-center cohort studies.
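Survival comparisons such as those in Fig. 4 are typically based on survival curves estimated per group. The sketch below assumes a Kaplan-Meier estimator, which this section does not name explicitly, and shows how the survival probability is updated at each event time while censored patients only leave the risk set. The function name and the follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times:  follow-up time per patient (e.g. months)
    events: 1 if the event (death or progression) was observed, 0 if censored
    Returns a list of (time, survival probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / n_at_risk
            curve.append((t, survival))
        # Both events and censored patients leave the risk set after time t.
        n_at_risk -= n_leaving
    return curve

# Hypothetical follow-up data (months) for one group of patients.
months = [5, 10, 10, 20, 30]
observed = [1, 1, 0, 1, 0]
for t, s in kaplan_meier(months, observed):
    print(t, round(s, 3))
```

One curve would be computed for each signature-count group (0-sig to 3-sig), and the groups would then be compared with a test such as the log-rank test, which is outside the scope of this sketch.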