Unraveling the Correlation between Raman and Photoluminescence in Monolayer MoS2 through Machine-Learning Models

Two-dimensional (2D) transition metal dichalcogenides (TMDCs) with intense and tunable photoluminescence (PL) have opened up new opportunities for optoelectronic and photonic applications such as light-emitting diodes, photodetectors, and single-photon emitters. Among the standard characterization tools for 2D materials, Raman spectroscopy stands out as a fast and non-destructive technique capable of probing material crystallinity and perturbations such as doping and strain. However, a comprehensive understanding of the correlation between photoluminescence and Raman spectra in monolayer MoS2 remains elusive due to its highly nonlinear nature. Here, we systematically explore the connections between PL signatures and Raman modes, providing comprehensive insights into the physical mechanisms correlating PL and Raman features. Our analysis further disentangles the strain and doping contributions from the Raman spectra through machine-learning models. First, we deploy a DenseNet to predict PL maps from spatial Raman maps. Second, we apply a gradient-boosted trees model (XGBoost) with Shapley additive explanations (SHAP) to quantify the impact of individual Raman features on PL features, allowing us to link the strain and doping of monolayer MoS2. Last, we adopt a support vector machine (SVM) to project PL features onto Raman frequencies. Our work may serve as a methodology for applying machine learning to the characterization of 2D materials and provide the knowledge for tuning and synthesizing 2D semiconductors for high-yield photoluminescence.


Introduction
2D materials possess exotic physical and chemical properties due to their ultrathin thickness and ultrahigh surface-to-volume ratio. Monolayer transition metal dichalcogenide (TMDC) semiconductors exhibit tunable photoluminescence (PL), which can be manipulated by external perturbations such as strain and doping. For instance, monolayer MoS2 possesses a strain-tunable band structure, exhibiting broadband optical absorption for photovoltaics [1] and promising single-photon emission for quantum information [2] applications. Monolayer MoS2 also shows near-unity PL quantum yield induced by chemical [3] or electrostatic doping, [4] enabling the development of efficient light-emitting diodes [5] or lasers. [6] To probe external perturbations, Raman spectroscopy represents a powerful and non-destructive tool to quantitatively determine strain and doping effects on MoS2. Although the strain and doping effects on MoS2 have been extensively investigated through Raman and PL spectroscopy, most studies focused on these perturbations independently (see a summary of previous results in Tables S1 and S2, Supporting Information, for strain and doping, respectively). Discovering the hidden correlations between Raman and PL spectra can enable us to understand the strain and doping effects comprehensively; however, these connections have not been established yet.
Recently, machine learning has significantly propelled the advancement of computer vision and natural language processing, and has contributed to many scientific fields, such as biology, [7] mathematics, [8] and materials science. [9] In addition, multiple research efforts have applied machine-learning methods to 2D materials research, [10,11] but at this stage the investigations are still preliminary. In this work, we leveraged a collection of machine-learning techniques to discover the hidden patterns between the Raman and PL spectra of MoS2, which provide insights into the physical mechanisms connecting PL and Raman features. First, we deployed a DenseNet model with high accuracy to predict PL features from Raman spectral maps. Second, by combining an XGBoost model with the SHAP explainer, we bridged Raman and PL features with the global importance and local explanations based on the feature attributions. Finally, we projected PL features onto Raman frequencies using an SVM model and combined them with the probability density estimated by a Gaussian mixture model (GMM) to disentangle the strain and doping effects. Our work demonstrates how machine-learning models can be used to establish hidden connections between different characterizations of 2D materials.

Results and Discussion
The diagrammatic overview in Figure 1a shows the knowledge path of the machine-learning models (represented by the red lines), which starts with experimental data obtained from Raman or PL, links to the knowledge about the material's properties, and ends at the external perturbations and structural defects. This approach allows us to use the information from previous studies that investigated changes in the Raman and PL spectra by controlling a single external perturbation, that is, either strain (the green line) or doping (the blue line), to form a more comprehensive understanding of the combined effects of doping and strain on MoS2 monolayers. Also, statistical data analysis allows us to significantly reduce bias from samples or experimental design. In our framework, the statistical analysis/machine-learning model path establishes the connections between Raman and PL features, crystal and electronic structures, and strain and electrostatic doping.
The MoS2 monolayers investigated in our work consisted of chemical vapor deposition (CVD)-grown flakes with triangular, random, and hexagonal shapes (see Section S1.1 and Figure S1, Supporting Information) as well as mechanically exfoliated flakes (see Section S1.2 and Figure S2, Supporting Information) obtained from naturally grown and synthetic crystals. For each flake, we measured Raman and PL spectral maps using 532 nm laser excitation (see Section S1.3, Supporting Information). The Raman spectra of MoS2 monolayers exhibit three prominent features, as shown in Figure 1b, corresponding to the first-order in-plane E′ mode (≈385 cm⁻¹) and out-of-plane A1′ mode (≈405 cm⁻¹), as well as the second-order double-resonance 2LA mode (≈450 cm⁻¹). Each mode can be fitted using a Voigt function, as shown in Figure S3, Supporting Information, from which we can extract three parameters: frequency (Freq, ω), full-width-at-half-maximum (FWHM, Γ), and intensity (Int, I). The distribution of Raman frequencies for the investigated MoS2 monolayers is shown in Figure S4a, Supporting Information, revealing the softening of ω_A1′ in naturally grown MoS2 and the softening of ω_E′ in hexagonal-shaped MoS2 relative to flakes exfoliated from synthetic MoS2. The PL signals from MoS2 shown in Figure 1c are dominated by trions due to the laser power of ≈1 mW [12] used in our measurements (see Section S1.3, Supporting Information). From inspection of Figure 1c, we note that the hexagonal and randomly shaped MoS2 flakes typically exhibit a lower PL energy with broader FWHM and weaker intensity than the triangle-shaped and exfoliated MoS2 crystals. To further investigate the PL features, we plotted the PL FWHM as a function of the normalized PL intensity in Figure 1d, showing a clear correlation between a stronger PL intensity and a narrower PL FWHM, which is typical of synthetic and triangle-shaped crystals.
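The extraction of (frequency, FWHM, intensity) from each Raman mode can be illustrated with a small peak model. The sketch below is not the fitting code used in this work: it swaps the true Voigt profile for the simpler pseudo-Voigt approximation (a weighted sum of a Lorentzian and a Gaussian sharing one FWHM), and the mixing weight `eta`, the widths, and the intensities in `MODES` are hypothetical values chosen only for illustration; only the approximate mode frequencies come from the text.

```python
import math

def pseudo_voigt(x, freq, fwhm, intensity, eta=0.5):
    """Pseudo-Voigt peak: eta * Lorentzian + (1 - eta) * Gaussian.

    Both components share the same center (freq) and FWHM, so the three
    parameters map directly onto (omega, Gamma, I) in the text.
    eta is a hypothetical mixing weight, not a value from the paper.
    """
    u = (x - freq) / fwhm
    lorentz = 1.0 / (1.0 + 4.0 * u * u)
    gauss = math.exp(-4.0 * math.log(2.0) * u * u)
    return intensity * (eta * lorentz + (1.0 - eta) * gauss)

def raman_model(x, modes):
    """Sum of independent peaks; modes is a list of (freq, fwhm, intensity)."""
    return sum(pseudo_voigt(x, *m) for m in modes)

# Approximate E', A1' and 2LA frequencies (cm^-1) from the text;
# the widths and intensities are illustrative assumptions.
MODES = [(385.0, 4.0, 1.0), (405.0, 5.0, 1.2), (450.0, 20.0, 0.3)]

spectrum = [(x, raman_model(float(x), MODES)) for x in range(360, 481)]
```

In practice the parameters would be obtained by least-squares fitting of such a model to each measured spectrum; the sketch only shows how the three fitted quantities define a peak.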
Since the PL spectrum is fitted with a Voigt profile, we plotted the FWHM as a function of the intensity with a fixed integrated area of 0.1, displayed as the blue dots in Figure S5, Supporting Information. This trend can be well fitted by a reciprocal function (Equation (1)), shown as the blue curve in Figure S5, Supporting Information. Owing to the non-ideal subtraction of the spectral background, the red curve represents the blue curve with intensity and FWHM backgrounds of 0.25 and 0.08 in Figure 1d, respectively. As can be seen, the data distribution in Figure 1d matches the red curve obtained from Equation (2). Similarly, Figure 1e displays a reverse reciprocal function distribution (Equation (3)), indicating that a stronger PL intensity is accompanied by a higher PL energy:
$$E_{PL} = 1.855 - \frac{0.0047}{I_{PL} - 0.25} \qquad (3)$$
Furthermore, the PL energy has a linear relation (Equation (4)) with the PL FWHM, as shown in Figure 1f:
$$E_{PL} = 1.879 - 0.3\,\Gamma_{PL} \qquad (4)$$
Overall, our combined results indicate that the triangular, natural, and synthetic MoS2 flakes exhibit stronger, narrower, and blueshifted PL peaks, indicating higher crystal quality compared to the random and hexagonal MoS2 flakes.
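The three PL trends described above are not independent, which a short derivation makes explicit. This is a sketch under stated assumptions: the reciprocal constant $a$ is a placeholder whose value is not given in the text, the offsets 0.25 and 0.08 are the intensity and FWHM backgrounds quoted above, and Equations (3) and (4) are taken to read $E_{PL} = 1.855 - 0.0047/(I_{PL} - 0.25)$ and $E_{PL} = 1.879 - 0.3\,\Gamma_{PL}$ (energies and widths in eV), with the signs inferred from the stated trends.

```latex
\begin{align}
\Gamma_{PL} &= \frac{a}{I_{PL} - 0.25} + 0.08
  && \text{(assumed form of Equation (2))} \\
E_{PL} &= 1.879 - 0.3\,\Gamma_{PL}
  && \text{(assumed form of Equation (4))} \\
       &= 1.879 - 0.3 \times 0.08 - \frac{0.3\,a}{I_{PL} - 0.25} \\
       &= 1.855 - \frac{0.3\,a}{I_{PL} - 0.25}
  && \text{(same form as Equation (3))}
\end{align}
```

Under these assumptions, the 1.855 eV intercept of the energy-intensity trend is simply the linear energy-FWHM trend evaluated at the FWHM background of 0.08 eV, and the two reciprocal coefficients agree if $0.3\,a = 0.0047$, that is, $a \approx 0.016$.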
To establish correlations between our Raman and PL data, we leverage several machine-learning models to reveal hidden patterns and bridge physical phenomena. Among machine-learning models, deep convolutional neural networks (CNNs) are extensively used in many visual recognition tasks, allowing us to extract invaluable information from different imaging systems ranging from biomedical [13] to transmission electron microscopy [14] and hyperspectral images. [15] Similarly, spectral maps can be considered an image-based dataset with as many channels as there are spectral points.
Here, by using CNNs, we correlate the Raman spectra with the corresponding PL features for the CVD-grown and exfoliated MoS2 flakes. The spectral dimensions of the Raman and PL spectra are reduced by the fitting process (see Section S1.6, Supporting Information). The spectral-map dataset for a hexagonal-shaped MoS2 flake is depicted in Figure S7, Supporting Information, including the optical microscopic image, the Raman maps with eight channels as the input, and the corresponding PL maps with three channels as the output. Note that Γ_2LA is removed from the input channels due to the high variance of its data distribution.
In contrast to traditional images with a fixed spatial dimension, our spectral images have spatial dimensions varying from 32-by-32 to 48-by-48 pixels, depending on the crystal size of MoS2. Therefore, we cropped the Raman spectral images into smaller patches with odd spatial dimensions as the inputs and labeled the PL of the central pixel as the output, as shown in Figure S8, Supporting Information. Next, we deployed a DenseNet [16] model to predict three PL features from Raman spectral images. Compared to many powerful CNN models, such as U-Net [17] and SegNet, [18] DenseNet requires fewer down-sampling layers and fewer parameters with comparable accuracy, making it more suitable for a small dataset and small pixelated inputs. Figure 2a shows the schematic illustration of the DenseNet with two dense blocks and a transition layer (see Section S1.8, Supporting Information). To reveal the correlation between the spatial information of the Raman patched maps and the DenseNet performance, we input various Raman patched maps, from a local spatial size of 1-by-1 to a global spatial size of 11-by-11, for the intensity of a triangle-shaped MoS2, as shown in Figure 2b-d. The 1-by-1 patch size shows a higher error of 21.48% owing to the limited spatial information in the Raman maps; on the other hand, the slightly higher relative error of 11.86% for the 11-by-11 patch size could be attributed to the zero-padding around the edges of the patched input. Among all the patch sizes, the 5-by-5 patch size exhibits the lowest relative absolute error (RAE, see Section S1.8, Supporting Information) of 10.31% on the PL intensity of a triangle-shaped MoS2, since it collects neighboring Raman signals while excluding irrelevant spatial information. The middle column of Figure 2e-g displays the typical predictions of the PL energy, FWHM, and intensity from the trained DenseNet model for a random-shaped MoS2 with a 5-by-5 patch size. To evaluate the model performance, we used the experimentally measured PL maps as the ground truth in the left column and calculated the relative error in the right column of Figure 2e-g. The RAEs for PL energy and FWHM are low (0.25% and 4.61%, respectively), whereas the higher RAE of 10.93% for PL intensity could be due to non-ideal experimental conditions and processing errors, such as the spectroscopic conditions, the spectral background subtraction, and the fitting process.
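The patching step described above (odd-sized Raman windows, zero-padded at the flake edges, each labeled with the PL features of its central pixel) can be sketched in plain Python. This is a minimal illustration rather than the preprocessing code used in this work: the function name `extract_patches`, the nested-list data layout, and the toy 4-by-4 maps are assumptions made for the example.

```python
def extract_patches(raman_map, pl_map, patch=5):
    """Build (window, label) training pairs from co-registered maps.

    raman_map[r][c] is a list of Raman channels at pixel (r, c);
    pl_map[r][c] holds the PL features at the same pixel.
    Pixels outside the map are zero-padded.
    """
    if patch % 2 == 0:
        raise ValueError("patch size must be odd so a central pixel exists")
    half = patch // 2
    rows, cols = len(raman_map), len(raman_map[0])
    zero = [0.0] * len(raman_map[0][0])
    samples = []
    for r in range(rows):
        for c in range(cols):
            window = []
            for dr in range(-half, half + 1):
                row = []
                for dc in range(-half, half + 1):
                    rr, cc = r + dr, c + dc
                    inside = 0 <= rr < rows and 0 <= cc < cols
                    row.append(list(raman_map[rr][cc]) if inside else list(zero))
                window.append(row)
            # The label is the PL of the central pixel of the window.
            samples.append((window, pl_map[r][c]))
    return samples

# Toy 4-by-4 map with eight Raman channels and three PL features per pixel
# (all values are synthetic placeholders).
raman = [[[float(r * 4 + c)] * 8 for c in range(4)] for r in range(4)]
pl = [[(1.85, 0.10, float(r * 4 + c)) for c in range(4)] for r in range(4)]
samples = extract_patches(raman, pl, patch=3)
```

With a 1-by-1 patch each sample reduces to a single pixel, while large patches near the flake edge are increasingly filled with zeros, which mirrors the trade-off discussed for Figure 2b-d.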
Although CNNs represent the state-of-the-art technique for making inferences on image- or spectral-based tasks, their multilayer nonlinear structures are often criticized as non-transparent and non-explainable. [19] Hence, we divided the spectral maps into a tabular dataset with approximately 7000 individual points. We applied an XGBoost model trained on the tabular dataset and interpreted the model through SHAP to unravel the correlation between Raman and its corresponding PL on MoS2 monolayers. XGBoost is widely adopted for tabular datasets whose features are individually meaningful but lack temporal or spatial structure. [20] To bridge the XGBoost predictions with the physical meaning behind them, SHAP provides local explanations and global patterns based on game theory. [21] Figure 3 displays the SHAP summary plots, showing how a Raman feature's value impacts the predicted PL feature. Each dot represents a prediction from the model, and the color coding indicates the value of a Raman feature. For example, a higher value (red) of ω_E′ is associated with a higher SHAP value, indicating a higher PL energy in Figure 3a. The bar charts in Figure 3 show the SHAP importance value, allowing us to evaluate the global contribution from each Raman parameter to the PL features. Among the Raman features, ω_E′, ω_A1′, and I_E′ are the three dominant characteristics of the Raman spectrum to predict the PL features.
The average SHAP importance of the E′, A1′, and 2LA modes for PL are 67.6%, 25.5%, and 6.9%, respectively. [24] Since the E′ mode has the most significant contribution (67.6%) in predicting the PL features, we conclude that the PL response should be dominated by strain rather than doping. […] Γ′ is associated with higher PL energy, as shown in Figure 3a. The FWHM correlation with the PL features extracted from Figure 3 is consistent with Figure 1d, showing that a narrower PL FWHM leads to a higher PL energy and stronger PL intensity; ii) strain effects: tensile strain in MoS2 leads to decreased PL intensity and energy. [25,26] Figures 3a and 3c show that the softening of ω_E′ is linked to lower PL energy and intensity, which is consistent with the strain-induced modulations in the electronic structure of monolayer TMDCs. This phenomenon connects to three scenarios. First, the application of tensile strain induces a direct-to-indirect bandgap transition from K-K to K-Γ points in the Brillouin zone (BZ) of monolayer MoS2. [1,23] Although the transition occurs only at relatively high strain, the expected response of the PL under tensile strain is to gradually quench and reduce its energy, as can be seen in Figures 3a and 3c. Second, a weaker I_2LA correlates with a lower PL energy and weaker PL intensity, as shown in Figures 3a and 3c. This feature could also be explained by a tensile-strain-induced effect, which reduces the bandgap energy and increases the energy difference between the conduction band minima at the K and Q points (ΔE_KQ) in the BZ of monolayer MoS2. [1,23] The redshift of the bandgap energy away from our laser excitation energy (at 2.33 eV) reduces the probability of optical absorption/emission at the K/K′ valleys. At the same time, the increase in ΔE_KQ dramatically suppresses the intervalley scattering between K and Q; both factors inhibit the double-resonance Raman (DRR) process associated with the 2LA band, quenching its intensity. [27,28] Third, stronger I_E′ and I_A1′ are associated with a lower PL energy and broader PL FWHM (Figure 3a,b). The reason is that the decrease of the MoS2 bandgap lowers the PL energy and broadens the PL FWHM, and it also redshifts the C exciton energy (at ≈2.8 eV), [25] bringing its energy closer to resonance with our laser excitation; and iii) doping effects: electron-accumulated MoS2 shows weaker PL intensity. From Figure 3, we infer that the A1′ softening is associated with lower PL energy (Figure 3a), broader PL FWHM (Figure 3b), and weaker PL intensity (Figure 3c). Since the A1′ mode softening indicates electron doping, [22] a weaker, broadened, and redshifted PL is expected from recent theoretical predictions [23] and experimental observations, [24] which is consistent with our results.
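The feature attributions discussed above rest on Shapley values, which can be computed exactly for a small number of features. The sketch below is a toy illustration of that definition, not the SHAP tree explainer applied to the trained XGBoost model: the function `toy_pl_energy`, its coefficients, the chosen sample `x`, and the `baseline` are all hypothetical, and absent features are modeled simply by resetting them to the baseline value.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features outside a coalition are set to their baseline value.
    The cost grows as 2^n, so this is only practical for a few features.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

def toy_pl_energy(z):
    """Hypothetical stand-in for a trained regressor (not fitted to data)."""
    w_e, w_a1, i_2la = z
    return (1.85 + 0.010 * (w_e - 385.0) + 0.004 * (w_a1 - 404.5)
            + 0.002 * i_2la * (w_e - 385.0))

x = [386.0, 405.0, 1.0]          # (omega_E', omega_A1', I_2LA), illustrative
baseline = [385.0, 404.5, 0.0]   # reference point, illustrative
phi = shapley_values(toy_pl_energy, x, baseline)

# Global-importance style normalization: share of the total |attribution|.
importance = [abs(p) / sum(abs(q) for q in phi) for p in phi]
```

The attributions sum to the difference between the prediction and the baseline prediction, which is the additivity property that lets SHAP values be read as per-feature contributions; averaging their magnitudes over many samples gives importance shares of the kind quoted for the E′, A1′, and 2LA modes.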
Decomposition of strain and doping effects on graphene has been demonstrated by analyzing the shifts of the G and 2D band frequencies. [29,30] Based on the SHAP importance results, ω_E′ and ω_A1′ dominate the PL features for monolayer MoS2. Although this approach of decomposing strain and doping has been applied to MoS2, [31,32] the details of the physical phenomenon were not fully captured in the Raman frequency plot for strain and doping effects. Therefore, we study the decomposition of strain and doping effects as a function of ω_E′ and ω_A1′, as shown in Figure 4a, and define three quantities: the intrinsic point (defined as the undoped and unstrained state), the strain base vector, and the doping base vector. First, the Raman frequencies of the E′ and A1′ modes for the intrinsic point of MoS2 are elusive. To determine the intrinsic point, we compared ω_E′ and ω_A1′ data of CVD-grown and exfoliated MoS2 monolayers from the literature with our data, as shown in Figure S11, Supporting Information. The majority of the data points lie on the line defined by the frequency difference ω_A1′ − ω_E′ = 19 cm⁻¹, which is recognized as the standard value for monolayer MoS2. [33] Since the Raman frequencies of exfoliated synthetic MoS2 are at the center of the data distribution in Figure S11, Supporting Information, we define its Raman frequencies as the intrinsic point, at (385.3, 404.5) for (ω_E′, ω_A1′), which is represented by the orange circle in Figure 4a. Second, tensile strain shifts ω_E′ and ω_A1′ by 4.48 and 1.02 cm⁻¹/%, respectively, as shown by the red line in Figure 4a, which is attributed to the ratio of the Grüneisen parameters for the E′ and A1′ phonons. [34] Owing to the lack of Raman studies for MoS2 with compressive strain, we referenced the result from Pak et al., [35] showing that the shifts of Raman frequencies for tensile strain are 1.56 times higher than for compressive strain. Third, the doping effects in MoS2 are drawn by a black line. Recently, Sohier et al. [23] reported that the ω_A1′ mode softens for electron accumulation but remains unchanged for hole doping. Hence, the vector of the electron doping (ω_E′, ω_A1′) is (−0.15, −1.19) cm⁻¹/10¹³ cm⁻² in the low electron concentration region, shown in Figure 4a. [24] One possible cause of the hardening of ω_A1′ is substitutional doping, which could be induced by the molybdenum precursor (MoO3) or the oxygen flow during the CVD growth. Tang et al. [36] demonstrated that CVD-grown MoS2 with substitutional oxygen doping exhibits the softening of ω_E′ and the hardening of ω_A1′, resulting in shifts of −0.18 and 0.2 cm⁻¹/at%, respectively, in Figure 4a.
Figure S12, Supporting Information, shows the scatter plot of ω_E′ and ω_A1′ considering strain and doping effects; the color coding indicates the PL energy, intensity, and FWHM for Figures S12a, S12b, and S12c, Supporting Information, respectively. The results show that the intrinsic point exhibits higher PL energy, stronger intensity, and narrower FWHM. To further predict PL from the Raman frequencies of MoS2, we adopted an SVM method, which has better performance in handling sparse data and is more robust against overfitting than XGBoost. [37,38] The probability density of our dataset is estimated by a GMM with three clusters, and the contour at a negative log probability of 5 is drawn in Figure S12d, Supporting Information, providing the predicted probability for the SVM projections in Figure 4. Figure 4b-d shows the predictive results by SVMs for PL energy, FWHM, and intensity, respectively. Based on the projections, the highest PL energy is 1.89 eV, shown as the orange circle at (386.3, 405.1) in Figure 4b, consistent with previous experimental results for slightly hole-doped and compressively strained monolayer MoS2. [24,39] The predicted FWHM minimum is 0.096 eV, located at (385.0, 405.4) in Figure 4c, which is consistent with the result of the PL energy (a narrower PL correlates with a higher PL energy, see Figure 1f). In Figure 4d, the predicted strongest PL intensity is located at (385.0, 404.4). To further decompose various structural defects and external perturbations into strain and doping effects, we plotted various defect-based vectors on the MoS2 Raman frequencies from the literature in Figure S13, Supporting Information, which may simplify complex contributions into two simple factors, guiding the optimization and manipulation of material synthesis and modification.
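The vector picture of Figure 4a amounts to solving a two-by-two linear system: the shift of (ω_E′, ω_A1′) from the intrinsic point is written as a combination of the strain and doping base vectors. The sketch below illustrates this with the numbers quoted above, under several simplifying assumptions that are not taken from the analysis itself: the strain vector is given a negative sign (tensile strain softening both modes), the same strain rates are used for compressive strain (ignoring the factor of 1.56), and only the electron-doping vector is considered. The function name `decompose` and the example frequencies are hypothetical.

```python
INTRINSIC = (385.3, 404.5)    # (omega_E', omega_A1') in cm^-1, from the text
STRAIN_VEC = (-4.48, -1.02)   # cm^-1 per % tensile strain; sign is an assumption
DOPING_VEC = (-0.15, -1.19)   # cm^-1 per 1e13 cm^-2 electrons, from the text

def decompose(omega_e, omega_a1):
    """Return (strain in %, electron density in 1e13 cm^-2).

    Solves  delta = strain * STRAIN_VEC + doping * DOPING_VEC
    for the frequency shift delta from the intrinsic point (Cramer's rule).
    """
    de = omega_e - INTRINSIC[0]
    da = omega_a1 - INTRINSIC[1]
    det = STRAIN_VEC[0] * DOPING_VEC[1] - DOPING_VEC[0] * STRAIN_VEC[1]
    strain = (de * DOPING_VEC[1] - DOPING_VEC[0] * da) / det
    doping = (STRAIN_VEC[0] * da - de * STRAIN_VEC[1]) / det
    return strain, doping

# Illustrative measurement: both modes softened relative to the intrinsic point.
strain, doping = decompose(384.5, 403.8)
```

Because the two base vectors point in clearly different directions (strain mostly along ω_E′, electron doping mostly along ω_A1′), the system is well conditioned, which is what makes the Raman frequency plane a practical coordinate system for separating the two effects.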

Conclusion
We have demonstrated a framework for capturing the correlations between Raman and PL, which are essential to tune the optical properties of MoS2 by external perturbations for the understanding, prediction, and design of next-generation devices. We utilize the DenseNet model to build end-to-end connections from Raman spectral maps to photoluminescence. To gain more comprehensive insights into the physical mechanisms of strain and doping effects, we adopt the XGBoost model with the SHAP explainer and reveal that ω_E′, ω_A1′, and I_E′ are the three dominant Raman characteristics for the prediction of PL features, which further indicates that the strain effects govern the PL response more than the doping effects in our dataset. We further disentangle strain and doping effects and predict the location of the intrinsic point using the SVM model with the probability density estimation by GMM, where PL features are projected to predict the extremum points on the Raman frequency plot. The proposed methodology establishes an analytical approach to comprehensively interpret experimental data and to explore hidden connections and novel physics from the Raman and PL spectra of 2D materials, and it may extend to other types of optical spectroscopy and condensed-matter systems.

Figure 1. Raman and PL spectra of CVD-grown MoS2 monolayers. a) Overview of the path to unravel correlations between Raman and PL features with external perturbations. The green and blue lines correspond to studies of strain and doping effects obtained from the literature in Tables S1 and S2, Supporting Information, respectively. The red dashed line indicates the discovery path by the machine-learning models in this work. b) Raman and c) PL spectra of CVD-grown (hexagonal, random, and triangle) and exfoliated (natural and synthetic) MoS2. The vertical dashed lines denote the Raman E′ and A1′ frequencies for the synthetic MoS2 in (b) and the PL energies of 1.86 and 1.89 eV for the trion and exciton, respectively, in (c). d-f) Scatter plots of PL features for MoS2 monolayers: d) PL FWHM as a function of the normalized intensity following the multiplicative inverse function (solid black line) described by Equation (2); e) PL energy as a function of the normalized intensity following the reverse multiplicative function (solid black line) described by Equation (3); and f) PL energy as a function of the PL FWHM following a linear function (solid black line) described by Equation (4).
Adv. Mater. 2022, 34, 2202911

Figure 2. PL predictions from the trained DenseNet. a) Schematic illustration of a DenseNet model with two dense blocks. b-d) The patch size effect on the DenseNet for the predicted intensity of a triangle MoS2: b) 1-by-1, c) 5-by-5, and d) 11-by-11 with 21.48%, 10.31%, and 11.86% relative absolute errors, respectively. e-g) The PL mapping predictions of energy (e), FWHM (f), and intensity (g) for CVD-grown MoS2 with random shape. Left: the measured PL maps as the ground truth for the DenseNet model. Middle: the predicted results by the trained DenseNet with 5-by-5 patch inputs. Right: the relative error between measured and predicted PL maps.

Figure 3. Correlation analysis for Raman and PL by XGBoost with SHAP values. a) PL energy, b) PL FWHM, and c) PL intensity results are interpreted by the trained XGBoost model with Shapley additive explanations (SHAP). The Raman features are sorted in descending order according to global parameter importance. Left: the global importance of Raman features based on the average SHAP value magnitude for PL features. Right: a set of beeswarm plots corresponding to a single pair of Raman and PL. The vertical axis displays the sorted Raman features while the horizontal axis shows the impact on the model output. Each data point represents a predicted output and the color indicates the value of the Raman feature.

Figure 4. Predictions of PL for monolayer MoS2 by support vector machine models with the strain and doping vectors. a) Schematic representation of strain and doping base vectors in (ω_E′, ω_A1′) coordinates. The red and black solid lines correspond to strain and doping, respectively. The orange circle denotes the intrinsic point, defined as the charge-neutral and unstrained state, according to the mean value of the Raman frequencies of the exfoliated synthetic MoS2. b-d) Contour maps of PL predictions for energy (b), FWHM (c), and intensity (d) through the trained SVM models. The orange circles are the extremum points based on the predictions by the SVM models, located at (386.3, 405.1) with 1.89 eV in (b), (385.0, 405.4) with 0.096 eV in (c), and (385.0, 404.4) with 1.22 in (d).