Compare multiple spectra
Multiple spectra analysis lets you quickly extract information from a set of spectra.
Spectra selection, normalization and previsualization
The first step is to select the spectra:
How to select spectra.
Spectra selection
All the spectra analysis tools start with a phase of selection.
Select samples
To facilitate the analysis, it is advisable to work with samples that contain representative spectra, so that both the intra-sample variability and the reproducibility can be evaluated.
Spectra to analyze are selected with one of these 3 methods:
- At the level of a sample, click on the + to add all the spectra related to this sample.
- Click on the + at the top of the sample box to add all the spectra of all the selected samples.
- After selecting a sample, click on the + at the level of the spectra list to add a specific spectrum.
Once spectra have been selected, data normalization filters can be applied:
How to normalize spectra.
Preprocessing
Comparing spectra requires a matrix in which each row corresponds to a spectrum and each column contains the values of all the spectra at a specific X. To create this matrix we apply several preprocessing steps:
- filtering the data to reduce the impact of sample preparation or experimental artifacts
- selecting the representative part of the spectra that is expected to be important for the analysis
- removing large peaks that are not characteristic of the sample (like water in NMR spectra) and could interfere with the analysis
- reducing the number of points to accelerate the analysis
- applying matrix-related processing to normalize the columns
Filters
You may also apply various filters that normalize or transform the data. Among those filters we have:
- Center mean
- Divide by SD (standard deviation)
- Rescale: set the min value to 0 and the max value to 1
- Normalize: set the sum of all the points to 1
- Align: run a peak picking between from and to and calculate the mean X value of the nbPeaks highest peaks. The spectrum is then shifted so that this mean has the targetX value.
- Pareto: Pareto scaling uses the square root of the standard deviation as the scaling factor. It circumvents the amplification of noise by retaining a small portion of magnitude information (10.1016/j.molstruc.2007.12.026).
- Savitzky-Golay: smooth the spectra and calculate derivatives based on the following parameters:
  - windowSize: smoothing window, must be an odd number
  - derivative: enter 0, 1 or 2
  - polynomial: the degree of the polynomial used to calculate SG
- X function: a function that modifies the X axis based on the x parameter, for example log(x)
- Y function: a function that modifies the Y axis based on the y parameter, for example log10(y+1)
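As an illustration of the Align filter, here is a minimal Python sketch. It is not the application's code: the function name align, the naive peak picking (a point higher than its neighbours) and the argument names are assumptions chosen to mirror the from, to, nbPeaks and targetX parameters.

```python
def align(x, y, from_x, to_x, nb_peaks, target_x):
    """Shift the X axis so that the mean X of the highest peaks equals target_x."""
    # Naive peak picking (assumption): a local maximum located inside [from_x, to_x].
    peaks = [
        (y[i], x[i])
        for i in range(1, len(y) - 1)
        if from_x <= x[i] <= to_x and y[i] > y[i - 1] and y[i] >= y[i + 1]
    ]
    if not peaks:
        return list(x)  # no peak found: leave the spectrum untouched
    # Keep the nb_peaks highest peaks and average their X positions.
    highest = sorted(peaks, reverse=True)[:nb_peaks]
    mean_x = sum(px for _, px in highest) / len(highest)
    shift = target_x - mean_x
    return [xi + shift for xi in x]


x = [0, 1, 2, 3, 4, 5, 6]
y = [0, 1, 0, 0, 3, 0, 0]
shifted = align(x, y, from_x=0, to_x=6, nb_peaks=2, target_x=3.0)
```

In this example the two peaks are at X = 1 and X = 4, so their mean (2.5) is moved to 3.0 and every X value is shifted by 0.5.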
One classical preprocessing algorithm is Standard Normal Variate (SNV). This preprocessing can be achieved by selecting the 2 options Center mean and Divide by SD.
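The simplest filters can be sketched in a few lines of Python, each acting on the Y values of one spectrum. The function names are hypothetical, and using the sample standard deviation (n - 1) is an assumption; the application may compute it differently.

```python
from statistics import mean, stdev


def center_mean(y):
    """Center mean: subtract the mean of the spectrum from every point."""
    m = mean(y)
    return [v - m for v in y]


def divide_by_sd(y):
    """Divide by SD: sample standard deviation (assumption); fails on a flat spectrum."""
    sd = stdev(y)
    return [v / sd for v in y]


def rescale(y):
    """Rescale: set the min value to 0 and the max value to 1."""
    lo, hi = min(y), max(y)
    return [(v - lo) / (hi - lo) for v in y]


def normalize(y):
    """Normalize: set the sum of all the points to 1."""
    total = sum(y)
    return [v / total for v in y]


def snv(y):
    """Standard Normal Variate: Center mean followed by Divide by SD."""
    return divide_by_sd(center_mean(y))
```

After SNV every spectrum has a mean of 0 and a standard deviation of 1, which removes global offset and intensity differences between spectra.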
Selecting the range
Only the information between the From and To values of the range will be considered.
Exclusions
Depending on the analysis, some regions should be excluded to improve the results. For example, NMR spectroscopy in water yields a large peak around 4.5 ppm, which can be removed from the analysis with an exclusion zone.
Number of points
The data normalization process will select equidistant points between the From and To values. The number of points is given by Nb points.
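The range, the exclusions and the number of points together define the X grid shared by all the rows of the matrix. The sketch below shows one way to build a row in Python. The function name to_matrix_row is hypothetical, and linear interpolation is an assumption, since the resampling method is not specified here.

```python
def to_matrix_row(x, y, from_x, to_x, nb_points, exclusions=()):
    """Resample one spectrum on nb_points equidistant X values in [from_x, to_x].

    x must be sorted in increasing order and should cover [from_x, to_x]
    (outside the data the last segment is extrapolated).
    Grid points falling inside an exclusion zone (lo, hi) are dropped.
    """
    step = (to_x - from_x) / (nb_points - 1)
    grid = [from_x + i * step for i in range(nb_points)]
    new_x, new_y = [], []
    j = 0
    for gx in grid:
        if any(lo <= gx <= hi for lo, hi in exclusions):
            continue
        # Move to the segment [x[j], x[j + 1]] that contains gx.
        while j < len(x) - 2 and x[j + 1] < gx:
            j += 1
        t = (gx - x[j]) / (x[j + 1] - x[j])
        new_x.append(gx)
        new_y.append(y[j] + t * (y[j + 1] - y[j]))  # linear interpolation
    return new_x, new_y


x = [0, 1, 2, 3, 4]
y = [0, 10, 20, 30, 40]
new_x, new_y = to_matrix_row(x, y, 0, 4, 9, exclusions=[(1.5, 2.5)])
```

Because every spectrum is resampled on the same grid, the resulting rows have the same length and can be stacked into a matrix.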
Matrix processing
Once all the previous filters have been applied we obtain a matrix in which each row represents a normalized spectrum and each column represents the intensities of the spectra at a given X value.
Some filters use the columns for further processing:
- PQN: Probabilistic Quotient Normalization (10.1021/ac051632c)
- Center mean: for each column the mean is subtracted from the values, so that the column is centered on 0
- Rescale (0 to 1): for each column the min value will be set to 0 and the max value to 1
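The column-based filters can be sketched as follows in Python, with the matrix stored as a list of rows. This is only an illustration: the function names are hypothetical, and the PQN variant shown uses the median spectrum as reference and skips the preliminary total-area normalization, which are assumptions and not necessarily what the application does.

```python
from statistics import mean, median


def pqn(matrix):
    """Probabilistic Quotient Normalization with the median spectrum as reference."""
    reference = [median(col) for col in zip(*matrix)]
    result = []
    for row in matrix:
        # Quotient of each point to the reference; columns with a 0 reference are skipped.
        quotients = [v / r for v, r in zip(row, reference) if r != 0]
        factor = median(quotients)  # most probable dilution factor of this spectrum
        result.append([v / factor for v in row])
    return result


def center_columns(matrix):
    """Center mean: subtract the column mean from every value of the column."""
    means = [mean(col) for col in zip(*matrix)]
    return [[v - m for v, m in zip(row, means)] for row in matrix]


def rescale_columns(matrix):
    """Rescale (0 to 1): per column, min becomes 0 and max becomes 1 (fails on a constant column)."""
    lows = [min(col) for col in zip(*matrix)]
    highs = [max(col) for col in zip(*matrix)]
    return [
        [(v - lo) / (hi - lo) for v, lo, hi in zip(row, lows, highs)]
        for row in matrix
    ]
```

With PQN, spectra that differ only by a dilution factor end up identical, which is the purpose of the method.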
Large dataset
The list of the spectra in the dataset is displayed in the following table:
In some cases it is not possible to keep in memory the original spectra and the system will only keep the normalized spectra. Therefore, it will not be possible to change the normalization parameters anymore.
Preview
A preview of the normalized spectra as well as the exclusion zones will be displayed. This allows you to fine-tune the processing.
The superimposed spectra can be manipulated with numerous advanced features:
How to visualize spectra.
Spectra visualization
Numerous options are available to display either all the spectra in the dataset or only the selected spectra.
Selection of spectra in the dataset
The toolbar on the top of the list of spectra in the dataset provides many options (from left to right):
- Remove all spectra from dataset
- Select category: select which property contains the category description
- Download normalized matrix
- Recolor spectra based on category: a different color will be applied for each category. By default, the sample reference is used as the category
- Select all spectra
- Append to selected spectra
- Select only current spectra
- Remove spectra from current selection
- Unselect all spectra
Graph options
It is possible to either display the selected spectra, all the spectra or various derived information.
Customization of the display is achieved using the chart toolbar:
Display spectra
The first options let you display all the spectra, only the selected spectra, or nothing.
Displaying no spectrum is useful when displaying other derived data.
Original / normalized
These options let you display either the original spectra or the normalized data. Most of the time we display the normalized data: these are the data that will be analyzed, and they normally also take less room in memory.
Boxplot
The boxplot representation displays, for each X point, the range between the first and third quartiles as a dark gray zone. The min and max values are represented as a light gray zone, and the median is represented as a line whose color varies with the standard deviation (red: high variation, blue: small variation).
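The values behind such a boxplot can be computed column by column. The following Python sketch is an illustration only: the function name boxplot_stats is hypothetical, and the quartile method ("inclusive") and the use of the sample standard deviation are assumptions.

```python
from statistics import quantiles, stdev


def boxplot_stats(matrix):
    """Return min, quartiles, max and standard deviation for every column (X point)."""
    stats = []
    for col in zip(*matrix):
        q1, q2, q3 = quantiles(col, n=4, method="inclusive")
        stats.append({
            "min": min(col),    # lower edge of the light gray zone
            "q1": q1,           # lower edge of the dark gray zone
            "median": q2,       # line
            "q3": q3,           # upper edge of the dark gray zone
            "max": max(col),    # upper edge of the light gray zone
            "sd": stdev(col),   # drives the color of the median line
        })
    return stats


stats = boxplot_stats([[1], [2], [3], [4], [5]])
```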
Tracking information
By selecting the tracking information you will display the X values and the corresponding Y values for all the spectra.
Correlation
Correlation of the vectors formed by the Y values at each X point can be useful to determine which peaks are correlated in a complex mixture of products. This is known in NMR metabolomics as STOCSY (statistical total correlation spectroscopy).
With SHIFT ⇧ + ALT + click you can select the X value for which you would like to check the correlation. Strongly correlated signals appear in red while uncorrelated signals appear in blue.
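The idea can be sketched in Python as the Pearson correlation between the column at the selected X value and every column of the matrix. The function names are hypothetical, and returning 0 for a constant column is a convention chosen for this sketch.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient between two vectors of the same length."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # constant column: correlation is undefined, reported as 0 here
    return cov / sqrt(var_a * var_b)


def correlation_trace(matrix, selected_index):
    """Correlate the column at selected_index with every column of the matrix."""
    columns = list(zip(*matrix))
    driver = columns[selected_index]
    return [pearson(driver, col) for col in columns]


matrix = [
    [1, 2, 3, 5],
    [2, 4, 2, 5],
    [3, 6, 1, 5],
]
trace = correlation_trace(matrix, 0)
```

In this small example the second column varies with the first one (correlation 1), the third varies in the opposite direction (correlation -1) and the fourth is constant.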
Comparing spectra
Once you have superimposed spectra, you can define a range by keeping ALT pressed and clicking once at the left and once at the right of the range you want to define.
A column will be added in the dataset table that contains the integration of the selected area. It is also possible to define more than one range by repeating this procedure.
In the Define ranges tab it is possible to rename the variables as well as to define custom formulae based on the integration of the ranges.
All the ranges and custom calculations will be automatically added in the table. From the table it is also possible to export all the data to a spreadsheet by clicking the export icon in the table toolbar.
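As an illustration, the integration of a range and a custom formula can be sketched in Python as follows. The function name integrate, the range names A and B and the trapezoidal rule are assumptions, and the sketch only uses the points that fall inside the range, without interpolating at its edges.

```python
def integrate(x, y, from_x, to_x):
    """Trapezoidal integral of y over the points with from_x <= x <= to_x (x sorted)."""
    pts = [(xi, yi) for xi, yi in zip(x, y) if from_x <= xi <= to_x]
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(pts, pts[1:])
    )


x = [0, 1, 2, 3, 4]
y = [0, 2, 2, 0, 0]

# Two hypothetical ranges, each producing one column of the table.
ranges = {"A": (0, 2), "B": (2, 4)}
row = {name: integrate(x, y, lo, hi) for name, (lo, hi) in ranges.items()}

# A custom formula based on the integrations, here the ratio of the two ranges.
row["A/B"] = row["A"] / row["B"]
```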
Relative spectra
It is possible to select a reference spectrum (target spectrum) so that all the other spectra are represented as the relative value to the target. To select the target spectrum click on the icon in the last column. The line will become green.
The target spectrum will now be a horizontal line.
In the preferences you can uncheck Relative to come back to the original view.
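A possible reading of the relative representation is sketched below in Python. It assumes that "relative" means the difference to the target spectrum; a ratio to the target would be another plausible interpretation and would also turn the target into a horizontal line. The function name is hypothetical.

```python
def relative_to_target(matrix, target_index):
    """Express every spectrum as its difference to the target spectrum (assumption)."""
    target = matrix[target_index]
    return [[v - t for v, t in zip(row, target)] for row in matrix]


matrix = [
    [1, 2, 3],  # target spectrum
    [2, 4, 3],
]
relative = relative_to_target(matrix, 0)
```

With this definition the target spectrum becomes a horizontal line at 0, and the other spectra show only where they deviate from it.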
Scaling spectra
Spectra can be rescaled based on the full spectrum or a specific range. Rescaling can be based on the integral, the minimal peak height or the maximal peak height.
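The sketch below illustrates this kind of scaling in Python. The function name scale_spectrum, the target value of 1 and the use of the lowest and highest Y value in the range as minimal and maximal peak height are assumptions.

```python
def scale_spectrum(x, y, from_x, to_x, method="max", target=1.0):
    """Multiply y by a factor so that the chosen measure in [from_x, to_x] equals target."""
    pts = [(xi, yi) for xi, yi in zip(x, y) if from_x <= xi <= to_x]
    if method == "integral":
        # Trapezoidal integral over the points inside the range.
        current = sum(
            (x1 - x0) * (y0 + y1) / 2
            for (x0, y0), (x1, y1) in zip(pts, pts[1:])
        )
    elif method == "max":
        current = max(yi for _, yi in pts)
    elif method == "min":
        current = min(yi for _, yi in pts)
    else:
        raise ValueError(f"unknown method: {method}")
    factor = target / current  # fails if the measure is 0
    return [yi * factor for yi in y]


x = [0, 1, 2, 3]
y = [1, 4, 2, 1]
by_max = scale_spectrum(x, y, 0, 3, method="max")
by_integral = scale_spectrum(x, y, 0, 3, method="integral")
```

The factor is computed from the selected range only, but it is applied to the whole spectrum.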