Welcome
Welcome to this package. You can download the code from GitHub. If you have any questions, please send an e-mail to
Before calculating spectral similarity, it is highly recommended to remove spectral noise. For example, peaks with an intensity below 1% of the maximum intensity can be removed to improve identification performance.
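The noise-removal step described above can be sketched with NumPy. This is only an illustration, not part of the package: the function name `remove_noise` is hypothetical, and the spectrum is assumed to be an array whose rows are `[m/z, intensity]` pairs.

```python
import numpy as np

def remove_noise(spectrum, threshold=0.01):
    """Drop peaks whose intensity is below `threshold` times the maximum intensity.

    ASSUMPTION: each row of `spectrum` is [m/z, intensity].
    """
    spectrum = np.asarray(spectrum, dtype=float).reshape(-1, 2)
    if spectrum.shape[0] == 0:
        return spectrum  # nothing to filter in an empty spectrum
    max_intensity = spectrum[:, 1].max()
    keep = spectrum[:, 1] >= threshold * max_intensity
    return spectrum[keep]

spectrum = np.array([[69.071, 7.9],
                     [86.066, 2.0],
                     [86.0969, 100.0],
                     [100.2, 0.5]])
cleaned = remove_noise(spectrum)  # the 0.5-intensity peak is below 1% of 100.0
print(cleaned)
```

The threshold is a parameter so that a stricter or looser cut-off can be tried on your own data.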
Usage
- spectral_similarity.all_distance(spectrum_query: Union[list, ndarray], spectrum_library: Union[list, ndarray], ms2_ppm: Optional[float] = None, ms2_da: Optional[float] = None, need_clean_spectra: bool = True, need_normalize_result: bool = True) -> dict
-
Find the common peaks of two spectra and calculate all distances between them. If both ms2_ppm and ms2_da are defined, ms2_da will be used.
- Parameters:
-
spectrum_query – The query spectrum; must be in NumPy array format.
spectrum_library – The library spectrum; must be in NumPy array format.
ms2_ppm – The MS/MS tolerance in ppm.
ms2_da – The MS/MS tolerance in Da.
need_clean_spectra – Normalize the spectra before comparing; required if the spectra are not already normalized.
need_normalize_result – Normalize the result to the range [0, 1].
- Returns:
-
A dict containing all distances between the two spectra.
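The precedence rule for the two tolerance parameters can be illustrated with a small helper. This is a sketch under stated assumptions, not the package's code: the name `resolve_tolerance_da` is hypothetical, and raising an error when neither tolerance is given is a choice made for this example only. The ppm conversion itself is the standard one (tolerance in Da = m/z × ppm × 10⁻⁶).

```python
def resolve_tolerance_da(mz, ms2_ppm=None, ms2_da=None):
    """Return the absolute tolerance in Da to use for a peak at `mz`."""
    if ms2_da is not None:
        # As documented: when both are defined, ms2_da takes precedence.
        return ms2_da
    if ms2_ppm is not None:
        # A ppm tolerance scales with the m/z of the peak.
        return mz * ms2_ppm * 1e-6
    # ASSUMPTION of this sketch: at least one tolerance must be supplied.
    raise ValueError("Either ms2_ppm or ms2_da must be given.")

print(resolve_tolerance_da(500.0, ms2_ppm=10.0))               # ppm converted to Da
print(resolve_tolerance_da(500.0, ms2_ppm=10.0, ms2_da=0.02))  # ms2_da wins
```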
- spectral_similarity.all_similarity(spectrum_query: Union[list, ndarray], spectrum_library: Union[list, ndarray], ms2_ppm: Optional[float] = None, ms2_da: Optional[float] = None, need_clean_spectra: bool = True, need_normalize_result: bool = True) -> dict
-
Find the common peaks of two spectra and calculate all similarity scores between them. If both ms2_ppm and ms2_da are defined, ms2_da will be used.
- Parameters:
-
spectrum_query – The query spectrum; must be in NumPy array format.
spectrum_library – The library spectrum; must be in NumPy array format.
ms2_ppm – The MS/MS tolerance in ppm.
ms2_da – The MS/MS tolerance in Da.
need_clean_spectra – Normalize the spectra before comparing; required if the spectra are not already normalized.
need_normalize_result – Normalize the result to the range [0, 1].
- Returns:
-
A dict containing all similarity scores.
- spectral_similarity.distance(spectrum_query: Union[list, ndarray], spectrum_library: Union[list, ndarray], method: str, ms2_ppm: Optional[float] = None, ms2_da: Optional[float] = None, need_clean_spectra: bool = True, need_normalize_result: bool = True) -> float
-
Find the common peaks of two spectra and calculate the distance between them. If both ms2_ppm and ms2_da are defined, ms2_da will be used.
- Parameters:
-
spectrum_query – The query spectrum; must be in NumPy array format.
spectrum_library – The library spectrum; must be in NumPy array format.
method – Supported methods: “entropy”, “unweighted_entropy”, “euclidean”, “manhattan”, “chebyshev”, “squared_euclidean”, “fidelity”, “matusita”, “squared_chord”, “bhattacharya_1”, “bhattacharya_2”, “harmonic_mean”, “probabilistic_symmetric_chi_squared”, “ruzicka”, “roberts”, “intersection”, “motyka”, “canberra”, “baroni_urbani_buser”, “penrose_size”, “mean_character”, “lorentzian”, “penrose_shape”, “clark”, “hellinger”, “whittaker_index_of_association”, “symmetric_chi_squared”, “pearson_correlation”, “improved_similarity”, “absolute_value”, “dot_product”, “dot_product_reverse”, “spectral_contrast_angle”, “wave_hedges”, “jaccard”, “dice”, “inner_product”, “divergence”, “avg_l”, “vicis_symmetric_chi_squared_3”, “ms_for_id_v1”, “ms_for_id”, “weighted_dot_product”
ms2_ppm – The MS/MS tolerance in ppm.
ms2_da – The MS/MS tolerance in Da.
need_clean_spectra – Normalize the spectra before comparing; required if the spectra are not already normalized.
need_normalize_result – Normalize the result to the range [0, 1].
- Returns:
-
Distance between two spectra
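The "find common peaks" step can be pictured as aligning the two peak lists so that every distance formula operates on paired intensities. The following is a simplified greedy sketch, not the package's matching routine: `align_spectra` is a hypothetical name, rows are assumed to be `[m/z, intensity]`, and only an absolute tolerance in Da is handled.

```python
import numpy as np

def align_spectra(query, library, ms2_da):
    """Pair peaks from two spectra whose m/z values differ by at most `ms2_da`.

    Returns rows of [m/z, query intensity, library intensity]; a peak with no
    partner gets intensity 0.0 on the other side.
    """
    query = sorted((float(mz), float(i)) for mz, i in query)
    library = sorted((float(mz), float(i)) for mz, i in library)
    rows, i, j = [], 0, 0
    while i < len(query) and j < len(library):
        mz_q, int_q = query[i]
        mz_l, int_l = library[j]
        if abs(mz_q - mz_l) <= ms2_da:      # common peak
            rows.append((mz_q, int_q, int_l))
            i += 1
            j += 1
        elif mz_q < mz_l:                   # peak only in the query
            rows.append((mz_q, int_q, 0.0))
            i += 1
        else:                               # peak only in the library
            rows.append((mz_l, 0.0, int_l))
            j += 1
    rows.extend((mz, inten, 0.0) for mz, inten in query[i:])
    rows.extend((mz, 0.0, inten) for mz, inten in library[j:])
    return np.array(rows, dtype=float).reshape(-1, 3)

query = [[100.0, 1.0], [200.0, 2.0]]
library = [[100.01, 3.0], [300.0, 4.0]]
aligned = align_spectra(query, library, ms2_da=0.05)
print(aligned)
```

Columns 1 and 2 of the result play the role of the vectors P and Q used in the distance formulas.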
- spectral_similarity.multiple_distance(spectrum_query: Union[list, ndarray], spectrum_library: Union[list, ndarray], methods: Optional[list] = None, ms2_ppm: Optional[float] = None, ms2_da: Optional[float] = None, need_clean_spectra: bool = True, need_normalize_result: bool = True) -> dict
-
Find the common peaks of two spectra and calculate multiple distances between them. If both ms2_ppm and ms2_da are defined, ms2_da will be used.
- Parameters:
-
spectrum_query – The query spectrum; must be in NumPy array format.
spectrum_library – The library spectrum; must be in NumPy array format.
methods – A list of method names.
ms2_ppm – The MS/MS tolerance in ppm.
ms2_da – The MS/MS tolerance in Da.
need_clean_spectra – Normalize the spectra before comparing; required if the spectra are not already normalized.
need_normalize_result – Normalize the result to the range [0, 1].
- Returns:
-
A dict containing the distance for each requested method.
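One way to picture how a list of method names turns into a dict of results is a name-to-function lookup table. The sketch below is hypothetical and is not the package's implementation: it registers only three textbook distances, works on intensity vectors that are assumed to be already aligned, and treats `methods=None` as "all registered methods" purely as an assumption of this example.

```python
import numpy as np

def euclidean_distance_sketch(p, q):
    return np.sqrt(np.sum((p - q) ** 2))

def manhattan_distance_sketch(p, q):
    return np.sum(np.abs(p - q))

def chebyshev_distance_sketch(p, q):
    return np.max(np.abs(p - q))

# Hypothetical registry: method name -> function.
METHODS = {
    "euclidean": euclidean_distance_sketch,
    "manhattan": manhattan_distance_sketch,
    "chebyshev": chebyshev_distance_sketch,
}

def multiple_distance_sketch(p, q, methods=None):
    """Return {method name: distance} for aligned intensity vectors p and q."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if methods is None:
        methods = list(METHODS)  # ASSUMPTION: None means every registered method
    unknown = [name for name in methods if name not in METHODS]
    if unknown:
        raise ValueError(f"Unsupported methods: {unknown}")
    return {name: float(METHODS[name](p, q)) for name in methods}

result = multiple_distance_sketch([0.5, 0.3, 0.2], [0.4, 0.4, 0.2],
                                  methods=["manhattan", "chebyshev"])
print(result)
```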
- spectral_similarity.multiple_similarity(spectrum_query: Union[list, ndarray], spectrum_library: Union[list, ndarray], methods: Optional[list] = None, ms2_ppm: Optional[float] = None, ms2_da: Optional[float] = None, need_clean_spectra: bool = True, need_normalize_result: bool = True) -> dict
-
Find the common peaks of two spectra and calculate multiple similarity scores between them. If both ms2_ppm and ms2_da are defined, ms2_da will be used.
- Parameters:
-
spectrum_query – The query spectrum; must be in NumPy array format.
spectrum_library – The library spectrum; must be in NumPy array format.
methods – A list of method names.
ms2_ppm – The MS/MS tolerance in ppm.
ms2_da – The MS/MS tolerance in Da.
need_clean_spectra – Normalize the spectra before comparing; required if the spectra are not already normalized.
need_normalize_result – Normalize the result to the range [0, 1].
- Returns:
-
A dict containing all similarity scores.
- spectral_similarity.similarity(spectrum_query: Union[list, ndarray], spectrum_library: Union[list, ndarray], method: str, ms2_ppm: Optional[float] = None, ms2_da: Optional[float] = None, need_clean_spectra: bool = True, need_normalize_result: bool = True) -> float
-
Find the common peaks of two spectra and calculate the similarity between them. If both ms2_ppm and ms2_da are defined, ms2_da will be used.
- Parameters:
-
spectrum_query – The query spectrum; must be in NumPy array format.
spectrum_library – The library spectrum; must be in NumPy array format.
method – Supported methods: “entropy”, “unweighted_entropy”, “euclidean”, “manhattan”, “chebyshev”, “squared_euclidean”, “fidelity”, “matusita”, “squared_chord”, “bhattacharya_1”, “bhattacharya_2”, “harmonic_mean”, “probabilistic_symmetric_chi_squared”, “ruzicka”, “roberts”, “intersection”, “motyka”, “canberra”, “baroni_urbani_buser”, “penrose_size”, “mean_character”, “lorentzian”, “penrose_shape”, “clark”, “hellinger”, “whittaker_index_of_association”, “symmetric_chi_squared”, “pearson_correlation”, “improved_similarity”, “absolute_value”, “dot_product”, “dot_product_reverse”, “spectral_contrast_angle”, “wave_hedges”, “jaccard”, “dice”, “inner_product”, “divergence”, “avg_l”, “vicis_symmetric_chi_squared_3”, “ms_for_id_v1”, “ms_for_id”, “weighted_dot_product”
ms2_ppm – The MS/MS tolerance in ppm.
ms2_da – The MS/MS tolerance in Da.
need_clean_spectra – Normalize the spectra before comparing; required if the spectra are not already normalized.
need_normalize_result – Normalize the result to the range [0, 1].
- Returns:
-
Similarity between two spectra
Supported MS/MS spectral distance
- math_distance.absolute_value_distance(p, q)
-
Absolute Value Distance:
\[\frac { \sum(|Q_i-P_i|)}{\sum P_i}\]
- math_distance.avg_l_distance(p, q)
-
Avg (L1, L∞) distance:
\[\frac{1}{2}(\sum|P_i-Q_i|+\underset{i}{\max}{|P_i-Q_i|})\]
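As a worked instance of the formula above, the following sketch evaluates the Avg (L1, L∞) distance with NumPy. The function name is hypothetical, and `p` and `q` are assumed to be intensity vectors that have already been aligned peak by peak.

```python
import numpy as np

def avg_l_distance_sketch(p, q):
    """Average of the L1 distance (sum) and the L-infinity distance (max)."""
    diff = np.abs(np.asarray(p, dtype=float) - np.asarray(q, dtype=float))
    return 0.5 * (diff.sum() + diff.max())

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])
# |P - Q| = [0.1, 0.1, 0.0], so the result is about 0.5 * (0.2 + 0.1) = 0.15
print(avg_l_distance_sketch(p, q))
```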
- math_distance.baroni_urbani_buser_distance(p, q)
-
Baroni-Urbani-Buser distance:
\[1-\frac{\sum\min{(P_i,Q_i)}+\sqrt{\sum\min{(P_i,Q_i)}\sum(\max{(P)}-\max{(P_i,Q_i)})}}{\sum{\max{(P_i,Q_i)}+\sqrt{\sum{\min{(P_i,Q_i)}\sum(\max{(P)}-\max{(P_i,Q_i)})}}}}\]
- math_distance.bhattacharya_1_distance(p, q)
-
Bhattacharya 1 distance:
\[(\arccos{(\sum\sqrt{P_{i}Q_{i}})})^2\]
- math_distance.bhattacharya_2_distance(p, q)
-
Bhattacharya 2 distance:
\[-\ln{(\sum\sqrt{P_{i}Q_{i}})}\]
- math_distance.canberra_distance(p, q)
-
Canberra distance:
\[\sum\frac{|P_{i}-Q_{i}|}{|P_{i}|+|Q_{i}|}\]
- math_distance.chebyshev_distance(p, q)
-
Chebyshev distance:
\[\underset{i}{\max}{(|P_{i}\ -\ Q_{i}|)}\]
- math_distance.clark_distance(p, q)
-
Clark distance:
\[(\frac{1}{N}\sum(\frac{P_i-Q_i}{|P_i|+|Q_i|})^2)^\frac{1}{2}\]
- math_distance.cosine_distance(p, q)
-
Cosine distance; it gives the same result as the dot product distance.
\[1 - \sqrt{\frac{(\sum{Q_iP_i})^2}{\sum{Q_i^2\sum P_i^2}}}\]
- math_distance.dice_distance(p, q)
-
Dice distance:
\[\frac{\sum(P_i-Q_i)^2}{\sum P_i^2+\sum Q_i^2}\]
- math_distance.divergence_distance(p, q)
-
Divergence distance:
\[2\sum\frac{(P_i-Q_i)^2}{(P_i+Q_i)^2}\]
- math_distance.dot_product_distance(p, q)
-
Dot product distance:
\[1 - \sqrt{\frac{(\sum{Q_iP_i})^2}{\sum{Q_i^2\sum P_i^2}}}\]
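The dot product distance formula above can be evaluated directly with NumPy. This is a minimal sketch with a hypothetical function name; it assumes `p` and `q` are aligned intensity vectors and that neither is all zeros (otherwise the denominator vanishes).

```python
import numpy as np

def dot_product_distance_sketch(p, q):
    """1 - sqrt((sum Q_i P_i)^2 / (sum Q_i^2 * sum P_i^2))."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    score = np.sum(q * p) ** 2 / (np.sum(q ** 2) * np.sum(p ** 2))
    return 1.0 - np.sqrt(score)

same = dot_product_distance_sketch([0.6, 0.4], [0.6, 0.4])      # close to 0
disjoint = dot_product_distance_sketch([1.0, 0.0], [0.0, 1.0])  # no shared peaks
print(same, disjoint)
```

Identical spectra give a distance near 0, while spectra with no shared peaks give 1, which are the two ends of the range for non-negative intensities.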
- math_distance.dot_product_reverse_distance(p, q)
-
Reverse dot product distance; only peaks that exist in spectrum Q are considered.
\[ \begin{align}\begin{aligned}1 - \sqrt{\frac{(\sum{Q_i^{'}P_i^{'}})^2}{\sum{(Q_i^{'})^2}\sum{(P_i^{'})^2}}},\ \text{with:}\\P^{'}_{i}=\frac{P^{''}_{i}}{\sum_{i}{P^{''}_{i}}},\\\begin{split}P^{''}_{i}=\begin{cases} 0 & \text{ if } Q_{i}=0 \\ P_{i} & \text{ if } Q_{i}\neq0 \end{cases}\end{split}\end{aligned}\end{align} \]
- math_distance.entropy_distance(p, q)
-
Entropy distance:
\[-\frac{2\times S_{PQ}^{'}-S_P^{'}-S_Q^{'}} {\ln(4)},\ S_I^{'}=\sum_{i} {I_i^{'} \ln(I_i^{'})},\ I^{'}=I^{w},\ \text{with}\ w=0.25+S\times 0.5\ (S<1.5)\]
- math_distance.harmonic_mean_distance(p, q)
-
Harmonic mean distance:
\[1-2\sum(\frac{P_{i}Q_{i}}{P_{i}+Q_{i}})\]
- math_distance.hellinger_distance(p, q)
-
Hellinger distance:
\[\sqrt{2\sum(\sqrt{\frac{P_i}{\bar{P}}}-\sqrt{\frac{Q_i}{\bar{Q}}})^2}\]
- math_distance.improved_similarity_distance(p, q)
-
Improved Similarity Index:
\[\sqrt{\frac{1}{N}\sum\{\frac{P_i-Q_i}{P_i+Q_i}\}^2}\]
- math_distance.intersection_distance(p, q)
-
Intersection distance:
\[1-\frac{\sum\min{(P_{i},Q_{i})}}{\min(\sum{P_{i},\sum{Q_{i})}}}\]
- math_distance.jaccard_distance(p, q)
-
Jaccard distance:
\[\frac{\sum(P_i-Q_i)^2}{\sum P_i^2+\sum{Q_i^2-\sum{P_iQ_i}}}\]
- math_distance.matusita_distance(p, q)
-
Matusita distance:
\[\sqrt{\sum(\sqrt{P_{i}}-\sqrt{Q_{i}})^2}\]
- math_distance.mean_character_distance(p, q)
-
Mean character distance:
\[\frac{1}{N}\sum{|P_i-Q_i|}\]
- math_distance.motyka_distance(p, q)
-
Motyka distance:
\[-\frac{\sum\min{(P_{i},Q_{i})}}{\sum(P_{i}+Q_{i})}\]
- math_distance.pearson_correlation_distance(p, q)
-
Pearson/Spearman Correlation Coefficient:
\[\frac{\sum[(Q_i-\bar{Q})(P_i-\bar{P})]}{\sqrt{\sum(Q_i-\bar{Q})^2\sum(P_i-\bar{P})^2}}\]
- math_distance.penrose_shape_distance(p, q)
-
Penrose shape distance:
\[\sqrt{\sum((P_i-\bar{P})-(Q_i-\bar{Q}))^2}\]
- math_distance.penrose_size_distance(p, q)
-
Penrose size distance:
\[\sqrt N\sum{|P_i-Q_i|}\]
- math_distance.probabilistic_symmetric_chi_squared_distance(p, q)
-
Probabilistic symmetric χ² distance:
\[\frac{1}{2} \times \sum\frac{(P_{i}-Q_{i}\ )^2}{P_{i}+Q_{i}\ }\]
- math_distance.roberts_distance(p, q)
-
Roberts distance:
\[1-\sum\frac{(P_{i}+Q_{i})\frac{\min{(P_{i},Q_{i})}}{\max{(P_{i},Q_{i})}}}{\sum(P_{i}+Q_{i})}\]
- math_distance.ruzicka_distance(p, q)
-
Ruzicka distance:
\[\frac{\sum{|P_{i}-Q_{i}|}}{\sum{\max(P_{i},Q_{i})}}\]
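Because |P_i - Q_i| = max(P_i, Q_i) - min(P_i, Q_i), the Ruzicka distance above can equivalently be read as one minus the ratio of summed minima to summed maxima. The sketch below checks this numerically; the function name is hypothetical and `p`, `q` are assumed to be aligned, non-negative intensity vectors with at least one non-zero entry.

```python
import numpy as np

def ruzicka_distance_sketch(p, q):
    """sum |P_i - Q_i| / sum max(P_i, Q_i)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return np.sum(np.abs(p - q)) / np.sum(np.maximum(p, q))

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])
direct = ruzicka_distance_sketch(p, q)
# Equivalent form: 1 - sum(min) / sum(max)
via_min_max = 1.0 - np.sum(np.minimum(p, q)) / np.sum(np.maximum(p, q))
print(direct, via_min_max)
```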
- math_distance.spectral_contrast_angle_distance(p, q)
-
Spectral Contrast Angle distance. Note that the value calculated here is \(1-\cos\theta\), not the angle itself. To obtain \(\theta\), compute \(\arccos(1-distance)\).
\[1 - \frac{\sum{Q_iP_i}}{\sqrt{\sum Q_i^2\sum P_i^2}}\]
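Since the formula above returns 1 - cos θ, recovering the angle takes one extra `arccos` step. The sketch below shows that conversion; the function name is hypothetical, and the clipping is a precaution added for this example so that floating-point round-off cannot push the argument of `arccos` outside [-1, 1].

```python
import numpy as np

def spectral_contrast_angle_sketch(p, q):
    """Return (distance, theta) where distance = 1 - cos(theta)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    cos_theta = np.sum(q * p) / np.sqrt(np.sum(q ** 2) * np.sum(p ** 2))
    distance = 1.0 - cos_theta
    # Clip to guard against values such as 1.0000000000000002.
    theta = np.arccos(np.clip(1.0 - distance, -1.0, 1.0))
    return distance, theta

dist_same, theta_same = spectral_contrast_angle_sketch([0.6, 0.4], [0.6, 0.4])
dist_orth, theta_orth = spectral_contrast_angle_sketch([1.0, 0.0], [0.0, 1.0])
print(theta_same, theta_orth)  # near 0 and near pi/2 (radians)
```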
- math_distance.squared_chord_distance(p, q)
-
Squared-chord distance:
\[\sum(\sqrt{P_{i}}-\sqrt{Q_{i}})^2\]
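Comparing the formulas, the squared-chord distance is the square of the Matusita distance listed earlier. A short numerical check, using hypothetical function names and assuming aligned, non-negative intensity vectors:

```python
import numpy as np

def squared_chord_distance_sketch(p, q):
    """sum (sqrt(P_i) - sqrt(Q_i))^2."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)

def matusita_distance_sketch(p, q):
    """Square root of the squared-chord distance."""
    return np.sqrt(squared_chord_distance_sketch(p, q))

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])
chord = squared_chord_distance_sketch(p, q)
matusita = matusita_distance_sketch(p, q)
print(chord, matusita ** 2)  # the two values agree up to round-off
```

For spectra normalized to sum to 1 with no shared peaks, the squared-chord distance reaches its maximum of 2.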
- math_distance.squared_euclidean_distance(p, q)
-
Squared Euclidean distance:
\[\sum(P_{i}-Q_{i})^2\]
- math_distance.symmetric_chi_squared_distance(p, q)
-
Symmetric χ² distance:
\[\sqrt{\sum{\frac{\bar{P}+\bar{Q}}{N(\bar{P}+\bar{Q})^2}\frac{(P_i\bar{Q}-Q_i\bar{P})^2}{P_i+Q_i}\ }}\]
- math_distance.unweighted_entropy_distance(p, q)
-
Unweighted entropy distance:
\[-\frac{2\times S_{PQ}-S_P-S_Q} {\ln(4)},\ S_I=\sum_{i} {I_i \ln(I_i)}\]
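The unweighted entropy formula above can be sketched as follows. Two points are assumptions of this example rather than statements from the documentation: the merged spectrum PQ is taken to be the average (P + Q) / 2 of the two normalized spectra, and 0 · ln(0) is treated as 0. The function names are hypothetical.

```python
import numpy as np

def _sum_i_ln_i(intensity):
    """S_I = sum I_i * ln(I_i), treating 0 * ln(0) as 0."""
    intensity = intensity[intensity > 0]
    return np.sum(intensity * np.log(intensity))

def unweighted_entropy_distance_sketch(p, q):
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()  # normalize each spectrum to sum to 1
    q = q / q.sum()
    merged = (p + q) / 2.0  # ASSUMPTION: PQ is the averaged spectrum
    return -(2.0 * _sum_i_ln_i(merged) - _sum_i_ln_i(p) - _sum_i_ln_i(q)) / np.log(4)

identical = unweighted_entropy_distance_sketch([0.6, 0.4], [0.6, 0.4])
disjoint = unweighted_entropy_distance_sketch([1.0, 0.0], [0.0, 1.0])
print(identical, disjoint)  # near 0 and near 1
```

Under these assumptions the division by ln(4) scales the result so that spectra with no shared peaks score 1.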
- math_distance.vicis_symmetric_chi_squared_3_distance(p, q)
-
Vicis-Symmetric χ² 3 distance:
\[\sum\frac{(P_i-Q_i)^2}{\max{(P_i,Q_i)}}\]
- math_distance.wave_hedges_distance(p, q)
-
Wave Hedges distance:
\[\sum\frac{|P_i-Q_i|}{\max{(P_i,Q_i)}}\]
- math_distance.whittaker_index_of_association_distance(p, q)
-
Whittaker index of association distance:
\[\frac{1}{2}\sum|\frac{P_i}{\bar{P}}-\frac{Q_i}{\bar{Q}}|\]
- ms_distance.ms_for_id_distance(spec_query, spec_reference, ms2_ppm=None, ms2_da=None)
-
MSforID distance:
\[-\frac{N_m^b(\sum I_{q,i}+2\sum I_{r,i})^c}{(N_q+2N_r)^d+\sum|I_{q,i}-I_{r,i}|+\sum|M_{q,i}-M_{r,i}|},\ \ b=4,\ c=1.25,\ d=2\]
The peaks have been filtered with intensity > 0.05.
\(N_m\): number of matching fragments,
\(N_q, N_r\): number of fragments in the query and reference spectra,
\(M_q,M_r\): m/z of a peak in the query and reference spectra,
\(I_q,I_r\): intensity of a peak in the query and reference spectra
- ms_distance.ms_for_id_v1_distance(spec_query, spec_reference, ms2_ppm=None, ms2_da=None)
-
MSforID distance version 1:
\[ \begin{align}\begin{aligned}Similarity = \frac{N_m^4}{N_qN_r(\sum|I_{q,i}-I_{r,i}|)^a}\ ,\ a=0.25\\Distance = \frac{1}{Similarity}\end{aligned}\end{align} \]
\(N_m\): number of matching fragments,
\(N_q, N_r\): number of fragments in the query and reference spectra
- Returns:
-
\(Distance\)
- ms_distance.weighted_dot_product_distance(spec_query, spec_reference, ms2_ppm=None, ms2_da=None)
-
Weighted Dot-Product distance:
\[ \begin{align}\begin{aligned}1 - \frac{(\sum{Q^{'}_{i} P^{'}_{i}})^2}{\sum{Q_{i}^{'2}\sum P_{i}^{'2}}}, here:\\P^{'}_{i} = M_{p,i}^{3}I_{p,i}^{0.6}, Q^{'}_{i} = M_{q,i}^{3}I_{q,i}^{0.6}\end{aligned}\end{align} \]
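The weighting in the formula above (m/z cubed times intensity to the power 0.6) can be sketched as below. This is an illustration with a hypothetical function name, not the package's implementation. As a simplifying assumption, the peaks are taken to be already matched, so both spectra share one m/z array and differ only in their intensity vectors.

```python
import numpy as np

def weighted_dot_product_distance_sketch(mz, intensity_p, intensity_q):
    """1 - (sum Q'_i P'_i)^2 / (sum Q'_i^2 * sum P'_i^2), with X' = M^3 * I^0.6."""
    mz = np.asarray(mz, dtype=float)
    intensity_p = np.asarray(intensity_p, dtype=float)
    intensity_q = np.asarray(intensity_q, dtype=float)
    # ASSUMPTION: matched peaks share the same m/z value in both spectra.
    p_weighted = mz ** 3 * intensity_p ** 0.6
    q_weighted = mz ** 3 * intensity_q ** 0.6
    score = np.sum(q_weighted * p_weighted) ** 2 / (
        np.sum(q_weighted ** 2) * np.sum(p_weighted ** 2))
    return 1.0 - score

mz = [100.0, 200.0]
identical = weighted_dot_product_distance_sketch(mz, [0.6, 0.4], [0.6, 0.4])
disjoint = weighted_dot_product_distance_sketch(mz, [1.0, 0.0], [0.0, 1.0])
print(identical, disjoint)  # near 0 and exactly 1
```

The cubic m/z weight means that peaks at higher m/z dominate the score, so agreement on large fragments matters more than agreement on small ones.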