NIH Research Festival
Peptide identification (ID) via mass spectrometry (MS) has become the central component in modern proteomics; combined with additional analyses, it routinely yields practical metadata, including protein ID, protein quantification, protein structure and protein associations. These metadata, especially the associated statistical significance assignments, need to be as accurate as possible because they often form the building blocks for investigations at the systems biology level and influence the scientific conclusions drawn from them. In mass spectrometry-based proteomics, accurate statistics for peptide identification can now be achieved, but accurate protein-level statistics remain challenging. We have thus developed a novel protein ID method that improves, in particular, the accuracy of the assigned statistical significance. Weighting the contribution of each peptide in protein ID is important because it helps mitigate the issue of peptide degeneracy, where an identified peptide is a subsequence of multiple database proteins. The optimal weighting scheme, however, can depend on the protein ID methodology employed. For the purpose of our study, we opt for a simple weighting scheme: a peptide’s weight is inversely proportional to the number of database proteins it maps to. Within a sample, when multiple spectral searches identify the same peptide but with different significance levels, only the most significant assignment of that peptide is retained for further analyses. Our method is built upon a rigorous formula that enables the weighted combination of P-values.
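The two preprocessing rules described above (keep only the most significant assignment per peptide, and weight each peptide inversely to the number of database proteins it maps to) can be illustrated with a minimal Python sketch. The function names, the input layout, and the proportionality constant of 1 are assumptions made for illustration; the abstract does not specify them, and the sketch does not implement the weighted P-value combination formula itself.

```python
def best_p_per_peptide(hits):
    """Keep only the smallest (most significant) P-value seen for each peptide.

    `hits` is an iterable of (peptide_sequence, p_value) pairs, one per
    spectral search result within a sample (hypothetical input layout).
    """
    best = {}
    for peptide, p_value in hits:
        if peptide not in best or p_value < best[peptide]:
            best[peptide] = p_value
    return best


def peptide_weights(peptide_to_proteins):
    """Weight each peptide as 1 / (number of database proteins it maps to).

    The constant 1 is an assumption; the abstract states only that the
    weight is inversely proportional to the protein count.
    """
    return {
        peptide: 1.0 / len(proteins)
        for peptide, proteins in peptide_to_proteins.items()
        if proteins  # skip peptides that map to no database protein
    }


# Hypothetical example data, not taken from the study.
hits = [("PEPTIDEK", 1e-3), ("PEPTIDEK", 1e-5), ("SAMPLER", 0.02)]
mapping = {"PEPTIDEK": {"P1", "P2"}, "SAMPLER": {"P1"}}

best = best_p_per_peptide(hits)
weights = peptide_weights(mapping)
```

In this example the degenerate peptide shared by two proteins receives half the weight of the peptide unique to one protein, and only its most significant P-value is carried forward.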
Scientific Focus Area: Computational Biology
This page was last updated on Friday, March 26, 2021