Automating compound integration for efficient in-house database development using R

Authors

AM Tisch
D Bennouna
HA Chatelaine
EA Mathé

Abstract

Untargeted metabolomics is a powerful technique for detecting and quantifying metabolites in biological samples, providing insights into biomarkers and mechanisms of action. Compound identification is essential for biological interpretation but is complex and time-consuming. An in-house standards library is often required to confidently assign identities to unknown compounds by matching retention times (RT), masses, and fragmentation patterns. Evaluating each standard is laborious and requires manual review for reliable annotation. We therefore developed a workflow to automate compound integration and make library development more efficient.

LC-MS data from 603 compounds were processed from mzML format in R with mzR. Each compound’s extracted ion chromatogram (EIC) for every adduct was generated using several functions. The filter function identified peaks within a 30-ppm error window around each adduct's m/z value, and RT was estimated at the maximum signal intensity. EICs were visualized using ggplot2, highlighting RT proposals to aid accuracy assessment. A library matrix was produced, detailing compound identities, m/z values, adducts, and RTs for use in annotating unknown data. This workflow was validated by manual inspection for EIC shape, intensity, and quality.
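
The filtering and RT-proposal steps described above can be illustrated with a minimal base-R sketch. This is not the authors' code: the function names (ppm_window, extract_eic, propose_rt), the target m/z, and the synthetic scans are all hypothetical, and summing intensities inside the window is one assumed way of building the EIC. In a real run, the scan matrices and retention times would presumably be read from the mzML files with mzR rather than simulated.

```r
# Hypothetical helper: lower/upper m/z bounds of a ppm error window.
ppm_window <- function(mz, ppm = 30) {
  delta <- mz * ppm / 1e6
  c(lower = mz - delta, upper = mz + delta)
}

# Hypothetical helper: build an EIC from a list of scans.
# Each scan is a two-column matrix: column 1 = m/z, column 2 = intensity.
# Assumption: intensities inside the window are summed per scan.
extract_eic <- function(scans, rts, target_mz, ppm = 30) {
  win <- ppm_window(target_mz, ppm)
  intensity <- vapply(scans, function(s) {
    in_win <- s[, 1] >= win["lower"] & s[, 1] <= win["upper"]
    sum(s[in_win, 2])
  }, numeric(1))
  data.frame(rt = rts, intensity = intensity)
}

# Hypothetical helper: propose the RT at the maximum EIC intensity.
propose_rt <- function(eic) eic$rt[which.max(eic$intensity)]

# Synthetic data (illustrative only): a Gaussian peak centred at RT 4.0
# for the target ion, plus a strong off-target ion about 107 ppm away.
target_mz <- 181.0707
rts <- seq(0, 10, by = 0.5)
peak <- 1e5 * exp(-((rts - 4)^2) / (2 * 0.5^2))
scans <- lapply(peak, function(p) {
  cbind(mz = c(target_mz + 0.002, 181.09), intensity = c(p, 5e5))
})

eic <- extract_eic(scans, rts, target_mz, ppm = 30)
rt_proposal <- propose_rt(eic)
print(rt_proposal)
```

The off-target ion is more intense than the analyte in every scan, yet it falls outside the 30-ppm window and does not contribute to the EIC. The resulting data frame has one row per scan, so it could be passed directly to a plotting library for the visual check the abstract describes.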

Our workflow produced results similar to those of manual integration and assessment using both proprietary and open-source software. RT assignment was accurate even for low-abundance compounds, though it was challenging for noisy peaks, which were removed. While visual assessment remains essential for verification, our tool speeds library construction by cutting integration time per compound from 5 minutes to 20 seconds. Our open-source code is set for public release on GitHub, providing a valuable resource for metabolite library generation.

Scientific Focus Area: Computational Biology

This page was last updated on Tuesday, August 6, 2024

NIH Research Festival
