Machine Learning-Driven Discovery of SARS-CoV-2 Nsp14-MTase Inhibitors

Authors

  • S Pal
  • Q Hanson
  • NJ Martinez
  • AV Zakharov

Abstract

The global impact of SARS-CoV-2 highlights the urgent need for treatments beyond vaccination. This study focuses on designing new antivirals targeting the S-adenosyl methionine-dependent N7-methyltransferase domain of Nsp14 (Nsp14-MTase), which is crucial for viral replication. Initially, NCATS performed quantitative high-throughput screening (qHTS) against Nsp14-MTase on a fraction of the in-house library containing ~15,000 molecules, which provided starting hits. To extend the screening campaign, we employed machine learning (ML) models for virtual screening across ~150,000 in-house compounds, expanding the chemical space searched for Nsp14-MTase inhibitors.
Combinations of Morgan, FeatMorgan, and Avalon fingerprints with RDKit descriptors were used to build ML models with the XGBoost, Gradient Boosting, and Random Forest algorithms, which were evaluated using 5-fold external cross-validation. The XGBoost model using Avalon fingerprints and RDKit descriptors emerged as the top performer, with an AUC of 0.812 and a balanced accuracy of 0.538.
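As an illustration of the two evaluation metrics named above, the following is a minimal sketch in plain Python. The labels, scores, and the 0.5 decision threshold are invented toy values, not data or settings from this study. It shows that AUC depends only on how well actives are ranked above inactives, while balanced accuracy also depends on the chosen threshold, so the two metrics can differ considerably on the same predictions.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity (recall on actives) and specificity (recall on inactives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    return 0.5 * (tp / n_pos + tn / n_neg)


def roc_auc(y_true, scores):
    """Probability that a random active is scored above a random inactive (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy, imbalanced example (hypothetical values): 2 actives among 10 compounds.
y_true = [1, 0, 0, 0, 0, 0, 0, 0, 1, 0]
scores = [0.45, 0.10, 0.20, 0.05, 0.30, 0.15, 0.40, 0.25, 0.60, 0.35]

# Assumed decision threshold of 0.5 for turning scores into class labels.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

auc = roc_auc(y_true, scores)
bal_acc = balanced_accuracy(y_true, y_pred)
print(f"AUC = {auc:.3f}, balanced accuracy = {bal_acc:.3f}")
```

In this toy case both actives are ranked above every inactive, yet one of them falls below the assumed threshold, so the ranking metric is perfect while the thresholded metric is not.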
The top 256 virtual hits from the virtual screening campaign were selected for biochemical evaluation, which confirmed 75 active compounds (IC50: 1.44 μM to 33 μM). This hit rate (~29%) is a substantial improvement over the original qHTS results (1.04%). Scaffold clustering identified 37 distinct chemotypes among the confirmed hits.
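The hit-rate comparison can be checked with a few lines of arithmetic. This sketch uses only the figures quoted in the abstract (75 actives out of 256 tested, and a 1.04% qHTS hit rate); the function and variable names are illustrative.

```python
def hit_rate(n_active, n_tested):
    """Fraction of tested compounds confirmed as active."""
    return n_active / n_tested


# Figures quoted in the abstract.
virtual_rate = hit_rate(75, 256)   # ML-guided selection
qhts_rate = 0.0104                 # original qHTS hit rate (1.04%)

# Fold improvement of the ML-guided selection over the original screen.
fold_enrichment = virtual_rate / qhts_rate

print(f"Virtual screening hit rate: {virtual_rate:.1%}")
print(f"Fold enrichment over qHTS: {fold_enrichment:.1f}x")
```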
Following ML-based hit discovery, an initial SAR study was conducted on promising chemotypes. Each chemotype was expanded through an analog search of the internal library, retaining compounds with a Tanimoto coefficient > 0.7. This preliminary SAR identified several potent hits, with the best compound exhibiting an IC50 of 0.4 μM.
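The similarity filter used in the analog search can be sketched as follows. This is a simplified illustration, not the pipeline used in the study: fingerprints are represented as plain sets of "on" bit indices, and the compound identifiers and bit values are hypothetical. Only the Tanimoto threshold of 0.7 comes from the abstract.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient: shared on-bits divided by total distinct on-bits."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / union


def find_analogs(query_fp, library, threshold=0.7):
    """Return (compound_id, similarity) pairs above the threshold, most similar first."""
    hits = []
    for compound_id, fp in library.items():
        sim = tanimoto(query_fp, fp)
        if sim > threshold:
            hits.append((compound_id, sim))
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical query hit and library; bit indices are made up for illustration.
query = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
library = {
    "cmpd_A": {1, 2, 3, 4, 5, 6, 7, 8, 9, 11},     # 9 shared of 11 total bits
    "cmpd_B": {1, 2, 3, 4, 5, 6, 7, 20, 21, 22},   # 7 shared of 13 total bits
    "cmpd_C": {1, 2, 3, 4, 5, 6, 7, 8, 9, 10},     # identical to the query
}

analogs = find_analogs(query, library, threshold=0.7)
for compound_id, sim in analogs:
    print(f"{compound_id}: Tanimoto = {sim:.3f}")
```

In practice the bit sets would come from a cheminformatics toolkit's fingerprint function rather than being written by hand; the filtering logic stays the same.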

Scientific Focus Area: Chemical Biology

This page was last updated on Tuesday, August 6, 2024