Development of a haplotype-aware assembly pipeline for analysis of rearrangements at the human CYP2D6-CYP2D7-CYP2D8 locus

Authors

  • D Dahiya
  • B Alleva
  • F Pratto
  • RD Camerini-Otero

Abstract

Meiosis is a specialized cell division that leads to the formation of gametes. During meiosis, DNA double-strand breaks (DSBs) are formed and subsequently repaired to allow the exchange of genetic material between homologous chromosomes. In humans, DSBs cluster at genomic regions called hotspots, which depend on the DNA-binding protein PRDM9. DSB repair may also use non-allelic sequences, which can result in gross genomic rearrangements. In the germline, this type of repair can lead to heritable diseases.

We are interested in examining whether DSB hotspots in repetitive regions of the genome lead to a higher propensity for rearrangements at those loci in humans. We analyzed the repetitive CYP2D6-CYP2D7-CYP2D8 locus (2D6-2D7-2D8), which is known to harbor multiple DSB hotspots defined by different alleles of PRDM9. CYP2D6 is designated a very important pharmacogene; its enzyme product is known to metabolize ~25% of all clinical drugs. The gene is highly variable and lies within a 40 kb region that also contains the CYP2D7 and CYP2D8 pseudogenes, which have >90% homology to CYP2D6. We used a long-range, overlapping PCR assay to analyze rearrangements at the 2D6-2D7-2D8 locus in 319 individuals, including 57 parent-offspring trios, from the 1000 Genomes Project.

To analyze our long-read amplicon data from the 2D6-2D7-2D8 locus for rearrangements effectively and efficiently, we developed an in-house haplotype-aware assembly pipeline, because no existing assembly software works for our data. In this work, we present the pipeline and its results on rearrangements at the 2D6-2D7-2D8 locus.

Scientific Focus Area: Computational Biology

This page was last updated on Tuesday, August 6, 2024