Genomic Language Model Mitigates Chimera Artifacts in Nanopore Direct RNA Sequencing

Northwestern University

*Indicates Equal Contribution

Abstract

Chimera artifacts in nanopore direct RNA sequencing (dRNA-seq) introduce substantial inaccuracies, complicating downstream applications such as transcript annotation and gene fusion detection. Current basecalling models are unable to detect or mitigate these artifacts, limiting the reliability and utility of dRNA-seq for transcriptomics research. To address this challenge, we present DeepChopper, a genomic language model specifically designed to identify and remove adapter sequences from base-called dRNA-seq long reads with single-base precision. Operating independently of raw signal or alignment information, DeepChopper effectively eliminates chimeric read artifacts, significantly enhancing the accuracy of crucial downstream analyses. This improvement in reliability unlocks the full potential of nanopore dRNA-seq, establishing it as a more robust tool for diverse transcriptomics applications.
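To make the idea of removing adapters "with single-base precision" concrete, the following is a minimal illustrative sketch, not DeepChopper's actual code or API. It assumes that some model has already produced one adapter/non-adapter label per base of a read, and shows how such a mask could be used to cut an internal adapter out of a chimeric read and recover the flanking segments. The function name `split_read_on_adapters`, the `min_len` parameter, and the toy read and mask are all hypothetical.

```python
from typing import List, Sequence, Tuple


def split_read_on_adapters(
    seq: str,
    is_adapter: Sequence[bool],
    min_len: int = 1,
) -> List[Tuple[int, int, str]]:
    """Return (start, end, subsequence) for each run of non-adapter bases.

    `is_adapter[i]` is True when base i is labeled as adapter.
    Coordinates are 0-based and half-open. Runs shorter than
    `min_len` are dropped.
    """
    if len(seq) != len(is_adapter):
        raise ValueError("seq and is_adapter must have the same length")

    segments: List[Tuple[int, int, str]] = []
    start = None  # start index of the current non-adapter run, if any
    for i, flag in enumerate(is_adapter):
        if not flag and start is None:
            start = i  # a non-adapter run begins here
        elif flag and start is not None:
            if i - start >= min_len:
                segments.append((start, i, seq[start:i]))
            start = None  # the run ended at an adapter base
    # Close a run that extends to the end of the read.
    if start is not None and len(seq) - start >= min_len:
        segments.append((start, len(seq), seq[start:]))
    return segments


# Toy data (hypothetical): two transcript-like stretches joined by an
# internal stretch labeled as adapter, mimicking a chimeric read.
read = "ACGTACGT" + "TTTTTT" + "GGCCAAGG"
mask = [False] * 8 + [True] * 6 + [False] * 8
segments = split_read_on_adapters(read, mask)
```

In this toy case the read is split into two segments on either side of the labeled adapter. A read whose adapter sits only at one end would yield a single trimmed segment instead.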

BibTeX

@article{Li2024.10.23.619929,
  author = {Li, Yangyang and Wang, Ting-You and Guo, Qingxiang and Ren, Yanan and Lu, Xiaotong and Cao, Qi and Yang, Rendong},
  title = {A Genomic Language Model for Chimera Artifact Detection in Nanopore Direct RNA Sequencing},
  journal = {bioRxiv},
  year = {2024},
  doi = {10.1101/2024.10.23.619929},
  publisher = {Cold Spring Harbor Laboratory},
  url = {https://www.biorxiv.org/content/early/2024/10/25/2024.10.23.619929},
  eprint = {https://www.biorxiv.org/content/early/2024/10/25/2024.10.23.619929.full.pdf}
}