DeepChopper
Overview
🧬 DeepChopper is a genomic language model designed to accurately detect and remove chimeric artifacts in Nanopore direct RNA sequencing (dRNA-seq) data. By leveraging deep learning, DeepChopper identifies adapter sequences within base-called reads, ensuring higher quality and more reliable sequencing results.
Key Features
- High Accuracy: State-of-the-art detection of chimeric reads using transformer-based models with >95% sensitivity
- Fast Processing: Optimized Rust core with parallel processing capabilities. Process millions of reads in minutes
- Zero-shot Capability: Works across different RNA chemistries (RNA002, RNA004, and newer) without retraining
- Easy Integration: Simple Python API and CLI for seamless workflow integration
- GPU Acceleration: Optional GPU support (NVIDIA, Apple Silicon) for faster processing of large datasets
- Web Interface: Interactive web UI for quick testing and visualization
Why DeepChopper?
The Problem
Chimera artifacts in nanopore dRNA-seq can confound transcriptome analyses, leading to false gene fusion calls and incorrect transcript annotations. Existing basecalling tools fail to detect internal adapter sequences, leaving these artifacts in your data.
The Solution
DeepChopper solves this problem with a three-step approach:
- Detecting adapter sequences that basecallers miss
- Chopping reads at adapter locations to remove chimeric artifacts, splitting each chimeric read into its individual component reads
- Preserving high-quality sequence data for downstream analysis
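The chopping step can be pictured as interval arithmetic on each read: given the positions a model flags as adapter, keep the stretches in between as separate reads. The sketch below is a hypothetical illustration of that idea, not DeepChopper's implementation; `FastqRecord`, `chop_read`, the `min_len` cutoff, and the `name|start-end` naming scheme are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class FastqRecord:
    name: str
    seq: str
    qual: str


def chop_read(record: FastqRecord,
              adapter_intervals: list[tuple[int, int]],
              min_len: int = 20) -> list[FastqRecord]:
    """Split a read into the pieces that lie outside predicted adapter intervals.

    Hypothetical helper. adapter_intervals holds 0-based, half-open
    (start, end) spans, e.g. from a per-base adapter classifier.
    Pieces shorter than min_len are discarded.
    """
    keep: list[tuple[int, int]] = []
    cursor = 0
    for start, end in sorted(adapter_intervals):
        if start > cursor:
            keep.append((cursor, start))  # sequence before this adapter
        cursor = max(cursor, end)  # jump past the adapter; tolerates overlaps
    if cursor < len(record.seq):
        keep.append((cursor, len(record.seq)))  # tail after the last adapter

    pieces = []
    for start, end in keep:
        if end - start < min_len:
            continue  # drop fragments too short to be useful
        pieces.append(FastqRecord(
            name=f"{record.name}|{start}-{end}",  # hypothetical naming scheme
            seq=record.seq[start:end],
            qual=record.qual[start:end],  # keep qualities aligned with bases
        ))
    return pieces
```

For example, a 65-base read with an adapter predicted at positions 30 to 40 would yield two reads of 30 and 25 bases. Advancing a single cursor over the sorted intervals keeps the logic simple and merges overlapping predictions automatically.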
Quick Start
Try Online
Experience DeepChopper instantly without any installation:
Note: The online version is limited to one FASTQ record at a time. For large-scale analyses, please install DeepChopper locally.
Installation
Install DeepChopper using pip:
Verify the installation:
For detailed installation instructions, see the Installation Guide.
Basic Usage
```bash
# 1. Predict chimera artifacts (automatically encodes FASTQ data)
deepchopper predict raw_reads.fastq --output predictions
# 2. Chop the reads at detected adapter locations
deepchopper chop predictions/ raw_reads.fastq --output chopped.fastq
```
For a complete walkthrough, check out the Tutorial.
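If you drive DeepChopper from a Python workflow, one option is to shell out to the same two commands used in Basic Usage. This is a minimal sketch that assumes the `deepchopper` executable is on `PATH` and accepts exactly those arguments; `build_commands` and `run_pipeline` are hypothetical helpers, and the sketch does not use DeepChopper's own Python API.

```python
import shutil
import subprocess


def build_commands(fastq: str, pred_dir: str, chopped: str) -> list[list[str]]:
    """Return the predict and chop invocations as argument lists (hypothetical helper)."""
    return [
        # Step 1: predict chimera artifacts (the CLI encodes the FASTQ itself)
        ["deepchopper", "predict", fastq, "--output", pred_dir],
        # Step 2: chop reads at the predicted adapter locations
        ["deepchopper", "chop", pred_dir, fastq, "--output", chopped],
    ]


def run_pipeline(fastq: str,
                 pred_dir: str = "predictions",
                 chopped: str = "chopped.fastq") -> None:
    """Run both steps in order; hypothetical helper, not part of DeepChopper."""
    if shutil.which("deepchopper") is None:
        raise RuntimeError("the deepchopper executable was not found on PATH")
    for cmd in build_commands(fastq, pred_dir, chopped):
        # check=True raises CalledProcessError, so chop never runs on failed predictions
        subprocess.run(cmd, check=True)
```

Calling `run_pipeline("raw_reads.fastq")` would then run prediction and chopping back to back. Passing argument lists instead of a shell string avoids quoting problems with file names that contain spaces.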
Use Cases
- Transcriptome Assembly: Remove chimera artifacts to improve transcript reconstruction and assembly quality
- Gene Fusion Detection: Eliminate false positives from adapter-bridged artifacts for accurate fusion calling
- Differential Expression: Ensure accurate read counts by removing chimeric reads before quantification
- RNA-Seq QC: Assess and improve data quality in dRNA-seq experiments
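For the QC use case, a quick sanity check is to compare basic statistics of the raw and chopped FASTQ files. Because chopping splits chimeric reads into separate pieces, shifts in read count and mean read length give a rough sense of how many reads were affected. The helpers below are hypothetical, independent of DeepChopper, and assume the standard four-line FASTQ layout without wrapped sequence lines.

```python
import gzip


def stats_from_lines(lines) -> tuple[int, float]:
    """Return (read count, mean read length) from FASTQ text lines."""
    n_reads = 0
    total_len = 0
    for i, line in enumerate(lines):
        if i % 4 == 1:  # second line of each 4-line record is the sequence
            n_reads += 1
            total_len += len(line.rstrip("\r\n"))
    return n_reads, (total_len / n_reads if n_reads else 0.0)


def fastq_stats(path: str) -> tuple[int, float]:
    """Summarize a plain or gzip-compressed FASTQ file (hypothetical helper)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return stats_from_lines(handle)


def compare(raw: str, chopped: str) -> None:
    """Print read count and mean length before and after chopping."""
    raw_n, raw_mean = fastq_stats(raw)
    new_n, new_mean = fastq_stats(chopped)
    print(f"reads:       {raw_n} -> {new_n} ({new_n - raw_n:+d})")
    print(f"mean length: {raw_mean:.1f} -> {new_mean:.1f}")
```

After running the Basic Usage commands, `compare("raw_reads.fastq", "chopped.fastq")` prints both summaries side by side. Splitting the parsing into `stats_from_lines` keeps it testable on in-memory data.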
Citation
If DeepChopper helps your research, please cite our paper:
```bibtex
@article{li2026genomic,
  title = {Genomic Language Model Mitigates Chimera Artifacts in Nanopore Direct {{RNA}} Sequencing},
  author = {Li, Yangyang and Wang, Ting-You and Guo, Qingxiang and Ren, Yanan and Lu, Xiaotong and Cao, Qi and Yang, Rendong},
  date = {2026-01-19},
  journaltitle = {Nature Communications},
  shortjournal = {Nat Commun},
  publisher = {Nature Publishing Group},
  issn = {2041-1723},
  doi = {10.1038/s41467-026-68571-5},
  url = {https://www.nature.com/articles/s41467-026-68571-5},
  urldate = {2026-01-20}
}
```
Related Tools
- ChimeraLM: identifies artificial chimeric reads arising from whole-genome amplification (WGA) in DNA sequencing data
Support
License
DeepChopper is released under the Apache License 2.0.