DeepChopper

Overview

🧬 DeepChopper is a genomic language model designed to accurately detect and remove chimeric artifacts in Nanopore direct RNA sequencing (dRNA-seq) data. By leveraging deep learning, DeepChopper identifies adapter sequences within base-called reads, ensuring higher quality and more reliable sequencing results.


⭐ Key Features

  • High Accuracy
    State-of-the-art detection of chimeric reads using transformer-based models with >95% sensitivity
  • Fast Processing
    Optimized Rust core with parallel processing capabilities. Process millions of reads in minutes
  • Zero-shot Capability
    Works across different RNA chemistries (RNA002, RNA004, and newer) without retraining
  • Easy Integration
    Simple Python API and CLI for seamless workflow integration
  • GPU Acceleration
    Optional GPU support (NVIDIA, Apple Silicon) for faster processing of large datasets
  • Web Interface
    Interactive web UI for quick testing and visualization

Why DeepChopper?

The Problem

Chimera artifacts in Nanopore dRNA-seq can confound transcriptome analyses, leading to false gene fusion calls and incorrect transcript annotations. Existing basecalling tools fail to detect internal adapter sequences, leaving these artifacts in your data.

The Solution

DeepChopper solves this problem with a three-step approach:

  1. Detecting adapter sequences that basecallers miss
  2. Chopping reads at adapter locations to split each chimeric read into its individual constituent reads
  3. Preserving high-quality sequence data for downstream analysis
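
To make the chopping step concrete, here is a minimal sketch of how a read could be split once adapter positions are known. It is an illustration only, not DeepChopper's implementation: the function name `chop_read`, the half-open `(start, end)` interval format, and the `min_len` cutoff are all assumptions made for this example.

```python
from typing import List, Tuple


def chop_read(
    seq: str,
    qual: str,
    adapters: List[Tuple[int, int]],
    min_len: int = 20,
) -> List[Tuple[str, str]]:
    """Split one read at adapter intervals and return the kept segments.

    `adapters` holds half-open (start, end) positions of predicted adapters.
    The interval format and `min_len` are illustrative assumptions.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings must have equal length")

    segments = []
    cursor = 0  # start of the current non-adapter segment
    for start, end in sorted(adapters):
        if start > cursor:
            # Keep the bases (and their qualities) that precede this adapter.
            segments.append((seq[cursor:start], qual[cursor:start]))
        cursor = max(cursor, end)  # skip the adapter; tolerate overlapping intervals
    if cursor < len(seq):
        segments.append((seq[cursor:], qual[cursor:]))

    # Drop fragments too short to be useful downstream.
    return [(s, q) for s, q in segments if len(s) >= min_len]


# Toy chimeric read: 12 bases, a 5-base "adapter", then 10 more bases.
read = "ACGT" * 3 + "NNNNN" + "TTGCA" * 2
quals = "I" * len(read)
parts = chop_read(read, quals, [(12, 17)], min_len=5)
print([s for s, _ in parts])  # ['ACGTACGTACGT', 'TTGCATTGCA']
```

Slicing the quality string with the same coordinates keeps every output segment a valid FASTQ record, which is what lets the pieces be used as independent reads downstream.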

Quick Start

Try Online

Experience DeepChopper instantly without any installation:

Open in Hugging Face Spaces

Note

The online version is limited to one FASTQ record at a time. For large-scale analyses, please install DeepChopper locally.
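
Since the online demo accepts a single record, it can help to pull one record out of a larger file first. The sketch below assumes the common four-line (unwrapped) FASTQ layout; the helper name `first_fastq_record` is hypothetical and not part of DeepChopper.

```python
import io
from typing import Optional, TextIO


def first_fastq_record(handle: TextIO) -> Optional[str]:
    """Return the first four-line FASTQ record from a text handle.

    Returns None if the handle holds fewer than four lines.
    Assumes unwrapped FASTQ (one line each for sequence and quality).
    """
    lines = [handle.readline() for _ in range(4)]
    if not all(lines):  # readline() returns "" at end of file
        return None
    header, seq, plus, qual = (line.rstrip("\r\n") for line in lines)
    if not header.startswith("@") or not plus.startswith("+"):
        raise ValueError("input does not look like four-line FASTQ")
    if len(seq) != len(qual):
        raise ValueError("sequence and quality lengths differ")
    return "\n".join((header, seq, plus, qual)) + "\n"


# In-memory stand-in for a real file; use open("raw_reads.fastq") in practice.
demo = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGG\n+\nII\n")
record = first_fastq_record(demo)
print(record, end="")
```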

Installation

Install DeepChopper using pip:

pip install deepchopper

Verify the installation:

deepchopper --help

For detailed installation instructions, see the Installation Guide.

Basic Usage

# 1. Predict chimera artifacts (automatically encodes FASTQ data)
deepchopper predict raw_reads.fastq --output predictions

# 2. Chop the reads at detected adapter locations
deepchopper chop predictions/ raw_reads.fastq --output chopped.fastq

For a complete walkthrough, check out the Tutorial.
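
If you prefer to drive the two commands from a script, a thin wrapper around `subprocess` is one option. This is a sketch under stated assumptions: the argument lists simply mirror the two commands shown above, and the helper names `build_commands` and `run_pipeline` are hypothetical. It does not use or describe DeepChopper's Python API.

```python
import subprocess
from typing import List


def build_commands(
    fastq: str,
    pred_dir: str = "predictions",
    out_fastq: str = "chopped.fastq",
) -> List[List[str]]:
    """Return the two Basic Usage CLI invocations as argument lists."""
    pred_path = pred_dir.rstrip("/") + "/"
    return [
        ["deepchopper", "predict", fastq, "--output", pred_dir],
        ["deepchopper", "chop", pred_path, fastq, "--output", out_fastq],
    ]


def run_pipeline(fastq: str, dry_run: bool = True) -> List[List[str]]:
    """Print the commands (dry run) or execute them in order."""
    commands = build_commands(fastq)
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError, so chop never runs
            # after a failed predict step.
            subprocess.run(cmd, check=True)
    return commands


# Dry run: only prints the commands, so it is safe without DeepChopper installed.
planned = run_pipeline("raw_reads.fastq")
```

Passing `dry_run=False` executes the steps in order and stops at the first failure.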


Use Cases

  • Transcriptome Assembly
    Remove chimera artifacts to improve transcript reconstruction and assembly quality
  • Gene Fusion Detection
    Eliminate false positives from adapter-bridged artifacts for accurate fusion calling
  • Differential Expression
    Ensure accurate read counts by removing chimeric reads before quantification
  • RNA-Seq QC
    Assess and improve data quality in dRNA-seq experiments

Citation

If DeepChopper helps your research, please cite our paper:

@article{li2026genomic,
  title = {Genomic Language Model Mitigates Chimera Artifacts in Nanopore Direct {{RNA}} Sequencing},
  author = {Li, Yangyang and Wang, Ting-You and Guo, Qingxiang and Ren, Yanan and Lu, Xiaotong and Cao, Qi and Yang, Rendong},
  date = {2026-01-19},
  journaltitle = {Nature Communications},
  shortjournal = {Nat Commun},
  publisher = {Nature Publishing Group},
  issn = {2041-1723},
  doi = {10.1038/s41467-026-68571-5},
  url = {https://www.nature.com/articles/s41467-026-68571-5},
  urldate = {2026-01-20}
}

  • ChimeraLM - For identifying artificial chimeric reads arising from whole genome amplification (WGA) processes in DNA sequencing data

License

DeepChopper is released under the Apache License 2.0.


Developed with ❤️ by the YLab team | Happy sequencing! 🧬🔬