ChimeraLM
Genomic Language Model for Detecting WGA Chimeric Artifacts
A deep learning tool that identifies artificial chimeric reads produced by whole genome amplification (WGA).
Key Features
High Accuracy
Deep learning model trained on real WGA data for precise chimeric artifact detection
GPU Accelerated
Optimized for CUDA, MPS (Apple Silicon), and CPU with configurable batch processing
Easy to Use
Simple CLI with sensible defaults - get started in minutes
Fast Processing
Batch inference with configurable parallelism for large-scale genomic datasets
Web Interface
Try the interactive demo on HuggingFace Spaces - no installation needed!
Production Ready
Includes filtering, sorting, and indexing of BAM files
Quick Start
Get up and running with ChimeraLM in under 15 minutes:
# Install ChimeraLM
pip install chimeralm
# Predict chimeric reads (CPU)
chimeralm predict your_data.bam
# Predict with GPU acceleration
chimeralm predict your_data.bam --gpus 1 --batch-size 24
Ready to dive in? Check out our Quick Start Guide.
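For scripted runs, the commands above can be wrapped in Python. The sketch below is illustrative only: `build_predict_command` and `run_predict` are hypothetical helper names, not part of ChimeraLM, and the only flags used are the `--gpus` and `--batch-size` options shown above.

```python
import shlex
import subprocess
from typing import List, Optional


def build_predict_command(bam_path: str,
                          gpus: Optional[int] = None,
                          batch_size: Optional[int] = None) -> List[str]:
    """Assemble the argv list for `chimeralm predict`.

    Hypothetical helper: it only uses the flags shown in the Quick Start.
    """
    cmd = ["chimeralm", "predict", bam_path]
    if gpus is not None:
        cmd += ["--gpus", str(gpus)]
    if batch_size is not None:
        cmd += ["--batch-size", str(batch_size)]
    return cmd


def run_predict(bam_path: str, **options) -> None:
    """Run the CLI and raise CalledProcessError if it exits non-zero."""
    subprocess.run(build_predict_command(bam_path, **options), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch is safe to try
    # on a machine where ChimeraLM is not installed yet.
    print(shlex.join(build_predict_command("your_data.bam", gpus=1, batch_size=24)))
```

Passing the command as a list rather than a single string avoids shell quoting problems when a BAM path contains spaces.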
Try ChimeraLM Online - No Installation Required!
Want to test ChimeraLM before installing? Try our interactive web demo:
Launch Web Demo on HuggingFace Spaces
Perfect for:
- Testing with individual DNA sequences
- Visualizing prediction confidence scores
- Learning about chimeric artifact detection
- Quick validation before batch processing
The web demo runs the same model as the CLI tool but provides an intuitive visual interface for single-sequence analysis.
What is ChimeraLM?
ChimeraLM is a genomic language model that detects chimeric artifacts introduced by whole genome amplification (WGA). Built with PyTorch Lightning and optimized for modern GPUs, it provides fast and accurate identification of chimeric reads in BAM files.
Chimeric artifacts are artificial DNA sequences, created during WGA, that join sequences from different genomic locations into a single read. If they are not removed before analysis, they can lead to incorrect biological conclusions.
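To make the definition above concrete, here is a toy sketch that builds a chimeric sequence by joining two segments from distant positions of a random reference. It is purely illustrative: the reference is synthetic, `random_reference` and `make_toy_chimera` are hypothetical names, and the code does not model the WGA chemistry or reflect how ChimeraLM detects artifacts.

```python
import random
from typing import Tuple


def random_reference(length: int, seed: int = 0) -> str:
    """Generate a synthetic reference sequence (toy data, not a real genome)."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") for _ in range(length))


def make_toy_chimera(reference: str,
                     first: Tuple[int, int],
                     second: Tuple[int, int]) -> str:
    """Join two reference intervals (start, end) into one artificial read."""
    (s1, e1), (s2, e2) = first, second
    return reference[s1:e1] + reference[s2:e2]


reference = random_reference(10_000)

# Two 50 bp segments that lie 8 kb apart on the toy reference.
chimera = make_toy_chimera(reference, (100, 150), (8_100, 8_150))

# Each half matches the reference on its own ...
assert chimera[:50] == reference[100:150]
assert chimera[50:] == reference[8_100:8_150]

# ... but the junction joins positions that are not adjacent in the genome.
print(f"chimera length: {len(chimera)} bp")
```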
ChimeraLM uses the HyenaDNA backbone architecture to learn patterns that distinguish biological reads (label 0) from chimeric artifacts (label 1), helping researchers clean their sequencing data before downstream analysis.
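The label convention above (0 for biological, 1 for chimeric) can be applied to per-read predictions as in the minimal sketch below. The dictionary of read names to labels is a hypothetical stand-in; the actual output format of `chimeralm predict` is not described here and may differ.

```python
from typing import Dict, List

# Label convention stated in the text above.
BIOLOGICAL = 0
CHIMERIC = 1


def split_by_label(predictions: Dict[str, int]) -> Dict[str, List[str]]:
    """Group read names into reads to keep and reads to discard."""
    groups: Dict[str, List[str]] = {"keep": [], "discard": []}
    for read_name, label in predictions.items():
        if label == BIOLOGICAL:
            groups["keep"].append(read_name)
        elif label == CHIMERIC:
            groups["discard"].append(read_name)
        else:
            raise ValueError(f"unexpected label {label!r} for read {read_name!r}")
    return groups


# Hypothetical predictions used only for illustration.
example = {"read_a": 0, "read_b": 1, "read_c": 0}
print(split_by_label(example))
# {'keep': ['read_a', 'read_c'], 'discard': ['read_b']}
```

Raising on an unexpected label, rather than silently keeping the read, makes a mismatch between the assumed and actual label encoding visible early.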
Citation
If you use ChimeraLM in your research, please cite:
@software{chimeralm2025,
title={ChimeraLM: A genomic language model to identify chimera artifacts},
author={Li, Yangyang and Guo, Qingxiang and Yang, Rendong},
year={2025},
url={https://github.com/ylab-hi/ChimeraLM}
}
License
ChimeraLM is licensed under the Apache License 2.0. See License for details.