Quick Start
Get started with ChimeraLM in under 15 minutes! This tutorial will guide you through your first chimeric read prediction.
What you'll learn
- How to run predictions on BAM files
- Understanding ChimeraLM output format
- Verifying your results
Time: ~15 minutes
Prerequisites
- ChimeraLM installed (Installation Guide)
- Basic command-line experience
- A BAM file to analyze (we'll provide sample data)
Working with RNA sequencing data?
ChimeraLM is designed for DNA sequencing with whole genome amplification (WGA). If you need to identify chimera artifacts from Nanopore direct RNA sequencing, please see DeepChopper.
Step 1: Get Sample Data
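ChimeraLM includes test data in the repository. If you installed from source, the sample files are already in your checkout under tests/data/ (mk1c_test.bam and mk1c_test.bam.bai).
If you installed via pip, download the sample data: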
# Download sample BAM file with index
wget https://github.com/ylab-hi/chimera/raw/main/tests/data/mk1c_test.bam
wget https://github.com/ylab-hi/chimera/raw/main/tests/data/mk1c_test.bam.bai
# Or using curl
curl -L -o mk1c_test.bam https://github.com/ylab-hi/chimera/raw/main/tests/data/mk1c_test.bam
curl -L -o mk1c_test.bam.bai https://github.com/ylab-hi/chimera/raw/main/tests/data/mk1c_test.bam.bai
# Verify files downloaded correctly
ls -lh mk1c_test.bam*
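A size listing does not reveal a download that silently saved an HTML error page instead of the BAM. As a minimal sketch, the helper below (the name looks_like_bam is hypothetical, not part of ChimeraLM) checks that each file exists, is non-empty, and starts with the gzip magic bytes 1f 8b, which every BGZF-compressed BAM file begins with. It does not validate the BAM contents, and it should not be applied to the .bai index, which is not gzip-compressed.

```shell
# Hypothetical helper (not part of ChimeraLM): succeed only if every
# argument exists, is non-empty, and starts with the gzip magic bytes
# 1f 8b, as a BGZF-compressed BAM file does.
looks_like_bam() {
  for f in "$@"; do
    [ -s "$f" ] || { echo "missing or empty: $f" >&2; return 1; }
    # Read the first two bytes and render them as hex, e.g. "1f8b".
    magic=$(head -c 2 "$f" | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "1f8b" ] || { echo "not gzip/BGZF: $f" >&2; return 1; }
  done
}

# Usage (run from the directory holding the download):
#   looks_like_bam mk1c_test.bam && echo "download looks OK"
```

If samtools is installed, `samtools quickcheck mk1c_test.bam` performs a more thorough integrity check.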
About the Sample Data
The sample file mk1c_test.bam contains 175 reads (75 chimeric and 100 non-chimeric), subsampled from the PC3 cell line (human prostate cancer) and sequenced on a Nanopore MinION Mk1C after whole genome amplification.
Step 2: Run Your First Prediction
Run ChimeraLM on the sample data:
GPU vs CPU Performance
- CPU: ~15 seconds for 48 SA-tagged reads (batch-size 12)
- GPU: ~3 seconds for 48 SA-tagged reads (batch-size 24, 5x faster!)
Step 3: Understand the Output
ChimeraLM creates a predictions file with one line per read:
Output format (tab-separated):
read_name<TAB>label
e5f89040-2898-41d9-9ee4-3022168216f0 1
b76512a7-5a74-405b-8ac3-adde6a7ea5e1 0
5b830fb3-6bb7-42a4-ad18-142b9474ed7d 1
edab7cd5-831c-4f51-8ada-c9b4620307c1 0
...
Labels:
- 0: Biological read (keep for analysis)
- 1: Chimeric artifact (remove from analysis)
Step 4: Interpret Results
Count how many reads are chimeric:
# Count chimeric reads (label 1)
cat mk1c_test.predictions/*.txt | grep -c "1$"
# Count biological reads (label 0)
cat mk1c_test.predictions/*.txt | grep -c "0$"
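The two counts above can also be produced in one pass, together with percentages. This is a minimal sketch, assuming the two-column, tab-separated format described in Step 3; the function name tally_labels is hypothetical and not part of ChimeraLM. It compares the last field exactly against 0 or 1 rather than matching on the final character of the line.

```shell
# Hypothetical helper: read predictions on stdin and print, for each
# label, the read count and its share of all labelled reads.
tally_labels() {
  awk -F'\t' '
    $NF == "0" { bio++ }    # biological reads
    $NF == "1" { chim++ }   # chimeric artifacts
    END {
      total = bio + chim
      if (total == 0) { print "no predictions found" > "/dev/stderr"; exit 1 }
      printf "chimeric\t%d\t%.1f%%\n", chim, 100 * chim / total
      printf "biological\t%d\t%.1f%%\n", bio, 100 * bio / total
    }'
}

# Usage:
#   cat mk1c_test.predictions/*.txt | tally_labels
```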
Expected results for test data:
- Chimeric artifacts: 55 (73.3%)
- Biological reads: 20 (26.7%)
Typical chimera rates for WGA data:
- MDA (Multiple Displacement Amplification): 10-40%
- PicoPLEX: 5-20%
- Non-WGA data: <1%
Checkpoint: Verify Your Prediction Worked
✅ Success indicators:
- Predictions file created
- File contains tab-separated read names and labels
- Labels are 0 or 1
- Number of predictions matches input reads
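The format checks in this list can be automated. The sketch below is a hypothetical helper (validate_predictions is not a ChimeraLM command) that assumes the tab-separated format from Step 3. It reports any line that does not have exactly two fields or whose label is not 0 or 1, and it fails on an empty file. It does not compare the number of predictions against the input BAM.

```shell
# Hypothetical helper: exit 0 only if the file is non-empty and every
# line is "read_name<TAB>label" with label 0 or 1.
validate_predictions() {
  awk -F'\t' '
    NF != 2 || ($2 != "0" && $2 != "1") {
      printf "line %d is malformed: %s\n", NR, $0 > "/dev/stderr"
      bad = 1
    }
    END {
      if (NR == 0) bad = 1   # an empty file is also a failure
      exit (bad ? 1 : 0)
    }
  ' "$1"
}

# Usage:
#   validate_predictions mk1c_test.predictions/predictions.txt && echo "format OK"
```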
Congratulations!
You've successfully run your first ChimeraLM prediction!
Next Steps
Now that you've completed the basics:
For Analysis
Filter your BAM file to remove chimeric reads:
This automatically creates:
- mk1c_test.filtered.bam - Unsorted filtered reads
- mk1c_test.filtered.sorted.bam - Final sorted output (use this!)
- mk1c_test.filtered.sorted.bam.bai - BAM index
- mk1c_test.predictions/predictions.txt - Consolidated predictions
For comprehensive filtering guidance including verification, troubleshooting, and batch processing, see the Filtering BAM Files Tutorial.
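To inspect or reuse the set of reads that filtering keeps, the read names labelled 0 can be pulled out of the predictions file directly. This is a sketch of a manual alternative, not ChimeraLM's own filtering method; keep_list and the keep.txt and mk1c_test.manual.bam file names are hypothetical, and the commented samtools step assumes samtools 1.12 or newer, which provides the -N (--qname-file) option.

```shell
# Hypothetical helper: print the names of reads labelled 0 (biological)
# from one or more tab-separated predictions files.
keep_list() {
  awk -F'\t' '$2 == "0" { print $1 }' "$@"
}

# Usage:
#   keep_list mk1c_test.predictions/predictions.txt > keep.txt
# Optional follow-up, assuming samtools >= 1.12 is installed:
#   samtools view -b -N keep.txt -o mk1c_test.manual.bam mk1c_test.bam
```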
For Learning
- Optimize performance: See Performance Optimization
- Integrate into pipelines: See Pipeline Integration
- Use the web interface: See Web Command
For Development
- Use as a library: See API Reference
Troubleshooting
Encountered an issue? Check our Troubleshooting Guide for common problems and solutions.
Need Help?