Quick Start
Get started with ChimeraLM in under 15 minutes! This tutorial will guide you through your first chimeric read prediction.
What you'll learn
- How to run predictions on BAM files
- Understanding ChimeraLM output format
- Verifying your results
Time: ~15 minutes
Prerequisites
- ChimeraLM installed (Installation Guide)
- Basic command-line experience
- A BAM file to analyze (we'll provide sample data)
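Step 1: Get Sample Data
ChimeraLM includes test data in the repository. If you installed from source, the sample files are already in your clone under tests/data/ (mk1c_test.bam and its index, mk1c_test.bam.bai).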
If you installed via pip, download the sample data:
# Download sample BAM file with index
wget https://github.com/ylab-hi/chimera/raw/main/tests/data/mk1c_test.bam
wget https://github.com/ylab-hi/chimera/raw/main/tests/data/mk1c_test.bam.bai
# Or using curl
curl -L -o mk1c_test.bam https://github.com/ylab-hi/chimera/raw/main/tests/data/mk1c_test.bam
curl -L -o mk1c_test.bam.bai https://github.com/ylab-hi/chimera/raw/main/tests/data/mk1c_test.bam.bai
# Verify files downloaded correctly
ls -lh mk1c_test.bam*
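The ls check above only confirms that the files exist. A failed or redirected download can still leave behind a file with the right name, so a slightly stronger check is to look for the BAM magic number. BAM files are BGZF-compressed, and BGZF is ordinary multi-member gzip. Python's standard gzip module can therefore decode the first bytes, which the SAM/BAM format specification says should be BAM\1. This is a minimal sketch using only the standard library. looks_like_bam is a hypothetical helper, not part of ChimeraLM.

```python
import gzip
import sys


def looks_like_bam(path: str) -> bool:
    """Return True if `path` decompresses to data starting with the 4-byte BAM magic.

    BAM files are BGZF, i.e. concatenated gzip members, so the standard-library
    gzip reader can decode the first block.
    """
    try:
        with gzip.open(path, "rb") as handle:
            return handle.read(4) == b"BAM\x01"
    except (OSError, EOFError):
        # Missing file, not gzip at all (e.g. an HTML error page), or truncated download.
        return False


if __name__ == "__main__":
    # Check every path given on the command line (defaults to the sample BAM).
    for bam in sys.argv[1:] or ["mk1c_test.bam"]:
        status = "ok" if looks_like_bam(bam) else "NOT a valid BAM"
        print(f"{bam}: {status}")
```

Do not pass the .bai file to this check. BAM indexes are not gzip-compressed and start with BAI\1 instead.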
About the Sample Data
The sample file mk1c_test.bam contains 175 reads (75 chimeric and 100 non-chimeric). They were subsampled from PC3 (human prostate cancer cell line) data sequenced on a Nanopore MinION Mk1C after whole genome amplification.
Step 2: Run Your First Prediction
Run ChimeraLM on the sample data:
GPU vs CPU Performance
- CPU: ~15 seconds for 48 SA-tagged reads (batch-size 12)
- GPU: ~3 seconds for 48 SA-tagged reads (batch-size 24, 5x faster!)
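The two timings above imply the following speedup and per-read throughput. This is simple arithmetic on the approximate figures quoted. Both runs likely include fixed costs such as loading the model (an assumption here), so per-read rates on larger files may differ.

$$
\text{speedup} \approx \frac{15\,\text{s}}{3\,\text{s}} = 5,
\qquad
\text{CPU: } \frac{48\ \text{reads}}{15\,\text{s}} \approx 3.2\ \text{reads/s},
\qquad
\text{GPU: } \frac{48\ \text{reads}}{3\,\text{s}} \approx 16\ \text{reads/s}
$$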
Step 3: Understand the Output
ChimeraLM writes its predictions as text files in the mk1c_test.predictions/ directory, with one line per read.
Output format (tab-separated):
read_name<TAB>label
e5f89040-2898-41d9-9ee4-3022168216f0 1
b76512a7-5a74-405b-8ac3-adde6a7ea5e1 0
5b830fb3-6bb7-42a4-ad18-142b9474ed7d 1
edab7cd5-831c-4f51-8ada-c9b4620307c1 0
...
Labels:
- 0: Biological read (keep for analysis)
- 1: Chimeric artifact (remove from analysis)
Step 4: Interpret Results
Count how many reads are chimeric:
# Count chimeric reads (label 1)
cat mk1c_test.predictions/*.txt | grep -c "1$"
# Count biological reads (label 0)
cat mk1c_test.predictions/*.txt | grep -c "0$"
Expected results for test data:
- Chimeric artifacts: 55 (73.3%)
- Biological reads: 20 (26.7%)
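If you prefer a single summary that also reports the percentages shown above, you can compute the same counts in Python. This is a minimal sketch that assumes the prediction files match mk1c_test.predictions/*.txt, as in the grep commands. count_labels and format_summary are hypothetical helper names. Like cat with a glob, the script counts every line of every matching file, so overlapping files would be counted twice.

```python
import glob
from collections import Counter
from typing import Dict, Iterable, List

# Label value -> description, in the order used in the expected results.
LABEL_NAMES = (("1", "Chimeric artifacts"), ("0", "Biological reads"))


def count_labels(paths: Iterable[str]) -> Dict[str, int]:
    """Count the label column across all given prediction files."""
    counts: Counter = Counter()
    for path in paths:
        with open(path) as handle:
            for line in handle:
                fields = line.rstrip("\r\n").split("\t")
                if len(fields) == 2:  # skip blank or malformed lines
                    counts[fields[1]] += 1
    return dict(counts)


def format_summary(counts: Dict[str, int]) -> List[str]:
    """Render counts as 'Name: n (pct%)' lines with one decimal place."""
    total = sum(counts.values())
    lines = []
    for label, name in LABEL_NAMES:
        n = counts.get(label, 0)
        pct = 100 * n / total if total else 0.0
        lines.append(f"{name}: {n} ({pct:.1f}%)")
    return lines


if __name__ == "__main__":
    files = sorted(glob.glob("mk1c_test.predictions/*.txt"))
    for text in format_summary(count_labels(files)):
        print(text)
```

For example, format_summary({"1": 55, "0": 20}) returns ["Chimeric artifacts: 55 (73.3%)", "Biological reads: 20 (26.7%)"], which matches the expected results for the test data.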
Typical chimera rates, with and without WGA:
- MDA (Multiple Displacement Amplification): 10-40%
- PicoPLEX: 5-20%
- Non-WGA data: <1%
Checkpoint: Verify Your Prediction Worked
✅ Success indicators:
- Predictions file created
- File contains tab-separated read names and labels
- Labels are 0 or 1
- Number of predictions matches input reads
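You can automate the checklist above with a small script. This is a minimal sketch that assumes the predictions are *.txt files under mk1c_test.predictions/ in the read_name<TAB>label format from Step 3. check_predictions is a hypothetical name. The script reports malformed lines, labels other than 0 or 1, and duplicate read names. It also returns the number of predictions, so you can compare that with the number of reads you expect.

```python
import glob
from typing import List, Tuple


def check_predictions(paths: List[str]) -> Tuple[int, List[str]]:
    """Validate prediction files; return (number of predictions, list of problems)."""
    if not paths:
        return 0, ["no prediction files found"]
    problems: List[str] = []
    seen = set()
    n = 0
    for path in paths:
        with open(path) as handle:
            for lineno, line in enumerate(handle, start=1):
                line = line.rstrip("\r\n")
                if not line:
                    continue  # ignore blank lines
                where = f"{path}:{lineno}"
                fields = line.split("\t")
                if len(fields) != 2:
                    problems.append(f"{where}: expected 2 tab-separated fields, got {len(fields)}")
                    continue
                name, label = fields
                if label not in ("0", "1"):
                    problems.append(f"{where}: label {label!r} is not 0 or 1")
                if name in seen:
                    problems.append(f"{where}: duplicate read name {name}")
                seen.add(name)
                n += 1
    return n, problems


if __name__ == "__main__":
    files = sorted(glob.glob("mk1c_test.predictions/*.txt"))
    count, issues = check_predictions(files)
    print(f"{count} predictions checked")
    for issue in issues:
        print("problem:", issue)
```

Pass only files that together form one set of predictions. If a directory holds overlapping files (for example, a consolidated predictions.txt alongside the files it was built from), every read in the overlap would be reported as a duplicate.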
Congratulations!
You've successfully run your first ChimeraLM prediction!
Next Steps
Now that you've completed the basics:
For Analysis
Filter your BAM file to remove chimeric reads:
This automatically creates:
- mk1c_test.filtered.bam: Unsorted filtered reads
- mk1c_test.filtered.sorted.bam: Final sorted output (use this!)
- mk1c_test.filtered.sorted.bam.bai: BAM index
- mk1c_test.predictions/predictions.txt: Consolidated predictions
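To illustrate what filtering does conceptually, the sketch below drops every alignment record whose read name is labeled 1. It then sorts and indexes the result, producing the same kinds of files listed above. This is not ChimeraLM's own implementation. It assumes the third-party pysam package is installed and that predictions use the read_name<TAB>label format from Step 3. load_chimeric_names and filter_bam are hypothetical names. For real analyses, use ChimeraLM's built-in filtering.

```python
import sys
from typing import Set


def load_chimeric_names(predictions_path: str) -> Set[str]:
    """Return the read names labeled 1 (chimeric artifact) in a predictions file."""
    names: Set[str] = set()
    with open(predictions_path) as handle:
        for line in handle:
            fields = line.rstrip("\r\n").split("\t")
            if len(fields) == 2 and fields[1] == "1":
                names.add(fields[0])
    return names


def filter_bam(in_bam: str, out_prefix: str, chimeric: Set[str]) -> str:
    """Drop records whose read name is in `chimeric`, then sort and index.

    Returns the path of the sorted, indexed BAM.
    """
    # Imported lazily so load_chimeric_names() works even without pysam installed.
    import pysam

    unsorted_path = f"{out_prefix}.filtered.bam"
    sorted_path = f"{out_prefix}.filtered.sorted.bam"
    with pysam.AlignmentFile(in_bam, "rb") as src:
        # template=src copies the input header into the output file.
        with pysam.AlignmentFile(unsorted_path, "wb", template=src) as dst:
            # Sequential iteration visits every record (including supplementary
            # and unmapped ones), so all alignments of a chimeric read are dropped.
            for read in src:
                if read.query_name not in chimeric:
                    dst.write(read)
    pysam.sort("-o", sorted_path, unsorted_path)
    pysam.index(sorted_path)  # writes <sorted_path>.bai
    return sorted_path


if __name__ == "__main__":
    if len(sys.argv) != 4:
        print("usage: python filter_sketch.py IN.bam PREDICTIONS.txt OUT_PREFIX")
    else:
        in_bam, predictions, prefix = sys.argv[1:]
        print(filter_bam(in_bam, prefix, load_chimeric_names(predictions)))
```

Choose an output prefix other than mk1c_test (for example, sketch_test). Otherwise the sketch would overwrite the files ChimeraLM itself produces.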
For comprehensive filtering guidance including verification, troubleshooting, and batch processing, see the Filtering BAM Files Tutorial.
For Learning
- Optimize performance: See Performance Optimization
- Integrate into pipelines: See Pipeline Integration
- Use the web interface: See Web Command
For Development
- Use as a library: See API Reference
Troubleshooting
Encountered an issue? Check our Troubleshooting Guide for common problems and solutions.
Need Help?