Testing Guide

ChimeraLM testing strategy, conventions, and best practices.

Test Framework

ChimeraLM uses pytest for testing with these key features:

  • Fixtures: Reusable test data and setups
  • Markers: Categorize tests (e.g. slow, gpu, smoke, imports); see the registration sketch after this list
  • Coverage: Minimum 40% coverage required
  • Parametrization: Test multiple inputs efficiently
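
Custom markers should be registered with pytest so that -m selection works without unknown-mark warnings. A minimal sketch, assuming registration happens in tests/conftest.py (ChimeraLM may instead declare markers in pyproject.toml under [tool.pytest.ini_options]):

# tests/conftest.py
def pytest_configure(config):
    """Register the custom markers used across the test suite."""
    config.addinivalue_line("markers", "slow: long-running tests, excluded from quick runs")
    config.addinivalue_line("markers", "gpu: tests that require a CUDA device")
    config.addinivalue_line("markers", "smoke: fast sanity checks")
    config.addinivalue_line("markers", "imports: import-only checks")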

Running Tests

Basic Usage

# Run all tests
uv run pytest tests/

# Run specific test file
uv run pytest tests/test_models.py

# Run specific test function
uv run pytest tests/test_models.py::test_model_loading

# Verbose output
uv run pytest tests/ -v

# Stop on first failure
uv run pytest tests/ -x

Test Markers

# Skip slow tests
uv run pytest tests/ -m "not slow"

# Run only GPU tests
uv run pytest tests/ -m gpu

# Run smoke tests only
uv run pytest tests/ -m smoke

# Run import tests
uv run pytest tests/ -m imports
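
Markers are attached to tests with the @pytest.mark.<name> decorator; the commands above then select or deselect them. A brief sketch (the test names and bodies here are illustrative, not actual ChimeraLM tests):

import pytest

@pytest.mark.smoke
def test_package_imports():
    """Fast sanity check that the package imports cleanly."""
    import chimeralm  # noqa: F401

@pytest.mark.slow
def test_long_running_pipeline():
    """Long-running test, deselected by -m "not slow"."""
    ...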

Coverage

# Run with coverage
uv run pytest tests/ --cov=chimeralm

# Generate HTML report
uv run pytest tests/ --cov=chimeralm --cov-report=html

# View report
open htmlcov/index.html

Test Structure

Directory Layout

tests/
├── data/                    # Test data files
│   └── mk1c_test.bam       # Sample BAM file
├── test_models.py          # Model tests
├── test_data.py            # Data loading tests
├── test_cli.py             # CLI tests
├── test_utils.py           # Utility tests
└── conftest.py             # Shared fixtures

Test File Naming

  • Prefix with test_
  • Match module name: chimeralm/models/lm.py → tests/test_models.py
  • One test file per module (generally)

Test Function Naming

# Good naming
def test_model_loads_pretrained_checkpoint():
    """Test that model loads from pretrained checkpoint."""
    pass

def test_prediction_returns_correct_shape():
    """Test prediction output shape."""
    pass

# Bad naming
def test_model():  # Too vague
    pass

def test1():  # Not descriptive
    pass

Writing Tests

Basic Test Structure

import pytest
from chimeralm.models.lm import ChimeraLM

def test_model_loading():
    """Test model loads successfully."""
    # Arrange
    model_path = "yangliz5/chimeralm"

    # Act
    model = ChimeraLM.from_pretrained(model_path)

    # Assert
    assert model is not None
    assert isinstance(model, ChimeraLM)

Using Fixtures

# conftest.py
import pytest
from chimeralm.models.lm import ChimeraLM

@pytest.fixture
def pretrained_model():
    """Load pretrained model for testing."""
    return ChimeraLM.from_pretrained("yangliz5/chimeralm")

# test_models.py
def test_model_inference(pretrained_model):
    """Test model inference."""
    import torch
    x = torch.randint(0, 5, (4, 1024))
    output = pretrained_model(x)
    assert output.shape == (4, 2)
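
The fixture above reloads the checkpoint for every test that uses it. If no test mutates the model, a session-scoped fixture loads it only once per test run (a sketch; keep the default function scope if tests modify model state):

# conftest.py
import pytest
from chimeralm.models.lm import ChimeraLM

@pytest.fixture(scope="session")
def pretrained_model():
    """Load the pretrained model once for the whole test session."""
    return ChimeraLM.from_pretrained("yangliz5/chimeralm")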

Parametrized Tests

import pytest

@pytest.mark.parametrize("batch_size,seq_len", [
    (4, 512),
    (8, 1024),
    (16, 2048),
])
def test_model_with_different_shapes(pretrained_model, batch_size, seq_len):
    """Test model with various input shapes."""
    import torch
    x = torch.randint(0, 5, (batch_size, seq_len))
    output = pretrained_model(x)
    assert output.shape == (batch_size, 2)

Testing Exceptions

import pytest

def test_invalid_input_raises_error():
    """Test that invalid input raises ValueError."""
    with pytest.raises(ValueError):
        # Code that should raise ValueError
        process_invalid_input()
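
pytest.raises also accepts a match argument (a regular expression) to assert on the error message. A self-contained sketch using a built-in error; substitute the chimeralm call under test:

import pytest

def test_error_message_is_checked():
    """Test that the raised error carries the expected message."""
    with pytest.raises(ValueError, match="invalid literal"):
        int("ACGT")  # placeholder for the chimeralm call that should fail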

Test Categories

Unit Tests

Test individual functions or classes in isolation:

def test_tokenizer_encode():
    """Test DNA tokenizer encoding."""
    from chimeralm.data.tokenizer import DNATokenizer

    tokenizer = DNATokenizer()
    sequence = "ACGT"
    tokens = tokenizer.encode(sequence)

    assert tokens == [1, 2, 3, 4]

Integration Tests

Test multiple components working together:

@pytest.mark.slow
def test_end_to_end_prediction():
    """Test complete prediction pipeline."""
    from chimeralm.models.lm import ChimeraLM
    from chimeralm.data.bam import BamDataModule

    # Load model
    model = ChimeraLM.from_pretrained("yangliz5/chimeralm")

    # Load data
    data_module = BamDataModule(
        train_data_path="tests/data/mk1c_test.bam",
        batch_size=8
    )
    data_module.setup("predict")

    # Run prediction
    loader = data_module.predict_dataloader()
    batch = next(iter(loader))
    output = model(batch["input_ids"])

    assert output.shape[0] == 8
    assert output.shape[1] == 2

GPU Tests

import pytest
import torch
from chimeralm.models.lm import ChimeraLM

@pytest.mark.gpu
@pytest.mark.skipif(not torch.cuda.is_available(), reason="CUDA not available")
def test_model_on_gpu():
    """Test model runs on GPU."""
    model = ChimeraLM.from_pretrained("yangliz5/chimeralm")
    model = model.cuda()

    x = torch.randint(0, 5, (4, 1024)).cuda()
    output = model(x)

    assert output.device.type == "cuda"

Coverage Requirements

  • Minimum: 40% overall coverage
  • Target: 60%+ for new code
  • Critical paths: 80%+ coverage

Checking Coverage

# Run with coverage
uv run pytest tests/ --cov=chimeralm --cov-report=term-missing

# Output shows uncovered lines
chimeralm/models/lm.py    85%   12, 45-47
chimeralm/data/bam.py     92%   67

Improving Coverage

Focus on:

  1. Critical paths: Model loading, prediction, training
  2. Error handling: Exception cases
  3. Edge cases: Empty inputs, max values, minimal batches, etc. (see the sketch below)
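
For example, a minimal batch is a cheap edge case to cover with the pretrained_model fixture defined earlier (a sketch; the expected (batch_size, 2) output shape follows the fixture examples above):

import torch

def test_single_sequence_batch(pretrained_model):
    """Edge case: a batch containing a single sequence."""
    x = torch.randint(0, 5, (1, 1024))
    output = pretrained_model(x)
    assert output.shape == (1, 2)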

Mocking and Fixtures

Mocking External Dependencies

import pytest
from unittest.mock import Mock, patch

def test_bam_file_loading(tmp_path):
    """Test BAM file loading with pysam mocked out."""
    # Create a placeholder BAM file (its contents are never parsed once pysam is mocked)
    bam_file = tmp_path / "test.bam"
    bam_file.write_text("mock data")

    with patch("pysam.AlignmentFile") as mock_bam:
        mock_bam.return_value = Mock()
        # Exercise the code under test here; any pysam.AlignmentFile(...) call it
        # makes now returns the Mock instead of opening the real file

Temporary Files

def test_with_temp_file(tmp_path):
    """Test using temporary file."""
    # tmp_path is a pytest fixture
    test_file = tmp_path / "test.txt"
    test_file.write_text("test data")

    # Use test_file
    assert test_file.read_text() == "test data"

Continuous Integration

Tests run automatically on:

  • Pull requests: All tests must pass
  • Main branch: After merge
  • Nightly: Full test suite including slow tests

GitHub Actions Workflow

name: Tests

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - name: Install dependencies
        run: |
          pip install uv
          uv sync
      - name: Run tests
        run: uv run pytest tests/ --cov=chimeralm

Best Practices

DO

  • ✅ Write tests for new features
  • ✅ Test both success and failure cases
  • ✅ Use descriptive test names
  • ✅ Keep tests independent (no shared state)
  • ✅ Use fixtures for common setups
  • ✅ Test edge cases

DON'T

  • ❌ Skip tests without good reason
  • ❌ Test implementation details (test behavior)
  • ❌ Write flaky tests (non-deterministic)
  • ❌ Mock everything (test real code when possible)
  • ❌ Ignore coverage warnings

Debugging Tests

Running Single Test with Debug Output

# Run with print statements visible
uv run pytest tests/test_models.py::test_model_loading -s

# Drop into debugger on failure
uv run pytest tests/ --pdb

# Show local variables on failure
uv run pytest tests/ -l

Using pytest's Debug Mode

def test_with_debug():
    """Test with debugging."""
    import pdb; pdb.set_trace()  # Debugger breakpoint
    # Test code
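
Since Python 3.7 the built-in breakpoint() does the same thing without an explicit import and honors the PYTHONBREAKPOINT environment variable:

def test_with_builtin_breakpoint():
    """Test with debugging via the built-in hook."""
    breakpoint()  # enters pdb unless PYTHONBREAKPOINT points elsewhere
    # Test code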

See Also