cuFile-first patching toolkit for large model loading.

Project structure:
```
.
├── .github/workflows/
│   ├── ci.yml
│   ├── coverage-pages.yml
│   └── publish.yml
├── .vscode/
│   ├── extensions.json
│   └── settings.json
├── src/cufile_patcher/
│   ├── __init__.py
│   ├── auto_patch.py
│   ├── bindings.py
│   ├── cufile.py
│   ├── cufile_types.py
│   ├── core.py
│   ├── registry.py
│   ├── safetensor_patcher.py
│   ├── service.py
│   ├── tensorflow_patcher.py
│   ├── torch_patcher.py
│   └── plugins/
│       ├── __init__.py
│       ├── base.py
│       └── system.py
├── tests/test_auto_patch.py
├── tests/test_backend_core.py
├── tests/test_core.py
├── tests/test_safetensor_patcher.py
├── tests/test_tensorflow_patcher.py
├── tests/test_torch_patcher.py
├── AGENTS.md
├── README.md
└── pyproject.toml
```
Install dependencies:
```bash
uv sync --all-groups
```

Install package variants:

```bash
pip install cufile-patcher
pip install "cufile-patcher[all]"
pip install "cufile-patcher[tf]"
pip install "cufile-patcher[tensorflow]"
pip install "cufile-patcher[torch]"
pip install "cufile-patcher[pytorch]"
```

Run lint:

```bash
uv run ruff check .
```

Run tests:

```bash
uv run pytest
```

Quick smoke test:

```python
from cufile_patcher import hello_world

print(hello_world())
```

Expected output:

```
Hello, world!
```
The package includes a modernized port of cuFile wrapper features (a usage sketch follows the list):

- `CuFileDriver` singleton driver lifecycle
- `CuFile` class with mode mapping, open/close, context manager, read/write
- `cuFileDriverOpen`, `cuFileDriverClose`
- `cuFileHandleRegister`, `cuFileHandleDeregister`
- `cuFileRead`, `cuFileWrite`
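The import path, the `CuFile(path, mode)` constructor, and the `read(buffer, size, file_offset)` call below are assumptions about the API in `cufile.py`, not confirmed signatures; treat this as a rough sketch of the wrapper surface:

```python
# Sketch only: class names come from the list above, but the exact
# constructor/read signatures are assumptions and may differ in cufile.py.
from cufile_patcher import CuFileDriver, CuFile  # assumed import path

driver = CuFileDriver()  # singleton wrapping cuFileDriverOpen/cuFileDriverClose

# Context-manager open/close with a mapped mode string.
with CuFile("/path/to/model.bin", "r") as f:
    buf = bytearray(16 * 1024 * 1024)
    # Assumed read signature: fill `buf` starting at file offset 0.
    nread = f.read(buf, len(buf), file_offset=0)
```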
The backend is plugin-based using OOP boundaries:

- `CuFileBackend` interface in `plugins/base.py`
- `SystemCuFileBackend` implementation in `plugins/system.py`
- `BackendRegistry` and default backend switching in `registry.py`

You can register a custom backend for mocks, testing, or alternate transports.
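For instance, a test suite could plug in an in-memory backend; the `read` method and the `register`/`set_default` calls below are guesses at the shapes defined in `plugins/base.py` and `registry.py`, so treat this as a sketch rather than the documented API:

```python
# Sketch of a mock backend for tests; interface methods and registry
# calls are assumed names, not verified against plugins/base.py.
from cufile_patcher.plugins.base import CuFileBackend
from cufile_patcher.registry import BackendRegistry

class InMemoryBackend(CuFileBackend):
    """Serves reads from an in-memory dict instead of the system cuFile path."""

    def __init__(self, files: dict[str, bytes]):
        self._files = files

    def read(self, path: str, offset: int, size: int) -> bytes:
        return self._files[path][offset:offset + size]

registry = BackendRegistry()
registry.register("in-memory", InMemoryBackend({"/fake/model.bin": b"\x00" * 1024}))
registry.set_default("in-memory")
```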
The project provides dedicated patchers for streaming large model files:

- `patch_torch_load(...)` for `torch.load`
- `patch_tensorflow_load_model(...)` for `tf.keras.models.load_model`
- `patch_safetensor_load_file(...)` for `safetensors.torch.load_file`

All patchers support:

- a minimum file size threshold for the streaming path (`min_file_size_mb`)
- a streaming chunk size (`chunk_size_mb`)
- optional cuFile-backed reads (`use_cufile=True`)

Use a single context manager to install and remove framework patchers automatically:
```python
from cufile_patcher import auto_patch

with auto_patch():
    # existing framework load calls can remain unchanged
    ...
```

```python
from cufile_patcher import auto_patch

# Recommended default for most projects.
with auto_patch(min_file_size_mb=100, chunk_size_mb=64):
    ...
```

This keeps migrations small because your existing `torch.load`, `tf.keras.models.load_model`, and `safetensors.torch.load_file` calls can stay as-is.

```python
from cufile_patcher import auto_patch

with auto_patch(torch=True, tensorflow=False, safetensors=True):
    ...
```
```python
with auto_patch(strict=True):
    ...
```

| Parameter | Default | Meaning |
|---|---|---|
| `torch` | `None` | `None` auto-detects, `True` requires torch, `False` disables torch patching |
| `tensorflow` | `None` | `None` auto-detects, `True` requires tensorflow, `False` disables tensorflow patching |
| `safetensors` | `None` | `None` auto-detects, `True` requires safetensors, `False` disables safetensors patching |
| `strict` | `False` | Raise if a required framework is missing |
| `min_file_size_mb` | `64` | Minimum file size to switch from direct load to streaming path |
| `chunk_size_mb` | `16` | Streaming chunk size |
| `use_cufile` | `False` | Use cuFile reader instead of pure Python reader |
| `fallback_to_original` | `True` | If streaming fails, fall back to the original framework loader |
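Putting the size, chunking, and fallback parameters together, a patched loader can be expected to behave roughly like the sketch below; `_original_load` and `_stream_load` are hypothetical stand-ins for internals, not functions exported by the package:

```python
import os

def _original_load(path):
    ...  # placeholder for the unpatched framework loader

def _stream_load(path, chunk_size_mb):
    ...  # placeholder for the chunked cuFile/pure-Python streaming path

def patched_load(path, min_file_size_mb=64, chunk_size_mb=16, fallback_to_original=True):
    size_mb = os.path.getsize(path) / (1024 * 1024)
    if size_mb < min_file_size_mb:
        return _original_load(path)               # small files: direct load
    try:
        return _stream_load(path, chunk_size_mb)  # large files: streaming path
    except Exception:
        if fallback_to_original:
            return _original_load(path)           # documented fallback behavior
        raise
```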
If your current code manually installs and uninstalls patchers, move the lifecycle to one block:
```python
from cufile_patcher import auto_patch

def load_models():
    with auto_patch():
        # old loading calls remain unchanged
        ...
```

Notes:

- `with cufile-patcher:` is not valid Python syntax. Use `with auto_patch(...):` instead.
- `None` means auto-detect and patch available frameworks.
- `strict=True` enforces availability checks for the selected/auto-detected set.
- `True` for a framework raises if that framework is not installed (see the sketch below).
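As a concrete illustration of the last two notes, requiring a framework that is not installed should fail fast under `strict=True`; the exact exception type is not documented here, so the sketch catches a broad `Exception`:

```python
from cufile_patcher import auto_patch

# Assumes TensorFlow is NOT installed in this environment.
try:
    with auto_patch(tensorflow=True, strict=True):
        ...
except Exception as exc:  # the concrete exception type depends on the implementation
    print(f"strict mode reported the missing framework: {exc}")
```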
The per-framework patchers can also be installed and removed manually:

```python
import torch

from cufile_patcher import patch_torch_load

patcher = patch_torch_load(torch, min_file_size_mb=100, chunk_size_mb=64, use_cufile=True)
try:
    model_state = torch.load("/path/to/model.pt", map_location="cpu")
finally:
    patcher.uninstall()
```
```python
import tensorflow as tf

from cufile_patcher import patch_tensorflow_load_model

patcher = patch_tensorflow_load_model(
    tf,
    min_file_size_mb=100,
    chunk_size_mb=64,
    use_cufile=True,
)
try:
    model = tf.keras.models.load_model("/path/to/model.keras")
finally:
    patcher.uninstall()
```
```python
import safetensors.torch as st

from cufile_patcher import patch_safetensor_load_file

patcher = patch_safetensor_load_file(
    st,
    min_file_size_mb=100,
    chunk_size_mb=64,
    use_cufile=True,
)
try:
    tensors = st.load_file("/path/to/model.safetensors")
finally:
    patcher.uninstall()
```

The publish workflow at `.github/workflows/publish.yml` is
configured to use PyPI trusted publishing.
To use it, push a version tag such as `v0.2.0`.

| Resource | URL |
|---|---|
| Documentation | https://maifeeulasad.github.io/cufile-patcher/ |
| Coverage | https://maifeeulasad.github.io/cufile-patcher/htmlcov/ |