Overview
Outerport is a Rust-based registry for model weights with a Python API, built for efficient management and deployment of AI model weights across distributed servers. It can hot-swap models in seconds, which makes it practical to deploy services that serve multiple models. It also supports asynchronous model management, which allows faster checkpointing and loading without blocking the Python process on I/O calls.
Check out a live demo of the Outerport daemon performing instant model hot-swapping. In our studies, model hot-swapping for multi-model services gives a 40% reduction in GPU costs compared with running multiple single-model services.
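To make the idea of asynchronous model management concrete, here is a minimal sketch of a non-blocking checkpoint write in plain Python. It is not the Outerport API: `save_checkpoint_async`, `load_checkpoint`, and the pickle-based file format are hypothetical names and choices made for illustration, and a background thread stands in for whatever a separate daemon process would do.

```python
import pickle
from concurrent.futures import Future, ThreadPoolExecutor
from pathlib import Path

# Hypothetical illustration only; none of these names come from Outerport.
# A single worker thread serializes checkpoint writes in submission order.
_io_pool = ThreadPoolExecutor(max_workers=1)


def _write_checkpoint(state: dict, path: Path) -> Path:
    # Write to a temporary file, then rename it into place, so a reader
    # never observes a partially written checkpoint.
    tmp = path.with_suffix(path.suffix + ".tmp")
    with open(tmp, "wb") as f:
        pickle.dump(state, f)
    tmp.replace(path)
    return path


def save_checkpoint_async(state: dict, path: Path) -> Future:
    # Shallow-copy the dict so keys added or removed by the caller after
    # this call do not affect the write. Large tensors would need their
    # own copy or a hand-off in a real system.
    snapshot = dict(state)
    return _io_pool.submit(_write_checkpoint, snapshot, path)


def load_checkpoint(path: Path) -> dict:
    with open(path, "rb") as f:
        return pickle.load(f)
```

The caller gets a `Future` back immediately and can keep working; calling `.result()` on it waits only at the point where the checkpoint must be known to exist on disk.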
Outerport is designed for four things:
- Performance. Model weights should be fast to download, and exist on a cache hierarchy of cloud storage → local storage → CPU memory → GPU memory for fast model swapping.
- Reliability. There should be no single point of failure. The ability to self-host, together with caching and tiered storage, allows for minimal downtime even during service disruptions.
- Security. Model weights can be encrypted if needed, and an access log is available for audits.
- Scalability. Horizontal scaling should be fast and should accommodate variable workloads via predictive scheduling.
This is made possible with a highly performant backend built in Rust, model compression technology (quantization, pruning, etc.), and low-level GPU programming (CUDA).
See the Quickstart to get started.
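The cache hierarchy named under Performance (cloud storage → local storage → CPU memory → GPU memory) can be pictured with a small sketch. This is an assumption-laden toy, not Outerport's implementation: `Tier`, `TieredCache`, `locate`, and `load` are hypothetical names, the tiers are in-memory dictionaries, and there is no eviction or capacity limit.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Tier:
    # Hypothetical stand-in for one storage level (e.g. "cloud", "gpu").
    name: str
    store: Dict[str, bytes] = field(default_factory=dict)


class TieredCache:
    """Toy cache whose tiers are ordered from slowest to fastest."""

    def __init__(self, tier_names: List[str]) -> None:
        self.tiers = [Tier(name) for name in tier_names]

    def locate(self, key: str) -> Optional[str]:
        # Return the name of the fastest tier that currently holds the key.
        for tier in reversed(self.tiers):
            if key in tier.store:
                return tier.name
        return None

    def load(self, key: str) -> bytes:
        # Search from the fastest tier down; on a hit, copy the data into
        # every faster tier so the next load is served from the top.
        for i in range(len(self.tiers) - 1, -1, -1):
            if key in self.tiers[i].store:
                data = self.tiers[i].store[key]
                for faster in self.tiers[i + 1:]:
                    faster.store[key] = data
                return data
        raise KeyError(key)


cache = TieredCache(["cloud", "local_disk", "cpu", "gpu"])
cache.tiers[0].store["model-a"] = b"weights"  # only in cloud storage at first
cache.load("model-a")                          # promotes through every tier
```

After the first `load`, `cache.locate("model-a")` reports the fastest tier, which is the property that makes a later swap back to the same model cheap. A real system would also need eviction, since GPU and CPU memory are far smaller than cloud storage.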
System Architecture
The following diagram illustrates the high-level architecture of Outerport:
Upcoming Features
- Web app for the registry.
- Orchestrating horizontally scaling deployments.
- Support for inference engines (vLLM, TensorRT, quantized inference).
- Custom inference engine with prompt caching, RAG caching.
- Support for end-to-end confidential computing from storage to GPU.
- Predictive load balancing.
Upcoming Documentation
This documentation is still very much under construction.
Soon we will have:
- Full specification of the CLI.
- Full specification of the Python API.
- Guide for setting up self-hosted registries.