Overview

Outerport is a Rust-based registry for model weights, with a Python API for efficiently managing and deploying AI model weights across distributed servers. It can hot-swap models in seconds, which makes it practical to deploy services that use multiple models. It also supports asynchronous model management, which allows faster checkpointing and loading without blocking the Python process on I/O calls.
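To illustrate the asynchronous idea in general terms, here is a minimal sketch of non-blocking checkpointing using only the Python standard library. It is not the Outerport API: `save_async`, `load`, and `_write_checkpoint` are hypothetical names, and `pickle` stands in for whatever serialization format a real system would use. The point is only that the caller hands the write to a background worker and keeps running.

```python
import pickle
import tempfile
from concurrent.futures import Future, ThreadPoolExecutor
from pathlib import Path

# Hypothetical helpers for illustration; this is NOT the Outerport API.
_executor = ThreadPoolExecutor(max_workers=1)


def _write_checkpoint(state: dict, path: Path) -> Path:
    """Serialize to a temporary file, then atomically move it into place."""
    tmp = path.with_suffix(path.suffix + ".tmp")
    with open(tmp, "wb") as f:
        pickle.dump(state, f)
    tmp.replace(path)  # readers never observe a half-written checkpoint
    return path


def save_async(state: dict, path: Path) -> Future:
    """Hand the disk write to a background thread and return immediately."""
    snapshot = dict(state)  # shallow copy: later key reassignments do not affect the write
    return _executor.submit(_write_checkpoint, snapshot, path)


def load(path: Path) -> dict:
    """Read a checkpoint written by save_async."""
    with open(path, "rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    ckpt = Path(tempfile.mkdtemp()) / "model.ckpt"
    weights = {"layer0": [0.1, 0.2], "layer1": [0.3]}
    future = save_async(weights, ckpt)
    # The calling code is free to continue working here while the write proceeds.
    future.result()  # block only at the point where the checkpoint must be durable
    assert load(ckpt) == weights
```

In this sketch the Python process only waits when it calls `future.result()`. A daemon-based design, as described above, would presumably move that work out of the Python process entirely.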

Check out a live demo of the Outerport daemon performing instant model hot-swapping. In our studies, model hot-swapping for multi-model services gives a 40% reduction in GPU costs compared with running multiple single-model services.

Outerport is designed for four things:

  • Performance. Model weights should be fast to download, and should live in a cache hierarchy of cloud storage → local storage → CPU memory → GPU memory for fast model swapping.
  • Reliability. There should be no single point of failure. The ability to self-host, together with caching and tiered storage, allows for minimal downtime even during service disruptions.
  • Security. Model weights can be encrypted if needed, and an access log is available for audits.
  • Scalability. Horizontal scaling should be fast, and should accommodate variable workloads via predictive scheduling.

This is made possible by a high-performance backend built in Rust, model compression technology (quantization, pruning, etc.), and low-level GPU programming (CUDA).
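The cache hierarchy in the Performance item can be pictured with a small sketch. This is an illustrative model, not Outerport's implementation: `Tier`, `TieredCache`, and `fetch_remote` are hypothetical names, capacities are counted in whole models rather than bytes, and least-recently-used eviction is an assumption made for the example. A lookup walks the tiers from fastest to slowest, and a hit in a slower tier is promoted into every faster one.

```python
from collections import OrderedDict
from typing import Callable, List, Optional, Tuple


class Tier:
    """One cache level with a fixed capacity (in models) and LRU eviction."""

    def __init__(self, name: str, capacity: int) -> None:
        self.name = name
        self.capacity = capacity
        self._items: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, key: str) -> Optional[bytes]:
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key: str, value: bytes) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        while len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used entry

    def __contains__(self, key: str) -> bool:
        return key in self._items


class TieredCache:
    """Search tiers fastest-first; promote hits into all faster tiers."""

    def __init__(self, tiers: List[Tier], fetch_remote: Callable[[str], bytes]) -> None:
        self.tiers = tiers  # ordered from fastest to slowest
        self.fetch_remote = fetch_remote  # stand-in for a cloud storage download

    def load(self, key: str) -> Tuple[bytes, str]:
        """Return the weights and the name of the tier that served them."""
        for i, tier in enumerate(self.tiers):
            value = tier.get(key)
            if value is not None:
                for faster in self.tiers[:i]:
                    faster.put(key, value)
                return value, tier.name
        value = self.fetch_remote(key)  # missed every tier: go to cloud storage
        for tier in self.tiers:
            tier.put(key, value)
        return value, "cloud"


if __name__ == "__main__":
    cache = TieredCache(
        [Tier("gpu", 1), Tier("cpu", 2), Tier("disk", 4)],
        fetch_remote=lambda name: f"weights-of-{name}".encode(),
    )
    print(cache.load("model-a")[1])  # cloud
    print(cache.load("model-a")[1])  # gpu
    print(cache.load("model-b")[1])  # cloud (model-a is evicted from the gpu tier)
    print(cache.load("model-a")[1])  # cpu
```

With a GPU tier that holds one model, swapping back to `model-a` is served from the CPU tier instead of cloud storage, which is the effect the hierarchy is meant to achieve.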

See Quickstart to get started.

System Architecture

The following diagram illustrates the high-level architecture of Outerport:

Outerport System Diagram

Upcoming Features

  • Web app for the registry.
  • Orchestrating horizontally scaling deployments.
  • Support for inference engines (vLLM, TensorRT, quantized inference).
  • Custom inference engine with prompt caching, RAG caching.
  • Support for end-to-end confidential computing from storage to GPU.
  • Predictive load balancing.

Upcoming Documentation

This documentation is still very much under construction.

Soon we will have:

  • Full specification of the CLI.
  • Full specification of the Python API.
  • Guide for setting up self-hosted registries.