Quickstart
This guide will walk through:
- Setting up the Keep-Warm Daemon, which orchestrates model movement.
- Interacting with the Keep-Warm Daemon and the registry via the CLI and the Python API.
This will show how you can set up Outerport and use it to manage models on your machine.
Hello, Outerport!
Setting up the Keep-Warm Daemon
The keep-warm daemon is a process that loads models into memory on demand, allowing for faster access when you need them. Here's how to set it up:
- First, install the `outerport` package. (Coming soon!)
- Create a configuration file named `outerport-daemon.toml` with the following settings:

  ```toml
  [registry]
  url = "https://registry.outerport.com"
  use_huggingface_fallback = true

  [daemon]
  auto_start = true
  log_level = "info"

  [resources]
  # Set to "max" to use all available resources, or specify limits
  cpu_memory = "128GB"
  gpu_memory = "max"
  storage = "1024GB"
  ```

  Adjust the settings as needed:

  - `registry.url`: The URL of the Outerport registry.
  - `registry.use_huggingface_fallback`: Set to true to use HuggingFace as a fallback if a model isn't found in the Outerport registry.
  - `daemon.auto_start`: Set to true to automatically start the daemon when using the Python API.
  - `daemon.log_level`: Set the logging level (e.g., "debug", "info", "warning", "error").
  - `resources`: Set resource limits for CPU, GPU RAM, and storage. Use "max" for maximum available or specify limits.

  See here for the full options for the config. (A quick script to sanity-check this file is sketched after this list.)
- Start the daemon:

  ```bash
  outerport daemon start
  ```

  The daemon will now run in the background, managing model loading and unloading. Alternatively, the daemon will start automatically when you invoke the Python API.
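Before starting the daemon, you may want to sanity-check the config file you just wrote. The sketch below is only an illustration: it uses Python's built-in `tomllib` (Python 3.11+) and assumes the exact section and key names from the example config above; it is not part of the Outerport CLI or API.

```python
# sanity_check_config.py -- quick validation of outerport-daemon.toml
# Uses only the Python standard library (tomllib requires Python 3.11+).
import tomllib
from pathlib import Path

CONFIG_PATH = Path("outerport-daemon.toml")

with CONFIG_PATH.open("rb") as f:
    config = tomllib.load(f)

# These sections/keys mirror the example config above; adjust if yours differs.
assert "registry" in config and "url" in config["registry"], "missing [registry] url"
assert "daemon" in config, "missing [daemon] section"
assert config["daemon"].get("log_level", "info") in {"debug", "info", "warning", "error"}

resources = config.get("resources", {})
for key in ("cpu_memory", "gpu_memory", "storage"):
    value = resources.get(key)
    # Each limit is either the literal "max" or a size string like "128GB".
    assert value == "max" or str(value).endswith(("GB", "MB")), f"unexpected {key}: {value!r}"

print("outerport-daemon.toml looks well-formed:")
print(config)
```

If the assertions pass, the file at least parses and has the expected shape; the daemon itself remains the authority on what is valid.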
Interacting with the Registry
You can interact with the Outerport registry using either the CLI or the Python API.
Using the CLI
- List available models:

  ```bash
  outerport models list
  ```

  This command shows only the models that are available locally. Example output:

  ```
  MODEL ID                          TAG     QUANT  STATUS  SIZE   LAST USED
  outerport/gpt2-small              latest  fp16   ram     456MB  2 mins
  outerport/bert-base-uncased       v1.0    int8   disk    420MB  1 hour
  outerport/stable-diffusion-v1-5   v1.5    fp16   disk    4.2GB  3 hours
  outerport/t5-small                v1.1    fp32   disk    950MB  2 days
  ```

  To show all models, including those not downloaded locally, use the `--show-all` flag:

  ```bash
  outerport models --show-all
  ```

  Example output:

  ```
  MODEL ID                          TAG     QUANT  STATUS  SIZE   LAST USED
  outerport/gpt2-small              latest  fp16   ram     456MB  2 mins
  outerport/bert-base-uncased       v1.0    int8   disk    420MB  1 hour
  outerport/stable-diffusion-v1-5   v1.5    fp16   disk    4.2GB  3 hours
  outerport/t5-small                v1.1    fp32   disk    950MB  2 days
  outerport/roberta-large           v2.0    int4   remote  1.3GB  -
  outerport/whisper-large-v3        v3      fp16   remote  1.5GB  -
  ```

  These commands can also be driven from a script; see the sketch at the end of this section.
- Load a model:

  ```bash
  outerport pull outerport/gpt2-medium:latest
  ```

  or with a specific quantization level:

  ```bash
  outerport pull outerport/gpt2-medium:latest --quant int8
  ```
- Unload a model:

  ```bash
  outerport rm outerport/gpt2-medium:latest
  ```
- Show model information:

  ```bash
  outerport models outerport/stable-diffusion-v1-5:v1.5
  ```

  This shows the model's information, including tags, quantization, and resources. Example output:

  ```
  Model: outerport/stable-diffusion-v1-5:v1.5
  Registry: https://registry.outerport.com
  Description: Stable Diffusion v1.5 text-to-image model
  Created: 2023-04-15 10:30:00 UTC

  Tags:
    Latest: v1.5
    Available: v1.5, v1.4, v1.3

  Quantization:
    Available: fp32, fp16
    Default: fp16

  Resources:
    Size: 4.2 GB (fp16)
    Estimated GPU VRAM: ~4 GB
  ```
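The CLI can also be driven from a script. The sketch below is a hedged example: it shells out to the `outerport pull` and `outerport models list` commands shown above, and its parsing assumes the whitespace-separated column layout (MODEL ID, TAG, QUANT, STATUS, SIZE, LAST USED) from the example output, which may not match the real output format exactly.

```python
# warm_models.py -- pull a set of models, then report what is available locally.
# Only the CLI commands documented above are used; the output parsing is a
# best-effort assumption about the table format shown in the examples.
import subprocess

MODELS = [
    "outerport/gpt2-medium:latest",
    "outerport/bert-base-uncased:v1.0",
]

for model in MODELS:
    # Equivalent to running `outerport pull <model>` by hand.
    subprocess.run(["outerport", "pull", model], check=True)

# `outerport models list` shows locally available models.
result = subprocess.run(
    ["outerport", "models", "list"], check=True, capture_output=True, text=True
)

for line in result.stdout.splitlines()[1:]:  # skip the header row
    fields = line.split()
    if len(fields) >= 4:
        model_id, tag, quant, status = fields[:4]
        print(f"{model_id}:{tag} ({quant}) -> {status}")
```

A small wrapper like this is enough to pre-warm a fixed set of models at deploy time.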
Using the Python API
- Import the necessary modules:

  ```python
  from outerport import AutoModelForCausalLM, AutoTokenizer, AutoPipeline
  ```
- Load and use a Transformer model:

  ```python
  model_name = "outerport/gpt2-medium:latest"
  model = AutoModelForCausalLM.from_pretrained(model_name)
  tokenizer = AutoTokenizer.from_pretrained(model_name)

  inputs = tokenizer("Hello, Outerport!", return_tensors="pt")
  outputs = model.generate(**inputs)
  ```
- Load and use a Diffusers pipeline:

  ```python
  model_name = "outerport/stable-diffusion-v1-5:latest"
  pipeline = AutoPipeline.from_pretrained(model_name)

  prompt = "A serene landscape with mountains and a lake"
  image = pipeline(prompt).images[0]
  image.save("generated_landscape.png")
  ```
- Load and use general models:

  ```python
  import outerport

  model_name = "outerport/gpt2-medium:latest"
  model = outerport.load(model_name, device="cpu")
  ```
- Save the models asynchronously:

  ```python
  # `async` is a reserved keyword in Python and cannot be used as a keyword
  # argument; the spelling below (`async_`) is an assumption -- check the API
  # reference for the exact parameter name.
  outerport.save(model, async_=True)
  ```
The Outerport API automatically handles model loading, caching, and resource management based on your daemon configuration. This allows you to focus on using the models in your application without worrying about the underlying infrastructure.
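As a closing illustration, the sketch below alternates between two models using only the calls from the steps above. It assumes the tokenizer mirrors the Hugging Face `decode` interface, and the timing print is purely illustrative; whether a given load is served from VRAM, RAM, disk, or the remote registry depends on your `[resources]` limits and what the daemon currently has warm.

```python
import time

from outerport import AutoModelForCausalLM, AutoTokenizer


def generate(model_name: str, prompt: str) -> str:
    # The daemon decides whether this is a cold load (disk/remote) or a warm
    # one (already in RAM/VRAM); the calling code stays the same either way.
    start = time.perf_counter()
    model = AutoModelForCausalLM.from_pretrained(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    print(f"{model_name} ready in {time.perf_counter() - start:.1f}s")

    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs)
    # Assumes a Hugging Face-style decode method on the tokenizer.
    return tokenizer.decode(outputs[0], skip_special_tokens=True)


# Alternate between two models; the keep-warm daemon handles the swapping.
print(generate("outerport/gpt2-small:latest", "Hello, Outerport!"))
print(generate("outerport/gpt2-medium:latest", "Hello again, Outerport!"))
print(generate("outerport/gpt2-small:latest", "Back to the small model."))
```

The third call should come back faster than the first if the small model is still warm, but the application code never has to check.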