Quickstart

This guide will walk through:

  • Setting up the Keep-Warm Daemon, which orchestrates model movement.
  • Interacting with the Keep-Warm Daemon and the registry via CLI and Python API.

This will show how you can set up Outerport and use it to manage models on your machine.

Hello, Outerport!

Setting up the Keep-Warm Daemon

The keep-warm daemon is a background process that loads models into memory on demand, so they are ready quickly when you need them. Here's how to set it up:

  1. First, install the outerport package.

    (Coming soon!)

  2. Create a configuration file named outerport-daemon.toml with the following settings.

    [registry]
    url = "https://registry.outerport.com"
    use_huggingface_fallback = true

    [daemon]
    auto_start = true
    log_level = "info"

    [resources]
    # Set to "max" to use all available resources, or specify limits
    cpu_memory = "128GB"
    gpu_memory = "max"
    storage = "1024GB"

    Adjust the settings as needed:

    • registry.url: The URL of the Outerport registry.
    • registry.use_huggingface_fallback: Set to true to fall back to HuggingFace when a model isn't found in the Outerport registry.
    • daemon.auto_start: Set to true to automatically start the daemon when using the Python API.
    • daemon.log_level: Set the logging level (e.g., "debug", "info", "warning", "error").
    • resources: Resource limits for CPU memory, GPU memory, and storage. Use "max" for the maximum available, or specify explicit limits.

    See the configuration reference for the full set of config options.

  3. Start the daemon:

    outerport daemon start

    The daemon will now run in the background, managing model loading and unloading.

    Alternatively, if auto_start is set to true in your configuration, the daemon will start automatically the first time you invoke the Python API, as shown in the sketch below.
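
    A minimal sketch, assuming auto_start = true and using the AutoModelForCausalLM class covered later in this guide; no explicit daemon start command is needed:

    from outerport import AutoModelForCausalLM

    # The first Python API call starts the daemon in the background,
    # then pulls and loads the requested model through it.
    model = AutoModelForCausalLM.from_pretrained("outerport/gpt2-medium:latest")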

Interacting with the Registry

You can interact with the Outerport registry using either the CLI or the Python API.

Using the CLI

  1. List available models:

    outerport models list

    This command shows only the models that are available locally. Example output:

    MODEL ID                           TAG     QUANT   STATUS   SIZE     LAST USED
    outerport/gpt2-small               latest  fp16    ram      456MB    2 mins
    outerport/bert-base-uncased        v1.0    int8    disk     420MB    1 hour
    outerport/stable-diffusion-v1-5    v1.5    fp16    disk     4.2GB    3 hours
    outerport/t5-small                 v1.1    fp32    disk     950MB    2 days

    To show all models, including those not downloaded locally, use the --show-all flag:

    outerport models list --show-all

    Example output:

    MODEL ID                           TAG     QUANT   STATUS   SIZE     LAST USED
    outerport/gpt2-small               latest  fp16    ram      456MB    2 mins
    outerport/bert-base-uncased        v1.0    int8    disk     420MB    1 hour
    outerport/stable-diffusion-v1-5    v1.5    fp16    disk     4.2GB    3 hours
    outerport/t5-small                 v1.1    fp32    disk     950MB    2 days
    outerport/roberta-large            v2.0    int4    remote   1.3GB    -
    outerport/whisper-large-v3         v3      fp16    remote   1.5GB    -
  2. Load a model:

    outerport pull outerport/gpt2-medium:latest

    or with a specific quantization level:

    outerport pull outerport/gpt2-medium:latest --quant int8
  3. Unload a model:

    outerport rm outerport/gpt2-medium:latest
  4. Show model information:

    outerport models outerport/stable-diffusion-v1-5:v1.5

    This shows the model's information, including available tags, quantization levels, and resource requirements. Example output:

    Model: outerport/stable-diffusion-v1-5:v1.5
    Registry: https://registry.outerport.com
    Description: Stable Diffusion v1.5 text-to-image model
    Created: 2023-04-15 10:30:00 UTC

    Tags:
    Latest: v1.5
    Available: v1.5, v1.4, v1.3

    Quantization:
    Available: fp32, fp16
    Default: fp16

    Resources:
    Size: 4.2 GB (fp16)
    Estimated GPU VRAM: ~4 GB

Using the Python API

  1. Import the necessary modules:

    from outerport import AutoModelForCausalLM, AutoTokenizer, AutoPipeline
  2. Load and use a Transformer model:

    model_name = "outerport/gpt2-medium:latest"
    model = AutoModelForCausalLM.from_pretrained(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name)

    inputs = tokenizer("Hello, Outerport!", return_tensors="pt")
    outputs = model.generate(**inputs)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
  3. Load and use a Diffusers pipeline:

    model_name = "outerport/stable-diffusion-v1-5:latest"
    pipeline = AutoPipeline.from_pretrained(model_name)

    prompt = "A serene landscape with mountains and a lake"
    image = pipeline(prompt).images[0]
    image.save("generated_landscape.png")
  4. Load and use general models:

    import outerport

    model_name = "outerport/gpt2-medium:latest"
    model = outerport.load(model_name, device='cpu')
  5. Save a model asynchronously:

    # "async" is a reserved word in Python and cannot appear as a literal
    # keyword argument, so the documented flag is passed via dictionary unpacking.
    outerport.save(model, **{"async": True})

The Outerport API automatically handles model loading, caching, and resource management based on your daemon configuration. This allows you to focus on using the models in your application without worrying about the underlying infrastructure.
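
As a closing sketch, using only the calls shown above with the same example model names, one script can move between a language model and a diffusion pipeline without any manual unloading; the daemon keeps or evicts weights according to the limits in outerport-daemon.toml:

    from outerport import AutoModelForCausalLM, AutoTokenizer, AutoPipeline

    # Text generation; the daemon pulls and places the weights as needed.
    name = "outerport/gpt2-medium:latest"
    model = AutoModelForCausalLM.from_pretrained(name)
    tokenizer = AutoTokenizer.from_pretrained(name)
    inputs = tokenizer("Hello, Outerport!", return_tensors="pt")
    print(tokenizer.decode(model.generate(**inputs)[0], skip_special_tokens=True))

    # Image generation; swapping models in and out is handled by the daemon.
    pipeline = AutoPipeline.from_pretrained("outerport/stable-diffusion-v1-5:latest")
    pipeline("A serene landscape with mountains and a lake").images[0].save("landscape.png")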