Skip to content

GPU cloud · Inference platform · Built by a research lab.

Where research becomes infrastructure.

Production B200 and B300 capacity for AI labs, neoclouds, and enterprise platform teams. VPC interconnect into your AWS, Azure, or GCP. The economics and performance are downstream of years of research on inference, long-context engines, and compute architecture — peer-reviewed and co-authored with university partners.

All systems operational
0GPUs available now
7regions online
status.iframe.ai

The four claims behind iFrame

Inference

20× faster on the hardware you'd buy anyway.

A managed serving stack — quantization, optimized kernels, smart batching — productized from years of research on production inference workloads.

See methodology

Infrastructure partners

The same supply chain as the hyperscalers.

Our cluster runs on first-party GPU allocations from NVIDIA, Samsung HBM3e memory, Dell and Supermicro systems at the rack and pod level, and Cologix-operated carrier-neutral colocation. No marketplace resellers. No consumer hardware in the fleet.

  • NVIDIACompute
    B200 · B300 · H200 · H100

    First-party allocations on Blackwell- and Hopper-class GPUs.

  • SamsungMemory
    HBM3e

    High-bandwidth memory across the production fleet.

  • DellSystems
    PowerEdge XE9680

    Eight-GPU servers at the rack level, validated for our workloads.

  • SupermicroSystems
    NVL72-class

    Liquid-cooled, pod-scale Blackwell systems for training and large inference.

  • CologixColocation
    Carrier-neutral

    Operated facilities with direct interconnect to AWS, Azure, and GCP.

VPC interconnect

Your existing cloud — just with affordable GPUs.

iFrame instances peer into your VPC like any other subnet. Same security groups. Same IAM roles. Same audit logs. Same observability stack. Your platform team doesn't have to learn a new cloud — they just have a new region with more affordable GPUs.

  • AWSDirect Connect + VPC peeringGA
  • AzureExpressRoute + VNet peeringGA
  • GCPCloud Interconnect + VPCGA
  • OracleFastConnectBeta
CUSTOMER VPC · AWS / AZURE / GCPapp-subnet10.0.1.0/24 · ECS · Lambda · App servicesdata-subnet10.0.2.0/24 · S3 · RDS · Object storageIAM · security groupsInherited unchanged across the peeringPEERINGDirect ConnectIFRAME · US-WESTgpu-subnetB200 · B300 · H200 · H100bare-metal-subnetDedicated nodes · NVLink · Reservedaudit · observabilityCloudWatch · Azure Monitor · GCP Logging

Diagram: customer VPC on the left (subnets, security groups, IAM) peers via the supported interconnect product to iFrame bare-metal and GPU subnets on the right. Audit and observability flow across.

Developer experience

Two minutes to your first GPU.

Self-serve from the start. The CLI works the way you'd expect. The API is OpenAI-compatible. The console exists for the rare moment you actually need it.

  • OpenAI-compatible inference API
  • Per-second metering, no minimums
  • Bring your own model, automatic quantization
  • All seven regions, latency-routed by default
# Install
curl -sSf https://iframe.ai/install.sh | sh

# Authenticate
iframe auth login

# Launch a B200 in us-west
iframe gpu launch --type b200 --region us-west

# SSH in
iframe gpu ssh

Get started

For developers

Sign up free. First GPU in two minutes.

Per-second metering. No commitment. OpenAI-compatible API. Bring your own model and we'll quantize and serve it.

For enterprise

Talk to our team. VPC interconnect, custom SLAs.

Reserved capacity, named support, BAA, DPA, audit log streaming. Procurement-ready. Quoted to your workload.