GPU cloud · Inference platform · Built by a research lab.

Where research becomes infrastructure.

Production B200 and B300 capacity for AI labs, neoclouds, and enterprise platform teams. VPC interconnect into your AWS, Azure, or GCP. The economics and performance are downstream of years of research on inference, long-context engines, and compute architecture — peer-reviewed and co-authored with university partners.

Talk to sales Start hourly Read the research

All systems operational

0GPUs available now

7regions online

status.iframe.ai

Inference

20× faster on the hardware you'd buy anyway.

A managed serving stack — quantization, optimized kernels, smart batching — productized from years of research on production inference workloads.

See methodology

Infrastructure partners

The same supply chain as the hyperscalers.

Our cluster runs on first-party GPU allocations from NVIDIA, Samsung HBM3e memory, Dell and Supermicro systems at the rack and pod level, and Cologix-operated carrier-neutral colocation. No marketplace resellers. No consumer hardware in the fleet.

NVIDIACompute
B200 · B300 · H200 · H100
First-party allocations on Blackwell- and Hopper-class GPUs.
SamsungMemory
HBM3e
High-bandwidth memory across the production fleet.
DellSystems
PowerEdge XE9680
Eight-GPU servers at the rack level, validated for our workloads.
SupermicroSystems
NVL72-class
Liquid-cooled, pod-scale Blackwell systems for training and large inference.
CologixColocation
Carrier-neutral
Operated facilities with direct interconnect to AWS, Azure, and GCP.

VPC interconnect

Your existing cloud — just with affordable GPUs.

iFrame instances peer into your VPC like any other subnet. Same security groups. Same IAM roles. Same audit logs. Same observability stack. Your platform team doesn't have to learn a new cloud — they just have a new region with more affordable GPUs.

AWSDirect Connect + VPC peeringGA
AzureExpressRoute + VNet peeringGA
GCPCloud Interconnect + VPCGA
OracleFastConnectBeta

See the architecture Talk to sales

Diagram: customer VPC on the left (subnets, security groups, IAM) peers via the supported interconnect product to iFrame bare-metal and GPU subnets on the right. Audit and observability flow across.

Solutions

One platform to handle them all.

All solutions →

Developer experience

Two minutes to your first GPU.

Self-serve from the start. The CLI works the way you'd expect. The API is OpenAI-compatible. The console exists for the rare moment you actually need it.

OpenAI-compatible inference API
Per-second metering, no minimums
Bring your own model, automatic quantization
All seven regions, latency-routed by default

Read the docs Sign up free →

# Install
curl -sSf https://iframe.ai/install.sh | sh

# Authenticate
iframe auth login

# Launch a B200 in us-west
iframe gpu launch --type b200 --region us-west

# SSH in
iframe gpu ssh

Portfolio

The single compute foundation for infinite applications.

iFrame is the compute platform and research lab built in cooperation with EGILAX, MED.REPORT, SEFIROT.AI, and Pulsar.Global - independent operating companies as well as Universities in the US and Europe. The portfolio is also the platform's first reference customer.

The platform

iFrame

Compute platform

GPU cloud and inference platform. Hourly capacity and reserved commitments on Blackwell-class hardware, with VPC interconnect into AWS, Azure, and GCP.

iframe.ai

The portfolio

For developers

Sign up free. First GPU in two minutes.

Per-second metering. No commitment. OpenAI-compatible API. Bring your own model and we'll quantize and serve it.

For enterprise

Talk to our team. VPC interconnect, custom SLAs.

Reserved capacity, named support, BAA, DPA, audit log streaming. Procurement-ready. Quoted to your workload.

Talk to sales Trust packet →

Where research becomes infrastructure.

More cost efficient

Instant access, no procurement queue

Peers into your existing cloud

Faster inference on the same hardware

20× faster on the hardware you'd buy anyway.

The same supply chain as the hyperscalers.

Your existing cloud — just with affordable GPUs.

One platform to handle them all.

Inference at scale

Distributed training

Long-context workloads

Migrate from hyperscalers

Two minutes to your first GPU.

The single compute foundation for infinite applications.

Sign up free. First GPU in two minutes.

Talk to our team. VPC interconnect, custom SLAs.

Where research becomes infrastructure.

The four claims behind iFrame

More cost efficient

Instant access, no procurement queue

Peers into your existing cloud

Faster inference on the same hardware

20× faster on the hardware you'd buy anyway.

The same supply chain as the hyperscalers.

Your existing cloud — just with affordable GPUs.

One platform to handle them all.

Inference at scale

Distributed training

Long-context workloads

Migrate from hyperscalers

Two minutes to your first GPU.

The single compute foundation for infinite applications.

Get started

Sign up free. First GPU in two minutes.

Talk to our team. VPC interconnect, custom SLAs.