First-party allocations on Blackwell- and Hopper-class GPUs.
GPU cloud · Inference platform · Built by a research lab.
Where research becomes infrastructure.
Production B200 and B300 capacity for AI labs, neoclouds, and enterprise platform teams. VPC interconnect into your AWS, Azure, or GCP. The economics and performance are downstream of years of research on inference, long-context engines, and compute architecture — peer-reviewed and co-authored with university partners.
The four claims behind iFrame
Inference
20× faster on the hardware you'd buy anyway.
A managed serving stack — quantization, optimized kernels, smart batching — productized from years of research on production inference workloads.
Infrastructure partners
The same supply chain as the hyperscalers.
Our cluster runs on first-party GPU allocations from NVIDIA, Samsung HBM3e memory, Dell and Supermicro systems at the rack and pod level, and Cologix-operated carrier-neutral colocation. No marketplace resellers. No consumer hardware in the fleet.
NVIDIACompute B200 · B300 · H200 · H100SamsungMemory HBM3eHigh-bandwidth memory across the production fleet.
DellSystems PowerEdge XE9680Eight-GPU servers at the rack level, validated for our workloads.
SupermicroSystems NVL72-classLiquid-cooled, pod-scale Blackwell systems for training and large inference.
CologixColocation Carrier-neutralOperated facilities with direct interconnect to AWS, Azure, and GCP.
VPC interconnect
Your existing cloud — just with affordable GPUs.
iFrame instances peer into your VPC like any other subnet. Same security groups. Same IAM roles. Same audit logs. Same observability stack. Your platform team doesn't have to learn a new cloud — they just have a new region with more affordable GPUs.
- AWSDirect Connect + VPC peeringGA
- AzureExpressRoute + VNet peeringGA
- GCPCloud Interconnect + VPCGA
- OracleFastConnectBeta
Diagram: customer VPC on the left (subnets, security groups, IAM) peers via the supported interconnect product to iFrame bare-metal and GPU subnets on the right. Audit and observability flow across.
Solutions
One platform to handle them all.
- 20× throughputRead more
Inference at scale
Production token serving with predictable tail latency.
- 3× more affordableRead more
Distributed training
PyTorch FSDP, DeepSpeed, Megatron — at one-third the cost.
- 1B+ tokensRead more
Long-context workloads
Million-token-class context on standard hardware.
- Zero downtimeRead more
Migrate from hyperscalers
VPC interconnect lets you move workloads without ripping anything out.
Developer experience
Two minutes to your first GPU.
Self-serve from the start. The CLI works the way you'd expect. The API is OpenAI-compatible. The console exists for the rare moment you actually need it.
- OpenAI-compatible inference API
- Per-second metering, no minimums
- Bring your own model, automatic quantization
- All seven regions, latency-routed by default
# Install
curl -sSf https://iframe.ai/install.sh | sh
# Authenticate
iframe auth login
# Launch a B200 in us-west
iframe gpu launch --type b200 --region us-west
# SSH in
iframe gpu sshPortfolio
The single compute foundation for infinite applications.
iFrame is the compute platform and research lab built in cooperation with EGILAX, MED.REPORT, SEFIROT.AI, and Pulsar.Global - independent operating companies as well as Universities in the US and Europe. The portfolio is also the platform's first reference customer.
The portfolio
- EGILAXR&D services
Custom model architectures, dataset engineering, and joint research engagements for enterprise AI teams.
egilax.com - MED.REPORTHealthcare
Clinical reporting platform for providers and integrated health systems, deployed inside hospital networks.
med.report - SEFIROT.AIConsumer AI
Multi-model assistant for individuals, creators, and small teams. Frontier-model access without per-vendor accounts.
sefirot.ai - PULSAR.GLOBALPolicy
Coalition for healthcare data sovereignty and patient-controlled records, working with regulators and provider networks.
pulsar.global
Get started
For developers
Sign up free. First GPU in two minutes.
Per-second metering. No commitment. OpenAI-compatible API. Bring your own model and we'll quantize and serve it.
For enterprise
Talk to our team. VPC interconnect, custom SLAs.
Reserved capacity, named support, BAA, DPA, audit log streaming. Procurement-ready. Quoted to your workload.