Large Language Models

Fast LLM Inference

Fast LLM Inference is a Large Language Models capability available through Groq on Alfred: ultra-low-latency LLM inference optimized for speed on dedicated hardware. Access it through a single unified API with automatic failover and intelligent routing.


Best for

Highest quality: Groq (Premium tier)
Most affordable: Groq (Economy tier)

Contract

Max Latency: 500ms
Streaming Required: Yes
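
Because streaming is required by the contract, responses arrive as incremental chunks rather than a single blob. Below is a minimal TypeScript sketch of consuming a streamed response; the `stream` flag and the chunk's `delta` field are illustrative assumptions, not confirmed parts of the Alfred API, so check the API reference for the actual streaming interface.

import { Alfred } from '@alfred/core';

const alfred = new Alfred({ apiKey: process.env.ALFRED_API_KEY });

// Hypothetical streaming call: assumes execute() accepts a stream flag
// and returns an async iterable of incremental chunks. The field names
// (stream, delta) are assumptions for illustration.
const stream = await alfred.execute({
  capability: 'llm.fast-inference',
  input: { prompt: 'Hello world' },
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.delta ?? '');
}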

Providers (1)

Provider         Score   Quality   Pricing
Groq (DEFAULT)   99      premium   economy
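
Alfred routes requests to the highest-scoring provider by default (here, Groq). If you need to pin a specific provider, a request-level override along these lines may be supported; the `provider` field below is an assumption, not a documented parameter, so confirm against the API reference.

import { Alfred } from '@alfred/core';

const alfred = new Alfred({ apiKey: process.env.ALFRED_API_KEY });

// Hypothetical provider pin: the `provider` field is an illustrative
// assumption, not a confirmed Alfred parameter.
const result = await alfred.execute({
  capability: 'llm.fast-inference',
  provider: 'groq',
  input: { prompt: 'Hello world' },
});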

Quick start

Call Fast LLM Inference through Alfred — automatic provider selection, failover, and load balancing included.

cURL

curl -X POST https://api.alfred-ai.app/v1/execute \
  -H "Authorization: Bearer $ALFRED_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "capability": "llm.fast-inference",
    "input": { "prompt": "Hello world" }
  }'

TypeScript

import { Alfred } from '@alfred/core';

const alfred = new Alfred({ apiKey: process.env.ALFRED_API_KEY });

// Alfred automatically selects the best provider
const result = await alfred.execute({
  capability: 'llm.fast-inference',
  input: { prompt: 'Hello world' },
});

console.log(result.output);

Orchestration pipeline
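
Chain Fast LLM Inference with other capabilities in a single orchestrated call; later steps can consume the output of earlier ones.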

import { Alfred } from '@alfred/core';

const alfred = new Alfred({ apiKey: process.env.ALFRED_API_KEY });

// Multi-step pipeline with automatic failover
const result = await alfred.orchestrate({
  steps: [
    { id: 'step1', capability: 'llm.fast-inference', input: { prompt: 'Hello world' } },
    { id: 'step2', capability: 'llm.chat', dependsOn: ['step1'],
      input: { prompt: 'Summarize: $step1.output' } },
  ],
});
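
Step inputs can reference earlier outputs with `$<stepId>.output` placeholders, as in step2 above, and `dependsOn` makes Alfred run a step only after the steps it names have completed, with failover applied at each step. The exact shape of the orchestration result object is covered in the API reference.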

Related Large Language Models capabilities

Chat Completion
Streaming Chat
Vision Analysis
Structured Output
Code Completion

Getting started →
API reference →
All providers →
All capabilities →