Fireworks DeepSeek-V4-Flash (Global)

DeepSeek-V4-Flash is a streamlined, open-source Mixture-of-Experts (MoE) model optimized for fast, cost-efficient inference. It preserves strong reasoning and coding performance with a context length of up to 1M tokens. It uses the same hybrid attention design as the Pro variant but is tuned for lower latency and higher throughput in real-time applications. Given a sufficient compute budget, it approaches Pro-level reasoning quality, which makes it well suited to interactive agents and high-volume production workloads.

Provider: Fireworks AI

API Endpoint

https://api.fireworks.ai/inference/v1/chat/completions
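The endpoint follows the OpenAI chat-completions protocol, so it can also be called without any SDK. Below is a minimal sketch that uses only the Python standard library. It assumes bearer-token authentication and an OpenAI-style JSON response body. The FIREWORKS_API_KEY environment variable name is a hypothetical choice. `build_request` constructs the POST request without sending it, and `send` performs the call.

```python
import json
import os
import urllib.request

# Endpoint as listed on this page.
ENDPOINT = "https://api.fireworks.ai/inference/v1/chat/completions"


def build_request(api_key: str, prompt: str, max_tokens: int = 1024) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat-completions POST request."""
    body = {
        "model": "accounts/fireworks/models/deepseek-v4-flash",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            # Assumption: bearer-token auth, as in the OpenAI-compatible API.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(req: urllib.request.Request) -> str:
    """Send the request and return the text of the first choice."""
    with urllib.request.urlopen(req, timeout=60) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["message"]["content"]


# Only performs a network call when the (hypothetical) key variable is set.
if __name__ == "__main__" and os.environ.get("FIREWORKS_API_KEY"):
    print(send(build_request(os.environ["FIREWORKS_API_KEY"], "Hello, how are you?")))
```

Separating construction from sending lets you inspect or log the exact request before any tokens are billed.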

Quick Start (Python)

Install: pip install openai

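from openai import OpenAI

# Point the OpenAI client at Fireworks' OpenAI-compatible endpoint.
client = OpenAI(
    api_key="your-fireworks-api-key",  # replace with your Fireworks API key
    base_url="https://api.fireworks.ai/inference/v1",
)

# Send a single-turn chat request to DeepSeek-V4-Flash.
response = client.chat.completions.create(
    model="accounts/fireworks/models/deepseek-v4-flash",
    messages=[
        {"role": "user", "content": "Hello, how are you?"},
    ],
    max_tokens=1024,   # upper bound on generated tokens
    temperature=0.7,   # sampling randomness (0–2)
)

# The generated text is in the first choice's message.
print(response.choices[0].message.content)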

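The sketch below shows a streaming call. It sets `stream=True` and prints text as it arrives. It assumes the chunk shape used by the openai Python SDK: `chunk.choices[0].delta.content` holds the new text and can be None. `iter_text`, `fake_chunk`, and `stream_reply` are hypothetical helper names, and FIREWORKS_API_KEY is an assumed environment variable.

```python
import os
from types import SimpleNamespace
from typing import Iterable, Iterator


def iter_text(chunks: Iterable) -> Iterator[str]:
    """Yield the text fragments carried by streamed chat-completion chunks."""
    for chunk in chunks:
        if not chunk.choices:  # some chunks (e.g. usage-only) carry no choices
            continue
        text = chunk.choices[0].delta.content
        if text:  # skip None (role-only/final chunks) and empty strings
            yield text


def fake_chunk(text):
    """Offline stand-in for a streamed chunk, for testing without network access."""
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])


def stream_reply(prompt: str) -> str:
    """Stream a reply from DeepSeek-V4-Flash, printing fragments as they arrive."""
    from openai import OpenAI  # imported lazily so the helpers above work without it

    client = OpenAI(
        api_key=os.environ["FIREWORKS_API_KEY"],  # hypothetical variable name
        base_url="https://api.fireworks.ai/inference/v1",
    )
    stream = client.chat.completions.create(
        model="accounts/fireworks/models/deepseek-v4-flash",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=1024,
        stream=True,  # documented parameter; default is false
    )
    parts = []
    for text in iter_text(stream):
        print(text, end="", flush=True)
        parts.append(text)
    print()
    return "".join(parts)


if __name__ == "__main__" and os.environ.get("FIREWORKS_API_KEY"):
    stream_reply("Hello, how are you?")
```

Keeping chunk parsing in its own generator makes it testable offline with `fake_chunk` and reusable for any OpenAI-style stream.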

Supported Parameters

Parameter    Type             Description
max_tokens   integer          Maximum number of tokens to generate (≥ 1).
temperature  float            Sampling temperature; higher values give more random output (0–2). Default: 0.7.
top_p        float            Nucleus sampling threshold (0–1). Default: 1.
stream       boolean          Stream response chunks as they are generated. Default: false.
stop         string or array  A stop sequence, or an array of stop sequences.
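The hypothetical helper below shows how these parameters fit together. It builds a request body and checks each value against the ranges listed in the table. It treats the range bounds as inclusive, which is an assumption. The returned dict can be unpacked into `client.chat.completions.create(**payload)` from the quick start.

```python
from typing import Optional, Sequence, Union

MODEL = "accounts/fireworks/models/deepseek-v4-flash"


def build_chat_payload(
    messages: list,
    max_tokens: Optional[int] = None,
    temperature: float = 0.7,
    top_p: float = 1.0,
    stream: bool = False,
    stop: Optional[Union[str, Sequence[str]]] = None,
) -> dict:
    """Return a chat-completions request body after range-checking the parameters."""
    if max_tokens is not None and (not isinstance(max_tokens, int) or max_tokens < 1):
        raise ValueError("max_tokens must be an integer >= 1")
    if not 0 <= temperature <= 2:
        raise ValueError("temperature must be in [0, 2]")
    if not 0 <= top_p <= 1:
        raise ValueError("top_p must be in [0, 1]")
    payload = {
        "model": MODEL,
        "messages": messages,
        "temperature": temperature,
        "top_p": top_p,
        "stream": stream,
    }
    if max_tokens is not None:
        payload["max_tokens"] = max_tokens
    if stop is not None:
        # Accept a single stop string or a sequence of them.
        payload["stop"] = stop if isinstance(stop, str) else list(stop)
    return payload
```

With `stream=True`, the same call returns an iterator of chunks instead of a single response.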

Feature Guides

Serverless Inference

Pay per token for public open models without managing GPU deployments.
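Under pay-per-token billing, the cost of a request is its token counts multiplied by the per-token prices. This page does not list prices, so both rates below are parameters you supply. Pricing input and output tokens separately is an assumption. The token counts are typically available as `response.usage.prompt_tokens` and `response.usage.completion_tokens` in the OpenAI-compatible response.

```python
def estimate_cost(prompt_tokens: int, completion_tokens: int,
                  input_price_per_m: float, output_price_per_m: float) -> float:
    """Estimate the cost of one request; prices are per million tokens."""
    if prompt_tokens < 0 or completion_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return (prompt_tokens * input_price_per_m
            + completion_tokens * output_price_per_m) / 1_000_000


def cost_from_usage(usage, input_price_per_m: float, output_price_per_m: float) -> float:
    """Same estimate, taking an OpenAI-style usage object (e.g. response.usage)."""
    return estimate_cost(usage.prompt_tokens, usage.completion_tokens,
                         input_price_per_m, output_price_per_m)


if __name__ == "__main__":
    # Illustrative prices only, NOT Fireworks' actual rates.
    print(estimate_cost(2_000, 500, input_price_per_m=0.50, output_price_per_m=2.00))  # prints 0.002
```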

OpenAI Compatibility

Use OpenAI-compatible client libraries by changing the API base URL.