GLM-5.1 is Z.ai's next-generation flagship model built for agentic engineering, with stronger coding capabilities and sustained performance over long-horizon tasks that span hundreds of iteration rounds. It is a 754B-parameter mixture-of-experts (MoE) model.
Provider: Zhipu AI | Fireworks AI
Endpoint: https://api.fireworks.ai/inference/v1/chat/completions
Install: pip install openai
from openai import OpenAI

# Point the OpenAI client at the Fireworks inference endpoint.
client = OpenAI(
    api_key="your-fireworks-api-key",
    base_url="https://api.fireworks.ai/inference/v1",
)

# Send a single-turn chat request to GLM-5.1.
response = client.chat.completions.create(
    model="accounts/fireworks/models/glm-5p1",
    messages=[
        {"role": "user", "content": "Hello, how are you?"},
    ],
    max_tokens=1024,
    temperature=0.7,
)

# Print the text of the first (and only) returned choice.
print(response.choices[0].message.content)
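For streaming, a minimal sketch might look like the following. It assumes the endpoint returns OpenAI-style chunks whose text arrives in `choices[0].delta.content`, as the `openai` client exposes them when `stream=True`. The helper names `iter_text` and `stream_reply` are illustrative and not part of any library.

```python
def iter_text(chunks):
    """Yield the text pieces carried by OpenAI-style streaming chunks."""
    for chunk in chunks:
        # Some chunks may carry no choices (for example, usage-only chunks).
        if not chunk.choices:
            continue
        piece = chunk.choices[0].delta.content
        # The first and last chunks often have empty or None content.
        if piece:
            yield piece


def stream_reply(client, prompt, model="accounts/fireworks/models/glm-5p1"):
    """Print a reply as it is generated and return the full text.

    `client` is assumed to be an openai.OpenAI instance configured with
    base_url="https://api.fireworks.ai/inference/v1".
    """
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=1024,
        stream=True,  # ask for incremental chunks instead of one response
    )
    parts = []
    for piece in iter_text(stream):
        print(piece, end="", flush=True)
        parts.append(piece)
    print()
    return "".join(parts)
```

Calling `stream_reply(client, "Hello, how are you?")` with the client built as in the basic example would print tokens as they arrive and return the concatenated reply.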
| Parameter | Type | Description |
|---|---|---|
| max_tokens | integer | Maximum tokens to generate. (≥1) |
| temperature | float | Controls randomness. (0–2) Default: 0.7. |
| top_p | float | Nucleus sampling threshold. (0–1) Default: 1. |
| stream | boolean | Stream response chunks as they are generated. Default: false. |
| stop | string or array | A single stop sequence, or an array of stop sequences, at which generation halts. |
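The ranges in the table can be checked on the client side before a request is sent. The sketch below uses a hypothetical helper, `build_sampling_params`, which is not part of the Fireworks or OpenAI libraries. It takes its defaults from the table, except for `max_tokens`, where it borrows the 1024 used in the example request because the table lists no default.

```python
def build_sampling_params(max_tokens=1024, temperature=0.7, top_p=1.0,
                          stream=False, stop=None):
    """Hypothetical helper: validate values against the documented ranges
    and return them as keyword arguments for chat.completions.create()."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(max_tokens, bool) or not isinstance(max_tokens, int) or max_tokens < 1:
        raise ValueError("max_tokens must be an integer >= 1")
    if not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    if not 0 <= top_p <= 1:
        raise ValueError("top_p must be between 0 and 1")

    params = {
        "max_tokens": max_tokens,
        "temperature": temperature,
        "top_p": top_p,
        "stream": bool(stream),
    }
    if stop is not None:
        # Accept one stop string or an iterable of them; always send a list.
        params["stop"] = [stop] if isinstance(stop, str) else list(stop)
    return params
```

The result can be unpacked into a request, for example `client.chat.completions.create(model=..., messages=..., **build_sampling_params(temperature=0.2, stop="END"))`.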
Pay per token for public open models without managing GPU deployments.
Use OpenAI-compatible client libraries by changing the API base URL.
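Because the API follows the OpenAI wire format, the same call can also be made without a client library. The sketch below builds the request with Python's standard library only. It assumes the usual OpenAI-style `Authorization: Bearer` header and response shape, and the helper names `build_chat_request` and `send` are illustrative.

```python
import json
import urllib.request

FIREWORKS_CHAT_URL = "https://api.fireworks.ai/inference/v1/chat/completions"


def build_chat_request(api_key, prompt, model="accounts/fireworks/models/glm-5p1"):
    """Build (but do not send) a POST request for a single-turn chat."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 1024,
    }
    return urllib.request.Request(
        FIREWORKS_CHAT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # Assumption: Bearer-token auth, as in the OpenAI convention.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request):
    """Send the request and return the text of the first choice."""
    with urllib.request.urlopen(request) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With a valid key, `send(build_chat_request("your-fireworks-api-key", "Hello, how are you?"))` would return the reply text. Keeping request construction separate from sending makes the payload easy to inspect or log.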