Start building with open source AI models
| Property | Serverless | On-demand | Enterprise reserved |
|---|---|---|---|
| Performance | Industry-leading speed on a Fireworks-curated setup. Performance may vary with other users' traffic. | Speed depends on your specified GPU configuration; usage is private. Per-GPU latency should be significantly lower than vLLM. | Setup tailored by Fireworks AI experts for the best possible latency |
| Getting started | Self-serve: use serverless immediately with one line of code | Self-serve: configure GPUs, then use them with one line of code | Chat with Fireworks |
| Scaling and management | Scale up and down freely within rate limits | Optional auto-scaling of GPUs with traffic. GPUs scale to zero automatically, so there is no charge for unused GPUs or for boot-ups. | Chat with Fireworks |
| Pricing | Fixed price per token | Pay per GPU-second with no commitments. Per-GPU throughput should be significantly greater than alternatives like vLLM. | Custom pricing based on reserved GPU capacity |
| Commitment | None | None | Plan length arranged with Fireworks |
| Rate limits | Yes, see quotas | No rate limits; quotas on the number of GPUs | None |
| Model selection | Collection of popular models, curated by Fireworks | Use hundreds of pre-uploaded models, or upload your own custom model within supported architectures | Use hundreds of pre-uploaded models, or upload any model |
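As a rough illustration of the "one line of code" self-serve flow, the sketch below calls a serverless model over Fireworks' OpenAI-compatible HTTP API using only the Python standard library. The endpoint path, model identifier, and `FIREWORKS_API_KEY` environment variable are assumptions for illustration; substitute the model and credentials from your own account.

```python
# Minimal sketch (not an official client): one chat-completion request to the
# Fireworks serverless endpoint. Endpoint URL and model name are assumptions.
import json
import os
import urllib.request

API_URL = "https://api.fireworks.ai/inference/v1/chat/completions"


def build_request(prompt: str,
                  model: str = "accounts/fireworks/models/llama-v3p1-8b-instruct"):
    """Build an HTTP request for a single chat completion."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }
    headers = {
        "Authorization": f"Bearer {os.environ.get('FIREWORKS_API_KEY', '')}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(
        API_URL, data=json.dumps(payload).encode(), headers=headers
    )


# Only send the request when an API key is actually configured.
if os.environ.get("FIREWORKS_API_KEY"):
    with urllib.request.urlopen(build_request("Say hello in one word.")) as resp:
        print(json.loads(resp.read())["choices"][0]["message"]["content"])
```

Because the API is OpenAI-compatible, the same request shape works with the official `openai` Python client by pointing its `base_url` at the Fireworks inference endpoint; the pay-per-token serverless tier and the per-GPU-second on-demand tier are reached through the same interface.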