Using Document Inlining
Overview
Document Inlining allows any LLM to process images and PDFs through our chat completions API. Simply append #transform=inline
to your document URL to enable this feature. Document Inlining connects our proprietary Fireworks Parsing Service to any LLM to provide advantages including:
-
Improved reasoning (compared to VLMs): LLMs reason better over text than over image and document inlining allows you to use specialized and more recently updated text models
-
Improved input flexibility: Document Inlining enables PDFs and multiple images to be ingested
-
Ultra-simple usage: Use Document Inlining through our openAI-compability, chat completions API. Simply add 1-line to specify to add your file and turn on Document Inlining
Read our announcement blog for more details.
Usage
Basic Example
Note the “#transform=inline” addition to the image URL.
The image_url.url
field supports both direct URLs and base64-encoded data URLs, compatible with VLM API:
Similarly, simply append #transform=inline
to the base64 string to enable document inlining.
Combining with Structured Output
Document Inlining works seamlessly with structured output formats. Here’s how to extract specific fields using JSON mode:
Limitations
Document Inlining is only intended to handle images and documents that contain text. Document Inlining may provide subpar results for highly visual, spatially dependent, or layout-heavy content that does not translate well into structured text.
-
Maximum document size: 50 pages or the model’s context size (whichever is smaller)
-
Maximum document size: ~32 MB if sent as base64 encoded string, ~100 MB if sent as URL
-
Supported formats: PDFs and images
Model Compatibility
Document Inlining works with any LLM on Fireworks, including:
-
Serverless models
-
On-demand models
-
Fine-tuned and custom models
-
Vision models
Simply append #transform=inline
to your document URL to enable the feature with any supported model. Multiple documents are supported. Vision models also support document inlining with images for use cases that require both document processing and non-document vision. Users can control whether to inline a document by selectively appending #transform=inline
to image_url.url of each attachment.
Pricing
During public preview, Document Inlining incurs no added costs compared to our typical text models. For example, let’s say you’re conducting a structured extraction task where you provide: Input: 10 token Prompt + document with 1,000 tokens worth of text Output: 100 tokens You would simply pay for the 1110 tokens worth of input and output token costs but will NOT incur additional costs for document parsing.
Please note that Document Inlining is in Public Preview mode and subject to changes. Please contact us on Discord if you have feedback or questions or at inquiries@fireworks.ai for enterprise inquries.
Was this page helpful?