Using Predicted Outputs
Use Predicted Outputs to boost output generation speed for editing and rewriting use cases
In cases where large parts of the LLM output are known in advance, e.g. when editing or rewriting a document or code snippet, you can improve output generation speed with Predicted Outputs. Predicted Outputs lets you provide a strong "guess" of what the output may look like.
To use Predicted Outputs, set the `prediction` field in your Fireworks API request to the predicted output. For example, you may want to edit a survey and add an option to contact users by text message:
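Here is a minimal sketch using the OpenAI-compatible Python SDK pointed at the Fireworks endpoint. The model name and survey snippet are illustrative placeholders, and we assume the `prediction` field accepts the OpenAI-style `{"type": "content", "content": ...}` payload:

```python
import openai

# Fireworks exposes an OpenAI-compatible endpoint.
client = openai.OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key="<FIREWORKS_API_KEY>",
)

# The original survey code (illustrative); most of it should survive the edit.
survey_code = """<form>
  <p>How should we contact you?</p>
  <label><input type="checkbox" name="contact" value="email"> Email</label>
  <label><input type="checkbox" name="contact" value="phone"> Phone call</label>
</form>"""

response = client.chat.completions.create(
    model="accounts/fireworks/models/llama-v3p1-8b-instruct",  # placeholder model
    temperature=0,
    messages=[
        {
            "role": "user",
            "content": "Add a 'Text message' option to this survey. "
                       "Respond with only the updated code:\n\n" + survey_code,
        }
    ],
    # Pass the original code as the prediction. Using extra_body keeps this
    # working even on OpenAI SDK versions that predate the `prediction` parameter.
    extra_body={"prediction": {"type": "content", "content": survey_code}},
)
print(response.choices[0].message.content)
```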
In this case, we expect most of the code to remain the same, so we set the `prediction` field to the original survey code. Predicted tokens that match the output can be verified quickly instead of generated one at a time, which increases output generation speed.
Additional information on Predicted Outputs:
- Using Predicted Outputs is free at this time
- We recommend setting `temperature=0` for best results in most intended use cases of Predicted Outputs. In these cases, using Predicted Outputs does not affect the quality of the generated output
- If the prediction is substantially different from the generated output, output generation speed may decrease
- The max length of the `prediction` field is set by `max_tokens`, which defaults to 2048; increase `max_tokens` if you have a longer input and prediction
- If you are using an on-demand deployment, you can set `rewrite_speculation=True` and potentially get even faster output generation (see the sketch after this list). We are working on rolling this out to Serverless soon
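As a sketch of the last two points together, reusing the `client` from the example above: the model identifier for the on-demand deployment is a placeholder, and passing `rewrite_speculation` through `extra_body` is an assumption based on it being a Fireworks-specific field rather than part of the OpenAI schema.

```python
# Placeholder: the document or code being rewritten.
long_document = "<the original document or code you are rewriting>"

response = client.chat.completions.create(
    model="<your-on-demand-deployment-model-id>",  # placeholder deployment model ID
    temperature=0,
    max_tokens=4096,  # raised above the 2048 default to cover a longer prediction
    messages=[
        {"role": "user", "content": "Rewrite the document below:\n\n" + long_document}
    ],
    extra_body={
        "prediction": {"type": "content", "content": long_document},
        # Fireworks-specific flag; assumed to be a top-level request field.
        "rewrite_speculation": True,
    },
)
print(response.choices[0].message.content)
```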