The API provides access to several text-to-image models, each with its own strengths and weaknesses. There are two main categories: generalist models and fine-tuned models.
This article presents examples of image generation using different models.
Please note that while some models may appear to perform better in these examples, other models may perform better with different prompts.
As one image is not enough to represent the entirety of a model’s capabilities, it is up to you to determine which models are most suitable for your use case.
Each model has a “native resolution”. This is the resolution at which the model was trained, and the resolution at which it will perform best.
You can request images at any resolution, smaller or larger than the model’s native resolution.
However, the further you stray from the native resolution, the more the image may be degraded.
For example, with very large resolutions, the image may lack coherence and the subject may be duplicated.
That being said, small deviations from the native resolution are usually fine.
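One way to stay close to the native resolution while changing the aspect ratio is to keep the total pixel count roughly constant. The helper below is a minimal sketch of that idea, not part of the API: the function name is hypothetical, and snapping to multiples of 64 is an assumption, so check which sizes the API actually accepts.

```python
import math


def size_near_native(native_width, native_height, aspect_ratio, multiple=64):
    """Return (width, height) with roughly the native pixel count.

    aspect_ratio is width / height. The `multiple` default is an
    assumption, not a documented API constraint.
    """
    native_pixels = native_width * native_height
    # Solve width * height = native_pixels with width / height = aspect_ratio.
    height = math.sqrt(native_pixels / aspect_ratio)
    width = height * aspect_ratio

    def snap(value):
        # Round to the nearest multiple, never below one multiple.
        return max(multiple, round(value / multiple) * multiple)

    return snap(width), snap(height)
```

For a model whose native resolution is 512 by 512 (an illustrative value), a 16:9 request would come out as 704 by 384, which is a small deviation in pixel count.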
Some models were trained to respond to particular “trigger prompts”, which means that to activate their unique capabilities, you will need to include this trigger in your prompt, preferably towards the beginning.
Since the API offers you direct access to each model, you have the choice whether to include the trigger words in your prompt or not.
Some models, such as openjourney, work well at generating the intended style without any trigger word (and just “amplify” the style if the words are included). Other models, such as synthwavepunk_v2, may depend on them more.
But generally speaking, we recommend always including the trigger words in your prompt, at the beginning.
If the trigger contains *subject*, it is recommended to replace it with your intended subject.
For example, if the trigger is
RAW photo, *subject*, 8k uhd, dslr, soft lighting, high quality, film grain, Fujifilm XT3
and you want to generate an image of a dog playing catch, then your final prompt should be
RAW photo, dog playing catch, 8k uhd, dslr, soft lighting, high quality, film grain, Fujifilm XT3
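The substitution described above can be sketched in a few lines. The function name is hypothetical, and the fallback of putting the trigger before the subject with a comma when no *subject* placeholder is present is an assumption based on the advice to place triggers at the beginning.

```python
SUBJECT_PLACEHOLDER = "*subject*"


def build_prompt(trigger, subject):
    """Combine a model's trigger prompt with the intended subject."""
    if not trigger:
        # The model has no trigger: use the subject as the whole prompt.
        return subject
    if SUBJECT_PLACEHOLDER in trigger:
        # Replace the placeholder with the intended subject.
        return trigger.replace(SUBJECT_PLACEHOLDER, subject)
    # Assumption: no placeholder, so put the trigger at the beginning.
    return f"{trigger}, {subject}"


trigger = ("RAW photo, *subject*, 8k uhd, dslr, soft lighting, "
           "high quality, film grain, Fujifilm XT3")
prompt = build_prompt(trigger, "dog playing catch")
```

Here prompt holds the final prompt from the example above, with the placeholder replaced by "dog playing catch".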
You can programmatically request the list of available models and their metadata using the following API endpoint: GET /info
The response will be a JSON object, with the models listed in the models property, each in the following format:
id: the model ID, which you will need to use in your API requests
name: the human-readable name of the model
license: the license under which the model is distributed
description: a short description of the model
categories: a list of categories that the model belongs to
function: a list of endpoints that the model supports (for example, most models are available with text2image and image2image, and the inpainting models are only available with inpainting)
nativeResolution: the native resolution of the model, in the form {"width": ..., "height": ...}
triggers: a list of trigger prompts that the model supports, or null if the model does not support triggers
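As a rough illustration of how this response might be consumed, the sketch below fetches GET /info and filters models by supported endpoint. The base URL is passed in by the caller because it is not given here, authentication headers (if the API requires them) are not handled, and the sample payload is invented for illustration: its model IDs and values are hypothetical and only the field names come from the list above.

```python
import json
import urllib.request


def fetch_info(base_url):
    """Request GET /info and return the parsed JSON object.

    base_url is an assumption supplied by the caller; add any headers
    the API requires before using this for real.
    """
    with urllib.request.urlopen(f"{base_url}/info") as response:
        return json.load(response)


def models_supporting(info, endpoint):
    """Return the models whose "function" list contains the endpoint."""
    return [m for m in info["models"] if endpoint in m["function"]]


# Hypothetical payload in the documented shape, for offline experimentation.
sample_info = {
    "models": [
        {
            "id": "example_model",
            "name": "Example Model",
            "license": "example-license",
            "description": "A placeholder generalist model.",
            "categories": ["generalist"],
            "function": ["text2image", "image2image"],
            "nativeResolution": {"width": 512, "height": 512},
            "triggers": None,
        },
        {
            "id": "example_inpaint",
            "name": "Example Inpainting Model",
            "license": "example-license",
            "description": "A placeholder inpainting model.",
            "categories": ["inpainting"],
            "function": ["inpainting"],
            "nativeResolution": {"width": 512, "height": 512},
            "triggers": ["example trigger, *subject*"],
        },
    ]
}

text2image_ids = [m["id"] for m in models_supporting(sample_info, "text2image")]
```

A JSON null in the triggers field becomes None once parsed in Python, so a check such as `if model["triggers"]:` covers both null and an empty list.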