For months pks image has pointed at one model: gemini-3.1-flash-image-preview. Every banner on this site came out of it. Then gpt-image-2 showed up on Azure Foundry, and suddenly there was something to compare against.
This post is about two things at once. First: how we added gpt-image-2 support without landing on a --provider switch the user has to remember. Second: what happens when you run the same prompt through both models on four real blog banners from the pks-brain series.
The session started when I noticed the model had landed on our Foundry resource:
My first instinct was the obvious one: add a --provider flag. pks image --provider foundry. That's how half of all multi-backend CLIs do it. The user shot it down immediately:
That prompt is a mess — mixed Danish-English syntax, missing commas, a cant without an apostrophe — but technically it's crystal clear. The user isn't thinking in providers. The user is thinking in models. gpt-image-2 means something. foundry is an implementation detail. If the CLI forces you to translate from the first to the second every time, it's in the way of the work.
Model-driven routing
What we ended up with is a small abstraction: IImageProvider. Two properties and four methods, and every provider answers the question "can I serve this model?":
public interface IImageProvider
{
    string Name { get; }
    Task<bool> IsAuthenticatedAsync();
    Task<bool> CanServeModelAsync(string model);
    Task<IReadOnlyList<ImageModelInfo>> ListModelsAsync();
    Task<byte[]> GenerateAsync(ImageGenerationRequest request);
    string AuthHint { get; }
}
pks image iterates over the registered providers (Google, Foundry — and OpenAI direct when that day comes), picks the first one that is both authenticated and can serve the requested model, and hands off. The default model is still gemini-3.1-flash-image-preview — all existing blog tooling keeps working. But if you type --model gpt-image-2, pks finds the Foundry credentials on its own and routes there.
A --provider flag still exists, but it's an escape hatch for the day two providers both claim they can serve gpt-image-1 (Foundry and OpenAI direct). Not something the user thinks about daily.
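To make the routing rule concrete, here is a minimal sketch of how that selection loop could look. It is not the actual pks implementation: ProviderRouter, StubProvider, and the shapes of ImageModelInfo and ImageGenerationRequest are hypothetical stand-ins, and the way the optional provider override filters by name is an assumption about how the escape hatch might work.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Usage: Google is registered first, Foundry second (registration order matters).
var providers = new IImageProvider[]
{
    new StubProvider("google", true, "gemini-3.1-flash-image-preview"),
    new StubProvider("foundry", true, "gpt-image-2"),
};

var chosen = await ProviderRouter.ResolveAsync(providers, "gpt-image-2");
Console.WriteLine(chosen.Name); // foundry

// Hypothetical minimal shapes; the post does not show these types.
public record ImageModelInfo(string Id);
public record ImageGenerationRequest(string Model, string Prompt);

public interface IImageProvider
{
    string Name { get; }
    Task<bool> IsAuthenticatedAsync();
    Task<bool> CanServeModelAsync(string model);
    Task<IReadOnlyList<ImageModelInfo>> ListModelsAsync();
    Task<byte[]> GenerateAsync(ImageGenerationRequest request);
    string AuthHint { get; }
}

// Stand-in provider used only to exercise the router.
public sealed class StubProvider : IImageProvider
{
    private readonly bool _authenticated;
    private readonly HashSet<string> _models;

    public StubProvider(string name, bool authenticated, params string[] models)
    {
        Name = name;
        _authenticated = authenticated;
        _models = new HashSet<string>(models, StringComparer.OrdinalIgnoreCase);
    }

    public string Name { get; }
    public string AuthHint => $"(hypothetical) authenticate the {Name} provider first";
    public Task<bool> IsAuthenticatedAsync() => Task.FromResult(_authenticated);
    public Task<bool> CanServeModelAsync(string model) => Task.FromResult(_models.Contains(model));
    public Task<IReadOnlyList<ImageModelInfo>> ListModelsAsync() =>
        Task.FromResult<IReadOnlyList<ImageModelInfo>>(_models.Select(m => new ImageModelInfo(m)).ToList());
    public Task<byte[]> GenerateAsync(ImageGenerationRequest request) =>
        Task.FromResult(Array.Empty<byte>());
}

public static class ProviderRouter
{
    // Returns the first provider, in registration order, that is authenticated
    // and claims the model. An optional override narrows the candidates by name.
    public static async Task<IImageProvider> ResolveAsync(
        IEnumerable<IImageProvider> providers, string model, string? providerOverride = null)
    {
        foreach (var provider in providers)
        {
            if (providerOverride is not null &&
                !string.Equals(provider.Name, providerOverride, StringComparison.OrdinalIgnoreCase))
            {
                continue;
            }

            if (await provider.IsAuthenticatedAsync() && await provider.CanServeModelAsync(model))
            {
                return provider;
            }
        }

        throw new InvalidOperationException($"No authenticated provider can serve model '{model}'.");
    }
}
```

In this sketch the override only filters the candidate list, so the model name stays the primary input and the provider name is consulted only when someone passes it explicitly.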
Auth: no new credential dance
The best part was that the Foundry provider's auth path required no new setup. We already use pks foundry init to fetch and persist an OAuth refresh token in ~/.pks-cli/. That token is scoped to cognitiveservices.azure.com, and the image generation endpoint is one of the Cognitive Services endpoints covered by that scope. So:
- If the credential carries an explicit API key, use it.
- Otherwise: mint an access token from the refresh token with Cognitive Services scope and send it as Bearer.
No extra ENV vars. No separate pks image login. The Foundry credential the user already has is the image credential.
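The two-branch rule above could be sketched roughly like this. Everything here is illustrative: FoundryCredential, FoundryAuth, and the mint delegate are hypothetical names, the api-key header name follows the convention Azure's OpenAI endpoints use for key auth and is an assumption about what pks sends, and the scope string is the usual Cognitive Services default scope rather than something quoted from the pks source. No network call is made.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

// Usage with stand-in values; the endpoint is a placeholder and is never contacted.
var endpoint = new Uri("https://example.invalid/images/generations");
var withKey = new FoundryCredential(ApiKey: "example-key", RefreshToken: null);
var withToken = new FoundryCredential(ApiKey: null, RefreshToken: "example-refresh-token");

// Hypothetical token minting: a real implementation would exchange the
// refresh token for an access token at the identity provider.
Func<string, string, Task<string>> mint =
    (refreshToken, scope) => Task.FromResult("example-access-token");

var keyRequest = await FoundryAuth.CreateRequestAsync(endpoint, withKey, mint);
var tokenRequest = await FoundryAuth.CreateRequestAsync(endpoint, withToken, mint);
Console.WriteLine(keyRequest.Headers.Contains("api-key"));     // True
Console.WriteLine(tokenRequest.Headers.Authorization?.Scheme); // Bearer

// Hypothetical shape of the persisted Foundry credential.
public record FoundryCredential(string? ApiKey, string? RefreshToken);

public static class FoundryAuth
{
    // Assumed scope for the Cognitive Services resource mentioned in the post.
    public const string Scope = "https://cognitiveservices.azure.com/.default";

    public static async Task<HttpRequestMessage> CreateRequestAsync(
        Uri endpoint,
        FoundryCredential credential,
        Func<string, string, Task<string>> mintAccessToken)
    {
        var request = new HttpRequestMessage(HttpMethod.Post, endpoint);

        // Branch 1: an explicit API key wins.
        if (!string.IsNullOrEmpty(credential.ApiKey))
        {
            request.Headers.Add("api-key", credential.ApiKey);
            return request;
        }

        // Branch 2: mint an access token from the refresh token and send it as Bearer.
        if (string.IsNullOrEmpty(credential.RefreshToken))
        {
            throw new InvalidOperationException("No API key or refresh token available.");
        }

        var accessToken = await mintAccessToken(credential.RefreshToken, Scope);
        request.Headers.Authorization = new AuthenticationHeaderValue("Bearer", accessToken);
        return request;
    }
}
```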
The comparison: four banners, two models
To find out whether gpt-image-2 was actually better — or just different — we took the four pks-brain banners that had just shipped from Gemini and ran the same prompts through gpt-image-2. Side by side:
pks-brain (the main post)


Gemini holds the sharpness. The card shingles are legible, the light streak lands exactly where it should. gpt-image-2 is softer — more studio shot, less technical product photo — but it has lost the diagonal amber light that gave the Gemini version its direction.
pks-brain-wiki


Here it's the opposite. gpt-image-2 delivers atmosphere — warmer, deeper shadows, the drawer looks lived-in. But the "labelled tabs" the prompt asks for have disappeared into the shadows. Gemini kept the tab text and brass plate visible.
pks-brain-graph


Both respect the composition — four stacks in a graph layout. But the tone is wildly different. Gemini is archival paper. gpt-image-2 is furnish-your-treasure-vault.
pks-brain-commit-messages


This one is the clearest illustration of the difference. Gemini reads the prompt — it sees Conventional Commits as a concrete reference and stamps it onto the sheet. gpt-image-2 delivers a beautiful press machine but has filtered out the specific. It's a great product photo, but it doesn't tell the story.
The pattern
After four comparisons the pattern is clear:
- Gemini Flash Image is literal. It reads the prompt as a specification and tries to land every detail. If you ask for labelled tabs or Conventional Commits stamp, it shows up. Tradeoff: composition can feel technical and cool.
- gpt-image-2 is cinematic. It reads the prompt as a mood and pushes contrast, depth, atmosphere. Tradeoff: small specific asks vanish into the moodboard energy.
For our blog banners — where the metaphor is the content — gpt-image-2 still wins. The extra mood is the difference between "stock-photo-ish" and "editorial cover". We've flipped the make-blog-post skill to use --model gpt-image-2 as default. New posts arrive as .png at 1792×1024 from Foundry; older .jpg banners stay around from the Gemini era.
Where it goes from here
Next up on the list is the third provider: OpenAI direct. It's a single class implementing IImageProvider plus one DI registration line. When there are two providers both claiming to serve a given model name (Foundry and OpenAI share gpt-image-*), the --provider escape hatch actually becomes relevant — or registration order quietly decides who wins, which is also fine.
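As a rough idea of what that single class might look like, here is a stub under stated assumptions. OpenAiDirectImageProvider, the gpt-image- prefix check, the constructor taking a key, and the OPENAI_API_KEY hint are all hypothetical, the record shapes are placeholders, and GenerateAsync is deliberately left unimplemented rather than guessing at the HTTP call.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Usage: the key is passed in explicitly so the sketch runs without environment setup.
var openAi = new OpenAiDirectImageProvider("example-key");
Console.WriteLine(await openAi.CanServeModelAsync("gpt-image-2"));                    // True
Console.WriteLine(await openAi.CanServeModelAsync("gemini-3.1-flash-image-preview")); // False

// With Microsoft.Extensions.DependencyInjection, the registration could be one line such as:
//   services.AddSingleton<IImageProvider>(_ => new OpenAiDirectImageProvider(key));
// Where that line sits relative to the Foundry registration would decide who wins a tie.

// Hypothetical minimal shapes; the post does not show these types.
public record ImageModelInfo(string Id);
public record ImageGenerationRequest(string Model, string Prompt);

public interface IImageProvider
{
    string Name { get; }
    Task<bool> IsAuthenticatedAsync();
    Task<bool> CanServeModelAsync(string model);
    Task<IReadOnlyList<ImageModelInfo>> ListModelsAsync();
    Task<byte[]> GenerateAsync(ImageGenerationRequest request);
    string AuthHint { get; }
}

public sealed class OpenAiDirectImageProvider : IImageProvider
{
    private const string ModelPrefix = "gpt-image-";
    private readonly string? _apiKey;

    public OpenAiDirectImageProvider(string? apiKey) => _apiKey = apiKey;

    public string Name => "openai";

    // Assumed hint text; the real wording is up to the CLI.
    public string AuthHint => "Set OPENAI_API_KEY (hypothetical hint).";

    public Task<bool> IsAuthenticatedAsync() =>
        Task.FromResult(!string.IsNullOrEmpty(_apiKey));

    // Claims every model in the gpt-image-* family, which is what creates
    // the overlap with the Foundry provider.
    public Task<bool> CanServeModelAsync(string model) =>
        Task.FromResult(model.StartsWith(ModelPrefix, StringComparison.OrdinalIgnoreCase));

    public Task<IReadOnlyList<ImageModelInfo>> ListModelsAsync() =>
        Task.FromResult<IReadOnlyList<ImageModelInfo>>(new[] { new ImageModelInfo("gpt-image-2") });

    public Task<byte[]> GenerateAsync(ImageGenerationRequest request) =>
        throw new NotImplementedException("The HTTP call is out of scope for this sketch.");
}
```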
The interesting part is that we got here without giving pks image a single extra argument type. The model is the name. The model is the choice. Everything else figures itself out.
