Same prompt, two models: gpt-image-2 lands in pks image

Azure Foundry got gpt-image-2. We added it to pks image — but not as a --provider switch. --model gpt-image-2 finds the backend on its own. Here's how the comparison against Gemini plays out on four real blog banners.

For months pks image has pointed at one model: gemini-3.1-flash-image-preview. Every banner on this site came out of it. Then gpt-image-2 showed up on Azure Foundry, and suddenly there was something to compare against.

This post is about two things at once. First: how we added gpt-image-2 support without landing on a --provider switch the user has to remember. Second: what happens when you run the same prompt through both models on four real blog banners from the pks-brain series.

The session started when I noticed the model had landed on our Foundry resource:

human prompt
I just noticed we got gpt image 2 on foundry, could we make our pks image support this together with google as we use now so we can use both going forward

My first instinct was the obvious one: add a --provider flag. pks image --provider foundry. That's how half of all multi-backend CLIs do it. The user shot it down immediately:

human prompt
cant we we do the abstraction based on model instead, default google as now and we can do --model gpt-image-2 and then it finds out which auth/provider can provide the model and that could be foundry, because i assume we could also give it a openai directly and thats a 3th provider, so rather have it being model we specify and based on what we have authenticated we pick the token / url ect?

That prompt is a mess — mixed Danish-English syntax, missing commas, a cant without an apostrophe — but technically it's crystal clear. The user isn't thinking in providers. The user is thinking in models. gpt-image-2 means something. foundry is an implementation detail. If the CLI forces you to translate from the first to the second every time, it's in the way of the work.

Model-driven routing

What we ended up with is a small abstraction: IImageProvider. Four methods and two properties, and every provider answers the question "can I serve this model?":

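public interface IImageProvider
{
    // Short provider identifier, e.g. for the --provider escape hatch.
    string Name { get; }

    // Whether usable credentials for this provider are available.
    Task<bool> IsAuthenticatedAsync();
    // The routing question: can this provider serve the requested model?
    Task<bool> CanServeModelAsync(string model);
    Task<IReadOnlyList<ImageModelInfo>> ListModelsAsync();
    // Returns the generated image as raw bytes.
    Task<byte[]> GenerateAsync(ImageGenerationRequest request);

    // Text telling the user how to authenticate this provider.
    string AuthHint { get; }
}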

pks image iterates over the registered providers (Google, Foundry — and OpenAI direct when that day comes), picks the first one that is both authenticated and can serve the requested model, and hands off. The default model is still gemini-3.1-flash-image-preview — all existing blog tooling keeps working. But if you type --model gpt-image-2, pks finds the Foundry credentials on its own and routes there.
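The selection loop described above can be sketched roughly as follows. This is not the actual pks code: ProviderResolver, FakeProvider and the shapes of ImageModelInfo and ImageGenerationRequest are made up for illustration, and collecting AuthHint values into the error message is an assumption about how the hint might be used.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Two fake providers (hypothetical) so the routing runs without real credentials.
// The array order stands in for registration order.
IImageProvider[] providers =
{
    new FakeProvider("google", true, "gemini-3.1-flash-image-preview"),
    new FakeProvider("foundry", true, "gpt-image-2"),
};

var chosen = await ProviderResolver.ResolveAsync(providers, "gpt-image-2");
Console.WriteLine(chosen.Name); // prints "foundry"

// Hypothetical minimal shapes; the post does not show the real types.
public record ImageModelInfo(string Id);
public record ImageGenerationRequest(string Model, string Prompt);

public interface IImageProvider
{
    string Name { get; }
    Task<bool> IsAuthenticatedAsync();
    Task<bool> CanServeModelAsync(string model);
    Task<IReadOnlyList<ImageModelInfo>> ListModelsAsync();
    Task<byte[]> GenerateAsync(ImageGenerationRequest request);
    string AuthHint { get; }
}

public static class ProviderResolver
{
    // Returns the first provider that is authenticated AND can serve the model.
    public static async Task<IImageProvider> ResolveAsync(
        IEnumerable<IImageProvider> providers, string model)
    {
        var hints = new List<string>();
        foreach (var provider in providers)
        {
            if (!await provider.IsAuthenticatedAsync())
            {
                // Remember how to fix this in case nothing else matches.
                hints.Add($"{provider.Name}: {provider.AuthHint}");
                continue;
            }
            if (await provider.CanServeModelAsync(model))
                return provider;
        }
        throw new InvalidOperationException(
            $"No authenticated provider can serve '{model}'. {string.Join("; ", hints)}");
    }
}

// Test double standing in for the real Google and Foundry providers.
public sealed class FakeProvider : IImageProvider
{
    private readonly bool _authenticated;
    private readonly string[] _models;

    public FakeProvider(string name, bool authenticated, params string[] models)
    {
        Name = name;
        _authenticated = authenticated;
        _models = models;
    }

    public string Name { get; }
    public string AuthHint => $"authenticate the {Name} provider first";
    public Task<bool> IsAuthenticatedAsync() => Task.FromResult(_authenticated);
    public Task<bool> CanServeModelAsync(string model) =>
        Task.FromResult(_models.Contains(model));
    public Task<IReadOnlyList<ImageModelInfo>> ListModelsAsync() =>
        Task.FromResult<IReadOnlyList<ImageModelInfo>>(
            _models.Select(m => new ImageModelInfo(m)).ToList());
    public Task<byte[]> GenerateAsync(ImageGenerationRequest request) =>
        Task.FromResult(Array.Empty<byte>());
}
```

The user only ever types a model name; which credential and URL get used falls out of the loop.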

A --provider flag still exists, but it's an escape hatch for the day two providers both claim they can serve gpt-image-2 (Foundry and OpenAI direct). Not something the user thinks about daily.
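A minimal sketch of how such an escape hatch could narrow the candidates, assuming the override is simply matched against the provider name. Candidate, Router and the capability table are hypothetical and reduced to the bare minimum; the real pks wiring is not shown in this post.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical capability table: both providers claim the same model name.
var candidates = new List<Candidate>
{
    new("foundry", new[] { "gpt-image-2" }),
    new("openai", new[] { "gpt-image-2" }),
};

// Without an override the first match in list order wins.
Console.WriteLine(Router.Pick(candidates, "gpt-image-2", null).Provider);     // prints "foundry"
// With an override only the named provider is considered.
Console.WriteLine(Router.Pick(candidates, "gpt-image-2", "openai").Provider); // prints "openai"

public record Candidate(string Provider, string[] Models);

public static class Router
{
    public static Candidate Pick(
        IReadOnlyList<Candidate> candidates, string model, string? providerOverride)
    {
        // Start from every provider that claims the model.
        var matches = candidates.Where(c => c.Models.Contains(model));

        // The override (assumed to carry the --provider value) filters by name.
        if (providerOverride is not null)
        {
            matches = matches.Where(c =>
                string.Equals(c.Provider, providerOverride, StringComparison.OrdinalIgnoreCase));
        }

        return matches.FirstOrDefault()
            ?? throw new InvalidOperationException($"No provider can serve '{model}'.");
    }
}
```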

Auth: no new credential dance

The best part was that the Foundry provider's auth path required no new setup. We already use pks foundry init to fetch and persist an OAuth refresh token in ~/.pks-cli/. That token is scoped to cognitiveservices.azure.com, and the image generation endpoint is one of the services that scope covers. So:

No extra ENV vars. No separate pks image login. The Foundry credential the user already has is the image credential.
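Roughly, the "is Foundry authenticated?" check can lean on what pks foundry init already left on disk. The sketch below is an assumption-heavy illustration: the class name FoundryCredentialStore and the file name foundry.json are invented (the post only says the token lives under ~/.pks-cli/), and the scope string is the usual ".default" form for the cognitiveservices.azure.com resource.

```csharp
#nullable enable
using System;
using System.IO;

var store = new FoundryCredentialStore();
Console.WriteLine(store.HasCredential()
    ? $"Foundry credential found; it would be used with scope {FoundryCredentialStore.Scope}"
    : store.AuthHint);

public sealed class FoundryCredentialStore
{
    // ".default" scope for the resource named in the post.
    public const string Scope = "https://cognitiveservices.azure.com/.default";

    private readonly string _path;

    // "foundry.json" is a hypothetical file name inside ~/.pks-cli/.
    public FoundryCredentialStore(string? path = null) =>
        _path = path ?? Path.Combine(
            Environment.GetFolderPath(Environment.SpecialFolder.UserProfile),
            ".pks-cli",
            "foundry.json");

    // A non-empty file counts as "has a credential"; no token validation here.
    public bool HasCredential() =>
        File.Exists(_path) && new FileInfo(_path).Length > 0;

    // Points the user at the existing login command instead of a new one.
    public string AuthHint => "Run `pks foundry init` to sign in.";
}
```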

The comparison: four banners, two models

To find out whether gpt-image-2 was actually better — or just different — we took the four pks-brain banners that had just shipped from Gemini and ran the same prompts through gpt-image-2. Side by side:

pks-brain (the main post)

The Gemini version of the pks-brain banner: a brain made of stacked index cards with a sharp amber light streak from the upper left

The gpt-image-2 version of the pks-brain banner: same base composition but more painterly and muted, brain shape less pronounced

Gemini holds the sharpness. The card shingles are legible, the light streak lands exactly where it should. gpt-image-2 is softer — more studio shot, less technical product photo — but it has lost the diagonal amber light that gave the Gemini version its direction.

pks-brain-wiki

The Gemini version of pks-brain-wiki: a pulled-out card drawer with tabs and a brass handle, clear details

The gpt-image-2 version: same drawer, warmer lighting, more cinematic, but the labelled tabs are gone

Here it's the opposite. gpt-image-2 delivers atmosphere — warmer, deeper shadows, the drawer looks lived-in. But the "labelled tabs" the prompt asks for have disappeared into the shadows. Gemini kept the tab text and brass plate visible.

pks-brain-graph

The Gemini version of pks-brain-graph: four stacks of handmade paper bound with twine in a 2×2 pattern, brass anchors visible

The gpt-image-2 version: same 2×2 layout, darker and more treasure-chest-like, twine reads like cables

Both respect the composition — four stacks in a graph layout. But the tone is wildly different. Gemini is archival paper. gpt-image-2 is furnish-your-treasure-vault.

pks-brain-commit-messages

The Gemini version of pks-brain-commit-messages: an old hand-press over a sheet with 'Conventional Commits' stamped on, narrative readable

The gpt-image-2 version: same press, but the 'Conventional Commits' motif is gone, it's now a pure product shot of the machine

This one is the clearest illustration of the difference. Gemini reads the prompt — it sees Conventional Commits as a concrete reference and stamps it onto the sheet. gpt-image-2 delivers a beautiful press machine but has filtered out the specific. It's a great product photo, but it doesn't tell the story.

The pattern

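After four comparisons the pattern is clear: Gemini follows the prompt literally and keeps the specifics (the light streak, the labelled tabs, the stamped 'Conventional Commits'), while gpt-image-2 trades those specifics for mood and atmosphere.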

For our blog banners — where the metaphor is the content — gpt-image-2 still wins. The extra mood is the difference between "stock-photo-ish" and "editorial cover". We've flipped the make-blog-post skill to use --model gpt-image-2 as default. New posts arrive as .png at 1792×1024 from Foundry; older .jpg banners stay around from the Gemini era.

Where it goes from here

Next up on the list is the third provider: OpenAI direct. It's a single class implementing IImageProvider plus one DI registration line. When there are two providers both claiming to serve a given model name (Foundry and OpenAI share gpt-image-*), the --provider escape hatch actually becomes relevant — or registration order quietly decides who wins, which is also fine.
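To make "a single class plus one DI registration line" concrete, here is a hedged sketch. OpenAiImageProvider does not exist yet; authenticating via an OPENAI_API_KEY environment variable, the prefix-based model check and the record shapes are all assumptions, and the HTTP call is left out on purpose. It uses the Microsoft.Extensions.DependencyInjection package, where resolving IEnumerable<IImageProvider> returns services in registration order, which is what makes "registration order decides" work.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Extensions.DependencyInjection;

var services = new ServiceCollection();
// The existing Google and Foundry registrations would sit above this line.
services.AddSingleton<IImageProvider, OpenAiImageProvider>();

using var serviceProvider = services.BuildServiceProvider();
// Enumerates providers in the order they were registered.
foreach (var provider in serviceProvider.GetServices<IImageProvider>())
    Console.WriteLine(provider.Name);

// Hypothetical minimal shapes; the post does not show the real types.
public record ImageModelInfo(string Id);
public record ImageGenerationRequest(string Model, string Prompt);

public interface IImageProvider
{
    string Name { get; }
    Task<bool> IsAuthenticatedAsync();
    Task<bool> CanServeModelAsync(string model);
    Task<IReadOnlyList<ImageModelInfo>> ListModelsAsync();
    Task<byte[]> GenerateAsync(ImageGenerationRequest request);
    string AuthHint { get; }
}

public sealed class OpenAiImageProvider : IImageProvider
{
    public string Name => "openai";

    // Assumed auth mechanism: an API key in the environment.
    public string AuthHint => "Set the OPENAI_API_KEY environment variable.";

    public Task<bool> IsAuthenticatedAsync() =>
        Task.FromResult(!string.IsNullOrEmpty(
            Environment.GetEnvironmentVariable("OPENAI_API_KEY")));

    // Claims the whole gpt-image-* family, the overlap with Foundry noted above.
    public Task<bool> CanServeModelAsync(string model) =>
        Task.FromResult(model.StartsWith("gpt-image-", StringComparison.Ordinal));

    public Task<IReadOnlyList<ImageModelInfo>> ListModelsAsync() =>
        Task.FromResult<IReadOnlyList<ImageModelInfo>>(Array.Empty<ImageModelInfo>());

    public Task<byte[]> GenerateAsync(ImageGenerationRequest request) =>
        throw new NotImplementedException("HTTP call omitted in this sketch.");
}
```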

The interesting part is that we got here without giving pks image a single extra argument type. The model is the name. The model is the choice. Everything else figures itself out.