Lately I've been generating recipes and other food-related content for my social media platforms. I've usually used ChatGPT for both the recipes and the AI "photos"/"images", but more recently I've been comparing it to Grok and Copilot and noticed that, given the same instructions, they generate different images.
I called ChatGPT out on this:
Grok and Copilot AI tools sometimes seem to make better photos than you. Is there a reason for that? I give them the same instructions I give you
The response from ChatGPT:
That’s a fair observation. The difference usually comes down to how each AI image generator is trained and tuned rather than the instructions you give.
Here’s why you might see differences:
- Training Data & Style Bias – Each tool has been trained on slightly different image datasets. Some lean more toward “artistic/illustration” while others are optimized for photorealism. That can make Grok or Copilot occasionally look more “real” depending on the subject.
- Rendering Engine & Model Version – Even if prompts are the same, the underlying diffusion model (and how it interprets details like silverware, food textures, or lighting) can vary between platforms.
- Prompt Handling – Some tools aggressively re-interpret or add to your prompt to give a “finished” look, while mine is usually more literal unless you explicitly ask for stylistic adjustments.
- Post-Processing – A few platforms quietly apply automatic touch-ups (sharpening, color balancing, contrast enhancements) to make images more polished. With me, the output is closer to raw model generation unless you request enhancements.
That said, if you’re aiming for the most photorealistic food images with perfect table settings, I can push harder on prompt engineering for realism (e.g., camera type, lens, lighting, plating context) so the results match or exceed what you’re seeing elsewhere.
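ChatGPT's suggestion about pushing harder on prompt engineering (camera type, lens, lighting, plating context) is easy to make concrete. Below is a minimal Python sketch of a prompt builder; the function name and the default camera/lens/lighting values are illustrative assumptions, not something any of these tools require.

```python
# Minimal sketch of a photorealism-focused food prompt builder.
# The helper name and the default camera/lens/lighting values are
# illustrative assumptions, not requirements of any particular tool.
def build_food_prompt(dish,
                      camera="full-frame DSLR",
                      lens="50mm f/1.8",
                      lighting="soft natural window light",
                      plating="rustic ceramic plate on a linen tablecloth"):
    return (
        f"Photorealistic food photograph of {dish}, "
        f"shot on a {camera} with a {lens} lens, {lighting}, "
        f"served on a {plating}, shallow depth of field, fine texture detail"
    )

if __name__ == "__main__":
    print(build_food_prompt("blueberry pancakes with maple syrup and fresh berries"))
```

The same string can be pasted into ChatGPT, Grok, or Copilot, which makes it easier to compare their outputs on an equal footing.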
I decided to ask Grok and Copilot a similar question. The answers were more detailed but shared the same essential information. The question I asked was "Why do images generated by different AI tools vary?"
Grok:
Images generated by different AI tools vary because of differences in their underlying architectures, training data, and design goals. Here’s a breakdown of the key factors:
- Model Architecture: Each AI tool uses a unique neural network design, like GANs (Generative Adversarial Networks), diffusion models, or transformers. For example, DALL·E uses a transformer-based approach, while Stable Diffusion relies on a diffusion process. These architectures process inputs and generate outputs differently, leading to distinct visual styles or quality.
- Training Data: The datasets used to train these models vary in size, diversity, and curation. For instance, one model might be trained on a broad dataset like LAION-5B (billions of internet images), while another might use a smaller, curated set focused on specific art styles. This affects the range of images a model can produce and its bias toward certain aesthetics.
- Training Objectives: Developers fine-tune models with specific goals, like photorealism (MidJourney), artistic flair (Stable Diffusion), or conceptual creativity (DALL·E). These priorities shape how the model interprets prompts and renders details, textures, or compositions.
- Prompt Interpretation: Each tool has a different way of parsing text prompts. Some might emphasize literal interpretations, while others lean into abstract or stylistic renderings. For example, a prompt like "futuristic city" might yield sleek, realistic skyscrapers in one model and neon-colored, surreal landscapes in another.
- Post-Processing and Filters: Some tools apply additional processing, like upscaling or style filters, to enhance outputs. This can make images from one model appear sharper or more polished than another.
- Randomness and Sampling: Generative models often incorporate randomness to produce varied outputs. Different sampling techniques (e.g., DDIM vs. PLMS in diffusion models) or seed values can lead to variations even within the same model, amplified across different tools.
- Community and Fine-Tuning: Some models, like Stable Diffusion, are open-source and fine-tuned by communities, leading to specialized versions optimized for specific styles (e.g., anime or realism). Proprietary models like MidJourney are tuned internally, reflecting the developer’s vision.
In short, the unique combination of architecture, data, and design choices makes each AI tool’s output distinct, much like different artists interpreting the same subject. If you want me to dive deeper into a specific tool or aspect, let me know!
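Grok's points about samplers and seed values can be seen directly in open-source tooling. The sketch below assumes Hugging Face's diffusers library, a CUDA GPU, and an example Stable Diffusion checkpoint; it fixes the random seed and swaps the scheduler so you can isolate how much variation comes from sampling alone.

```python
# Sketch using Hugging Face diffusers (assumed installed) to show how
# the scheduler (sampler) and random seed affect output for a fixed prompt.
import torch
from diffusers import StableDiffusionPipeline, DDIMScheduler

model_id = "runwayml/stable-diffusion-v1-5"  # example checkpoint, swap as needed
pipe = StableDiffusionPipeline.from_pretrained(
    model_id, torch_dtype=torch.float16
).to("cuda")

prompt = "overhead photo of a rustic breakfast table with pancakes and coffee"

# Same seed + same scheduler -> reproducible image.
gen = torch.Generator("cuda").manual_seed(42)
image_a = pipe(prompt, generator=gen).images[0]

# Swap the scheduler (DDIM instead of the pipeline default) but keep the seed:
# the composition usually shifts, even though nothing in the prompt changed.
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)
gen = torch.Generator("cuda").manual_seed(42)
image_b = pipe(prompt, generator=gen).images[0]

image_a.save("default_scheduler_seed42.png")
image_b.save("ddim_scheduler_seed42.png")
```

Hosted tools such as ChatGPT, Grok, and Copilot choose these knobs for you behind the scenes, which is part of why identical instructions still produce different images.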
Next I asked Copilot the same question:
It comes down to the fact that each image generator is, in effect, its own “artist” — with its own style, influences, and techniques. The differences you see aren’t random; they’re baked into the way each tool is built and trained.
Here are the main factors that make them diverge:
🧠 Model Architecture
Some tools use transformer-based models (like DALL·E), others use diffusion processes (like Stable Diffusion), and some rely on GANs (Generative Adversarial Networks).
These architectures “think” about your prompt differently — a bit like oil painting versus watercolor — so the brushstrokes, so to speak, aren’t the same.
📚 Training Data
The library each model learns from varies wildly.
One might study billions of diverse internet images; another could focus on a smaller, curated set full of moody, high-contrast photography.
That affects style bias, subject matter coverage, and even the way colors or lighting are handled.
🎯 Design Goals
Some are tuned for hyper-photorealism.
Others lean into painterly or surreal aesthetics.
If the goal is conceptual creativity, you’ll get something more interpretive than literal.
🗣 Prompt Interpretation
Each has its own “accent” in how it parses and prioritizes your words.
The same phrase can cue very different visual elements — like “rustic kitchen” meaning cozy farmhouse to one model, but sleek industrial loft to another.
🎨 Post-Processing & Filters
Many tools add their own polish after the raw generation — upscalers, sharpening filters, or style layers that give their work a signature finish.
🎲 Built-In Randomness
Even with identical prompts, internal sampling and random seeds mean that the same tool can give you variations — and different tools amplify that unpredictability in their own ways.
Think of it like commissioning the same scene from three different painters: they’ll all be looking at your brief, but their techniques, past experiences, and creative priorities will make each piece uniquely theirs.
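Both Grok and Copilot (and ChatGPT above) mention post-processing as a source of divergence. As a rough illustration of what those quiet touch-ups might look like, here is a short Pillow sketch that applies sharpening, contrast, and color adjustments to a raw generation; the filenames and enhancement factors are arbitrary assumptions, since hosted platforms don't document the exact pipelines they use.

```python
# Rough illustration of the kind of post-processing a platform might apply
# after raw generation. Filenames and enhancement factors are arbitrary
# assumptions; real services don't publish their exact pipelines.
from PIL import Image, ImageEnhance, ImageFilter

img = Image.open("raw_generation.png")

# Light unsharp mask for crisper edges (e.g. silverware, crumb texture).
img = img.filter(ImageFilter.UnsharpMask(radius=2, percent=120, threshold=3))

# Gentle contrast and saturation boosts for a more "finished" look.
img = ImageEnhance.Contrast(img).enhance(1.10)
img = ImageEnhance.Color(img).enhance(1.05)

img.save("polished_generation.png")
```

Adjustments like these sit on top of the model-level factors described above, so even two tools wrapping the same base model can end up with visibly different house styles.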