OpenAI Launches GPT Image 1.5: Revolutionizing Image Editing with AI

For most of photography's roughly 200-year history, convincingly altering a photo required darkroom access, Photoshop expertise, or at least a steady hand with scissors and glue. On Tuesday, OpenAI unveiled a tool that reduces the process to typing a sentence.

OpenAI is not the first company to do so. Although it has had a conversational image-editing model in development since GPT-4o in 2024, Google reached the market first in March with a public prototype of its Nano Banana image model, followed later by a refined version, Nano Banana Pro. The AI community's enthusiastic reception of Google's model did not go unnoticed by OpenAI.

OpenAI's new model, GPT Image 1.5, is an AI image synthesis model that reportedly generates images up to four times faster than its predecessor and costs about 20% less through the API. It became available to all ChatGPT users on Tuesday, another step toward making photorealistic image manipulation a casual process that requires no particular visual skills.

With GPT Image 1.5, users can add an element such as a "Galactic Queen of the Universe" to a photograph, for example a photo of a room with a sofa, all within ChatGPT.

GPT Image 1.5 is a "native multimodal" image model, meaning that image generation happens inside the same neural network that processes language prompts. Unlike DALL-E 3, an earlier OpenAI image generator that relied on a diffusion technique, the new model treats images and text as the same kind of data: tokens to be predicted and patterns to be completed. If you upload a photo of a person and ask it to "put him in a tuxedo at a wedding," the model processes the language and the image in a unified space and then produces new pixel output, much as it would predict the next word in a sentence.
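The idea of treating text and images as one stream of tokens can be illustrated with a toy sketch. This is not OpenAI's implementation: the vocabulary, the 16-entry image codebook, and the functions `encode_text`, `encode_image`, `toy_next_token`, and `edit_image` are all hypothetical, and a random picker stands in for the trained network. It only shows the shape of the loop, in which image tokens and prompt tokens share one sequence and the output image is produced one token at a time.

```python
import random

# Hypothetical vocabularies, chosen only for illustration.
TEXT_VOCAB = ["put", "him", "in", "a", "tuxedo", "at", "wedding"]
NUM_IMAGE_CODES = 16  # assumed size of an image-patch codebook

# One shared vocabulary: text tokens first, then image-patch tokens.
VOCAB = TEXT_VOCAB + [f"<img_{i}>" for i in range(NUM_IMAGE_CODES)]
TOKEN_ID = {tok: i for i, tok in enumerate(VOCAB)}
IMAGE_ID_START = len(TEXT_VOCAB)


def encode_text(prompt):
    """Map each whitespace-separated word to its ID in the shared vocabulary."""
    return [TOKEN_ID[word] for word in prompt.split()]


def encode_image(patch_codes):
    """Map image-patch codes (0..NUM_IMAGE_CODES-1) into the shared ID space."""
    return [IMAGE_ID_START + code for code in patch_codes]


def toy_next_token(sequence, rng):
    """Stand-in for a trained network: ignores context, picks a random image token."""
    return rng.randrange(IMAGE_ID_START, len(VOCAB))


def edit_image(prompt, patch_codes, seed=0):
    """Autoregressively emit one output image token per input patch."""
    rng = random.Random(seed)
    # Input photo and text prompt live in the same token sequence.
    sequence = encode_image(patch_codes) + encode_text(prompt)
    output = []
    for _ in range(len(patch_codes)):
        output.append(toy_next_token(sequence + output, rng))
    # Convert shared-vocabulary IDs back to image-patch codes.
    return [tok - IMAGE_ID_START for tok in output]


edited = edit_image("put him in a tuxedo at a wedding", [3, 7, 1, 12])
print(edited)
```

In a real system, a learned model would condition each prediction on the full sequence, and a separate decoder would turn the resulting patch codes back into pixels; both steps are omitted here.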

This approach lets GPT Image 1.5 handle visual edits more capably than earlier AI models: it can change a person's pose, shift the angle of a scene, or swap objects and clothing while keeping facial likeness consistent across multiple edits. Users can converse with the model about a photo much as they would refine a draft in ChatGPT, revising the image over several rounds.