"Multimodal" describes what a model can take in and what it can put out, and the two are not always the same. Most current flagships, including Gemini, ChatGPT, and Claude, accept images and files as input. Output is where they differ: ChatGPT can generate images natively with the GPT-4o model, while Gemini produces them through a dedicated image model in its family (the viral Nano Banana). Either way, the fact that a model accepts an image as input does not mean it can create one.
For brands, the practical upshot is that AI can now read the things you publish beyond text: the diagram in your guide, the slide in your deck, the product shot on your page. Clear, labeled visuals and accurate alt text become part of how an engine understands, and potentially cites, your content.
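One way to act on this is a quick audit of your own pages for images that have no alt text. The snippet below is a minimal sketch using only Python's standard library. The names `AltTextAuditor` and `find_images_missing_alt` are made up for illustration, and the check only looks at whether alt text is present, not whether it is accurate or how any particular AI engine reads it.

```python
from html.parser import HTMLParser


class AltTextAuditor(HTMLParser):
    """Collects the src of every <img> whose alt attribute is absent or blank."""

    def __init__(self):
        super().__init__()
        self.missing_alt = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr_map = dict(attrs)
        alt = attr_map.get("alt")
        # Flags both a missing alt and an empty one. An empty alt is valid
        # for purely decorative images, so treat the result as a review list.
        if alt is None or not alt.strip():
            self.missing_alt.append(attr_map.get("src", "(no src)"))


def find_images_missing_alt(html: str) -> list[str]:
    """Return the src values of images in `html` that lack alt text."""
    auditor = AltTextAuditor()
    auditor.feed(html)
    auditor.close()
    return auditor.missing_alt


if __name__ == "__main__":
    # Hypothetical page fragment, for illustration only.
    page = (
        '<img src="funnel.png" alt="Diagram of the signup funnel">'
        '<img src="hero.jpg">'
    )
    print(find_images_missing_alt(page))
```

Run against the sample fragment, it reports `hero.jpg` as the image to fix. A human still needs to judge whether the alt text that does exist actually describes the visual.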