ChatGPT made Large Language Models (LLMs) one of the most prominent technologies around, yet we are already seeing the rise of Multimodal Large Language Models (MLLMs), which can process images as well as text. Apple has just released its own MLLM, dubbed MGIE, and it might represent the next step forward in the AI race.
The main thing that sets MGIE apart is its ability to edit images based on natural language instructions. Prompts don’t need to be phrased in a special, machine-oriented way; they can be written in normal everyday language, similar to the instructions one would give to a human image editor.
MGIE uses its MLLM to translate plain language into more explicit, technical instructions. For example, if a user asks to make the sky in a particular picture a deeper shade of blue, MGIE might translate this into an instruction to increase the saturation of the sky region by 20% or so.
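To make that example concrete, here is a minimal sketch of what "increase the saturation of a region by about 20%" could look like as a conventional image operation. This is not MGIE's code: MGIE performs its edits through a learned model, and the function name `boost_saturation`, the `(left, top, right, bottom)` region format, and the tiny sample image are all assumptions made purely for illustration.

```python
import colorsys


def boost_saturation(pixels, region, factor=1.2):
    """Return a copy of `pixels` with saturation scaled inside `region`.

    pixels: list of rows, each row a list of (r, g, b) tuples in 0..255.
    region: (left, top, right, bottom); right and bottom are exclusive.
            This format is a hypothetical choice for the sketch.
    factor: 1.2 mirrors the "20% or so" figure from the example above.
    """
    left, top, right, bottom = region
    result = []
    for y, row in enumerate(pixels):
        new_row = []
        for x, (r, g, b) in enumerate(row):
            if left <= x < right and top <= y < bottom:
                # Work in HSV so saturation can be changed without touching hue.
                h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
                s = min(1.0, s * factor)  # clamp so saturation stays valid
                r, g, b = (round(c * 255) for c in colorsys.hsv_to_rgb(h, s, v))
            new_row.append((r, g, b))
        result.append(new_row)
    return result


# A 2x2 sample image: the top row stands in for the "sky", the bottom for the ground.
image = [
    [(100, 150, 200), (100, 150, 200)],
    [(90, 120, 60), (90, 120, 60)],
]

# Edit only the top row; the bottom row is left untouched.
edited = boost_saturation(image, region=(0, 0, 2, 1))
```

The point of the sketch is the gap it highlights: a user says "make the sky bluer", while an editing routine needs a region and a numeric adjustment. According to the description above, bridging that gap is the job MGIE hands to its MLLM.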
MGIE also uses an end-to-end training scheme to create a latent representation of the result that the user is looking for, which is referred to as visual imagination, and it then derives the core of the instructions needed to edit the image at the pixel level. Such precision could allow edits to be made far faster than would otherwise be possible.
MGIE can optimize photos, edit them, manipulate them, or handle other changes that a user might require. It is currently available as an open source model on GitHub, allowing users around the world to try this AI system, which Apple developed in collaboration with the University of California, Santa Barbara.
Photo: Digital Information World - AIgen