Images that can be interpreted in a variety of ways have existed for many decades, with the classical example being Rubin’s vase — which some viewers see as a vase, and others a pair of human faces.
Where things get trickier is if you want to create an image that changes into something else that looks realistic when you rotate each section of it within a 3×3 grid. In a video by [Steve Mould], he explains how this can be accomplished, by using a diffusion model to identify similar characteristics of two images and to create an output image that effectively contains essential features of both images.
Naturally, this process can be done by hand too, with the goal always being to create a plausible image in either orientation that has enough detail to trick the brain into filling in the details. To head down the path of interpreting what the eye sees as a duck, a bunny, a vase or the outline of faces.
Using a diffusion model to create such illusions is quite a natural fit, as it works with filling in noise until a plausible enough image begins to appear. Of course, whether it is a viable image is ultimately not determined by the model, but by the viewer, as humans are susceptible to such illusions while machine vision still struggles to distinguish a cat from a loaf and a raisin bun from a spotted dog. The imperfections of diffusion models would seem to be a benefit here, as it will happily churn through abstractions and iterations with no understanding or interpretive bias, while the human can steer it towards a viable interpretation.