Hidden traces of humanity: what AI images reveal about our world
As generative AI advances, it is easy to see it as yet another area where machines are taking over – but humans remain at the centre of AI art, just in ways we might not expect.
When faced with a bit of downtime, many of my friends will turn to the same party game. It’s based on the surrealist game Exquisite Corpse, and involves translating brief written descriptions into rapidly made drawings and back again. One group calls it Telephone Pictionary; another refers to it as Writey-Drawey. The internet tells me it is also called Eat Poop You Cat, a sequence of words surely inspired by one of the game’s results.
The evolution of AI image generation
As recently as three years ago, it was rare to encounter text-to-image or image-to-text mistranslations in daily life, which made the outrageous outcomes of the game feel especially novel. But we have since entered a new era of image-making. With the aid of AI image generators like Dall-E 3, Stable Diffusion, and Midjourney, and the generative features integrated into Adobe’s Creative Cloud programs, you can now transform a sentence or phrase into a highly detailed image in mere seconds. Images, likewise, can be nearly instantly translated into descriptive text. Today, you can play Eat Poop You Cat alone in your room, cavorting with the algorithms.
Exploring AI image generation
Back in the summer of 2023, I tried it, using a browser-based version of Stable Diffusion and an AI application called Clip Interrogator, which translates any image into a text prompt. It took about three minutes to play two rounds of the game. Stable Diffusion generates four images in response to any prompt; I cheated slightly by just choosing my favourite to proceed. From the centre of the frame, a decently realistic tabby cat stared me down, green eyes glowing wide, mouth hanging open to display a salmon-pink tongue. The background was grungy grey without much detail; some bubbly white text in the image’s lower third read: EAT EAT POOOOP POOP YU NOU SOME YOU!
The complexity of AI image generation
A nuanced syntax for image-generating prompts has emerged alongside the development of generative AI (genAI) tools, and Clip Interrogator’s “prompt” mimicked that accretionary layering of styles, details and descriptors – though the resulting list felt excessive, like a psychedelic extrapolation of the image, which, I was glad to learn, was already a “classic gem”.
The intersection of AI and human creativity
Stable Diffusion makes images by mapping language to a vast set of visual variables, while Clip Interrogator performs the inverse function. The seemingly random strings of proper and phrasal nouns and adjectives are the result of neural networks “reading” the image and assessing sections of pixels for clues that are then correlated with terms, however opaquely.
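That correlation between pixels and terms can be sketched in miniature. The snippet below is a toy illustration, not real CLIP: the hand-picked three-number "embeddings" and candidate phrases are invented for the example, but the mechanism – scoring an image vector against text vectors by cosine similarity and keeping the best match – is the same idea an image-to-text tool relies on.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Pretend embeddings: in a real system these come from neural networks
# "reading" pixels and text; here they are hand-picked for illustration.
image_embedding = [0.9, 0.1, 0.3]  # stands in for the tabby-cat image
text_embeddings = {
    "a tabby cat": [0.8, 0.2, 0.3],
    "a coffee cup": [0.1, 0.9, 0.2],
    "a grey wall": [0.3, 0.2, 0.9],
}

# Score every candidate phrase against the image and keep the best match --
# the inverse of generation: text is recovered from visual features.
best = max(text_embeddings, key=lambda t: cosine(image_embedding, text_embeddings[t]))
print(best)  # → a tabby cat
```

A real interrogator does this over thousands of artist names, styles and descriptors at once, which is why its output reads as that seemingly random accretion of terms.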
The emergence of AI artists
Although there were plenty of precursors, it wasn’t until January 2021 that talk of AI artists became big news, as people began to learn of the image-generating platform Dall-E. Back then, descriptions of the “AI artist” still felt like something out of a children’s book: type in a sentence and the computer magically spits out an image!
The origins of AI image generation
By 2015, algorithmic processes were able to form simple sentences or phrases to describe an image. Patterns of pixels identified as, say, “cat” or “cup” were matched with linguistic tags, which were then translated into automated image captions in natural language. Quickly, researchers realised they could flip the order of these operations: what would it look like to input tags – or even natural language – and ask the neural networks to produce images in response?
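The tag-to-caption step, and its flipped counterpart, can be sketched with a toy pair of functions. Everything here is invented for illustration (the template "a photo of ..." and the helper names are assumptions, and real systems use neural networks rather than string templates), but it shows the symmetry the researchers noticed: the same mapping between tags and natural language can be run in either direction.

```python
# Forward direction: detected tags -> an automated caption in natural language.
def tags_to_caption(tags):
    """Slot recognised tags into a simple caption template."""
    if not tags:
        return "an image"
    return "a photo of " + " and ".join(sorted(tags))

# Flipped direction: natural language -> tags an image model could condition on.
def caption_to_tags(caption):
    """Parse a templated caption back into its constituent tags."""
    words = caption.removeprefix("a photo of ").split(" and ")
    return set(words)

caption = tags_to_caption({"cat", "cup"})
print(caption)                   # → a photo of cat and cup
print(caption_to_tags(caption))  # recovers the original tags
```

In an actual text-to-image system the "flip" conditions a generative network on language embeddings rather than reversing a template, but the inversion of input and output is the same move.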