Digital Realism Elicits Apprehensive Designs

Technothlon

Feb 3, 2024

A blog on why AI-generated images look so hauntingly real.

Hey readers!

How many of you have used ChatGPT to complete your Maths or Science assignments? Most likely all of you have. How about we take the help of some AI to complete our pending Art assignment?

That might not actually be such a good idea.

After I finished reading Harry Potter for the first time, I often wondered what the characters in the book would have looked like if they were real (this was before I had watched the movies, of course). I had heard somewhere that reading is better than watching television because it stimulates your mind to paint a picture by itself. My mind did so, but it nagged me that I had no reference to compare it with.

Recently, I came across an online AI text-to-image tool called Dezgo and decided to test it. I used the exact description of Harry Potter from the book, scar and all, and this is what it delivered:

SOURCE — dezgo.com

Seems alright? Apart from the fact that the boy has no scar, I was pretty satisfied. How about real-life things? I typed “human being” and “hand” as two separate prompts.

The results were… concerning.

SOURCE — dezgo.com

Ok, that’s pretty spooky! What’s up with the shadow on the girl’s face? And those nails on the hand! What is going on?

To answer this, we need to understand what’s really going on behind the scenes.

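When you enter a description of the image you want, the AI model first converts the text into a latent representation. This basically means it translates your words into a compact list of numbers (a vector) in a lower-dimensional “language” that only the model itself understands. One way to do this is with a “recurrent neural network” (RNN), which reads the prompt one word at a time.* Many newer image generators use a different kind of network, called a transformer, for this step, but the idea of squeezing a prompt into a vector is the same.
* To find out more about this, check out our blog titled B.R.A.I.N.S.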
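To make the idea of a latent representation a little more concrete, here is a minimal sketch in Python. It assumes a plain recurrent network with made-up sizes, and its weights and word vectors are random stand-ins for what a real model would learn; the names `encode`, `embed`, `W_in`, `W_rec` and `HIDDEN_SIZE` are hypothetical. The network reads the prompt one word at a time, updating a short list of numbers (its hidden state) after each word, and whatever is left after the last word serves as the latent vector for the whole prompt.

```python
import math
import random

# Toy recurrent text encoder (hypothetical sizes; random weights stand in for learned ones).
EMBED_SIZE = 3   # numbers per word vector
HIDDEN_SIZE = 4  # numbers in the latent vector (real models use hundreds or more)

rng = random.Random(0)  # fixed seed so the sketch is reproducible


def random_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


W_in = random_matrix(HIDDEN_SIZE, EMBED_SIZE)    # how each new word affects the state
W_rec = random_matrix(HIDDEN_SIZE, HIDDEN_SIZE)  # how the previous state carries over
word_vectors = {}                                # word -> vector, created on first use


def embed(word):
    """Return a vector for `word`, creating a random one the first time it is seen."""
    if word not in word_vectors:
        word_vectors[word] = [rng.uniform(-1.0, 1.0) for _ in range(EMBED_SIZE)]
    return word_vectors[word]


def encode(prompt):
    """Read the prompt word by word; the final hidden state is its latent vector."""
    h = [0.0] * HIDDEN_SIZE
    for word in prompt.lower().split():
        x = embed(word)
        # New state = tanh(W_in @ x + W_rec @ h), computed with plain lists.
        h = [
            math.tanh(
                sum(W_in[i][j] * x[j] for j in range(EMBED_SIZE))
                + sum(W_rec[i][k] * h[k] for k in range(HIDDEN_SIZE))
            )
            for i in range(HIDDEN_SIZE)
        ]
    return h


latent = encode("a boy with a lightning scar")
print([round(v, 3) for v in latent])
```

Running `encode` on the same prompt twice gives the same vector, while different prompts such as “hand” and “human being” end up as different vectors. In a trained model, the weights would be tuned so that prompts with similar meanings land close together.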

In step two, an image is generated from this encoded text. One well-known way to do this is with a “Generative Adversarial Network” (GAN). A GAN is a deep learning model that goes through many rounds of training to learn how to generate authentic-looking images.

How does that work? We’ll go over the process using a simple example.

Picture this: we have an art forger who makes a living out of forging world-famous paintings. He takes a blank canvas along with lots of paint and proceeds to make a painting as similar as possible to the original one. This is the generation step — the generator in the GAN does a similar job of making almost authentic images from random noise.
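The forger's half of the game can be sketched in a few lines of Python. This is a minimal illustration under clearly stated assumptions: the generator here is a single layer with random weights standing in for learned ones, the “image” is just four pixels, and the prompt's latent vector is simply attached to the noise, which is one common way text-to-image GANs take the prompt into account. The names `generate`, `NOISE_SIZE` and `TEXT_SIZE`, and the sample latent values, are all hypothetical.

```python
import math
import random

# Toy conditional generator (hypothetical sizes and weights): it joins a random
# noise vector with the prompt's latent vector and maps the result to a tiny
# 2x2 grayscale "image" using one linear layer plus a sigmoid, so every pixel
# lands between 0 (black) and 1 (white). Real generators stack many layers and
# learn their weights during training.

NOISE_SIZE = 3
TEXT_SIZE = 4
NUM_PIXELS = 4  # 2x2 image, flattened row by row

rng = random.Random(42)
W = [[rng.gauss(0.0, 1.0) for _ in range(NOISE_SIZE + TEXT_SIZE)]
     for _ in range(NUM_PIXELS)]
b = [0.0] * NUM_PIXELS


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def generate(noise, text_latent):
    """Map one noise vector (plus the prompt's latent vector) to one flattened image."""
    x = noise + text_latent  # list concatenation: the generator sees both inputs
    return [
        sigmoid(sum(w_ij * x_j for w_ij, x_j in zip(W[i], x)) + b[i])
        for i in range(NUM_PIXELS)
    ]


# Stand-in for the text encoder's output (hypothetical values).
text_latent = [0.2, -0.1, 0.5, 0.3]

# The "blank canvas": fresh noise on each call, so one prompt can give many images.
image_1 = generate([rng.gauss(0.0, 1.0) for _ in range(NOISE_SIZE)], text_latent)
image_2 = generate([rng.gauss(0.0, 1.0) for _ in range(NOISE_SIZE)], text_latent)
print([round(p, 2) for p in image_1])
print([round(p, 2) for p in image_2])
```

Each call starts from a fresh canvas of random noise, which is why the same prompt can produce a different picture every time. Before training, these pixels are essentially random; training is what nudges the weights until the output looks like the real thing.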

Now, the art museum has hired a detective to identify which of its paintings are fakes (after all, the museum doesn’t want to suffer losses!). The detective knows the authentic paintings inside out and can spot even minor inconsistencies. This is the adversarial step, and in a GAN it is carried out by the discriminator. The discriminator’s job is to examine the generated images and tell them apart from real ones. Just as the detective relies on his knowledge of the originals, the discriminator learns from a dataset of real images.
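The detective's half can be sketched the same way. Again, this is a toy with hypothetical names (`discriminate`, `discriminator_loss`), a single layer with random stand-in weights, and made-up pixel values. `discriminate` returns a number between 0 and 1, the detective's confidence that an image is genuine, and `discriminator_loss` is the standard binary cross-entropy score that the detective is trained to lower: it is small only when real images get scores near 1 and fakes get scores near 0.

```python
import math
import random

# Toy discriminator (hypothetical sizes and weights): it scores a flattened
# 2x2 image with one linear layer and squashes the score with a sigmoid into
# its estimated probability that the image is real.

NUM_PIXELS = 4

rng = random.Random(7)
w = [rng.gauss(0.0, 1.0) for _ in range(NUM_PIXELS)]
bias = 0.0


def discriminate(image):
    """Return D(image): the detective's probability that the image is real."""
    score = sum(w_i * x_i for w_i, x_i in zip(w, image)) + bias
    return 1.0 / (1.0 + math.exp(-score))


def discriminator_loss(real_image, fake_image):
    """Binary cross-entropy: low when D(real) is near 1 and D(fake) is near 0."""
    return -math.log(discriminate(real_image)) - math.log(1.0 - discriminate(fake_image))


real = [0.9, 0.8, 0.1, 0.2]  # hypothetical pixel values of a "real" image
fake = [0.5, 0.5, 0.5, 0.5]  # hypothetical output of an untrained forger
print(round(discriminate(real), 3), round(discriminate(fake), 3))
print(round(discriminator_loss(real, fake), 3))
```

With random weights, this detective is just guessing; during training its weights are adjusted to push this loss down, which is how it gets better at catching fakes.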

SOURCE — GeeksforGeeks

The clever forger escapes before the detective can arrest him and… well… goes back to his old tricks. This time, he has learnt from his mistakes and forges the paintings again, taking care to fix the details that gave him away.

Unfortunately for him, the forgeries are still not perfect: the detective spots the differences once again, and the cycle repeats. During the training step of a GAN, the generator and discriminator compete in the same way, and both improve over time. Eventually, an equilibrium is reached in which the generator produces images so realistic that the discriminator can no longer tell them apart from real ones.
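For readers who like a little notation, the original GAN paper (Goodfellow et al., 2014) writes this whole tug-of-war as a single objective. The detective D tries to make it as large as possible (high scores for real images x, low scores for fakes G(z)), while the forger G tries to make it as small as possible; p_data stands for the distribution of real images and p_z for the random noise the forger starts from. The same paper shows that, for a fixed forger whose images follow a distribution p_g, the best possible detective is D*, so at the equilibrium described above, where p_g matches p_data, the detective can do no better than a coin toss.

```latex
\min_G \max_D \; V(D, G)
  = \mathbb{E}_{x \sim p_{\text{data}}}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_z}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]

D^{*}(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)},
\qquad p_g = p_{\text{data}} \;\Longrightarrow\; D^{*}(x) = \tfrac{1}{2}
```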

One thing should be clear from this: how long the art forger can keep up his game depends on how well the detective knows art. If the detective is not familiar with a painting, even a clumsy attempt by the forger might fool him.

CATCH THE ART FORGER, SPOT THE FAKE — SPOT THE FAKE MONA LISA (Source — The New York Times)

This is why a GAN is usually trained on a large dataset. It is also why AI models “learn” over time: the more often a pattern appears in the data, the more closely the model mimics it, and that shows up in its output.
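Putting the two players together, here is a toy version of the whole training loop, small enough to run in about a second. It is a sketch under heavily simplified, hypothetical assumptions: each “painting” is a single number, the real ones are drawn from a bell curve centred on 3, the forger can only shift random noise by an amount `b`, and the detective is a one-line logistic scorer with parameters `w` and `c`. In each step, the detective takes a gradient step to tell the two piles apart better, and then the forger takes a gradient step to fool the updated detective.

```python
import math
import random

# Toy 1-D GAN trained with plain gradient steps. Everything here is a
# hypothetical setup chosen for illustration:
#   real "paintings":          numbers drawn from a normal distribution (mean 3, spread 1)
#   generator (the forger):    G(z) = z + b, z ~ N(0, 1); it learns only the offset b
#   discriminator (detective): D(x) = sigmoid(w * x + c), its probability that x is real

REAL_MEAN = 3.0
LEARNING_RATE = 0.1
BATCH_SIZE = 64
rng = random.Random(0)


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def train(steps, b=0.0, w=0.0, c=0.0):
    """Alternate one detective update and one forger update per step."""
    for _ in range(steps):
        real = [rng.gauss(REAL_MEAN, 1.0) for _ in range(BATCH_SIZE)]
        fake = [rng.gauss(0.0, 1.0) + b for _ in range(BATCH_SIZE)]

        # Detective: gradient ascent on mean log D(real) + mean log(1 - D(fake)).
        grad_w = grad_c = 0.0
        for x in real:
            g = 1.0 - sigmoid(w * x + c)  # derivative of log(sigmoid(t)) w.r.t. t
            grad_w += g * x
            grad_c += g
        for x in fake:
            g = -sigmoid(w * x + c)  # derivative of log(1 - sigmoid(t)) w.r.t. t
            grad_w += g * x
            grad_c += g
        w += LEARNING_RATE * grad_w / BATCH_SIZE
        c += LEARNING_RATE * grad_c / BATCH_SIZE

        # Forger: gradient ascent on mean log D(G(z)) (the common "non-saturating"
        # generator loss); since dG/db = 1, each sample contributes (1 - D) * w.
        grad_b = sum((1.0 - sigmoid(w * x + c)) * w for x in fake) / BATCH_SIZE
        b += LEARNING_RATE * grad_b
    return b, w, c


b, w, c = train(2000)
print(f"forger's offset b = {b:.2f} (real mean = {REAL_MEAN})")
print(f"detective's weight w = {w:.2f}")
```

Early on, the detective learns that larger numbers mean “real”, which pushes `b` upward; as `b` approaches 3, the detective's weight `w` should shrink towards zero because it can no longer tell the two piles apart, and `b` may wobble a little before settling. Because this forger can only shift its output, the only pattern it can copy is the average; real GANs use large networks so they can pick up far richer patterns from their training data.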

But why are these images so creepy?

Back to our original images: why were they so scary? Take the hand, for instance. Why were the nails so weirdly shaped?

Obviously, the AI can’t think like you and me — that is to say, it doesn’t possess a subjective understanding of what is scary.

Most AI training datasets are made up of realistic photos, and you will rarely find one in which the hands are shown large and clearly. We tend to focus on our faces, don’t we? In photos, hands are often balled up or curled around an object, so that no fingers are visible. As a result, when the model sees such images, it’s like: “Okay, so a hand only has the beginnings of a finger. That’s what a hand looks like!” It then goes on to mangle something as simple as a hand. The AI treats the hand like any other object; it does not understand how the fingers are anatomically connected to it.

What about the scary faces? Again, the dataset the discriminator learns from can include a wide range of disturbing images. As training progresses, the generator learns to produce images with similar eerie characteristics, such as haunting backgrounds or creepy textures.

SOURCE — The Nightmare Machine, MIT. An example of a disturbing image in the AI dataset.

By continuously refining the generator and discriminator through training, the AI generates increasingly realistic and convincing scary images.

Some Fascinating AI tools

In recent times, many companies have been keen to develop AI image generators. The most popular of these is, of course, the DALL-E family, developed by OpenAI, the same organization that made ChatGPT (in fact, some DALL-E features are available in paid versions of ChatGPT).

A more recent arrival is Midjourney, developed by the company of the same name. Many users consider it more powerful than DALL-E because the images it produces look more organic and lifelike. Even so, DALL-E remains more popular, because it is easier to use and tends to understand text prompts better.

Hauntingly “Real” Images

SOURCE — New York Post

At first glance, these images look like ordinary photos of real people taken with a normal camera. Surprise, surprise: these people don’t exist! They were “imagined” and generated by Midjourney.

Of course, if you look closely, you will notice some chilling details, like too many teeth or an off-center neck. Still, this is an excellent result from an AI that’s barely a year old.

In fact, many experts are worried about how easily such models can churn out deepfakes. Your art teachers might even worry that you could get away without doing your assignments!

The world is heading towards a future that is surely going to be dominated by AI. How we let it shape our lives remains to be seen.

But we should try to play the role of the responsible police officer who keeps the detective in check and never lets the art forger get away with his crimes.

SOURCE — knowyourmeme.com

By Team Technothlon


Technothlon

Technothlon is an international school championship organized by the students of IIT Guwahati. Technothlon began in 2004 with the aim to ‘Inspire Young Minds’.