A lot has happened in generative AI this week, and as always, it’s almost impossible to keep up.
The generative train keeps choo-chooing its way down the wormhole toward the singularity and the collapse of reality. If the hyperspeed is blinding your eyes and searing your brain faster than the Woz can say Nope, thou shalt not worry, my friend, for I’m your generative meat watcher, and this is your look at all the new cool that has popped into my feeds to knock my neurons off my skull.
In other words, here are the week’s must-see developments in generative AI, including a cheat mode for learning prompts (and creating your own Starbucks siren), a superbly powerful video creator, and a method to actually copyright yourself in AI space.
Unlocking prompts with Midjourney’s “/describe”
Let’s start with a must-try: Midjourney has announced “/describe,” a command that describes, in text, any image you throw at it. Think of it as prompt reverse engineering, a way to learn the magical words you need to type to make something look the way you want.
Just type “/describe” and hit enter. Midjourney will ask you to drop in an image file, then offer four different descriptions of what it sees, written in its own convoluted promptydumpty language. You can use any of them as the basis for a new prompt and get a new image back.
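If you want to play with the same image-to-prompt trick outside of Discord, the open-source clip-interrogator package does something similar. To be clear, this is not Midjourney’s /describe, just a kindred approach, and this is a minimal sketch assuming the package’s published API and a placeholder image path:

```python
# Minimal image-to-prompt "reverse engineering" sketch, using the
# open-source clip-interrogator package (pip install clip-interrogator).
# Not Midjourney's /describe; a similar open-source technique.
from PIL import Image
from clip_interrogator import Config, Interrogator

# Load the image you want a prompt for (path is a placeholder).
image = Image.open("my_logo.png").convert("RGB")

# ViT-L-14/openai is the CLIP model commonly paired with
# Stable Diffusion 1.x-style prompts.
ci = Interrogator(Config(clip_model_name="ViT-L-14/openai"))

# Produce a text prompt meant to regenerate a similar-looking image.
prompt = ci.interrogate(image)
print(prompt)
```

From there, you can paste the generated prompt into your image generator of choice and start remixing it, much like the /describe workflow above.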
The results can be fascinating, as demonstrated in the thread above by designer and developer fofrAI. That relaxed AI Starbucks logo has the feeling of savoring a real espresso on a sunny terrace on Via Veneto instead of slurping a watery latte bucket on a NYC subway platform at rush hour. I love it because it not only teaches you how to design your own prompts but also opens the way to a creative back-and-forth: you input your own work, ask for a description, and then play with the resulting prompt to explore new creative paths.
Copyright yourself like a Hollywood star
The folks at Metaphysic first amazed the world with their deepfake videos of Tom Cruise, then sent ABBA on tour again, and blew everyone’s minds by resurrecting a young Elvis on America’s Got Talent. The company recently signed an agreement with CAA, the world’s largest talent agency, to suck the digital souls of actors and musicians into the deepfake AI realm so they can perform anywhere, everywhere, all at once, until the end of time, without ever being there. Yes, forever young even after death, to the chagrin of Hollywood’s plastic surgeons.
Now Metaphysic’s cofounder and CEO, Tom Graham, has Dorian Grayed himself using that procedure, becoming the first human to have his AI biometric profile registered with the U.S. Copyright Office. While this may not seem useful right now, it will be in just a few years, when you will be able to direct your own movie using real-time generative video, starring any actor with such a profile (after paying all the moneys for a license, of course).
Try Gen-2 to make a video with words
Speaking of generative video: perhaps you missed that anyone can now use Gen-2, Runway’s newest generative video platform. Like Metaphysic’s AI profiles, it may not seem very useful for actual production at this point, but if I were you, I would be using the hell out of it in preparation for what is coming very soon.
Just check out what LA-based director Paul Trillo has done with Gen-2—he’s built a completely synthesized hall of mirrors.
Surreal? Yes. Imperfect too. Freaking amazing? Absolutely. It “kinda rattled everything I know about image making,” Trillo says.
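Gen-2 itself lives in Runway’s web app and Discord, no code required. But if you want a rougher, code-level taste of text-to-video, there is an open ModelScope model you can run with Hugging Face’s diffusers library. A minimal sketch, assuming a recent diffusers release and a CUDA GPU (the exact format of .frames can vary between diffusers versions):

```python
# A minimal text-to-video sketch; this is NOT Gen-2, which has no public API.
# Uses the open ModelScope model via Hugging Face diffusers
# (pip install diffusers transformers accelerate). Assumes a CUDA GPU.
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video

pipe = DiffusionPipeline.from_pretrained(
    "damo-vilab/text-to-video-ms-1.7b",
    torch_dtype=torch.float16,
    variant="fp16",
)
pipe.enable_model_cpu_offload()  # keeps VRAM usage manageable

# Generate a short clip from a text prompt.
video_frames = pipe("an infinite hall of mirrors, cinematic lighting").frames
path = export_to_video(video_frames)  # writes an .mp4 and returns its path
print(path)
```

Expect results far cruder than Trillo’s Gen-2 footage; the point is getting a feel for the text-to-video loop before the polished tools arrive.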
We are witnessing history in the making, as this is just the seed of what is coming. As Bryan Catanzaro, vice president of applied deep learning research at Nvidia, told me in a video interview last year, you will be able to create fully coherent, fully realistic HD video in about five years. In a decade, he told me, anyone will be able to interactively create feature-length films and series, just as a director does today with a crew of hundreds. It won’t mean that we will have 5 billion Citizen Kanes, but this revolution will enable all good creatives to conjure any story out of thin air and their imagination.
Capture reality to use in a video game
NeRF stands for “neural radiance field,” an AI technique that can take a few photos and turn them into full 3D scenes that look just like the real thing, no 3D modeling skills required. This is what the Luma app does using video captured with your iPhone.
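Under the hood, a NeRF learns a function that maps any 3D point to a color and a density, then renders each pixel by blending samples along a camera ray. Here’s a minimal NumPy sketch of just that volume-rendering step, with made-up values standing in for the network’s predictions:

```python
# A minimal sketch of NeRF's volume rendering step in NumPy.
# A real NeRF trains a neural network to predict (color, density) at any
# 3D point; here random values stand in for those predictions along one ray.
import numpy as np

n_samples = 64
deltas = np.full(n_samples, 0.05)            # spacing between samples on the ray
densities = np.random.rand(n_samples) * 2.0  # stand-in for predicted density
colors = np.random.rand(n_samples, 3)        # stand-in for predicted RGB

# alpha_i = 1 - exp(-density_i * delta_i): chance the ray "stops" in segment i.
alphas = 1.0 - np.exp(-densities * deltas)

# Transmittance T_i: probability the ray reaches sample i unblocked.
trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))

# The pixel color is the transmittance-weighted blend of sample colors.
weights = trans * alphas
pixel_rgb = (weights[:, None] * colors).sum(axis=0)
print(pixel_rgb)
```

Do that for every pixel in every camera pose, train the network so the renders match your photos, and you get a scene you can view from anywhere, which is what Luma packages into an app.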
Now the company has released a plug-in that lets anyone drag and drop these captures directly into Unreal Engine, creating instant environments for video games and movies. That will save a ton of time for creative professionals like indie game developers.