Practical Artificial Intelligence for Stage Design
Artificial intelligence (AI), machine learning, and generative art conjure up images ranging from mesmerizing to downright terrifying. Our popular culture is suffused with adversarial depictions of robots or AI draining all happiness, creativity, and life from humanity.
Since the early 2010s, however, the landscape of AI usage has changed dramatically. The proliferation of easy-to-use generative models (AIs trained on existing data to create new data) has led to a surge of attention on social media. From the 2015 “puppyslugs” of Google’s DeepDream to hyper-realistic deepfakes and award-winning artworks, generative AI has firmly stepped onto the world stage.
In theatre, we’re used to new technology being our swift downfall. Here’s the good news: you are not one robot away from replacement, at least with current technology. AI is more comparable to the workflow enhancements and new possibilities that computer-assisted drawing or electric lighting offered theatre. So, just what are these tools that creatives find in their hands, and how might they apply to theatre?
As a projection designer by trade, I was initially intrigued by AI due to its possibilities to create generative video, rapidly respond to new prompts, and perform creativity. As an emerging designer, most of my work has been as an animator on productions. This usually means implementing a designer’s vision and storyboards by parsing libraries of stock footage and creating new visual effects. AI has helped me find shortcuts in the content-creation pipeline by allowing me to create almost fully realized images that fit our exact stylistic needs. I want to model some of the approaches I’ve used for other designers. Let’s start by breaking down one way we can use a small subset of AI-accelerated tools: text-to-image models.
You might have used a program like Photoshop or Vectorworks to create a line before. The commands are simple and direct. Photoshop doesn’t guess what kind of line you want. You prescribe everything about that line: where to start and stop, the size of the line, its color.
The premise of text-to-image AI is that you can now use simple, natural language to describe what you want, and a machine “model” trained to recognize that language makes its best guess at that image. What you lose in specificity and control, you gain in the ability to rapidly iterate almost fully realized images. Under the hood, the machine takes many different steps to do this: earlier image generators pitted two networks against each other in a contest of generating and spotting machine images (so-called generative adversarial networks), while newer models such as DALL-E 2 instead learn to gradually refine random noise into an image that matches the text, a process called diffusion.
These generative models can complement and expand your toolkit as a designer or theatremaker. To illustrate these benefits, I’ll show examples from a few simple tools with varying degrees of adherence to open-access practices: the text-to-image models available through Midjourney and DALL-E 2. Midjourney is currently in an open beta with free initial generations and a paid subscription system; DALL-E 2 is also in an open beta, with free initial generations and a pay-as-you-go credit system. Numerous open-source implementations and new models are released frequently. For example, Stable Diffusion, an open-source text-to-image model, was just released in August. Craiyon is a free alternative for early explorations.
An AI Toolkit
For the purposes of exploring these tools, I’ll be applying these text-to-image models in a prospective design for A Midsummer Night’s Dream by William Shakespeare. Here’s the gist: lots of hijinks about royals getting married. Then, fairies interfere and cause even more hijinks. It is a classic comedy of mistaken identities, trickery, and mischief often set in idyllic woodlands. Let’s take a look at the play from the perspective of a set designer in the research stage.
The first use of AI may seem the most obvious and, for many, the most “acceptable”: generating concepts, mood images, and tone research. For many designers, the initial part of a scenic design begins by confronting an array of Google Images, Pinterest, and their own collected visual research. Let’s see how prompting an AI works.
I asked the models to generate the following: The mysterious Fairyland, whose moon glimmers and dewdrops rest on the forested grasses.
This relatively vague description produces a variety of images, but it’s helpful to see what the model homes in on. Ambiguous concepts like “mysterious” are less tangible than descriptive ones like “moon,” “dewdrops,” and “forested grasses.” This is an interesting difference from typical mood research. More specific prompts decrease your total pool of potential images in a traditional search, but when generating machine images they increase the model’s ability to produce a desired result. Greater specificity opens more possibilities. In a traditional search, finding extant images of grasses and dewdrops together may be easy, but with AI you can direct the model to also place these concepts in the context of a moonlit night. The models can interpret photographic and cinematic conventions, art history, specific artists’ styles, and physical context.
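The habit of layering concrete descriptors can be sketched as a small helper in Python. To be clear, `build_prompt` is a hypothetical convenience of my own, not any model’s API; Midjourney and DALL-E 2 only ever see the final string.

```python
def build_prompt(subject, details=(), context=None, style=None):
    """Compose a text-to-image prompt from concrete descriptors.

    A hypothetical helper: the model never sees this structure,
    only the joined string it returns.
    """
    parts = [subject]
    parts.extend(details)              # tangible nouns: "moon", "dewdrops"
    if context:
        parts.append(f"set in {context}")
    if style:
        parts.append(f"in the style of {style}")
    return ", ".join(parts)

prompt = build_prompt(
    "the mysterious Fairyland",
    details=("glimmering moon", "dewdrops on forested grasses"),
    context="a moonlit night",
)
# -> "the mysterious Fairyland, glimmering moon,
#     dewdrops on forested grasses, set in a moonlit night"
print(prompt)
```

The point of the sketch is that each added, tangible detail narrows what the model guesses at, the opposite of how added keywords narrow a traditional image search.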
Let’s see what the models generate from the following text: A long shot, studio photograph of a theatrical stage set for A Midsummer Night's Dream. On stage is the mysterious Fairyland whose moon glimmers and dewdrops rest on the forested grasses.
Getting a sense of scale and presence of a potential design through AI can quickly allow designers to decide if a direction is worthwhile. But if I ask for the model to show me an entire design, won’t my design just be what the machine tells me?
To that point, we’re already directly referencing the visual history and digital traces of “A Midsummer Night’s Dream” by using the play title in our prompt. In this way, being too descriptive might actually be detrimental to the design process. Just as designers often avoid images of other productions of a play they’re designing, avoiding a similar specificity in your prompting may be necessary when integrating AI into your toolkit. Try placing the visual concepts you want to explore in contexts outside the theatre instead: a forest in an office, a moonlit night on Jupiter, dewdrops in a desert, etc. Leave room for you and your collaborators to imaginatively fill in the gaps before getting into the prescriptive work of renderings and drafting.
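The “contexts outside the theatre” exercise can even be mechanized: pair every visual concept with every unfamiliar context and scan the resulting prompt list for surprises. The concept and context lists below are purely illustrative.

```python
from itertools import product

# Illustrative lists: visual concepts from the design conversation,
# deliberately paired with contexts *outside* the theatre.
concepts = ["a forest", "dewdrops", "a moonlit night"]
contexts = ["an office", "a desert", "the surface of Jupiter"]

# Every concept/context pairing becomes a candidate prompt.
prompts = [f"{concept} in {context}"
           for concept, context in product(concepts, contexts)]

for p in prompts:
    print(p)  # e.g. "a forest in an office", "dewdrops in a desert", ...
```

Three concepts and three contexts already yield nine starting points, most of which you would never type deliberately, which is exactly the gap-leaving quality you want at this stage.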
Text-to-image models are trained on billions of paired images and captions, which lets them recognize an enormous range of visual concepts, from the colors of “warm autumn” to the sharp edges and grit of “cyberpunk.” While the training images may never have shown certain concepts together, these models can smash contexts into each other, like “fairies” and “discopunk” cities.
Here’s the input text sans the theatrical set descriptor: A long shot, studio photograph of a stage at the mysterious Fairyland, a discopunk nightclub surrounded by the abstract dewdrop lights of a fairy city.
Here we can see already that the architecture of the stage set is not only clearly defined, but suggests conventions like spotlights, curtains, and wide-open floors built for staging.
These models can even be used to explore through the common mediums of sketching and drafting. Take a simple prompt: A sketch by Inigo Jones of a stage design for a nightclub.
These generations don’t need to be so specific, however. The ability to generate textural qualities is also within AI’s grasp, which could be useful to scenic painters. The concerns of a scenic painter closely match those of texturing artists in VFX and game development: creating believable and realistic textural treatments that can be easily “tiled,” or repeated. Being able to iterate a variety of approaches to these scenic treatments using AI models could be incredibly helpful. Let’s say there is a wall unit in our eventual stage design, and ask the model to generate the following: A painterly texture of a gray medieval castle wall, overgrown with vines.
While helpful, the model may not interpret “texture” the way we want it to. Adding words such as “front-on,” “texture pack,” or “tiling” helps in that regard. A more specific style than painterly could help as well. Let’s apply this text: An illuminated manuscript’s painted texture of a gray medieval castle wall, overgrown with vines, tiling.
Used flat-out, these textures do not tile seamlessly. Here is where the work of a scenic artist would come in: taking inspiration from the generations and making them work in actual space.
However, if you have access to DALL-E, you can combine its in-painting feature with Photoshop’s Offset filter to regenerate the seams of a texture and ensure perfect tiling; tutorials for this workflow are available online.
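To see why the Offset filter matters, here is a minimal, pure-Python sketch of the wrap-around shift it performs, shown on a tiny grid of pixel values rather than a real image (production work would use Photoshop or an image library such as Pillow). After the offset, the tile’s original edges meet in the interior of the image, where any visible seam is easy to inspect, or to hand to an in-painting model for regeneration.

```python
def offset_wrap(pixels, dx, dy):
    """Shift a tile by (dx, dy) with wrap-around, like Photoshop's
    Offset filter set to "Wrap Around".

    `pixels` is a row-major nested list of pixel values. Content
    pushed off one edge re-enters from the opposite edge, so the
    tile's former borders land in the middle of the result.
    """
    h = len(pixels)
    w = len(pixels[0])
    return [[pixels[(y - dy) % h][(x - dx) % w] for x in range(w)]
            for y in range(h)]

tile = [[1, 2],
        [3, 4]]
print(offset_wrap(tile, 1, 1))  # -> [[4, 3], [2, 1]]
```

A tile only repeats seamlessly if its opposite edges agree; the wrap-around offset makes that agreement (or its failure) visible in one glance.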
I encourage you to explore in your own ways the potential applications of AI models for costuming, properties, lighting, sound, and media design. AI models are not just limited to visuals either. They can be especially useful for object recognition/tracking or unique sound synthesis. Thinking expansively could send you down a variety of intriguing pathways, from generating custom dress patterns to unique gobo textures. Here are some results from quick experiments:
As I’ve explored potential uses of AI, I’ve held off on looking at some glaring issues with AI-accelerated workflows as they stand today. Although skepticism can be a limiting perspective, blind optimism about the use of AI isn’t a solution either. Critical considerations have become necessary when I’ve worked with AI.
Most AI models are inherently products of a digital age and an Internet’s worth of training material. The proliferation of AI has cut across our understanding of artistic intellectual property and rekindled debates about how technology is powered and weaponized. And while matching a specific artist’s style or medium (imagining, for instance, a set design inspired by Salvador Dalí or Frida Kahlo) is intriguing, two common issues come up. One is that the best results come from commonly recognized and acknowledged artists of the Western canon. They’re easily prompted just by name, and never has “name recognition” been so literal. This is an example of bias in the training data. Even a prolific and well-known contemporary stage designer’s style might not noticeably transfer into the generated image with a prompt like this: A sketch by Es Devlin of a stage design for a nightclub.
The second issue is that without dramaturgical context, aesthetic choice, and critical context, this direct-reference methodology borders on the appropriative, exploitative, and extractive. An unfortunately common occurrence in contemporary design is appropriation without understanding or lived experience. AI tools, when used without consideration, reinforce this by making a surface-level reproduction of culturally specific styles even easier. The need for cultural understanding and nuanced design remain; AI isn’t a shortcut or escape from that tangible work.
Beyond these issues, the numerous biases of text-to-image models are well documented. As Melvin Kranzberg warned, “Technology is neither good nor bad; nor is it neutral.” Models reflect the datasets and imagery of the world as it has been generally (mis)represented on the Internet. Generating images of doctors produces grids of primarily white and masculine figures, for instance, underrepresenting people of color, women, and non-binary people. Datasets might include hate symbols, pornography, and artistic works by creators who haven’t knowingly consented to their inclusion in training data. Because of the immense size of model training datasets, filtering out objectionable or prejudiced material becomes difficult, as does holding model authors accountable.
An issue of power arises, both in terms of powering these expensive computations and powering systems of oppression. Training models and running them on GPUs (the same class of hardware used by cryptocurrency miners) carries a tremendous energy cost, raising environmental concerns. AI-accelerated technologies can also power systems of oppression; in one of the most egregious examples, vast investment in facial recognition technologies has been weaponized against ethnic Uyghurs in Xinjiang, China.
However, the COVID-19 pandemic proved that innovative and reflexive theatre responds to technological change by choosing how to balance adopting and contesting that technology with a critical perspective. From making theatre for digital spaces to navigating the logistics of reopening, over the past two years we have affirmed Robert Edmond Jones’ claim that “we have to work in the theatre of our own time with the tools of our own time.” Should we celebrate AI as a way to democratize dreaming up images for the stage? Batten down the hatches and prepare for a tidal wave of appropriative and energy-expensive mediocrity? The theatre is as good a place as any to debate these questions. I hope that by understanding the workflow for a common way of using generative AI, you are more empowered to develop your own positions on its use for the stage.
I’ll finish with a brief word on AI’s kinship with theatre, as explored by Fabian Offert in their essay What Could an Artificial Intelligence Theater Be?
“[W]hat theater and machine learning have in common is the setting up of an elaborate, controlled apparatus for making sense of everything that is outside of this apparatus: real life in the case of theater, real life data in the case of machine learning.”
We train ourselves on scripts, designs, and rehearsals to act out a performance, ideally replicating what we planned for but always making something unique. Machine learning undergoes the same process. When collaborating with AI, you’re collaborating with a performer. In theatre, we often explore the tension between rehearsal, devising, improvisation, and interaction. I hope we can do the same with AI.