Produced by

Kat Mustatea

Theatre Tech Talks: Artificial Intelligence, Science, and Biomedia in Theatre

BodyMouth, a New Instrument Created by a Playwright

25 February 2025

Tjaša Ferme: Hey theatre, science, and innovation fans. This is Tjaša Ferme, your podcast host for Theatre Tech Talks: AI, Science, and Biomedia in Theatre—a podcast produced by HowlRound Theatre Commons, a free and open platform for theatremakers worldwide. Tune in.

Kat Mustatea is a transmedia playwright whose experiments with language, live art, and the computational uncanny articulate the absurdities of human being in an increasingly algorithmic world. Her work has been presented at Ars Electronica, New Images Festival, New York Live Arts, and The Cube at Virginia Tech. Today, we talked about her project BodyMouth, an instrument for embodied speech. Okay, great. Kat, it's amazing to see you. I'm excited about this new piece of yours, BodyMouth. Basically, BodyMouth is a performance piece as well as an instrument that you created.

Kat Mustatea: Yes. So I guess it's a multimodal project, and one of the outputs. I never really thought of creating a musical instrument as an actual creative output, but here we go. So I would say that BodyMouth is the instrument. And then obviously because it's an instrument, it lends itself to making interesting.

Tjaša: Yeah, yeah. So you said that you never thought you would make an instrument. So exactly, what was the impetus for even embarking on this journey and starting this? I imagine this was a rigorous research project. This was not an easy load. So how did this start and what was the impetus for it?

Kat: Yeah, it was a lot of research. It was years of research, and it started because mid pandemic, I was involved in a dance piece that had to do with putting sensors on a dancer's body, and the dancer was going to be, because pandemic, was going to be performing alone in a space. And then we were going to take the bio data from the dancer. And there were several of us on this project, we were all working remotely in different parts of the world, and we'd come together and meet and be like, okay, what are we going to do creatively while the dancer is moving with the data he's generated? And they were like, well, we can make interesting visuals. Here's an idea. We can make a sound. But I was like, well, I've seen that before. I've seen both of those things happen. So I come from language. I feel that my primary creative basis is language itself. I'm a playwright. So I was like, I want to make language. How can we make language with it?

And we were all scratching our heads like, oh, okay, interesting. Let's see what we can do with that. And we ended up doing something a little hacky because we only had six weeks to make that happen. And it involved recording me speaking a poem, and then splicing up the recording into little bits that we would play back as the performer was moving. So that was a little preview. But then I wrote to one of the developers and I was like, "I cannot get this idea out of my head. I want to figure out how to make this happen, and I have no idea how to approach it. I don't know anything about sensors. I don't know anything about speech synthesis. I just can't stop thinking about this." And I was like, "I've given it months where I'm just like, maybe I'll think about it for another month and then it'll go away, but I still couldn't..." So then come 2021 along with that colleague, I just kept talking about it like, "I'm really obsessed with this." And I just ended up talking to two more people, kind of roping them in.

And the four of us would meet every other week or so and be trying to move this question forward like, how in the world do we do this? We were trying to use an open source project called Pink Trombone, which is a speech synthesizer. At some point after a few months, we had gotten somewhere, but I was like, I bet if we all were in the same room at the same time, I bet within a couple of days we could actually figure out how to make a prototype of this or a rough something or other just to prove that it can be done. This was the time in the pandemic when people were getting vaccinated, so there was the possibility that this could be done, we could do this. Nobody sponsored us. We all just figured out ways to affordably get to Portland where one of the people lived, and we sat around in her studio for three days and knocked out an MVP.

We have these early videos of all of us giggling because I had this very rough version of the instrument on me, and I was trying to work out how to say Mama, and we were all like, "It's alive. It speaks."

Tjaša: You made a baby. You made a baby instrument.

Kat: Yes.

Tjaša: Of course.

Kat: And then off of that very rough, silly attempt, I got my first research residency at Harvestworks in the year 2022, and dot, dot, dot. And so it's been a multi-year research of figuring out which sensors? How does speech work? I now know a lot more things about how your larynx and your tongue and your lips all function in your mouth to make your voice and your speech happen.

Tjaša: Oh my God, it's a science. I mean, as an actor, I remember that certain teachers wanted us to know stuff like that, and it was just like, why?

Kat: Exactly.

Tjaša: It's so innate, do you know what I mean? I know how to speak. But yeah, it's so interesting. No, I feel like I love where you went, and it's like, yeah, it must've been a really deep and kind of baffling and maybe confusing research at first to find the basic particles like atom units of speech. But it's also interesting because when I was looking at the footage and whatnot, I was thinking that you probably started with the mouth, how does the mouth make the sound and how do we then put them into a computer? And then you have the other layer where you're basically cross connecting that to the human organ with the body. But it sounds like you started with the body though. You started with a choreography.

Kat: Yeah, I mean, it was always sort of mutually arising. So one of my earliest attempts when I didn't understand very much about how the mouth works, I had one sort of primitive insight, which was, well, the way we produce vowels has to do with the width of the tongue in your mouth and the position front or back of the tongue. So most vowels are just those two variables. I'm like, oh, that's an XY axis. Great, I understand XY axes. I can put the entire XY axis on one hand so I can wave my hand around and be like, "Ah, ee, oh, oo, ah." It's like a continuum, vowels are. There's not a fixed vowel. There's just a kind of continuous space of vowels. So that was the first thing we tried, choreographically speaking. We mapped that out and put it on a dancer's body, invite the dancer to try it, and she's just standing around waving one hand, and it was so boring. So from that, it was like, oh, back to the drawing board. That is not interesting.

Because if the goal here was to create something that just looks really dynamic on the body and that really kind of engages the body in expressive ways, that was not going to do it. So the actual design, like how you map all of the variables that go into speech production onto the body in a way that is interesting was its own design challenge. And beyond figuring out the speech synthesis part, I would say that design is a little bit the secret sauce here. And there are now several designs, choreographically, for how the instrument works. So we'll teach performers either one design or another until they sort of master that shape. I call them shapes. But they could learn multiple shapes for producing the same sentence or whatever.
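As a rough illustration of the two-variable vowel idea Kat describes, the following Python sketch maps a hand position onto a simplified vowel square. It is not the BodyMouth software: the function names, the reach ranges in meters, and the four corner vowels are all assumptions made for this example, and the second axis uses the phonetics convention of tongue height where Kat speaks of width.

```python
# Hypothetical corners of a simplified vowel square (illustration only).
# x: tongue position, 0.0 = front, 1.0 = back
# y: tongue height,   0.0 = low,   1.0 = high
VOWEL_CORNERS = {
    "ee": (0.0, 1.0),  # front, high
    "oo": (1.0, 1.0),  # back, high
    "a": (0.0, 0.0),   # front, low
    "ah": (1.0, 0.0),  # back, low
}


def normalize(value, low, high):
    """Scale value from [low, high] to [0, 1], clamping out-of-range input."""
    if high <= low:
        raise ValueError("high must be greater than low")
    return max(0.0, min(1.0, (value - low) / (high - low)))


def hand_to_tongue(hand_x, hand_y, reach_x=(-0.5, 0.5), reach_y=(0.8, 1.8)):
    """Map a hand position (assumed meters) to the two tongue variables.

    reach_x and reach_y are made-up defaults for how far one performer
    can move the hand; a real system would calibrate them per performer.
    """
    return normalize(hand_x, *reach_x), normalize(hand_y, *reach_y)


def nearest_vowel(tongue_xy):
    """Return the label of the corner vowel closest to a point in the square."""
    x, y = tongue_xy

    def squared_distance(name):
        corner_x, corner_y = VOWEL_CORNERS[name]
        return (corner_x - x) ** 2 + (corner_y - y) ** 2

    return min(VOWEL_CORNERS, key=squared_distance)


if __name__ == "__main__":
    # Sweep the hand along the top edge of the reach box, front to back.
    for hand_x in (-0.5, -0.2, 0.2, 0.5):
        point = hand_to_tongue(hand_x, 1.8)
        print(hand_x, point, nearest_vowel(point))
```

The labels are only for readability. Because vowels form a continuum, a synthesizer would more plausibly consume the two continuous values returned by hand_to_tongue directly instead of snapping to the nearest corner.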

Tjaša: Yeah. I guess I'm curious because when I was looking at the footage, one of the videos was basically a demo that breaks it down how this works, and the other one was actually footage of performance, and there were two people. And it was really interesting when the vocals were crossing, overlaying, it was just all of a sudden it became this beautiful polyphony in some way. But also I was curious if you choreographed with producing a certain text in mind so that when you watch a performance, there's always a particular text that comes out.

Kat: Yeah. So what you saw, that footage is from Atlanta where we performed in the context of being finalists for the Guthman musical prize, Guthman Prize, which is for new musical instruments. And what you're seeing there is something that we realized along the way can happen with this instrument, which is you can put an entire instrument on one performer and then give a different performer a different version of the instrument, so a different pitch, for example, so they could each speak at different pitches and make a kind of chorus. But you can also have two performers use one instrument. So they can't speak by themselves, they have to coordinate to create one voice, which has a real gorgeous poetry to it. So what you're seeing on that video is them coordinating to make the voice. But then, as you said, you can layer different pitches of voices. So one person can control several voices and can pitch them differently.

Each voice pitched differently. So you get these gorgeous polyphonies, right? And the harmonies and dissonances that you can get there become their own... I mean, they have their own musical quality, which is part of studying, what can you do that is particular to this instrument expressively? And that kind of polyphony is part of it.

Tjaša: I'm just a myriad of, I don't know, semi dumb ideas or I don't know if they're dumb, but I'm just thinking, you could have them do a famous monologue from Hamlet, right?

Kat: Yes.

Tjaša: Because the computer generates the voice, is that correct?

Kat: Yes.

Tjaša: So basically you could have two people move and then the computer generates a well-known speech. And obviously it'd be a totally different texture and quality to it because two bodies coordinating create a totally different rhythm of how that's produced.

Kat: So interestingly, it's not generative. When we think of generative, it's like the computer's doing something additional that is...

Tjaša: Yeah, that's not what I meant. It's just the producer, the producer, the maker of the sound.

Kat: It's creating frequencies, right? All it's doing is mathematics, right? It's creating these frequencies. So I didn't really answer your earlier question, but I'll try to answer it now, which is that, yeah, you can pick whatever text you want to render in this way. But I would say that it's always this mutually arising process where I'll get in there in a studio with a performer and we'll study... I'll have some ideas about the kind of text I would like to end up with, and we'll try out different words. So we'll workshop them on their body, and on their body, some things look or sound more interesting than others. So sometimes the creative decision is around what's really working for this dancer? What kind of words are really great for this dancer? Some words are rendered really well by the computer, some sounds, a few of the consonants are a little funky or hard to make out so we tend to stick to words that just seem really beautiful on the body.

Tjaša: Of course.

Kat: So there's a little bit of aesthetics that goes into even the choice of what words we're doing. But of course, yeah, you can choose to try out “to be or not to be.” You can do that if you want.

Tjaša: No, that's very Walt Whitman-ish, using the words that... I mean, any poet really, but just Walt Whitman came to mind. Cool. Yeah. Interesting. So who are your partners? Did you have any universities that you were working with? Who were all the people, the teams that made you do this?

Kat: There's just us. It's just us hamsters spinning in our wheels behind the scenes. So I've had a series of small, little research kind of residencies. So one at Harvestworks, one at the cell theatre, one at NYU's ITP [Interactive Telecommunications Program] program. And each were tiny, tiny little, "Hey, here's a little bit of enablement. Here's a space, or here's a tiny amount of money." So this just has been mostly self-driven. The creative technologist that I eventually found is Yonatan Rozin, and we met during my residency at ITP. He's the one who's effectively built all of the software that is needed here. But it's just us, honestly. There's not a major institution behind us. And so it's just a lot of us getting super nerdy around, "How does the tongue work? How does the tongue press on your teeth to make a certain 'T' sound?" And then we have had some help from high level mentorship from a speech synthesis lab in Dresden. And people get PhDs in speech synthesis, so we found a researcher who his specialty... Backing up.

Speech synthesis is very useful in a lot of therapeutic contexts, which is what this lab is doing. So he ended up, as his dissertation, creating a way to render speech in real time, which is computationally pretty complex. All of the instances of human-like speech synthesis that you see out in the wild, like your Alexas and your Siris. It'll take a second for Alexa to compose the sentence that it's spitting out, and that second is computation. There's computation going on in that one second delay. And for this, it seemed very important that you see it done in real time. So that became a whole thing. It has to be in real time. I have to be able to see the gesture and hear the exact phoneme being done. And so this particular researcher, I think it was just great luck, he ended up doing his dissertation on exactly this, real time rendering of phonemes. And so every once in a while, he'll talk us through a hard problem because we don't have PhDs in speech synthesis.

Tjaša: Yeah, yeah. That was my major question. Who was really behind... I mean, there's a lot of theorists that tackle the exact positions and exact coordinates in the mouth and whatnot. This is not soft science. This is almost like a medical field. So I was very curious, who did you get in? Who did you work with?

Kat: So we are using already made speech synthesizers like the ones created in these labs. Here's a fun fact, like a fun nerdy thing. So in order to create this synthesized speech, what they do is put a real person in an MRI and do these really big computations, big calculations around every single geometry related to the vocal cords, the width and shape of the throat, how air goes through from your lungs to your breath. All of that is mapped out, and then they create a kind of virtual model of it. And off of those geometries is how they're creating speech synthesizers. Well, I, from the very beginning said, well, I want the voices that I use to be these beautiful, East European female vocal, like these resonant female, like gorgeous vocal effects.

Tjaša: I know exactly what you're talking about.

Kat: You know exactly what I'm talking about, right?

Tjaša: The Serbian folklore. The Bulgarian folklore.

Kat: Yeah, exactly. I was like, I want that. And how do I get that? Because the speech synthesizers I was finding were all these very awkward male sounds, and I was like, I don't know how to translate this into... What am I... And it turns out the basis for a speech synthesizer, at least the one we're using, was a guy, and there are anatomical differences.

Tjaša: I'm not surprised at all. Of course.

Kat: So of course there's known sort of broad ranges for a female voice and broad ranges for male voice, and of course there's people who are kind of in between those, they overlap those ranges, but for the most part, more women have higher pitches. And the way that that works is anatomical. It has to do with the length of your cords, the length of your larynx, it has to do with all of these geometries that are anatomical. So if you put a guy in the MRI and you create an entire model of the speech off of a guy, you're going to end up with guy sounds. It becomes then very hard to pitch that voice upward because you're trying to unnaturally extend something in the anatomy there, and it doesn't quite work. So what you need is to put a woman in an MRI machine, but that's not the case with the speech synthesizer we currently have. So we had to do a lot of post effects on the voices to get them to sound beautiful. And that's still a work in progress, that's still, even after all this time.

I think we've made a lot of progress, but it's not yet there. And all this to say, it would've been a lot simpler if we'd had a female voice to begin with.

Tjaša: And I saw that you actually had a singer, like an actual person, singer on stage. What was her function within this BodyMouth?

Kat: Yeah, so I thought it would be really... So one cool thing about this instrument is that it doesn't require breath like we all do. So you can sing forever, you don't need to pause to take a breath, which is really interesting.

Tjaša: It's just that the rendering takes a little bit longer.

Kat: Yeah.

Tjaša: It takes longer to make the sound, but you don't have to breathe, so it evens out.

Kat: So conceptually, that's really beautiful, and I thought it would be so interesting to have this mix of... Okay, so backing up. The reason we even had a singer was because for the competition in Atlanta, the whole thing was they pair you with an Atlanta area musician, and they said you can pick any kind of musician that you're interested in. So I thought, oh, maybe it would be interesting to work with a woodwind because it's also breath related. But ultimately we requested a vocalist, and I thought it would be conceptually so interesting to have it be that the instrument is the one producing the language, and the vocalist is producing the abstracted, sort of breathy things instead of the other way around.

Tjaša: Cool. You said that certain gestures or certain vocabulary looks interesting on one performer's body and does not look appealing at all on a different performer's body. So what do you shift? Do you change your mapping systems so that a different gesture creates the same kind of a phoneme?

Kat: We can, yes. So we definitely fit, even, let's call it, the standard shape. We fit those on every performer's body because, of course, there are shorter and taller people, longer arms, and you want things to sort of be within reach for every performer. But then, of course, we warp things in expressive ways. So it might be that even on a standard shape, we'll want to have a phrase where the gestures are small because that's what's kind of evocative choreographically in that moment, but then we might have a super long, leggy performer, and we want to get them to stretch, so we might extend the field in which they're operating so that they have to run all the way across the stage to do something, and that'll be a choreographic choice. And that's just on the, I would call it, the standard shape. But then, yes, we also can change how does the voice... We can change the mappings around for the purpose of being expressive.

So if in one shape, let's say, the way you make a voice, the way you even make sound is by bringing your wrists together. Maybe we decide, okay, that's a kind of choreography. But for this other scene, we might want it to be that you have to bring your wrist to your foot, and that's a very different kind of shape, and maybe that's a different kind of character, choreographically speaking. And what's really been cool with working with now a handful of different performers is that each performer finds their own way of doing certain things. There was one rehearsal process where the dancer... Usually the performers wear sensors on two wrists and two ankles, but that doesn't have to be where they put the sensors. So the one performer, she had long hair and she took one of the sensors and wrapped it around her ponytail. So she's having to do these really awkward gestures of bringing her head toward her wrist.

It was funny and expressive, and then it became a whole thing. That was an entire kind of scene that you could build off of these contortions that she had to do to say the same things.

Tjaša: Exactly. I was just thinking about that. It'd be so interesting to bring in a contortionist and basically see a difference between a contortionist and somebody with a, I am going to call it, a regular body, in quotes, using the same system, using the same mapping, and see how it sounds different.

Kat: Yeah, and I think what's really beautiful here is that you can custom fit an instrument like this for non-standard bodies. We don't need for you to have both arms. And if you don't have the use of one arm, we can still create a shape for your body that is still beautiful and interesting.

Tjaša: Yeah, it reminds me of ASL, it reminds me of basically blinking or speaking with your eyes and sounds based on eye tracking. It's almost like a new form of coded speech.

Kat: It's interesting to think about ASL. I mean, I am not trying to do ASL, right? I think that that's a different endeavor, but it's interesting to think about how ASL is really... When you think about it, it's a kind of pictorial language. One gesture means an entire word or an entire concept, but here, every gesture is related to phonetic content, so you're literally spelling out or sounding out everything. So yeah, it's like the difference between a phonetic alphabet and a pictorial alphabet, which is kind of cool.

Tjaša: Thanks for that. I think that's really essential and that's beautiful. I was also thinking, how interesting would it be to place these sensors on people who do Tai Chi or Qigong, or people who do magical gestures of Carlos Castaneda? Can we decode what all these things, these movements mean? What meanings are decoded into these movements? Then, of course, you have a huge question of what language are we even using? Can we approach whatever they spoke when Qigong and Tai Chi were made? But it's an interesting attempt. It's almost like sound archeology.

Kat: Yeah, you can...

Tjaša: It's okay. If you start that business, you can credit me.

Kat: Sure. Yeah. I think that's why I am going to be working with these ideas for a while because there's so many ways to really delve into what this instrument is doing. I feel like this will be a couple of performances and a couple of very different expressive explorations, and one of them is to start with a known movement vocabulary, a martial art, and then really build a kind of subtext, a kind of secret language that that movement world encapsulates. We're just at the beginning, let's put it this way. So it's very cool to think about how one could talk about subtext or use subtext. I'm East European, so I'm very much about subtext. Everything has subtext. But think about how cool it is to have a performer speaking something with their mouth and saying something else with their body. That already is very exciting.

Tjaša: If you wanted to decode the ancient gestures of Tai Chi or magical gestures of Carlos Castaneda, I guess you would kind of have to go, okay, so what's the language we're working on? And then the question is, how do we map your BodyMouth, your computer generator, to someone's embodiment or sensation and their own perception of which sound lives where? Because I think that's also a little bit of a secret sauce. Because otherwise, maybe it can be pretty arbitrary, and that's totally fine for any of these kind of experiments and playing around, but if we actually try to decode something that's ancient and we assume it has meaning, then we would have to approach it and try to go under its skin and find the people who speak it and their own self-awareness and perception, perception of the body and of the language.

Kat: The thing that I've understood very deeply by having done this project is the extent to which your own voice or voice in general is so intimate, your own attachment to your own voice. We have all these uncanny feelings about hearing our own voice recorded because we have such a very different perception of our own voice inside, resonating inside our own heads, cavity, but then just emotionally, human voice is so imbued with empathetic response. When I hear another human's voice, I immediately respond empathetically to the language they're producing. And so we have this intimacy to voices. And when I first started, I thought, oh, well, if I take the voice out of the mouth and put it elsewhere on the body, that would create a kind of distance. And actually, that turned out to not be the case because you know what? Bodies are intimate. The minute you put anything on a body, you're making a political statement, you're making a social statement.

All of our gestural world, how we move literally is also intensely intimate. And so all you're really doing, you're not removing the intimacy there, you're just sort of discombobulating the expectations of what a voice does and what a body can do. And by that discombobulation, you're forcing both yourself, the performer, but also the audience, to think about voice and bodies even more attentively than we might normally. So that's been my big learning here, that what you're really doing here, even though it's via technology, you're actually really triggering this empathetic response to something that we understand is deeply intimate.

Tjaša: Yeah, I love it. I love it. Like you're saying, these technologies are only enabling us to find a different vantage point, to perceive again, something that's a part of us from a different point of view and learn more about it. And I think that's really, really valuable and a huge and important component to be communicated through your work.

Kat: And I think what I would love next to happen is to start talking to people who are in the sphere of neurology, to actually, even for myself, to understand a little bit more deeply about what am I doing when I'm using this instrument that is so unusual? Because I think you're asking a performer to do something unusual. You're asking a dancer to listen to themselves move in a really specific way, and I think there's some neurological process being engaged here that wouldn't normally be engaged, and I'm very curious what the implications of that are. And I don't know that I know myself, so I would say we're having this conversation at a moment where I feel like I still have so much more to discover.

Tjaša: Yeah, that's a beautiful moment to be in. I love asking all these questions. Synesthesia is when different sensors on your body or senses are connected, right? So yeah, I don't know. I think it'd be so interesting to have a longitudinal study with these dancers that do a fair amount of this and see how all of a sudden when they start dancing and moving or just walking down the street, they still feel connected to the voice production. What are they creating as they move through the world? And I think that's really interesting for them, but it's also really interesting for us because as we move through space, we're shifting time-space fabric, the continuum. And the movement itself does make some kind of a sound. We're just so used to it that we're not able to perceive it.

Kat: Yeah. Let me just tell you that attaching anything to a dancer that produces any sort of sound is deeply rewarding for them. The minute we started working with pitch and they were like, "Oh my God, I could, aah, ah, aah, ah." Moving through pitches by raising a leg up. That was a very exciting moment for the dancers.

Tjaša: Yeah. Yeah, yeah. I bet. The question begs itself, but did you use any AI and how did you use it if you used it?

Kat: So we didn't, actually. It's computation, but it's not AI in the way that AI is used currently. So it's not generative, right? We're doing mathematics via software. I can see some ways to begin now that we have a kind of basis of this instrument, how to begin to integrate AI in it. Probably for what I was talking about earlier, which is if I would like to start being more sophisticated in how I change the timbre of a voice, it might be that I don't need to do all that mathematics. I might be able to use AI to work on voice quality and timbre. It would be really cool to be able to get a voice that is known, to actually get... There are voice clones, right? So it would be...

Tjaša: Oprah Winfrey.

Kat: Yeah. It would be really great to be refined enough, nuanced enough that you could have an already cloned voice and have that be the voice of the instrument so that you're really speaking as a specific voice.

Tjaša: Oh my God. Yeah. And again, art as a protest or art as political action because choosing a particular voice could be significant.

Kat: Yeah. That has repercussions. Yeah. So this has also been something I've thought about. Right now, these generative... Okay, not generative, but these synthesized voices, that we are producing at least, they're fairly, I wouldn't say... So there's no such thing as a neutral voice. That's been a thing. Realizing the minute, even how you're picking pitches, is cultural and aesthetic, and all those cultural and aesthetic choices have... You have to draw from something, so there's nothing like a neutral voice. If you have a high voice, it reads as feminine. If you have a low voice, it reads as masculine. That's already political. If you are creating certain kinds of Western harmonies, that's a choice. I am attracted to Eastern harmonies, which are different, and they evolve... Historically, harmonies coming from Eastern Europe have been treated in the West as though they're like the weird harmonies. Whenever you want to have a weird, bad character, it's always like these little dissonant harmonies that are East European that are used in musical scores.

So there's always been a bit of an othering of Eastern harmonies or harmonic worlds. So making choices like, "Well, I'm just going to center East European harmonies" is already a political and a cultural choice.

Tjaša: Everything's political, and every single culture has a slant, an opinion.

Kat: Yep. Yep. Yeah.

Tjaša: Great. This was awesome. Thank you so much for your time. Thank you for your wisdom, takeaways, and sharing about your journey, and I'm excited to see and hear more about how it goes and what happens next.

Kat: Thank you so much. It's been a wonderful conversation.

Tjaša: This podcast is produced as a contribution to HowlRound Theatre Commons. You can find more episodes of the show and other HowlRound shows wherever you find podcasts. If you love this podcast, I sure hope you did, post a rating and write a review on those platforms. This helps other people find us. If you're looking for more progressive and disruptive content, visit howlround.com. Thanks for listening, and have an amazing day.
