SpeakEasy: Enhancing Text-to-Speech Interactions for Expressive Content Creation

Brade, Stephen; Anderson, Sam; Kumar, Rithesh; Jin, Zeyu; Truong, Anh

Publisher with Creative Commons License

Terms of use

Creative Commons Attribution-Noncommercial-ShareAlike http://creativecommons.org/licenses/by-nc-sa/4.0/

Abstract

Novice content creators often invest significant time recording expressive speech for social media videos. While recent advancements in text-to-speech (TTS) technology can generate highly realistic speech in various languages and accents, many struggle with unintuitive or overly granular TTS interfaces. We propose simplifying TTS generation by allowing users to specify high-level context alongside their script. Our Wizard-of-Oz system, SpeakEasy, leverages user-provided context to inform and influence TTS output, enabling iterative refinement with high-level feedback. This approach was informed by two 8-subject formative studies: one examining content creators’ experiences with TTS, and the other drawing on effective strategies from voice actors. Our evaluation shows that participants using SpeakEasy were more successful in generating performances matching their personal standards, without requiring significantly more effort than leading industry interfaces.

Description

CHI ’25, Yokohama, Japan

Date issued

2025-04-25

URI

https://hdl.handle.net/1721.1/162765

Department

Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science

Publisher

ACM|CHI Conference on Human Factors in Computing Systems

Citation

Stephen Brade, Sam Anderson, Rithesh Kumar, Zeyu Jin, and Anh Truong. 2025. SpeakEasy: Enhancing Text-to-Speech Interactions for Expressive Content Creation. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems (CHI '25). Association for Computing Machinery, New York, NY, USA, Article 756, 1–19.

Version: Final published version

ISBN

979-8-4007-1394-1

Collections

MIT Open Access Articles