SpeakEasy: Enhancing Text-to-Speech Interactions for Expressive Content Creation

Brade, Stephen; Anderson, Sam; Kumar, Rithesh; Jin, Zeyu; Truong, Anh

dc.contributor.author	Brade, Stephen
dc.contributor.author	Anderson, Sam
dc.contributor.author	Kumar, Rithesh
dc.contributor.author	Jin, Zeyu
dc.contributor.author	Truong, Anh
dc.date.accessioned	2025-09-19T17:51:07Z
dc.date.available	2025-09-19T17:51:07Z
dc.date.issued	2025-04-25
dc.identifier.isbn	979-8-4007-1394-1
dc.identifier.uri	https://hdl.handle.net/1721.1/162765
dc.description	CHI ’25, Yokohama, Japan	en_US
dc.description.abstract	Novice content creators often invest significant time recording expressive speech for social media videos. While recent advancements in text-to-speech (TTS) technology can generate highly realistic speech in various languages and accents, many struggle with unintuitive or overly granular TTS interfaces. We propose simplifying TTS generation by allowing users to specify high-level context alongside their script. Our Wizard-of-Oz system, SpeakEasy, leverages user-provided context to inform and influence TTS output, enabling iterative refinement with high-level feedback. This approach was informed by two 8-subject formative studies: one examining content creators’ experiences with TTS, and the other drawing on effective strategies from voice actors. Our evaluation shows that participants using SpeakEasy were more successful in generating performances matching their personal standards, without requiring significantly more effort than leading industry interfaces.	en_US
dc.publisher	ACM|CHI Conference on Human Factors in Computing Systems	en_US
dc.relation.isversionof	https://doi.org/10.1145/3706598.3714263	en_US
dc.rights	Creative Commons Attribution-Noncommercial-ShareAlike	en_US
dc.rights.uri	http://creativecommons.org/licenses/by-nc-sa/4.0/	en_US
dc.source	Association for Computing Machinery	en_US
dc.title	SpeakEasy: Enhancing Text-to-Speech Interactions for Expressive Content Creation	en_US
dc.type	Article	en_US
dc.identifier.citation	Stephen Brade, Sam Anderson, Rithesh Kumar, Zeyu Jin, and Anh Truong. 2025. SpeakEasy: Enhancing Text-to-Speech Interactions for Expressive Content Creation. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems (CHI '25). Association for Computing Machinery, New York, NY, USA, Article 756, 1–19.	en_US
dc.contributor.department	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science	en_US
dc.identifier.mitlicense	PUBLISHER_POLICY
dc.eprint.version	Final published version	en_US
dc.type.uri	http://purl.org/eprint/type/ConferencePaper	en_US
eprint.status	http://purl.org/eprint/status/NonPeerReviewed	en_US
dc.date.updated	2025-08-01T08:16:38Z
dc.language.rfc3066	en
dc.rights.holder	The author(s)
dspace.date.submission	2025-08-01T08:16:39Z
mit.license	PUBLISHER_CC
mit.metadata.status	Authority Work and Publication Information Needed	en_US

Files in this item

Name: 3706598.3714263.pdf
Size: 2.969 MB
Format: PDF

This item appears in the following Collection(s)

MIT Open Access Articles
