Speech, Signal, Symptom: Machine Listening and the Remaking of Psychiatric Assessment
by
Beth Michelle Semel
M.A., Anthropology, Brandeis University, 2013
B.A., Writing, Literature, and Publishing, Emerson College, 2010
Submitted to the Program in Science, Technology, and Society in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy in History, Anthropology, and Science, Technology and Society at the Massachusetts Institute of Technology
September 2019
© 2019 Beth Semel. All Rights Reserved.
The author hereby grants to MIT permission to reproduce and distribute publicly paper and electronic copies of this thesis document in whole or in part in any medium now known or hereafter created.
Signature of Author: Signature redacted
History, Anthropology, and Science, Technology and Society, August 22, 2019
Certified by: Signature redacted
Graham M. Jones, Associate Professor of Anthropology, Thesis Supervisor
Certified by: Signature redacted
Stefan Helmreich, Elting E. Morison Professor of Anthropology, Thesis Committee Member
Certified by: Signature redacted
Amy Moran-Thomas, Alfred Henry and Jean Morrison Hayes Career Development Assistant Professor of Anthropology, Thesis Committee Member
Certified by: Signature redacted
Heather Paxson, William R. Kenan, Jr. Professor of Anthropology, Thesis Committee Member
Accepted by: Signature redacted
Tanalis Padilla, Associate Professor, History; Director of Graduate Studies, History, Anthropology, and STS
Accepted by: Signature redacted
Jennifer S. Light, Professor of Science, Technology, and Society; Professor of Urban Studies and Planning; Department Head, Program in Science, Technology and Society
Speech, Signal, Symptom: Machine Listening and the Remaking of Psychiatric Assessment
by
Beth Michelle Semel
Submitted to the Program in Science, Technology, and Society on August 31, 2019 in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy in History, Anthropology, and Science, Technology and Society
ABSTRACT
This multi-sited, ethnographic dissertation follows teams of psychiatric and engineering professionals collaborating to tackle one of Western psychiatry's longest-standing issues: the subjective nature of mental illness. Situated at three different U.S.-based universities, the teams are driven by a conviction that conventional methods of psychiatric screening are fallible if not altogether inaccurate, since they depend upon a mental health care worker's ability to interpret the semantic content of a patient's speech. Through research studies involving human subjects, the teams hope to develop more biologically based and resource-efficient screening techniques that instead analyze paralinguistic, acoustic components of speech, such as pitch, speaking rate, and breathiness, which they argue are more directly linked to the internal mechanisms that drive mental illness. By turning to the expertise of computer scientists and engineers, they seek to build "machine listening" prototypes for psychiatric assessment: technologies that use a microphone to capture sound and artificial intelligence (AI) to analyze it.
While their studies are premised on the notion that AI can listen beyond the human by attending to sounds of speech whose psychopathological significance is supposedly set apart from linguistic meaning and human difference, researchers must, in order to gather and classify the data necessary for building their technologies, rely on the very components of language that they seek to overcome: its interactional, sociocultural dimensions. I show how the connections between spoken utterances and inner states that researchers design their systems to make "autonomously" depend upon a tightly managed but oftentimes hidden infrastructure of human labor, including the labor of research subjects. The division of labor within the teams replicates hierarchies of value within the mental health care professions, which place diagnosis and treatment at the top as expert, biomedically and legally ratified forms of judgment, and place the data entry and triage work of assessment at the bottom as skill-less, para-professional, and mechanized tasks. In describing the vexed status and ethics of listening, language, labor, and care in contemporary U.S. mental health care, the dissertation tells a larger story about the stakes of framing mental illness as a scientific, bureaucratic problem calling for a technological intervention.
Thesis supervisor: Graham M. Jones
Title: Associate Professor of Anthropology, Margaret MacVicar Faculty Fellow
Table of Contents
Acknowledgments
Introduction
Chapter 1. Computational Psychiatry's Coded Past
Chapter 2. Talking Heads: Brains, Bodies, and Vocal Biomarkers
Chapter 3. Do Androids Dream of Electric Speech?
Chapter 4. Listening Like a Computer
Conclusion. An Ironic Dream of a Common Language
Acknowledgments
Dissertations are truly collaborative documents, and many people have labored with and alongside me to bring this particular document into the world. First, I acknowledge the funding that made this project possible. Then, I acknowledge the friendship. Research for this dissertation was supported by a Society for Psychological Anthropology/Robert Lemelson Foundation Fellowship in 2015; a Dissertation Fieldwork Grant from the Wenner-Gren Foundation in 2016; and a Doctoral Dissertation Research Improvement Grant from the Cultural Anthropology Division of the National Science Foundation in 2016. Special thanks to Jeffery Mantz for his help and encouragement throughout my fieldwork years. Writing for this dissertation was supported by a Weatherhead Fellowship from the School for Advanced Research in Santa Fe, New Mexico, where I was in residence from 2018 to 2019. Immeasurable thanks are due to my ethnographic interlocutors, whose trust, friendship, conversations, and insight form the basis of this dissertation. I would not have been able to critically read their technologies (and the nature of our collaboration) without their guidance and teachings. To my game-changers and confidants: thank you for giving me the honor of uttering your own critiques for you. Thank you for helping me feel at home while also challenging me to question my surroundings.
During my undergraduate years, Roy Kamada and Murray Schwartz went out of their way to mentor me and offer up their precious time to review my (at times overly) ambitious writing. Roy taught me how to write an abstract, and how to love reading in a new way. Murray introduced me to psychoanalytic theory. It was in meetings with both of them after graduating college that I dreamt up the idea to pursue anthropology. I completed an MA in Anthropology at Brandeis University, and several faculty members deserve thanks for training me and building me up into the scholar and thinker that I am today, particularly Elizabeth Ferry, Anita Hannig, Janet McIntosh, and Richard Parmentier. The friendship of Katherine Morely Eramo and Olivia Spaletta likewise played an important role in my time at Brandeis. Katherine's support and encouragement sustain me still. To use my friend Danielle's phrase, thank you to everyone who has continually sent me postcards from the outside world in the years leading up to and during my PhD at MIT, helping me to get outside of my own head and enjoy the world around me, including and especially your company. I thank my cohort-mates, Richard Fadok, Clare Kim, Lauren Kapsalakis, Alison Laurence, and Peter Oviatt, for establishing an atmosphere of collaboration, candor, and compassion from the start. I feel grateful to have made this journey alongside you all. Beyond my cohort, at HASTS, I thank Marc Aidinoff, Renée Marie Blackburn, Ashawari Chaudhuri, Grace Kim, Steve Gonzalez, Shreeharsh Kelkar, Crystal Lee, Jia Hui Lee, Lucas Mueller, Canay Ozden-Schilling, Tom Ozden-Schilling, Luisa Reis Castro, Elena Sobrino, Mitali Thakor, and Claire Webb. Elena and Grace's names are worth repeating: both came to my aid during medical emergencies. Outside of MIT, I have made several colleagues and friends along the way whose thinking and influence are palpable in these pages.
I thank especially Marisa Brandt, Danielle Judith Carr, Anar Parikh, Nick Seaver, and Luke Stark. In New Mexico, my fellow fellows and the interns at SAR (scholars, activists, artists, mentors, and friends) made me laugh, made me feel loved, and helped me to find the spirit to keep writing. I thank all of them for sharing their ghost stories, jokes, recipes, scholarship, organizing work, and lives with me during our nine months of residency: John Arroyo, Monika Banach, Gio B'atz', Ixq'anil Banach-B'atz', William Calvo, Nick Estes, Mayanthi Fernando, Felica Garcia, Frida Garcia, Terran Last Gun, Samantha Tracy, Melanie Yazzie, and Wilma Yazzie-Estes. Several friends and loved ones have been by my side since long before I began studying anthropology. I thank them for helping me to face the world with openness and excitement, and to imagine other, possible futures for myself and for so many others. These people include Meryl Bennett, Kayla, Hillary, and Laurie Fortin, Hannah Nyren, Claudia Kretschmer, Melissa Siebert, Alexandra Tate, Chelsea Thomas, and Tau Zaman. Alexander Kranzusch produced the illustrations used to describe the technologies discussed in Chapter 3. The mentorship of Graham Jones at MIT has fueled me throughout the six years of my PhD. Graham's intellectual creativity, his kindness, his encouragement, and our marathon meetings and phone calls have kept me afloat and have shaped this project in invaluable ways. Likewise, I thank my committee for their scholarship, their input, and their generosity and openness: Stefan Helmreich, Heather Paxson, and Amy Moran-Thomas. Several other faculty members at MIT played a significant role throughout the PhD, both in helping me finesse my project and in producing scholarship that informs my own, including Dwai Banerjee, Erica James, and Robin Wolfe Scheffler.
The development of this project has also benefited immensely from generous, generative conversations (in person or otherwise) with several scholars whose work has also inspired this dissertation, including Felicity Aulino, Nick Harkness, Matthew Hull, Alaina Lemon, Michael Lempert, Natasha Schüll, and Jason Throop. I had the honor of briefly meeting Chuck Goodwin, whose 1994 article, "Professional Vision," changed the course of my thinking when I read it as an MA student. We spent the day talking and walking together as if we had known each other for years. His immediate faith in and excitement over my project meant the world to me. His passing during the writing of this dissertation affected me deeply. Much of my dissertation is aimed at honoring work that is vital to the production of scientific knowledge but often goes unnoticed: namely, the work of administrative laborers. It would be remiss of me, then, not to thank the various administrative workers who have held things together for me. Thank you to Karen Gardener, an advocate and a friend, for the many big and small ways in which you helped me complete the PhD. Many thanks as well to Carolyn Carson in STS, and to Irene Hartford, Barbara Keller, and Amberly Steward in Anthropology. I thank my family for their unending patience and their unending care, in both its real and para-forms. This includes my cat, Millie Semel, for keeping me company during late nights and early mornings of writing and reading. I thank my twin sister, Sarah, for giving me a first-hand experience in theorizing resemblance, and her partner, Sam Levine. I thank my older sister, Hillary, for her artful eye and caring heart, and for pushing me to be the best teacher I can be. To my parents, Donna and Scott Semel, I truly owe everything. My mother, a former speech therapist, taught me how to listen. My father, a lawyer, taught me how to make a good argument and how to love sci-fi.
With love and gratitude, I also thank my aunt, Lisa Semel, and her husband, Jonathan Guthart, along with my cousins: Scott, Mercedes, Amanda, and Brandon Holtzman, and Eileen, Jake, and Sara Wasserman. My grandmother, Joan Semel, passed away before I could complete this dissertation. She often bragged that I was going to become the first doctor in the Semel family. I hope to continue to make her proud. Last but certainly not least, it is difficult to find the words to adequately thank my partner, Ryo Morimoto, for nurturing my ideas and nurturing me, for cheering me on when I needed it the most and when I didn't know I needed it at all. You are my biggest inspiration, my favorite thinker, and my favorite person.
INTRODUCTION
"'Yes,' said Steamer [...] 'we have great plans to use information theory to augment psychiatry. I'm sure you know that the tone of peoples' voices tells a listener a great deal about their emotional state. We have recorded some speech from a psychoanalytic interview and by infinite clipping have been able to remove all the emotional content. By processing what we remove, we expect to be able to identify those characteristics that carry the emotional information.' [...] 'What are you going to do now?' asked George. 'We're going to more sophisticated processing, but we're still looking for ideas,' said Steamer. 'Do you think I could get a thesis out of this work?' 'Of course! This stuff really strikes people's imagination. We have working arrangements with several psychiatrists in town, and you could help them in unraveling this business.' 'I would think,' said George, 'that one should know something about the nature of speech before taking on such a project.' 'Maybe so,' said Steamer, 'but remember you're an engineer, not a phonetician or a linguist. You could attack the problem from an engineering viewpoint.'" - (David, E.E. Jr. 1962. "Bionics or Electrology? An Introduction to the Sensory Information Processing Issue." Pp.
74)
It is early in September of 2015, and I am sitting in an achromatic conference hall on the top floor of MIT's Media Lab among rows and rows of folding chairs. One of the room's giant windows offers a view of the Charles River, and across the water, the tops of skyscrapers glint in the morning sun as conference attendees file into the room. Students in flip-flops and cargo shorts share elbow space with technology company executives and start-up employees in business suits and blazers, mostly from the Boston area but some from as far away as South Korea. We're gathering in this grey and black room, after having picked up our nametags and a bag of promotional gifts and pamphlets, to listen to the same thing: the opening plenary of the first-ever Emotion and Artificial Intelligence (AI) Summit, sponsored by a company called Affectiva. Affectiva was born out of a collaboration between a former MIT Media Lab student and Rosalind Picard, a computer scientist responsible for establishing the field of "affective computing" who runs a research group of the same name. Affective computing is dedicated to building computers and algorithmic systems that can interpret and respond to displays of human emotion (Picard 1995; 1997; 2003). Many consider Professor Picard a pioneer for insisting that emotion is not opposed to reason and instead plays a key role in the "intelligence" that computer scientists seek to replicate in technologies meant to aid and assist in human activities. Affectiva wraps principles of affective computing into the development of software packages that offer, as the company website states, "insight into unfiltered consumer responses to ads, videos, and TV programming," and, most recently, responses to the user interfaces in autonomous vehicles.
The cover of Affectiva's promotional pamphlet shows a photograph of a Black woman smiling, framed by a yellow square to indicate that Affectiva is analyzing her resplendent face and capturing proof that whatever product she is viewing brings her great joy. Affectiva specializes in automated image recognition: their software packages rely on a camera to pick up small movements in people's facial musculature as they watch a commercial, interact with a product, or sit at the wheel of a self-driving car. An algorithm, "a sequence of computational steps that transforms the input into an output" (Cormen et al. 2009: 5), calculates the statistical relationship between the movements of the user's face and entries in a database of facial expressions that a human has labeled with an emotion, drawn from a set list of possible emotions that another person has assembled. By Affectiva's definition, this means that their products can autonomously "recognize" human emotions. According to Affectiva, the goal of the Summit is to explore "how Emotion AI can move us to deeper connections with technology, with business and with the people we care about." The Summit's mix of commercial, corporate, and academic audience members is a familiar one for me. I've just returned from my fieldwork with groups of psychiatric and engineering professionals collaborating to build voice analysis technologies for psychiatric screening, and I attended similar workshops and conferences during my twelve months of sustained participant-observation. It isn't until the opening plenary begins that I realize just how close the Summit, and Affectiva, will come to my fieldwork. The company's founder takes the stage to reveal a project that has been years in the making, one that they will be demonstrating live for the first time ever: voice analysis. Affectiva hopes to use voice analysis to strengthen their existing prototypes.
At the podium, the head of the company explains that the voice adds another layer of emotional data, rich with information about a consumer's response to the world. I have encountered a variety of other voice analysis and detection systems throughout my fieldwork. The designers, funders, makers, and stewards of these systems build them to recognize vocally expressed interior states based on how a person sounds rather than the content of what they say. For the demonstration, the head of the company calls her colleague to the podium, and he joins her to narrate a story about attempting to cook a turkey for Christmas dinner, their faces projected on a multitude of screens hung throughout the room so that everyone in the audience can witness the technology at work. Their faces are also framed by tiny square outlines like the woman in the promotional material, but instead of yellow, the squares are pink and blue: blue for the male speaker, and pink for the head of the company, a woman, apparently indicating that the software can also detect gender. The story is banal and lighthearted, with a twist: the turkey explodes in the oven, startling everyone in the house, especially the cook. As he speaks and the head of the company listens, small script letters appear on the screen next to their faces, emotion words that more or less coincide with the tragi-comic arc of the turkey tale: happiness, humor, surprise, a flash of humiliation, happiness. These words and their immediate appearance as the story progresses are meant to indicate the prototype at work. Taken together, the demo gives the impression of immediacy, that their states are known and displayed in real time, almost as if they are being directly translated from words to feelings, as if insides have been turned out. It is a convincing and persuasive demonstration. But like many of the other demos I have witnessed, it comes without a discussion or explanation of how it all works.
What the demo shows (that Affectiva's new prototype could recognize how the man sounded, tracing the emotional contours of his voice as he told the story) is the punch line, the self-evident point of the drama. As Lucy Suchman notes, the demo is a distinct genre of performance in the human-computer interaction (HCI) sector. "Like other conventional documentary productions," she writes, "these representations are framed and narrated and instruct the viewer in what to see" or to hear (2007: 237-238). The demo is one of many rhetorical devices that thread together the analogy between the computational processes underlying automated technologies and human behavior, supporting the human-likeness of the technology while strategically leaving out the humans whose judgment, sensing, and choices enabled its functionality. In the process of building human-like technologies (like software that can detect the emotional texture of a person's voice, only better, faster, and more accurately than a human ever could), technologists and their collaborators must articulate and concretize their ideas about what it means to be human. From my vantage point in the audience comes a series of questions that I have only learned to ask from the people I have worked among and studied, my ethnographic interlocutors, after having observed and assisted them with building voice analysis technologies for psychiatric screening in the context of academic studies, and after having stood in an exhibition hall alongside them to give strategic performances of our own, showcasing all the prowess, and none of the pitfalls, of their technological prototypes. What dataset did they use to train the algorithm, that is, to determine what counts as a happy-sounding voice, a surprised-sounding voice, a humiliated-sounding voice? Did they build their own corpus, asking paid volunteers to emote vocally, and then have another paid volunteer listen to and label excerpts of this emotive speech?
Or did they use one of the many pre-existing "emotional" speech corpora that have already been labeled, and that usually consist of actors performing emotions? Did their dataset consist only of speakers of American English? Could the software work just as smoothly and convincingly with a person who did not speak English as their first language? What exactly about the speakers' bodies and voices put them in the categories of "male" or "female"? Who made the call as to what counts as "male" or "female" facial features or vocal qualities? How might the system respond to speakers who do not live inside neat, bounded, binary boxes of gender, like trans or gender non-conforming people, or anyone else who stands outside of what MIT critical computer scientist Joy Buolamwini (2016) might call Affectiva's "coded gaze," beyond the "embedded views [and voices] that are propagated by those who have the power to code systems"? By the afternoon, my questions are still unanswered and I'm beginning to grow sleepy. I try to keep myself alert with a complimentary bar of chocolate laced with espresso beans, wrapped in sky blue paper featuring a drowsy-faced emoticon that declares AWAKE in block letters. I want to keep my eyes (and ears) open for a panel I've been anticipating, entitled "The Future of AI: Ethics, Morality, and the Work Force." For this panel, the moderator presents panelists with a series of ethical conundrums, asking them to explore how these speculative fictions relate to problems that AI and computational technologies currently present. The first ethical scenario takes the form of a trolley problem, a classic thought experiment in ethics that presents the audience with a choice over whose lives to sacrifice to a runaway trolley hurtling down one of two possible tracks. In this story, an autonomous vehicle with a driver asleep behind the wheel, careening toward a girl chasing a ball, takes the place of the trolley.
It is only when the moderator begins reading the second scenario that I snap to attention, sitting straight up in my folding chair for a story that is both familiar and strange:
The year is 2024. The country's last 50 remaining truck drivers converge on Washington to protest the loss of their jobs to robots. They block Connecticut Avenue and they drive their rigs onto the mall, where they are joined by a small army of unemployed accountants, nurses, adjunct professors, Wall Street analysts and journalists, all of whom have been put out of work by AI. The Washington Police Department considers sending robo-cops or high-pressure hoses to disperse the protesters, but in an uncharacteristic act of compassion, instead send in a phalanx of mental health clinicians, especially trained to be sympathetic to people in distress. The clinicians, of course, are robots. AI's supposed to make as much as 38% of the U.S. workforce obsolete within the next coming decades. Is this outcome inevitable, and if not, how do we prevent it?
On the surface, the conundrum involves the impending threat of job loss due to advancements in AI that enable machines to perform tasks that humans once performed, like gauging a patient's blood pressure, turning on a high-pressure hose, or delivering the nightly news. The panelists' answers both lean into and contest that fear, offering perspectives that span the spectrum from perilous evil to potential good. One panelist argues that human obsolescence is the inevitable conclusion of a society under capitalism. When the bottom line is the expansion of production in pursuit of economic growth, he quips, it is only a matter of time before employers dispose of human labor in favor of more cost-efficient automated labor that does not require bathroom breaks, cannot become pregnant or injured, and will never ask for higher wages. On the more optimistic side, another panelist insists that emotional AI can help democratize access to psychiatric care.
Where she grew up in the Middle East, there are more patients than care providers, and even the most sympathetic and hardworking of nurses become burnt-out, hardened, and jaded. What if we had nurses or clinicians, she wonders, who oversaw 10 or 20 or 100 mental health robots or avatars or virtual assistants? They could use an interface to control these human-like technologies from afar to conduct triage work, helping the human doctors manage their caseload by determining which patients are in direst need of care. The human nurse, she muses, would only get tapped if the system says it's a big deal; otherwise, the avatar takes care of the patient. The ironic denouement of the story is that the clinicians are not human, suggesting a future in which even therapy, the provision of sympathetic, psychiatric care, can be mimicked and performed by a human-like machine. The slight gasp from the audience and the subtle smile of the panelists indicate the story's success. The plot twist plays with the figuration of the machine as a foil for the human and operates through a time-worn Cartesian binary: humans possess the spark of spirit, of the psyche, and therefore the capacity for intersubjectivity, whereas machines are inert matter, unaware and lacking consciousness. How might it be possible for a robot to quell a political uprising, soothe the angry masses, or mend the wounded psyches of people whose professions are no longer valued? Nevertheless, what stuck out to me about this story, and what set me sitting rigid in my folding chair, was not the robots but the absence of a particular set of humans from the fabulated, dystopian protest. Nurses (administrative healthcare workers) careen down Connecticut Ave. alongside truck drivers, journalists, and adjuncts, a brigade of professionals that the McKinsey Global Institute projected in 2017 to be the most at risk for eventual job replacement due to automation in the United States (Manyika et al. 2017).
But if the clinicians administering sympathetic care are, "of course," robots, then where are the human clinicians? If the robots have been deployed to act like clinicians, then why are the clinicians not protesting?
As I will show in this dissertation, the answer relates to the demonstration that came before it: automated speech analysis, especially speech analysis in the context of psychiatric encounters between patients and care providers. These two moments at the Summit link Affectiva, and the broader contemporary U.S. milieu that is replete with machine listening technologies, to my informants and to the technological prototypes, sociotechnical imaginaries (Jasanoff and Kim 2015), and ideas about speech, mind, and self that my interlocutors pursue at one turn and contest at another. The two moments represent the major topics that I treat in this dissertation: language, listening, labor, and care. My interlocutors' research projects are sites through which these topics intersect. My aim is to sketch out a theoretical framework for thinking through this critical nexus.
"DO YOU THINK I COULD GET A THESIS OUT OF THIS WORK?"
I begin with Affectiva both because its approach is so different from those pursued by the groups with whom I conducted fieldwork and because of the striking similarities they share. These similarities and differences help to illustrate the scope of my fieldwork, the stakes of the research projects it focuses on, and the larger theoretical themes that studying these projects ethnographically has led me toward. Conducted between 2015 and 2017, my fieldwork followed three interlinked, interdisciplinary research teams of psychiatric and engineering professionals at three different U.S.-based universities collaborating to develop automated listening technologies for psychiatric assessment, sometimes referred to as psychiatric screening.
This included a twelve-month span of consecutive fieldwork, during which I spent four months at each of the three universities, working as a research assistant on the teams, conducting interviews, and participating in and observing activities that spanned the research and development pipeline, while also attending weekly group meetings, courses, conferences, workshops, and symposia with my informants. I played a hands-on role in, among other things, developing experimental stimuli used in the studies (even co-writing and acting in a film), creating training manuals and leading training sessions for incoming lab members, and revising grant proposals and article drafts. Spread across the United States, the teams at East Coast University (ECU), West Coast University (WCU), and Midwestern University (MWU)1 are all part of the same network. They have shared academic pedigrees, and individual team members know each other. Some have trained together under the same supervisors, who are also on the teams, and they run into each other at conferences if they are not already speaking on the same panel. All three teams are working on technologies like cell phone applications and software packages that can be installed in a variety of user interfaces, including humanoid robots. While each team focuses on a different diagnostic category (depression at ECU, post-traumatic stress disorder at WCU, and bipolar disorder at MWU), they share an overarching goal. The teams design their devices in order to connect the sounds of speech with what they take to be people's inner, psychological states, states they hope to access by attending only to the acoustic qualities of speech rather than its linguistic or semantic meaning. They strive for their technologies to analyze only what are sometimes called paralinguistic components of speech: e.g., pitch, energy, rate of articulation, breathiness.
1 I use pseudonyms throughout the dissertation to refer to institutions, people, and technologies. The use of pseudonyms is a common practice in anthropology, employed to respect and protect the privacy and anonymity of research subjects. This is especially important given that many of my interlocutors expressed critical opinions and attitudes toward their research projects, even as they worked within them. Anthropologists studying science and technology have adopted the convention of anonymizing the names of graduate students but not those of PIs, given the precarious nature of graduate students' position within the academy, and given that the PIs of the studies tend to be big names and prime movers in their fields whom it would be pointless to try to anonymize (see Gusterson 1996; Sunder Rajan 2006; Roosth 2017). While the PIs with whom I worked will be recognizable to each other (the small, shared space of a sub-sub-subfield), they are not as recognizable to the general public as the PIs in some of these other ethnographic studies. For these reasons, I have chosen to keep them anonymous as well.
Their aim is to reconfigure the conventions of a crucial practice in U.S. mental health care: psychiatric assessment. Like the entire enterprise of Western psychiatry itself, psychiatric assessment is suspended at the level of semantic meaning. There are no blood tests for mental illness, no thermometers. There are only conversations. Many of my informants argue that the technologies they are developing to do the sorting work of psychiatric assessment will make the process more objective by changing the way that speech is interpreted, transforming speech from personal narratives into neurobiological signs and using AI to circumvent semantic meaning and overcome all of the subjective (and cultural) things about language.
They also argue that these technologies will save money and time, and will spare mental health care workers from burning out in emotionally laborious jobs, although quite a few of my other informants are cynical or skeptical about the success of their endeavors, even as they work toward this goal. As noted, unlike Affectiva, the teams are all situated in the academic realm, and they develop their prototypes and research findings in the context of academic studies. Also unlike Affectiva, the teams must have their research approved by their universities' Institutional Review Boards (IRBs), bodies that oversee and regulate research conducted with human subjects in accordance with federal standards established in 1981 and revised in 2019. The researchers must adhere to their IRBs' ethical protocols and bureaucratic requirements, which aim to ensure safe and non-coercive informed consent and to protect the anonymity and confidentiality of research subjects, while minimizing the harm that participation might incur.²

² 45 CFR Part 46. 1981. (HHS and FDA 1981.) https://www.hhs.gov/ohrp/regulations-and-policy/regulations/common-rule/index.html

Because Affectiva sells consumer products rather than health care interventions, it has no such oversight with which to contend. Under a neoliberal model of consumer choice, it is up to the user to decide whether or not they agree to Affectiva's Terms of Service, and once they click submit, the user (and their data) is at Affectiva's whim. In a scientific, academic study, the goal is not to sell a product or to grow financially. The goal is to produce knowledge, although knowledge production has a price. Researchers must seek out grant money and other forms of funding to sustain their work, and, as scholars such as Scott Vrecko (2010) have observed, trends and changes in funding institutions shape and transform the path of a team's research and the nature of the facts about mental illness that its studies ultimately produce.
At East Coast University, the team relies on federal funding, seeking out and securing grants from federal agencies like the National Science Foundation (NSF) or the National Institutes of Health (NIH), and focusing its efforts on basic science research aimed at contributing to biomedical understandings of mental illness. At West Coast University, in addition to federal research funding, researchers rely on military funding, which is abundant but inconsistent, and the team must petition every year to have its funding renewed. Their prototype must have a dual-use component: it should serve military and civilian populations alike. By contrast, the Midwestern University team's primary source of funding is philanthropic. They appeal directly to individual philanthropists and non-profit organizations to keep their research going, and as a result, they focus on building technological prototypes with a societal impact (specifically, improving upon the treatment of mental illness), the success of which can be articulated in non-technical terms.

My informants' specific focus on mental health care also sets them apart from Affectiva. Studying mental illness and developing technologies to intervene on one of Western psychiatry's longest-standing issues, the subjective nature of psychiatric pathology, means that my interlocutors, unlike the employees and technologists of Affectiva, must confront human suffering, sometimes indirectly and other times face-to-face. The people who create and curate the teams' databases (the corpora of audio-recorded speech that form the basis of the teams' algorithmic systems), as well as the intended end-users, are a vulnerable population: people who live with and alongside mental illness, either with a formal diagnosis or somewhere at the "subclinical" level, between the cracks and gaps in America's conventional diagnostic, nosological infrastructure.
Like other citizen-subjects living under conditions in which the retreat of state-sponsored social services (like health care) forces them to seek out alternate means of accessing the resources that sustain their wellbeing (James 2004; Petryna 2009; Nguyen 2010), the research subjects tend to be vulnerable and disenfranchised in multiple ways. Alongside their mental health issues, research subjects tend to be unemployed, on disability leave, veterans, recovering addicts, or people experiencing homelessness: the kinds of people who have time to spare during weekday working hours, and who need the money (or other resources, like access to an internet-enabled smartphone) that they can earn by participating in the study.

Like the feature that Affectiva revealed during the Summit's morning keynote, my informants are trying to build automated technologies that can assess a person's inner state, with an emphasis on pathological affective states, based not on what they say (the semantic content of their utterances) but on how they say it (the acoustic, formal properties of their utterances). Across the three teams, the same basic principles and premises support their research. The teams seek to treat speech not as a linguistic practice but as a motor activity. They study speech not as a sociocultural narrative but as a biomechanistic output, a sound that contains information about the source that created it: the speaker's brain, or at least their psychological state.³ Put differently, with reference to composer and theorist Michel Chion's three modes of listening (1990: 23-35), the teams approach psychiatric speech through reduced listening (attending to sound's formal qualities and characteristics) in order to enable causal listening (attending to a sound in order to ascertain its source).
In so doing, they seek to press the meaning out of speech, circumventing altogether the components of speech that linguistic anthropologists assert make speech interactional and cultural. The overarching aim of their studies is to capture something pre-linguistic about the activity of producing oral speech: something universal and grounded in the biological realm, rather than something particular to an individual. The Principal Investigators (PIs) of the teams argue that changing the way speech in psychiatric settings is listened to and interpreted, with attention placed on sound rather than meaning, will aid in the identification of objective indicators of mental health.

At the same time, in order to achieve this scaling feat (moving from the most finite, fine-grained scale of language to the most universal possible scale of human nature), the teams, just like Affectiva, need to assemble a data set, and they need to classify items in the data set. These are the prerequisites for an algorithmic system that can calculate the statistical similarity between a known item in the data set and some unknown, novel item. In other words, in order for an algorithmic system to "recognize" features of speech associated with psychiatric states, at some point the creators and stewards of that system need to set parameters and definitions for how these states sound. In the case of my informants, the data set consists of excerpts of research subjects' speech. To build and then classify this data set, the teams must make a series of interlinked choices.

³ As discussed in Chapter 2, their research is genealogically entangled with the history of telephony and telephone engineering in the United States, which is itself indebted to the development of experimental phonetics and d/Deaf education, both of which birthed the assumption that speech "could be exhaustively investigated as a purely mechanical process" (Mills 2010: 38).
How will they measure (and define) mental illness, and the three different diagnostic categories on which they will focus? What kind of speech do they want to elicit from research subjects, and how will they elicit it? Will they engage the subject in a conversation, or record a conversation that the subject has with someone else? Once the speech is elicited and recorded, how will they qualify (or quantify) it? Whose job will it be to determine how the speech sounds? My dissertation showcases the variety of ways in which each of the teams answers these questions, and the stakes of their answers with regard to the ideas about being human, being mentally ill, and language that those answers reify and reproduce. At the same time, I show that regardless of the many different ways these questions can be answered, the teams all found themselves grappling with the same fundamental issue. While their eventual goal is to build a system that circumvents the semantic aspects of speech, in order to build that system, they must engage with the very sociocultural dimensions of language that they seek to overcome. The connections between spoken utterances and inner states that their systems perform "autonomously" and automatically, like the system demonstrated at Affectiva's Summit, depend on a tightly managed infrastructure of human labor that includes both research subjects and members of the research team. I follow Ekbia and Nardi (2017) and other scholars of science and technology in asserting that it is not very productive to think of automation as autonomous, as machines doing things without human intervention. Instead, it is much more productive, and indeed much more accurate, to discuss automation as heteromation: a mixture of human and machine work.
Considering automation as heteromation (as humans doing things with machines, although the humans are not always easy to find) allows us to investigate automation anthropologically, and to investigate why these humans are so difficult to find. As Lilly Irani reminds us, "claims about automation are almost always claims about kinds of people" (2017). Heteromation as an analytic can guide us in pulling apart the seams that suture automation to categories of the human.

To a certain extent, the association between qualities of speech and specific diagnostic categories is a part of professional psychiatric wisdom. The mental health care practitioners whom I interviewed as part of my fieldwork (people whose views on my primary informants' technological aspirations I sought out) agreed for the most part that "everyone knows" depressed people speak more slowly than the average person, while people experiencing mania or under duress speak more quickly. Indeed, audible changes in vocal qualities (especially changes in the pacing of the voice) figure among the diagnostic criteria for several categories codified in the Diagnostic and Statistical Manual of Mental Disorders (DSM), until recently American psychiatry and psychology's authoritative classificatory field guide. Calling upon the tools and techniques of their engineering colleagues, my informants seek to use AI-enabled techniques of pattern recognition to distill this wisdom about the connection between sounds and states, and then to automate it. Their research is motivated by epistemological, public health, and personal concerns alike. Like the panelist at the Summit, some of my informants believed an automated triage system could lighten the load of an over-burdened care system in which a handful of practitioners juggle caseloads that number in the hundreds.
Many of them expressed a genuine desire to help mental health care workers and their patients, oftentimes because they themselves had suffered, or a friend had taken their own life, or their father had been sick for years, or their brother had to live within the walls of a psychiatric institution. They wanted to offer a hands-on, actionable solution to a heavy, structural issue (the inaccessibility of mental health care) using the tools and the disciplinary angle that they knew.

The epigraph with which I opened this chapter, a passage from a 1962 article written by a senior member of the Institute of Radio Engineers (IRE), testifies that just as the idea that "the tone of peoples' voices tells a listener a great deal about their emotional state" is not novel, neither is the notion that the "emotional content" of speech can be separated out, processed, and distilled into information. In David's article, the author tells the fictional story of George Lance, a graduate student in electrical engineering on the lookout for a doctoral thesis and a way to combine his love for electronics with his interest in biology. His laboratory director refers him to Professor Pseudomorph, the head of the new, interdisciplinary Psycho-Systems Information Center. Pseudomorph waxes with prescient poetics about "the implications of perceiving machines, which can pack the learning of millions of millions of lifetimes into only a few hours," pining for the day "when all the really important decisions will be made automatically by a machine with remote sensors to sample the world" (74). The first colleague to whom Pseudomorph passes Lance along is Dr. Steamer, head of the Neuro-psychiatric Information Group, who tells Lance about his collaboration with psychiatrists. Dr.
Steamer's research closely resembles my interlocutors', and he tells Lance something that many of my interlocutors often argued: to conduct this research, expertise in language, or even in psychiatry, is not a requirement. From "an engineering point of view," which takes all components of human life to be governed by the same essential principles that can be described mathematically, psychiatry and linguistics are merely domains of knowledge that can be read about in a book or an article, and then concretized in the algorithmic system one builds. Disciplinary expertise is a feature on which to train a system. David's 1962 article is a parody; the author is displeased with the "hoopla" born of the "cross-fertilization of engineering and the life sciences" (David 1962: 75). Whether or not the melding together of psychiatry and engineering indeed produces hoopla, it is as old as the professionalization of engineering itself,⁴ and these days, the cross-fertilization is more common than ever. Collaborations like the ones I study, among psychiatrists, neuroscientists, psychologists, engineers, and computer scientists, have proliferated to the point that some have suggested they make up a new subfield altogether, called Computational Psychiatry. The subfield now has its own journal, published by MIT Press. Proponents and practitioners of Computational Psychiatry integrate the tools and techniques of engineers (such as machine learning, the great-great-grand-kin of the "perceiving machines" that Pseudomorph describes) to solve the problems of psychiatry, such as the lack of biological markers for diagnosis, which makes it difficult to determine which patients are gravely ill and in need of care versus those who are less ill and those who might not be ill at all.
Some of my informants position AI-enabled techniques of pattern recognition as a panacea for addressing American psychiatry's epistemological problems and its public health problems in one fell swoop, while others recognize them to be a temporary, overly optimistic band-aid. Taken together, my informants' research projects offer a case study in one way of doing Computational Psychiatry, in part by illustrating the self-reflexivity and heterogeneous attitudes and affects of the various actors involved.

⁴ A year after the article was published, the IRE merged with the American Institute of Electrical Engineers (AIEE) to form the Institute of Electrical and Electronics Engineers (IEEE), the self-identified world's largest technical professional organization. See

HETEROMATION AND GENRES OF THE HUMAN

The fable of the robot clinicians told at the Summit resonates with ongoing public conversations about the value of human work. Conversations with my informants as I watched and worked with them to build an algorithmic system, getting a behind-the-screen look at the conditions that make the basic functioning of these technologies possible, challenged my own assumptions about automation (the mechanization of a process once performed by humans, the supposed removal of human intervention). I decided to open this introduction with the story about the clinician robots because it points to something that became a key feature of my ethnography: there is a hierarchy of value within the mental health care professions, and while much fear revolves around the automation of therapy (a medico-legally ratified form of psychiatric care), there is less fear (and far less ink spilled) about the automation of assessment. If "the clinicians are, of course, robots," what are we to make of the fact that the human clinicians are not disturbed enough by this to attend the protest?
On the one hand, maybe the clinicians are, as the panelist from the Middle East had imagined, operating the robots remotely, and will treat a protestor directly only if that protestor is in "true" need of care. But there is another potential reading of this science fiction: perhaps the human clinicians are not there because the kind of work the robot clinicians are doing (the work of psychic triage, sorting the ill from the well) is not their territory. The clinicians are not out on the frontlines treating protestors (presumably free of charge) because they are in their offices, sitting by a patient recumbent on a couch who is paying out of pocket for therapy because the clinician does not take insurance.

My dissertation speaks directly to the uneasy position and decreasing value of psychiatric screening and other administrative tasks within psychiatry, especially with regard to the role of listening as a key, interactional feature of psychiatric assessment. I show that the devaluing of psychiatric screening is part of a larger trend toward devaluing gendered, racialized, and classed administrative and service labor (the work of nurses' assistants, medical technicians, and custodial cleaners) in U.S. health care (Nakano Glenn 1992). Set against technical, skilled, quantifiable work that can only be performed by a credentialed expert after years of training, psychiatric assessment is positioned as custodial and unskilled, as depending on "soft" qualities that can't be quantified and that are supposedly part and parcel of the basic equipment with which all humans are born: the capacity to listen empathically. I also show how the very notion of what it means to be empathic, to listen empathically, is wrapped up in ideas about the relationship between speech and self, and mind and language, and torqued by ideas about gender, race, ability, and class.
The division of labor within the teams reinforces the low position of psychiatric assessment and its attendant tasks within the hierarchy of mental health care professions. Yet this work is essential to the eventual technological prototypes the teams seek to produce. In order to create technologies that listen beyond the human, they must rely on humans listening. Thus, there is a paradox at the center of their efforts that researchers, regardless of their disciplinary training (in psychiatry or engineering), grappled with: in order to build the algorithmic infrastructures that would make their technologies possible, researchers had to constantly fall back on the language practices (and the linguistic labor) conventional to psychiatry, the very same practices their technologies were supposed to efface. As I ethnographically tracked the process of gathering the data and building the infrastructure that is foundational to the whole undertaking of using AI to automate assessment, the technologies started to look less and less like a kind of deus ex machina, and I became more attuned to the sometimes subtle critiques my informants were making of their own projects. As I followed the day-to-day practices of the people who make automation possible (and the people who make it seem automated rather than heteromated), the technologies started to look instead like a microcosm of long-standing dynamics of power and authority within the psychiatric professions and mental health care in the U.S.

In this dissertation, tracing the remaking of psychiatric assessment ethnographically will lead us into fundamental questions about what it means to have language and, by extension, what it means to be human. As Lucy Suchman has contended, the machinic components of human-like machines display "a kind of doubling or mimicry...that works as a powerful disclosing agent for assumptions about the human" (2007: 229).
In Lilly Irani's words, "hierarchies of value have long overlapped with hierarchies of gender in the technological imagination" (2013: 733). More often than not, the figure of the human against which human-like machines are positioned is exclusionary rather than all-encompassing. In turn, the figure of the machine (either passive servant to human desires, or unruly agent threatening to overthrow its creators) falls along historical, colonial fault lines. Ruha Benjamin (2019), with reference to Sylvia Wynter (2003), writes that "our very notion of what it means to be human is fragmented by race and other axes of difference," and although the category of the human operates as a universal moniker, there are in fact "genres" of the human that include "full human, not-quite-humans, and non-humans" through which "racial, gendered, and colonial hierarchies are encoded" (31). Trying to pin down the image after which the human-like machine is made can help us pull apart and decipher these codes. We can read the "artificial" (the non-human, the mechanized, the inert machine) for what it says about the skills, value, and expertise bundled together with certain kinds of tasks and not others.

In this regard, the following questions, posed by Suchman (2007), also motivate this dissertation: "what figures of the human are materialized in these technologies? What are the circumstances through which machines can be claimed, or experienced, as human-like? And what do these claims and encounters tell us about the particular cultural imaginaries that inform these technoscience initiatives?" (229). These questions are particularly pressing in the context of communication technologies that are supposed to resemble but also improve upon aspects of communication: technologies that are supposed to listen like humans while also listening beyond the human.
Parsing the logics and techniques of resemblance and likeness with regard to language can help us make sense of the semiotic bundling of the visual and the aural (for instance, "looking like a language, sounding like a race"; Rosa 2019) in order to better understand, among other things, the language ideologies underpinning the gendering and racialization of both language and listening (see also Eidsheim 2019).

This dissertation draws heavily from feminist and anti-racist STS scholarship that re-centers into analytic view the materially grounded labor practices that make high-tech, flashy, and "innovative" technologies like computer chips and cell phones possible. For instance, scholars like Donna Haraway (1991) and Lisa Nakamura (2009) emphasize that the manual labor of women, especially women of color, has largely fueled and yet remains marginal to the massive manufacturing enterprise that enables the tech giants of Silicon Valley. To keep this marginalized labor in mind is to keep in mind that digital technologies are always the outcome of digital work, meaning, as Nakamura puts it, "the work of the hand and its digits" (2014: 932), and to keep in mind that computation is made possible not only through software but through the alignment of "wet-ware and fleshware" (Philip et al. 2012: 19). These interventions are especially crucial when it comes to digital media technologies, precisely because these technologies might otherwise seem so immaterial, given the immediacy of the connections they enable and their codes, clouds, and screens that mediate away, and make less available for scrutiny, the bodies and the work that went into producing them. Drawing attention to this otherwise marginal digital labor and to the fleshware of software dissolves what Astra Taylor (2018) calls "the ideology of automation, and its attendant myth of human obsolescence."
When we search for the digital work that undergirds the technologies my informants build, we find what I call linguistic labor: the work of giving the impression that one is listening empathically and carefully, of strategically encouraging the sharing of personal details, or of listening for suicidal ideation. It is precisely the erasure of this labor that enables the illusion of machine autonomy, that makes heteromation look like automation, and that allows the notion of "machine listening" to make sense as a mode of listening that is distinct from, superior to, and set apart from human listening. As my ethnography will show, in practice, in the effort to make machines listen, the divisions between human and machine, and between human listening and machine listening, waver and break down.

SPEECH, SIGNAL, SYMPTOM

This dissertation offers a critical science and technology studies (STS) approach to both psychiatry and the communication sciences in the United States. It shows, processually, how facts about language are contingent, assembled, and require work to be held steady, including the very fact that language can be transduced and distilled down into signals. An ethnographic approach is uniquely capable of locating contingency in the production of scientific facts, while also avoiding a narrowly technologically determinist take: the view that it is the technologies that recognize speech and identify connections between speech sounds and interior states, and the technologies that are capable of replacing human labor tout court.
Participant observation, interviewing, and learning alongside and doing things with my interlocutors allow me to study these efforts to remake and remix psychiatric assessment "as a dynamic practice between human and machines" (Thakor 2018: 9), one that hinges just as much upon gut instincts, structural inequality, tacit knowledge, and longstanding ideas about the nature of language as it does on the nuances of code, mathematical processes, and psychiatric inventories. With reference to Langdon Winner's (1980) question as to whether or not artifacts have politics, disability studies and STS scholar Mara Mills (2011) poses a corollary question: "do signals have politics?" My dissertation explores this question ethnographically, focusing on people whose central professional concern revolves around looking for, defining, and transducing signals, moving them from one medium to another: from the oral, to the auditory, to the informatic, to the bureaucratic. Like Mills (2011a), I assert that the signal is a material-semiotic object (an idea and a thing) with its own history and social life, as well as an actor's category wrapped up in discipline-specific concerns and epistemologies. On the one hand, my informants' use of the term "signal" in their everyday talk and in their public presentations of their work (the speech signal, behavioral signals, and so on) is a reference to signal processing. Signal processing is a subfield of engineering concerned with identifying and extracting information "from the sonic environment for transmission down to a limited number of channels" (Mills 2011a: 332).
Forged through the coalescence and overlapping histories of cybernetics, D/deaf education, information theory, and the development of the telephone, the field of signal processing focuses on methods for transforming auditory phenomena into objects that can be mathematically modeled "over time and through circuits," thus allowing for further modification and modulation (Sterne and Rogers 2011: 32). The signals of signal processing are "electrical 'carriers' of other signs, encoded transmitters of messages (these codes often obtaining from the quantified information content of the message)" (Mills 2011b: 81). For my interlocutors, many of whom are trained in speech signal processing and employ its methods in their research projects, the presence of mental illness is one potential message that "the speech signal" carries. The signal is the smallest kernel of meaning, that which really matters in a stream of sensorial stuff, the component of the message which must be preserved in order for the message to remain meaningful. In this way, the signal of signal processing is wrapped up in questions of value.

On the other hand, rather than seeking out a single, definitive definition of the signal (and a definitive answer to the question: if signals have politics, then what are their politics?), I seek to explore this definition ethnographically, investigating the elaboration of the signal's politics in practice. To ask my informants about signals is to ask them what they care about. What are they after? Again, what do they value? As Kockelman puts it, "what is noise for you may be signal (or meaning in place) for me" (2017: 140). Thus, I follow Seaver's (2017) methodological tactics for studying algorithms in place (algorithms being another material-semiotic assemblage with multiple meanings and disciplinary, historical legacies), rather than seeking out stable, sterile, and unitary definitions.
In other words, the point of my dissertation is not to propose a theory of what a signal truly is, either in general or in the specific instance of my informants. As Seaver asserts, technical people (just like us!) "do not maintain the definitional hygiene that some critics have demanded of each other" (2017: 3). Signals mean (and can do) many things, sometimes contradictory and sometimes overlapping, for my informants. This is especially the case given the disciplinary differences among engineering, signal processing, psychiatry, and clinical psychology, the primary fields of expertise that make up my interlocutors' interdisciplinary teams. For some of the researchers working on the teams, the end goal of their study is to produce research findings and/or technological prototypes that will remake psychiatric assessment semiotically, re-tuning the interpretive valence of the encounter between speaking patient and listening health care worker. Speech analysis technologies shift the terrain of the signal-noise relationship: semantic content (what the patient says) recedes into the background, while paralinguistic form (how they say it) takes the place of semantic meaning as the sought-after signal. This shifted terrain torques the normal, hegemonic ideologies privileging the referential function of language that circulate in spaces of power in the United States, from the law, to the church, to science and biomedicine.

The trichotomy speech, signal, symptom forms an indexical chain. Supposedly, among people experiencing mental illness, vocalized utterances (speech) contain signs of mental illness that are detached from meaning, that exist within the smallest possible grain of speech as sound, even below the level of the phoneme.
Speech, signal, symptom corresponds to nodes along the research pipeline: from the elicitation and gathering of data (speech), to processing, discarding, categorizing, and organizing (signal), to analysis and the production of scientific facts about language and mental illness (symptom). My dissertation examines the transductive labor that happens at each of these nodes, attending to the labor it takes to move the medium of language across multiple media. This includes the labor of research subjects, whose vocalized utterances, brains, bodies, and memories form the substrate from which the researchers draw conclusions and attempt to build interventions. Research subjects, many of whom are actively mentally ill, produce the "assistive pretext" of my interlocutors' technologies, the "resourcing of disability within technoscience" (Mills 2010: 39). My interlocutors' intellectual ancestors, telephone engineers and information theorists, resourced D/deafness in developing their theories of the signal and building these theories into communication technologies. Likewise, my informants resource their research subjects' experiences of mental illness. Listening is the common thread that cuts through and across these territories. Speech, signal, and symptom are strung together in association with each other through different modes of listening. The conviction that listening can be a form of medical treatment, ethical engagement, and empathic care is a key cultural legacy of North American psychiatry. At the same time, my interlocutors' research is motivated by a sense that listening in the context of psychiatric encounters has failed the discipline, has failed the family members and loved ones of mentally ill people, has failed the mentally ill themselves, and has even failed the nation as a whole, inasmuch as the treatment and management of mental illness is a concern of the state.
Their alternative to the conventional, and fallible, tactics of listening in psychiatric contexts is machine listening, which, like the signal, has a variety of definitions and enactments. Sushant, the PI of the research team at East Coast University, once remarked to me that machine listening is "just like human listening." Instead of an ear and a brain, there is a microphone (taking the place of the human ear) and a computer (taking the place of the human brain). Ideas and enactments of machine listening pose a related question: what is human listening? How is the machine of machine listening figured against, and through, ideas about human listening? Exploring the meaning of machine listening in practice thus raises questions of epistemology, ontology, and ethics. For instance, what are the ethics of attempts to machinically outsource the decision-making labor of psychiatric assessment, which amounts to decisions about whether or not someone deserves more professional attention and care? The dissertation explores what it means to bring together communication sciences, engineering, psychiatry, and, to an extent, social work, and to apply the tools and techniques of computer scientists (signal processing, big data analytics, AI-enabled techniques of pattern recognition) to a psychiatric problem. Like the professor of the fictional George Lance, my interlocutors would often reference the unique capacity of their engineering backgrounds to approach the study of mental illness. But to study mental illness "from an engineering perspective" also requires engineers at varying career stages (from the most novice to the most senior) to contend with things that people in psychiatry typically contend with. They must face the realities of living with mental illness head-on, confronting human suffering in a way that at times hits painfully close to home and that makes questions of professional responsibility, ethics, and care unavoidable.
Thus, questions about care (What does it mean to care? Who should care? Whom does caring include, but also exclude, or even harm?) became crucial, unavoidable features of my ethnography as well. Arranging expert researchers, psychiatric practitioners, and research subjects together within the confines of an academic study creates situations in which caring for and about research subjects seems like the right thing to do but is nevertheless institutionally wrong and disciplinarily incorrect. The purpose of the teams' technological prototypes was to conduct psychiatric assessment, not to provide psychotherapeutic care or even an official medical diagnosis. Likewise, the researchers who gathered, listened to, and categorized research subjects' speech were not mental health care professionals; they could not provide sanctioned, official care. The distinction my interlocutors made, and asserted again and again, between therapy and assessment was in many ways about drawing boundaries around what counts as care, even as they participated in practices that also seemed like care. Indeed, just as quickly as they would assert that they could not provide care in the context of their study, and that their technology could not provide care, they would discuss the extent to which participating in the study afforded subjects the chance to feel cared for by feeling listened to. Day-to-day research practices were charged with this tension between listening and care, empathy and responsibility. Without totally absolving my informants, or myself, of complicity in sometimes ethically fraught practices, the dissertation also suggests that their work points to a broader "ethical soundscape," to use Charles Hirschkind's term: a milieu in which control, surveillance, good intentions, and resistance cannot be easily disentangled.
METHODS: RESEARCHERS AS RESEARCH SUBJECTS, FIELDWORK AS HOME-WORK
Before delving into some of the larger theoretical concerns that motivate the dissertation, and describing the trends in U.S. psychiatry that drive efforts to develop automated speech analysis technologies for psychiatric assessment, I review the methods I employed in this study, including the justification for my site selection. I also discuss the organization of labor within the teams and my positionality within them as they relate to my ethnography, along with my own positionality with respect to biomedicine and the health care system in the United States. I selected the three university-based lab groups in order to capture differences related to four variables: the intended use of the technology being developed, the makeup of the research team, the source of funding involved, and the institutional affiliations and academic careers of individual team members. The three fieldsites offer opportunities for contrastive comparison while, taken together, they combine to tell a bigger story. Each site is a node in a larger, interrelated network of engineers collaborating with psychiatric professionals to augment the encounter between potential patient and mental health care provider in the context of psychiatric assessment. Because it is concerned with the analysis of interdisciplinary collaboration within groups, and the comparison of practices, ethics, and ideologies across groups, the dissertation is both comparative and multi-sited (Hannerz 2003; Marcus 1995). I use the terms "informants" and "interlocutors" interchangeably throughout the dissertation. I find "informants" fitting because of its resonances with "information," given my informants' commitments to models of language that have their origins in information theory, and given engineering's and psychiatry's own entanglements with informatics and computing.
Nevertheless, many anthropologists have adopted the term "interlocutor" to describe the people with whom they have worked and lived, and from whom they have learned, in an attempt to avoid the associations of espionage and extraction that come with "informants." In other words, they seek to work past and reject anthropology's history of colonial projects of state-making, development, military intervention, and occupation. "Interlocutor" is likewise a fitting term for people who are concerned with the nuances of communicative interaction in psychiatric encounters, and with replicating and simulating those interactions in the data collection portions of their studies. "Interlocutor" also implies an exchange: that we were in conversation with, and mutually learned from, each other. My own ethnographic pursuits and the development of voice analysis technologies for psychiatric assessment require related tactics: establishing rapport, interviewing, and recording speech that circulates far beyond the context of its utterance and is analyzed in ways that the initial utterer may never have anticipated. We therefore ran into similar ethical quandaries: how could we truly protect the anonymity of our research subjects? While de-identifying data (for example, by using pseudonyms) offers some protection of privacy, in an interconnected world (of researchers who all know each other, and in which ubiquitous data gathering is a feature rather than a bug), how much anonymity could either of us really promise? At the same time, the power dynamics of my encounters with my informants were never a settled matter. PIs of studies at academic institutions, military officials, and so on have more power and influence, and far more resources, than a graduate student in a social science field. Keeping them anonymous is a means of protecting myself by downplaying my association with them. Yet many of the graduate students and undergraduates whom I encountered were not U.S. citizens.
I conducted my fieldwork in the midst of the Trump administration's travel ban on people from Muslim-majority countries, which affected the lives and families of many of the people with whom I worked. Keeping them anonymous is a way to avoid meddling with their careers and their immigration status. I did feel that my fieldwork was extractive, just as my interlocutors questioned the extractive, exploitative nature of their relationships with their own research subjects. I use "informant" and "interlocutor" together, then, to keep these uneasy, shaky power dynamics and unanswerable ethical dilemmas always in view. By using the terms interchangeably, I hold them and their various associations in tension with each other, as a reflection on the interplay of extraction, transparency, trust, and paranoia that suffused my fieldsites. Just as anthropology must always contend with its colonial past (and its enactments in the present), to use both of these terms at once is to sit with the uneasiness that my fieldwork, and the researchers' own projects, trafficked in. I hope doing so will shed further light on why exactly these things make us uneasy.
My fieldwork followed the day-to-day practices associated with building and testing psychiatric speech analysis technologies and tracked how researchers represent and promote their technologies to media outlets, in grant proposals and journal articles, and at public demonstrations, conferences, and workshops. After undergoing human subjects research training, I was added to the research teams' institutional review board (IRB) protocols so that I could participate in, observe, and, when permissible, make audio and video recordings of daily research activities. These activities ranged from planning and preparation (e.g., making experimental stimuli to be used in studies); piloting (e.g., pretending to be a research subject); data gathering (e.g., conducting brain scans, interviewing research subjects); data processing (e.g., sorting, listening to, and labeling audio recordings); data analysis (e.g., calculating agreement between data labels, building predictive models); and dissemination (e.g., presenting at academic and public venues). I helped to develop training materials, brainstormed with my interlocutors on how to revise their grant proposals, and socialized with them, both within their labs and outside them, at local bars, restaurants, birthday parties, and goodbye parties. As I discuss below and in more detail in Chapter 2, a good portion of my fieldwork involved troubleshooting and maintenance tasks. In addition to participant observation, I conducted person-centered interviews with key members of each research team. I began transcribing audio and video recordings and coding my fieldnotes during fieldwork. According to Summerson Carr, "a linguistic anthropological method assumes that culture and its many institutional forms and formulas manifest in semiotic interaction rather than simply controlling and containing it" (2010b: 27). To explore the linguistic and semiotic ideologies that researchers elaborate in their efforts to build speech analysis technologies for psychiatric assessment, I focused on researchers' talk and metalinguistic discourses about listening, language, mental illness, and care, both in conversation with me and with each other. When my interlocutors consented, I audio-recorded our day-to-day activities and conversations in lab meetings, in our individual offices, or after we had attended events, talks, and conferences together. I also audio-recorded individual interviews when the researchers consented, although quite a few of them did not. This led to several off-the-record conversations in which the researchers reflected frankly on their own feelings about what it meant to record their own research subjects' speech.
Their discomfort with having our conversations recorded often spoke to their disquiet with the ubiquitous surveillance and data capture that participation in their own studies entailed. Their ethnographic refusal[5] formed yet another critique of the very research practices that they forwarded and participated in. When they consented to be part of my ethnographic study, I shifted them into the position of research subject. In this way, my awkward meta-position within the team (helping them study other people, but also always studying them studying other people) helped me to heuristically pin down people's ethical limits and beliefs, which they might not otherwise have voiced. This was not quite studying up, nor even quite studying sideways. The terrain between us was constantly in flux, and as I struggled to get my bearings, my interlocutors pushed me to think more self-reflexively, and humbly, about the ethics of ethnography itself. The three teams had the same organizational structure, and as a research assistant, I was embedded on one of its lower rungs. Thinking reflexively about my position within the teams helped me to better understand the extent to which the labor that powered the teams' projects is specific to U.S. psychiatry (as I discuss in Chapter 1), but also gendered and (as I discuss in Chapter 3) raced. Across the three teams, the leaders, the PIs, sit at the top of the hierarchy. Typically, there is both an engineering PI and a psychology or psychiatry PI.
[5] See Simpson (2007) and Benjamin (2016) for discussions of refusal in which the power dynamic between researcher/investigator and researched/investigated is top-down (i.e., either in the context of anthropologists studying native populations, or in the context of research subjects, patients, and tissue donors agreeing to allow others to access their individual, somatic data).
Underneath the PIs are post-docs, who play a supervising role and delegate tasks to the people below them: grad students, undergrads, and research assistants like me. Finally, there are staff members: employees of the university who provide administrative support to the team. Engineering team members tend to be male and higher up on the team; most of the PhD students and post-docs were men. Psychiatry or psychology team members tend to be women and lower on the team, such as research assistants and staff. Team members on the psychiatry side of things also tend to be the "face" of the project, the people who interacted face-to-face with research subjects the most. Because I lack training in psychiatry or engineering, many of the things I ended up doing as a research assistant were things that higher-ranking team members lacked the resources or time to do but needed to get done. These included listening to and labeling voice data, changing the sheets on the mattress that subjects rest on while getting their brains scanned, and co-writing a script for, and acting in, a video created for an experiment. My position as a novice working alongside other less experienced or credentialed researchers and staff gave me a better understanding of the tasks that were considered menial or busywork (things that anyone could do). A common thread uniting this busywork is that it is primarily social work: work that involves soft skills and is stereotypically feminized, from tasks that were overtly domestic (like making the brain scanner bed) to the more subtly feminized, including tasks revolving around extracting speech data from research subjects and monitoring the content of their speech, a form of work that I call "linguistic labor" and will expand upon throughout the dissertation.
Because of the custodial and administrative position of these kinds of tasks vis-à-vis the tasks that more established and senior team members conducted, I often felt that the research activities that consumed my day from 8am to 5 or 6pm were non-essential or even tangential to the aims of the teams' projects. I would often think of a scene from Hallam Stevens's historical ethnography of the encroachment of informatics into biomedical research (2013). Stevens describes the slick, glossy, expensive-looking building in which researchers meet and sit in front of their computers. This is, for all intents and purposes, the public-facing image of bioinformatics: imposing buildings with glass walls that give the impression that the science going on inside is both important and accessible, impressive and worthy of sustained funding. Yet there is another building, also connected to the same bioinformatics research project, which offers a different picture: this building is drab and industrial, a worn-out warehouse with flaking paint and small windows. Within it, Stevens finds assembly line-style and automated machinery, technicians tending to the machinery, and janitorial staff. Stevens argues that both of these buildings are part of, and necessary to, the larger research project. I often felt as if my immediate fieldwork was more closely related to the warehouse than to the glossy building: hidden away from view, monotonous, and unglamorous. It was only upon reflecting on my fieldwork, years later, that I began to fully grasp and internalize Stevens's argument, and realized that my own devaluing of the work my interlocutors and I performed played into the idea that the "real" work of science is mental rather than physical.
A performance studies approach to science and technology might refer to these two buildings, and the categories of work they represent, as the front stage and the back stage of science; the back stage, which is less visible, is dedicated to coordinating and managing the image and performance of the front stage (Hilgartner 2000). However, this kind of analysis insinuates that there is some true, authentic place where "real" science is happening. It re-inscribes the less glamorous, custodial kind of labor as merely in service of the more impressive, ethereal realm of thought, rather than recognizing that both are valid and necessary to the production of knowledge. While I did not have immediate access to the processes and procedures of data analysis, the access I did have, to more mundane, domestic work, helped me to better understand the ways in which the making and doing of science is distributed across thinking and practice, machinery and bodies, technical expertise and embodied, tacit experience.
Early in my graduate career, a professor at another university remarked to me that doing fieldwork "at home" is incredibly isolating. I reflected frequently on the ways in which my fieldwork was homey, familiar. I am a U.S. citizen, a settler; this place, sometimes referred to as the United States, is my home. Like many of my informants, I was completing my PhD as I conducted fieldwork. My fieldsites themselves were located in offices, on the top floors of hospitals, in classrooms, in libraries, on campus green areas, in cafeterias, and so on: spaces I inhabit as a graduate student, and spaces that feel safe for me as an educated white woman with class privilege. The boundary between "the field" and "home" was porous.
That being the case, I also recognize that the notion of doing fieldwork "at home" threatens to reify the boundary between home and field, which anthropologists since Gupta and Ferguson (1992) have argued is indebted to anthropology's colonial legacy and perpetuates the siting of fieldwork as somewhere radically other, with radically Other subjects. Moreover, as Kamala Visweswaran argues, searching for the ways in which we feel "at home" while in "the field" lays the groundwork for a feminist method of conducting fieldwork as home-work (2003). Home, as she writes, "once interrogated, is a place where we have never been before" (2003: 113). Indeed, there were uncanny echoes, familiar but strange, that kept my fieldwork and my own life inseparably close. That is to say, as someone who lives with a chronic illness and chronic pain that arises for no good reason and debilitates me when it does, I found my fieldwork at times deeply personal, sliding from ethnography, to auto-ethnography, to ethnography again. Studying biomedicine while also having to lean on it and push myself through its tangled systems (with the support of my family and loved ones, no less) meant that the stopping point of fieldwork was difficult to place. Fieldwork melted into homework, and the two fused together even more tightly as I wrote the dissertation and became more ill. In my own increasingly frequent encounters with medical specialists, assessments, pain scales, sensors, sometimes hollow attempts at "bedside manner," and consent forms that I had to review, correct, and then sign, I have developed a closeness with my informants' research subjects, a wounded affinity. Due to the double bind of my informants' IRB protocols and my own IRB protocol, I was unable to interview or record detailed data about the research subjects with whom I interacted, directly and laterally. Still, they are ghostly presences in my dissertation, present in their absence.
The details of their lives and of their experiences as subjects, which I heard and listened to but abstain from writing down, have continually pushed me to keep the more violent dimensions of care in view. I take my illness, and the biomedical zones of authority, surveillance, and uncertainty it brings me through, to be a form of feminist praxis, one that is central to my theorization of diagnosis, language, and the body. Being ill, in pain, and studying diagnostic systems afforded me a kind of sixth sense about the bureaucracy of the health care system and the hierarchies of clinical labor, one that sometimes escaped the hand of language. For instance, moving through my fieldwork, I would just have a feeling that there was something going on about gender or race, or detect a taste of ableism, even though it took me years to be able to articulate the evidence underlying this sensation. My own medical experiences have helped me to develop tactics for reading between the lines of the cultural myth of biomedicine, namely, that biomedical illness categories correspond directly and completely with lived experiences, that they name finite things, and that they can offer finite, tangible, and linear solutions to sickness. They have also taught me to read IRB protocols as pragmatic documents, as one way of doing ethics, rather than as all-encompassing protective measures. Maya J. Berry, Claudia Chávez Argüelles, Shanya Cordis, Sarah Ihmoud, and Elizabeth Velásquez Estrada's (2017) poignant, co-authored essay reflects both my own experiences and the tactics I have deployed in crafting this ethnography: "we are not merely conducting research, but are connected to the places where we work through familial ties, diasporic relationships, and investments in political struggles, all of which hold us accountable even after our departure. Our relationship to our research thus subverts the assumption that the field inhabits an/Other time-space, as well as the masculinist notion that the time-space of the Other is to be instrumentally penetrated and evacuated. Our entrances and exits do not hinge on geographical border crossings. In a sense, the field travels with and within our bodies" (Berry et al. 2017: 540).
THERAPEUTIC TALK AND LINGUISTIC LABOR
Linguistic anthropologists have argued that the enterprise of Western psychiatry has been a privileged site for the enactment of cultural assumptions about the nature of self, mind, language, and health, reflected most legibly in the verbal practices of "talk therapy" (Carr 2010; Peräkylä 1995, 1998; Wilce 2009). In their most Freudian form, American psychiatric practices of treatment, diagnosis, and assessment operate under the assumption that absolute linguistic transparency is impossible, and that the therapist is uniquely capable of arriving at the patient's secret, occluded desires through symbolic analysis of speech (Reik 1948, 1964; Vehviläinen 2008). Marsilli-Vargas (2014) suggests that psychoanalysis constitutes a "genre of listening" that circulates as a framework for setting an interpretive context and guiding how expert interpreters "tune" their ears. On the other hand, and under the influence of Carl Rogers's "client-centered" approach, countervailing tendencies in specifically American psychiatric practice emphasize self-realization through therapeutic talk, suggesting that a speaker can agentively locate, articulate, and actualize a true self in therapeutic discourse (Smith 2005; Carr & Smith 2013), sometimes even at odds with the clinician. Thus, Carr has shown that, at least in the context of American addiction treatment centers, while clinicians wield a considerable amount of power in the psychiatric encounter, patients can subvert and strategically flout the interpretive frameworks clinicians seek to impose (2010b: 23; 2010).
Talk-based psychiatric encounters in the United States reflect connections that linguistic anthropologists have established between Euro-American language practices and ideologies of mental transparency (Jones & Schieffelin 2009), which contrast sharply with ideologies of mental opacity prevalent in Pacific societies (e.g., Rosaldo 1982; Schieffelin 2008; Throop 2010). Hegemonic Euro-American language ideologies have been found to privilege the denotational function of language (Silverstein 2012) and to imagine that speech signifies by referring to a speaker's intentions (Duranti 1993; Keane 1997; Silverstein 1998). Paradoxically, speech analysis technologies appear to pursue an ideal of transparent inner reference by circumventing the semantic, referential dimension of language altogether, emphasizing indexical properties that, linguistic anthropologists contend (e.g., Silverstein 1985), are often minimized in spaces of power in the United States, such as the legal arena (Mertz 2007), Christian missionary encounters (Robbins 2008; Keane 2008), and the sciences (Gordin 2015).[6] As Stasch (2008) contends, claims to linguistic opacity or transparency are always political in nature. The technologies the three teams hope to produce entail a rearrangement, or perhaps an intensification, of the asymmetrical terms and power dynamics of the patient-clinician encounter, in which the patient's speech is made more transparent than the health care worker's. Developing the technologies in the context of research studies also requires that research subjects' speech be scrutinized in ways that the subjects themselves may never have anticipated. Lower-level researchers within the teams in particular are tasked with listening intently to research subjects' speech while also trying to remove, downplay, or strip away its semantic dimensions: the narrative, personal content of their utterances. At the same time, while the purpose of their research is to identify markers of mental illness in the sounds of speech that are wrapped up in biological universalism, linguistic anthropologists have shown that qualities of the human voice like pitch and intonation, beyond the denotative function of speech, have a variety of culturally elaborated meanings (Harkness 2013; 2015). The three teams' attempts to develop vocal diagnostic technologies promise to be particularly rich case studies in this regard, since in the process of designing their technology and conducting research on it they attribute value to specific vocal qualia. Altogether, my informants' research projects are potent sites at which language ideologies (ideas about how language works) are assembled and ratified, even as they are contested, bent, and pushed to their limits.
[6] While these ideologies are dominant and linked to institutions of power, they are not the only ideologies in circulation in the United States. See, for example, Claudia Mitchell-Kernan's paper on "signifying and marking" within African-American communities of speakers (1972). Shaka McGlotten (2016) also discusses reading and shade as distinctly Black, queer signifying practices. Reading and throwing shade involve stridently yet subtly insulting one's interlocutor. The sting emanates from what need not be said about a person. The referential function of speech is poetically torqued and twisted, and the speaker recruits other paralinguistic cues to make their point. Writes McGlotten, "In Paris Is Burning, Dorian Corey describes it this way: 'Shade is, "I don't tell you you're ugly, but I don't have to tell you because you know you're ugly."'...[throwing shade] does not require any specific enunciation to deliver an insult; rather, it uses looks, bodily gestures, and tones to deliver a message" (McGlotten 2016: 265, 279).
Moreover, dominant ideologies play out in much messier ways in the on-the-ground practice of psychiatry. Ethnographies of psychiatric diagnosis in context have shown that clinicians tend to take a pragmatic approach to language in psychiatric encounters. Lorna Rhodes (1995) has shown that, especially in low-resource public health contexts like emergency psychiatric hospitals, clinicians do not use diagnosis and assessment as lights that illuminate the inner truths of a patient's psychosis and correspond one-to-one with their symptom expression. Instead, they diagnose patients with bureaucracy in mind, strategizing about how to move a patient through the health care system in a way that grants them access to the resources they need, whether medication, psychotherapy, or confinement in a ward bed. My dissertation focuses on the development of psychiatric technologies prior to their distribution in clinical settings. While I do not study clinical encounters, I am interested in probing my interlocutors' imaginaries about the lives and afterlives of their prototypes, and how these ideas about their prototypes' potentials motivate the very models of language, mind, and human difference that get built into them (see Taussig, Hoeyer, and Helmreich 2013). Moreover, describing moments in which research subjects refuse, subvert, and jam the data collection process offers a kind of speculative fiction of how automated psychiatric assessment might be resisted and/or reformulated to better serve the patients and people it is designed to interpellate. Just as psychiatry in the United States reproduces dominant language ideologies, so does speech signal processing. These two sets of ideologies coalesce in my interlocutors' research, sometimes in competing, conflicting ways. Speech signal processing forwards a motor theory of speech, in which speech is one arc of a circuit connecting the brain, the muscles involved in the production of speech, and the sound of speech itself.
The three teams attempt to grasp hold of this part of the loop (speech) using an assemblage of recording technologies and human labor in order to follow it to the brain, which they frame as the locus of all human experience, especially the experience of mental illness. That is to say, the researchers are not setting out to prove that acoustic features of speech can be read as signs of mental illness, so long as they are listened to using the right technoscientific mediation, which they take as a given. Rather, they are trying to figure out what these features might be, and how they might be located in clinical encounters so as to render psychiatric assessment (determining which patients are mentally ill and which are not) more efficient. Although they require research subjects' speech to build their data sets, in the context of the studies the semantic components of speech are a decoy. They are after something more fundamental than the what of speech, something more foundational that hardly even looks like speech at all: sounds that can be described with reference to waveform analysis. Yet while the research projects turn on the notion that referential semiosis is fallible, and that mental illness cannot be represented linguistically, they must rely on language in their search for these acoustic, pan-human signs. They must participate in practices of rapport and trust building, depending on culturally legible ideas about speech and the self. This is where questions of labor also come into play. Whose job is it to elicit data (speech) from research subjects, and who works to render speech (sounds) transparent (or trans-sononant)? Should elicitation unfold in the form of a communicative interaction between two humans? Between a human and a machine? Between a human and a machine that is, secretly, controlled by a human?
Even though psychiatric assessment, the very genre of interaction they seek to automate, is figured against diagnosis as less technical, and even though my informants' attempts to automate assessment might seem to suggest that the work of assessment is less skillful than diagnosis, my fieldwork revealed that conducting assessment is indeed skillful, albeit undervalued, work. It is skillful because it requires the performance and display of a certain kind of listening subject position: an active, empathic listening subject who is attentive to the meaning and emotional impact of a patient's speech. This attentive listening is displayed through verbal and non-verbal practices aimed at maintaining social bonds, at sustaining trust and rapport, and at managing the emotional wellbeing of the speaker. Displays of active listening that manage the speaker's impression of how the listener is listening, and what they are listening for, are a crucial component of what I refer to as linguistic labor. While much of linguistic anthropology has focused on the production of speech, the reception of speech is just as viable an object of linguistic anthropological concern (Erlmann 2004; Feld and Brenneis 2004; Hirschkind 2006; Faudree 2012; Feld 2012, 2015). A larger intervention of my research is to take listening seriously as an ethnographic object, emphasizing that listening is not the passive uptake of speech but an agentive communicative practice. This means pointing out that language ideologies always contain within them listening ideologies: ideas about how speech should be auditorily attended to and ethically attuned toward, especially with regard to the speaking subject. Several other forms of labor that linguistic anthropologists have described fall under the umbrella of what I am referring to as "linguistic labor," which has both semiotic and linguistic components.
For instance, Miyako Inoue (2018) uses the term "verbatim labor" to refer to the work of ensuring faithful correspondence between word and text, spoken utterance and graphic (or other) representation, performed by workers such as stenographers, medical transcriptionists, and overseas call center operators.7 Linguistic labor also relates to what Wilf calls, with reference to Garfinkel's theory of interaction, "interactional homeostasis," or "the idea that participants in an interaction strive to maintain interactional order and compensate for interactional noise and disorder through negative feedback mechanisms such as 'repair work'" (Wilf 2019: 203). This includes efforts to ensure that the conversation unfurls "naturally," with a feeling of ease, along with attempts to build rapport and to avoid interactions that make the interactional partners feel uncomfortable (or to ensure that they feel so comfortable that they share private, intimate details about their lives). My interlocutors' projects are fascinating case studies in this regard, since part of what they strive to do is engineer (craft, fabricate, and sustain) a sense that all interactional partners are playing a symmetrical role in the encounter. In psychiatric assessment, the burden of making conversational partners feel comfortable and feel an affinity for one another is the interactional duty of the listener (the mental health care practitioner). In the clinical encounters that my informants simulate for the purpose of gathering their data, research personnel must likewise maintain the illusion that they are listening for linguistic content, even though this is not the interpretive locus of their technologies. 7 For a historical account of the gendered dimensions of early telephone operators' work, the "human switches" who manually connected calls through the removal and insertion of wires, see Kenneth Lipartito's 1994 article, "When Women Were Switches."
Linguistic labor is a kind of social repair work involving social reproduction: an interactional practice of custodial maintenance that reproduces the dynamics between active speaker and passive listener. Linguistic labor can involve an interactional partner enacting an emotional status (or intersubjective engagement with the semantic content of speech) through bodily gestures, positive minimal responses, or carefully crafted questions. My ethnography suggests that this work has both gendered and racialized dimensions. When they engineer rapport-building conversational agents, or deploy trust-building interactional strategies, my interlocutors draw on race and gender as resources for tuning the interactional partner's impression of how their speech is being taken up and interpreted.
THE CLINICIANS, OF COURSE, ARE ROBOTS
Popular discourse surrounding automation and human job loss often posits psychotherapy and other "caring" professional practices as the hard case against automation, the final stronghold. If humans leave the tending and mending of the psyche to robots and computers, the story goes, then this is a sign that humanity has collectively lost its ethico-moral way. The figure of the robot therapist heralds the end of intimacy, empathy, and thus, the end of authentically "human" care, because a robot can only provide a cheap parody of "the real thing" (see for example Turkle 2006; Turkle 2018). On the other hand, historian Elizabeth Wilson (2010) argues that people have long had therapeutic experiences that are artificial, simulated, and hinge on machinic mediation. She argues that the psychoanalytic encounter itself is an artificial one. The analyst works with the patient to simulate the relationships they hold in the outside world. The therapist is an avatar (a playable character, a stand-in) of authoritative figures in the patient's "real" life.
Throughout the dissertation, and in Chapters 3 and 4 especially, I take up Wilson's invitation to explore the artificial dimensions of mental health care. I do so by searching for the technological and the mechanical as distinct features of care rather than its abject doubles. This will help to clarify why certain caring professionals, like social workers and medical technicians, are not considered as "professional" as others, like licensed clinical therapists and scientific researchers. Even if robot mental health care workers are imitations of the real thing, this raises the question: what exactly is the "real thing" that humans have designed them to imitate? What is the political economy of "real" (call it human-to-human, or face-to-face) psychotherapeutic encounters in the United States? Who is in a position to receive "real" mental health care? Likewise, why exactly would the mental health services that an automated system might provide be such poor substitutes? If an automaton can never be a therapist, then whose care (care that is not quite enough) would an automaton in a mental health care context stand in for? Keeping in mind Suchman's arguments about the figure of the machine as a disclosing agent for the human, my dissertation seeks to explore how intimacy, empathy, and care have always been artificial: that is, crafted, fabricated, and animated by broader, intersecting histories and lines of power, rather than sentimental, individually motivated, and inherently good. The care of mental health care is a form of labor, but there is a hierarchy of value within the caring professions: at the top of the hierarchy is virtuous, empathic, and expert work, with mechanical, unskilled drudgery at the bottom.8 The organization of labor within the research teams, and the nature of the linguistic labor their technological prototypes required and performed, reflect and reproduce this hierarchy of labor.
My interlocutors could not legally provide therapy to their research subjects as part of the study. The subjects' participation was not meant to be a form of care. Likewise, the end goal of the study was not to produce a replacement for therapy, but an assistive technology to determine which patients should be getting therapy. Curiously, however, research subjects often reported something cathartic and soothing about their participation (something comforting in feeling listened to), even while they recognized that they were not participating in a ratified therapy session, and even while they understood that the interaction was machine-mediated (i.e., that their weekly phone calls with staff members were being recorded for further analysis, or that the animated character interviewing them through a screen was not a human but also not entirely a computer). For some, participation in the study opened up a small space for healing if not a momentary suspension of suffering. 8 Here, I am indebted to Evelyn Nakano Glenn's analogous observations regarding the raced and classed dimensions of social reproduction in institutionalized service work. Feminist scholars use the term social reproduction "to refer to the array of activities and relationships involved in maintaining people both on a daily basis and intergenerationally" (1992: 1). Nakano Glenn argues that, as the conditions of capitalism moved social reproduction outside of the home, creating the "service sector," white women's ascension to more masculinist modes of production (i.e., gainful employment) was indebted to and only made possible by Black women and women of color taking up the lower levels of the ranks, i.e., taking white women's place in the home as domestic care-takers. This division is replicated in the service economy, with white women holding managerial roles over Black women and women of color, who take up the "dirtier" work (data entry, cleaning, collecting blood samples, etc.).
In other words, though they were not receiving therapy, there was something therapy-like about the encounter that affected them (whether positively or otherwise). After all, given the demographics of the research subject population (veteran, homeless, disabled, living at or near poverty), participation in the study might have been the closest thing to (or at least most closely resembled) the kind of mental health care resource to which they had access: psychiatric assessment. Psychiatric assessment involves sorting potential patients into categories: people who are well or not sick enough to warrant further medical attention, and people who might be showing signs of psychiatric distress and are therefore in need of diagnosis, which is the official, medico-legal designation of an illness category. Because diagnostic categories are embroidered into the U.S. health care system, diagnosis is, in historian Charles Rosenberg's words, a "bureaucratic passcode" that grants one access to insurance-covered treatment, from medications to psychotherapies. Diagnosis is therefore a more authoritative and supposedly more technical form of clinical judgment that requires more training, credentialing, and licensing than conducting psychiatric assessment; only certain kinds of medical professionals can make a diagnosis. Nevertheless, diagnosis is not the first gate of entry into the U.S. mental health care system. Assessment is. Though assessment is a more informal triage process, it is a necessary precursor to diagnosis, an obligatory point of passage. In this sense, assessment is just as much about directing people away from mental health services as it is about directing people toward them, sorting the "high priority" cases from the low ones.
Therefore, psychiatric assessment (and the people who conduct it) plays an important role in resource-low public health settings, like emergency psychiatric hospitals, and for those who cannot pay out of pocket for their treatment. If diagnosis is a passcode, then assessment (sometimes called "screening"), within the hierarchy of clinical labor and medical judgment, is a CAPTCHA. An acronym for Completely Automated Public Turing Test to Tell Computers and Humans Apart, a CAPTCHA is a challenge-response test that verifies that a user is a human, rather than an autonomous piece of malicious software, before they can access a screen where they enter more sensitive and usually more personal information. CAPTCHAs can take a variety of forms, but the tasks are designed to be simple (though debates about their accessibility abound). The user copies down a warped series of letters and numbers, selects the squares that contain pictures of a storefront awning from an image overlaid with a grid, or clicks a box that says I'm not a robot. Similarly, even before a potential patient can come face to face with a clinician who punches in the Diagnostic and Statistical Manual code that gives them access to insurance-covered care, they must submit to another coded game of matching, but one that is positioned as requiring less expertise on behalf of the administrator: fill out a form, circle a number between one and nine, provide a short answer to one of their questions. I'm not a robot becomes I'm not a malingerer. I understood the technical differences between assessment and diagnosis, and in many ways, I continued to take these definitions for granted during my fieldwork. My informants spoke of the difference between assessment and diagnosis in a mundane way, in their conversations about their career plans to pursue more schooling, or in the life history interviews we conducted when they described to me their years of interning and training.
The difference between the two also came up often when they attempted to correct public misunderstandings of their technologies, or their misrepresentation by journalists in the popular media. When they presented their prototypes to the general public, as in a demonstration similar to the one I witnessed at Affectiva's Summit, they were constantly defending themselves against outraged audience members who accused them of trying to replace human therapists with machines. Their alibi was straightforward and made sense to me at the time: they were not trying to automate therapy. Automating therapy would be impossible. Only a licensed, trained, credentialed human therapist could (and should) conduct therapy. They were merely trying to automate aspects of psychiatric screening, the triage process. In conversations with my informants and during showcases of the technology, I would provide an alibi of my own: their technologies were essentially computerized psychiatric inventories, a form of survey based on DSM categories that patients fill out in order to determine if they are in need of care. It was only upon reviewing my fieldnotes, even after the Affectiva Summit, that I noticed the erasure both my informants and I had made: psychiatric screening tools do not assess a patient on their own. A human, as part of their job, psychiatrically assesses a patient using the tool. How had I been unable to remember the person on the other side of the assessment encounter, the person (typically a social worker or a psychiatric nurse) whose job it is to interpret the patient's responses to the assessment questions or to calculate their score? I had forgotten that conducting psychiatric assessment is a professional practice because, in the context of my fieldwork, the people occupying the position and conducting the work that a social worker or nurse might otherwise do were typically not professionals.
The members of the research team who interacted directly with research subjects, conducting interviews with subjects, managing their production of speech, or listening to and qualitatively judging their speech, were typically the members with the least amount of skill and training. Indeed, these are the kinds of tasks that PIs often assigned to me when I joined them as a research assistant: work that anyone, regardless of their credentials (or lack thereof), could conduct. Trying to automate psychiatric assessment honors the work of people doing psychiatric assessment, recognizing that what they do is draining and overwhelming. At the same time, to suggest that it is possible (necessary, even) for a machine to replicate the linguistic labor involved in conducting assessment devalues it and insinuates that it does not require the type of skilled, tacit knowledge that automated systems are incapable of capturing. What counts as care is a central question in my analysis. But insofar as psychiatric assessment (and the process of gathering data to automate assessment) is carefully bracketed from treatment, perhaps an even more central question is: what isn't care? My interlocutors' projects and people's responses to them (that they can provide cathartic release, but also that they are morally unconscionable) help me to parse the different modes and meanings of care, "unsettling care" as a stable analytic and troubling the notion that care is always innocent, always affectively motivated, and essentially human (Aulino 2012, 2016; Murphy 2015). In so doing, I call attention to what I call "para-care": practices that are care-like and care-ful but cannot be medically or legally ratified as care. Para-care is work that occurs at the margins and edges of biomedicine writ proper, even while it has (both affirming and harmful) impacts on its recipients, and even as it seems to closely resemble the official, formalized, and sanctioned care of biomedicine.
To closely follow one's IRB protocol and avoid administering treatment, or to avoid intervening when a research subject expresses suicidal ideation, are both care-ful practices. Researchers could "take care" with or without "caring for." As described in Chapters 3 and 4, for instance, some researchers ignored the details of subjects' lives as a means through which to refuse dehumanizing them. Para-care as an analytic can help better capture and describe this peripheral work, recuperating informal care-like practices that happen beyond and outside of the umbrella of credentialing or the bureaucratic structures and strictures of professionalization. Thus, I am less concerned about the hypothetical dawning of a techno-dystopic future in which automated systems "replace" human-led treatment or triage work, although I hope my dissertation shows the limitations of treating mental health care as a narrowly scientific, technological problem that can be "hacked," rather than a structural problem that has as much to do with power and capitalism as it does with facts, numbers, and knowledge. I am much more concerned with where the line between therapy and therapeutics (technical and mechanical, human and machine, care and not-care) is drawn. I am much more concerned with whom this line crosses over and erases in the here and now. Para-care helps to illuminate this in-between space, holding its occupants accountable while also uplifting their work as meaningful. For the boundary work of defining what care is and isn't threatens to devalue (and justify the defunding of) administrative professions within mental health care, and threatens to devalue (and dissolve resources for) the people who live on para-care as their primary form of treatment. By refusing to take care as "other to technology" (Mol 2008: 5), I attempt to repair the rupture between the two.
BIG DATA AND COMPUTATIONAL PSYCHIATRY
Attempts to use vocal qualities of speech to better understand the neural mechanisms of mental illness fall under a broader research paradigm called Computational Psychiatry. As Chapter 1 will discuss, Computational Psychiatry is more connected to prior epistemological eras in American psychiatry than it might at first appear. It is helpful, however, to briefly review a dominant narrative that ties the emergence of Computational Psychiatry to trends in American psychiatry to move away from the DSM and develop "novel" methodologies for studying mental illness. In this final section, I give a brief account of one such federally funded project, the Research Domain Criteria (RDoC), aimed at integrating technics and technologies from engineering and computer science in order to conduct research that will help psychiatry move away from DSM. The rise of Computational Psychiatry is, in many ways, a response to the RDoC project, and to increasing demands for alternative methods for understanding the connection between pathological brains and pathological states and behaviors. In September 2015, Thomas Insel announced that he would end his 13-year term as director of the National Institute of Mental Health (NIMH) and join the Life Sciences unit of Google, a position he subsequently left in 2017 to begin his own startup. Insel's tenure at NIMH had been marked by his controversial efforts to unseat the Diagnostic and Statistical Manual of Mental Disorders (DSM) as the field's paramount reference, the text through which American psychiatrists are trained to interpret their patients' symptoms or even identify what constitutes a symptom. DSM is the brick and mortar of a far-reaching "diagnostic infrastructure" in the US (Lakoff 2005: 256). Research centers and entire journals have been established for the express purpose of exploring a single diagnostic category (schizophrenia, bipolar disorder, depression, anxiety, etc.)
and these categories, for many patients, have come to define the way they live their lives and understand themselves. Nevertheless, like a growing number of researchers, Insel insisted that DSM's categories are insufficient because they are not based on any kind of biological measures linked to the underlying mechanisms of psychopathology, which remain poorly understood. In justifying his jump from the public to the private sector, Insel contended that the engineers of Silicon Valley possess exactly the skills that institutions like NIMH lack: the ability to capture and process behavioral data at an unprecedented scale, especially data that has never been studied in tandem with the presence of mental illness. According to Insel, if anything, DSM had entrenched the gap between basic science research and applied research, causing more harm than good in the process. Subsequent editions of DSM, including the most current edition (DSM-5), more or less resemble DSM-III in their structure and in their focus on observable symptoms rather than disease etiology. Editions of the DSM have been published with little to no recourse to research findings in neuroscience. Many point to the failure of DSM to absorb or reflect neuroscience findings as a mark of contemporary psychiatry's need for another, paradigm-shifting overhaul. For instance, in a February 2014 post to his Director's Blog on the NIMH webpage, in which he summarizes the new RDoC funding announcements, Insel declared that "Industry has reduced investments in medications for mental disorders and payers are raising questions about the quality of evidence for psychosocial treatments.
We hope that this new approach to clinical trials [RDoC] will set us on a course to having the science base necessary for generating effective new therapeutics and validating those we have now." In a co-authored article published the following year, Insel pointed to a number of other studies indicating that, despite advances made "in modern biology, especially contemporary cognitive, affective, and social neuroscience," along with advances in neuroimaging technology that enable the observation of brain activation and electrical brain activity, the American Psychiatric Association, which publishes the DSM, has consistently been unable to incorporate these findings into DSM, including in DSM-5 (Insel and Cuthbert 2015: 499). They cite World Health Organization statistics on deaths linked to mental health disorders, namely over 800,000 suicides each year globally, most of which were linked to mental illness (2015: 499), and suggest that these fatalities could have been avoided if more effective treatments were available. The inefficacy of treatments, they argue, is due to the fact that DSM does not describe biologically based pathologies. There is no guarantee, then, that treatments developed in studies conducted using DSM categories target any kind of biological mechanism, because there is no guarantee that people who share the same diagnosis also share some kind of biological likeness. For these reasons (among many others), Insel concluded that DSM is both an outdated research tool and an unethical one, insisting that "patients with mental illnesses deserve better" than what DSM can give them (Insel 2014). RDoC is the latest attempt to make psychiatric research more objective and anchored in biology. RDoC is not itself a diagnostic nosology.
Rather, it is a template listing domains of research that investigators can use to design and test hypotheses about the mechanisms of psychopathology, with, as Insel says in a 2012 post to his Director's Blog on the NIMH website, the "near-term goal" of restructuring and refocusing research away from DSM, despite the primary role the manual plays across multiple sectors of contemporary life.9 In a commentary piece on the difference between RDoC and DSM published in Nature Reviews Neuroscience, B.J. Casey, a developmental psychobiologist, and Francis S. Lee, a research psychiatrist, clarify that the purpose of RDoC is to "facilitate the translation of basic neuroscience research findings to clinical diagnosis and treatment," although the translation stage is expected to come much further down the line (Casey et al. 2013: 812). Instead of "working backwards" by trying to describe the neurobiological basis of DSM-dictated diagnostic categories, "the RDoC approach uses our current understanding of brain-behavior relationships as the starting point and relates these to clinical phenomenology" (ibid.). 9 http://www.nimh.nih.gov/about/director/2012/research-domain-criteria-rdoc.shtml, accessed on January 12, 2015. In a 2017 commentary piece in Nature, Insel invites researchers to "join the disruptors of health science" and leave academia for the technology industry, where there are fewer regulations and greater financial incentives (and resources) to move fast and produce results-focused interventions. Moreover, there are engineers. Engineers have the expertise for developing methods to capture and analyze vast quantities of data, data which is inaccessible in academia, where restrictions regarding privacy and confidentiality place limits on how much (and what kinds of) data can be gathered and stored. Within his start-up, Insel is a champion of "digital phenotyping," also known as searching for "digital biomarkers" (Carey 2019; Dagum 2018).
Digital phenotyping is one instantiation of Computational Psychiatry. Rather than building research studies aimed at producing an intervention on the basis of previous studies about the nature of mental illness, the data-driven approach of Computational Psychiatry turns on gathering as much data as possible, regardless of the data's relationship to conventional ideas about mental illness. Hence, Insel and others, including the Midwestern University research group described in Chapter 4, have turned to mobile phones as a research tool, focusing their analysis on the data users inadvertently transmit simply by using their phones (see also Brandt and Stark 2018). At surface level, Computational Psychiatry in all its various iterations rehearses the field's enduring biological essentialism and its positivist longings, with a twist. Its advocates strive to stabilize mental illnesses with an appeal to a mode of objectivity committed to reproducing the truths of nature passively, with as little intervention from the scientist's hand, mind, or heart as possible. Champions of Computational Psychiatry like Insel, who warmly accept its "biotechnical embrace" (DelVecchio Good 2002), seek to achieve mechanical objectivity through a "big data" approach. In theory, if data is gathered at a high enough volume (if it is "big" enough), patterns will emerge from the data, and correlations that have always been there, under our noses (or our ears), will become evident, correlations that might even cut across the conventional boundaries between diagnostic categories set down in DSM. Another larger aim of my dissertation is to challenge this notion of patterns emerging "on their own," a discourse of which many people who participate in big data projects are themselves critical.
I show the gap between the discourse of Computational Psychiatry and how things operate on the ground; this includes giving voice to research practitioners' heterogeneous beliefs and attitudes toward their work and its discursive promises.
STRUCTURE OF THE DISSERTATION
Chapter 1, "Computational Psychiatry's Coded Past," historicizes the preconditions and pre-occupations that foreground the ethnographic case studies to follow. I review, and then read against, primary and secondary source material narrating the story of American psychiatry's infamous "paradigm shift" in order to locate connections between previous movements to re-make psychiatry and the efforts surrounding Computational Psychiatry in the present. Tracing ideas about the supposed stability of biomedical things back to a pivotal point of change in Western psychiatry underscores how ideas about the biomedical resemble and are co-constituted by ideas about the computational. I show how the exchange of metaphors between the biological and the computational, an exchange that other scholars have observed in the history of the life sciences, is a key feature of the history of North American psychiatry as well. This metaphorical exchange continues to rhetorically inform the design and development of Computational Psychiatry research, and the division of labor within research teams, in the contemporary moment. Chapter 2, "Talking Heads: Brains, Bodies, and Vocal Biomarkers," is the first ethnographic chapter in the dissertation. It follows the interdisciplinary team at East Coast University, which is situated in a neuroscience department. This team attempts to better understand the neural mechanisms underlying depression through micro-level features of the voice, in their words, "using the voice to understand the mind."
In theory, a "vocal biomarker" of depression is a sound that cuts directly to biological processes, so directly that its mere presence stands in for and is commensurate with a pathological brain state. By focusing on the maintenance work involved in data collection, especially efforts to discipline the body and speech of research subjects, I show how the necessary precondition for studying speech as a "natural object" is difficult (if not impossible) to maintain in practice. The search for vocal biomarkers, a radically im/mediate sign, requires hyper-mediation. In Chapter 3, "Do Androids Dream of Electric Speech?" I move to West Coast University, with a team of researchers working to build a Virtual Human Interviewer (VHI) system supposedly capable of conducting psychiatric assessment in a way that a human never could: with far more accuracy, and without ever burning out. Fueled by military funding, this team seeks to build a tool that can address post-traumatic stress disorder (PTSD) among veteran populations. The virtual human's rapport-building, interactional infrastructure and its real-time interactions with research subjects are propped up, sustained, and animated by human work, and not just the classificatory work of labeling data and meta-data. The VHI is also sustained by and made possible through the stitching together of culturally and historically specific ideologies of language, mind, interaction, race, and gender, which researchers embed in its infrastructure as they develop, test, and maintain it. Chapter 4, "Listening Like a Computer," focuses on a team of researchers attempting to build a cell phone application that can predict when a person with bipolar disorder will have a manic episode based on changes in the quality of their speech.
In addition to a case study of the infrastructural arrangements, categorizing practices, and labor required to make digital phenotyping possible, in this chapter I focus on the figure of bipolar disorder as a mood disorder that causes audible changes in the quality of speech. The engineers and clinical team members butted up against the limits of what listening can capture from the voice, implicitly challenging the biological essentialism of the project in their day-to-day dealings with the research subjects' voice data. Part of what is at stake in the BPU's work is the semantic ambiguity and polysemy not only of emotional terms like "mania" and "depression," but of the term "listening" itself, especially with respect to agency, responsibility, and professional codes of ethics.
In the Conclusion, I suggest that the ethico-moral frameworks and conundrums that characterize the teams' interventions and enactments of listening, language, assessment, and care are strange but also familiar. They bear an uncanny resemblance to the broader milieu of mental health care and digital media in the United States. Altogether, the preceding ethnographic chapters illustrate the work that Euro-American language ideologies are doing for psychiatry and the mental health care sector in the age of digital reproduction. I conclude by exploring how the ideologies themselves may (or may not) be shifting in the current moment.
References
Affectiva. (accessed July 12, 2019).
Aulino, Felicity. 2012. "Senses and Sensibilities: The Practice of Care in Everyday Life in Northern Thailand." Doctoral dissertation, Harvard University.
Aulino, Felicity. 2016. "Rituals of Care for the Elderly in Northern Thailand: Merit, Morality, and the Everyday of Long-Term Care." American Ethnologist 43(1): 91-102.
Benjamin, Ruha. 2016. "Informed Refusal: Toward a justice-based bioethics." Science, Technology, and Human Values 41(6): 967-990.
Benjamin, Ruha. 2019.
Race After Technology: Abolitionist Tools for the New Jim Code. Cambridge, UK: Polity Press.
Berry, Maya J., Claudia Chávez Argüelles, Shanya Cordis, Sarah Ihmoud, and Elizabeth Velásquez Estrada. 2019. "Toward a Fugitive Anthropology: Gender, Race, and Violence in the Field." Cultural Anthropology 32(4): 537-656.
Brandt, Marisa and Luke Stark. 2018. "Exploring Digital Interventions in Mental Health: A Roadmap." In Interventions: Communication Research and Practice (International Communication Association 2017 Theme Book). Adrienne Shaw and D. Travers Scott, eds. Pp. 167-182. Bern: Peter Lang.
Buolamwini, Joy. 2016. "InCoding-In the Beginning." Medium, May 16. (accessed July 13, 2019).
Carr, E. Summerson. 2010a. Scripting Addiction: The Politics of Therapeutic Talk and American Sobriety. Princeton: Princeton University Press.
Carr, E. Summerson. 2010b. "Enactments of Expertise." Annual Review of Anthropology 39: 17-32.
Carr, E. Summerson and Yvonne Smith. 2013. "The Poetics of Therapeutic Practice: Motivational Interviewing and the Powers of Pause." Culture, Medicine and Psychiatry 38: 83-114.
Carey, Benedict. 2019. "California Tests a Digital 'Fire Alarm' for Mental Illness." New York Times, June 17. (accessed August 4, 2019).
Chion, Michel. 1990. Audio-Vision: Sound on Screen. Claudia Gorbman, trans. New York: Columbia University Press.
Cormen, Thomas H., Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. 2009. Introduction to Algorithms. 3rd Edition. Cambridge: MIT Press.
Dagum, Paul. 2018. "Digital biomarkers of cognitive function." NPJ Digital Medicine 1(10).
David, E.E. Jr. "Bionics or Electrology? An Introduction to the Sensory Information Processing Issue." IRE Transactions on Information Theory 8(2): 74-77.
DelVecchio Good, Mary-Jo. 2002. "The Biotechnical Embrace." Culture, Medicine, and Psychiatry 25(4): 395-410.
Duranti, Alessandro. 1993. "Truth and Intentionality: Towards an Ethnographic Critique." Cultural Anthropology 8(2): 214-245.
Eidsheim, Nina Sun. 2019. The Race of Sound: Listening, Timbre, and Vocality in African American Music. Durham: Duke University Press.
Erlmann, Veit, ed. 2004. Hearing Cultures: Essays on Sound, Listening, and Modernity. New York: Bloomsbury.
Feld, Steven and Donald Brenneis. 2004. "Doing anthropology in sound." American Ethnologist 31(4): 461-467.
Feld, Steven. 2012. Sound and Sentiment: Birds, Weeping, Poetics, and Song in Kaluli Expression. 30th Anniversary Edition. Durham: Duke University Press.
Feld, Steven. 2015. "Acoustemology." In Keywords in Sound Studies. David Novak and Matt Sakakeeny, eds. Pp. 12-21. Durham: Duke University Press.
Faudree, Paja. 2012. "Music, Language, and Texts: Sound and Semiotic Ethnography." Annual Review of Anthropology 41: 519-536.
Gordin, Michael. 2015. Scientific Babel: How Science Was Done Before and After Global English. Chicago: University of Chicago Press.
Gupta, Akhil and James Ferguson. 1992. "Beyond 'Culture': Space, Identity, and the Politics of Difference." Cultural Anthropology 7(1): 6-23.
Gusterson, Hugh. 1996. Nuclear Rites: A Weapons Laboratory at the End of the Cold War. Berkeley: University of California Press.
Ekbia, Hamid R. and Bonnie Nardi. 2017. Heteromation and Other Stories of Computing and Capitalism. Cambridge: MIT Press.
Hannerz, Ulf. 2003. "Being there... and there... and there! Reflections on multi-sited ethnography." Ethnography 4(2): 201-216.
Haraway, Donna. 1991. Simians, Cyborgs, and Women: The Reinvention of Nature. London: Routledge.
Harkness, Nicholas. 2013. Songs of Seoul: An Ethnography of Voice and Voicing in Christian South Korea. Berkeley: University of California Press.
Harkness, Nicholas. 2015. "The Pragmatics of Qualia in Practice." Annual Review of Anthropology 44: 573-589.
Hirschkind, Charles. 2006. The Ethical Soundscape: Cassette Sermons and Islamic Counterpublics. New York: Columbia University Press.
Inoue, Miyako. 2018. "Word for Word: Verbatim as Political Technologies."
Annual Review of Anthropology 47: 271-32.
Insel, Thomas. 2017. "Join the disruptors of health science." Nature 551: 23-26.
Irani, Lilly. 2017. "'Design Thinking': Defending Silicon Valley at the Apex of Global Hierarchies of Labor." Catalyst 4(1): 1-9.
James, Erica. 2004. "The Political Economy of 'Trauma' in Haiti in the Democratic Era of Insecurity." Culture, Medicine, and Psychiatry 28: 127-149.
Jasanoff, Sheila and Sang-Hyun Kim. 2015. Dreamscapes of Modernity: Sociotechnical Imaginaries and the Fabrication of Power. Chicago: University of Chicago Press.
Jones, Graham M. and Bambi B. Schieffelin. 2009. "Enquoting Voices, Accomplishing Talk: Uses of Be + Like in Instant Messaging." Language & Communication 29(1): 77-113.
Keane, Webb. 1997. "Religious Language." Annual Review of Anthropology 26: 47-71.
Keane, Webb. 2008. "Others, Other Minds, and Others' Theories of Other Minds: An Afterword on the Psychology and Politics of Opacity Claims." Anthropological Quarterly 81(2): 473-482.
Kockelman, Paul. 2017. The Art of Interpretation in the Age of Computation. Oxford, UK: Oxford University Press.
Lipartito, Kenneth. 1994. "When Women Were Switches: Technology, Work, and Gender in the Telephony Industry, 1890-1920." The American Historical Review 99(4): 1075-1111.
Manyika, James, Michael Chui, Mehdi Miremadi, Jacques Bughin, Katy George, Paul Willmott, and Martin Dewhurst. 2017. "A Future that Works: Automation, Employment, and Productivity." McKinsey Global Institute Executive Summary. McKinsey & Company. <https://www.mckinsey.com/~/media/mckinsey/featured%20insights/Digital%20Disruption/Harnessing%20automation%20for%20a%20future%20that%20works/MGI-A-future-that-works-Executive-summary.ashx>
Marcus, George E. 1995. "Ethnography in/of the World System: The Emergence of Multi-Sited Ethnography." Annual Review of Anthropology 24: 95-117.
Marsilli-Vargas, Xochitl. 2014. "Listening genres: The emergence of relevance structures through the reception of sound."
Journal of Pragmatics 69: 42-51.
McGlotten, Shaka. 2016. "Black Data." In No Tea, No Shade: New Writings in Black Queer Studies. E. Patrick Johnson, ed. Pp. 262-286. Durham: Duke University Press.
Mills, Mara. 2010. "Do Signals Have Politics? Describing Abilities in Cochlear Implants." In The Oxford Handbook of Sound Studies. Trevor Pinch and Karin Bijsterveld, eds. Pp. 320-346. Oxford, UK: Oxford University Press.
Mills, Mara. 2011a. "Deaf Jam: From Inscription to Reproduction to Information." Social Text 28(1): 35-58.
Mills, Mara. 2011b. "On Disability and Cybernetics: Helen Keller, Norbert Wiener, and the Hearing Glove." differences 22(2-3): 74-111.
Mitchell-Kernan, Claudia. 1972. "Signifying and marking: Two Afro-American speech acts." In Directions in Sociolinguistics. John J. Gumperz and Dell Hymes, eds. Pp. 161-179. New York: Holt, Rinehart and Winston.
Mol, Annemarie. 2008. The Logic of Care: Health and the Problem of Patient Choice. London: Routledge.
Murphy, Michelle. 2015. "Unsettling Care: Troubling transnational itineraries of care in feminist health practices." Social Studies of Science 45(5): 717-737.
Nakamura, Lisa. 2009. "Don't Hate the Player, Hate the Game: The Racialization of Labor in World of Warcraft." Critical Studies in Media Communication 26(2): 128-144.
Nakamura, Lisa. 2014. "Indigenous Circuits: Navajo Women and the Racialization of Early Electronic Manufacture." American Quarterly 66(4): 919-941.
Nakano Glenn, Evelyn. 1992. "From Servitude to Service Work: Historical Continuities in the Racial Division of Paid Reproductive Labor." Signs 18(1): 1-43.
Nguyen, Vinh-Kim. 2010. The Republic of Therapy: Triage and Sovereignty in West Africa's Time of AIDS. Durham: Duke University Press.
Perakyla, Anssi. 1995. AIDS Counseling. Cambridge, UK: Cambridge University Press.
Petryna, Adriana. 2009. When Experiments Travel: Clinical Trials and the Global Search for Human Subjects. Princeton: Princeton University Press.
Philip, Kavita, Lilly Irani, and Paul Dourish. 2012. "Postcolonial Computing: A Tactical Survey." Science, Technology, & Human Values 37(1): 3-29.
Picard, Rosalind. 1995. "Affective Computing." M.I.T. Media Laboratory Perceptual Computing Section Technical Report 321.
Picard, Rosalind. 1997. Affective Computing. Cambridge: MIT Press.
Picard, Rosalind. 2003. "Affective computing: challenges." International Journal of Human-Computer Studies 59: 55-64.
Reik, Theodor. 1964. Voices from the Inaudible: The Patients Speak. New York: Farrar, Straus.
Rhodes, Lorna. 1995. Emptying Beds: The Work of an Emergency Psychiatric Unit. Berkeley: University of California Press.
Roosth, Sophia. 2017. Synthetic: How Life Got Made. Chicago: University of Chicago Press.
Rosa, Jonathan. 2019. Looking Like a Language, Sounding Like a Race: Raciolinguistic Ideologies and the Learning of Latinidad. Oxford, UK: Oxford University Press.
Rosaldo, Michelle Z. 1982. "The things we do with words: Ilongot speech acts and speech act theory in philosophy." Language in Society 11(2): 203-237.
Schieffelin, Bambi B. 2008. "Speaking Only Your Own Mind: Reflections on Talk, Gossip, and Intentionality in Bosavi (PNG)." Anthropological Quarterly 81(2): 432-442.
Seaver, Nick. 2017. "Algorithms as culture: Some tactics for the ethnography of algorithmic systems." Big Data and Society 1-17.
Seaver, Nick. 2019. "Knowing Algorithms." In digitalSTS: A Field Guide for Science & Technology Studies. Janet Vertesi and David Ribes, eds. Pp. 412-422. Princeton: Princeton University Press.
Silverstein, Michael. 1985. "On the pragmatic 'poetry' of prose." In Meaning, Form and Use in Context. D. Schiffrin, ed. Pp. 181-199. Washington: Georgetown University Press.
Silverstein, Michael. 1998. "The Uses and Utility of Ideology: A Commentary." In Language Ideologies: Practice and Theory. Bambi B. Schieffelin, Kathryn Woolard, and Paul Kroskrity, eds. Pp. 123-145. New York: Oxford University Press.
Silverstein, Michael. 2012. "Denotation and the pragmatics of language." In The Cambridge Handbook of Linguistic Anthropology. N.J. Enfield, Paul Kockelman, and Jack Sidnell, eds. Pp. 128-157. Cambridge: Cambridge University Press.
Simpson, Audra. 2007. "On Ethnographic Refusal: Indigeneity, 'Voice,' and Colonial Citizenship." Junctures 9: 67-80.
Smith, Benjamin. 2005. "Ideologies of the speaking subject in the psychotherapeutic theory and practice of Carl Rogers." Journal of Linguistic Anthropology 15: 258-272.
Stasch, Rupert. 2008. "Knowing Minds is a Matter of Authority: Political Dimensions of Opacity Statements in Korowai Moral Psychology." Anthropological Quarterly 81(2): 443-453.
Stevens, Hallam. 2013. Life Out of Sequence: A Data-Driven History of Bioinformatics. Chicago: University of Chicago Press.
Suchman, Lucy. 2007. Human-Machine Reconfigurations: Plans and Situated Actions. 2d Edition. Cambridge, UK: Cambridge University Press.
Sunder Rajan, Kaushik. 2006. Biocapital: The Constitution of Postgenomic Life. Durham: Duke University Press.
Taussig, Karen-Sue, Klaus Hoeyer, and Stefan Helmreich. 2013. "The Anthropology of Potentiality in Biomedicine: An Introduction to Supplement 7." Current Anthropology 54(S7): S3-S14.
Taylor, Astra. 2018. "The Automation Charade." Logic 5. <https://logicmag.io/failure/the-automation-charade> (accessed August 5, 2019).
Thakor, Mitali. 2018. "Digital Apprehension: Policing, Child Porn, and the Algorithmic Management of Innocence." Catalyst: Feminism, Theory, Technoscience 4(1): 1-16.
Throop, Jason. 2010. Suffering and Sentiment: Exploring the Vicissitudes of Experience and Pain in Yap. Berkeley: University of California Press.
Turkle, Sherry. 2006. "A Nascent Robotics Culture: New Complicities for Companionship." AAAI Technical Report Series, July.
Turkle, Sherry. 2018. "There Will Never Be an Age of Artificial Intimacy." The New York Times, August 11. <https://www.nytimes.com/2018/08/11/opinion/there-will-never-be-an-age-of-artificial-intimacy.html> (accessed August 4, 2019).
Vehvilainen, Sanna. 2008. "Focus on the patient's action: identifying and managing resistance in psychoanalytic interaction." In Conversation Analysis and Psychotherapy. Anssi Peräkylä, Charles Antaki, Sanna Vehviläinen, and Ivan Leudar, eds. Pp. 120-138. Cambridge, UK: Cambridge University Press.
Visweswaran, Kamala. 2003. Fictions of Feminist Ethnography. Minneapolis: University of Minnesota Press.
Vrecko, Scott. 2010. "Birth of a brain disease: science, the state, and addiction neuropolitics." History of the Human Sciences 23(52): 52-67.
Wilce, James M. 2009. "Medical Discourse." Annual Review of Anthropology 38: 119-215.
Wilf, Eitan. 2019. "Separating noise from signal: The ethnomethodological uncanny as aesthetic pleasure in human-machine interactions in the United States." American Ethnologist 46(2): 202-213.
Wilson, Elizabeth. 2010. Affect and Artificial Intelligence. Seattle: University of Washington Press.
Winner, Langdon. 1980. "Do Artefacts Have Politics?" Daedalus 109(1): 121-136.
Wynter, Sylvia. 2003. "Unsettling the coloniality of being/power/truth/freedom: Towards the human, after man, its overrepresentation: An argument." New Centennial Review 3(3): 257-337.
Chapter 1: Computational Psychiatry's Coded Past
"Perhaps in parasitology, in orthopedics, and in computer technology one can escape from humanism, but not in psychiatry...it has more in common with the inevitable ambiguity of great drama than with the DSM-III's quest for algorithms compatible with the cold binary logic of computer science" - (Vaillant 1984: 544).
"We used laugh and kind of say, okay, we'll stop trying to teach the computer to act like a clinician...we're trying to teach the clinician to apply logical rules, kind of more like a computer."
- Jean Endicott, DSM-III Task Force member, to Jackie Orr (2006: 245)
By popular and scholarly accounts, and according to the psychiatric researchers and practitioners I spoke to during my preliminary fieldwork, the publication of the third edition of the Diagnostic and Statistical Manual of Mental Disorders (hereafter DSM-III) in 1980 marked a significant turning point in the history of North American psychiatry. As the dominant narrative goes, its publication both represented and catalyzed a radical break from the old epistemological guard of psychoanalysis: once DSM-III went out into the world, psychiatry as it was practiced in the U.S. and exported elsewhere had changed (Spitzer 2001; Sanders 2011). In the subtitle of her book dedicated to telling the story of DSM-III's creation, medical historian Hannah Decker (2013) goes so far as to equate the diagnostic manual's third, official revamping with a "conquest of American psychiatry" (my emphasis).
Indeed, the roughly 450-page text, some 330 pages longer than its predecessor, initiated dramatic, far-reaching change in the United States. DSM-III was a powerful document because of how seamlessly it fused with and reinforced the bureaucratic logics and logistics of biomedicine, which insurance companies and pharmaceutical manufacturers had increasingly come to dictate. As Jackie Orr puts it, "in the entangled realms of psychiatry and psychotherapy, medicine, the pharmaceutical industry, the legal system, the insurance industry, social and self-identity, and popular discourse," DSM-III birthed "a new order of things" (Orr 2010: 354). This was accomplished, in part, because DSM-III standardized the language of psychiatry as never before, linguistically aligning the interaction between clinician and client (Semel 2013) across the triad of the psychiatric encounter: from assessment/screening, to diagnosis, to monitoring.
The symptom criteria in DSM-III and in subsequent editions structured the questions that clinicians asked patients and determined how to interpret the content of patient responses. Previous iterations of the DSM (like DSM-II, published in 1968) grounded the diagnostic criteria of mental illnesses in psychoanalytic theories of disease causality: thwarted libidos, overly cathected egos, and so on. One of the crowning achievements and most controversial moves of the DSM-III task force, fronted by Dr. Robert Spitzer, was to expel psychoanalysis from the manual as much as possible in favor of defining and grouping illnesses according to symptoms that cohorts of research subjects seemed to share (APA 1980). The task force wanted to categorize mental illnesses based on the symptoms that patients expressed and that any and every clinician, regardless of theoretical training, could identify (Spitzer and Sheehy 1976).
For Spitzer and his colleagues, this emphasis on symptom phenomenology put their work in lockstep with Emil Kraepelin, the 19th-century German physician who was both a foundational figure for North American psychiatry and a foil to Freud (Decker 2007). Kraepelin drew his classificatory schema from long-term observations of patients' suffering: detailed descriptions of his patients' hallucinations, glossolalia, the contents of their obsessional thoughts, bodily tics, attempts at self-harm, and so on. If the goal of Freud's disciples was to identify sublimated connections between the known and unknown selves of the analysand, the goal of the Kraepelinians was to scrutinize and catalogue the various behavioral manifestations, grand and minute, of psychic pathology. Task force members deemed their neo-Kraepelinian, phenomenological approach an "atheoretical" one (Feigner 1979; Bayer and Spitzer 1985).
Their ideological approach aimed to realize a genre of objectivity that Daston and Galison (2007) call "truth to nature," featuring images of what symptoms of mental illnesses looked like that were as faithful to the underlying biological processes as possible, putatively unmediated by any dogma or theory. This was, at least, their ideal, the form for which the neo-Kraepelinians strove. The success of their endeavor may be debated. For instance, the task force's decision to include "psychogenic pain disorder" and "ego dystonic homosexuality"* in the manual not only bespeaks the lingering presence of psychoanalytic etiology but also signals that DSM-III remained a technology for policing social deviance rather than, as the task force had wished, a magnifying glass for identifying biologically based pathology. Moreover, as others then and now have pointed out, the task force's "atheoretical" approach in and of itself posited a theoretical orientation toward making sense of the world (see Klerman 1977; Rosenberg 2007; Orr 2010). Indeed, task force members drew from very specific interpretations of how other biomedical practices (like orthopedics and oncology) conceived of and investigated their objects of study: namely, as stable, discretely defined, material phenomena that could be extracted from their contexts of occurrence.
* The task force did indeed remove the category of "sexual orientation disturbance" in 1973, a diagnostic criterion that overtly pathologized homosexuality. Unlike "sexual orientation disturbance," "ego dystonic homosexuality" corresponds with the distress one feels upon the realization of one's attraction to same-gendered people, or due to feelings of shame after participating in same-gendered sexual acts. Spitzer and company's reasoning was that this category would allow people to seek counseling for these distressing, shameful feelings. Critics, however, argue that this diagnostic criterion leaves room for the pathologization of non-heterosexuality, while also keeping the door open for conversion therapy as one method of treatment (i.e., eliminating shame and distress by converting sexual and romantic desires from same-gendered to different-gendered).
In this chapter, I historicize the preconditions and preoccupations that foreground the ethnographic case studies to follow. I do so by reviewing and reading against secondary source material narrating the story of American psychiatry's infamous "paradigm shift," along with research articles produced during this time. Tracing ideas about the supposed stability of biomedical things back to a pivotal point of change in Western psychiatry underscores how task force members' ideas about the biomedical resemble and are co-constituted by their ideas about the computational. In other words, the rise of what Jackie Orr calls "biopsychiatry," which "embraces a medicalized model of mental disorders while claiming a scientific status for contemporary psychiatric practices of diagnosis and treatment" (2010: 345), coincides with the introduction of computers and other machines into psychiatry in the U.S. Thus, I show how the exchange of metaphors between the biological and the computational, an exchange that other scholars have observed in the history of the life sciences (Fox-Keller 1995; Helmreich 1998; Hayles 1999; Kay 2000; Erickson et al 2013), is a key feature of the history of North American psychiatry as well (see also Martin 2007). This metaphorical exchange continues to rhetorically inform the design and development of Computational Psychiatry research, and the division of labor within research teams, in the contemporary moment. My analysis does not focus on the practice of psychiatry, so I treat neither the administration of psychiatric care nor acts of assessment or diagnosis themselves.
Instead, I trace out the ideological strands and rhetorical shifts within U.S. psychiatry, articulating them in order to illustrate how they continue to motivate research projects that fall under the domain of Computational Psychiatry, such as my informants' research.
My aim in this chapter is twofold. First, I seek to clarify how the figure of the machine (especially the computer) operates within primary and secondary sources surrounding the publication of DSM-III, underscoring the way that efforts to introduce machines into the diagnostic encounter serve as switching points at which ideas about the computational and the biological come together, or are co-produced (Jasanoff 2004). Throughout the 20th century and up to the present, psychiatric researchers and practitioners have recognized that mental illnesses are frustratingly slippery "moving targets" that "emerge in the encounter between patients' subjective reports and a clinician's interpretive schemes" (Lakoff 2005: 2), thereby resisting rigid definitional boundaries and consistency across person, place, or time. Running in tandem with this frustration have been attempts to overcome it using the discourse of machines (and often, real machines), on the premise that machines have an inherent, essential capacity for locating the biomaterial underpinnings of mental illnesses and settling them into discrete, specific units. Machine reason (as disinterested, binary, and less costly) is positioned against human reason (as haphazard, infinitely varied, and expensive). Upon closer examination, however, rhetorics and enactments of machine logics rely on the work of paraprofessional administrative laborers, from typists, to assistants, to technicians (see Schaffer 1994). Secondary source material in particular tends to shift focus away from this supportive, administrative labor and the crucial role these actors played in the computerization of psychiatry.
Recovering such persons and their labor, and underscoring both their presence and importance, can help to intervene in contemporary debates about the capacity for machines to "replace" human labor, debates in which my informants often participate. Even in the earlier years of the history of computing in the U.S., what looks at first like human "replacement" was human re-placement: instances in which humans were moved to less visible positions in the production pipeline or, due to the status of the labor they performed or the jobs they held, assimilated themselves to the figure of the machine.
The second aim of this chapter is to historically situate the so-called paradigm shift that "Computational Psychiatry" enacts. Popular discourse, such as the language used in a 2017 article in the MIT Technology Review, deems Computational Psychiatry an "emerging science" facilitating a "quiet revolution" that turns away from the past. As discussed in the Introduction, Computational Psychiatry involves the use of artificial intelligence-enabled analysis methods to pin down signs of psychopathology in biological processes, especially neuronal activity. Researchers and journalists tend to define Computational Psychiatry against Western psychiatry's traditional hypothesis- or theory-driven approach, in part as a way to signal its novelty (and thus its lack of historicity). In conventional research contexts, researchers use existing hypotheses or theories about mental illness based on prior scholarship to structure their research questions. Conversely, with the data-driven approach that is the hallmark of Computational Psychiatry, researchers apply "theoretically agnostic data-analysis methods from machine learning (ML) broadly construed (including, but extending, standard statistical methods)" to structure their research (Huys et al 2016: 404; my emphasis).
I argue that Computational Psychiatry, rather than marking a clean break from the past, represents a re-instantiation, retrenchment, or even reiteration of the epistemological goals and infrastructural requirements of the years leading up to its ascendancy in the present: namely, the use of machines and computational processes to "delete the social" from practices of studying and identifying mental illness (Leigh Star 1991; Forsythe 1993). Part of this historicizing work includes challenging the dominant narratives in many secondary sources of DSM-III's own paradigm shift, which represent the publication of DSM-III as a totalizing, top-down transformation away from "the inevitable ambiguity of great drama" and toward empiricism, theoretical neutrality, and objectivity. I do so by highlighting Lempert's (2019) groundbreaking article describing the models of empiricism and the use of recording devices in psychiatry in the years prior to DSM-III (although I suggest that Lempert downplays the important role that administrative workers played in making these projects possible).
Freud initially defined psychoanalysis, and the psychoanalytic therapist, against biomedicine, arguing that psychoanalysis does not deal with the biophysiological realm; this is the definition of psychiatry that the neo-Kraepelinians worked to overturn. Nevertheless, as Lempert shows, there were concerted efforts in the 1930s and 1940s in the U.S. to render psychoanalysis into an empirically grounded science, precisely through the use of machines to capture the ineffable presence of the unconscious in speech, isolating a material trace that could help scientists and practitioners track the efficacy of therapy and thereby render it more objective.
While the secretaries, typists, and other "verbatim laborers" (Inoue 2018) situated at the edges of Lempert's archives may seem to play a neutral role in transforming text from spoken utterance to graphic trace, I argue that they played an active role in projects of making the unconscious material and legible. Despite these efforts, psychoanalysis fell out of favor in the U.S. But it did not fall out of favor because researchers studying psychoanalysis rejected empirically driven methods tout court. Rather, the sun set on psychoanalysis in the U.S. because its supporters and practitioners could not fit the mysteriously coded actions of the unconscious into the actuarial frameworks of evidence and efficacy that pharmaceutical companies, insurance companies, and regulatory apparatuses increasingly pushed. Contemporary efforts to stabilize mental illness using computational methods thus satisfy epistemological and bureaucratic longings alike. While Computational Psychiatry is part of a larger movement in U.S. psychiatry to dispose of the DSM altogether and develop a novel, biologically based nomenclature, this movement nevertheless operates within the same entangled premises that drove the publication of DSM-III and that psychoanalysis could not satisfy.
Altogether, there is connective tissue between Computational Psychiatry and DSM-III, spun by the work of people like Spitzer, and supported by assertions about the abilities of machines to restrain so-called theoretical biases from coloring psychiatric research. The notion that computational techniques might standardize the order of observational and interpretive operations in psychiatry, thus rendering psychiatry into an objective science and saving costly human resources in the process, has much deeper historical roots than the popular discourse about Computational Psychiatry's novelty suggests.
Computational Psychiatry today holds the same rhetorical promise that DSM once did; its application in the context of research studies requires research investigators to pursue the same doomed enterprise of trying to shore up the division between objectivity and subjectivity.
In historicizing Computational Psychiatry, I interrogate the taken-for-granted division between the machinic and the human, pushing back against the notion that psychiatry was a humanistic practice prior to DSM-III and has become more technical (and less human) since. Scholars writing about the role DSM-III played in the history of psychiatry tend to segment the time leading up to its publication, and the years following it, in dichotomous terms: subjective and objective, immaterial and material, psychic and organic, personal and general, psychoanalytic and neo-Kraepelinian, and so on. For instance, Lakoff (2009: 3) phrases the divide in terms of the difference between recognizing and treating mental illness "through purely technical means" (on the neo-Kraepelinian side) or by accounting "for the particular life trajectory of the subject" (on the psychoanalytic side). These binaries are mapped onto the division between the machine and the human. In my critical reading of secondary sources, I show how the dominant narrative about the remaking of U.S. psychiatry reifies this divide and calcifies these binary categories in the process. Foreshadowing a tactic I employ in analyzing my ethnographic material, this chapter focuses on moments during which the divide between the computational and the human breaks down and refracts, in which the computational and the humanistic (or their corollaries, the objective and subjective) dance together.
MINDING THE BODY: PSYCHOANALYSIS, MATERIALITY, AND TRANSDUCTIVE LABOR
From the mid-1960s onward, psychiatrists in the U.S.
have tended to answer what Lakoff (2005:3) says is the field's most fundamental question-do we locate mental illness in the organism, or in the psyche?-by seeking the biological mechanisms of mental illness at greater levels of specificity. To situate mental illness in the body, the claim goes, would be to stabilize it as an object of knowledge, to free it from the specificities of its context-to render it objective. But what has it meant to locate illness in the psyche? Moreover, how have proponents of psychoanalysis prior to the 1960s searched for mental illness in the psyche? The project of making psychiatry into what it is today began as an attempt to turn away from psychoanalytic models of disease and treatment methods. I give a brief review of how psychoanalytic theory positioned its object of study-mental illness-against other forms of pathology and "organic" medicine in order to give a clearer picture of what the neo-Kraepelinians were working against. I then draw from secondary literature covering efforts to locate evidence of the unconscious using audio recording devices and transcripts, in order to clarify the claim that psychoanalysis is opposed to objectivity or materiality. These projects to track therapeutic processes through subtle signs in the body and the voice anticipate my informants' projects, both in their aspirations, 80 linguistic ideologies, and in their reliance on transductive labor-in this case, the work of secretaries and transcriptionists whose job it was to create and annotate transcripts of patients' speech, transforming audio recordings into orthographic representations of verbal and non-verbal communication. These projects also suggest a more nuanced framing of the body, objectivity, and evidence prior to DSM-III than the narrative of the neo-Kraepelinians conveys, while also suggesting continuity between the pre-and post-DSM-1II eras with respect to labor and machines. 
In Freud's The Interpretation of Dreams ([1899] 1998), psychoanalysis' foundational text, Freud analyzes the content of his own dream in order to demonstrate that dreams enact the fulfillment of an inappropriate wish or desire (151), while also diagramming his theory of the tripartite structure of the self (the id, the ego, the super-ego). Freud was an analyst caught up not only in crystallizing psychoanalytic theory but also in curing his patients, and the central wish of his own inappropriate dream is that he be "acquitted" (151) of responsibility for his patient Irma, whom he had diagnosed with hysteria but had been unable to cure. In both the dream and waking worlds, Irma continues to suffer from physical symptoms (chest pains, difficulty breathing) that Freud had hitherto determined to be psychological in origin, arising not from some underlying physiological issue that could be treated by a physician, but from some disturbing and yet-to-be-reckoned-with event in the past that a psychiatrist should treat. Decker (2013) points out that hysteria was "the disorder that first led Freud to develop psychoanalysis and the theory of 'unconscious' conflict" (203). The perennial anxiety that hysteria causes the analyst (the fact that it involves both the mind and the body, the psychic and the organic) thus lies at the origins of psychoanalytic thought.

In his analysis of his own dream, the way in which Freud reasons himself out of being "responsible for the pains [Irma] still had" (141) reveals something about the tenuous and delicate boundary between psychiatry and biological medicine in his time. Remarks Freud:

I was alarmed at the idea that I had missed an organic illness. This, as may well be believed, is a perpetual source of anxiety to a specialist whose practice is almost limited to neurotic patients and who is in the habit of attributing to hysteria a great number of symptoms which other physicians treat as organic.
On the other hand, a faint doubt crept into my mind (from where, I could not tell) that my alarm was not entirely genuine. If Irma's pains had an organic basis...I could not be held responsible for curing them; my treatment only set out to get rid of hysterical pains. It occurred to me, in fact, that I was actually wishing that there had been a wrong diagnosis; for, if so, the blame for my lack of success would also have been gotten rid of (141-142, emphasis original).

Since, according to Freud's theory, hysteria manifests itself and is experienced by the patient as physiological distress, the analyst would rightly be fraught with anxiety regarding its diagnosis. Accurate diagnosis of disturbances like hysteria, which appeared to blur the divide between mind and body, challenged the expertise and diagnostic resources of psychiatric clinicians and medical physicians alike. Biological illnesses were the responsibility of physicians, and physicians could not be expected to properly identify or treat "hysterical," psychological disturbances. Such illnesses exceeded the limits of a physician's prowess. The dividing line between mental illness and biological illness relied on the distinction between organic and psychogenic, between the physiologically grounded in the body and the psychologically grounded in experience. The presumed space between these categories was what separated psychiatric clinicians from other kinds of biomedical physicians. Rosenberg (2002) notes that the way in which biomedical physicians conceive of disease is a historical achievement of the 19th century, fomented by the proliferation of imaging techniques for apprehending and knowing the body's internal processes.
In many ways, then, the dividing line between the psychoanalyst and the physician also revolves around the concept of "disease specificity," or the notion that diseases "can and should be thought of as entities existing outside the unique manifestation of illness" in a person (Rosenberg 2002: 237), rather than as fluid phenomena that shift according to environment, relationship, circumstance, or individual life course. DSM-III task force members hitched their pursuit of disease specificity to the pursuit of making psychiatry into a more biomedically oriented field. Because disease specificity was not a primary concern for Freud and his predecessors, neither was diagnosis. Under a psychoanalytic paradigm, there is no clear distinction between the well and the unwell. Psychopathology is sewn into the fabric of being human; to be born and continue to live is to rupture psychically, and forever pursue repair. In its most classic, Freudian iteration, the psychoanalytic subject is wracked by the tension between the socially unacceptable, erotic and violent urges of the id, and the ego and superego's drive to combat, thwart, conceal, or convert these urges into something more acceptable. The psychodynamic analyst's job is to help the patient make sense of the myriad ways in which these inner conflicts and forbidden desires re-substantiate and re-code themselves in one's interpersonal relationships, dreams, slips of the tongue, and so on. Hewing closely to a person's singular life history, diagnosis under psychoanalysis resists standardization. The nature of the problem, the diagnosis, differs from person to person. At the same time, as psychoanalysis caught fire (and met challenge) in the United States from the 1920s onward, there were several concerted efforts to grasp hold of the unconscious via the body and the voice, in order to demonstrate that it existed and that the efficacy of therapy could be accounted for.
Michael Lempert (2019) traces the work of Earl Zinn and Harold Lasswell, two researchers experimenting with recording psychoanalytic sessions in the early 1930s. While Lempert is primarily focused on mapping out the relationship between these attempts to "spy on the mind through the aperture of the body" (35) and the flourishing communication sciences and studies of face-to-face interaction in the U.S., his article provides a vivid picture of the role researchers hoped machines could play in making the mind's inner workings more material. Particularly useful is Lempert's suggestion that these early projects to distill the data of psychoanalytic encounters were driven by a wish to downplay and "bypass the human in order to let nature speak as truly as possible" (29), a mode of ethical orientation toward one's object of study that Daston and Galison (2007) term "mechanical objectivity," or the use of machinic technologies to downplay and restrain the introduction of the scientist's self into the pursuit of scientific knowledge. In the experiments of Zinn and Lasswell we find the seeds of Computational Psychiatry's recurring theme: that machines (here, gramophones,[11] aided by wax records and transcripts) are media that can downplay (and obviate) the interference of human subjectivity and allow the secrets of the mind's inner life to shine through, unadorned. At the same time, the transductive work of Zinn and Lasswell's secretaries, who transcribed the audio recordings so that they could be analyzed, suggests that mechanical objectivity entails not only the removal of human subjectivity, but also depends upon the labor of actors who are not firmly placed within the category of the human subject.
In other words, the scientific self that is nobly restrained in pursuit of mechanical objectivity depends upon strict, exclusionary guidelines for who can count as a scientific subject versus object, guidelines that rest upon a liberal conceptualization of the person as an individual endowed with the spark of intellect and inalienable rights, such as the right to own property, to participate in liberal democracy, and so on (Haraway 1997; Herzig 2005).

[11] Lasswell and Zinn's use of the gramophone corresponds with and almost perfectly embodies the argument Kittler makes in Gramophone, Film, Typewriter (1999), which explores the role of media in the remaking of the psyche and subjectivity: Kittler argues that phonography was a technology for rendering the psyche objective, for creating "non-subjective" inscriptions of subjectivity.

Zinn was the director of the Committee for the Study of Personality, a New York-based subcommittee of the Social Science Research Council (SSRC). Lasswell, his contemporary, was a "psychoanalytically inclined political scientist" at the University of Chicago (Lempert 2019: 33). Avid supporters of psychoanalysis, both men sought to push back against those who denigrated the paradigm for the "subjectivity of the reported data" (Lempert 2019: 35). Zinn himself often complained that the psychiatrists and psychoanalysts he encountered showed no interest in the "scientific validity of their data" about their patients, and he proposed a conference dedicated to establishing uniform research methods in psychoanalysis (Lempert 2019: 31). There was also growing interest within the SSRC in exploring psychoanalysis ("a difficult field as yet virgin to rigorously controlled scientific exploration") in formalized, experimental situations (SSRC annual report, quoted in Lempert 2019: 31). In the Midwest, Lasswell was in obsessive pursuit of "somatic indicators of psychological states that could be measured quantitatively" (Lempert 2019: 34).
He developed elaborate laboratory setups that had patients connected to bands, sensors, and wires for tracking galvanic skin response, pulse and heart rate, breathing, and fidgeting limbs as they underwent analysis. Lasswell had a hunch that these somatic signs might reveal the latent content of a patient's psyche in a way that denotational speech content alone could not. For Lasswell, attending avidly to the body's semiotic output during analysis could finally provide "evidence of otherwise gauzy, abstract claims about mind, claims that behaviorists dismissed as backward and unscientific" (Lempert 2019: 34). Zinn and Lasswell were faced with a dilemma. They wanted to record the entirety of the session, initially for the purpose of obtaining "verbatim" transcripts.[12] However, inserting a human note-taker into the session was out of the question. Their presence would wrinkle the dyadic analyst-analysand relationship, sending the crucial process of transference askew.[13] That the analyst themselves take detailed notes was also out of the question. Freud dictated that the analyst should remain receptive and responsive without attending too closely or consciously to the analysand's speech, avoiding the risk of mapping their own (subjective) meaning onto the analysand's free associations.[14] How, then, to capture the interaction? Zinn and Lasswell resolved to abandon "human stenographers and note-takers" altogether and instead "repurposed wax cylinder dictation machines that had been marketed for business," creating audio recordings of sessions (29). Zinn partnered with Alexander Graham Bell's Dictaphone Company. He hid microphones throughout the session room, embedding at least one in the head of the couch where the analysand reclined (Lempert 2019: 36). It was through this unobtrusive, invisible recording (and subsequent transcribing) that Lasswell and Zinn began to codify and seek out what Lempert calls the "communicative unconscious": bodily signs and vocal blips that the men interpreted to be the output of the unconscious, the encoded signals of its response to analysis. In his 1935 article, "Verbal Reference and Physiological Changes During the Psychoanalytic Interview," Lasswell posited a direct relationship between vocal quality and unconscious content. He found that slowed speech rate corresponded with increased psychophysiological tension, eventually arguing that "somatic measures reveal what speech 'means,' clinically speaking" (Lempert 2019: 39; 34). Both researchers began to realize that there was additional semiotic substance running alongside speech content that the loosely attentive analyst might not pick up on: false starts, words cut off before completion, or hitches in the voice that occurred at certain, significant points in analysis. Sometimes these signs even contradicted semantic content. A patient might insist that his dream was not about his father, but upon reviewing the recording, Lasswell and Zinn would find a tremor in the heart (or the voice), caught by a sensor or transcribed by one of their secretaries, that suggested an alternative interpretation. In other words, Zinn and Lasswell began to suggest that the recording revealed the presence of indexical signs in psychoanalytic interactions: signs that bear an existential, causally contiguous relationship with the objects for which they stand. In this case, the fidgets and sighs, according to the two researchers, emanated from and expressed the unconscious.[15] But these indexical components did not unfurl from the audio-recorded speech on their own.

[12] Verbatim, for Lasswell and Zinn, corresponded with a representation of the what-is-said of speech: the content, or the denotational substance alone (Lempert 2019: 29). Lempert argues that as Lasswell and Zinn's experiments with recording sessions progressed, their interpretation of the semiotic potential of the transcripts expanded. Narrow interest in the content of speech morphed into an interest in the indexical components of the communicative interactions inscribed in transcripts.

[13] As Elizabeth Wilson (2010) points out, the psychoanalytic encounter offers a space of simulation: an opportunity for the analysand to simulate with the therapist the relationships they have elsewhere in life, so as to better understand the contours, nuances, and unaddressed tensions of these relationships. This simulated relating, transference, is therefore vital material for analysis in and of itself, and must not be disrupted by the introduction of additional parties.

[14] Freud recognized that analysts faced a difficult task in keeping track of the innumerable personal details (the memories, phobias, and life histories) of scores and scores of patients over months, if not years, of analysis. His technique to avoid over-saturation and to keep the analyst's own memory and therapeutic faculties as attuned as possible was to avoid recording or detailed note taking. In his words, the technique "consists simply in not directing one's notice to anything in particular and in maintaining the same 'evenly-suspended attention'...in the face of all that one hears...For as soon as one deliberately concentrates his attention to a certain degree, he begins to select from the material before him...and in making this selection he will be following his [own] expectations or inclinations" (1912: 110-111).
The legibility of these signs depended upon the work of the secretaries and typists employed under Zinn and Lasswell, who inscribed these signs into existence, listening to and then transcribing the speech that played from the wax records and following the notational conventions that their bosses prescribed.[16] To put it differently, the recording devices played a critical role in the "indexicalization" of therapeutic interactions, or the process by which indexical relations come to be interpretively treated as indexes (Lempert 2019: 25).

[16] Archival records indicate that Zinn instructed his typists to mark speech for pauses as well as "kinestic behavior," like the lighting of cigarettes and the opening and closing of doors (Lempert 2019: 38).

While the act of transforming speech from spoken word to written trace may seem like passive copying, pure mimesis,[17] it was through the very rendering and graphic notation of text that the unconscious became available for analysis. Not just the recording, then, but the transcription of spoken words to text was pivotal to the researchers' findings. Zinn failed to train his typists uniformly, and because he reused the wax cylinders of his Dictaphone, the inconsistent transcripts left consequential gaps in his research, gaps that I argue speak to the pivotal role Zinn's administrative team played in his research. Lempert points to an unevenness in the transcripts, particularly for Zinn, who trained as an analyst so that he could personally conduct the therapy that his surreptitious microphones recorded. In the transcripts that still exist from Zinn's experiments, his patient's speech is marked with metacommunicative, symptomological pauses and quips. But Zinn's own stream of speech, reproduced on the written page, "seemed suspiciously fluid; pauses were seldom marked, and never in a context that might reveal something psychological about him" (38).
While Lempert interprets this as a clerical error, I suggest once again that it underlines just how powerful, and crucial, the typists' transcribing work was. Zinn and Lasswell's administrative teams may have lacked psychoanalytic training, but making these traces legible in their supervisors' speech would have put them in the position of illuminating the men's own unconscious impulses. Although it is unclear whether Zinn and Lasswell explicitly instructed their typists to keep the transcripts asymmetrically opaque, this lack of detail nevertheless kept the power asymmetry between (expert, scientific) employer and (inexpert, administrative) employee in place. Zinn and Lasswell's projects to uncover the communicative unconscious, then, resemble my informants' work on two counts. First, we see a surprising continuity between these pre-DSM-III efforts and the basic logic and language ideologies that drive my informants' research (the existence of a non-referential series of signs in a patient's speech that, if refracted through the proper machinic media, can reveal otherwise opaque interior states). Second, we see that the machinic mediation scientists call upon to render these indexical signs transparent, indelible, and legible requires transductive labor, despite the fact that the hierarchical organization of labor within the scientific research team situates this work at the bottom, as non-agentive.

[17] See Miyako Inoue (2011) for a historical ethnographic account of the shifting engenderment of stenography work in Japan, from agentive and creative act (when it was the domain of men's labor) to imitative, passive verbatim mimesis (when it was the domain of women's labor).
The use of machines to let the voice of either the unconscious or the body sing forth, full-throated, relies on objectified human labor; the humans in this loop melt away and meld with the media of the various technologies (especially recording devices), giving the impression that it is the machines that autonomously "find" and transduce these indexical signs.

THE GREAT DRAMA OF COLD LOGIC

The death and violence of World War II boosted the status of psychoanalysis as analysts began to grapple with, study, and theorize the relationship between participating in and witnessing acts of violence and the experience of trauma (Young 1995). At the same time, dissent against psychoanalysis was brewing again, this time not only from behaviorists but from budding researchers and practitioners in training, like Robert Spitzer, who found themselves disappointed with the epistemological tools handed down in their classrooms and clinical internships. Spitzer and his colleagues began to articulate these critiques, and to envision alternative models for psychiatry and the DSM, through experiments with computerized diagnosis.

Robert Spitzer's position at the helm of DSM's "atheoretical" reform was consistent with his own training and past experiences. Fourteen years prior to DSM-III's publication, Spitzer had left the Columbia Psychoanalytic Institute with his degree (barely, by his own account; Decker 2013:94) and with a deep dissatisfaction with psychoanalysis. Spitzer was part of a growing group of psychoanalysis dissenters at universities and hospitals concentrated primarily in the northeastern United States. Like Spitzer, these researchers and practitioners rejected psychoanalysis for the primacy it placed on "wisdom" over "empiricism," or on "debate" (subjective convention that could not be reified in laboratory studies) over "data," which they figured as durable, quantifiable evidence, ideally rooted in human biology (Edelman 1969; Feighner 1979).
Spitzer's attraction to this model of empiricism had as much to do with his personal tastes and talents as it did with his ideas about the medical sciences. In a 2013 interview, Spitzer disclosed that he neither enjoyed nor excelled at conducting psychotherapy. "I was always unsure that I was being helpful," he confessed, "and I was uncomfortable listening and empathizing...I just didn't know what the hell to do" (Decker 2013:94). While psychotherapy repelled Spitzer, the diagnostic interview enticed him. During his time at Columbia, he committed himself to creating uniform interviewing guides, testing and developing apparatuses that set standardized procedures for assessing a patient's mental state and making a diagnosis. For example, in the late 1960s, he published guidelines for New York State Department of Mental Hygiene personnel on how to diagnose patients using DSM-II's freshly published nomenclature (Spitzer and Wilson 1968). All in all, it was the standardization of diagnostic procedures, which Spitzer and his fellow neo-Kraepelinians saw as key to uniting psychiatry with the rest of biomedicine, that kept Spitzer in the field, rather than the hermeneutics of analysis or the conversational arts. In lieu of more rigorous psychotherapeutic training, the young Spitzer sought out expertise in "technical" fields, dreaming up ways to bring these skills to bear on psychiatry. Most notably, Spitzer took several courses in data processing, general computing, and the coding languages FORTRAN II and IV at IBM's New York-based Data Processing Division. The courses "opened up for him the world of algorithms" (Decker 2013:94): a clean, idealized world of stable correspondences between inputs and outputs, and of clear-cut binaries rather than psychoanalysis's winding and ever-widening spectrum of individualized pathologies. At least, this is the image of the algorithm that Spitzer chased after and that his forays into melding programming with psychiatry would reproduce.
Just as DSM-II was published, Spitzer collaborated with like-minded Columbia psychologist and eventual DSM-III task force member Dr. Jean Endicott to produce the first of what would be three interlinked papers in the Archives of General Psychiatry. Uniting his commitment to standardization and the hope he invested in algorithms, the papers presented three iterations of a computerized diagnostic program written in FORTRAN for the IBM 7049: DIAGNO-I (1968), DIAGNO-II (1969), and DIAGNO-III (1974). Together, Spitzer and Endicott aimed to establish a computer program that a clinician could use to diagnose patients with as little human decision-making as possible. In each of their papers, they set up a series of "Man [sic] versus Computer" (1968:749) experiments testing the DIAGNOs' diagnostic prowess against technicians and clinicians with varying degrees of experience. In offloading the decision-making work of diagnosis to a computer, Spitzer and Endicott aimed to show that the various DIAGNOs had the potential to reduce the time a clinician would need to spend with a patient, and the money that the patient (or their insurance provider) would need to spend on the clinician.

[Figure: Schematic flow chart for DIAGNO computer program, showing the internal decision tree structure of DIAGNO-I. Labeled branches leading from "Start" include brain syndrome, affective psychosis, personality disorder, neurosis, and other. Source: Arch Gen Psychiat, Vol 18, June 1968.]

In a more polarizing move, Spitzer and Endicott argued that computerized diagnosis could address a problem that cut to the heart of the mounting tension between Freudians and neo-Kraepelinians: diagnostic reliability. The neo-Kraepelinians often pointed out that, when using the conventional, psychoanalytically oriented nomenclature, two clinicians could examine the same patient and come up with completely different evaluations of the patient's psychiatric state (Edelman 1969).
One of my informants from West Coast University, a psychoanalytically trained therapist from Argentina who was trying to fold engineering perspectives into his psychotherapeutic practice, once referred to classic psychoanalysis as "the ultimate black box." A patient might input their symptoms and stories to the receptive therapist, who would then output a diagnosis or interpretation, but there was no way of knowing or replicating the procedures a therapist was following to arrive at that output. Likewise, Spitzer and Endicott attributed the "well-documented unreliability of psychiatric diagnoses" to variability in the order of "operations by which clinicians use the raw data of observations to make a diagnosis" (1968: 746). Psychoanalysis could not provide clinicians with a universal web of associations between symptom and disease, or a flowchart dictating that if a patient expresses x, then their diagnosis is more likely to be y and never z. Through their Man vs. Computer experiments, Spitzer and Endicott concluded that "this source of unreliability is completely eliminated by the use of a computer program which will always arrive at the same diagnosis when given the raw data describing a subject" (1968: 746). They built DIAGNO-I to implement a "logical decision tree model similar to the differential diagnostic procedure employed in clinical medicine": a series of true/false questions, each of which sends the program down a different pathway depending on the answer (ibid). The three papers convey that using DIAGNO requires little prior experience. All a DIAGNO operator needs to do is input the patient's gender, age, number of previous hospitalizations, and symptoms, which the operator should describe using the Psychiatric Status Schedule (PSS), a scale for assessing social role and mental state (Spitzer and Endicott 1968: 746).
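To make the decision-tree logic concrete, here is a minimal, hypothetical Python sketch of a true/false diagnostic tree of the kind Spitzer and Endicott describe. It is not a reconstruction of DIAGNO-I, which was written in FORTRAN: the questions, the symptom names, the 0 to 3 severity scale, and the branch order are all invented for illustration, and only the outcome labels "not ill" and "nonspecific illness with mild symptomology" are borrowed from the unofficial categories Spitzer and Endicott (1968: 747) list. What the sketch does show is the property they emphasized: because each answer mechanically selects the next branch, the same input record always yields the same output.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass(frozen=True)
class Leaf:
    """Terminal node: the diagnosis the tree outputs."""
    diagnosis: str


@dataclass(frozen=True)
class Question:
    """Internal node: a true/false question about the patient record."""
    text: str
    test: Callable[[dict], bool]
    if_true: "Node"
    if_false: "Node"


Node = Union[Leaf, Question]


def present(patient: dict, symptom: str) -> bool:
    # HYPOTHETICAL convention: a symptom counts as present if its 0-3 rating is above 0.
    return patient["symptoms"].get(symptom, 0) > 0


def diagnose(node: Node, patient: dict) -> str:
    # Walk from the root, letting each true/false answer pick the next branch
    # until a terminal diagnosis is reached. No step calls for clinical judgment,
    # so identical records always follow the identical path.
    while isinstance(node, Question):
        node = node.if_true if node.test(patient) else node.if_false
    return node.diagnosis


# HYPOTHETICAL toy tree: not DIAGNO-I's actual questions or branch order.
TREE = Question(
    "Is any symptom rated present?",
    lambda p: any(rating > 0 for rating in p["symptoms"].values()),
    if_true=Question(
        "Is disorientation rated present?",
        lambda p: present(p, "disorientation"),
        if_true=Leaf("brain syndrome"),
        if_false=Question(
            "Are hallucinations or delusions rated present?",
            lambda p: present(p, "hallucinations") or present(p, "delusions"),
            if_true=Leaf("psychosis"),
            if_false=Question(
                "Are all present symptoms mild (rating 1)?",
                lambda p: max(p["symptoms"].values()) <= 1,
                if_true=Leaf("nonspecific illness with mild symptomology"),
                if_false=Leaf("neurosis"),
            ),
        ),
    ),
    if_false=Leaf("not ill"),
)

if __name__ == "__main__":
    # Sex, age, and prior hospitalizations mirror DIAGNO's described inputs,
    # but this toy tree only consults the symptom ratings.
    record = {"sex": "F", "age": 34, "prior_hospitalizations": 0,
              "symptoms": {"anxiety": 2, "insomnia": 1}}
    print(diagnose(TREE, record))  # neurosis
    print(diagnose(TREE, record) == diagnose(TREE, record))  # True
```

The sketch also makes visible the dependency the DIAGNO papers acknowledge: the tree's output can only be as reliable as the ratings a human operator enters into the record.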
Spitzer created (and never published) the PSS while at Columbia; he intended for clinicians to use the scale much in the same way that DSM-III would eventually be used. The PSS provided guidelines for how a diagnostician might ask questions and elicit information about the patient's mental and social role functioning, as well as guidelines on associations between their answers and DSM-II diagnostic categories. Once the operator had entered these data, DIAGNO-I would spit out a diagnosis, using "diagnoses and qualifying phrases as well as two unofficial diagnoses: not ill and nonspecific illness with mild symptomology" (Spitzer and Endicott 1968: 747). These two categories would eventually make their way into DSM-III. Therefore, in addition to being a foray into computerized diagnosis, the DIAGNOs were a testing ground for DSM-III's epistemological finer points. Despite the 1968 paper's promissory overtones, by the time Spitzer and Endicott were writing with several other colleagues about DIAGNO-III in 1974, they concluded that the computerization of diagnosis had reached a stopping point. However, the authors assured readers that "all constraints on computerized diagnosis are of a partial nature and are inherent neither to the kinds of information that computers can process, nor in the nature of the algorithms available to them" (Spitzer et al 1974: 202). Indeed, other psychiatrists saw great promise in DIAGNO in terms of its diagnostic acuity and its ability to save time and money. Orr notes that in 1975, DIAGNO-II was "fully operational" and "installed for use at Rockland State Mental Hospital [in New York], home of Nathan Kline, cyborg psychiatrist and founder of U.S. psychopharmacology" (Orr 2010: 367). The authors of the 1974 paper contended instead that "the constraint [on computerized diagnosis] lies...in the traditional diagnostic system itself" (202). To them, the problem lay with the current state of psychiatry as a whole.
It was at this moment that members of the recently formed DSM-III task force, like Endicott, decided that the answer to psychiatry's reliability issue was not to train a computer to reason like a clinician, but to teach the clinician to reason like a computer by revising psychiatric nomenclature altogether (Orr 2006:245). But a closer look at the DIAGNO papers reveals a crucial caveat to this characterization. Spitzer and Endicott stipulate in the second DIAGNO paper that "a computer program can...yield a diagnosis, eliminating the costly use of experienced clinicians" so long as "specifically trained technicians can be used to collect accurate data on subjects" (1969: 12, my emphasis). In other words, clinicians cannot simulate the logical procedures of a computer alone. To reason "like a computer," clinicians require human assistance. In order for DIAGNO to do its job, to diagnose reliably and economically, it requires a fleet of technicians trained in standardized methods of data collection, which, in the case of psychiatric diagnosis, include the elicitation of details about a patient's symptoms. Throughout the six years they worked on DIAGNO, Spitzer and Endicott recognized that "any system that relies on routinely collected data must instate training and administrative procedures...to ensure high quality data" (Spitzer et al 1974: 202). In their 1974 paper, the authors even suggest that many of the instances in which the human clinician and DIAGNO gave conflicting diagnoses were "due to sheer blunders in the ratings made by the clinical staff," such as data entry errors (ibid). In this way, the DSM's psychoanalytic nomenclature and the status of psychiatry as a whole were not the only constraints on computerized diagnosis. The success of computerized diagnosis depended, like any AI or machine learning application, upon a para-professional labor force of data custodians and their capacity to carry out uniform procedures.[18]
Implicit in this admission is that while computerized diagnosis might scale back the costly use of clinicians, it scales up the presumably less costly (and, by proxy, less valuable) labor of technicians gathering the data to be fed into the program. At the germinal moment of North American psychiatry's empiricism, then, we find another familiar refrain: the more computational psychiatry becomes, the greater the need for cheap, mechanized labor. Moreover, in this asterisk about the necessity of a para-professional labor force whose job it is to gather the "raw" data to be delivered to the person (or machine) responsible for diagnosis, we find the stirrings of psychiatric screening as a sub-species of psychiatric judgment, a kind of sorting that is necessary yet inferior to the act of diagnosis. For the work of gathering and inputting patient data in order to determine whether or not patients require the (costly) time and attention of a clinician is the work of psychiatric assessment. DIAGNO's technicians prefigure the administrative position of psychiatric assessment with respect to diagnosis.

[18] Scholarship across STS and the history of science has indeed affirmed that this is the case for many scientific disciplines. The making and doing of science relies on delegated and distributed human labor, whether in the production of maps (Turnbull 2000), in the administrative labor that transforms objects found out in the world into specimens for display in museums (Star and Griesemer 1989), in the seemingly flashy and high-tech field of bioinformatics, in which the "wet work" of laboratory sciences has been scaled up and in-sourced to heteromated warehouses (Stevens 2013), or in neo-colonial configurations of labor in bioprospecting expeditions to develop novel medications (Hayden 2004; Soto Laveaga 2009).
Daston (1992) deems the ideological production of a neutral, scientific gaze, in which the idiosyncrasies of the observing scientists' identities are wiped away from the written page, "aperspectival objectivity." This view from nowhere-which is, in the case of DSM-III and Computational Psychiatry, also an ear from nowhere-is achieved through a distributed network of observers doing "technical" work. In many ways, Spitzer and Endicott attempted to use the computer to achieve aperspectival objectivity-a mode of interpreting the signs of mental illness that was supposedly set aside from the clinician's theoretical dispositions-in addition to mechanical objectivity. Daston notes that aperspectival objectivity was only "imported and naturalized into the ethos of the natural sciences, as a result of reorganization of scientific life...when science came to consist in large part of communications that crossed boundaries of nationality, training, and skill" (600). Daston's theorization of aperspectival objectivity is instructive here because it links together a mode (and an ethic) of observation with the arrangement of labor that is required to sustain it. Like Zinn's secretaries, the para-professional technicians feeding DIAGNO its data suggest that mechanical objectivity-which entails the removal not just of the idiosyncrasies of perspective, but of the human altogether-requires its own arrangement of labor. The "mechanic" of mechanical objectivity includes not only non-human machines, but machine-like human labor, work that is a mechanized feature of the attendant infrastructure of a computer or recording device.
FILLING A FINANCIAL BOTTOMLESS PIT
Spitzer, Endicott, and the other detractors of psychoanalysis who would eventually form or become associated with the DSM-III task force all took issue with the lack of diagnostic reliability that DSM-II and the psychoanalytic conventions of the day offered.
A psychoanalytic paradigm emphasizes individual life history and circumstance and, as a result, does not offer clear-cut definitions of illness, wellness, or the distinction between the two. Because psychoanalysis also does not require strict boundaries between and criteria for disease categories, it cannot provide techniques for grouping patients into homogeneous populations, nor enable an outside observer to track the impact of therapy on a patient over time (Zinn and Lasswell's efforts notwithstanding). The problems of diagnostic unreliability are thus tied to conundrums beyond the epistemological, and research-focused psychiatrists like Spitzer were not the only ones who took issue. While Spitzer and his collaborators were publishing increasingly forceful critiques of psychiatry and the task force-which officially formed in 1974-began to coalesce, the rest of the country was undergoing a series of interlinked, top-down reforms. These changes concerned the bureaucratic and administrative management of mental illness as a public health issue. Task force members and their associates capitalized on these changes by transforming DSM-III into a "boundary object" (Star and Griesemer 1989). Boundary objects-like the map of a state-are simultaneously general and specific, concrete and abstract. Some component of the object must be stabilized or structurally tenacious, yet the object also must be loose and plastic enough that it can be made to speak for different things, or put to different uses, "reconcil[ing] meaning" across different and sometimes even contradictory views (Star and Griesemer 1989: 388). As a boundary object, DSM-III was a mediating interface that drew together a diverse constituency of actors, becoming a salient tool not only for therapists, but for research investigators, insurance providers, and pharmaceutical companies alike.
The task force drew these constituencies together with its revisions, coordinating the stabilization of diagnostic procedures with a stabilization of the methods for conducting research on diagnostic categories and on mental health care interventions. They rebuilt a manual that fit into the country's fluctuating health care services infrastructure. In recounting these top-down changes, I want to suggest that the "machinic" in late twentieth-century psychiatry is not limited to the object of the computer itself but also encompasses ideas about why things ought to be made computable to begin with. During this time period, psychiatry found itself squeezed by "consumerist demands," with the grip tightening year after year (Rosner 2005: 135). The pressure came from regulatory bodies, federal funders, and private insurance companies, all of which were placing increasing emphasis on statistical calculations as the most valid form of evidence. Measuring the impact of an intervention in a quantifiable way made it easy to tie questions of efficacy to questions of cost-effectiveness, and concerns for objectivity to concerns for economy-the same coupling that played out across the DIAGNO papers, and that continues to drive Computational Psychiatry research today. Statistical analysis became another means through which the ineffable stuff of mind (here, psychiatric rupture and the movement toward repair) could be made material, defined this time in terms of the type and price of care. Human subjectivity and its role in the diagnostic process was framed as an impediment to the production of measurable therapeutic outcomes, and to the process of rendering people diagnosed using DSM's categories into subject populations containing individuals who can be equated with one another under the moniker of their diagnosis.19
In this way, the so-called empiricism of DSM-III and the neo-Kraepelinians was a pragmatic one. Diagnostic categories in DSM-III described statistically real entities, even while the biological underpinnings of mental illness remained an admittedly unanswered question, and even while task force members spoke openly about the manual's provisional status. Thus, against the standard narrative that depicts diagnosis and psychiatric practice under psychoanalysis as unempirical in contrast to the neo-Kraepelinians' absolute empiricism encapsulated in DSM-III, I suggest that the neo-Kraepelinians simply pursued a certain form of empiricism that stuck better than the empiricism of psychoanalysis, due to its association with calculation and its ability to render people, experiences, and treatment responses numerically commensurate.
The groundwork for the changes that took place during the revision of the DSM was laid years prior. Throughout the 1940s and 1950s, scientists in the U.S. and Western Europe began to discover that certain drugs led to positive outcomes in the management of specific symptoms. In 1949, a scientist found that lithium significantly diminished the symptoms of what was then called manic depression (a diagnostic category discussed in Chapter 4). Next came the discovery in 1952 that chlorpromazine diminishes psychotic symptoms, and, five years later, that tricyclic medications diminish depressive symptoms (Lakoff 2005: 7-8). Previously "untreatable" patients-otherwise relegated to asylums-who responded well to these medications could leave spaces of confinement and instead undergo long-term outpatient psychotherapy while living in the general population.
19 With reference to Cronon (1992), Lakoff (2005b) refers to the equalizing capacities of DSM's standardized diagnostic categories as "diagnostic liquidity."
Mental health care professionals thus began to recognize the practical benefits of defining mental illness discretely, and not necessarily in terms of what ailed the patient but in terms of which interventions seemed to lessen the patient's suffering (Lakoff 2005: 7). The tight coupling of pathological symptoms with intervention, along with the primacy of quantifiable evidence, was formally written into law in the early 1960s. In 1962, Congress passed a landmark revision to Food and Drug Administration (FDA) policy that impacted all biomedical research in the United States, with far-reaching implications for psychiatry. According to this legislation, anyone who wanted to develop, promote, or sell a biomedical intervention would have to test the safety and efficacy of that intervention in a randomized controlled trial (RCT).20 For psychiatric research at the time, this meant that investigators should test drugs and psychotherapy alike using the RCT structure. This decision sent "epistemological convulsions" (Rosner 2005: 136) throughout psychiatry, markedly transforming the way in which researchers frame mental illness as an object of study and concern, and forcing those who wished their interventions to have any kind of profitable life to define diseases according to the treatments meant to resolve them. The 1962 legislation had a recursive effect. As Lakoff puts it, any intervention developed from that point onward "had to embody the system's model of the relationship between illness and intervention" (Lakoff 2005: 10). More than that, the FDA legislation insinuated that statistically calculated evidence held a special epistemological place-it was the only proof of an intervention's efficacy that the FDA would recognize. This primacy placed on statistical evidence resonated with the neo-Kraepelinians.
In the battle of know-how versus numbers, the 1962 legislation was a win for the "data-oriented approach" of Spitzer and his colleagues, and a loss for the experience-based approach of psychoanalysis.21 Yet while the FDA, like the future task force members, valued an evidence-based approach, it is neither the only possible approach nor a neutral one.
20 The purpose of a randomized controlled trial is to reduce "bias" and to produce evidence of an intervention's impact on a targeted subject population that is as objective as possible. In an RCT, research cohorts are typically split into two groups: one group receives the actual intervention, while the other group receives a placebo. Investigators use statistical randomization to determine which subjects end up in which group; hence, the splitting of subjects is supposedly agnostic and free of bias. Investigators then calculate the statistical validity of the placebo's versus the actual intervention's impact on the targeted symptoms. As Vincanne Adams (2013) and Joe Dumit (2012) have discussed, the insidiousness of RCTs lies in their perceived, unilateral "objectivity." Although the results of an RCT bear the moniker of objectivity by way of statistical analysis and the randomization of which subjects receive which intervention, those results can still be tweaked in one way or another, as is the case with the pharmaceutical corporation-sponsored, cholesterol-lowering drug trials that are the subject of Dumit's ethnography.
As scholars such as Petryna (2009), Dumit (2012), and Adams (2013) have suggested, a narrow emphasis on statistical evidence in the context of biomedical research participates in the peeling away of what constitutes "wellness" or "health" from "anything experiential" (Dumit 2012: 123), i.e., from the patient's own estimation of their internal states (though this experiential knowledge may or may not be influenced by technoscientific discourses and expectations).22 The RCT structure posed a particular challenge for psychiatry at the time, highlighting that psychiatric nomenclature is not only a tool for clinical treatment but also for conducting statistically validated research. In an RCT, the efficacy of an intervention must be "measurable in terms of efficacy across populations of comparable patients" (Lakoff 2005: 11). Investigators need a way to build research subject cohorts; they require an "instrument of commensuration" (Schechter 2014: 30) that enables them to group together not just a collection of individuals, but a cohesive population with an (allegedly) shared, homogeneous trait. In the absence of a diagnostic system that could achieve this biopolitical feat, research-oriented psychiatrists like Spitzer began to formulate rating scales and questionnaires, like the PSS that Spitzer built into DIAGNO-I, Aaron Beck's Depression Inventory (BDI), and the Hamilton Depression Scale (HAM-D). They designed these inventories-some of which are still used for psychiatric screening to this day, and some of which we will encounter elsewhere in the dissertation-in order to translate symptomal experiences into numbers, to reify the various states of being mentally ill. It was during this time that Spitzer, Endicott, and several other researchers developed the Research Diagnostic Criteria, a matrix for designing multi-step psychiatric research (a combination of, among other things, laboratory studies, family studies, and population-level studies) that was yet another test-run of DSM-III (Feighner et al. 1972; Spitzer, Endicott, and Robins 1978). As scholarship on audits and documentation in quantificatory regimes of evidence has shown (Strathern 2000; Riles 2006), the numbers that RCTs and inventory scores produce have a real, material existence because of the kinds of work they can accomplish, in this instance, because of how they can move a patient through-or prevent them from accessing-the health care system. The numerical, evidence-based approach of the RCT proved inviting to yet another constituency concerned with the statistical calculation of costs, benefits, and value: insurance agencies. Following the FDA's regulatory changes came slow but consequential changes in both federal and private health care policy, all of which made research funding contingent on adopting the RCT structure and, after its publication, DSM-III's rigidly defined diagnostic categories. As Kate Schechter (2014) describes, up until the mid-1960s, most patients paid for their treatment out of pocket without relying on insurance coverage.
21 The RCT officially became the gold standard for attesting to the efficacy of a psychiatric intervention in 1980, the same year that DSM-III was published. That year, Bill S-3209, the so-called Efficacy Bill, was introduced into Congress, which legally formalized the link between RCTs and the calculation of treatment efficacy (Rosner 2005: 136).
22 Dumit (2012) situates the shift from "experience" to clinical-trial-produced, statistical "evidence" twenty years prior to Spitzer and his colleagues' efforts, namely, in the 1940s, coinciding with the emergence of population-based mass health and the use of statistical data to stake claims against the tobacco industry regarding the detrimental health effects of smoking.
Patient reliance on medical insurance to cover treatment costs increased steadily following World War II, and in the 1960s, insurance plans finally began to include coverage for mental health care. Third-party insurance companies like Aetna and Blue Cross endorsed the Federal Employees Health Benefits Program, which "reimbursed psychiatric care dollar for dollar with other medical treatments" (Schechter 2014: 28). Yet as psychiatric care-still predominantly psychoanalytic-became more economically accessible, third-party payers began to take issue with the length and intensity of treatment that most analysts required their patients to undergo. Psychoanalysis's "qualitative continua" and "symbolic mechanisms" fit poorly into "an insurance logic that would allow payment for the treatment of discrete diseases and discrete episodes" (Schechter 2014: 31). In other words, psychoanalysis was decidedly un-actuarial. By the 1970s, the private insurance industry was booming, and the cost of coverage for psychiatric treatment rose dramatically. More and more, insurance providers deemed therapy-again, synonymous with psychoanalysis at the time-to be a "financial bottomless pit that would require potentially uncontrollable resources" (Schechter 2014: 18). If, according to psychoanalysis, to be human is to exist in a state of psychic disrepair, then there is no real end to analysis-it is an ongoing, perpetual quest for self-knowledge. To insurance providers, the interminability and air of mystique surrounding psychoanalysis rendered it suspect, causing them to call its status as a medical intervention altogether into question. For instance, in 1975, the Vice President of Blue Cross declared:
Compared to other types of services there is less clarity and uniformity of terminology concerning mental diagnoses, treatment modalities, and types of facilities providing care... only the patient and the therapist have direct knowledge of what services were provided and why (quoted in Schechter 2014: 29).
At the time of the Vice President's statement-a year after the DSM-III task force formed-Aetna had reduced its coverage to twenty outpatient visits (i.e., sessions with a clinician) per client (Schechter 2014: 29). The pressure was on to devise a system that could neatly define illness and cure, and draw clear boundaries between different diagnostic categories, according to the logic of the marketplace. Insurance companies hungered for a paradigm that could, without friction, translate its therapeutic procedures into dollars. A parallel story was unfolding at the federal level. President Carter was well aware of the rising demand for and cost of health insurance coverage, and to meet this growing need, he sought to develop a national Health Insurance Program. His pursuit of this program transformed psychiatric research from yet another angle, this time by influencing the type of research that governmental funding bodies, like the National Institute of Mental Health (NIMH), would support. At Carter's directive, in the late 1970s Congress tried to establish standardized "criteria for reimbursement of medical treatments" and likewise declared that psychoanalysis itself was the major culprit standing in the way of this endeavor, with its failure to answer "practical and quantifiable questions" about the match between pathology and treatment type (Rosner 2005: 117, 135). Not unlike Harold Zinn, Congress underscored that psychoanalytic researchers and therapists were not in the business of capturing and cataloguing tangible proof of their paradigm's efficacy that might be evaluated by an independent party who was not part of the patient-therapist encounter.
The Carter administration found a key figure in seeing this change through in Gerald Klerman, the director of the NIMH's parent agency, the Alcohol, Drug Abuse, and Mental Health Administration (ADAMHA). Klerman was a respected researcher and practitioner, and thus ideally positioned to mediate between policy and research practices. At the vanguard of both psychopharmacology and depression research, he had developed his own novel, decidedly un-psychoanalytic paradigm for treating depression called interpersonal therapy (Klerman et al. 1974). Klerman was also a central DSM-III task force consultant, and he was attuned to the rising critiques gathering around psychoanalysis. He was well aware of psychoanalysis's limitations when it came to producing metrics for measuring health and wellness that could prove to an outside observer, in the rhetorical language of numbers, that psychoanalytic therapy was working and working well. And in his estimation, science, like insurance coverage and the pharmaceutical industry, was yet another marketplace. According to Klerman, because psychiatric research depends on funding and is therefore driven by the same capitalist forces, its own "invisible hand...has not been sufficient to meet public health needs" (quoted in Rosner 2005: 135). Klerman helped to jumpstart and re-route basic science research in psychiatry, uplifting the ongoing efforts of DSM-III's task force and denigrating psychoanalysis as he went. He coordinated and oversaw the first-ever collaborative RCT measuring the efficacy of a medication-imipramine-against a psychotherapy-Aaron Beck's cognitive behavioral therapy (CBT), which diverged markedly from psychoanalysis (Rosner 2005: 117).23 His study achieved many things at once: it demonstrated how to conduct an RCT with a psychotherapy within the FDA's new guidelines, while also demonstrating the efficacy, quantifiability, and RCT-compatibility of a non-psychoanalytic therapy.
More than that, the study used DSM-III categories to construct inclusion and exclusion criteria for the subject pool. It used these categories to demonstrate-and calculate-the extent to which research subjects moved out of these categories, their symptoms diminishing, through the course of therapy. Although never written into law, the success of the trial Klerman facilitated signaled that DSM-III categories were yet another new gold standard for conducting research. According to the researchers with whom Rosner conducted oral history interviews, grant reviewers at NIMH favored research structured around the manual's diagnostic categories and symptom criteria after 1980-it became clear that funding awards depended on the use of DSM (Rosner 2005: 143). Because the NIMH provided the bulk of funding for psychiatric research, investigators who clung to psychoanalysis after 1980 found themselves at a serious disadvantage.24
23 As I have argued elsewhere, CBT contains, baked within its techniques and its definitions of cure, the neoliberal logic of the market. The goal of CBT is to train patients to be cognitive behavioral therapists themselves-cure coincides with a virtuosic performance of analyzing and evaluating evidence that disputes the validity of negative self-thought, and then adjusting those thoughts to match this evidence. Beck knowingly developed his paradigm with the RCT in mind (Rosner 2005; Rosner 2018)-he wanted patients to become "junior scientists" fluent in the study (and cybernetic adjustment) of their own selves (Rosner n.d.).
THE MACHINE-READABLE EMPIRICISM OF DSM-III
As a result of these interconnected changes, by the time DSM-III made its public debut, it had immense power, and its influence only grew over time. Multiple sectors of contemporary life in the U.S. would come to gather around and cut across the manual, from the realm of law and insurance, to conceptualizations of self and social deviance, to medications and the economies that form around them, to academic journals, conferences, and entire research institutes dedicated to a singular diagnostic category listed within its pages. Perhaps most consequential of all, after DSM-III's publication, psychiatric diagnosis came to function as "a key to the repertoire of passwords that provides access to the institutional software that manages contemporary medicine" (Rosenberg 2002: 256). Each diagnostic category in DSM since the third edition coincides with a numerical code, which is itself associated with different tiers and types of insurance coverage. To be diagnosed after DSM-III is to be coded-literally and figuratively-as a certain kind of subject, in need of certain state or private resources. As a mechanism of bureaucratic legibility, diagnosis after DSM-III is what makes citizen-subjects "machine readable" (Rosenberg 2002: 257) by the actuarial calculus of the health care system, with implications for which treatments you receive, and for how long you are to receive them before you must begin paying for them yourself. Once entered into the system, "the patient is necessarily objectified and recreated into a structure of linked pathological concepts and institutional power" (Rosenberg 2002: 257), a member of a category of humans who all supposedly share some likeness. It is the DSM's material, infrastructural connections to all these other sectors of life that fuel its tenacity and sustain its influence.
24 Between 1950 and 1977, according to Rosner (2005), "the federal government spent over $55 million on approximately 530 psychotherapy research grants" (135). By 1977, federal dollars funded upwards of 75% "of all large-scale psychotherapy outcome research world wide" (ibid.).
With all this being the case, it might be fair to say that DSM-III achieved a "conquest" of American psychiatry, as Decker contends. But its success and influence did not follow from the manual's capacity to make psychiatric diagnosis into an atheoretical process once and for all. Instead, its creators fit the manual into a historically specific definition of what counts as a fact given the political economic backdrop at the time: that which can be statistically validated and verified by a third party external to the interaction between patient and clinician. If anything, DSM-III's totalizing capture of psychiatry in the U.S. had more to do with its situatedness-the way in which its authors (the task force members) recognized and pursued an intimate fit between what they sought to achieve (uniformly agreed upon definitions of what mental illness looks like) and the contours of the world as it changed around them. That is to say, the "empiricism" of DSM-III was not arrived at through close scrutiny of-or attempts to pin down-the body's organic processes. DSM-III did not resolve the question of what (or where) mental illnesses are-in the organism or in the psyche-although it did lay the foundation for increasingly biological framings of mental illness through the primacy it placed on disease specificity. What DSM-III's tenacity and success showcase is the extent to which definitions of the empirical-along with the biomedical-coincide with what David Pye (1968) calls the "workmanship of certainty." Although Pye made use of the workmanship of certainty primarily to discuss industrialized mass production, I expand on his discussion by pointing out how the "exactly predetermined" and therefore "certain" (1968: 341) object it produces also resembles the stabilized, predictable, and replicable outcomes that diagnosis and research under the umbrella of DSM-III are supposed to produce.
Psychiatry's ascension to the category of biomedicine has less to do with its pursuit of biological processes and material, and more to do with its ability to standardize its object of study-mental illness-through uniform methodologies and decision-making processes.25 Hence, Orr (2006) notes that the manual's empiricism is "a strange and elusive" one that "exhibits curious symptoms of epistemological dizziness and ontological trembling" (240). While task force members were committed to establishing a common language for describing symptom manifestation, they never claimed that these criteria were associated with biologically existent entities. On the one hand, DSM-III was folded into the shifting terrain of the health care sector and federally funded research programs, with lasting, material impacts for those who live with and alongside mental illness. On the other hand, to the researchers and practitioners serving on the DSM-III task force, the 1980 edition was considered a productive starting point, a placeholder, a temporary fix to tide the discipline over while basic science researchers continued to chip away at the pressing, unresolved matter of disease etiology. Spitzer himself attested that all of the categories in DSM-III are "hypotheses to be tested," invitations for further research rather than finalized conclusions (quoted in Orr 2006: 241). In the context of clinical use, task force members intended for clinicians to conduct diagnosis by identifying the best fit between the symptoms the patient presented and the diagnostic criteria listed in the manual. Psychiatry's new nosology provided a system of close-enough approximations and prototypes rather than ideal types (Cantor et al. 1980).
25 Pye contrasts the workmanship of certainty with the "workmanship of risk," which coincides with the artisanal and the handmade. If the workmanship of certainty produces the same, uniform product every time, under the workmanship of risk, "the quality of the result is not predetermined, but depends on the judgment, dexterity and care which the maker exercises as he works" (Pye 1968: 344).
Following DSM-III's publication, task force members established an American Psychiatric Association committee to oversee the testing of DSM's diagnostic categories and its subsequent editions. Thus, figures like Spitzer, Endicott, and Klerman embraced and advertised that the manual was "standardizing but also dynamic" (Lakoff 2005: 13), underlining "rather than obscur[ing] the probabilistic nature of diagnostic categories" (Cantor et al. 1980 quoted in Orr 2006: 241). Reflecting on his team's work years later, Spitzer paints a picture of a group of humble and reflexive researchers who shrank from the moniker of "empiricism" for which they had gained their reputation: "I think we knew that we often...were making up these criteria because they seemed really appropriate and useful. But there are very few instances where the actual choice was empirical. And most people don't appreciate that, but that is the fact" (quoted in Orr 2006: 240; my emphasis). Thus, we find a divergence from the common tale of DSM's empirically driven conquest of American psychiatry. Spitzer and the task force were not so confident about the empirics of their empiricism. Empiricism was more of an ideal, a carrot leading them forward, than what the revision itself ended up achieving. There is a distinction, then, between the tactics used to revise the manual and the manual's relationship to other medical fields.
As Orr argues:
The notion of [diagnostic] validity starts to float free of any measure of an actual correlation between the name and the thing (correlations made, for example, in medicine via the evidence of lesions, bacteria, fractured bones, blocked arteries); instead, validity is increasingly linked to predictive power, the ability to name not the thing but its future path. From an objective measure of realness to a pragmatic measure of predictive utility, the validity of psychiatric diagnoses becomes abstracted from any reality principle at precisely the moment the diagnostic classification system turns insistently empirical (Orr 2006:242).
While diagnosis after DSM-III may have left room for a shivering, unstable kind of empiricism, the manual's diagnoses have consequences for those whom it encodes, who live under the weight of its titles, or who must wear their membership in the populations it describes like an albatross. It is rather the definition of empiricism that transforms during this time period, a transformation that DSM-III reproduces. The machine-readable empiricism that the manual ratifies and laminates is also linked with power, because it is linked with funding, and because of its association with uniformity, the stable "world of algorithms" that Spitzer, Endicott, and others chased after. With the advent of "biopsychiatry," the "biological" is a vanishing category, a floating signifier standing in for that which is stable, codable, certain, uniform. Hence, critics like Vaillant denigrate DSM-III and its authors for attempting to make psychiatry like orthopedics and like computer science-there is a linkage, a family resemblance, between the stability, certainty, and uniformity these fields appear capable of achieving. That is, there is an association between disease specificity and disease computability. Rosenberg's use of software metaphors to discuss what DSM-III achieved precisely highlights the point I am trying to make.
In DSM-III, the computational/actuarial acts as a placeholder for the biological. It rhetorically functions as the biological-the biologically real, for now, according to task force members, until future task force members can better pin down the reality of psychopathology, perhaps when better science has come along. This is where Computational Psychiatry, and my informants, enter the scene, leading us into the ethnographic present. In their estimation, better science has not come along. It is their job to bring it to fruition, to fully reform and reformat psychiatry once and for all.
CONCLUSION: DSM IS DEAD! LONG LIVE DSM!
In May 2013, weeks before the APA was to publish the fifth edition of DSM (DSM-5), Thomas Insel announced in the NIMH's "Director's Blog" that the institute would be "re-orienting research away from DSM categories" toward "research projects that look across current categories...to begin to develop a better system." For Insel and many others at NIMH and beyond, "better" meant a classificatory system that categorizes mental illnesses according to their mechanisms of biopathology, with an emphasis on understanding the kind of neural circuitry that leads to the cognitive and behavioral symptoms of mental illness. Following Insel's post, NIMH took a number of steps to ensure that researchers who seek NIMH funding would move away from using DSM in their studies and instead implement NIMH's own, novel matrix for designing research hypotheses, called the Research Domain Criteria, or RDoC (Insel and Gogtay 2014:745). Most notably, NIMH enforced adherence to RDoC through a number of funding announcements,26 clarifying that NIMH will be favoring research that posits mechanisms of action over research that uses DSM categories to formulate subject populations or that investigates DSM-specific diagnostic categories (Insel 2013; Insel 2014).
[Figure: NIMH graphic illustrating the logic of RDoC. Subtypes of disorders, or undiscovered disorders due to some shared biopathological mechanism, may occur in populations of people who would typically be grouped in separate populations according to DSM's traditional diagnostic categories.]

Insel, Cuthbert, and others have proposed that it would be more viable (especially in terms of treatment development) to group mental illnesses according to these shared biological features (biotype 1, biotype 2, biotype n). RDoC is a matrix for developing research that arrives at these biological features without recourse to DSM categories and the "artificial" boundaries they create. For example, it favors studies with people who experience differences in cognitive control or sensorimotor reactivity over studies with people who share a diagnosis of "schizophrenia." Insel and others reasoned that DSM was an outdated tool that had fortified the barrier between basic and applied research, causing more harm than good and standing in the way of developing efficacious treatments. To this day, DSM-driven diagnosis cannot guide a clinician in identifying the essential, biological core of mental illnesses (if such a thing does indeed exist). When researchers use DSM to recruit participants for studies, there is no assurance that the participants share any kind of biological likeness that might be associated with the disorder. They share, if anything at all, similarly interpreted pathological behaviors and symptom expression, or similar scores on psychological inventories like the BDI.

[26] These claims (that NIMH will not fund research that uses DSM categories or that is aimed at investigating DSM diagnostic criteria) have been tempered and scaled back since Insel left NIMH for Google Life Sciences in 2015.
If there is no way to identify shared bio-etiological traits among patient groups, then the biological validity of the conclusions this research produces (such as conclusions about the efficacy of an intervention) is indeterminate and shaky. Insel and others supporting NIMH's position have therefore asserted that DSM is the prime culprit of America's mental health crisis. They argue that the RDoC project can deliver American psychiatry's long-term desiderata, since it is supposed to lay the groundwork for the royal road to biopathological mechanisms. The foundational changes the DSM-III task force members made (putting together a manual that focuses on phenomenology and reliability while eschewing models of disease etiology) are finally on the chopping block. Investigators demand that the manual's ontological trembling be held steady and that its pragmatic empiricism be taken to task once and for all. NIMH's rejection of DSM and the public unveiling of RDoC caused a stir, and yet the waves it sent across U.S. psychiatry rippled in a familiar pattern. DSM-5 and its immediate predecessors contain no trace of psychoanalysis. Even so, innovators and disruptors like Insel now name the DSM itself as the culprit behind psychiatry's epistemological and public health shortcomings. DSM holds the position psychoanalysis once did, as the unresolved bugaboo preventing psychiatry from achieving medico-scientific status, and it suffers the same critiques. For instance, Insel and former RDoC project director Bruce Cuthbert underscored in a 2015 Science article that even though "clinicians rightly pride themselves on their well-honed observational skills...diagnosis in psychiatry [with DSM], in contrast to most medicine, remains restricted to subjective symptoms and observable signs" (Insel and Cuthbert 2015:499). The NIMH's RDoC funding announcements hark back to the rise of RCTs in the 1960s and 1970s, and to the NIMH's eschewing of psychoanalysis.
A popular technique in Computational Psychiatry research is to gather as large a pool of research subjects as possible who all share some broadly construed symptom, such as cognitive processing or auditory hallucination.[27] The symptom is defined in terms that do not draw from the language or structure of DSM. The overarching goal of this big data approach is to strip bias away from research and achieve pure, unmediated theoretical agnosticism, buzzwords familiar to the DSM-III task force. And the RDoC approach's focus on broadly construed symptom phenomenology, rather than on "theoretically specific," conventionally recognized diagnostic categories and symptom criteria, resonates with the DSM-III task force members' initial logic for revising DSM. In many ways, then, Insel and Cuthbert, the primary supporters of RDoC at its inception, are attempting to finish what the DSM-III task force members started. Their stance toward psychiatric nomenclature, and their proposed tactics to disrupt the field, may at first blush seem different. However, I hope to have shown through this journey into psychiatry's coded past that investigators across these two moments in the field's history are fixated on similar issues.

[27] During preliminary fieldwork, several research investigators at West Coast-based universities described this as their best guess for what RDoC-compatible research might look like, since there were no clear guidelines or easily accessible examples of successfully funded research. With Insel's exit from NIMH, the fervor and mystery surrounding RDoC have receded, and the promises of its potential to disrupt the old guard of mental health care research have lost their steam.
Two projects are at stake here: the third revision of DSM and the rise of Computational Psychiatry, an instantiation of the RDoC project. What remains consistent across them is the battle between objectivity and subjectivity. Both also seek to extricate human judgment and the idiosyncrasies of theoretical training from psychiatric medicine by recourse to the machinic and the computational, figured as foils of the human. Many of the tools developed during this time period were palpably and materially present during my fieldwork. These ranged from the diagnostic inventories to the social life that the diagnostic categories themselves have taken on since DSM-III's publication in 1980. They persisted even in the time of RDoC, and even in the middle of interdisciplinary research endeavors that fall under the umbrella of Computational Psychiatry. The goalpost for what constitutes adequate empiricism in psychiatric medicine continues to shift. It has moved from that which can be calculated and made statistically evident to that which is anchored in the body's material substances and evidence of its mechanisms. By tracking this shift, we can observe the humanistic and the computational being defined dialectically, against and in tandem with each other. Techniques that can provide "unmediated" access to the body, or that actors interpret as capable of translating otherwise ineffable internal experiences into reified calculations, occupy the space of the machinic or the computational. Techniques that fail to produce certain or concretized proof of their efficacy, or that are anchored in supposedly private, hidden, or difficult-to-access individual experiences, occupy the space of the humanistic. The tensions between these two categories, and how they are worked out and negotiated in the context of psychiatric research, are fertile ground for exploring hierarchies of value. They also show how those hierarchies are naturalized and refracted into realms of life beyond the psychiatric and even the medico-scientific.
These binaries replicate and map onto binaries of gender (masculine/feminine). Gender binaries are themselves refracted through a distinction between two kinds of research-related tasks. Some tasks require expertise (the spark of ingenuity and intellect). Others can be completed through supposedly innate, automatic, inborn human capacities (work that can be performed "automatically" because it requires no skills). The symmetry between these two time periods, and the productive tension between the computational and the humanistic, play out most legibly in Spitzer and Endicott's early experiments with computerized diagnosis, and I turn to their papers to conclude. In the final DIAGNO paper, Spitzer, Endicott, and colleagues contend that the computerization of diagnosis is an achievable goal because "any feature that is capable of explicit verbalization can be precoded" (Spitzer et al. 1974: 202). In other words, so long as a phenomenon can be articulated (described verbally or otherwise, recorded, transduced into a more durable form), it can be operationalized. Homing in on this passage, Orr argues that Spitzer and Endicott's DIAGNO studies are emblematic of the entire enterprise of DSM-III. That enterprise achieved much more than the creation of a conventionalized, common language with which to describe mental illness. Instead, she asserts that the task force members were participating in a broader project of refashioning "mental disorders into patterns of information" (Orr 2006: 244). Following Donna Haraway (1985), she refers to this project as an "informatics of domination."[28] Orr thus argues that DSM-III laid the foundations for making the automation of psychiatric judgment both culturally desirable and technically plausible by reformatting the language of U.S. psychiatry in the image of the computer.

Orr's assertion that DSM-III refashioned mental illnesses into patterns of information resonates into the present day. At the same time, the very existence of Computational Psychiatry bespeaks the shortcomings of DSM-III's refashioning work. That is to say, language remains a persistent problem in U.S. psychiatry, from the era of Freudian psychoanalysis, to the neo-Kraepelinians, to renegades and disruptors like Thomas Insel. In the contemporary moment, as in the time of Spitzer and Endicott, we encounter the same persistent language ideology. It holds that language is attached to and has the capacity to represent interior states (like mental illness), yet the speaker can agentively control, modulate, warp, and jam this connection. The truth of the matter of mental illness, while available through language, is difficult to decipher. The speaking subject in psychiatric encounters (whether psychoanalytic or evidence-based) is fallible and unreliable, perhaps even more so than the listener.

[28] Orr calls this an "informatics of diagnosis" (2010: 356), explicitly building on Donna Haraway's (1985) notion of an "informatics of domination," a concept that Haraway developed to make sense of cybernetics via questions of power. The larger goal of Orr's essay is to make sense of "cybernetics as a technology of social governance" (Orr 2010: 356). She argues that the remaking of DSM-III, and of psychiatry writ large, in the image of the computer is a key instantiation of cybernetics as a "governmentality of mentality" (2010: 357). In this way, Orr argues that DSM-III's algorithmic-style diagnostic reasoning is not just about the advent of biopsychiatry (by way of disease specificity). It is also symptomatic of a larger trend toward an "automated, informatics control of human mentality" (ibid.).
The computerization of psychiatric assessment resembles earlier projects to pin down the communicative unconscious and to computerize diagnosis. It is just as much about downplaying the agency of the patient as speaking subject as it is about downplaying the agency of the observing scientist (and dehumanizing the administrative labor force working under the scientist). On this note, I'd like to suggest that the impacts of DSM-III's publication are not quite as totalizing as Orr claims. In part, I make this case by pointing (as I have done throughout this chapter) to the administrative work that props up efforts to fold computerization into psychiatric judgment. Orr marks the publication of DSM-III as part of an epochal shift in efforts to govern and control the terms of what it means to be human. By contrast, I have shown that the impulse to turn to machines in order to circumvent the subjective realm of human judgment, especially when it comes to locating the signs of psychic states through language, has a longer history. It can be found years before DSM-III's publication, in the work of Zinn and Lasswell, as well as years later, with the advent of Computational Psychiatry. Spitzer and Endicott asserted that anything that can be verbalized can be coded. This assertion represents a version of reality in which the potential patient speaks and the computer codes their speech as it exits their mouth. It leaves out the humans who elicit the speech, record or store it somewhere, and manually code it before entering it into a computational system. Computerization is not a linear process, just as it is not a singular, individualized one. It takes work, a network of technicians and verbatim laborers, and, as my ethnographic chapters will show, it encounters bumps and roadblocks that are sometimes insurmountable.
Altogether, turning to recent historical and ethnographic accounts of computing, labor, gender, and race can help articulate a labor history of psychiatric research (Irani 2015, 2019; Amrute 2016; Hicks 2017; Rankin 2018). Such a history dulls the shine of Computational Psychiatry's newness and resists interpretations that dehistoricize its tactics and technologies. These histories of computing, labor, and psychiatry are braided together, and they form the backdrop of the three ethnographic chapters that follow this one. The "logic" of computer science might feel cold, but the labor is always human, and this is likewise the case for psychiatry. Early adopters of computational techniques in psychiatry and contemporary participants in Computational Psychiatry privilege the figure of the machine for its capacity to reach beyond the human. In doing so, they fail to account for the humans who make computerized interventions possible, including research subjects. In other words, the humanistic/computational dichotomy papers over the humans who conduct mechanized, "unskilled" labor, operating from within the slot of the machine. Thus, as psychiatry becomes more technical, it begins to look more computational from a labor standpoint as well. In this way, the chapter foregrounds a distinction that both Computational Psychiatry and my informants make. They treat psychiatric screening as unskilled, mechanized labor, and diagnosis (and psychotherapy) as expert, human labor.

References

Adams, Vincanne. 2013. "Evidence-Based Global Public Health: Subjects, Profits, Erasures." In When People Come First: Critical Studies in Global Health. Pp. 54-90. Princeton: Princeton University Press.
American Psychiatric Association. 1968. Diagnostic and Statistical Manual of Mental Disorders, Second Edition (DSM-II). Washington: American Psychiatric Publishing.
American Psychiatric Association. 1980. Diagnostic and Statistical Manual of Mental Disorders, Third Edition (DSM-III). Washington: American Psychiatric Publishing.
American Psychiatric Association. 2013. Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5). Washington: American Psychiatric Publishing.
Amrute, Sareeta. 2016. Encoding Race, Encoding Class: Indian IT Workers in Berlin. Durham: Duke University Press.
Bayer, Ronald, and Robert L. Spitzer. 1985. "Neurosis, Psychodynamics, and DSM-III: A History of the Controversy." Archives of General Psychiatry 42(2): 187-96.
Cantor, Nancy, Edward E. Smith, Rita D. French, and Juan Mezzich. 1980. "Psychiatric diagnosis as prototype categorization." Journal of Abnormal Psychology 89(2): 181-193.
Casey, B.J., Nick Craddock, Bruce N. Cuthbert, Steven E. Hyman, Francis S. Lee, and Kerry J. Ressler. 2013. "DSM-5 and RDoC: progress in psychiatry research?" Nature Reviews Neuroscience 14: 810-14.
Cronon, William. 1992. Nature's Metropolis: Chicago and the Great West. New York: W.W. Norton.
Daston, Lorraine. 1992. "Objectivity and the Escape from Perspective." Social Studies of Science 22(4): 597-618.
Daston, Lorraine, and Peter Galison. 2007. Objectivity. New York: Zone Books.
Decker, Hannah. 2007. "How Kraepelinian was Kraepelin? How Kraepelinian are the neo-Kraepelinians? From Emil Kraepelin to DSM-III." History of Psychiatry 18(3): 337-60.
Decker, Hannah S. 2013. The Making of DSM-III: A Diagnostic Manual's Conquest of American Psychiatry. New York: Oxford University Press.
Dumit, Joseph. 2012. Drugs for Life: How Pharmaceutical Companies Define Our Health. Durham: Duke University Press.
Edelman, Robert I. 1969. "Intra-therapist Diagnostic Reliability." Journal of Clinical Psychology 25(4): 394-96.
Erickson, Paul, Judy L. Klein, Lorraine Daston, Rebecca Lemov, Thomas Sturm, and Michael D. Gordin. 2013. How Reason Almost Lost Its Mind: The Strange Career of Cold War Rationality. Chicago: University of Chicago Press.
Feighner, John P. 1979. "Nosology: A Voice for a Systematic Data-Oriented Approach." American Journal of Psychiatry 136(9): 1173-4.
Feighner, John P., Eli Robins, Samuel Guze, Robert A. Woodruff, George Winokur, and Rodrigo Munoz. 1972. "Diagnostic Criteria for Use in Psychiatric Research." Archives of General Psychiatry 26(1): 57-63.
Forsythe, Diana. 1993. "Engineering Knowledge: The Construction of Knowledge in Artificial Intelligence." Social Studies of Science 23(3): 445-477.
Fox-Keller, Evelyn. 1995. Refiguring Life: Metaphors of Twentieth-Century Biology. New York: Columbia University Press.
Freud, Sigmund. 1958[1912]. "Recommendations to Physicians Practising Psycho-Analysis." In The Standard Edition of the Complete Psychological Works of Sigmund Freud, Volume XII (1911-1913): The Case of Schreber, Papers on Technique and Other Works. James Strachey, trans. Pp. 109-120. London: Hogarth Press and the Institute of Psycho-Analysis.
Freud, Sigmund. 1998. The Interpretation of Dreams. James Strachey, trans. New York: Avon Books.
Hayden, Cori. 2004. When Nature Goes Public: The Making and Unmaking of Bioprospecting in Mexico. Princeton: Princeton University Press.
Hayles, N. Katherine. 1999. How We Became Posthuman: Virtual Bodies in Cybernetics, Literature, and Informatics. Chicago: University of Chicago Press.
Helmreich, Stefan. 2000. Silicon Second Nature: Culturing Artificial Life in a Digital World. Berkeley: University of California Press.
Herzig, Rebecca. 1995. Suffering for Science: Reason and Sacrifice in Modern America. Piscataway: Rutgers University Press.
Hicks, Marie. 2017. Programmed Inequality: How Britain Discarded Women Technologists and Lost Its Edge in Computing. Cambridge, MA: MIT Press.
Huys, Quentin J.M., Tiago V. Maia, and Michael J. Frank. 2016. "Computational psychiatry as a bridge from neuroscience to clinical applications." Nature Neuroscience 19(3): 404-413.
Inoue, Miyako. 2011. "Stenography and Ventriloquism in Late Nineteenth Century Japan." Language & Communication 31(3): 181-190.
Inoue, Miyako. 2018. "Word for Word: Verbatim as Political Technologies." Annual Review of Anthropology 47: 217-32.
Insel, Thomas. 2012. "Research Domain Criteria (RDoC)." National Institute of Mental Health. Director's Blog, March 6. http://www.nimh.nih.gov/about/director/2012/research-domain-criteria-rdoc.shtml, accessed January 12, 2015.
Insel, Thomas. 2013. "Transforming Diagnosis." National Institute of Mental Health. Director's Blog, April 29. http://www.nimh.nih.gov/about/director/2013/transforming-diagnosis.shtml, accessed January 3, 2015.
Insel, Thomas. 2014. "A New Approach to Clinical Trials." National Institute of Mental Health. Director's Blog, February 27. http://www.nimh.nih.gov/about/director/2014/a-new-approach-to-clinical-trials.shtml, accessed January 3, 2015.
Insel, Thomas, and Bruce N. Cuthbert. 2013. "Research Domain Criteria (RDoC): Toward a New Classification Framework for Research on Mental Disorders." American Journal of Psychiatry 167(7): 748-751.
Insel, Thomas, and Gotay. 2014. "National Institute of Mental Health Clinical Trials: New Opportunities, New Expectations." JAMA Psychiatry 71(7): 745-56.
Irani, Lilly. 2015. "The cultural work of microwork." New Media and Society 17(5): 720-739.
Irani, Lilly. 2019. Chasing Innovation: Making Entrepreneurial Citizens in Modern India. Princeton: Princeton University Press.
Kay, Lily. 2000. Who Wrote the Book of Life? A History of the Genetic Code. Stanford: Stanford University Press.
Kittler, Friedrich. 1999. Gramophone, Film, Typewriter. Geoffrey Winthrop-Young and Michael Wutz, trans. Stanford: Stanford University Press.
Kraepelin, Emil. 1921[1919]. Manic Depressive Insanity and Paranoia. R. Mary Barclay, trans. Edinburgh: E.S. Livingstone.
Lakoff, Andrew. 2005. Pharmaceutical Reason: Knowledge and Value in Global Psychiatry. Cambridge: Cambridge University Press.
Lakoff, Andrew. 2005b. "Diagnostic Liquidity: Mental Illness and the Global Trade in DNA." Theory and Society 34(1): 63-92.
Lasswell, Harold D. 1935. "Verbal references and physiological changes during the psychoanalytic interview: a preliminary communication." Psychoanalytic Review 22: 10-24.
Lempert, Michael. 2019. "Fine-Grained Analysis: Talk Therapy, Media, and the Microscopic Science of the Face-to-Face." Isis 110(1): 24-47.
Martin, Emily. 2007. Bipolar Expeditions: Mania and Depression in American Culture. Princeton: Princeton University Press.
Orr, Jackie. 2006. Panic Diaries: A Genealogy of Panic Disorder. Durham: Duke University Press.
Orr, Jackie. 2010. "Biopsychiatry and the Informatics of Diagnosis." In Biomedicalization: Technoscience, Health, and Illness in the U.S. Adele E. Clarke et al., eds. Pp. 353-379. Durham: Duke University Press.
Petryna, Adriana. 2009. When Experiments Travel: Clinical Trials and the Global Search for Human Subjects. Princeton: Princeton University Press.
Pye, David. 2010[1968]. "The Nature and Art of Workmanship." In The Craft Reader. Glenn Adamson, ed. Pp. 341-53. Oxford: Berg.
Rankin, Joy. 2018. A People's History of Computing in the United States. Cambridge, MA: Harvard University Press.
Riles, Annelise. 2006. Documents: Artifacts of Modern Knowledge. Ann Arbor: University of Michigan Press.
Robins, Eli, and Samuel B. Guze. 1970. "Establishment of Diagnostic Validity in Psychiatric Illness: Its Application to Schizophrenia." American Journal of Psychiatry 126(7): 983-87.
Rosenberg, Charles. 2002. "The Tyranny of Diagnosis: Specific Entities and Individual Experience." Milbank Quarterly 80: 237-60.
Rosner, Rachael I. 2005. "Psychotherapy Research and the National Institute of Mental Health, 1948-1980." In Psychology and the National Institute of Mental Health: A Historical Analysis of Science, Practice, and Policy. Wade E. Pickren and Stanley F. Schneider, eds. Pp. 113-150. Washington, DC: American Psychological Association.
Rosner, Rachael I. 2018. "Manualizing psychotherapy: Aaron T. Beck and the origins of Cognitive Therapy of Depression." European Journal of Psychotherapy and Counselling 20(1): 25-47.
Rosner, Rachael I. n.d. In Beck's Basement: Aaron T. Beck and the Cognitive Revolution in American Psychotherapy. Unpublished manuscript.
Sanders, James L. 2011. "A Distinct Language and a Historic Pendulum: The Evolution of the Diagnostic and Statistical Manual of Mental Disorders." Archives of Psychiatric Nursing 25(6): 394-403.
Schaffer, Simon. 1994. "Babbage's Intelligence: Calculating Engines and the Factory System." Critical Inquiry 21(1): 203-22.
Schechter, Kate. 2014. Illusions of a Future: Psychoanalysis and the Biopolitics of Desire. Durham: Duke University Press.
Semel, Beth. 2013. Culture all the way down? Interpreting "Culture" and Imagining Competence in a Cross-Cultural Psychology Class. Master's Thesis, Brandeis University.
Semel, Beth. 2014. "Tracking the self, installing expertise: Cognitive-Behavioral Therapy and the auto-regulating subject." Paper presented at the Annual Meeting of the Society for Social Studies of Science in conjunction with Sociedad Latinoamericana de Estudios Sociales de la Ciencia y la Tecnologia, August 21, Buenos Aires, Argentina.
Soto Laveaga, Gabriela. 2009. Jungle Laboratories: Mexican Peasants, National Projects, and the Making of the Pill. Durham: Duke University Press.
Spitzer, Robert L., Jean Endicott, and Eli Robins. 1978. "Research Diagnostic Criteria: Rationale and Reliability." Archives of General Psychiatry 35(6): 773-82.
Spitzer, Robert L., and Michael Sheehy. 1976. "DSM-III: A Classification System in Development." Psychiatric Annals 6(9): 102-9.
Spitzer, Robert L., Jean Endicott, Jacob Cohen, and Joseph Fleiss. 1974. "Constraints on the validity of computer diagnosis." Archives of General Psychiatry 31(2): 197-203.
Spitzer, Robert L., and Paul T. Wilson. 1968. "An Introduction to the American Psychiatric Association's New Diagnostic Nomenclature for New York State Department of Mental Hygiene Personnel." Psychiatric Quarterly 42(3): 487-503.
Spitzer, Robert L. 2001. "Values and Assumptions in the Development of DSM-III and DSM-III-R: An Insider's Perspective and a Belated Response to Sadler, Hulgus, and Agich's 'On Values in Recent American Psychiatric Classification.'" Journal of Nervous and Mental Disease 189(6): 351-9.
Star, Susan Leigh, and James R. Griesemer. 1989. "Institutional Ecology, 'Translations' and Boundary Objects: Amateurs and Professionals in Berkeley's Museum of Vertebrate Zoology, 1907-39." Social Studies of Science 19(3): 387-420.
Star, Susan Leigh. 1991. "The Sociology of the Invisible: The Primacy of Work in the Writing of Anselm Strauss." In Social Organization and Social Process: Essays in Honor of Anselm Strauss. D. Maines, ed. Pp. 265-283. New York: Aldine De Gruyter.
Stevens, Hallam. 2013. Life Out of Sequence: A Data-Driven History of Bioinformatics. Chicago: University of Chicago Press.
Strathern, Marilyn, ed. 2000. Audit Cultures: Anthropological studies in accountability, ethics and the academy. London: Routledge.
Turnbull, David. 2000. Masons, Tricksters and Cartographers. London: Routledge.
Wilson, Elizabeth. 2010. Affect and Artificial Intelligence. Seattle: University of Washington Press.
Young, Allan. 1995. The Harmony of Illusions: Inventing Post-Traumatic Stress Disorder. Princeton: Princeton University Press.
Chapter 2: Talking Heads: Brains, Bodies, and Vocal Biomarkers

"Some day we shall know how to validate the saying of the old physician which is on the title-page of this book: 'From him who has eyes to see and ears to hear no mortal can hide his secret; he whose lips are silent chatters with his fingertips and betrays himself through all his pores.'" (Lasswell 1930: 239)

Imagine yourself thoroughly packed into a narrow, white tube that surrounds your whole body as you lie horizontal on a thin pallet, tucked in with a white blanket. Your face is covered by what appears to be a white motorcycle helmet, with goggles that wrap around the back of your head to your neck. Yellowing squares of mattress foam along each of your ears and a folded-up pillowcase at the base of the helmet hold your head firmly in place. Your ears are plugged with expensive, noise-canceling ear buds. Along the length of your legs and torso runs a bundle of wires connected to devices that rest on your chest. There are two boxes with red, green, blue, and yellow buttons on them that you will eventually be directed to press, and an oblong ball that you've been directed to squeeze in case something goes wrong. You are inside a functional magnetic resonance imaging (fMRI) machine, about to begin your first brain scan.

I narrate this second-person ethnographic conceit in my head in an effort to remain calm as I settle into the scanner's constricted passageway. I imagine the descriptive turns of phrase it would take to pull my readers into this cold, tiny space with me, in part as a way to remind myself that I will not be in here forever. Even though I have watched and assisted in more scans than I can keep track of over the last three months, this is my first time in the belly of the beast.
And even though it never arrives, I am anticipating the sudden onset of claustrophobia that, I have heard, grips some people once they are inside the scanner, even people who have never before feared small spaces (research subjects, medical patients, my twin sister).

My move to the ethnographic ur-voice (a remixing of Malinowski's own mythical fabrication of what it is like to conduct fieldwork) is not unique. Anthropologists before me who have studied cognitive neuroscience labs in the United States and Europe have written similar passages (see especially Joyce 2008; Langlitz 2012). These passages attempt to capture this strange yet biomedically mundane experience. They also underline that technoscience is just as valid an object of ethnographic investigation as, for instance, the gardening practices of the Trobrianders. Moreover, I am not crafting this naive subject position, that of someone for whom everything is strange, solely for my own benefit, to soothe my anxious nerves. I am trying to help my informants test-run their experimental set-up. They are a group of researchers working within a cognitive neuroscience lab at East Coast University (ECU). I give them feedback from the first-person perspective of a research subject who is entering the scanner for the first time. Such a subject knows nothing about the ins and outs of their research project, its longer history in relation to the history of U.S. psychiatry, or how its scope has shifted over time. Because I joined the team as a research assistant with no technical or academic training in neuroscience, this is one of the few research responsibilities I can actually take on. Years prior to my scanning debut, I had met with the head of the entire cognitive neuroscience lab. He was working with the lab's lead research scientist, Sushant, to use fMRI scans to predict which patients diagnosed with social anxiety disorder would respond well to cognitive behavioral therapy, and I wanted to learn more about this work.
It was in this meeting that I learned of Sushant's collective of researchers and their ambitious project. They were looking for "vocal biomarkers" of depression: micro-level acoustic features of speech that might indicate the presence or onset of depression and that might also help shed light on its neurobiological underpinnings.

A hybrid of "biological" and "marker," biomarkers are a "broad subcategory of medical signs, that is, objective indications of medical states observed from outside the patient, which can be measured accurately and reproducibly" (Strimbu and Tavel 2010: 463). An ideal-typic biomarker would be a gene that codes for an enzyme, indicating the action of some disease mechanism when found in a sample of blood. Some are invested in Computational Psychiatry and in building a diagnostic system that is anchored in biology and moves beyond DSM. For them, biomarkers are foundational to translating mental illnesses into decontextualized diseases. They offer an entryway into phenomena that would otherwise be interior, private, and inaccessible (in the context of Euro-American conceptualizations of the self and body). The first time I heard the term "vocal biomarkers," it sounded like a riddle, especially given the normative Euro-American language ideologies of linguistic transparency that circulate in mental health care institutions. As Summerson Carr (2010) describes, mental health counselors in the U.S. tend to frame mental illness as a kind of semiotic detritus that clogs up the channel between spoken utterances and a speaker's inner self. Undergoing treatment and achieving psychological health correspond with clearing this passageway. This enables "honest" and "authentic" talk: speech that is referentially transparent and in direct correspondence with the speaker's intentions.
The notion of a vocal biomarker presses tension into this model, for it insinuates that a speaker has no control over the expression of interior states, regardless of how they modulate their speech-self relationship (i.e., regardless of their intention to speak "honestly"). In theory, vocal biomarkers flow freely in streams of speech irrespective of a speaker's intention to express or conceal them. If they can be pinned down and identified, they promise transparent, unmediated access to interior states. At the same time, vocal biomarkers will remain opaque and inaccessible to speakers and listeners alike absent the proper technological re-mediation. I wondered what media ideologies (Gershon 2010) supported the research, rendering the study both socioculturally desirable and technically plausible. By media ideologies I mean on-the-ground ideas about the media of fMRI and audio recording and the capacity of these techniques to capture something about brains and speech. If the concept of "vocal biomarkers" insinuates that mental illness has telltale sounds, then how might these sounds be made audible to the researchers, or to people with the power to move patients through the health care system? Non-psychiatric personnel use the tools of psychiatry, like the psychological inventories developed during the 1970s and 1980s, to define mental illness. In doing so, they categorize humans (and their brains and speech sounds) as either depressed or not depressed. What are the stakes of this?

The team allowed me to conduct participant-observation alongside them as their "meta-scientist" (as Sushant, the team's Principal Investigator (PI), called me when introducing me to others). In exchange, the team needed to be sure that I was "pulling my own weight." I had to contribute to the research in some useful manner. This included playing the role of a research subject whenever the team requested.
By doing so, I am helping them pin down where errors enter their data workflow and determine whether they should augment or amend the directions they give research subjects. The ear buds I'm wearing are designed to protect research subjects' ears from what will be the painfully loud sound of the scanner, which will reach heights of 125 decibels once the scan begins, the sonic equivalent of popping a balloon in front of your ear. The ear buds also carry into research subjects' ears the gentle voice of Victor, an even-keeled and often fashionably dressed third-year PhD student on the team. Playing the role of himself as researcher, Victor instructs me to lie as still as possible, explaining that the first scan will soon begin and will last between ten and fifteen minutes. Though I cannot see him from my position inside the tube, I know that he watches me through the soundproofed window of an adjacent room, called the "control room," surrounded by stacks of blank CDs, errant pens and paper clips, a broken analog clock, three desktop computers, and three laptops. This is the vantage point from which I typically observe the scans, keeping my eyes on the research subject's feet sticking out of the scanner. If I move at all inside the scanner and Victor notices, he must note the time and nature of my movements on a Google form that he keeps open on one of the laptops for the duration of the experiment (which will last three hours at minimum). Bodily movements during an fMRI scan subtly change the position of the research subject's head in the motorcycle helmet apparatus. This "blurs" the images of the subject's brain activity that the scan is designed to represent, rendering the data unusable for later interpretation. Thus, it is crucial to monitor the subject's body, and so Victor has assistance from Santiago, the team's technical assistant, who has recently completed his B.A. in biology but is a programmer and gamer at heart.
When I help with scans, I work with Santiago while Victor remains in his office, busy with tasks that the team considers higher priority, like data analysis, which is reserved for team members who have more technical expertise and have been working on the team for longer. I try to relax and remain still, closing my eyes as I'm bathed in the rhythmic grinding and pounding and clanking of the scanner, much louder than how I usually hear it from the control room. The sounds-which the ear buds slightly diminish but which I can feel through the scanner bed-recall a combination of fire alarms going off, concrete being jackhammered, metal being sawed in half, and my teeth being drilled at the dentist. This uncanny cacophony has inspired artists like Arnold Dreyblatt to make an entire album of "fMRI music," with each track featuring a scan of a different part of Dreyblatt's body: flesh as both medium and message (Zipoyrn 2013). Although I keep my body as still as possible, I know from the introductory-level neuroscience courses I have been muddling through as part of my fieldwork that on a molecular level, I'm really quite busy. The giant magnet embedded in the scanner's tube, sitting somewhere above my head, creates a powerful, static magnetic field that is sixty thousand times the strength of the Earth's own magnetic field. As a red rug on the threshold dividing the scanner room from the control room warns, this magnet is always on. As I lie still in the scanner, the magnet tilts certain molecules in my brain that exhibit a physical property called "spin," creating a net magnetization among them and lining all of the molecules up in the same direction. A pulse of radio frequency is sent through the magnet tube, causing it to wrench in place-the source of the familiar, menacing sounds that encircle me-and once again disturbing the position of my brain's molecules.
As the pulse of radio frequency creates a secondary, temporary, and much less powerful magnetic field, the molecules gradually realign with the original magnetic field. Once they realign (i.e., as a magnetic "current" moves), my brain releases a second electrical current that can be measured externally. The job of the helmet, called the "coil," is to pick up and track this second current. This is why even the smallest movement of a subject's head from the cradle of the coil has such dire consequences for the resulting data. Hence, the foam squares lining my ears both protect them from damage and ensure that I remain still. The way the molecules in my brain realign with the first magnetic field (from the scanner itself) tells the monitoring scientists (like Victor, Santiago, and the rest of the team) something about the brains of members of the experimental cohort, people who identify as having depression and who meet the study's other inclusion criteria, as compared with the brains of those who do not identify as having depression. Victor's voice returns to tell me that the first scan is complete. He and Santiago now have a dynamic model of my brain "at rest" that they will use as a baseline to track my "active" neuronal responses to stimuli soon to be presented to me. The ensuing scans will be "functional," capitalizing on the difference between the magnetic properties of oxygenated and deoxygenated blood to track the ebb and flow of blood29 to different regions of my brain. In theory, a concentration of blood in a brain region indicates that the region is active, so Victor and his colleagues have designed stimuli that supposedly animate the areas of my brain that coordinate and control the production of speech, the bodily activity that they hypothesize is impacted by the onset of depression.
Fixed atop the coil is a mirror tilted camera obscura-style and aimed at a projector screen that Victor explains will display directions for the first of seven tasks I will complete. The ceiling of the tube is a mere three-and-a-half inches away from the end of my nose, but the positioning of the mirror gives the illusion that the opening of the tube, which lies about two feet horizontally beyond the crown of my head, sits above me. The projector screen, corresponding with the screen of a laptop that Santiago controls, displays a blank, black slide. Via the mirror, I gaze up at the slide through the opening of the tube as if in an observatory, looking up at a night sky. Instead of stars guiding me, I am met with single-word prompts, repeating in random order: slow, rapid, normal. These are the speeds at which I am supposed to produce the sounds pa-ta-ka. The "rate word" appears as the scanner bleeps, and when a green cross appears under the rate word, the scanner falls silent; this is my cue to speak. I know from talking with Victor, who, like Sushant, is trained in phonetics30, that the repetition of pa-ta-ka is a fairly standard task in the speech-language pathology world. Producing the /p/, /t/, and /k/ sounds spans the full spectrum of possible tongue and lip positions for consonant sounds in Standard American English, and so scientists who study speech consider it to be a good measure for testing a person's ability to coordinate the muscles associated with speech (also known as the articulators) at varying speeds. For the uninitiated, I conjecture, this task must proceed like the performance of a Dadaist poem: Paaa-taa-kaaa-paaa-taaa-kaaa-paaa-taaa-kaaa. Patakapatakapatakapataka. Pa ta ka pa ta ka pa ta ka.

29 Specifically, the functional scans measure the BOLD (Blood Oxygenation Level Dependent) signal.
30 A sub-field of linguistics focused on the study and classification of speech sounds.

After having spent months on the other side of the soundproofed window, I also know that, from time to time as I speak, Victor, Santiago, and whoever else is in the control room listen in on me as I conduct the task, though they do not tell me when they are listening, or "checking in" as they call it. Just as they must closely watch the research subject's body in the scanner to make note of any movements and then ask the subject not to move again, team members must check in on subjects as they speak to confirm that they are following directions. Are they speaking loudly enough that the microphone will pick up their voices, so that their speech will be properly audio recorded for later analysis? Are they indeed saying "pa ta ka," and not some other collection of sounds? Does the subject's interpretation of the rate words (slow, rapid, normal) align with what the team agrees is slow, rapid, or normal speech? And if the subject is doing something "wrong," how should team members like Victor correct them? I can picture the process easily: with the push of a button on a speaker, Santiago fills the control room with the sounds of my voice saying pa ta ka, captured by a small microphone dangling just above my chin. The microphone had been fed through the perforated bottom of a coffee cup taped to the side of the head coil. After switching off the speaker, careful to time it with the scanner pulses so that it does not transmit the blaring sound of the scanner into the control room, team members praise the research subject among themselves for a job well done. They recognize that the tasks are confusing, that the directions are ornate, and that errors are common. This is why they have research subjects run through each of the seven verbal tasks two times.
If the subject is making what they deem to be an error, they might brainstorm on how to prompt the subject to conduct the task differently for the second round. Or, despite their best efforts to be courteous to research subjects, they might start laughing at them. I admit that I have laughed at research subjects, because sometimes they make funny sounds, and it can be very boring inside the sunless control room, where researchers grow sleepy as the hours it takes to conduct the scan crawl on and on, and as subject after subject cycles in and out of the scanner while team members begin to lose track of the time of day. Pretending to be a research subject ("piloting" a scan) puts team members in a position to be laughed at by their peers. Victor and Santiago may be having a chuckle at my pa ta ka's, and we have chuckled at theirs. Being the subject of laughter becomes an experience that researcher and participant share. This can make researchers think twice about snickering at the sound of other people's voices. Nevertheless, most research subjects (who are recruited from either Craigslist or local and university-wide listservs) have never been members of a cognitive neuroscience lab or any other kind of research lab, and so the scene in the control room is even more foreign and out of reach than the meaning of the sounds and the speech they are instructed to utter inside the scanner. And, as the hours advance, this research subject pilot finds it harder and harder to imagine being anywhere else but inside the scanner. Being in here makes me feel extremely, almost mystically, present. The scanner bed vibrates with each beep and clang, and I concentrate on the noises, synchronizing my breathing with them. They resonate through my chest, making me feel hollow inside. The ear buds muffle the sound of my own voice, and so it seems as if my speech is not my own, as if it originates from somewhere outside my head, as if I am barely making any audible noise at all.
And yet, lying motionless in the scanner bed makes the physical phenomena of speaking (the focus of the team's study) feel all the more pronounced: the opening and closing of my lips, the warmth of my breath, the rise and fall of my sternum, the tip of my tongue tapping the ridge of flesh behind my teeth, the up and down dip of my larynx... The ethnographer's brain "at rest."

USING THE VOICE TO UNDERSTAND THE MIND

ECU is the home base not only for Victor, Santiago, Sushant, and the two other team members, but also for informants across my other fieldsites. PIs from Midwestern University and West Coast University had passed through ECU as they advanced in their careers, collaborating with (and even training with) Ted, the second PI of the vocal biomarker group alongside Sushant. Thus, even though the teams in the Midwest and on the West Coast were less concerned with studying the brain, or even with explicitly studying human biology, when I conducted research alongside them and attended their meetings and presentations, I sensed the presence of the ECU team's logic and methods. The ECU team's approach and the theories they were committed to made up the epistemological backbone of the other teams' respective studies. To better understand Sushant and Ted's vocal biomarker study is therefore to grasp something fundamental about the two other projects this dissertation follows, particularly in terms of how they all conceptualize the connection between acoustic qualities of speech and interior states, and the means through which to grasp hold of these connections. While the other teams were intent on producing a technological prototype that could aid in psychiatric screening by detecting mental illness in acoustic qualities of the voice, the ECU team's ambitions were humbler. Their goal was to publish papers responding to a pair of interlinked research questions.
Are there connections between qualities of the voice and changes in the brain that occur in tandem with the disease state identified in DSM-IV as "major depressive disorder" (MDD), colloquially known as "depression"? Which vocal qualities suggest the presence or onset of depression? The ECU team's project itself is part of a longer legacy of multidisciplinary inquiry into the relationship between depression and speech sounds, a corpus of research that spans neuroscience, psychiatry, and psychology (Greden, Albala, and Smokler 1981; Godfrey and Knight 1984; Breznitz 1992; Flint et al 1993; Alpert, Pouget, and Silva 2001; Cannizzaro et al 2004) as well as communication, computer science, and engineering (Darby and Hollien 1977; Hollien 1980; Darby, Simmons, and Berger 1984; France et al 2000; Moore et al 2003; Ozdas et al 2004; Low et al 2010; Cummins et al 2011; Goechke 2011; Schuller et al 2013; Cummins et al 2015).31 This research has established the basic premise on which the ECU team's study rests, a premise that echoed, however faintly, in the research questions of the other teams: the sounds of speech, like all other sounds, have formal, physical properties. Researchers can mathematically analyze and then reverse engineer these formal features in order to learn more about the nature of the source that produced them: the coordination of the articulators, which specific regions of the brain (the cerebellum, the cerebral cortex, and the basal ganglia) control.32 According to Sushant, to study speech as a motor control issue is to study speech at its most basic, and therefore universal, level, for all (neurotypical) speakers, regardless of the language they are speaking and the affective or sociocultural intent of their utterances, use the same neuronal pathways to control the production of speech sounds. Following the precedents and conventions set by the researchers before them, the ECU team aims to use the sounds of speech to explore the brain.
Ralph, an advanced graduate student on the team, would say in his elevator pitch of the study that they were "using the voice to understand the mind." Rather than attend to speech in terms of the meaning of what the speaker says, using the voice to understand the mind entails treating speech as a neurobiologically indexical sign that conveys something about what a depressed person's brain does. Overshadowing this whole endeavor is the hegemonic figure of "brainhood" or "cerebral subjectivity," the notion that all human behavior and experience is governed by and can be distilled down to brain activity (Vidal 2009; Rose and Abi-Rached 2013; Vidal and Ortega 2017). Despite the universalist underpinnings of the research questions and the scholarly legacy on which they rest, and despite the vocal biomarker team's pursuit of speech at what they took to be its bedrock foundation (the brain), in their everyday work, researchers enacted and described to me a much more complicated relationship with both the scale of their study and the claims they had designed their study to produce. On the one hand, Ralph's catchphrase, using the voice to understand the mind, gives the impression that he and his colleagues believed that vocal qualities have always already been linked to changes in the brain, and these qualities are simply waiting to be found and described. On the other hand, Ralph and his colleagues were self-reflexive and keenly aware of the complexities, complications, and limitations that shaped the facts they could produce about the brain, the voice, and depression.

31 For an in-depth review of this literature, see Cummins et al (2015).
32 Guenther's Neural Control of Speech (2016) offers one of the most comprehensive and unified treatises describing this approach to studying speech.
My informants' day-to-day lives revolved around attending to the "experimental system" of their study, or the "local, technical, instrumental, social, and epistemic" aspects of their experimental set-up (Rheinberger 1997: 238). The fMRI scanner, computer programs, microphones, headphones, buttons, wires, and audio recording software the researchers depended on were unreliable and malfunctioned in frustrating, unpredictable ways. More than that, using the voice to understand the mind required tinkering with what Jill Morawski (2015) calls "the experimenter-subject system." Human research subjects and experimenters are in a social relationship, and the quality of their data (and, by extension, the entirety of their study) depended on the management of this relationship, on ordering the body and the speech of the subject. Institutional contexts shape the experimenter-subject system: ECU, for instance, could pay research subjects, but Sushant and his team could neither diagnose them nor treat them. But because of the study's emphasis on the speech system, the relationship between experimenter and subject revolved most closely around complex meta-linguistic interactions. The vocal biomarker team aimed to build up an experimental context and stimuli that kept the particular individuality of the research subject and the sociocultural dimensions of language and speaking at bay, treating the research subject as a body that makes sound: a medium transmitting brain and speech data. At the same time, speaking through the microphone in the control room and into the ear buds lodged in the subject's ear, the researcher conducting the scan had to constantly describe, clarify, and demonstrate for the subject a set of highly specific definitions of qualities like "pitch," "volume," and "speed." In turn, the research subject had to produce speech sounds and sentences that aligned with the team's definitions of these qualities. This placed team members in a contradictory position.
The overarching goal of the study was to peel away meaning and culture to arrive at the fleshy, fundamental form of speech, but team members had to rely on (and constantly contend with) these very same components of language in order to collect the data they needed. Researchers had to wrangle the whole assemblage of technologies, bodies, and sounds in order to achieve a category of speech referred to in the scholarship of their colleagues and predecessors as "natural speech." In cognitive neuroscience and speech studies, the difference between "natural" and "unnatural" speech pivots on the degree to which researchers manipulate research subjects' bodies and affect. Speech is experimentally "unnatural," for example, if the researcher uses a device to perturb the research subject's lip or hold their jaw in place while asking them to produce speech. Speech is also "unnatural" if the researcher requests that subjects vocally evoke a specific emotion, asking them to speak angrily, happily, or sadly. Relatively speaking, then, the vocal biomarker team studied "natural" speech. Their goal was to ensure that the subject produced speech as they "actually" would, as if they were speaking in any other situation out in the world, outside of the scanner, outside of the lab. The notion of natural speech complements the media ideology of the vocal biomarker (which grants immediate, transparent access to the body through language). To facilitate experimentally natural speech is to encourage a reflexive, passive reaction rather than an active performance. In this way, although natural and unnatural speech are actors' categories with context-specific definitions, my informants' struggles with maintaining the naturalness of speech offer an intervention into linguistics as a scientific enterprise. For the vocal biomarker team, the distinction between "natural" and "unnatural," like that between the "biological" and the "social," was always threatening to dissolve or collapse.
Researchers had to work to keep them separate. Their struggles to maintain the experimental and experimenter-subject system trouble the "construction of language as a natural object" within linguistics, and the positioning of linguistics against sociolinguistics that treats the sociocultural components of language as variants on a norm rather than essential features (Eckert 2003: 393).

A BIG, BEAUTIFUL BUILDING

ECU's Neuroscience Department is located in a big, beautiful building that was built with a chunk of funds from a massive philanthropic gift. The building's sleek and impassive exterior-a mixture of marble and floor-to-ceiling green glass-contrasts with the abandoned alleyways around which the building was constructed. The atrium at the entrance of the building, which offers green lounge chairs and round tables for students to gather at in between their classes, draws one's eyes upward six stories to glass-paneled ceilings. Even during the frigid east coast winter, the glass concentrates sunlight onto the tables below, and by midday students move to the corners of the atrium to avoid overheating and to keep the glare off their laptop screens. At the start of my fieldwork, I spent most of my time in this place. I would sit at one of the tables in between assisting with scans and attending introductory neuroscience and acoustic phonetics courses, lectures, and workshops, watching the ebb and flow of professors, students, researchers, and tourists, listening to impassioned ping pong and foosball matches going on up on the third floor, or to the swell of voices whenever a lab offered free ice cream and cookies on the fifth floor. I kept my laptop open but would peer over its edge, ever hopeful to catch the eye of someone I knew passing through the atrium so that I could strike up a conversation and try to learn more about what was happening to the brain and audio data I helped to gather with Santiago in the control room a floor below.
When the semester began, I had requested access to a desk in one of the offices that team members shared up on the fourth floor of this building, hoping to get closer to and learn more about what I imagined was the real action: data analysis, which I also imagined taking place on individual team members' computers. While the lab's administrative assistant had agreed to find an empty desk for me, a month passed without any communication before she assigned me a space. I would later learn that space in the lab offices was limited and that the allocation of desks to lab members was a charged and sensitive subject wrapped up in issues of fairness and efficiency. Desk access was a marker of belonging and authority. It signaled a move inward, away from the social periphery of the group. While Sushant was in support of my presence, I did not fit the demographic bill of a research assistant, a position typically meted out to hardworking or recently minted undergraduates with expertise in programming or psychological screening, or at least a degree in neuroscience. In arranging for me to have a desk, the admin had to wait for a lab member to graduate or move to a different university for space to free up. Or else, she would have to shift a current team member into a new space that was less close to their core collaborators, sometimes even two floors away. Giving me a desk meant foreclosing on another team member's potential opportunity to have a desk. In this banal calculus, status, rank, and the perceived value of a researcher to the rest of the lab were laid bare. Despite the weeks of anticipation and the social significance of receiving a desk, when my desk request came through, it did not bring the sense of satisfaction I had hoped for.
Daydreaming over my laptop down in the atrium, I had anticipated that access to a desk would reveal the secrets of what it means to look for vocal biomarkers, the very work of parsing through the brain pictures and the research subjects' audio recorded speech to find...something. And yet, the space assigned to me still felt adjacent to some main action that remained concealed from me. My office was far from Sushant, Ralph, and Victor, who all sat together. The only team member in my office was Santiago, with whom I spent most of my time anyway, conducting brain scans. I only saw the other team members during our weekly meetings, if they came to check on Santiago and me in the control room or if we needed their assistance, during events and conferences, or in passing at the water cooler and around the ECU campus. In the days following my desk success, my mind would wander again while in the control room with Santiago. Sushant had agreed to (and even encouraged) my participant observation, and I finally had somewhere to sit on the fourth floor, but I felt as if my attempt to gain access to what really mattered in the search for vocal biomarkers had failed. I wondered if I would ever be able to observe and assist in some activity that felt more exciting, more complex, less monotonous, and that would demonstrate for me how the team transformed the brain and audio data into research findings about the voice and depression. The daily activities I performed centered on data collection, which required interacting with research subjects. In addition to piloting, I would observe the process of determining whether or not a subject was eligible to participate in the study, and observe Santiago running through the team's informed consent protocol with subjects who met the team's criteria. I would also do things that felt custodial, bordering on the domestic. I would cover the mattress foams with fresh, sterile, blue hairnets.
I would dress the scanner mattress with a clean bed sheet for each research subject. I would help research subjects steady themselves onto the mattress, handing them the ear buds, the button boxes, and the emergency squeeze ball, and arranging the wires across their abdomen and chest. I would ask them if they would like a blanket or a support pillow for their legs and clip the coil helmet in place above their brow. My fieldnotes contain pages and pages describing this sequence, which I began to shorthand as "tucking the subject into the bed." After tucking them in, I would join Santiago to troubleshoot a seemingly endless cascade of problems: with the various scanning computers, with the study's microphones and speakers, and with research subjects failing to respond to or understand Santiago's directions. At the end of each scan, I would place the used sheets, often damp with the subject's sweat, in a plastic laundry basket, and discard the hairnets in a plastic trash bin. All of these things struck me as uninteresting, nonessential busy work. After all, how essential could they be to the research project, if anyone-someone with no skills, training or experience, like me-could do them? Writing on the challenges of conducting ethnographic research with the makers and stewards of technological systems, Seaver (2017) asserts that access is "not a precondition for all ethnographic knowledge" or a "perimeter around legitimate fieldwork" (7). Instead, the ethnographer has much insight to gain by attending to access itself "as a kind of texture" (ibid). This scavenging method involves triangulating what is known with what remains unknowable and out of reach while reading gaps in information and doors that stay closed not as empty spaces or barriers but as ports of ethnographic inquiry.
My understanding of the nature of the work available to me-and of the vocal biomarker study as a whole-slowly shifted the more I took up the perspective that my position at the lower end of the research pipeline afforded. My own initial low regard for this daily domestic labor was itself a kind of meta-commentary, one that reified the hierarchy that places data analysis at the top (as highly skilled, specialized, esoteric, and valuable) and places interactions with research subjects and custodial, domestic-oriented labor at the bottom (as ordinary, banal, skill-less grunt-work that is tangential to the production of scientific knowledge). I participated in shunting this labor to the margins precisely by not taking it seriously enough as a zone of ethnographic significance. In so doing, I caught myself buying into only one side of science's Janus face (Latour 1998): the side proclaiming science to be a singular, linear journey fixated on the pursuit of facts alone, rather than, as feminist science and technology studies scholars have shown, a messy, heterogeneous affair that is distributed across bodies, materials, and spaces, and one that is riddled with problems in need of constant tinkering and tending (Haraway 1988; Barad 1999). Mattern (2018) points out that attention to maintenance work is itself an act of maintenance, amplifying otherwise ignored, subaltern voices while exploring these spaces, practices, and practitioners as potent sites and agents of ideological distillation that go unnoticed because they appear so benign, so passive. To take the administrative, banal, and sweaty, dirty work of scientific research seriously-work that falls under the murky and capacious rubric of care-is to consider how practices of repair and error correction are not ancillary but vital to the production of scientific knowledge (Martin, Myers, and Viseu 2015: 628).
The organization of space within the lab maps out social and academic hierarchies: whose expertise is deemed pivotal to the research, which components of research are the most crucial versus tangential. Who does and does not get a space on the fourth floor of the big beautiful building reifies labor relations within the lab. The less skilled and permanent your position, the more likely you are to participate in maintenance work and interact directly with research subjects, and the more difficult it is to access desk space close to your collaborators. This reinforces the notion that fine-tuning, troubleshooting, and interfacing with research subjects are a peripheral form of labor, at the edges of groundbreaking scientific action. But for the vocal biomarker study, the work involved in managing the body and language of the research subject was central to the project of ensuring that the speech the team collected was "natural" enough, a necessary precondition to gathering (and later analyzing) any data at all. This is the work that enabled the whole study to hang together, infrastructurally and epistemologically.

THE NON-NEUROSCIENTISTS

When I was given permission to join the rest of the group floors above the atrium and handed a set of keys to an office, I developed a new routine. At the start of each day, upon entering the building, I would either turn right, walking across the lobby and then down a set of stairs to the Imaging Center in the basement, or turn left, up a flight of four stairs and through a winding hallway to my office. Many people in the vocal biomarker group's lab conduct research on childhood autism and dyslexia, so I'd often encounter children along my morning route. The office I came to share was one of many branching off from the lab's main waiting room area, where the young research subjects and their guardians would sit on a leather couch reading books or playing with the lab's collection of puzzles and action figures on a mahogany coffee table.
Along with Santiago, I shared my office with a visiting neurologist from Argentina and Rebecca, a graduate student affiliated with the lab but not working on the vocal biomarker project. I was the only woman on the vocal biomarker team, and I was happy to have Rebecca's company and kindness in the office. She came from a humanities background and had taught English as a second language outside the U.S. for a number of years before having what she told me was a "conversion" experience that led her to pursue a PhD in neuroscience. Rebecca tried her best to explain to me the charts and statistical analyses featured in the research articles assigned in my audited courses. She invited me to attend the monthly journal club that Victor ran called SMALL (Speech Motor Auditory Language Learning), which became a resource for making sense of how Victor and the team positioned their research against or in tandem with the labs producing the articles we read. Not unlike Rebecca, the members of the vocal biomarker team had made their way to ECU's Neuroscience Department through indirect, interdisciplinary paths. Despite their departmental affiliation, and despite the fact that during the time of my fieldwork they were either taking or teaching courses on neuroscience and brain imaging, Sushant, Ted, Victor, Ralph, and Santiago did not identify as neuroscientists. In the weekly meetings that they held separately from the wider bi-monthly lab meetings, they liked to say that they were merely people who "happen to hang out with neuroscientists."

Sushant began his academic career studying computer science in Southeast Asia before moving to the east coast of the United States to pursue graduate training in neuroscience, with a focus on building computational models to better understand speech motor control.
For his postdoctoral training at ECU, he deepened his studies in what he calls "speech communication" while pursuing additional projects exploring potential biomarkers for mental health treatment outcomes, the subject of my initial meeting with the head of the lab, during which I learned of Sushant's vocal biomarker project. Having since risen to the position of the lab's lead scientist, Sushant now teaches ECU's only course on speech communication and acoustic phonetics, which he allowed me to audit. Acting as a PI for the team alongside Sushant is Ted, who sports a salt-and-pepper mustache and a South Shore accent not unlike the one I used to have growing up. Ted is the head of a lab in an institute affiliated with ECU, located in a more rural part of the state, and would travel to ECU once a week to attend meetings. With three degrees from ECU, he is a close colleague of Sushant's PhD supervisor and the author of a major textbook on speech signal processing. When I attended an international signal processing conference in California with him and the rest of the team the summer before the semester began, everyone seemed to know Ted. During several talks, other attendees silently approached him in the audience to shake his hand. Ted had trained team members at my other fieldsites; many of them had sought out a degree or training at ECU specifically to work with him. Ralph is a lanky, sarcastic, and thoughtful advanced graduate student whom Sushant and Ted both supervise. Working on the project provided a forum for integrating his two main research interests: speech production, and cognitive and emotional states. Victor once remarked to me that, to Ralph, every conceivable human or physical phenomenon could be distilled into a signal.
Ralph himself told me that he believed all components of existence (human life, the formation of the universe, neuronal mechanisms) were governed by the same basic, universal laws of physics, which themselves could be distilled and described through the equally universal, transcendent language of mathematics. Hence speaking, too, was a physical process that could be apprehended through mathematical processes and operationalized in the form of an algorithm. Three years behind Ralph in the PhD program is Victor, a California transplant with an approachable demeanor and a gift for putting research subjects at ease, perhaps due to his background in speech pathology. For his undergraduate and master's training, he combined neuroscience and linguistics with an emphasis on phonetics and speech production and perception, specializing in stuttered speech. Team members like Victor, with his more "clinical" or "applied" training, were more likely to have experience working not just with humans as test subjects, but with humans as clients or patients. Santiago is the youngest member of the group, a first-generation American whose family emigrated from Central America to the States when he was a toddler. He had just completed his BS in neurobiology at a nearby university, and the lab tech job offered a means to gain greater research and management experience. He hoped this might position him to pursue a career as a programmer in the biomedical sector. As a lab tech, he did the bulk of the hands-on work necessary for the research. It was primarily his job to respond to research subject recruitment emails and to review the team's human research subject protocol with potential research subjects, helping them decide whether to participate officially in the research (a process the team called "consenting," since the form subjects signed at the end if they agreed to participate was called an "informed consent form").
Finally, Santiago's main job was to scan research subjects. The group saw their non-traditional status as an asset rather than a hindrance to their research. No one on the team had any commitment to the causal models of psychiatry, or even of biomedicine. They were less interested in causality and more interested in gathering data in volumes large enough to let them pick out patterns that might otherwise go unnoticed. According to Sushant, the group could "hack a solution" to the lack of robust, biologically based diagnostic markers through an "engineering approach." An engineering or "computational" approach implies a commitment to agnosticism, a willingness to approach or interpret a problem in a radically unexpected manner. For instance, Sushant explained that he was open to believing that the number of times a person touches their nose might be a biomarker for mental illness, a sign linked to, and a byproduct of, a psychopathological process, despite the fact that nose touching bears no conventionalized resemblance to any behavior that has anything to do with mental health. Hence Sushant's phrase: they were going to "hack" a solution. Hacking in engineering parlance implies figuring out how a technical system works so that "it can be made to perform in previously unintended and unforeseen ways" (Jones, Semel, and Le 2015: 324). Sushant and his team wanted to observe, modify, and rearrange the standards of research on "neuropsychiatric disorders" (a term they used interchangeably with "mental illness"), working outside the traditional boxes and categories of the DSM and studying sounds of the voice and the brain rather than a patient's description of their symptoms. Walking through the professional pathways that led researchers to the vocal biomarker group provides a backdrop for making sense of their study's intervention, in terms of the models of language, body, and mind that their work was committed to and reified.
While team members come from backgrounds in linguistics, communication science, mathematics, and engineering, no one on the team had training in psychiatry or experience in the mental health care professions. They had never treated or conducted long-term observation of people diagnosed with the disease category they studied: depression. The team did have a psychiatry consultant, affiliated with a local teaching hospital and available for assistance and commentary. This individual helped the team develop their study's inclusion and exclusion criteria and advised them in selecting the screening tools they used to determine who was eligible to participate as a research subject. At the same time, if spatial proximity to the rest of the team is an indicator of a team member's value, the consultant's expertise was additive rather than central. The consultant was never physically present in the offices. Taking this into consideration, in the following section I explore what it means for people with no background in psychiatry to define and conceptualize "depression" as a coherent disease-state, describing the team's procedures for evaluating a potential research subject's eligibility to participate in the study.

DEPRESSED?

Prospective research subjects must first pass through a screening procedure. Under the supervision of a researcher, they fill out psychiatric inventories designed to determine how closely they approximate the criteria for different diagnoses in the DSM. The collection of scores they produce determines their eligibility. Many of these inventories, like the Hamilton Depression Inventory (HAMD) and the Beck Depression Inventory (BDI), were developed during the period discussed in Chapter 1, at a moment of sweeping reform across American psychiatry aimed at overhauling and "scientizing" the discipline by rendering mental illnesses into more stable, bounded, and quantifiable objects.
Researchers initially developed inventories to achieve this feat of stabilization, and the vocal biomarker team uses them in a similar way. But the ethnographic record has shown that, although institutions and individuals in positions of power have used DSM and its interlinked inventories as if they were classificatory field guides, these tools create groups and kinds of individuals rather than mapping onto groups and kinds that already exist. DSM-directed screening and diagnosis construct likeness rather than identify an essential, a priori likeness (see especially Hacking 1986; Young 1995; Luhrmann 2001; Lakoff 2006; Conrad 2007; Metzl 2010). Their use as neutral classificatory apparatuses erases the particularities of the populations and milieus from which they were developed, particularities that challenge the universality of their application. For instance, while HAMD is a tool that clinicians use with many populations, the inventory was developed based on studies of cohorts of mostly male and entirely white research subjects living in psychiatric hospitals in the 1960s and 1970s (Williams 2001; Worboys 2012). The vocal biomarker team also treats inventory scores as indicative of the subject's psychological state: either they are depressed enough to participate in the study, or they are not depressed enough, or they fall into a diagnostic category that meets the study's exclusion criteria. As I will show in this section, the vocal biomarker team's use of these inventories to screen and consent their research subjects constitutes a "trial of qualification" (Callon et al. 2002). By determining what counts as depression in the context of their own study, even as they are humbly honest about their lack of expertise in psychiatry, the team defines the boundary of the diagnostic category.
The consenting procedure ratifies the qualities and criteria of being "depressed" established in DSM, even though their study takes place beyond the bounds of a typical clinical interaction, and even though their ultimate goal is to develop methods for diagnosis without recourse to conventional, DSM-derived descriptions. When I was first added to the team's IRB protocol as a research assistant in the summer of 2015, Ralph was excited at the prospect of outsourcing to me all of the tasks that consumed most of his time yet required the least training and skill. One such task was the consenting of research subjects: reviewing the team's ethical protocol for conducting research with human subjects, ensuring that the participant understood the risks and benefits of participating in the study, and administering and scoring a stack of psychological inventories. Ralph insisted that I sit in on and observe him consenting research subjects as part of my apprenticeship, so that I could eventually practice by pretending to consent him and then move on to consenting actual research subjects on my own. This end stage never arrived. By fall 2015, Sushant had hired Santiago, and this job fell under his purview. Not long after I had agreed to my apprenticeship, I followed Ralph up to the fifth floor and waited with him for the day's potential research subject in a long, narrow office that was locked from the outside. When the subject, a white woman with square glasses, arrived, Ralph let her into the office and then into an attached room, beckoning me to join them after she agreed to let me observe "for training purposes." The room that the team was using to evaluate subjects was smaller than the main office and much more cramped, with no windows. Ralph and the research subject were huddled around a meager table that hit Ralph's legs at his bent knees.
The walls of this room were painted white, but someone had cut out shapes (stars, spirals, circles, and diamonds of varying sizes) from pastel-colored construction paper and taped them to the walls, perhaps to make the room less stark and a little more inviting for younger research subjects. Ralph gestured to a metal chair positioned behind him. I sat down and found that Ralph's shoulder blocked my view of the research subject's face and the forms she would be filling out. Despite the room's crowdedness and the hot summer day outside, the air conditioning gave me goose bumps. The paper shapes shuddered in the artificial breeze. Unlike many universities that host biomedical research, ECU has no medical school, so there was no pool of research subjects to draw from who had a clinical diagnosis of depression or who were being treated for depression. Instead, the team opened the recruitment pool to anyone who self-identified as having depression (or, for the controls, anyone who identified as not having any major psychological illness). They sent recruitment emails through ECU's Neuroscience Department research subject listserv or posted announcements on job opportunity websites like Craigslist. Subjects were compensated in cash both for their participation in the research and for the time it took (usually an hour) to be evaluated and consented, so participation was marketed as a form of short-term employment. In addition to these methods, all of us on the team, myself included, carried a stack of flyers in our bags to post around the ECU campus and the surrounding areas. The flyer was succinct. In bold black script, it asked, DEPRESSED?, followed by brief information about the study ("using the voice to understand the mind"), a note that it was a paid opportunity, and the team's contact email.
Their recruitment tactics left it up to the potential research subject to self-identify as having depression or not, banking on the circulation of "depression" as a legible, psychopathological state (see Martin 2007). Only when a research subject had made it as far as this woman with the glasses had (contacting the team, scheduling a time to come in, traveling to ECU) did the difference between self-identified "depression" and DSM-identified "major depressive disorder" (MDD) begin to matter.

First, Ralph walked the woman through the study's consent form, reading it aloud to her from his own copy as she followed along on hers, and pausing to ask if she had any questions (she had none) before moving on to each section. After she initialed and signed in all the right places, Ralph opened a laptop computer and used Audacity, an open-source audio-recording program, to record her reading aloud "The Grandfather Passage," a public domain text frequently used to gather speech samples. Like the pa-ta-ka exercise conducted within the scanner, the Grandfather Passage was designed by a speech pathologist; it contains almost all of the phonemes of American English.[33] The team collected this audio-recorded speech without a clear plan for what they wanted to do with it or what significance it might hold for the rest of their study, though there was frequent talk of comparing subjects' in-scanner speech with their outside-the-scanner speech. After the Grandfather Passage came the Mini-Cog,[34] a test typically used to assess for dementia and Alzheimer's disease, which the research team was using to evaluate the research subject's cognitive processing abilities.
Next came the psychological inventories: the Beck Anxiety Inventory (BAI); the Beck Depression Inventory-II (BDI-II); the Bipolar Self-Test Mood Swings Questionnaire (MSQ); the Yale University PRIME Screening Test for psychosis; the SAGE Scales; the Quick Inventory of Depressive Symptomatology (QIDS-SR); the Patient Health Questionnaire, version 9 (PHQ-9); the Generalized Anxiety Disorder scale, version 7 (GAD-7); the Screening for Obsessive-Compulsive Disorder; and the Snaith-Hamilton Pleasure Scale (SHAPS).

[33] The team used two other passages: the Rainbow Passage and the Caterpillar Passage. All three passages varied in length and reading level. Researchers interchanged the passages assigned to subjects but typically went with the Grandfather Passage because of its duration and its mid-level difficulty.

[34] For the Mini-Cog, sometimes referred to as "the clock test," the researcher first asks the subject to repeat five words, then asks them to draw the hands of a clock at 10 past 12 on a sheet of paper with a large circle on it. After they finish drawing the clock, the researcher asks the subject to repeat back the five words. The subject is evaluated on their ability to remember the words and on their ability to draw the clock.

Inventories are a kind of survey and can be grouped into two categories: clinician-rated (filled out according to a clinician's interpretation of the semantic content of the patient's answers) or patient self-rated (filled out by the patients themselves according to their own assessment). Following the trend of their recruitment strategies, the team relied on self-rated inventories. They left it up to the research subjects to evaluate their own symptoms, interior states, ability to experience joy, lack of motivation to pursue things that bring them pleasure, and so on.
Recall that the diagnostic criteria and diagnostic categories of DSM are likewise based on outwardly observable behaviors and self-reported symptoms rather than on the causal mechanisms that drive or lead to mental illness. In both the DSM and the vocal biomarker study, then, the agency of the research subject still plays a central role, even though the premise of vocal biomarkers subverts the agency of speaking subjects and severs the connection between evaluation and expression. The woman slid the inventories back to Ralph across the table, one by one, as she filled them out. When she handed him her completed BDI-II, Ralph glanced over it and solemnly slid a different sheet of paper back in her direction, a form titled "Community Resources for Psychological Treatment." This document, the only one in the stack that the vocal biomarker team had created themselves, listed contact information for local and national suicide hotlines, sliding-scale community mental health centers, and hospitals with emergency psychiatric units. Ralph bowed his head and whispered, "I'm sorry." I was familiar with the structure of the BDI-II, so although I could not see the woman's responses, I knew that this meant she had given a score higher than zero for question 9, "Suicidal Thoughts or Wishes," thereby indicating suicidal ideation or urges. No one on the research team is a medical doctor. Ralph was in no position to officially diagnose the woman. The team was prohibited from making a medical referral or from notifying a medical professional of the woman's responses to this question. To do so would be to potentially incur legal liability for the team and ECU, and the university's institutional review board required the team to make it exceedingly clear that they could not diagnose or provide medical care for research subjects.
After all of the inventories were complete, Ralph guided the woman down to the administrative office off the first-floor lobby, where she was to receive payment for her hour spent with us. I watched her and Ralph exit and was filled with a sudden somberness. I remembered that there were people, and lives, and suffering at the front end of the data pipeline: the research participants were not only repositories of data or the hosts of speech-making brains, but also human beings who were potentially in unspeakable pain. Their participation in the research could not provide them with much aside from the knowledge that they might, if they qualified for the study, give back to society in some meaningful way by contributing a few hours of their bodies, thoughts, and sounds to science. I wondered if Ralph felt this same weight, or if he had grown accustomed to it or simply couldn't afford to be held back by it. Evaluating subjects through the administration of psychiatric screening inventories without being able to help them was an uneasy but necessary step toward building the study's experimental cohort. In the absence of a channel for providing a legally and/or medically sanctioned intervention, the team had put together the Community Resources list for the sake of the research subjects. This was a care-ful practice: careful not to cross the line into the legally unacceptable territory of "care." Providing the paper was not a ratified form of psychotherapy or treatment. Still, it was care-like: a list directing her toward treatment, an action attentive to the anguish the woman had expressed through a number. A small gesture (a recognition, a bow of the head), but a gesture nonetheless.

Ralph returned to the office with the empty cubicles. I sat without speaking, listening to him type in and tabulate her scores while listlessly staring at my own laptop.
I thought about what it would be like to see her again, and whether I would soon be tucking her into the scanner bed, offering her a blanket. Ralph cut the silence to announce that she did not qualify for the study. According to her scores, she showed signs of psychosis and of cognitive deficits, two of the study's exclusion criteria. She would also have been excluded if she was under 18 years old, if she showed signs of obsessive-compulsive disorder or bipolar disorder, if she had hypothyroidism, or if her BDI-II score fell below 14, which would indicate that she was not depressed "enough." In my earliest meetings with Sushant, he identified the vocal biomarker group's research as RDoC-worthy because of its commitment to exploring markers of pathology with a biological anchor, "using the voice to understand the mind" rather than, for example, using the voice to understand depression, or studying slurred speech as a symptom of MDD. RDoC encourages researchers to group together research subjects who might not otherwise have been included in the same cohort, on the assumption that DSM categories arbitrarily, and perhaps even incorrectly, group together people who share no actual biological likeness, in a way that might make biologically significant and biologically based findings impossible. But even as Sushant and others asserted that their research contributed to efforts to move beyond the social conventions of biomedical research on mental illness and to identify biologically anchored signs suggesting the presence of mental illness, they still had to make use of tools that categorize subjects according to traditional criteria. Conventional psychiatric inventories determined who would make up the experimental cohort, and in this way, they shaped the findings that the team would eventually produce. From a theoretical standpoint, the vocal biomarker group treated MDD as a brain disorder, a pathology that impacts neuronal circuitry and functioning.
But when it came to gathering together a group of research subjects to study, they treated depression as something existing at the level of self-reflexive interpretation, caught up in culturally specific ideas about behaviors and states of being that are either normal or pathological. The team was well aware of this contradiction. They recognized that trying to empty "depression" of its cultural significance by, for example, using a different word in their recruitment materials would risk missing out on research subjects. While Sushant appreciated and supported efforts to move away from DSM, like the RDoC project, he also recognized that DSM was tied up with and embedded in the bureaucratic infrastructure of America's health care system. In our many conversations about the feasibility of RDoC, Sushant would often note that a complete rejection of DSM would require revamping the mechanisms through which health insurance companies cover the cost of patient care. And finally, Sushant knew that subjects were hard to come by. They had to be depressed enough to meet the team's inclusion criteria, but not so depressed that they were unable to rouse themselves from their houses, travel to ECU, and then lie confined in the scanner for 3 hours. Even when they made it to the scanner, much of the data captured during the scan was unusable: the subject had fidgeted, or spoken too softly, or Santiago and I had installed the microphone incorrectly, and so on. Epistemological hopes aside, at the end of the day, Sushant told me, "we have to start somewhere." The vocal biomarker team's dilemma (their desire to move beyond DSM, set against DSM's necessary role in their study) speaks to the bumpy road that ambitious projects like RDoC must travel, and to the tenacity of the bureaucratic and sociocultural infrastructure built around DSM. The team could support the aims of RDoC but needed to contend with the social life (and institutional power) of diagnostic categories.
Moreover, in assembling their experimental subject population, researchers encountered first-hand the lives at stake in interlinked epistemological and bureaucratic mental health care reform efforts like RDoC. In the mundane task of handing out and scoring the low-tech, pen-and-paper inventories of an era of American psychiatry in its twilight hours, researchers like Ralph were constantly reminded that data points are people, and that innovation sometimes intersects with human suffering.

SOUNDS FUNNY

If a research subject qualifies for the study, they schedule a time with Santiago to return to ECU, this time to the Imaging Center in the basement, for their brain scan. I revisit the process of scanning from the other side of the control-room window, exploring scanning as a social activity. I recount a particularly disastrous (and hilarious) scan in order to demonstrate just how many factors can interrupt the process of gathering speech and brain data. One of the most persistent challenges is the embodied individuality of the research subject. Assisting and overseeing brain scans is a rite of passage for researchers within the vocal biomarker group's lab. It is also a kind of ritual enactment that separates experimenter from experimental subject, distinguishing the mentally ill from the (relatively) mentally well. Researchers who sat in the control room and orchestrated the scans could often also act as controls for the study, as members of the unmarked, non-depressed category. To qualify as a control for the vocal biomarker study, researchers cannot have a neuropsychiatric disorder, especially a known diagnosis of MDD. Team members disclose their mental health status to each other through their status as a control, sometimes coyly, sometimes bluntly. One of the first questions Ralph asked me when I tried to join the team was whether or not I was mentally ill.
I wondered if the reverse ever occurred: if a researcher would serve as a research subject in light of their diagnosis. This felt too taboo to ever ask out loud, as if it would threaten the neat divide between object of scientific inquiry and agent of scientific study. For safety's sake, two researchers must be present at every scan. It is their dual responsibility to double-check each other and stick to protocol, ensuring that the research subject has removed all ferrous metal from their body before going into the scan room. Since the scanner contains a giant, powerful magnet, ferrous metal outside or inside of the body poses a serious safety hazard. In the ominous words of the resident fMRI safety officer, if you don't look thoroughly enough for metal on the subject's body, the magnet will find it for you. Participants with non-removable metal like aneurysm clips or tattoos that contain iron (a once-common ingredient in red body pigment) are not "MRI compatible" and cannot be scanned. The two-body requirement was also meant to protect the safety of researchers. Before this rule was put in place, there were several instances in which a research subject assaulted a researcher. A two-tiered hierarchy of scanning responsibilities dictates how much a researcher can interact with research subjects, depending on the researcher's technical knowledge of brain imaging and computer programming. Each tier is associated with a different colored laminated badge: yellow or green. The badges hang from lanyards that researchers must bring with them to every scan, and the back of each badge is printed with the telephone numbers of the safety officer, university facilities, and other helpful emergency contacts. "Yellow badge" is the lower level, conferred upon a researcher after they have taken and passed an afternoon-long MRI safety course. A yellow badge grants a researcher the ability to assist with scans, or at least to be present in the control room during a scan.
A step up from the yellow badge is the "green badge," the higher level, secured after additional training and another test that covers broader topics, including image reconstruction software and human research subject protocol. Green badges oversee the scan, re-consent research subjects, and read the task directions to them during the scan. Only a green badge can operate the computer that controls the scanner. There has to be at least one green badge in the room for every scan, and if no green badges are available, the scan is canceled. Two yellow badges cannot conduct a scan alone. With my yellow badge, I accompanied and assisted Santiago (a green badge). By "yellow badging" every scan (being the second body in the room with a green badge), I freed up the time of other researchers. Yellow badging is highly undesirable work. All that you can do in the scanning room is check your email, eat, chat, maybe watch video clips or check social media, and hope that nothing goes wrong. The fits and starts of a scan made it difficult to engage in any other activity that required sustained attention, like reading course materials or jotting down fieldnotes. If and when there was a problem with the scan, Santiago and I would try to resolve the issue as quickly as possible. Otherwise, my tasks were limited to tucking the subject in and cleaning up, along with basic data entry: typing in the research subject's anonymized ID number before each task and pressing a single key on a laptop to initiate the task program. While conducting brain scans played a central role in the study, scanning was also an enormous source of error. Conversations on how to identify and address errors around brain imaging or sound recording dominated the group's regular meetings. Speech recorded outside the scanner during consent and evaluation (when subjects read the Grandfather Passage) tended to be clearer and to require less pre-processing than speech uttered inside the scanner.
A handful of times, Santiago and I accidentally recorded our idle chitchat inside the control room rather than the subject's speech from inside the scanner. Other times, I failed to save a recording, or he forgot to check in on a subject executing a task or to remind them to remain still. Every now and then we would get a cross email from Ralph, informing us that we had sent him a silent audio file.

The act of speaking itself could be a source of error. Most of the tasks required subjects to read sentences out loud, repeat nonsense words that other lab members had created, or sustain vowel sounds and consonant pairs, usually for no more than 3-5 seconds. Despite the short length of speech each task required, the location of the articulators relative to the brain in the skull meant that participants subtly moved their heads as they spoke, which blurred (or created artifacts in) the final reconstructed brain image, ultimately lowering the accuracy of the team's findings. In their pursuit of "natural" speech, the researchers tried to create an environment in which the subject could hear the sound of their own voice during the scan.35 The team programmed the scanner to pause for a few seconds while the subject was executing the verbal task, hoping as well to eliminate scanner noise from the audio file and ensure that the subject's voice was recorded as clearly as possible. But this sparse paradigm lowered the sampling rate of the brain images captured, resulting in a lower-resolution image.

35 According to speech and communication science, being able to listen to one's own speech plays a fundamental role in speech production. It enables the speaker to constantly adjust and re-adjust the sounds that they produce in a cybernetic feedback loop, a phenomenon that was the subject of Victor's dissertation.

After around four scans together, Santiago and I fell into a rhythm. We began to anticipate each other's reactions to the long and illustrious list of things that could go wrong. Sometimes, when we were especially unlucky, a number of problems would arise at once. On one such day, Santiago was in an uncharacteristically sour mood. He hadn't had time to grab lunch before our noon scan, the second one of the day, and he only had a power bar to get him through the next three hours. Because I was only a yellow badge, he couldn't leave me alone with the subject in the scanner to retrieve more food. Santiago was also in a bad mood because we had been met by a number of setbacks, both before and after the subject, a towering man over six feet tall, had arrived. As had been the case for the past two months, the microphone was giving us trouble. The microphone that offered the best sound quality, which had been expressly designed for the study, was at its manufacturer's for another round of expensive repairs, and the second-tier replacement microphone was not working, either. After running the mike wire into the scanner room through the copper panel in the control room, underneath the desks holding the three computers (to ensure that the mike would not conduct additional radio frequency into the scanner room), I had sat on the scanner bed in the same position the research subject would assume, while Santiago stood watching the Audacity interface and listening for the sound of my voice through the speaker with the soundproofed door closed. The time we could've spent buying lunch and bringing it back down into the basement dwindled away. Audacity was picking up the sound of my voice. The purple waveform representing the words I spoke into the microphone in the scanning room, "test test, seven six five four three two one," showed up like a long, fuzzy caterpillar inching across Santiago's screen.
But only silence came through the speakers, which Santiago tapped over and over again while signaling me to speak. In the end, we resorted to the third-tier microphone, the electrostatic mike, a piece of equipment that Sushant had built while he was a graduate student in the lab. The mike was reliable but, as Santiago griped, had shitty sound quality. It did a poor job of capturing the participant's voice as they completed the speaking tasks, which meant that Ralph would not be happy with the audio we captured.

When the participant arrived, we went through the steps of preparing him for the scan. After Santiago re-consented him in the waiting room area, demonstrated the spoken tasks to him, and asked him to change out of his clothes and into a pair of scrubs, Santiago stood with him at the door to the scanning room and swept a handheld metal detector over his body. He joked that we were making sure that the subject was "ok to be let into the club," playing a bouncer scanning the subject's body for weapons, as if participating in the scan might offer the same kind of fun and excitement as a nightclub. After Santiago and I patted down our shirt and pants pockets to ensure we had no metal items on our bodies, we led the participant into the scanner room. I helped him settle onto the mattress, handing him the buttons and wires and showing him how to squeeze the soft foam ends of the ear buds so that they could fit snugly into his ear canals and then expand. When Santiago ran back to the control room to test if the participant could hear him through the ear buds, the subject told us he heard nothing. Santiago could not hide the frustration on his face. For the next thirty minutes, we took turns hustling back and forth between the scanner room and the control room, one of us prompting the research subject to speak while the other listened from the other side.
We gave up and called the safety officer, who asked Santiago, half-serious, "what did you screw up this time?" Santiago, who liked to assure me that he was not superstitious, also liked to talk about things being cursed: we must be cursed, this participant was cursed, the expensive microphone was cursed, the scanner reconstruction computer that crashed while I fetched the electrostatic mike was cursed. Today, he told the safety officer, was a cursed day. By the time I had retrieved the electrostatic mike, tested it to be sure it worked, let the participant out for a bathroom break, and tucked him back in, Santiago was even hungrier and more on edge, waiting for the next curse to reveal itself.

It manifested as a comedy rather than a catastrophe. Three out of seven tasks in, it was evident that the microphone was working but the subject's baritone voice was too quiet. The waveform in Audacity was thin and neat, which meant his voice would sound faint and indistinct on the audio file. As we prepared for the next task, Santiago once again prompted him, speaking through the small mike that rested on the desk, "just make sure that you're speaking nice and loud so that the mike can pick you up, ok?" "Ok," said the participant, tired and lackluster.

Santiago and I had been in the middle of discussing, of all things, the Jewish tradition of bat mitzvah, a custom he was curious about but unfamiliar with. We would pause and resume our conversation according to when he had to talk to and check in on the participant. I had grown accustomed to the stop-and-go cadence of our talk; we could pause and take up the same topic again after stretches of interacting with the subject through the microphone. Santiago once again pressed the button to turn the mike on, the sign for me to stop talking. He began reading the directions for the next task that Victor had written:

In this task, you will say the vowels that appear on the screen in different pitches.
The vowels are ahh [/a/], ee [/i/], and ooo [/u/]. When you see the vowels on the screen, you will also see a pitch word, either high, normal, or low! When the text turns green and you see a green triangle appear on the screen, begin saying the vowel with the pitch indicated on the screen. Please hold the vowel for a few seconds until the triangle disappears and the next instructions come on the screen. Does that sound ok?

The subject, blandly, said yes. Santiago switched off the mike and asked me if boys and girls became bar or bat mitzvah at different ages. I entered the subject's ID number, pressed the space bar to initiate the program that controlled the tasks and displayed the PowerPoint slides to the participant, and explained to Santiago that I had become bat mitzvah, with my sister, around age 12 or 13; I could not remember exactly. The scanner pulsed around 10 times in rapid succession before pausing. Santiago asked me if my twin sister and I had our bat mitzvah celebration at the same time and pressed the speaker button to check in on the subject. I was going to answer him, but the subject produced such an arresting sound that I stopped in my tracks. "Ee!!!" he cried out, in a short, high-pitched burst. Santiago quickly switched off the speaker to cut off the blaring of the scanner and we erupted into wordless, breathless laughter. As we laughed, he pressed the button again in time with the scanner pause. "Ahh!" said the participant, with his normal pitch and subdued volume, as if he had just taken a sip from a refreshing drink. We continued laughing and wheezing, slapping our knees, still unable to talk as Santiago once again turned on the speaker. "Oo," said the participant, in a high-pitched voice, as if something had startled him. I tried to regain my composure and Santiago remarked, between gasps of air, "I don't even want to hear his ah again."
On the one hand, the participant had exceeded our expectations and was excelling at the task in terms of pitch modulation, especially given his otherwise deep, dull voice. On the other hand, he was executing the task incorrectly, failing to sustain the vowel sound long enough and producing instead a too-short burst of sound. The combination of wide pitch variation and short bursts of sound surprised us. "I haven't heard that before," said Santiago. "It was a good little twist." It was so unlike any interpretation of the task we had heard from the 20 or so subjects we had scanned in total at that time. It was also unexpected given how we had assumed the subject would sound, his voice otherwise devoid of emotion, and given our initial read of him combined with his gender presentation: a large-statured, cisgender, manly man.

Even if the subjects stayed perfectly still for the entirety of what Santiago called a "ridiculous long" scan, there was no guarantee that they would execute the task in the way that the team wanted. For the two tasks that required pitch modulation, many subjects tended to only raise or lower the volume of their voice. It did not help that the prompts flashing on the screen only said "high," "normal," and "low." The spelling of one of the pitch modulation task prompts also confused research subjects. One of the sounds they were supposed to sustain was /u/ as in "shoe," but the prompt spelled this sound "ohh." As a result, many research subjects produced the sound /o/, as in "show." Santiago himself had made this error when piloting the scan for Victor and when acting as a normal control. No one corrected him. By the time Santiago and I realized how subjects were interpreting the prompt, Santiago worried that correcting the slide to ensure participants produced /u/ instead of /o/ might introduce some unanticipated, unwanted variation into the dataset.
Every now and then, subjects continued to make errors even after Santiago gently pointed the error out and demonstrated once again the desired way to execute the task. Some subjects, I considered, may have done this on purpose. Maybe they were pointedly refusing to perform the task according to Santiago's directions and committing themselves to making whatever sounds they chose in the scanner. This might be a means through which subjects could take control of the scan and subvert their position as a body (as scientific material) in the team's experimental system. They could collect their payment at the end of the scan while leaving the team with unusable data, suffering no consequences for their recalcitrance other than a stiff neck and a few hours of supine discomfort.

For the vocal biomarker team, conducting a scan amounts to a negotiation, a push-and-pull that ends in compromise. There was a constant slippage between the team's ideal-typic conceptualization of the vocal qualities that they wanted the subject to reproduce and the subject's subjective, embodied articulation of that quality (Chumley and Harkness 2013). The tasks were supposed to be devoid of linguistic meaning. The team had designed the tasks in order to activate regions of the subject's brain associated with speech motor control, and they hoped the tasks would escape entanglement with the subject's own ideas about what they meant. But the funny-sounding man confronted Santiago and me with the excess of his interpretation, and of our own interpretation of him. After all, the subject's novel execution of the task was not the only thing that made the episode so hilarious and so compelling despite its brevity. It was funny because of the incongruence between what we read as the subject's identity and how his voice sounded. A large, baritone-voiced man emitting a high-pitched sound brings normative expectations about masculinity and language rushing into the room, only to flip them on their head.
Santiago's job in this instance, as with all other scans, was to redirect the subject's verbal performance in an effort to standardize it, re-rendering the research participant into a body, a sounding object rather than a speaking subject. But the subject, however inadvertently, reinserted his personhood, resisting being formatted for the sake of the study. His funny sounds reminded us that he was not just a body but a person, and that even when speech is just a sustained vowel sound, it is still up for sociocultural elaboration.

HOW TO DO THINGS WITH WORDS

Where do researchers' theoretical models of and technical protocols for studying speech hail from? I had initially assumed that the researchers would take a Saussurean structuralist approach to studying human speech communication. This is in part because linguistic anthropological scholarship since the early 1970s, including texts that survey the history of the sub-field qua the history of linguistics in North America (Duranti 2003; Mithun 2004), situates Saussurean models of language at the normative, hegemonic pole against which language ideologies prevalent elsewhere are compared (Silverstein 1998; Duranti 2004). Saussurean linguistics, in this literature, is foundational to patently "Euro-American" language ideologies or even to "language science" writ large (Silverstein 1979). I had thus expected to encounter Saussure in some form or another while working alongside this group of non-neuroscientists studying speech. In Saussurean linguistics, "language," or langue, is the invariant, conventionalized correspondence between sound-image and meaning. According to Saussure, the role of the linguist is to study the ordered and systematic structures of this relationship, rather than "the mechanical, voluntary, accidental, and variable realizations of speech," or parole (Caton 1987:225).
For Saussure, the primary purpose of communication is propositional: humans use language to referentially map out reality (Caton 1987:231). In the United States, Bloomfield and Chomsky further solidified the dominance of this model by insisting that linguistic "competence" was the proper target of study over linguistic "performance" (Hymes 1964; Hymes [1973] 2001). They rendered the Saussurean model of language even more cognitivist by replacing structuralist systems of categorization with "grammar," and by claiming that the potential for all humans to acquire language (i.e., the potential to learn how to cut up the world syntactically and semantically) originates from an innate, biological apparatus: universal grammar.

The innovative move of linguistic anthropologists in the late 1960s and early 1970s was to de-center langue and linguistic competence and spotlight the importance, if not the dominance, of practice, culture, and variation in language. For Dell Hymes (1964) and subsequent students of the "ethnography of speaking," this meant arguing that speaking and communicative practices (the domain of Saussurean parole) are in fact cultural activities, and part of the "social fact" (to use Saussure's words) to be studied by social scientists. For Michael Silverstein and his students, this meant arguing that semantics and pragmatics "do not form an opposition," showing instead, through ethnographic examples, that "semantics is a narrow domain of pragmatics": that even the seemingly durable rules of grammar can be shaped, warped, patterned, and influenced by cultural practices, institutions, power, and sociopolitical concerns (Harkness 2017:478). Curiously, like the linguistic anthropologists writing against Saussure and Chomsky, my informants were also focused on the acts and processes of speaking, but in a way that did not center on speech as a sociocultural activity.
In the vocal biomarker group's weekly meetings, no one ever discussed meaning or talked about language as a system of signs. If anything, meaning was a problem to avoid, as in the creation of the nonsense words used in one of the verbal tasks. The words had to be believable as lexical items in American English (recognizable as a word rather than a sound) but unrelated to existing words that might hold some kind of emotional resonance or stir the thoughts or memories of research subjects in an unintended way.

Rather than talk about meaning, the vocal biomarker team talked about the muscles responsible for speech and the networks in the brain that control them. For instance, during a meeting in July of 2015, Ralph took to one of the floor-to-ceiling whiteboards on the walls of the empty classroom we were congregated in to draw out a vast diagram of the potential neurobiological source of pitch control. He sketched a cross-sectional outline of a human head, complete with a simple drawing of the brain, the teeth, tongue, oral cavity, pharynx, and larynx. He pointed to the diagram's neck to indicate the location of a muscle in the larynx (the cricothyroid muscle), which is responsible for tightening the vocal cords and controlling the flow of air expelled from the lungs during speech. With electric excitement in his voice, he told us that the cricothyroid muscle "is innervated by a nerve. If you follow that nerve further you hit the neural cortex, and the connection can get you all the way up to the limbic system," a system that depressive conditions have been theorized to impact. Ralph was convinced that this connection suggested that slight changes in the pitch of speech might be the outcome of changes in the limbic system, which might be due to depression. At the time, I was carried along and fully convinced by Ralph's line and the route it took us through. Looking over my notes in my office days later, however, I was perplexed.
The logic seemed sound: if the brain and the process of producing speech are connected, then the sonic contours of spoken utterances can be traced back to the brain's activity. Yet it was all still so strange, so alien to me. Where was language, and signification, in Ralph's line, and in my informants' conceptualization of speech overall?

I decided to show the team an iconic figure from the opening pages of Saussure's Course in General Linguistics as a kind of projective test. The image I selected displayed two heads facing each other, one with its mouth open and the other with its mouth closed, with a dotted and a solid line looped between them. Saussure calls this interaction the "speech circuit," consisting of (A) phonation (orally producing speech) and (B) audition (listening to and processing the meaning of speech). On occasions when I found myself alone with Ralph, Victor, and Sushant, I presented the diagram and asked them what it was showing, what they thought of it, and if they would make any changes to it. The three men found the diagram more or less unproblematic, even satisfactory, although they were unable to identify its origin. I showed it to Victor once as we sat in the plush chairs in the lobby, avoiding the high afternoon sun and debating whether we should attend a lecture about olfaction in rats that we were already late to. Victor chuckled at the image and responded in an ironic tone, as if the answer were obvious, "ooooh, that looks like human speech communication!" Upon further prompting, he described what he saw:

two people are interacting with each other and it looks like-it's showing they are producing oral sounds, at each other, and then there are other lines that hit both their ear and their brain; probably the brain is involved in both the oral and the aural aspects of it, so the hearing and the production side. Yeah.
But yeah I [think] it's from like Denes and Pinson because that's a book, Speech Acts, that has like these, you know, schematized pictures of like, what speech communication looks like. I think that's perfectly good, like is, as simply as it needs to be, it's like, two people speaking to each other, if one person uses their brain to produce something, the other person is using their brain to interpret what they heard.

To my surprise, his explanation was similar to Saussure's,36 with a few caveats, although Victor never mentioned his name. Later in our conversation, Victor told me that he would add an additional "side" loop to the circuit connecting the speaker's own mouth with their own ear, to indicate that the speaker listens to their own speech and adjusts it accordingly. With this added feedback loop, the diagram began to resemble one found in a book that Victor, Ralph, Sushant, and Ted had all suggested I read in order to learn more about speech communication, the same book that Sushant assigned as supplementary reading in his speech communication course: The Speech Chain: The Physics and Biology of Spoken Language. Written by Peter B. Denes and Elliot N. Pinson and published through Bell Laboratories in 1963, The Speech Chain is, I suspect, the book Victor had actually been referring to in our conversation, though he seemed to misremember its title as Speech Acts. Victor's (potential) misremembering is evocative because it melds the pragmatic work of speaking (speech acts) with the mechanical activity of embedding meaning in and producing a material, physical effect (speech sounds). It collapses the social doing of speech into the biomechanical making of speech.

36 "Suppose that the opening of the circuit is in A's brain, where mental facts (concepts) are associated with the representations of the linguistic sounds (sound images) that are used for their expression. A given concept unlocks a corresponding sound-image in the brain; this purely psychological phenomenon is followed in turn by a physiological process: the brain transmits an impulse corresponding to the image to the organs used in producing sounds. Then the sound waves travel from the mouth of A to the ear of B: a purely physical process. Next, the circuit continues in B, but the order is reversed: from the ear to the brain, the physiological transmission of the sound-image; in the brain, the psychological association of the image with the corresponding concept. If B then speaks, the new act will follow - from his brain to A's - exactly the same course as the first act and pass through the same successive phases, which I shall diagram as follows" (1966[1959]: 11-12).

THE SPEECH CHAIN

With its origins in Bell Laboratories, The Speech Chain points to another branch in the history of the scientific study of language in the United States, one running parallel to the trajectory of Saussurean semiology and Chomskian linguistics that linguistic anthropologists have narrated. My informants' disregard for semantic meaning and their primary concern with the physical properties of speech as sound align their work with that of early telephone engineers, whose approach both overlapped with and diverged from Saussurean semiology. As Saussure was penning and preaching his theory of the sign in the early 20th century, in the U.S., information theory, psychophysics, and industrialization coalesced in the technology of the telephone, birthing what Mara Mills (2011) calls "the industrial conception of language," or the notion of speech as "a material good and sellable commodity" (77).
Telephone engineers applied Claude Shannon's information theory in an effort to translate the smallest possible intelligible unit of speech, the phone, into an electrical signal in order to move that signal across a channel from sender to receiver with as little interference as possible.37 Concomitant with the development of Cold War weaponry and cryptography, in the making and proliferation of the telephone the name of the game was to maximize intelligibility while minimizing cost. So persuasive was the industrial conception of language that even Saussure's semiology bears its mark. Mills notes that Saussure's theory of the sign, emblemized in the diagram I showed Victor and his colleagues, is "seemingly modeled on a telephone call": a sender and receiver, in a closed, dyadic circuit, send communication via "impulses" along invisible but still present wires (Mills 2011: 79). Saussure indeed referred to the processes of communication as "the speech circuit." But as Timothy Lenoir observes, these early telephone engineers homed in on a key component of language that structuralist semiotics following Saussure ignored: the notion that "language itself is not a pure sign, it is also a thing...tied to voice, to bitmaps on a screen, to materiality" (1994: 122). In other words, language has a material existence, a texture in addition to a meaning.

Not unlike the Saussurean speech circuit, in The Speech Chain, speech links "the speakers' brain to the listener's brain," emphasizing the role of the brain as the ultimate processor (Denes and Pinson 1963: 5). At either end of the speech chain are the talking brain and the listening brain, intermediated by the articulatory organs and the ear, displayed in anatomical detail resembling Ralph's simplified drawing. But gone are the threads and wires of "communication" (or perhaps signification) tying the two conversational partners together in Saussure's model.
Instead, there are sound waves, rippling out of the speaker's mouth and through the ears toward the brains of both the conversational partner and the speaker themselves.

37 Although, as Mills notes, the embodied subjectivity of d/Deaf individuals played a crucial role in the field of cybernetics and the object and associated infrastructure of the telephone alike, engineers founded their models and technologies on the basis of a normative, exclusionary speaking and listening subject. The vocal biomarker team's entire study is likewise premised on an audist model of speech.

This diagram contains Victor's addition: the cybernetic loop connecting the speaker's own speech with their ear, indicating their constant, real-time adjustment of their own voice as they listen to themselves speak, ensuring the transmission of information is as efficient as possible.

The Saussurean notion of "sound-image" also departs drastically from how my informants considered and studied speech as sound, a departure that The Speech Chain took part in as well. The Saussurean sign is made up of the sound pattern (signifier) and the concept (signified). According to Saussure, "the sound pattern is not actually a sound; for a sound is something physical. A sound pattern is the hearer's psychological impression of a sound, as given to him by evidence of his own senses" (66). In Saussurean linguistics, the sounds of speech exist in a singular, individualistic, phenomenological impression, and have no independent existence beyond the listener's sensory capturing of speech's occurrence. For the vocal biomarker team, by contrast, speech sounds are always "actually sounds." For them, there is nothing ontologically phonemic about speech sounds.
Speech sounds exist definitively in a common reality that all (hearing, neurotypical38) humans have access to, because speech sounds are governed by the same properties of physics that govern all materials in general and all waves in particular, from the wild and shaky waves of a shhhh to the potent waves of radiofrequency used in fMRI. Although, when prompted, Sushant cited "acoustic phonetics" as the discipline from which their model of speech arose, we might also call their model of speech spectral phonetics. A spectrogram (a visual plot of how a sound's energy is distributed across frequencies over time), like the team's study, brings sound into representation in a way that is agnostic to the differences between the fleshy, biological realm and the mechanical realm of electric impulses.

38 In the speech chain model, Saussurean psychological perception is replaced with brain- and biology-based perception, anchored in an ableist model of the body insinuating that all humans have more or less the same brain, with the same uniform faculties for apprehending and processing sound in the same, standardized way.

In The Speech Chain, the core of the interactive, communicative act is spectral and biological, and orchestrated by the brain independently of the intent to create or encode "meaning" or social action. Only the brief second chapter of The Speech Chain is dedicated to "linguistic organization," covering the phoneme, the syllable, the word, sentences, the grammatical and semantic rules of linguistic organization, stress, and intonation. The opening paragraphs of Chapter 2 explain that the "linguistic level" of speech contains the "message" of speech, which the speaker conveys by choosing "the right words and sentences to express what he wants to say. The information then goes through a series of transformations into physiological and acoustic forms" (10).
In a familiar Saussurean division, the speech sounds are the vessel for the planned, intentional meaning of speech, and the units of language (symbols) "stand for objects around us and for familiar concepts and ideas" (ibid). Nevertheless, Denes and Pinson advise, "throughout the rest of this book, we will concern ourselves with relating events on the physiological and acoustic levels with events on the linguistic level" (ibid). The hierarchy of the "levels" of study in this elementary text is clear. Eight out of the nine chapters in The Speech Chain cover physics and biology, spanning topics such as the anatomy of vocal organs, neurons, nerve impulses, the peripheral and central nervous systems, the spectra of speech waves, the formants of English vowels, acoustic cues for speech recognition, and advances in the neurophysiology of speech. Moreover, the fact that the linguistic level of speech is set apart from the sections on physiology and physics implies that language cannot be reduced to, or pulled apart using, these tools.

Beyond the hierarchical organization of The Speech Chain and its thematic focus on biology and physics, why did my informants not care about the "linguistic level"? I brought this up with Sushant after talking with him about the Saussure diagram. He explained that speech is a "biomechanical output," and while it has language components, the motor control components are the most elementary of all. In fact, understanding speech production at the level of motor control (as he put it, "understanding how the biomechanics of the system are used to produce things") is a necessary prerequisite for understanding "how that symbolic mapping [of language] is translated into this continuous acoustic wave form."
He was willing to concede that the biomechanical output of speech, and its neural coordinates, does indeed have "language components," because different languages require producing different sounds, but he assured me that these differences were superficial. I probed him on this further. What about variation within spoken English, like occurrences of vocal fry, which is produced by using the larynx to modulate the flow of air from the lungs to the oral cavity? What about upspeak, another feature of some languages that comes and goes historically? Or what about tonal languages, like Mandarin, which also require different control of the larynx than a non-tonal language like English? Wouldn't that produce a drastically different "biomechanical output" and rely on different parts of the brain? Sushant stuck to his guns:

S: You may have differences in the coordination. We haven't delved into Mandarin or other tonal languages in terms of depression but at least, in the Romanic or Germanic languages that we have looked at, control is very similar, but at the basic level I would think control even in Mandarin is somewhat similar: you have mechanisms, yes there's some specialties and that might influence how these processes behave, and that's one of the reasons that we feel we can extract information from voice rather than focusing on the language.

B: I think the key point that I was missing when I would try to answer those kinds of questions for myself [about tonal languages] was that you guys are focused on coordination, that's what you care about as far as what's going on in the brain.

S: Correct, because that is one aspect we feel is mostly language agnostic. I can't tell you that it's completely language agnostic. But we feel that the basic mechanisms by which you produce sound are in some ways common. Now. There are sensitivities and specificities in each language [...]
the repertoire of phonemes in a given language is going to vary, and that might influence how somebody controls those pieces. But on average across a long utterance [...] these phonemes and formants are reflecting the shape of the mouth and the state of your larynx and your breathing all in one go. And so, to us, that's a more fundamental thing that, independent of language constructs, one needs to control, so that's where we focused on.[39]

The vocal biomarker group pursues parole and the mechanisms that produce the sounds of speech for the same reason that Saussure left those things behind: because the act of coordinating the muscles necessary to produce the sounds of oral communication is the "embryo of speech," a realm so basic, so fundamentally human because it is so fundamentally biological, that it lies beyond the "social fact" of communication. Focusing on the coarsest domain of the communicative act also ensures that their findings will be "language agnostic" and, therefore, edge toward the panhuman. To summarize Sushant's response to the conundrum of tonal and non-tonal languages: the neural pathways that drive the mechanisms for producing sounds will be the same across humans, regardless of the nature of the specific sound being produced. The underlying assumption is that all human language speakers share a common, fundamental feature: a brain, which has the same features and functions in more or less the same way from person to person. Sushant and his colleagues are looking for, in his words, the most fundamental "basic iota of information that will offer insight" about depression. That is, so long as depression is conceptualized as a brain disorder. Their theoretical framework for studying speech, and for pursuing vocal biomarkers, combines two universalist frameworks.
On the one hand, it offers up language universalism: all communities of speakers share a commonality, because they all use the same cognitive faculties to execute the task of producing oral speech regardless of the language they are speaking (Enfield 2012; Evans & Levinson 2009).[40] On the other hand, it offers up biological universalism: all communities of speakers in their subject population (depressed and non-depressed speakers of English) share the same brain, aside from those experiencing depression, whose brains will be slightly, subtly different in a distinguishing way. As in eight of the nine chapters of The Speech Chain, the linguistic level of oral communication is quite simply beyond the scope of the vocal biomarker group's theoretical and experimental focus. At the same time, that is not the only reason they are seeking out the embryonic, panhuman level of communication. Sushant conceded that conducting a study attentive to the linguistic level would require his team to deal with the particularities and nuances of human difference. Such a project would require far more material resources (money, research personnel, research subjects, fMRI machines, and time) than they have access to.

[39] Victor had a similar answer: he explained that a "recent study looked at native Portuguese and native English speakers and found like basically no differences between any of the speech network even though you could point out some qualitative differences between the languages, like, speech is speech and you're using more or less all the same characteristics in order to produce it in the brain at least. Maybe in some languages you might lean a little more one way or another like the complexity might be in the variety of speech sounds versus grammatical structure so maybe you would get some slight brain differences either in structure or in activation but, you know, on the whole, I would say that you're almost certainly gonna have the same activation patterns across languages."
Sushant was clear about this:

It's possible that some of these cognitive states affect the linguistic component more than the basic components, it's just that, that's a fairly complex and comprehensive project... If we were to bring in specific languages, we might need a person per language on the team, to help us do things. [We don't] have access to those kinds of resources. [...] We're not saying that language is not important, it's just outside the scope of our current approach.

Studying basic components of speech communication means that there are fewer variables to control, that exclusion criteria for research subjects can tend toward the broad, and that subject recruitment, consent, data gathering, analysis, and publication can be carried out by a rather small and plucky team of five researchers (plus an eager and inexperienced anthropologist-research assistant). Economic resources shape the making and doing of science just as much as theoretical models and disciplinary conventions do. This adds another layer of significance to the industrial conception of speech. Scientifically pursuing speech at a granular scale, as a spectral emanation operating at the level of language and biological universals, is a more cost-efficient option. Attending to speech at a greater level of complexity beyond its smallest iota, taking into consideration culture, history, difference, nuance, and meaning, would be an expensive undertaking.

[40] Victor had some interesting theories about "deaf speech." He mused that non-congenitally deaf people who communicate through oral speech might be using more or less the same motor activity networks as non-deaf people; the only difference is that they use somatosensory cues (the position of the tongue in the mouth) as their feedback mechanism.

CONCLUSION: NOISY SCIENCE AND NATURAL SPEECH

Together, Sushant, Ted, Victor, Ralph, and Santiago pursue speech not as a code of signification, but as a sonic output of the brain's inner workings.
In their efforts to use the voice to understand the mind, they run into another troublesome sonic entity: noise. In the context of information theory (the technical undergirding of communication technologies and of the vocal biomarker team's spectral phonetics), noise is "the byproduct of technological reproduction that interfered with the reception of a message (i.e., static on a radio transmission, distortion over a loudspeaker, or hiss on a magnetic tape)" (Novak 2015: 128). As an unavoidable feature of technologically mediated sound, noise is not exactly a sound in its own right. It is more accurately a "metadiscourse of sound and its social interpretation" (Novak 2015: 126). Noise is defined in negative tension with and against the meaningful, the significant, and the valuable. Examining noise in place can indeed tell us something about noise's counterpart, its cybernetic twin: the signal, that which is sought after and the focus of attention. Noise is what stands in the background, to be disregarded and discarded. But even at the edges of attention, noise never disappears. Like the semiotic spillage of the funny-sounding man, the woman with the glasses whom Ralph could not care for, or other moments in which the research subject's subjectivity (the nuances of their bodies and their voices) interrupted their transformation from people into data, noise is an excess, a leftover that gets in the way. The vocal biomarker study is noisy business. The blaring, menacing wrenching of the magnet in the fMRI machine drowns out and distorts the subject's speech. If the research subject shifts slightly in the scanner, even to move their articulators to produce sounds and sentences according to Santiago's direction, they introduce artifacts into the fMRI image. The microphones fail, the software fails, Santiago and I record the wrong kind of speech (our own), the subject (willfully?) misinterprets Santiago's directions and flubs the task.
The vocal biomarker team was looking for the biological underneath the social, but they recognized that their methods, tools, and techniques were incapable of offering a one-to-one correspondence with the truth. There was always going to be some kind of interruption, some form of interference. On the one hand, noise is a problem to manage. On the other hand, noise is a testimony to the limits of abstraction and reduction. Just as their efforts to hack mental health care research still contain an unshakable grain of the very classificatory system they seek to disrupt, their pursuit of the universal and bedrock foundations of both mental illness and language contains a signal-jamming grain of the particular, the subjective, the irreducible, the different. Laboratory technicians like Santiago are tasked with managing the noisy excess of research subjects' bodies and voices: the never-ending and menial work of noise cancelation. The goal of the study was to pry apart sound from semantic meaning, biology from the body. The models of speech that the team adheres to, and that guide its investigation, torque the relationship that a linguistic anthropologist might anticipate between the body and speech. This is because their models essentialize the body (a normative, audist body) as a site of truth, taking the body out of the social by attempting to disentangle speech, and sound, from the social. To search for vocal biomarkers of depression, they must materialize speech through the body of the research subject only to disembody it again, enacting what Schafer calls "schizophonia" (Schafer 1969). They strive to split the sounds of speech from their source, removing them from the particularities of their contexts, only to argue that they are universally true signs that have been in the body all along, waiting to be found.
But in order to achieve this feat of abstraction, the team runs into the very components of language that they seek to overcome and pull away from the act of speaking: context, semantic meaning, variation, and difference. Close cousins of signal and noise are mediation and immediation. Mediation, Mazzarella argues, is "the ambiguous foundation of all social life," involving the multitude of "conceptual, technical, and linguistic practices by which the actually irreducible particularities of our experience are...reduced...rendered provisionally commensurable and thus recognizable and communicable in general terms" (2006: 476). Eisenlohr notes that, paradoxically, media are most successful when they disappear, when the fact of mediation melts away, giving the impression of im-mediacy, "drawing attention away from their own materiality and technicality in order to redirect attention to what is being mediated" (2011: 44). In theory, a vocal biomarker of depression is a sound that cuts directly to biological processes, so directly that its mere presence stands in for and is commensurate with a pathological brain state. A vocal biomarker suggests an immediate (im-mediated) connection between voice and mind. This paradox, that convincing mediation draws attention away from the very fact of mediation, is a source of power, fueling what Mazzarella calls "the politics of immediation" (2006).

One instantiation of this power emanates from the language sciences and their hegemonic commitment to language universals. Language universalism implies that all human languages have some irreducible core, a durable center that culture, history, and politics can never touch or budge, and that can be arrived at once these other, superficial layers are melted down.
In this chapter, I hope to have emphasized what an STS approach to the scientific study of language can achieve: a demonstration of how facts about language, especially facts about the biological basis of language, are mediated, remediated, made, and assembled. My attempt to de-naturalize the figure of "natural speech" in the scientific study of language resembles Penny Eckert's (2003) critique of the figure of the "authentic speaker" in sociolinguistic research. Authenticity, like naturalness, is aligned with proximity to the ingrained and invariable core, while inauthenticity, like unnaturalness, is "tainted by the social" (392). The authentic speaker conveys vernacular realness, while the inauthentic speaker conducts a conscious, intentional performance. Likewise, "natural speech" implies a downplaying of the speaker's agency and intentionality. But as I have shown, speech has to be made natural. It is someone's job to ensure that context, culture, and the speaker's own interpretations are all kept at bay. This is why the work of troubleshooting, fixing, and tucking in, while low in the hierarchy, is in fact epistemologically central. This is the very work of rendering the research subject into a transparent medium. If this work is done well, it too will fall to the edges of attention and slip into transparency, enabling mediation to shimmer away into immediation yet again. The advent of Computational Psychiatry suggests that the burden of defining and communicating the signs of mental illness will shift away from the patient. Biological truth will emanate from the sufferer's body, and the knowledge of their own suffering will be external to their sense of self. This is another reason why it is so crucial to emphasize that mediation and noise are key features of the search for vocal biomarkers.
The notion of a vocal biomarker threatens to naturalize the body as the only site of mental illness, and further threatens to extricate definitions of health and wellbeing from the patient, "utterly decoupled from anything experiential" (Dumit 2012: 123). It re-creates the asymmetrical power relations that feminist critiques of psychoanalysis have attempted to intervene in, suggesting a scenario in which mental illness is a mysterious code that only a technical expert can unwind and demystify through the operation of machines whose inner workings remain out of reach. In this way, Computational Psychiatry's most morally pressing concern rests not with machines that threaten to replace humans, but with the dehumanization of patients into mere automatons emitting neurobiologically significant exhaust.

References

Alpert, Murray, Enrique R. Pouget, and Raul R. Silva. 2001. "Reflections of depression in acoustic measures of the patient's speech." Journal of Affective Disorders 66(1): 59-69.
Barad, Karen. 1999. "Agential realism: Feminist interventions in understanding scientific practices." In The Science Studies Reader, ed. Mario Biagioli. Pp. 1-11. New York: Routledge.
Breznitz, Zvia. 1992. "Verbal Indicators of Depression." The Journal of General Psychology 119(4): 351-636.
Callon, Michel, C. Meadel, and V. Rabeharisoa. 2012. "The Economy of Qualities." Economy and Society 21(2): 194-217.
Cannizzaro, Michael, Brian Harel, Nicole Reilly, Philip Chappell, and Peter J. Snyder. 2004. "Vocal acoustical measurement of the severity of major depression." Brain and Cognition 56(1): 30-35.
Carr, E. Summerson. 2010. Scripting Addiction: The Politics of Therapeutic Talk and American Sobriety. Princeton: Princeton University Press.
Caton, Steven C. 1987. "Contributions of Roman Jakobson." Annual Review of Anthropology 16: 223-260.
Chumley, Lily Hope and Nicholas Harkness. 2013. "Introduction: QUALIA." Anthropological Theory 13(1/2): 3-11.
Conrad, Peter. 2007. The Medicalization of Society: On the Transformation of Human Conditions into Treatable Disorders. Baltimore: Johns Hopkins University Press.
Cummins, N., J. Epps, M. Breakspear, and R. Goecke. 2011. "An investigation of depressed speech detection: features and normalization." Proceedings of Interspeech, ISCA, Florence, Italy, pp. 2997-3000.
Cummins, Nicholas, Stefan Scherer, Jarek Krajewski, Sebastian Schnieder, Julien Epps, and Thomas F. Quatieri. 2015. "A review of depression and suicide risk assessment using speech analysis." Speech Communication 71: 10-49.
Cummins, Nicholas, Vidhyasaharan Sethu, Julien Epps, Sebastian Schnieder, and Jarek Krajewski. 2015. "Analysis of acoustic space variability in speech affected by depression." Speech Communication 75: 27-49.
Darby, John K. and H. Hollien. 1977. "Vocal and Speech Patterns of Depressive Patients." Folia Phoniatrica et Logopaedica 29(4): 279-291.
Darby, John K., Nina Simmons, and Philip A. Berger. 1984. "Speech and voice parameters of depression: A pilot study." Journal of Communication Disorders 17(2): 75-85.
Denes, Peter B. and Elliot N. Pinson. 1963. The Speech Chain: The Physics and Biology of Spoken Language. Ann Arbor: Bell Telephone Laboratories.
Dumit, Joe. 2012. Drugs for Life: How Pharmaceutical Companies Define Our Health. Durham: Duke University Press.
Duranti, Alessandro. 2003. "Language as Culture in U.S. Anthropology." Current Anthropology 44(3): 324-347.
Duranti, Alessandro. 2004. "Agency in Language." In A Companion to Linguistic Anthropology. Alessandro Duranti, ed. Pp. 451-473. Malden: Blackwell Publishing.
Eckert, Penny. 2003. "Sociolinguistics and authenticity: an elephant in the room." Journal of Sociolinguistics 7(3): 392-431.
Enfield, N.J. 2012. "Language, culture, and mind: trends and standards in the latest pendulum swing." Journal of the Royal Anthropological Institute 19: 155-169.
Evans, Nicholas and Stephen C. Levinson. 2009. "With diversity in mind: Freeing the language sciences from Universal Grammar." Behavioral and Brain Sciences 32(5): 472-492.
Flint, Alistair J., Sandra E. Black, Irene Campbell-Taylor, Gillian F. Gailey, and Carey Levinton. 1993. "Abnormal speech articulation, psychomotor retardation, and subcortical dysfunction in major depression." Journal of Psychiatric Research 27(3): 309-319.
France, D.J., R.G. Shiavi, S. Silverman, M. Silverman, and M. Wilkes. 2000. "Acoustical properties of speech as indicators of depression and suicidal risk." IEEE Transactions on Biomedical Engineering 47(7): 829-837.
Gershon, Ilana. 2010. "Media Ideologies: An Introduction." Journal of Linguistic Anthropology 20(2): 283-293.
Godfrey, Hamish P.D. and Robert G. Knight. 1984. "The Validity of Actometer and Speech Activity Measures in the Assessment of Depressed Patients." The British Journal of Psychiatry 145(2): 159-163.
Greden, J.F., A.A. Albala, and I.A. Smokler. 1981. "Speech pause time: a marker of psychomotor retardation among endogenous depressives." Biological Psychiatry 16: 851-859.
Guenther, Frank. 2016. Neural Control of Speech. Cambridge: MIT Press.
Hacking, Ian. 1986. "Making up people." In Reconstructing Individualism: Autonomy, Individuality, and the Self in Western Thought, ed. T. Heller, M. Sosna, D. Wellbery, pp. 222-36. Stanford: Stanford University Press.
Haraway, Donna. 1988. "Situated Knowledges: the Science Question in Feminism and the Privilege of Partial Perspective." Feminist Studies 14(3): 575-599.
Harkness, Nicholas. 2017. "Glossolalia and cacophony in South Korea: Cultural semiosis at the limits of language." American Ethnologist 44(3): 476-489.
Hollien, Harry. 1980. "Vocal Indicators of Psychological Stress." Forensic Psychology and Psychiatry 347(1): 47-71.
Hymes, Dell. 1964. "Introduction: Toward Ethnographies of Communication." American Anthropologist 66(6): 1-34.
Hymes, Dell. [1972] 2001. "On Communicative Competence." In Linguistic Anthropology: A Reader. Alessandro Duranti, ed. Pp. 53-73. Malden: Blackwell Publishers.
Jones, Graham, Beth Semel, and Audrey Le. 2015. "'There's no rules. It's hackathon.': Negotiating Commitment in a Context of Volatile Sociality." Journal of Linguistic Anthropology 25(3): 322-345.
Joyce, Kelly A. 2008. Magnetic Appeal: MRI and the Myth of Transparency. Ithaca: Cornell University Press.
Lakoff, Andrew. 2006. Pharmaceutical Reason: Knowledge and Value in Global Psychiatry. Cambridge: Cambridge University Press.
Langlitz, Nicolas. 2010. "The persistence of the subjective in neuropsychopharmacology: observations of contemporary hallucinogen research." History of the Human Sciences 23(1): 37-57.
Langlitz, Nicolas. 2012. Neuropsychedelia: The Revival of Hallucinogen Research since the Decade of the Brain. Berkeley: University of California Press.
Lasswell, Harold D. 1930. Psychopathology and Politics. Chicago: University of Chicago Press.
Latour, Bruno. 1998. Science in Action: How to Follow Scientists and Engineers Through Society. Cambridge: Harvard University Press.
Lenoir, Timothy. 1994. "Was the Last Turn the Right Turn? The Semiotic Turn and A. J. Greimas." Configurations 2: 119-36.
Low, Lu Shih Alex, Namunu C. Maddage, Margaret Lech, Lisa Sheeber, and Nicholas Allen. 2010. "Influence of acoustic low-level descriptors in the detection of clinical depression in adolescents." IEEE International Conference on Acoustics, Speech and Signal Processing, Dallas, TX. https://ieeexplore.ieee.org/document/5495018
Luhrmann, Tanya M. 2001. Of Two Minds: An Anthropologist Looks at American Psychiatry. New York: Vintage Books.
Martin, Aryn, Natasha Myers, and Ana Viseu. 2015. "The politics of care in technoscience." Social Studies of Science 45(5): 625-641.
Martin, Emily. 2007. Bipolar Expeditions: Mania and Depression in American Culture. Princeton: Princeton University Press.
Mattern, Shannon. 2018. "Maintenance and Care." Places Journal, November. Accessed 9 Jan 2019. <https://placesjournal.org/article/maintenance-and-care/?cn-reloaded=#0>
Mazzarella, William. 2006. "Internet X-Ray: E-Governance, Transparency, and the Politics of Immediation in India." Public Culture 18(3): 473-505.
Metzl, Jonathan. 2010. The Protest Psychosis: How Schizophrenia Became a Black Disease. Boston: Beacon Press.
Mills, Mara. 2011. "On Disability and Cybernetics: Helen Keller, Norbert Wiener, and the Hearing Glove." differences 22(2-3): 74-111.
Mithun, Marianne. 2004. "The Value of Linguistic Diversity." In A Companion to Linguistic Anthropology. Alessandro Duranti, ed. Pp. 121-140. Malden: Blackwell Publishing.
Moore, E., M. Clements, J. Peifer, and L. Weisser. 2003. "Analysis of prosodic variation in speech for clinical depression." In Proceedings of the 25th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pp. 2925-2928.
Morawski, Jill. 2015. "Epistemological Dizziness in the Psychological Laboratory: Lively Subjects, Anxious Experimenters, and Experimental Relations, 1950-1970." Isis 106(3): 567-579.
Morrison, Hazel, Shannon McBriar, Hilary Powell, Jesse Proudfoot, Steven Stanley, Des Fitzgerald, and Felicity Callard. "What is a Psychological Task? The Operational Pliability of 'Task' in Psychological Laboratory Experimentation." Engaging Science, Technology, and Society 5: 61-85.
Novak, David. 2015. "Noise." In Keywords in Sound. David Novak and Matt Sakakeeny, eds. Pp. 125-138. Durham: Duke University Press.
Ozdas, A., R.G. Shiavi, S.E. Silverman, M.K. Silverman, and D.M. Wilkes. 2004. "Investigation of vocal jitter and glottal flow spectrum as possible cues for depression and near-term suicidal risk." IEEE Transactions on Biomedical Engineering 51(9): 1530-1540.
Rheinberger, Hans-Jörg. 1997. Toward a History of Epistemic Things: Synthesizing Proteins in the Test Tube. Stanford: Stanford University Press.
Rose, Nikolas and Joelle M. Abi-Rached. 2013. Neuro: The New Brain Sciences and the Management of the Mind. Princeton: Princeton University Press.
Saussure, Ferdinand de. 1966[1959]. Course in General Linguistics. Wade Baskin, trans. Charles Bally and Albert Sechehaye, eds. New York: McGraw-Hill Book Company.
Schafer, R. Murray. 1969. The New Soundscape: A Handbook for the Modern Music Teacher. Ontario: Berandol Music Limited.
Schuller, Björn, Stefan Steidl, Anton Batliner, Felix Burkhardt, Laurence Devillers, Christian Müller, and Shrikanth Narayanan. 2013. "Paralinguistics in speech and language: State-of-the-art and the challenge." Computer Speech and Language 27(1): 4-39.
Seaver, Nick. 2017. "Algorithms as culture: Some tactics for the ethnography of algorithmic systems." Big Data and Society 1-17.
Silverstein, Michael. 1979. "Language Structure and Linguistic Ideology." In The Elements: A Parasession on Linguistic Units and Levels. Paul R. Clyne, William F. Hanks, and Caroll Hofbauer, eds. Pp. 193-247. Chicago: University of Chicago Press.
Silverstein, Michael. 1998. "The Uses and Utility of Ideology: A Commentary." In Language Ideologies: Practice and Theory. Bambi B. Schieffelin, Kathryn Woolard, and Paul Kroskrity, eds. Pp. 123-145. New York: Oxford University Press.
Strimbu, Kyle and Jorge A. Tavel. 2010. "What are biomarkers?" Current Opinion in HIV and AIDS 5(5): 463-466.
Vidal, Fernando. 2009. "Brainhood, anthropological figure of modernity." History of the Human Sciences 22(1): 5-36.
Vidal, Fernando and Francisco Ortega. 2017. Being Brains: Making the Cerebral Subject. New York: Fordham University Press.
Williams, Janet B.W. 2001. "Standardizing the Hamilton Depression Rating Scale: past, present, and future." European Archives of Psychiatry and Clinical Neuroscience 251(Suppl. 2): II/6-II/12.
Worboys, Michael. 2012. "The Hamilton Rating Scale for Depression: The making of a 'gold standard' and the unmaking of a chronic illness, 1960-1980." Chronic Illness 9(3): 202-219.
Ziporyn, Evan. 2013. "Visiting Artist Arnold Dreyblatt's Magnetic Resonances." March 19, Center for Arts, Science & Technology at MIT. Accessed May 21, 2019.

Chapter 3: Do Androids Dream of Electric Speech?

The small beam of white light shone steadily into the left eye of Rachael Rosen, and against her cheek the wire-mesh disk adhered. She seemed calm. Seated where he could catch the readings on the two gauges of the Voigt-Kampff testing apparatus, Rick Deckard said, "I'm going to outline a number of social situations. You are to express your reaction to each as quickly as possible. You will be timed, of course." "And of course," Rachael said distantly, "my verbal responses won't count. It's solely the eye-muscle and capillary reaction that you'll use as indices. But I'll answer; I want to go through this and -" She broke off. "Go ahead, Mr. Deckard."

(Dick, Philip K. 1968. Do Androids Dream of Electric Sheep? New York: Random House. p. 46)

In the first week of my fieldwork at the West Coast University (WCU) Research Institute, an engineer I call Klaus asks me to meet him in his large, sunlit office a few feet away from the cubicle that the Institute's head administrator has secured for me. My corner cubicle, #304-c, sits at the periphery of a uniform cubicle sea, and it hews close to a wall of offices and conference rooms named after key figures in the history of American computing. If I slide open the short plastic door of #304-c, I can see who is coming and going from the Grace Hopper conference room directly in front of me. And if the door of the room is open, I can see all the way out the curved glass windows to a busy four-lane highway lined with skinny sidewalks and traffic signals, and to the hazy sky dotted with seagulls, palm trees, distant hills, and billboards advertising headphones and television shows. As recently as seven years ago, most of this area was undeveloped marshland, a place local schoolchildren visited on bird-watching field trips.
Now it has been reborn as an up-and-coming technology hub that bursts forth with multi-million-dollar condos, upscale grocery stores and yoga studios, and sleek industrial parks, including the one in which the Research Institute sits. Like many other things at the Institute (the Game Room, the Meditation Room, the Kitchen, the Lounge, the Theatre, the Research Subject Consenting Room), cube #304-c is clean, ergonomic, artfully constructed, and lacking in warmth. It is immersed in a quiet that feels eerie given that one side of the building faces the traffic-congested street, and given the more than 300 cubicles on the floor, most of which are filled throughout the day with visiting researchers, post-docs, graduate students, and undergrads who are shuttled in from the main WCU campus 20 miles away. When my surrounding cube-mates do begin talking, faces unseen within their little plastic boxes as they make idle chat about lunch plans or the news, they fall silent if I try to join in, then resume their conversation as if I've said nothing at all. Three days after my arrival, the day Klaus calls me into his office, I overhear people whose voices I can already recognize suspiciously mulling over my presence from a few cubes away: Who is the girl in Jackie's old cube? What exactly is she here for? What does she want from us? What are we allowed to tell her?

[Figure: User interface (Abby)]

I've come to the University to work as a research assistant and learn more about a technology that a federal agency contracted Klaus and his colleagues in the psychology department to build, a technology I call the Virtual Human Interviewer (VHI). The program officer of the federal defense agency funding this project wanted members of the engineering and psychology departments to collaborate on an intervention for the high incidence of veteran and soldier suicide and the under-reporting of mental health issues.
The agency approached Klaus and his colleagues to build a system that could tirelessly and systematically identify the nonverbal signals of post-traumatic stress disorder (PTSD) that soldiers inadvertently convey, inspired by the evolutionary psychology theory of "honest signals."[41] Unlike conventional modes of listening in psychiatric assessment, in which a mental health care worker attends primarily to the content of a person's speech as they answer a set of interview questions about their general psychiatric state, the VHI is incapable of analyzing semantic content. It attends only to the form (the sonic contours) of speech. The VHI has two components: first, the software, which Klaus and the engineering team built (and which I call VirtuSense); second, the user interface, which the psychology team built. To be interviewed by the VHI, subjects are hooked up to a microphone and sit in front of a large screen and a small webcam. On the screen is an animated character: an adult woman with olive skin and dark brown hair, whom I refer to as Abby.[42] Abby appears to ask the research subjects a series of interview questions based on a combination of psychiatric assessment scales for PTSD and depression. As you speak to Abby, VirtuSense processes the audio-visual input that the microphone and webcam capture. VirtuSense analyzes this input and then calculates a score for the assessment scales. This software also enables Abby to provide real-time, nonverbal feedback in response to the paralinguistic signs that you display to the system's sensors. If you smile, Abby smiles. If you lean forward, she does too. She nods as you answer the interview questions, prompting you with positive minimal responses like "hmm" and "ok," or, in response to one-word answers, with open-ended follow-ups like "Can you tell me more about that?" According to the psychology team, this interactive animation is meant to show that Abby is listening, all in order to establish a sense of rapport and encourage the research subject to keep talking, producing as much speech data as possible for VirtuSense to calculate a robust assessment score.

[41] Klaus eventually revealed to me that the program officer drew direct inspiration from a now out-of-print book by Alexander "Sandy" Pentland called Honest Signals: How They Shape Our World (2008). Pentland is a computer scientist, often heralded as the grandfather of wearable technologies and one of the most cited authors in computer science. He directs the Connection Science and Human Dynamics labs at the MIT Media Lab. This particular book draws from theories in evolutionary psychology to posit the existence of an unconscious social signaling system that runs alongside language, that developed as a precursor to spoken language, and that is still used to this day by all communicating humans. Pentland suggests that wearable sensor devices can be used to capture, interpret, and operationalize these signals (e.g., using them to better understand and get a leg up in business negotiations, interpersonal relationships, etc.).

[42] While this is a pseudonym, it resembles the name that the team had given to the system: a proper noun, gendered female.

As soon as I take my seat in Klaus's office, he tells me that he has set up this meeting so that I can get to know the VHI data as soon as possible. Klaus is trained in machine learning and speech signal processing. Although he has no clinical training, he often reads up on psychiatry and psychology, as the bookshelves in his office testify, paying special attention to diagnostic inventories and shifting trends in diagnostic criteria. A blond Austrian man, Klaus has a square, stern face that hides a laid-back, laissez-faire attitude.
He embodies the stereotypical hacker demeanor that anthropologists like Christopher Kelty (2008) and Gabriella Coleman (2014) have observed: a tendency to subtly, ironically subvert convention and authority while working from inside it. Much to my relief, Klaus begins writing up a list of people involved in the VHI project, walking me through who they are, where they sit, and what they do. I do not mention the icy reception I got from my cube mates, but Klaus makes it clear that people will be more forthcoming about speaking with me if I introduce myself as his student. I soon learn that he is well liked and well respected across the Institute. With an official appointment in WCU's engineering department, he is the PI of the Institute's Multi-Modal Analysis team (the engineering team). He oversees several graduate students, post-docs, and a few lucky and talented undergrads, all of whom he gathers together to meet once a week in Grace Hopper. Getting to "know" the VHI data early is a practice Klaus instills in his students: he often advises them to review items in the data set before they begin their work, since this allows them to develop a system that is informed by the data's nuances and textures. I'd naively been expecting us to examine lines of code together, so I'm startled when he pulls up an archived video of an Abby interview with a research subject, a veteran. This is when I realize that the 500 or so video-recorded interviews are the VHI data. Or, to be more precise, the research subjects' speech is the data, the fundamental building blocks of the whole VHI system. Like many of WCU's research subjects, this veteran was recruited from either Craigslist or the local Department of Veterans Affairs (VA). Klaus begins the video, and Abby, the user interface, introduces herself: "Hi," says Abby, in a Standard American English voice, "I'm Abby. Thanks for coming in today.
I was created to talk to people in a safe and secure environment. I'm not a therapist, but I'm here to learn about people, and I'd love to learn about you." This phrase, "I'm not a therapist," is key. It is meant to indicate that interactions with the VHI do not constitute professional medical care. The veteran says yes, and the interview is on its way. The interview starts off with Abby asking light-hearted questions about the veteran's favorite place to travel. Gradually, the questions creep into darker territory, like "what's a memory you wish you could erase from your mind?" Although the vet indicated, on a form that all research subjects fill out before their interview, that she had no upsetting dreams about her past, she describes flashbacks of a near-death experience from her deployment and confesses to Abby, "I've had every fucking dream there is to have." The psychology researchers take this misalignment between what the vet put on paper and what she says to Abby to be proof of the system's success. They reason that Abby strikes a sweet spot in the uncanny valley: subjects reveal more to her than they would to a human, because she is clearly not a human. At the same time, they reason that her interactive, responsive, yet nonintrusive feedback makes the assessment process feel more like a "natural," dyadic conversation. As the interview progresses, the questions grow more probing, and the contents of the veteran's answers grow more graphic, to the point that I'm uncomfortable listening alongside Klaus. But Klaus is not listening to the content. He wants me to guess whether the "computer" (VirtuSense) assessed the veteran as showing signs of PTSD, depression, or neither. He tries to direct my ears to the kinds of things that the software is supposed to pick up. "Listen to the breathiness of her voice," he urges me, "or how she slurs her words a little." I guess that she is showing signs of depression, but Klaus tells me that I'm incorrect.
He plays the interview again, but I still can't hear the breathiness or the slurs. What I also can't see is that there are other people present during the recorded interview, people whom I wouldn't learn about until much later in my fieldwork: two younger, female members of the psychology team I call Nava and Taylor, who had watched and listened to the veteran's interview from another room, monitoring the content of her speech for any mentions of suicide or homicide in a way that VirtuSense couldn't. The system is not designed to analyze content and cannot catch the semantic nuances of suicidal speech; it cannot even identify individual words. For legal liability reasons, for the sake of the research subjects' wellbeing, and because the VHI did such a good job of not attending to speech content (despite Abby's animation suggesting otherwise), there always had to be humans in the loop. Klaus shows me several more videos, following this same procedure: he plays the video and then asks me to guess VirtuSense's assessment. The research subjects' responses to the assessment questions (tales of assault, estrangement, and violence) continue to disturb me, but Klaus's attention remains fixated elsewhere. I remark that the videos are tough to take in. Queuing up another video, Klaus says, "that's why we need virtual humans." If this was what he wanted me to "get to know about the data," then his message is a contradictory one. It honors the work of people doing psychiatric assessment: presumably the kind of professional after whom Abby was modeled, one who primarily plays a listener or facilitator role in a conversational interaction aimed at gathering information about the other conversational partner.
Klaus recognizes that the job of these professional actors is demanding and draining, in part because taking up the potentially disturbing content of a would-be patient's speech while remaining as calm, receptive, and understanding as possible is emotionally exhausting work. At the same time, to suggest, as Klaus does and as his colleagues did in building the VHI in the first place, that a machine can do this listening instead of a person devalues that labor, implying (intentionally or not) that it does not require the type of skilled, tacit knowledge that automated systems are incapable of capturing. This first encounter with Klaus and the VHI brings to mind the Voight-Kampff test depicted in the 1982 film Blade Runner, which was inspired by Philip K. Dick's 1968 novel Do Androids Dream of Electric Sheep? In a post-apocalyptic future where androids and humans coexist and are indistinguishable from one another, the Voight-Kampff test is supposed to help bounty hunters sort out the humans from the machines. As with the VHI, when the suspected android answers a series of interview questions, the Voight-Kampff apparatus homes in on minute, unconscious bodily reflexes that reveal the speaker's inner state irrespective of the content of their answers. It's telling that the 2017 sequel to Blade Runner re-imagines the Voight-Kampff as a test for PTSD that seeks out signals of emotional trauma in the voice. This speaks to the cultural pervasiveness of the idea that emotions and psychic pain are contained in the voice, are unconsciously expressed, and can be made knowable by listening, but must be listened to in a certain way, with the aid of technological intervention.
Despite these familiar resonances with the Voight-Kampff test, my meeting with Klaus and his attempt to get me to guess the VHI assessment was not so much a test as it was a demonstration: a demonstration that I am human, and that there are limits to what I can hear in mental illness. He provided a simulation of what the software listened for by showing how out of reach these signs were to me. This was also an enactment of why the VHI is necessary: I was focused on the content (indeed, driven to distraction by the veteran's words), while Klaus's software could focus on things that were adjacent to the content. But in addition to demonstrating the power and necessity of the system, Klaus had also performed a sleight of hand, a kind of disappearing trick. His demonstration left out the role of Nava and Taylor, who monitored the subjects from a hidden room and whose responsibility it was to listen to the content of the speech that the system ignored. This is not to suggest that Klaus was trying to hide these women from me or that he didn't want to reveal their presence, although the larger purpose of the study was to convince research subjects that Abby was all machine and that there were no humans listening to them, because the team wanted to investigate whether people are more emotionally open when they think they are talking to a computer. Perhaps Klaus didn't bring up Nava and Taylor because they weren't a part of his definition of the software system, VirtuSense, which was his primary research interest. He wasn't very interested in Abby, the user interface. Nava and Taylor weren't his students. They weren't on his team; they were on the psychology team. As members of the psychology team with few academic or professional credentials, Nava and Taylor were also responsible for explaining the study to research subjects, securing their consent, and then debriefing them after their interview.
In the very early stages of the research, it was their job to interview research subjects face-to-face and then transcribe and code their interviews, outlining the basic interactional infrastructure that would be built into Abby. The team then selected Nava to be the "voice" of Abby, and she spent many hours in a recording booth reciting the lines of speech that Abby now speaks. In later stages, as the psychology team was trying to figure out what Abby's animation should look like (what her "active listening" and rapport-building body language should be), these two young women actually controlled the bodily movements and the timing of Abby's questions as research subjects interacted with the VHI. Neither of these young women had extensive clinical training; Nava was still an undergraduate at the time of the study. Still, they played a fundamental role in making sure that the rest of the team got the data that they needed, managing both the comfort of the research subjects (the data source) and the extraction of data (answers to the interview questions). Klaus's conjuring trick connects with a pervasive feature of automated systems that scholars in anthropology and science and technology studies (STS) refer to, following Hamid Ekbia and Bonnie Nardi (2017), as heteromation. According to Ekbia and Nardi, it's not very productive to think of automation as machines doing things autonomously, with no human intervention. Instead, it's more productive and actually much more accurate to think about automation as a mixture of human and machine work: in other words, heteromation. By bringing back into view the humans who play a fundamental role in automated systems, humans like Nava and Taylor, the concept of heteromation gives us an anthropological grip on studying automation as a cultural process rather than one that is set aside from culture.
It also opens up the space to explore why humans like Nava and Taylor are so hard to find in representations of autonomous systems, so that we can dig into the politics of their invisibility.

ANIMATING ASSESSMENT

In this chapter, I tack back and forth between the ethnographic present and the history of the VHI's development (which is wrapped up in the history of the Institute itself), as retold to me by the researchers who worked most closely together on the project, comparing these narratives with institutional documents and publicly available materials (press coverage, the Institute's website, etc.). The VHI has yet to come to full clinical fruition. It has been more or less shelved since data collection ended in 2015, and it will not be put to use in its desired clinical contexts, like the local VA, anytime soon. For these reasons, rather than looking at its reception in the popular press or among mental health care professionals alone, this chapter attempts to trace out the hopes, imaginaries, and legitimizing rhetoric that drove the dreaming up, development, and testing of the VHI, and that continue to animate it. I draw from archived videos of interactions between research subjects and the VHI and from my own interactions with the system and with other human-computer assemblages. I also draw from my experience collaborating with researchers to carry out an (ultimately failed) comparative study, in which we attempted to run the VHI sensory processor through a life-size, humanoid robot. Getting to know the system through its frustrating, puzzling failures and shortcomings helped me to understand the friction between how the virtual human is designed to appear to research subjects and how the perceptual system takes up and interprets human speech, bringing into greater relief the tension between the multiple modes of listening and participation frameworks that the system entails and partakes in.
This firsthand experience with the two components of the system (the virtual human and the sensory processor) butted up against and fell short of the hopeful and promissory representations of the system (what it supposedly does and how it works) in grant proposals, drafted and revised articles for publication, promotional videos, and conversations with the press and the general public at symposia and during monthly open house tours of the Institute. Analytics like heteromation underscore that the image of the autonomous machine operating with little to no human intervention is a cultural myth, reinforced in the U.S. and the U.K. by popular and conventional histories of computing, which struck from the historical record the oftentimes gendered labor and laborers that made the development of contemporary computers possible (Daston 1994; Light 1999; Chun 2011; Hicks 2017). If computers and other machines appear to be acting of their own accord and with their own agency, this illusion is achieved through the erasure of the humans who maintain and mediate the interaction between machines and the people that use them, "removing some people out of the loop so that others [i.e., end users] may feel close to the machine" and are given the impression that the machine is completely autonomous, and that their interaction is unmediated (Irani 2013:733). For my informants, "closeness with the machine" (here, the VHI) is the interactional goal; the user interface in particular has been designed with the hope that users (the research subjects) will come to trust it and, as a result, emote openly in front of the system's various sensors. This "closeness" itself turns on the illusion that the VHI is entirely machinic, with no human intervention, even while it depends on the downplaying of humans like Nava and Taylor, whose vigilance and attention keep the interaction socially meaningful as well as psychiatrically safe.
I show how researchers use the virtual human to craft and stage an interaction that capitalizes on a language ideology dominant in American psychiatry that privileges the referential function of language, all in order to provide the system's sensory processor with data that is analyzed in a way that conflicts with that ideology. My interlocutors contrasted the sensory processor's "machine listening" against what they called "human listening," "listening as a human," or listening that had "the human touch": the kind of listening that is wrapped up with Euro-American conceptualizations of empathy that the virtual human pantomimes, and also the kind of listening that must go on behind the scenes to ensure the system's proper functioning. Using the VHI and its related components (especially the user interface, Abby), my informants attempt to encourage trust, rapport, and intimacy, engineering feelings of closeness between the research subject and the technology to encourage emotional expressiveness. Analyzing the design and development process lays bare how ideas about the relationship between language and self that circulate in contemporary psychiatric encounters in the U.S. depend upon a model of the self as an individualized, authentic core, interior to the person and inaccessible to the public. Building rapport and enabling trust amounts to cajoling a person to allow an interlocutor to access this private, secreted core. The trope of animation is analytically useful in piecing apart the VHI, and not only because the VHI interface is an animated character on a screen. Animation is a useful trope especially in regard to the questions of interaction, affect, and labor that are ethnographically central to this dissertation.
The literature on animation expands upon Goffman's (1974; 1981) key texts on participation framework, taking seriously his invitation to move beyond a performance model of expression and self and pursuing more nuanced analyses of agency, intentionality, and technology in linguistic interactions (Gershon 2015; Manning 2018). Goffman pointed out that an interaction never really involves just two people (the speaker and the hearer) but instead involves multiple parties, or multiple participants with different levels of engagement (see also Goodwin and Goodwin [2004]). There are ratified and non-ratified addressees (intended and unintended recipients of speech), the principal (the person or parties whose viewpoint motivates the speaker's talk), the author (the person who composes the form and content of the speaker's utterances), and, finally, the animator, "the talking machine, the body engaged in acoustic activity" (Goffman 1981: 144). The notion of the animator and its attendant action, "animation," implies that the participant producing oral speech may not always mean what they say, challenging the relationship between speech and intentionality that many anthropologists have found to be a key feature of Christian, Euro-American models of language and mind (Throop and Murphy 2002; Robbins 2004; Desjarlais and Throop 2011; Duranti 2014) and that I argue is central to psychiatric interactions in the U.S. That Goffman refers to the animator as a talking machine points to the potential utility of "animation" for illuminating situations, like those in my ethnography, in which non-human machines mediate, modulate, and intercede upon human speech. Moreover, animation challenges the dramaturgical, performance model of interaction, in which interactional actors "play" social "roles" that do not necessarily align with their true selves, which remain intact and can always be returned to.
"Performance" maintains the connection between language and authenticity, insinuating that the model of the self that is conventional to U.S. psychiatry is a feature of all interactions and all interactional partners. Ethnographically exploring the VHI and its distributed assemblage of animators and actors illustrates how authentic, intimate encounters are made, and how the felicity of these encounters (along with the psychological health of the speaker) is maintained. The concept of animation also helps me to flesh out the distinction between what I call "linguistic labor" and "emotional labor" (Russell Hochschild 2012). Russell Hochschild developed the latter term to refer to professions that involve displays of positive affect, like wait staff and flight attendants, even if these emotions are inconsistent with what the person is actually feeling. The "labor" of emotional labor comes from the misalignment between how the person feels and what they show to be their feelings; this reiterates the front-stage, back-stage dramaturgical setup of performance theory, in which there is a true self to be found all along. Linguistic labor decenters the role of emotions and focuses instead on communicative strategies that craft the impression that speech in an interaction is being taken up in one way or another: for example, that listening to a person's story elicits sympathy from the listener and affective investment in the speaker. Additionally, animation confronts us with questions of resemblance and similitude. As Silvio (2010) writes, animation encompasses "a range of technologies and skills that are used to create the 'illusion of life' in the guise of puppets, dolls, and masks" (426).
The life-likeness, liveliness, and like-ness of Abby (the illusion that the interface is alive and the extent to which interacting with the system resembles a socially legible form of interaction) is distributed across multiple people and depends on the concealed labor of Nava and Taylor (Suchman and Stacey 2012). Keeping this in mind, I parse apart what it means to make Abby a "virtual human" by leaning into virtuality's connotations of almost but not quite: of seeming like the real thing in an incomplete way. Specifically, I focus on the psychology team's arguments about how they used Abby to manage the flow of research subjects' speech and to shape subjects' impressions of how their speech was being listened to. Engineers relied on research subjects' highly emotional speech in order to build their software; this speech was the team's data, and therefore foundational to the connection-making work that the software is supposed to do. The engineers thus recognized how important it was for the psychology team to develop a user interface that could elicit this data in a standardized way, and in a format that would be socially legible to research subjects. As researchers put it, they needed the interaction to feel familiar to research subjects: to feel like a communicative interaction with an interlocutor with whom they wanted to share their answers. There is a politics to this "familiarity," this interactional likeness. Researchers' ideas about what might make Abby familiar to research subjects articulate broader expectations regarding what kind of human listens to you in the thoughtful, empathic way that Abby is supposed to imitate. They also articulate the value of this kind of human. Some researchers like Klaus have specific research interests invested in the VHI, but the broader, shared goal of the team is to market the VHI as a public health tool: a technology for streamlining psychiatric assessment.
While diagnosis is the medico-legal designation of an illness, psychiatric assessment is a more informal triage process. Assessment involves sorting potential patients into categories: people who might be showing signs of psychic distress and are therefore in need of medical diagnosis (which would grant them access to insurance-covered treatment), and people who are not sick. My informants argued that a tool like the VHI, which they were developing to do this sorting work on behalf of humans, would save money and time and would keep people from burning out in emotionally laborious jobs. Anthropological and STS scholarship on automation has also pointed out that in order to automate human labor, that labor has to first be conceptualized as mechanical and unskilled (Irani 2015; Hicks 2017; Eubanks 2018; Taylor 2018; Ticona and Mateescu 2018). Therefore, in order to understand what it means to automate psychiatric assessment, we have to understand how the organizational hierarchy within the teams conceptualizes the skills and the work associated with psychiatric assessment, especially psychiatric listening, as mechanical labor, and how this hierarchy replicates hierarchies of clinical labor within mental health care in the U.S. To ask whom Abby is supposed to resemble, which kind of professional she is supposed to listen like, is to dig deeper into the political economic implications of the VHI and hierarchies of value within U.S. mental health care, illustrating how "claims about automation are frequently also claims about kinds of people" (Irani 2018).

WHAT DO I KNOW? WHO SHOULD KNOW? HAVE I TOLD THEM?

How did the Institute, and the VHI, come to be?
In this section, I summarize the history of the Institute and of the VHI project, followed by three examples (the fishbowl, the stolen sign, a lie by omission) that illustrate how an ethos of paranoia, opacity, and illusion (one that is reflected in and refracted through the design and logic of the VHI itself) is made concrete in the Institute's material, informational, and social infrastructures. The system was not as smooth and seamless as it appeared in the videos I watched with Klaus, or in any of the promotional videos available on the Institute's YouTube page. Gaps in the narrative about how it worked were gradually revealed and then slowly filled in, based not just on what people told me in interviews or in private conversations but on what they failed to tell me: a collection of secrets, silences, and lapses in information. The Institute was developed in the mid-1990s through a partnership between WCU and several military and defense organizations. These organizations were looking to draw upon advances in special effects and computer graphics coming out of the film industry and the cutting-edge computer science research of nearby Silicon Valley to create training, simulation, and medical interventions for civilian and military populations. For instance, the Institute specializes in building immersive, augmented and virtual reality environments used both for exposure therapy for veterans with PTSD and for resiliency training for soon-to-be-deployed soldiers. While technically a satellite campus of WCU, the Institute bureaucratically exists beyond the university. It is not totally beholden to the university's governance and regulations, and human subjects research conducted through the Institute must pass through the WCU Institutional Review Board (IRB) and various military IRBs. Within the Institute, researchers, post-docs, and interns are partitioned into different labs.
By the time I arrived, there was a sense of competition among the labs; they all had to vie, separately, for funding. It had not always been that way. Not unlike the Mars rover project that Vertesi (2012) describes, the VHI was a kind of totem, a point of convergence, and labs across the Institute gathered around the common goal of developing, designing, building, and testing it. The VHI team in its entirety included Klaus and his Multi-Modal Analysis lab (consisting of researchers trained in engineering and computer science) and the Virtual Human lab (generally for researchers trained in social and organizational psychology and interested in human-computer interaction), fronted by co-PIs Allan and Valerie, both of whom were trained in psychology. The Art Department, responsible for Abby's physical appearance, provided additional support, along with the Natural Language Processing (NLP) lab, which was responsible for VirtuSense's speech recognition functions, and the Special Effects lab, which was responsible for developing the VHI's capabilities for tracking gesture, posture, and head position. There was also a series of employees, interns, and WCU undergraduates like Nava and Taylor, whose paid and unpaid labor (voice acting for the virtual human, piloting the intervention, taking part in face-to-face interviews with research subjects, recruiting research subjects, acting in promotional videos) helped to make the whole project possible. At the time of my arrival, the VHI was more or less shelved and in a state of disarray. There was scant opportunity for cross-lab collaboration to be found. Everyone's objectives no longer aligned.

43 Out of all the labs collaborating to build the VHI, the NLP lab was the most bitter about the project. Because the VHI is not meant to be able to parse and analyze the semantic content of speech, the system's NLP capabilities are by design quite poor and not very advanced.
Any issues that the VHI had were left unaddressed until such time as someone could secure funding to continue working on the project. This had taken me by surprise at first. Based on promotional materials that the Institute was producing and recent interviews about the VHI with the press that I had followed closely, I was under the impression that the Institute was still actively using the VHI in research studies. Moreover, none of the people with whom I had spoken in the process of gaining access and preparing for my fieldwork had mentioned that the VHI was no longer in use. This was the first taste I got of the unpredictable paths and channels through which information about the VHI circulated, and of the degree to which demos and other public-facing materials rhetorically reinforce one interpretation of the VHI's functionality and efficacy while redirecting attention away from others. The industrial park in which the Institute's building now sits contains manicured gardens with cacti, succulents, and bright, flamboyant wildflowers, glittering artificial streams, two miniature soccer fields, and a band shell for outdoor summertime concerts. The Institute moved to its current location, away from its former, much smaller and humbler oceanside environs, around the early 2000s, when the area was just beginning to be developed. The Institute's move also coincided with one of its primary federal defense funders setting up its West Coast headquarters in a building connected to the Institute's main building by a causeway that offered an outdoor seating area with benches, tables, and potted plants. By the time data collection for the VHI project ended in 2015, there was a well-established material and metaphorical pathway between the Institute and the security and defense sector.
Many researchers, especially post-docs, left the Institute for jobs at the military unit (like Jackie, whose cube I had taken over) and would still join their old co-workers and cube mates for lunch on the causeway. As this connection between the military and the Institute concretized in the wake of the Institute's move, the atmosphere and demographics of the Institute shifted. Without much explanation, several employees lost their jobs, including many of the women in leadership positions. In the shadow of these unexpected firings, the place became less open and more charged with paranoia. Some researchers murmured to me, at the tail end of happy hours outside of the Institute's chilly interiors, that their formerly progressive-feeling workplace had become an old boys' club. When they felt comfortable talking about it with me, researchers expressed ambivalence and frank cynicism about the Institute's close ties to the military and its changing atmosphere. At least two researchers who were not U.S. citizens, and who were disturbed by the casual and historically deep connections between technologists, computing, and the military in the U.S., pointed out that the American flags flanking the Institute's entrance only showed up once the military moved in next door. Over lunch one day at a taco stand near the WCU campus, I timidly explained to Hillary (a research affiliate working under Allan and Valerie) and Zach (a PhD student supervised by Allan and specializing in robotics) the phrase that other scholars had used to describe close ties between technologists and the defense sector: the "military-industrial-entertainment complex."44 They both shook their heads bashfully, and Hillary said, with a chuckle, "yep, sounds about right."

44 Julian Bleecker's (2004) term, "the military industrial light and magic complex," may have more accurately captured the Institute's particular blend of Hollywood swagger and special effects technology with illusion, spectacle, and military money.

I was initially unsettled by how physically present the military was at the Institute. It was not unusual to encounter uniformed army or navy personnel lounging and joking at one of the restaurant-style booths in the kitchen. One morning, I found myself making small talk by the coffee machine with a woman who designed "intuitive to use" weapons. Her products were much easier to operate than the coffee machine, she griped and boasted. I had also mistakenly assumed that the Institute's military ties meant that funding was plentiful and stable. In fact, the Institute was a precarious place to work. Researchers and employees, even the security guards and people working in H.R., frequently came and went as they graduated or as they sought employment elsewhere. Because of the Institute's unusual relationship to WCU, academics employed as head researchers or PIs could not seek tenure either at the Institute or at WCU, although they often taught classes and advised students. They had to continually apply for external grants to fund their research and the non-graduate-student researchers who worked for them.45 Once a year, PIs had to hold days-long marathon meetings with military officials, arguing in favor of sustained funding for their jobs, their research, and their labs. These meetings would always take place in a large conference room on the second floor, in front of the elevators, nicknamed the "fishbowl." The wall of the room that faced the elevators was made entirely of glass and was completely transparent, offering a clear view into whatever was going on inside the room. However, in the event of a particularly important meeting, the glass wall was covered by a veil of running water that could be turned on or off, making it difficult to distinguish who was in the room and revealing only the blurry outline of their figures.
At first, I found the gentle sound of the water pleasant and calming, but I realized that it also made it impossible not only to see but also to hear whatever was going on in the room. I came to take the running water as a sign that the meeting in the fishbowl might have serious consequences for my informants, for the sanctity of their research projects, and for their jobs. And I came to see the fishbowl as a metaphor for the Institute and for the play and performance of secrecy and transparency that characterized it. In the fishbowl, serious and consequential matters were discussed "out in the open," technically public and available to all, yet concealed, the facts of the matter distorted and obscured. Things were not as they seemed, and the distinction between what was private and taboo and what was already known and common coin was unclear.

Another illustration of the Institute's ethos of and predilection for concealment, and of the shaky distinction between secret and matter of fact, is the case of the stolen sign. The sign had hung from the ceiling between the second-floor elevators and the fishbowl. In bold, red block letters, it asked, WHAT DO I KNOW? WHO SHOULD KNOW? HAVE I TOLD THEM? Next to this message were three circles meant to represent three people, connected by three arrows forming an interlocking loop. Hillary and I both interpreted the sign as a citizen watch campaign along the lines of "if you see something, say something": it seemed to urge researchers to help keep each other in check and to be on the lookout for suspicious behavior, implying that the Institute had security issues. I did not notice that the sign had gone missing until I received an Institute-wide email about it, sent by a military liaison who rarely ventured below his office on the fourth floor.
In the email, the liaison claimed responsibility for making and hanging the sign. He had used similar signs before, in groups and organizations of varying sizes (from 30 to 400 to 42,000 people), to help keep information flowing smoothly. Communication and openness, he reminded everyone, were key to the "organizational health" and mission of the Institute. The signs were meant to remind everyone that they might know something that could benefit others, and to circulate that information as liberally as possible. The sign was not meant to belittle researchers' expertise, but rather to prevent the bottlenecking or siloing of information, which everyone recognized to be endemic to the Institute. For example, many of the people who had gained expertise in operating the VHI for public demonstrations no longer worked at the Institute, or else claimed they had forgotten everything they had learned. When Hillary was called upon to demo the VHI at a public WCU symposium, she eventually turned to me for help. Because I had spent so many hours, so many days, trying to piece together an institutional history of the project through interviews and by toiling through the Institute's online archives, conference proceedings, research papers, and press releases, I had a clearer sense of the VHI's backstory than anyone else there. So the sign was not a citizen policing campaign, but a proactive, bureaucratic, and cybernetic call to disciplined information sharing for the benefit of the Institute as a whole.

The week after the sign was reported missing, Klaus, Hillary, a project manager (PM) working for another psychology PI, and I went out to lunch at a restaurant a 10-minute drive from the Institute. Halfway through our meal, the PM revealed that he knew who had stolen the sign. Klaus and Hillary, delighted, begged him to confess, but the PM would not yield to their pleas.
He was not going to reveal the thief's identity to anyone, he told us. He would simply ensure the sign's silent return and do his best to make sure the thief did not lose face over such a trivial prank. This prompted a rowdy joke from Klaus, directed at the PM: "WHAT DOES HE KNOW? He knows who stole the sign. WHO SHOULD KNOW? The guy who made the sign. HAS HE TOLD THEM? Fuck no!" The stealing of the sign, the protection of the thief's identity, and Klaus's joke were means of protest and resistance against the heightened feeling of suspicion and the asymmetrical transparency at the Institute. Why should researchers be transparent with each other, sharing what they knew with someone who might not know it, when they were all competing with each other for funding, and when the logic driving decisions about their jobs (like the abrupt and mysterious firing of many employees) would never be as transparent? This will-to-not-know and refusal to keep information circulating was a form of self-preservation, a way to protect oneself and one's peers from reprisals by the men running the Institute, who controlled the allocation of resources. Researchers at the Institute practiced self-preservation through small acts of refusing to let information be known as widely as possible, or simply by thumbing their noses at this sentiment. This practice did not just concern office gossip, like who had stolen the sign. It also involved refusing to circulate information about the VHI's own shortcomings, which might have broad implications not only for individual researchers but for the sanctity and reputation of the Institute as a whole. Because the VHI was so charismatic, it attracted frequent and sustained attention from the popular press. The VHI was the Institute's prized prototype; some even called it the mascot of the entire Institute. But it had significant shortcomings.
There was a misalignment between what researchers hoped it could one day do (operate on its own without an architecture of human support, accurately provide assessment scores, respond in a socially appropriate way to a person's tales of distress) and what it could accomplish in its current, shelved form. For instance, promotional videos of the system and videos recorded for press coverage give the impression that the VHI is incredibly socially adept. The interactions captured in these videos seem smooth: there are no gaps or awkward pauses, Abby nods her head at all the right places, and she apologizes if the subject's speech overlaps with hers. But only after watching many videos of research subjects' interactions with the VHI, and parsing through the archive to find earlier versions of Abby's script, did I realize that all of the publicly available videos feature an older version of the system: the version in which Taylor and Nava puppeteered Abby's bodily movements and the timing of her questions. This stage of the system's development is referred to as the Wizard of Oz (the WoZ, or wizarding, stage), standard terminology in human-computer interaction (HCI) studies.46 Researchers use the WoZ phase to figure out which components of the system work best. Interactional data gathered during the WoZ stage, like the optimal length of time Abby should pause before responding, get built into the final, automated version of the system. But the automated version of the VHI was far less smooth than the WoZ system. Klaus, Taylor, Nava, and others who were involved in moving the VHI from the WoZ phase to the automated phase conceded that talking with the automated system was awkward: there would be too-long pauses between the research subjects' speech and Abby's questions, for example.

46 This term is a reference to the wizard in L. Frank Baum's The Wonderful Wizard of Oz. The characters spend the book on a quest to find the wizard, who is promised to be capable of magically solving their woes. When they finally arrive, they realize that the wizard possesses no magic; the form they encounter as "the wizard" is actually an automaton, controlled by a regular, run-of-the-mill man who conceals himself behind a curtain. Likewise, in the WoZ experimental paradigm, the research participant should experience the technology being tested as totally autonomous. Meanwhile, researchers keep the humans who animate the supposedly autonomous technology's interactions hidden from the user.

Discussing the VHI with outsiders, including non-Institute affiliates like visiting anthropologists, required not only choosing one's words carefully but also choosing when to let misinterpretations or misunderstandings go uncorrected. I experienced this firsthand when I found myself on the receiving end not of a lie per se, but of a failure to disclose. The omission came to light during an interview with Nava, one of the two young women who controlled the VHI and helped to develop its interactional infrastructure. As I will explore in more detail, Nava was the voice actor for the virtual human; Abby speaks in her voice. She conducted face-to-face interviews with research subjects, interviewed research subjects through the virtual human during the WoZ phase, and then monitored subjects' interactions with the VHI in the automated phase. She had been an undergraduate at the time and worked on the project until data collection ended in 2015, subsequently leaving the Institute to pursue a doctorate in neuroscience elsewhere in the state.
I had been asking her to speak in more detail about her involvement in facilitating face-to-face interviews and then tagging that data for analysis, when she casually let slip her understanding that VirtuSense cannot map sentiment to paralinguistic, non-verbal features of speech in real time, as Abby's conversational partner is speaking: "it's [VirtuSense] not saying like this line was delivered in this tone, it can't map to that degree...all they're [the researchers] doing is looking more on a global scale overall, this is the sort of like inflection that the user was displaying [...] the truth is it's not it's not looking at the tone of that specific statement, there's no tool I know that can do that." In other words, the emotional, psychic tenor of a speaker's voice is calculated at the end of a subject's conversation with Abby, once the interaction has come to a close. This analysis does not unfurl as the conversation progresses, which had been my impression. I had clearly expressed this assumption to others at the Institute, including in private, one-on-one interviews. By the time I interviewed Nava, I had been at the Institute for roughly two and a half months and was about halfway through conducting interviews with researchers pulled from the list Klaus had made for me. But none of them had pointed out or corrected my misunderstanding. What's more, this information contradicted what Klaus and Valerie had told me about Abby's responsive capacities. They had conveyed to me that Abby is able to respond with socially appropriate body language because VirtuSense supposedly tracks the emotional tenor of the conversation in real time. Confused, taken aback, and not sure whom to believe, I told Nava that I had not realized this, expressing shock that no one had ever explained it to me that way. Nava, though, was not shocked at all to hear me say this.
On the contrary, speaking as a veteran of the Institute, she told me this kind of thing was par for the course. "To be honest," she began,

I feel like they leave a lot of things open ended because they want you to interpret it in a beneficial light which is what will happen, you're gonna interpret it the way that you want to and they're not gonna correct you [...] when you ask the right questions it'll come out for sure...or even just looking at the dials very carefully.

Nava's comments ring true about the system itself: Abby, the system's interface, and her listening "body language" co-fabricate a not entirely honest depiction of thoughtful listening. Abby's designers cash in on the familiarity of her attentive body language to encourage a specific (mis)interpretation of how the system is listening, ensuring that research subjects speak (and emote) as much as possible so that VirtuSense gets the data necessary for its analysis. But I also like to think of Nava's comments, her insistence that researchers purposefully leave things open ended and prone to whatever suggestive (mis)interpretation I might take up, alongside Klaus's insistence that I get to know the VHI data because it would help me know what kinds of questions to ask. Perhaps this was Klaus's subtle way of suggesting, without directly saying so, that I look closely at the dials, as Nava put it: that I "learn how to ask" (Briggs 1984), learn to take what was given to me and examine it critically, pursuing the gaps in people's explanations because the truth would not be articulated in a straightforward way. Faced with inquisitive outsiders like myself, researchers at the Institute created their own fishbowl, playing with the transparency and opacity of fact and fiction through strategies of deferral, misdirection, and concealment.
The polyvocality of letting things go unsaid (of refusing to correct an outsider's misinterpretations and instead allowing them to flourish) is refracted through the VHI and its virtual human interface. Just as the team relied on outsiders to fill in the blanks on their own as to what the VHI was doing, how it was listening and what it was listening for, the design of the interface, as I will discuss, offers up a space of projection and fantasy, an openness as to what Abby's animated bodily movements mean and as to who Abby is supposed to be. And just as the VHI calls into question what it means to listen in psychiatric contexts, conducting fieldwork in a space like WCU's Research Institute challenged my own assumptions about what it means to listen in ethnographic contexts. People did not always say what they meant, and not necessarily out of a desire to conceal information from me or to keep something secret. Before every interview I conducted, after I presented researchers with my consent form, they would ask me, "am I allowed to sign this?" And in the interview itself, likewise, they would ask, "am I allowed to say this?" These were rhetorical questions, of course, since as an outsider I had no sense of the limits of what they could or could not reveal. Nevertheless, the questions were telling: it was clear that researchers themselves did not fully understand, or trust, the limits of what could and could not be known. Fieldwork in such spaces requires a pursuit of thin listening, following Jackson's (2013) and Benjamin's (2019) conceptualization of "thin description" as an antidote to Geertzian thick description. Like thin description, thin listening is a method of humility, a method of attending to surfaces "such as screens and skin," key features of interfaces like Abby (Benjamin 2019: 45).
Thin listening implies that there is no absolute knowledge to be acquired, no god's-ear trick through which all ethnographic data will be revealed evenly and completely, not only for epistemological reasons but also out of respect for other people's (like research subjects') boundaries.

GHOST STORIES

In this section, I follow up on my initial encounter with the VHI in Klaus's office, focusing this time on Klaus's team and the component of the technology they built: VirtuSense, the system's software. I compare the psychology team's visions for the VHI's application, and the ways they envision users interacting with the interface, with the ways the engineering team interacted with the research subjects' data. I describe the different ways in which Klaus's students, as opposed to Nava and Taylor, confronted the research subjects' speech. While the two women encountered and interacted with research subjects face-to-face or through the interface of the virtual human, the engineering team members approached the research subjects' interviews with the VHI as data (auditory and visual data) that can be reduced to its formal qualities. Thus, I explore the motivation behind Klaus's reduced listening: a mode of attending to the acoustic components of speech alone. The approach that the people on Klaus's team take toward the data is not a cold, detached mode of listening that denies the individual personhood of the research subject, but rather a professional mode of interpretation, which we might call, following Charles Goodwin (1994) and Thomas Rice (2010), "professional listening." Unlike the psychology team at the Institute, Klaus, as the PI directing the engineering team, is not interested in the nuances of how or why people come to trust Abby. As far as he is concerned, Abby (the interface) is useful because it ensures standardization, since Abby asks the same questions in the same way regardless of any external factors.
He is much more concerned with VirtuSense, the software he helped to build, and the pursuit of what he calls the "vocal thoughtmarkers" of psychological distress: signs that suggest the presence of either depression or post-traumatic stress disorder (PTSD) or (in his work outside the VHI project) signs that suggest that the speaker might commit suicide. Klaus used the term "thoughtmarker" to put into words what his research centers on, and to describe how it aligns with but also departs from Ted and Sushant's research at ECU.47 Klaus uses "thoughtmarker" rather than "biomarker" because he does not investigate human biology, although other researchers do use this term to describe markers of neurocognitive processes (Just et al. 2014; Rea 2014). His goal is to use artificial intelligence techniques of pattern recognition to identify connections between human behavior (with an emphasis on acoustic features of spoken utterances) and standard diagnostic criteria, more or less black-boxing the brain. Unlike Sushant's team, Klaus takes an almost behaviorist approach to studying thoughtmarkers. He is concerned with automating the connection between inputs (the psychopathological processes of mental illness) and outputs (the sounds of speech), but not necessarily with understanding the nature or the causal mechanics of that connection.

47 Not only do Klaus and Ted know each other, but Klaus organized a special session on vocal biomarkers of neuropsychiatric disorders at a major international speech signal processing conference and invited Ted (plus a PI from my third fieldsite) to speak. Ralph ended up presenting the group's research at the session; Ted sat in the audience, in the row in front of me.
In public presentations, promotional videos, interviews with the press, and tours of the Institute for visitors, researchers on the psychology team took pains to declare that Abby was not a therapist, that encounters with her produced an assessment rather than a diagnosis, and that Abby was absolutely incapable of conducting psychotherapy. Allan, Valerie, the Virtual Human lab's co-PIs, and researchers throughout the Institute were keen to emphasize that the VHI is an assessment tool. The purpose of the technology is not to make a diagnosis. Nor did Institute researchers wish to build a tool that could stand alone in place of professional clinical judgment. Instead, following the dictates of their military funder, they wanted to build an assistive tool that could help a mental health practitioner make a diagnosis, providing additional insight alongside whatever expertise the practitioner brought to the clinical encounter and helping them determine the extent to which the patient was in need of care. Sometimes Allan and Valerie would introduce Abby as a triage technology, painting a scenario in which Abby was the first "person" a potential patient would interact with: a gatekeeper determining whether the patient would see a human professional (if they were in dire need of care) or be sent home. At the same time, there was a less clear and straightforward story about who Abby was supposed to be, and how she came to look, move, and "listen" the way she does, a mystery I discuss in more detail elsewhere in the chapter. Researchers offered an even less straightforward story about the research subjects: who they were, and the kinds of things they spoke about with Abby. This was partly because very few researchers, I found, actually interacted directly with either research subjects or their data (the recorded interviews).
It was roughly three weeks after my meeting with Klaus, at his birthday party, that I finally had the chance to speak in depth with two of his students: Edward, an undergraduate, and Alok, an advanced PhD student. They were among the few engineering students who had worked on building VirtuSense and were still at the Institute. Their cubicles were positioned far from mine, and I rarely saw them milling around the Institute's lounge or kitchen. My only extended interactions with them were during the weekly, thirty-minute check-in meetings that Klaus held for all of his students in Grace Hopper. Almost all of the meeting time was spent discussing the finer points of the machine learning side projects in which Klaus's students were involved, and I occupied myself by taking detailed, verbatim notes (as if that would make the material less opaque) and nodding along (like Abby, as if I followed), laughing when everyone else did even though the jokes made no sense to me. Apparently, I was not alone in my confusion. One of my cubemates, a French post-doc working under Klaus, once ushered me into his cube and asked me in hushed tones how I, a non-initiate of machine learning, managed to follow the conversations in the check-in meetings. He wanted some tips, since he confessed that even he struggled to understand what was being discussed. Klaus's birthday party (he was approaching his mid-30s) was held at a hip local pizza place with 1970s-style wood-paneled interiors and a DJ spinning vinyl records of pre-disco funk. It was a relaxed and friendly affair. Researchers who were normally brusque and distant were bubbly and talkative, sharing pitchers of beer, laughing loudly, and playing pool. Everyone in attendance gathered around a single table to present Klaus with a cake and sing him happy birthday. He threw back his head in his typical, uproarious laughter when he saw that someone had scrawled HAPPY BIRTHDAY DR. MULTI-MODAL in loopy cursive on the cake's surface.
I took advantage of the frenzied moment to pull up a chair between Edward and Alok, who were sharing a can of Pepsi (Edward, a junior WCU engineering student, was not yet 21). With the raw and intense VHI video I had reviewed with Klaus still on my mind, I was hoping that the two students would speak candidly with me about their experience working with the data. After all, Klaus had put their names on the list of people with whom I should speak, saying that Edward in particular was responsible for pre-processing the audio and video data. I had no real understanding of what this work entailed, but I assumed that if either of the students had processed the videos, then they must have, at some point, listened to them, especially since they were working for Klaus, who was so concerned with the relationship between acoustic features of speech and mental states. My curiosity fired Edward up right away, and he responded viscerally to my question about his experience working with the data. "Oh, what a nightmare, we could tell you ghost stories about the VHI data it was so scary," he warned with exaggerated seriousness, his eyes widening behind his frameless glasses. "If you want to hear about the data it'll be a ghost story! We have to sit around in a circle on the floor and turn off all the lights and I'll put a flashlight under my chin like oooOOOo and we'll make like, s'mores." Alok and the other students around us, who had started to eavesdrop, snickered. Once again not in on the joke, I asked him to tell me more: why was the data scary? Playing the mature older brother, Alok interceded before Edward could say more: "He means it was a nightmare because the dataset was such a mess and required a lot of pre-processing."
The French post-doc, who had also been listening in, stepped in to further disambiguate. "Edward's still young so he's never worked with a real dataset before," he said, more as an aside to me than as a comment to the group, "he's used to working with data for class that's been all cleaned up for you already. The VHI dataset is ok, messy but ok, more like a normal data set. It's just normal. There are issues with the audio not aligning with the video, or sometimes there's no audio, or sometimes you can't see Abby or the audio is not very clear and there's lots of noise, so you can't make assumptions about the state the data is in before you begin extracting features from it." The dataset was "normal" because it contained irregularities, the outcome of unanticipated malfunctions, like a research subject putting the headset on backward and covering up the microphone with the hood of their sweatshirt. Examples of "messy" data also include data with poor audio quality, or instances when the video and audio were out of sync. Poorly synced audio and video made it difficult to cut the video into analyzable chunks (within a single chunk, the audio and video might not correspond). This was particularly troublesome given the goal of the analysis: to correlate the visual (facial expressions, gestures) with the auditory (acoustic features of speech). So before Edward could work through the VHI data, it needed to be "cleaned up." He first had to determine which data was usable, although in his naivete and zeal he had started working on the data before realizing that some of it might not be usable. Edward affirmed that the biggest lesson he learned from the nightmarish ordeal with the VHI data was that it is important to review and understand the state of the data before attempting to work with it. This echoed Klaus's justification for inviting me into his office to get to know the VHI data.
Edward assured me that, in the future, he would follow Klaus's advice and avoid making assumptions about the state of a dataset by reviewing it thoroughly first. Even more confused, I decided to ask Edward outright: "when you worked with the data, when you reviewed it, did you listen to it? Because some of the stuff people say is really, really messed up, and I'm wondering what you did about that." The atmosphere shifted, and I was met with blank stares from Alok, Edward, and the rest. I was not sure what I had said wrong, or what the blankness of everyone's faces meant. It was only later, after talking with Klaus about his students' abrupt and mysterious reaction to my question, that things made sense: Edward, Alok, and the others probably had not watched (or rather, listened to) enough of the videos, for long enough or enough times, to fully grasp or internalize their contents. Even when Edward had played a video, he did not absorb its audio or visual content; he was focused on analyzing the videos for characteristics that would get in the way of later analysis, like a lack of audio-visual alignment. Klaus conceded that even he has not listened to all of the videos in the dataset (there are close to 500). Over the past five years or so, he has reviewed maybe 50 or 60 of them in total. He explained the steps of processing to me, to help me better understand why his students might not be familiar with the dataset's narrative contents:

[we will] sporadically like listen to a minute or two and then a few [videos] we watch in their entirety but just to make sure that the virtual human doesn't do weird stuff. What we often [do], and like my students rely on, is, we do basic feature extraction methods that you might call machine listening, where we basically do signal processing and that kind of extracts features or characteristics of the signal and then we basically analyze with respect to the statistical validity or the statistical differences and then also, double check that the measures that we extract are also like helping a classifier to identify differences in uh people's behaviors and identify if a person is depressed or not.

Their listening was cursory because they listened in order to assess the formal properties of the videos and separate unusable data from usable data. If Abby's responses were lagging, if there were extended pauses between Abby's questions and a subject's responses, if her audio was out of sync with her body movements, or if the participant was unable to hear Abby's questions, the data could not be used. Klaus and his students cared first and foremost about the quality of the signal. Their aim was to establish statistical validity. For example, how did qualities of the acoustic signal, such as the frequency of consonant and vowel sounds, line up with or deviate from the phonetic norms of Standard American English, or from the acoustic qualities associated with the sound /ba/ versus /pa/? After isolating these qualities, the final step would be to train a classifier to recognize them in a stream of speech. It was part of their job, then, to detach form from content. They were unfamiliar with how disturbing the videos were because attending to that was beyond the task at hand. As far as they were concerned, processing the videos meant being able to separate signal from noise, and this did not involve internalizing the videos as narrative testimony.
On the one hand, it might be tempting to read this as a sign of negligence, or as a willful adherence to professional codes of practice and interpretation at the expense of emotional attachment to, or investment in, the research subjects' upsetting, confessional illness narratives: a case of computational detachment. One thinks, for instance, of the Rodney King trial in the early 1990s. Though white supremacy was the animating logic of the officers' acquittal, as Goodwin (1994) describes, the legal defense team stripped the video footage of its racist motivation by slowing down the beating of King, breaking the violence into disconnected, formal events, and reframing the beating as the expert practice of "de-escalation" to make the point that the officers were simply doing their job (see also Feuerherd [2018]). On the other hand, it was indeed outside the task at hand for team members like Edward to listen to and absorb the videos' contents, also because, as an up-and-coming engineer, he lacked the training that would prepare him for this difficult work. Health care practitioners who must listen to and analyze traumatizing, disturbing stories from their patients as part of their jobs receive extensive training on distancing themselves and attending to the secondary trauma that patients' stories might ignite. But as I will discuss in Chapter 4, mental health care professionals, like people conducting psychiatric screening, also listen strategically and selectively with the intent of filling out a psychiatric inventory, although they may perform intersubjective sharing as a means of establishing trust (a tactic that Valerie and Allan tried to build into the VHI's interface). Maybe, for Klaus's team, to absorb the content would amount to a violation of the subjects' privacy.
For instance, because I had been added to the VHI study's IRB protocol, I was technically research personnel, and the subjects had consented to allowing research personnel to access and analyze their recorded conversations with Abby. Nevertheless, though I had the subjects' consent, watching the videos felt voyeuristic. Had the research subjects really understood that anyone, so long as they had the team's consent and filled out the proper paperwork, could access their files? Paradoxically, Klaus's and his team members' failure to absorb the videos (their thin listening, the fact that they forgot what the videos contained aside from how much pre-processing they required) respected the boundaries of the research subjects' privacy, affirming the gravity and the intimacy of the things the research subjects shared with Abby.

I'M DOING ALRIGHT...I GUESS

When I watched the videos in Klaus's office and later on my own, the progression of the interaction between research subjects and Abby always impressed me. Almost without fail, subjects would go from being reserved, stiff, and unsure to relaxed, their responses growing in detail and length, becoming more reflective and more involved in retelling their own stories. What were the mechanics of this trick? How does Abby transform from more or less a standardized, pen-and-paper psychiatric inventory into a human-like interlocutor with whom strangers are willing to share their most private stories, like the memories they wish they could erase from their minds? In this section, I attempt an answer to this question, focusing this time on the design and development of Abby, the interface. I parse out what the VHI discloses about culturally specific conceptualizations of empathy, which I contend are wrapped up in ideologies about the ability of speech to convey the contents of a speaker's self.
Thus, I stitch together the language ideologies in operation in the VHI system with models of the self as a container for private and otherwise secret, concealed information that exists in an indivisible, unique "core" at the center of every person (Rosaldo 1984; Lutz and White 1986; Lutz and Abu-Lughod 1990). In particular, I zoom in on the gap between the listening that Abby performs with her non-verbal responses to a speaker's utterances and bodily expressions, and the limited and reductive "machine listening" that VirtuSense is capable of. As noted, the VHI is arrested in its development stage. Part of the reason the system cannot be deployed in clinical contexts, outside of a controlled research study conducted at the Institute or off-site under the supervision of Institute researchers, is the system's limited ability to analyze semantic content. The VHI is designed for the reception of speech, and so Abby, the user interface, does not say much. Researchers constantly reminded me, and would recite during the tours of the Institute regularly held for the general public, that Abby is strictly a "listening agent," emphasizing the system's receptive passivity. VirtuSense has poor natural language processing abilities. Aside from the questions in the assessment scales, the only verbal responses that the decision tree in Abby's programming code allows for are open-ended follow-ups ("can you say more about that?"), all designed to get the user to speak more and speak continuously. The system's passivity brings to mind ELIZA, the chatbot created by MIT computer scientist Joseph Weizenbaum, which was designed to respond to an interlocutor in the manner of a Rogerian psychotherapist, employing an "echoing" technique: it simply reiterated, verbatim, the text that the interlocutor had typed, in the form of a question.
Despite the supposed passivity of this echoing, users felt great catharsis in chatting with ELIZA, as was the case with Abby, and described their interactions with the system as therapeutically efficacious (see Wilson 2010). As Valerie explained, Abby's main job is to "evoke emotion" and encourage users to "open up" so that the user produces as much data as possible for the software to analyze. A handout given to research subjects scaffolds the interpretation that Abby understands and is thoughtfully attentive to the narrative content of their inner selves: Abby meets your smile with a smile of her own because "the software tries to take your feelings into account." Abby provides scaffolding as well: "I'm here to learn about people," she says, "and I'd love to learn about you." In other words, Abby performs one mode of listening, built to look and sound as if she is attentively listening to the semantic content of a speaker's verbal utterances, all in order to enable VirtuSense's mode of listening, which is agnostic to the semantic substance of your speech: a mode of listening for sound features that Klaus and his students identified as salient markers of psychic distress. I was only able to understand the full extent to which VirtuSense is incapable of attending to semantic content by witnessing the system from the inside out, in the course of a failed experiment involving a humanoid robot that I worked on alongside Hillary and Zach. The experiment had been Valerie's idea: she wanted to make use of a robot that a Japanese researcher had lent us while he was visiting to present at a public symposium on human-robot interactions that Allan had organized. The experiment seemed doomed from the start. Hillary and I frequently had to call Klaus down to the WCU campus to help us figure out what was going wrong with VirtuSense. What's more, Hillary had predicted the experiment's failure even before the robot arrived at the Institute.
She knew that VirtuSense was incompatible with Windows 10, but Allan and his project manager insisted on securing only a computer that ran on Windows 10 for the study. Hillary, Zach, and I nevertheless went through the motions of putting it together. According to Valerie, the purpose of the experiment was to determine whether the VHI interface indeed had an impact on people's willingness to trust the system and disclose personal information in the course of its assessment interview. She wanted to compare people's interactions with a virtual character (Abby) with their interactions with an embodied, real-life character (the android), exploring whether interacting face-to-face with an embodied, human-like form, as opposed to a screen, would affect people's feelings of trust and rapport. To achieve this, we would have to figure out a way to run VirtuSense through the android, so that the audio-visual data captured by the webcam and microphone could be analyzed. Both Abby and the android were designed to have similarly gendered bodies. Many researchers told me that both were supposed to look like women because the team wanted subjects to experience the user interface as non-intrusive and non-aggressive. Nevertheless, the experiment with the android had a significant variable that Hillary and I tried to avoid bringing up: the android was designed to resemble a Japanese woman, and Abby was not. In an effort to direct subjects' attention away from this discrepancy, Allan and Valerie suggested that we fix up the android to resemble Abby as much as possible. Allan assigned Hillary the task of procuring an outfit for the robot that would match Abby's. He asked Hillary and me to fix and re-fix the android's hair to resemble Abby's as well. Shopping for the android and styling its hair took time.
This was yet another instantiation of lower-level research personnel conducting gendered, domestic labor; Hillary and I held administrative positions relative to both the psychology and engineering teams. In this instance, the labor took a very blatant form of social reproduction: our job was to recreate gendered presentations of hair and dress on a piece of machinery, all in pursuit of making the robot look convincingly like a virtual woman (Abby), whom the Art team had designed to look convincingly like a human woman. It was our labor that animated the robot's gender and reaffirmed Abby's gendering.[48]
[48] For a discussion of the historical resonances of human-like robots in Japan, and the use of robots to reify and reproduce notions of gender, kinship, and the family, see Robertson (2017).
The subject pool would consist of undergraduates from a WCU Introduction to Psychology course. Some would be interviewed by the robot, and some by the virtual human, with the same questions asked every time. Hillary, Zach, and I joked that we knew the outcome of the study before it started. We didn't need the study, or the thousands of dollars it took to assemble it, to prove that the android (with its oversized hands and yellowing, corpse-like skin) was terribly creepy.
Hillary styles the robot's hair for a public showcase of the study, while other researchers attend to the computers in the background.
We set up shop in a cloistered set of offices along the perimeter of a cavernous, high-ceilinged reading room in WCU's Gothic-style library. In one office, we propped two camcorders on tripods, one trained on the robot and the other trained on a seat in front of the robot where research subjects would sit. Zach and Hillary carefully taped a Microsoft Kinect to the wall over the robot's shoulder.
In the other room, we set up two computer monitors to view the video feeds, and a third for operating VirtuSense, which was housed on a USB thumb drive that Klaus and Hillary called the "10K dongle" (a reference to the amount of funding set aside for developing VirtuSense alone). The offices were dusty and smelled of mold, and the bundle of wires connecting the android, the android's computer, its speakers, and the camcorders, all of which ran into the other, smaller office, prevented us from completely closing the door that separated the two. A large compressor powered the android, and when it was turned on, it kicked up hot, dusty air and produced a sound that made it impossible to think and that the half-closed door only amplified. We had to shout to be heard over the compressor and were constantly worried it would set something in the old offices on fire. Two weeks in, people occupying the surrounding offices were banging on the door to complain about the compressor's noise on a regular basis. The experiment's start date was deferred for nearly a month before the project was abandoned altogether, because we could not get VirtuSense to operate through the android's body, just as Hillary had predicted. But its failure opened up for me an otherwise unavailable view of VirtuSense, exposing its inner workings. For instance, when we were trying to determine whether the VHI would recognize a participant's speech, one of us answered the robot as it asked the VHI assessment interview questions, while the others remained in the smaller office with the computer, watching the Dialogue Manager, which displayed the words that VirtuSense's natural language processor (NLP) picked up. When Hillary responded to the question "What are some things you really like about living here?", the Dialogue Manager showed us that the NLP interpreted her response as "an to in the uh or go to" (which is not the answer she had given).
VirtuSense was not only incapable of picking up the semantic content of spoken utterances; it was also agnostic to the very source of speech and, given the right conditions, would recognize the speech its own system produced as if it were the speech of a human research subject. This was not too far from the truth: after all, the system used a human voice (the voice of Nava). It was designed to treat any form of human speech, regardless of the source, as data. We discovered this after a series of frustrating and frightening days, during which the robot would begin repeating the VHI standard assessment questions, pause mid-word, and then apologize for interrupting ("oh, I'm sorry, please go on") even though Hillary, Zach, and I had not said anything. It took a frantic phone call to Klaus to figure out the cause of this demonic display: we had turned the speakers up too high, and VirtuSense was capturing and processing its own speech as if it were the speech of a human interlocutor. The system was essentially interrupting itself.
Display of the Dialogue Manager (visualizing the system's NLP) showing the VHI interrupting itself, i.e., VirtuSense misrecognizing Abby's speech as the speech of a user.
Witnessing this ghastly malfunction illustrated the breadth of the gap between the mode of interpreting language that Abby's body performs and the mode of analyzing language that VirtuSense is designed to execute. Abby's interactive, mirroring body language (the affirming nods that guide a speaker on, the probing follow-ups, the smiles that "take your feelings into account") altogether performs a mode of interpreting speech that circulates in U.S. mental health care, linked to what E. Summerson Carr (2010) calls "the ideology of inner reference." In this ideological framework, speech's primary function is referential: speech is directly tied to, and therefore expresses, a speaker's authentic and otherwise interior self. Listening to speech thus provides a pathway to the intersubjective knowing of another's self and is central to Euro-American conceptualizations of empathy. The ideology of inner reference entails a listening ideology and a listening ethics: a way that speech should be interpretively and sensorially attended to that matches what speech is doing and how it works. VirtuSense, by contrast, effaces the ideology of inner reference, listening not to you but for your speech's sounds. Abby's animation, in turn, reinforces, or rather exploits, the ideology of inner reference and its adherents with an embodied performance of empathic listening, anticipating that vulnerable research subjects will recognize and participate in it, and encouraging them to produce "illness narratives" that are meaningful to them but meaningful to VirtuSense in a radically different way. The ideology of inner reference has vaguely psychoanalytic undertones. It implies that the self is interior and hidden from the world that the speaker inhabits. This self also has a depth to it and is cushioned by layers that wrap around and shield its more private, indivisible core.
Valerie and Allan on the psychology team expressly designed Abby according to a psychological theory that reinforces this model. Psychologists Irwin Altman and Dalmas Taylor (1973) developed Social Penetration Theory (SPT), otherwise known as the onion theory of interpersonal communication, in order to describe the formation of intimate relationships. When Hillary and I were asked to salvage the robot study, Valerie gave us an assignment: produce a series of staged videos, shot to seem as if the virtual human and the robot were interviewing me. In one video, my character's responses should be "mundane," and in the other, they should be "intimate." We were to make four videos in total: one intimate and one mundane with the robot interviewer, and one intimate and one mundane with the virtual human interviewer. Finally, she asked us to send the videos out to Amazon's crowdsourcing "microwork" platform, Mechanical Turk (AMT), for AMT workers to view and rate.[49] Hillary and I wrote a first draft of the script modeled after the research subject population, but as we worked, we realized we did not fully understand what Valerie was seeking. We told the story of a woman (my character) who had grown up in and out of foster care and as a result did not have many friends as a child. She was now estranged from her family, after a rough patch of substance abuse in late adolescence. For the "mundane" version of the script, the answers my character gave were short and terse. For instance, to the question "how are you doing today?" the mundane character responded, "I'm doing alright." In the "intimate" version of the script, the character expanded on her terse responses, revealing more about how the questions made her feel. For instance, she would respond, "I'm doing alright...I guess," pausing heavily and casting her eyes down. Valerie rejected the draft, asking us to "make the intimate one more intimate." Hillary asked her to be clearer. What, precisely, did she mean by "intimate" and "mundane"?
Valerie's response was prompt: she included in her email the "intimacy measures" they had used when designing the questions to be asked during the VHI assessment, and the measures were based on SPT.
[49] Employers, from university researchers like my informants to tech start-ups, send out tasks to be completed through AMT, and AMT workers, or "turkers," execute these typically menial tasks (such as clicking through hundreds or thousands of images and identifying those that contain cheetahs versus domestic cats) for below minimum wage. Irani (2013) has observed that the AMT platform (the website through which employers request jobs) keeps turkers hidden from employers and from the people who benefit from their labor, such as internet users running Google image searches for cheetahs. Through this "redistribution of tedium" (Irani 2013:729), AMT's infrastructure helps sustain the illusion that the "innovation economy" of Silicon Valley runs on creativity rather than drudgery.
"I'm doing alright...I guess": video still of the ethnographer performing the "intimate" script in a mock-assessment interview with Android Abby.
According to SPT, the outer layers of the self are superficial and not terribly important or unique to a person. The closer you get to the core, the more private, unique, and individual the layers become. In interpersonal relationships, people build intimacy and trust by transmitting the contents of increasingly deeper layers through speech, getting closer and closer to the self. Between two humans, each of whom possesses an individual self, this exchange is mutual. But when it comes to interactions with non-human entities such as virtual humans like Abby, the goal is to encourage disclosure of the contents of these deep layers in the absence of reciprocated disclosure.
If the goal of interactions with the virtual human is disclosure, then how does the team engineer the desire to disclose in a situation in which a non-human agent is only "here to learn about people" and would "love to learn about you" rather than share anything about themselves? The psychology team pursued rapport through the design of an agent that users would find "familiar," both in terms of the interaction the agent engages the user in and in terms of the agent's embodiment. In the following section, I describe how Institute team members utilize race and class as flexible resources for achieving rapport.
PAY NO ATTENTION TO THE WOMEN BEHIND THE CURTAIN
Researchers' ideas about what might make Abby familiar to research subjects articulate broader expectations regarding what kind of human listens to you in the thoughtful, empathic way that Abby is supposed to imitate. Here, I expand on Carr's argument about the relationship between the ideology of inner reference and the interpretive practices of U.S. mental health care by underlining that language ideologies are not only ideological: they are embodied and enfleshed. They not only structure, and are structured through, expectations for how speech is listened to. They are also wrapped up in, and reproduce, expectations for which kinds of people listen in that way, especially in terms of gender, race, and class. When I asked members of the psychology team why they made Abby look like a woman, everyone agreed: they wanted to ensure subjects felt like a non-aggressive and understanding agent was listening to them. There was less agreement when it came to Abby's race. Some insisted that Abby was racioethnically ambiguous by design, because this would allow research subjects to project their own identity onto her and identify with her as a result. They would cite Abby's voice as evidence for their claim. The psychology team had unanimously selected Nava to be the voice of Abby.
I asked her why they picked her, and she guessed that it was because she had "no accent." Although Nava is an American-born Iranian, she told me, "The team thought I sounded like I could be from anywhere." Once disarticulated from her body, the team imagined, Nava's voice could shed its specificity and become a resource for transforming Abby into a racioethnically blank, projective screen. Note, however, that the "unmarked," accentless, and disembodied voice is not necessarily a neutral voice: it is the white voice. This aligns with what Reed and Philips (2013) observe in their article on realism in performance capture technologies, where whiteness operates as "transparent universality." Whiteness tends to operate the same way for the team who developed the Abby interface. Together, Abby's body and Nava's voice formed a mirror through which research subjects could see, hear, and recognize something about their selves. Nava told me that this seemed to work with research subjects. Often, during the debriefing period, subjects of varying races and ethnicities thanked Nava and Taylor for giving them a chance to talk with a doctor who actually looked like them, meaning a doctor who shared their racioethnic identity. Yet while some researchers argued that Abby could be anyone, others argued that they had designed Abby to have an embodied specificity: to be both racioethnically and professionally marked. Specifically, they told me that Abby was fashioned after Googled images of "Latina social worker." If you examine Abby's programming, you find that someone gave her a Latina surname. The morphing together of these Googled women (the chain of assumptions and associations, the linkage of skins and screens, race and visuality, that they form) brings to mind Haraway's analysis of a figure she calls SimEve.
SimEve is the name Haraway uses for the image on the cover of Time magazine's special fall 1993 issue on immigration, which shows the smiling face of a racioethnically ambiguous woman meant to represent the "New Face of America," the impact of multi-racioethnic marriage (259). But Haraway asks: what does the sterile, computer-mediated coupling that produced SimEve dry up and hide away, in terms of colonial histories of violence and resistance? Likewise, we may ask of Abby: what does her automated "relating" and resembling cover up? What are the implications of using Googled images to design Abby after a Latina social worker, especially given that Google has been shown, through its search protocols, to sediment racist and misogynistic associations (Noble 2018) rather than producing neutral, value-free pairings between words and images? Nevertheless, the researchers who told me the story of Abby's techno-pastiche origins wouldn't cite Abby's skin color or the Latinidad inscribed in her metaphorical DNA as evidence for her being a Latina social worker. Instead, they would cite the interactional framework of the interview itself, while also referencing the socioeconomic status of research subjects and the VHI's target population. Being interviewed by Abby was supposed to feel like being interviewed by a Latina social worker because the local VA was in a predominantly Latinx neighborhood, and assessment with a social worker in a public health setting like the VA (rather than diagnosis with a physician or treatment with a therapist in private practice) was probably the only kind of mental health care resource that the research subjects had access to.
If the gendering of Abby signals expectations about women as good, passive listeners, then the racing of Abby signals expectations about what kind of woman is most likely to fill this passive listening role in administrative mental health contexts, along with expectations about the socioeconomic status of the people who interface with care workers like Abby. There is a hierarchy of value that maps onto the distinction between therapist and social worker and the sociocultural capital that separates the two jobs (time set aside for extensive schooling, money for frequent and costly licensing and credentialing, etc.). This distinction is evident in the different degrees of medical judgment that social workers and therapists are licensed to make. The premise of the VHI as a tool, and of Abby as not a therapist, re-inscribes these hierarchies of clinical labor, which value the work of diagnosis and treatment as "real" medical practices while instrumentalizing (and dehumanizing) the work of assessment. Making the Virtual Human Interviewer familiar and "real enough" means making sure that interactions with it are not-quite-professional. It means rendering the listeners in the system invisible: rendering the traces of the Latina social workers, and Nava and her accentless voice, virtually human. Abby's designers call upon race, gender, and ethnicity as a form of flexible capital (Nakamura 2014: 933): malleable, pliant, capable of shifting depending on rhetorical needs. Through the body and the interactional habitus of Abby, the association between race, gender, and a form of passive, administrative professional listening is reinforced and reiterated. Abby's animation depends on the sewing together of these traits, while also naturalizing and reproducing them as coupled. The notion that Abby's listening habitus (the system's active listening body language) can be automated bears further analysis.
Abby's racioethnic and gender presentation, along with the animated signs meant to indicate that the system is attentive to the semantic content and narrative contours of a speaker's answers to the assessment questions, together comprise the system's rapport-building capacities. These material-semiotic flourishes (an understanding head-nod, a familiar-looking, passive, administrative listener) give the feeling of intimacy, closeness, and proximity, a sense that the interface is tuned in to the innermost regions of the speaker's self. Together, they give the impression that the expression of empathy is an automatic, ingrained response: part of the fabric of being human but also, at the same time, a reflex that can be formally reproduced in a non-human machine, that requires no expertise, and that is not truly work.
REACHING OUT
Before the untimely beginning, and eventual end, of the android study, I worked alongside Hillary at a public exhibition on human-robot interactions, helping to showcase the android and recruit potential research subjects for the study. Whereas the demonstration Klaus put on for me in his office at the start of my fieldwork was meant to show me the limits of my own human listening, for this demonstration Hillary and I once again had to downplay the role that human mediation would play in the robot study, and that it had played in the development of the VHI's various components. Hillary and I were responsible for running the exhibition and explaining the study to anyone who passed by our table. There was a rush of people in the exhibition hall, and Hillary and I spent roughly three hours fielding their questions and comments.
While other exhibitors sat behind their tables with their prototypes and technologies displayed on the table surface, Hillary and I sat the android down in a single chair, placing it in the position that a human exhibitor would occupy, and placed the other chair in front of the table, inviting people to sit in front of the android and gaze upon it. We displayed a promotional video of the virtual human on a projector screen above our table, so that we could explain the relationship between the android study and the VHI. People would sit in the chair and wave their hands in front of the robot's eyes, asking us, "Can she see me? Can she hear me?" Or they would ignore Hillary and me altogether and respond to the VHI interview questions that played from a speaker sitting on a windowsill behind the android's shoulder. No, we would explain, the android was not receptive to audio or visual data at the moment: there was no video camera or microphone set up to capture this data, and even if there were, we had not yet enabled VirtuSense, so the data would not be processed. One of the most difficult and frequent comments came from people who found the whole premise of the VHI alarming. These people accused Hillary and me of trying to build a "robot therapist," or of trying to "replace humans." Hillary dealt with this kind of accusation whenever she gave public demos of the VHI. She would explain that the system was still in development and not ready for actual clinical use, and would emphasize that its purpose was to conduct assessment. The VHI couldn't provide therapy, she would say, let alone replace a human therapist, and it couldn't even make a diagnosis: only a licensed, trained, professional human could diagnose another human. Keep in mind, as well, that Abby is not a therapist because some members of the research team had designed her specifically to look (and interact) like a Latina social worker.
There is a hierarchy of value that maps onto the distinction between therapist and social worker and the sociocultural capital that separates the two jobs, and while the caring and service professions within U.S. health care are gendered, they are also, as Evelyn Nakano Glenn (1992) points out, racially stratified. The premise of the VHI as a tool, and of Abby as not a therapist, re-inscribes these hierarchies of clinical labor, which value the work of diagnosis and treatment as "real" medical practices while instrumentalizing (and dehumanizing) the work of assessment, placing it at the margins of biomedicine and figuring it as unskilled. It's telling that people's anxiety and disgust upon witnessing the VHI revolved around the automation of therapy, and that Hillary's alibi (that they were only trying to automate assessment) seemed to put people at ease. Hillary's alibi resembles a description of the VHI that Taylor once gave me:
When you go to the doctor you're going to see a nurse first, she's going to draw your blood and get all your baselines. So [the VHI assessment] is your objective measures. It's getting your tone of voice, your measurements [...] and it's giving a numerical output, which would then tell a doctor [...] they're showing all different signs...[that] may indicate that they're [...] maybe showing signs of PTSD or depression, and then...an actual human can make a diagnosis.
According to Taylor, Abby is like a nurse because the system only provides initial indicators of how a patient is doing before an "actual human" doctor steps in with a truly medical call. But Taylor's analogy does not quite fit. It suggests that the nurse's embodied presence, holding the needle, is unnecessary. To make her comparison work, you have to treat the needle, the nurse, and the analysis of the blood as one, ignoring all that goes into finding a vein, inserting the needle, and making sure the patient stays still.
Taylor's comparison is also a humble one, because it downplays the difficulty of her and Nava's own work. Interacting with Abby was not exactly like getting blood drawn, because drawing out and listening to the content of research subjects' personal stories is a charged process that can be traumatic for listeners and re-traumatizing for speakers, especially when the system malfunctioned, as when Abby responded "that's great!" after a research subject described the passing of his wife. Taylor and Nava were actually present in many of the videos I watched with Klaus, and in promotional videos of the Virtual Human Interviewer system. In fact, in all of the system's promotional videos, Taylor and Nava were controlling Abby. They did this for the second, WoZ phase of the technology's development, during which they puppeteered Abby's actions from another room, producing the interactional data that was later coded and used to build the framework for the fully automated version. And in all videos, WoZ or not, Taylor and Nava had monitored the interactions from another room, through VirtuSense's cameras and microphones, listening alongside the system in a way that honored the ideology of inner reference. Neither of them had clinical experience, but it was their job to keep an ear on the interactions in case a subject disclosed suicidal or homicidal intentions (which VirtuSense's poor natural language processing could not pick up). This remote listening took a heavy emotional toll on both young women, precisely because they had to attend so closely to the words of the research subjects' speech, and because they could not let the subjects know that they had been listening. Doing so would break the illusion of Abby's total non-humanness and therefore disrupt the experiment; their labor had to remain invisible.
As Winnie Poster (2019) describes, in the context of outsourced call center labor, operators interact with customers through increasingly computerized interfaces meant to hide the location and racioethnic identity of the operator, for instance through a variety of pre-recorded audio samples that the operator plays in response to the caller's questions or qualms. The operators engage in what Poster calls "cyborg identity management": they perform their humanness to assure the caller that they are talking to a human, that the interaction is not automated and anonymous but personal. Nava and Taylor perform their own kind of cyborg identity management, but in the opposite direction. Where Poster's operators work at "covering up how much of the technology they are using to mediate the conversation" (Poster 2019: 259), Nava and Taylor's goal is to conceal the human mediation in the interaction through techniques that downplay their proximity to the research subject, a difficult feat considering the nature of the conversations between Abby and the research subjects that Nava and Taylor witnessed. As Taylor described to me in an interview:
being removed from the situation and listening to someone talk about things that are very difficult, for them [...] you could feel it in the room or [Nava and I] would out loud we would sigh or we would say oh my god [...] and the whole point of Abby is to be a listener, she's not supposed to be a therapist, she's not supposed to be there to give tons of feedback [...] But, on the flip side of that as a human it is difficult to hear someone kind of go through something and you almost want to reach out and give someone a tissue or reach out and hold their hand, but that's not really the intent behind any of that anyways, so. I had to remind myself of that.
Part of their cyborg identity management required them to distance themselves from the research subject and to resist the urge to discuss the conversation they had either witnessed or taken part in while the research subject spoke with Abby. Even a gesture as simple and as powerful as handing a subject a tissue would give away the young women's position as interactional mediators. During the debriefing period, Nava and Taylor had to figure out strategies for, as Taylor put it, "reaching out" to repair any psychological damage Abby may have done, all without revealing that they had observed the entire interaction; doing so would have broken the experimental paradigm of Abby's total non-humanness. In debriefing, they would open up a space for the subjects to share their pain of their own accord, asking them open-ended questions about how the conversation went. On several occasions, they sat and spoke with the subjects for hours after the interview had come to a close. Nevertheless, while the two young women conceded that interactions with the VHI could be painful or even harmful, they also told me positive stories about working with the VHI. They said that many subjects found the whole interaction therapeutic. That is, even though Abby was not a therapist and the VHI could not conduct therapy, Nava, Taylor, other researchers, and subjects themselves reported that there was something beneficial, cathartic perhaps, about "just being listened to": being listened to by Abby, and then by Nava and Taylor. Moreover, not all research subjects acted and interacted genuinely with Abby. At times, they flipped the script of the conversation through recalcitrance and refusal, not unlike the "non-compliant" research subjects at ECU who refused to conduct the tasks in the fMRI machine according to Santiago's directions. Other research subjects seemed to take pleasure in Abby's artificiality.
One man repeatedly asked her out on a date, not because he thought that Abby was a real woman, but because he seemed to get a kick out of pretending that she was real: that she could exist off the screen and enjoy a ride in his convertible. As Wilf (2019) describes, when interacting with robots and other machinic, human-like forms, humans sometimes take pleasure in their awareness that the machines are not actually human, that what they are witnessing is a strategically crafted, mediated performance. Diving into and widening the gap between the virtual and the actual can be a form of play. In this regard, the interface can offer a momentary suspension of reality and respite from a world that has forgotten and even discarded people like the study's research subjects: disabled, unemployed, homeless, veteran. The interface, the screen, the animated virtual human, is not only a calibration of habitus and machine, automation and affect. It is also a powerful, capacious space for fantasy and projection, a realm not only of illusion and misdirection, but of possibility (Helmreich 1998).
CONCLUSION: VIRTUALITY'S ECHOES
The virtual human's animation and the software (which typically remains concealed from users and others who interact with the system) would seem to suggest that the VHI conducts two modes of listening, which map onto the hierarchy of clinical judgment that separates diagnosis and treatment from screening. The VirtuSense mode of listening, to the sounds of speech beyond semantic meaning, coincides with diagnosis: a technical skill, a form of medical judgment that requires more training and credentialing to conduct. The virtual human mode of listening, listening to and silently, non-verbally responding to the narrative contours and affective texture of speech, coincides with assessment: an automatic, reflexive practice (or so the story goes) that requires less training to conduct.
Nevertheless, both kinds of listening are necessary to the whole enterprise, to what the VHI is supposed to be doing: connecting speech sounds with interior states. VirtuSense may be able, in theory, to pin down sounds that circulate beyond the reach of the human sensorium. But in order to have any material for analysis, the system requires data. Abby, the interface, can also, in theory, do something that humans cannot: tirelessly listen to stories that are tangled together with emotion, without requiring any breaks or time to recover. But in practice, the line between the technical and the mechanical, or between treatment and assessment, is blurred. The presence of Nava and Taylor, hidden in the room and embedded in the body and voice of Abby, also complicates this neat divide. In troubling these binaries, they also show us something about psychiatric screening: it is both humanistic and technical, requiring honed skills as well as the capacity to be emotionally present and compassionate, which is itself a skill. Reading the cracks and troubles in the system tells us something about the nature of language and interaction in psychiatric encounters: while the expression of empathy is an important skill, it is not necessarily an authentic expression, nor is it motivated by affect alone. The interlocking ideologies of language (the ideology of inner reference) and self (Social Penetration Theory) form the basis of empathy, intimacy, and rapport in interactions with the VHI, and watching this coordinated activity work (and fail) allows us to get an empirical grip on the otherwise phenomenological realm of intersubjectivity. Studying the VHI in (inter)action illuminates the crucial role that linguistic practices play in affective states that otherwise seem immaterial and ephemeral.
The pressing together of race and gender to form the voice and the body of the virtual human is supposed to render its animated, passive, affectively invested listening all the more convincing. Here, I am not trying to say that the VHI's creators and authors, like Klaus and Valerie, and its various distributed participants, like Taylor and Nava, set out to create a racist and sexist technology, or that they are personally responsible for the ways in which their technology reiterates the bundling together of qualities with types. The figure of the non-human, human-like machine as a feminized, raced servant that passively supports its users (or else threatens to overthrow its masters) hails from the much larger Euro-American legacies of computing and colonialism (Suchman 2007; Philip, Irani, and Dourish 2012). In Irani's (2015: 733) words, "hierarchies of value have long overlapped within hierarchies of gender in the historical imagination" surrounding "artificial life," which is notoriously gendered: male artificial life appears monstrous (like the Golem and Frankenstein's creation, who often seek to kill their fathers/creators), while female artificial life forms appear as lovers or mothers (or witches) to the (usually) men who create them (Helmreich 1998). Think also of Pygmalion's Galatea, inert matter built in the image of her creator's desires. Think also of Blade Runner's Rachael Rosen, who, in the film, is an innocent adolescent unaware of her status as an android but, in the original source text, is a cunning and manipulative seducer who exploits the unheimlich empathy that Deckard the bounty hunter extends toward androids in a way that causes him to question his own fundamental assumptions about the humanness of empathy. Think also of ELIZA, the chatbot therapist that reproduces a pantomimed version of Rogerian "echoing," a psychotherapeutic technique developed by Carl Rogers.
In Rogerian psychotherapy, the therapist attempts to render themselves as transparent as possible: they are to be a mirror reflecting back the client's thoughts and problems in a different context so that the client can process them. Echoing, however, is a complex technique, achieved through verbal strategies of removing the self when reiterating the client's talk (Carr and Smith 2013). Joseph Weizenbaum parodied this technique with the ELIZA program, which repeats almost word for word what the interactional partner has typed. Scholars and computer scientists have critiqued people's enjoyment of the chatbot, citing something pathological about feeling soothed by a non-human entity inertly performing passive mimesis (Turkle 2007). But these readings do not account for the complicated nature of repetition and resemblance, including the unstable correspondence between original and imitation, as between Nava's voice, Abby's code, the subjects who project their identities onto her, and the Googled Latina social workers. As Inoue writes, "once the supposedly inert subject of the verbatim copy is recovered, a little universe of determinate and far from unmotivated subject positions, contextual framings, and mechanical and technical effects come into view" (2018: 218). Instead, with ELIZA as with Abby, following Spivak (1993) writing on another kind of Echo, I suggest that we "seize on the glimpse of difference" between the copy and the original, between Abby and the various people refracted and captured through her (Inoue 2018: 218). Virtuality exists somewhere in this space between the copy and the original. It is not quite a faithful rendering, a direct miming, or a mirror. The "virtual" has connotations of almost, but not quite: an "as if" that is never final yet never fully independent of the thing it approaches (Boellstorff 2015).
Nava and Taylor listen to the research subject's speech as if they are Abby; they also listen as if they are mental health care workers, despite their lack of training. And Abby listens as if she is a Latina social worker, who in turn listens almost as if she is a therapist, although she is not. Virtuality has other connotations as well: connotations of virtue, which I have attempted to encapsulate through the pseudonymous moniker, VirtuSense. Here, it is productive to think alongside Lisa Nakamura's (2019) invitation to scrutinize the "virtue" of virtual reality (VR) documentary-style films that are supposed to evoke feelings of empathy for the people and places the experience allows viewers to feel close to. In her talk, "Virtual Reality and the Feeling of Virtue: Women of Color Narrators, Enforced Hospitality, and the Leveraging of Empathy," Nakamura explores the use of women of color narrators in VR films that promise to put audience members immediately and directly "in the shoes" of someone in a refugee camp, of a migrant laborer, of a person walking down the street experiencing racist micro-aggressions, and so on.
50 Inoue invokes Spivak's (1993) critique of Freud's "On Narcissism" and of the tale of Narcissus and Echo from Ovid's Metamorphoses to discuss the disjuncture between copy and original as a space for agency and subversion. Spivak argues that Freud's interpretation "ignores the structure of gender in the relationship between Narcissus and Echo," while also exploring "her own ethics of speaking for subaltern women" who are "figured as Echo...women who do not speak and only respond to, and thus repeat forces that structure them" (Inoue 2018: 223). Spivak's intervention is to explore lapses in the correspondence between original utterance and its repetition, and how these spaces of difference "afford an intricate ethical position that prevents subaltern agency from being a knowable subjectivity" (ibid.).
VR is meant to capture and mimic the perspective of someone occupying this oppressed, subaltern subject position, "immersing" the viewer in an otherwise distant experience (with the assumption that the viewer does not occupy any of the identities depicted in the film, which are framed as radically alien, out of reach, and other). The illusion of proximity depends in part upon narrators or "guides" in the film, primarily women of color, who explain the scenes, provide scaffolding, and treat the audience member as a friend or confidant. According to Nakamura, the immersive aspect of the film (the fact that the technology ports the viewer to an otherly world, "virtually," "as if" they are there) "enables a fantasy of virtuous empathy" (2019). The feeling of proximity and closeness, the "almost but not quite" effect of virtuality, is a decoy, a stand-in for structural change or political action. It gives the impression that empathy, as a feeling, is a proxy for justice, and that racism, sexism, and other forms of violence are (like empathy) feelings rather than structures. If an ethnographic study of the Virtual Human Interviewer has shown us anything, it is that proximity and closeness, like "immersion," are the outcome of hyper-mediated practices (Helmreich 2007) rather than the inevitable, automatic pretext of conversations about psychic suffering. Like the guides in these VR films, Abby the non-intrusive interviewer guides research subjects on a journey inward. Any closeness or trust that the subjects felt toward Abby may have been engineered, but the closeness that Abby and Nava felt toward the subjects was very real.
References
Altman, Irwin and Dalmas Taylor. 1973. Social Penetration: The Development of Interpersonal Relationships. New York: Holt.
Baum, L. Frank. 1900. The Wonderful Wizard of Oz. Chicago: George M. Hill Company.
Benjamin, Ruha. 2019. Race After Technology: Abolitionist Tools for the New Jim Code. Cambridge, UK: Polity Press.
Bleecker, Julian. 2004. "The Reality Effect of Technoscience." PhD diss., University of California, Santa Cruz.
Briggs, Charles. 1984. "Learning How to Ask: Native Metacommunicative Competence and the Incompetence of Fieldworkers." Language in Society 13(1): 1-28.
Carr, E. Summerson. 2011. Scripting Addiction: The Politics of Therapeutic Talk and American Sobriety. Princeton: Princeton University Press.
Carr, E. Summerson and Yvonne Smith. 2013. "The Poetics of Therapeutic Practice: Motivational Interviewing and the Powers of Pause." Culture, Medicine and Psychiatry 38: 83-114.
Chun, Wendy. 2000. Programmed Visions: Software and Memory. Cambridge, MA: MIT Press.
Coleman, Gabriella. 2014. Hacker, Hoaxer, Whistleblower, Spy: The Many Faces of Anonymous. New York: Verso.
Daston, Lorraine. 1994. "Enlightenment Calculations." Critical Inquiry 21(1): 182-202.
Desjarlais, Robert and Jason Throop. 2011. "Phenomenological Approaches in Anthropology." Annual Review of Anthropology 40: 87-102.
Dick, Philip K. 1968. Do Androids Dream of Electric Sheep? New York: Random House.
Duranti, Alessandro. 2014. The Anthropology of Intentions: Language in a World of Others. Cambridge, UK: Cambridge University Press.
Ekbia, Hamid and Bonnie Nardi. 2017. Heteromation, and Other Stories of Computing and Capitalism. Cambridge, MA: MIT Press.
Eubanks, Virginia. 2018. Automating Inequality: How High-Tech Tools Profile, Police, and Punish the Poor. New York: St. Martin's Press.
Feuerherd, Peter. 2018. "Why Didn't the Rodney King Video Lead to a Conviction?" JSTOR Daily, February 28. <https://daily.jstor.org/why-rodney-king-video-conviction/> (accessed July 20, 2019).
Gershon, Ilana. 2015. "What Do We Talk About When We Talk About Animation." Social Media + Society.
Goffman, Erving. 1974. Frame Analysis: An Essay on the Organization of Experience. New York: Harper and Row.
Goffman, Erving. 1981. Forms of Talk. Oxford: Blackwell.
Goodwin, Charles. 1994. "Professional Vision." American Anthropologist 96(3): 606-633.
Goodwin, Charles and Marjorie Harness Goodwin. 2004. "Participation." In A Companion to Linguistic Anthropology. Alessandro Duranti, ed. Pp. 222-244. Malden: Basil Blackwell.
Haraway, Donna J. 1997. Modest_Witness@Second_Millennium.FemaleMan©_Meets_OncoMouse™. New York: Routledge.
Helmreich, Stefan. 2007. "An Anthropologist Underwater: Immersive Soundscapes, Submarine Cyborgs, and Transductive Ethnography." American Ethnologist 34(4): 621-641.
Hicks, Marie. 2017. Programmed Inequality: How Britain Discarded Women Technologists and Lost Its Edge in Computing. Cambridge, MA: MIT Press.
Inoue, Miyako. 2018. "Word for Word: Verbatim as Political Technologies." Annual Review of Anthropology 47: 217-232.
Irani, Lilly. 2015. "The Cultural Work of Microwork." New Media and Society 17(5): 720-739.
Irani, Lilly. 2018. "'Design Thinking': Defending Silicon Valley at the Apex of Global Labor Hierarchies." Catalyst: Feminism, Theory, Technoscience 4(1): 1-19.
Jackson, John, Jr. 2013. Thin Description: Ethnography and the African Hebrew Israelites of Jerusalem. Cambridge, MA: Harvard University Press.
Just, Marcel Adam, Vladimir L. Cherkassky, Augusto Buchweitz, Timothy A. Keller, and Tom M. Mitchell. 2014. "Identifying Autism from Neural Representations of Social Interactions: Neurocognitive Markers of Autism." PLOS One 9(12): 1-22.
Kelty, Christopher M. 2008. Two Bits: The Cultural Significance of Free Software. Durham: Duke University Press.
Light, Jennifer. 1999. "When Computers Were Women." Technology and Culture 40(2): 455-483.
Lutz, Catherine and G. M. White. 1986. "The Anthropology of Emotions." Annual Review of Anthropology 15: 405-436.
Lutz, Catherine and Lila Abu-Lughod, eds. 1990. Language and the Politics of Emotion. Cambridge, UK: Cambridge University Press.
Manning, Paul. 2018. "Animating Virtual Worlds: Emergence and Ecological Animation in Ryzom's Living World of Atys." First Monday 23(6-4). <https://firstmonday.org/ojs/index.php/fm/article/view/8127/7414> (accessed July 19, 2019).
Nakamura, Lisa. 2014. "Indigenous Circuits: Navajo Women and the Racialization of Early Electronic Manufacture." American Quarterly 66(4): 919-941.
Nakamura, Lisa. 2019. "Virtual Reality and the Feeling of Virtue: Women of Color Narrators, Enforced Hospitality, and the Leveraging of Empathy." Lecture, Princeton University Thinking Cinema Series, Princeton, NJ, March 3.
Nakano Glenn, Evelyn. 1992. "From Servitude to Service Work: Historical Continuities in the Racial Division of Paid Reproductive Labor." Signs 18(1): 1-43.
Noble, Safiya Umoja. 2018. Algorithms of Oppression: How Search Engines Reinforce Racism. New York: New York University Press.
Pentland, Alex. 2008. Honest Signals: How They Shape Our World. Cambridge, MA: MIT Press.
Philip, Kavita, Lilly Irani, and Paul Dourish. 2012. "Postcolonial Computing: A Tactical Survey." Science, Technology, & Human Values 37(1): 3-29.
Philips, Amanda and Alison Reed. 2013. "Additive Race: Colorblind Discourses of Realism in Performance Capture Technologies." Digital Creativity 24(2): 130-144.
Poster, Winifred R. 2019. "Sound Bites, Sentiments, and Accents: Digitizing Communicative Labor in the Era of Global Outsourcing." In digitalSTS: A Field Guide for Science & Technology Studies. Janet Vertesi and David Ribes, eds. Pp. 240-262. Princeton: Princeton University Press.
Rea, Shilo. 2014. "Carnegie Mellon Researchers Discover Brain Representations of Social Thoughts Accurately Predict Autism Diagnosis." Carnegie Mellon University News, December 2. (accessed July 20, 2019).
Rice, Thomas. 2010. "Learning to Listen: Auscultation and the Transmission of Auditory Knowledge." Journal of the Royal Anthropological Institute 16: S41-S61.
Robbins, Joel. 2004. Becoming Sinners: Christianity and Moral Torment in a Papua New Guinea Society. Berkeley: University of California Press.
Robertson, Jennifer. 2017. Robo sapiens japanicus: Robots, Gender, Family, and the Japanese Nation. Berkeley: University of California Press.
Rosaldo, Michelle Zimbalist. 1984. "Toward an Anthropology of Self and Feeling." In Culture Theory: Essays on Mind, Self, and Emotion. R. Shweder and R. LeVine, eds. Pp. 137-157. Cambridge, UK: Cambridge University Press.
Russell Hochschild, Arlie. 2012. The Managed Heart: Commercialization of Human Feeling. 3rd edition. Berkeley: University of California Press.
Scott, Ridley. 1982. Blade Runner. Film. Burbank: Warner Brothers.
Silvio, Teri. 2010. "Animation: The New Performance?" Journal of Linguistic Anthropology 20(2): 422-438.
Spivak, Gayatri Chakravorty. 1993. "Echo (nymphe)." New Literary History 24(1): 17-43.
Stacey, Jackie and Lucy Suchman. 2012. "Animation and Automation: The Liveliness and Labours of Bodies and Machines." Body and Society 18(1): 1-46.
Suchman, Lucy. 2007. Human-Machine Reconfigurations: Plans and Situated Actions. 2nd edition. Cambridge, UK: Cambridge University Press.
Taylor, Astra. 2018. "The Automation Charade." Logic, August 1. (accessed July 19, 2019).
Throop, Jason and Keith Murphy. 2002. "Bourdieu and Phenomenology: A Critical Assessment." Anthropological Theory 2(2): 185-207.
Ticona, Julia and Alexandra Mateescu. 2018. "Trusted Strangers: Carework Platforms' Cultural Entrepreneurship in the On-Demand Economy." New Media and Society 20(11): 4384-4404.
Turkle, Sherry. 2007. "Authenticity in the Age of Digital Companions." Interaction Studies 8(3): 501-517.
Vertesi, Janet. 2012. "Seeing Like a Rover: Visualization, Embodiment, and Interaction on the Mars Exploration Rover Mission." Social Studies of Science 42(3): 393-414.
Villeneuve, Denis. 2017. Blade Runner 2049. Film. Burbank: Warner Brothers.
Wilf, Eitan. 2019. "Separating Noise from Signal: The Ethnomethodological Uncanny as Aesthetic Pleasure in Human-Machine Interaction in the United States." American Ethnologist 46(2): 202-213.
Wilson, Elizabeth. 2010. Affect and Artificial Intelligence. Seattle: University of Washington Press.
Chapter 4: Listening Like a Computer
"Auditory hallucinations frequently appear only in the night-time, or at least much more then. They seem, as a rule, not to possess complete sensory directness. They are voices 'as in a dream,' 'from the underworld,' 'voices in the air, which come from God,' more rarely gramophone or telephone voices, wireless telegraphy."
(Emil Kraepelin, Manic-Depressive Insanity and Paranoia, 2002 [1921])
Every year, the Bipolar Research Unit of Midwestern University's (MWU) Depression Center holds Bright Nights, a community forum on bipolar disorder and the Unit's current research projects. Bright Nights serves a multitude of functions. It does give the general public a chance to hear from and pose questions to the Bipolar Research Unit's (BPU) PI, the head of the clinical staff, the head of the BPU's stem cell research team, and two patient-research subject panelists. But because the BPU runs almost exclusively on philanthropic donations, it is also a fundraising event, meant to pull on the heartstrings of audience members, instilling hope in the groundbreaking potential of the BPU's research and encouraging financial support. More than that, it is a recruitment event, intended to inspire potential subjects to lend their bodies and voices for the greater good: locating biological markers to help predict, intervene on, and improve the wellbeing of others who have been diagnosed with bipolar disorder. This year's Bright Nights was convening in the meeting hall of a country club in a rural town whose population was not unlike the majority of the Bright Nights attendees: white, elderly, and middle- to upper-class. The country club was opulent bordering on musty, clinging to its grandeur though the place had long passed its prime.
The fabric of the tufted armchairs and sofettes, tablecloths, and voluminous drapes was old and worn but velvet, in shades of wine and aubergine. Exaggerated crystal chandeliers dripped from ceilings crested with ornate crown molding. The gold paint was flaking off the large, framed mirrors that hung above solid mahogany end tables, upon which sat brass candelabras and wide, Grecian vases containing artificial flowers: birds of paradise, orchids, lilies, roses. In an ironic twist, given the event's name, the hall was dimly lit; had the lighting been brighter, I would have expected to find a film of dust settled over everything. The country club was a good forty-minute drive from the even more rural, bucolic setting of the BPU headquarters in the Depression Center, which is attached to a public geriatric hospital and sits on a plot of land set off the highway among flower beds and fields of tall grass. I had shared a car with two of my informants, Adele and Cheryl, both white women in their fifties and members of the BPU's clinical team. With Adele, the research manager of the BPU's clinical team, at the wheel and the early summer sun setting slow in pink and orange streaks, the women told me stories about their BPU patients and the passing scenery, pointing out shuttered factories where relatives and parents used to work, places they had visited as children and places they now visited with their grandchildren. Their talk of elementary school field trips and memories of their mothers made me feel at ease, as though I had known them for years, even though I had arrived only a few weeks earlier to begin my ethnographic study of the BPU's efforts to develop a mobile phone application that can predict the onset of a pathological episode through the analysis of acoustic features of speech.
As a Bright Nights volunteer, my job is to stand opposite Adele at the entrance to the country club hall and welcome everyone who enters, handing them a golf pencil and a blank index card. In what I hope is a warm and cordial tone, I instruct them to write down on the card any questions they might have during the panelists' talks, because the cards will be collected and passed to the PI, who will address the questions to the audience. Some attendees already have their hands full with plastic glasses of complimentary wine or powdered lemonade, and others are plagued with arthritis, so I am asked to tuck the cards and tiny pencils into purses and blazer pockets, or set them atop plastic plates already filled with cheese and crackers. Others ask for extra cards, anticipating many questions. Cheryl, Adele, and other volunteers recognize some of the audience members. Quite a few are either long-time research subjects or family members of subjects whom they have come to know over the years, since the BPU is conducting one of the largest, longest-running longitudinal studies of bipolar disorder in the United States. It is also not uncommon for subjects and researchers to share a mutual friend from childhood; perhaps they even attended MWU together. Amid the rush of people entering the room, the BPU PI (Principal Investigator) approaches me with a patient of his who will be speaking on the panel: a lean man with optic-white hair pulled into a ponytail, both ears pierced several times with slim silver hoops. Dressed in a navy-blue suit and tie, he is significantly taller than I am, with etched lines in his sunburnt face, dark eyebrows, and a spray of freckles across the bridge of his narrow nose. This is Jacob, one of the earliest research subjects of the longitudinal study, who is also enrolled in the cell phone study. After a quick introduction, the PI steps away.
Jacob immediately shifts to stand very close to me, as if we are engaged in a conspiratorial conversation, so close that I am nervous. He begins touching, palpating, the crook of my elbow as he speaks, and since my arms are crossed in front of my body, this puts his hand in the dangerous vicinity of my chest. His voice is gravelly but fast, electrified and hypnotic, one word carrying over and falling into the next like overlapping waves in a storm. He wants to know what an anthropologist is doing here studying the cell phone project, and I try to give him as inconsequential and as positive an explanation as possible while still remaining true to at least part of my research interests. I say that I am interested in the collaboration between clinical psychiatric people and computer science people, and that their coming together to use technology to try to intervene on the subjective nature of diagnosis is like the meeting of two different "cultures." He praises the project and the project's PI extravagantly: they are doing the best thing, they are far ahead of everyone, this is phenomenal work and phenomenal that I am studying it and so great to be a part of it. He does not want me to forget the human dimension of things. Curious to hear his reaction, I tell him that, at the other sites I studied, people expressed fear and anxiety over the prospect of using technology to address mental health issues, because they were concerned this meant technology might replace the jobs of clinical professionals. He scoffs at the straw men I have presented to him. People are scared or skeptical, he explains, because there is a lack of education. They are not thinking of the human lives at the other end, or that cell phones can in fact save people's lives. By way of example, he launches into his own story about how his iPhone saved his life years ago, when he first attempted suicide during his first manic episode.
He had been an incredibly successful computer programmer. He had a beautiful wife and two beautiful children, and it was Mother's Day. He got up in the middle of this beautiful brunch and left, driving and driving and driving until he ran out of gas. "And then," Jacob says, leaning in even closer and gripping my elbow near my left breast, "I did an overdose." He climbed atop a fountain and took a bunch of pills, a handful; or, he had taken the handful of pills and then later found himself at the top of this fountain. He was hanging over the water when his phone dropped from his pants pocket onto the ground below. Moments later, unconsciousness overtook him and he too fell, into the water. Then his wife called his phone, and a stranger who was passing by on an evening walk noticed it ringing and ownerless on the pavement. The stranger picked up the phone and spoke to Jacob's wife, who directed the stranger to look around the area for Jacob, and sure enough the stranger discovered him floating face down in the basin of the fountain. Jacob was rushed to the hospital and miraculously revived. If his phone had not rung, he tells me, he would have died. Without his phone, someone eventually would have found his dead body in the fountain and would have had to tell his wife and children that he had senselessly taken his own life. At a moment when he could no longer speak for himself, the phone provided a conduit through which his wife could alert a passing stranger that Jacob was in distress. His iPhone "saved his life" because of the vital connection it afforded and mediated, enabling his wife to reach out, to find him, despite his attempt to disconnect from the world altogether. His story finished, Jacob moves on to questioning me about where I am living and how long I will be in town, reassuring me that he will help me out with whatever I need, that he will contribute to my research in whatever way possible, that I can share his story.
Finally, the PI, who must have been standing watch and listening in, intervenes, wedging his own body between Jacob's and mine and letting me know that Jacob will be stopping by the Depression Center soon for their appointment together and that he could arrange a time for us to talk further. Jacob turns away to mingle with other guests, and the PI takes his place, also leaning in close to me, but this time to tell me something that truly should be private. It surprises him (and this, he says, is part of what makes bipolar disorder so fascinating as an object of study) that some patients can be so high functioning and be doing so well, while others cannot save their own lives like Jacob did (though in Jacob's narration, he did not save his own life; his iPhone did, along with his wife and a stranger).
Months after this initial, hectic encounter with Jacob at Bright Nights, I encounter him once again, though not in the form I had anticipated. I am sitting in a stale, windowless office in the Depression Center with two other silent researchers, bathed in twitching fluorescent light and hunched over the keyboard of an old desktop computer. Through top-of-the-line noise-canceling headphones, I am listening to randomly ordered excerpts, 3 to 30 seconds long, from the cell phone study's research subjects' weekly phone assessments with a BPU clinician. I can hear only the research subjects' voices. The engineers on the team have ensured that the mobile phone application used to gather data for the study does not record the voices of research subjects' conversational partners, transforming the excerpted phone calls from dialogues into monologues. My job, along with the two undergraduate engineering researchers in the office with me, is to "annotate" the data, to rate the emotional tenor and "feel" of these excerpts.
With these ratings, the undergraduates and I will help produce the metadata necessary for building an algorithm that BPU researchers hope will help predict when someone with bipolar disorder will have a pathological episode based on minute changes in the sounds of their speech. Suddenly, a familiar, gravelly voice fills my ears. It is Jacob, without a doubt, but all the electricity is gone. Instead, as I click through the excerpts, trying to assign numerical values to the sound of his voice rather than its content, I hear a man who is deflated and hopeless. His speech, more a series of sighs than words, is listless and indistinct, barely audible though I have the volume turned as high as it can go and I am mashing the powerful headphones against my ears. I feel sorry to have been intimidated by a man who could become so meek, so utterly sad. And at the same time, I feel guilty that I know it is Jacob. The dataset that the other researchers and I are listening to and annotating was gathered three years prior, long before Jacob met me and long before he told me (and invited me to share) his story about the life-saving iPhone. Although I am only supposed to listen to the acoustic quality of research subjects' voices and dis-attend to the content, this task becomes all the more difficult, even fruitless, when I cannot help but match the voice with a face, a hand gripping my elbow, and a tale of attempted suicide. As was the case at my other two fieldsites, my inclusion on the cell phone study's IRB protocol ratifies and ethically validates my participant role in the interaction between Jacob and the clinician. Indeed, when he signed the project's consent form years ago, Jacob agreed to allow a researcher to later listen to his phone calls with the clinician. I am an "unaddressed recipient" (Goffman 1981:133) of the interaction between Jacob and the clinician, a sanctioned eavesdropper with a crucial caveat.
While the task at hand, listening to and annotating the excerpts, grants the other researchers and me broad access to the details of Jacob's day-to-day life, given away in his responses to the assessment calls, the lead engineers on the team urge us, over and over, to keep these details out of our attention as much as possible, to not listen to the details at all. Glancing at my own iPhone resting near the keyboard, I try to imagine what it must have been like to answer the call from Jacob's wife, and the connection between Jacob and the stranger that his stranded phone made possible. What to make of the connection (or, if the engineering team's job is to forget the details, the disconnection) between Jacob, all the other research subjects, the other researchers, and me, that building the cell phone study's predictive algorithmic infrastructure affords?

PHONES THAT SAVE LIVES

For a growing number of researchers of mental illness like the BPU's PI, the cell phone represents a promising, untapped research tool. Cell phones can be repositories of data, the volume of which offers up a level of specificity of analysis that would be difficult if not altogether impossible to achieve through traditional surveys, inventory tools, or face-to-face interactions between researchers and research subjects. With cell phones, users passively transmit multi-modal data captured by the phone's various sensors, or actively enter data as a byproduct of using the phone as they normally would. They scan their fingerprints and their faces to enable personalized security features, to be sure that only they can unlock their phones. Calls made with the phone are time-stamped, indicating the date, time of day, and duration of each call. Walking around with a phone in hand or in pocket produces gyroscopic data, as well as GPS coordinates if the user has enabled location tracking.
Even the quality of fingertip touches to the screen can be a form of data, as researchers seeking new methods to track, diagnose, and understand the progression of Parkinson's disease have attested (Zhan et al. 2018). Thus, while the data that passes through phones might seem meaningless, unrelated to mental health or to the diagnostic symptom criteria set down in the DSM, researchers like the BPU's PI argue that systematically recording and analyzing cell phone usage data can reveal habits and behavioral patterns that have never before been correlated with the incidence of mental illness (see Onnela and Rauch 2016). In their eyes, cell phones have the potential to allow researchers to track down novel indicators of mental illness that have previously gone unnoticed or that professionals never considered to be meaningful signs at all. Researchers refer to this method of data capture, and the diagnostic precision it promises, as "digital phenotyping," or "the moment-by-moment quantification of the individual-level human phenotype in situ using data from personal digital devices" (Torous et al., 2015; Insel 2017).⁵¹ Proponents of digital phenotyping position the cell phone as uniquely capable of rendering the otherwise intractable minutiae of everyday behavior into calculable, traceable material. The use of the word "phenotype" signals the belief that a proper (accurate, expansive) description of how mental illnesses manifest holds the key to pinning down something like a genotype, and therefore the fundamental, biological mechanisms driving mental illness. The BPU's cell phone study is an instantiation of the promissory pledge of digital phenotyping.
The logic driving the project is that the mysteries of mental illness can be unlocked by pursuing connections between observable behavioral signs and internal states that have never been studied in tandem: specifically, the pathological mood episodes associated with bipolar disorder and the acoustic contours of the human voice. However, as I have argued elsewhere in the dissertation, researchers do not merely locate or stumble upon the correlations and correspondences that data-driven mental health research, like digital phenotyping, produces. Researchers constitute and work to hold steady these connections in the very process of pursuing them. When engineers collaborate with psychiatric professionals to transform cell phones into research tools, they must make choices: what counts as potential data and therefore what aspects of human behavior should be tracked (or ignored), how to track that data, where and in what form the data should be stored, and, most importantly, how that data, once stored, should be sorted, labeled, and analyzed. Because the discourse of digital phenotyping (and computational psychiatry as a whole) erases such marks of human decision-making from the process of turning digital data into symptomatic signs, ethnographic attention to the choices researchers make, and to the alternative possibilities and connections these choices foreclose, is crucial.

51 Recall from Chapter 1 that digital phenotyping is an example of a data-driven method for conducting psychiatric research, and therefore falls under the umbrella of Computational Psychiatry.

Researchers also make choices that could have been otherwise about how to make bipolar disorder audible. Packaged within these choices are claims about what bipolar disorder is, and what kind of data human speech contains.
While the research teams discussed in other chapters emphasize the ability of the technologies they are building to capture sonic features of mental illness that surpass human attention or awareness, members of the BPU cell phone study take a different approach. Their goal is to train an algorithm to pick up on changes in the voice that human observers can hear but cannot systematically describe or put into words. Therefore, in addition to offering a case study of the infrastructural arrangements, categorizing practices, and labor required to make digital phenotyping possible, in this chapter I focus on the figure of bipolar disorder as a mood disorder that causes audible changes in the quality of speech. As scholars such as Emily Martin (2007) have explored, bipolar disorder is riddled with assumptions about what mood and emotion even are (for instance, that they can be distinguished from and are opposed to reason and logic) and about the dividing line between normal and pathological affective experiences. In addition to grappling with these assumptions as they operate in the cell phone study, I introduce and distill a third: the assumption that affect is suspended in speech and can be made knowable through listening. Unpacking this third assumption requires probing the relationship between affect and speech in the Euro-American imaginary and in biomedical framings of illness and the body. Anthropological studies of affect have, through a comparative lens, helped to clarify and situate the model of emotions as interior, private, and individual states within the cultural legacies of North American psychiatry and psychology (Rosaldo 1984; Lutz and White 1986; Lutz and Abu-Lughod 1990).
As the linguistic anthropological literature on ideologies of linguistic opacity in Oceania suggests (Duranti 1992; Rosaldo 1982; Keane 2008; Robbins 2008; Silverstein 2001 [1981]; Throop 2010), ideologies of linguistic transparency,⁵² or the notion that speech has the potential to carry forth emotions and therefore set free an individual's unique, core, authentic self, reinforce a model of emotions as residing inward and essentially linked to personhood. The foundational psychological research on emotions and affect in the U.S. defines affective experiences as those that exceed rational control and that emanate from a panhuman core embedded in the body. Consider, for example, the influential work of psychologist Paul Ekman and his efforts to categorize affective experiences universal to all humans. The basis for his findings involved, in part, analyzing movements of facial musculature in response to visual stimuli (Ekman and Friesen 1971; Ekman 1989; Ekman 1999). Referring to these responsive facial expressions as "reflexes" places emotions so deep in the body that they are unreachable by culture and unmovable by conscious, agentive control. In other words, according to this model, people do not learn to perform emotions through bodily movements, such as the patterned coordination of the speech organs; emotions are of the body and happen to the body.⁵³ These formulations of emotions as interior, private, and traceable bodily reflexes that operate beyond conscious control depend upon patently Eurocentric binaries and divisions (the body and the mind, feeling and cognition, matter and spirit, rational and irrational). Moreover, because it is beyond (or before) consciousness and rational control, the affective realm is beyond language.
Spoken utterances signifying emotion are hence called "paralinguistic" cues (breathiness, speed, timbre, pitch, and so on), the body's "pre-lingual" ruminations, sounds that bear communicative significance but are adjacent to language proper. Suspended in the body, affect is therefore traceable in the "grain of the voice" (Barthes 1977): beyond words, beyond reason, but within speech.⁵⁴ The cell phone study reifies and reinforces the grain of the voice as affect's siting, entrenching it further in biological essentialism, promising to make pathological emotional experiences tractable and predictable by quantifying what is heard in affect. Researchers strive not only to quantify what is heard in affect. The goal is also to quantify intuitive interpretations of how affect sounds. The study takes as its starting point conventional therapeutic wisdom about the observable, indexical signs that indicate bipolar disorder's two pathological poles: quick speech evinces mania, and slow speech evinces depression.

52 These scholars contrast ideologies of linguistic transparency, in operation among speakers of Euro-American English, with ideologies of linguistic opacity more prevalent in Oceania. For instance, Jason Throop (2003; 2010) has found that among communities of speakers on the island of Yap in the Federated States of Micronesia, it is unethical to ask after or pursue the connections between interior states and speech produced in conversations or public settings. Speakers, in turn, should work to ensure that their speech is as opaque as possible by carefully controlling the semantic meaning of utterances and by evacuating from their speech any signs that might reveal inner states.

53 Dror (2001) provides a history of the role of the behavioral sciences in the U.S. in fomenting the notion that emotions have a bodily trace that can be tracked down and quantified.
These were the same observations set down in Emil Kraepelin's Manic-Depressive Insanity (2002 [1921]), a long-term and detailed observational study of la folie circulaire, or "manic-depressive disorder," what would come to be called bipolar disorder.⁵⁵ However, it was the observations of "lay experts" (Wynn 1996) that inspired the BPU's PI: people close to patients and people who themselves are "living under the description of bipolar disorder" (Martin 2007:10). They would tell the PI that they could sense when their family member or loved one was on the brink of a pathological mood episode because there had been something ineffable about the person's voice; they had just sounded off. The PI wanted to take this intuition and concretize it, to, in his words, "teach the computer to listen like a human brain."

54 Note that the distinction between language (grammar, semantic meaning) and speech (the act of producing vocal utterances), akin to the Saussurean distinction between langue and parole, maps onto the distinction between reason and affect, content and form, and also onto the distinction between mind (immaterial, emergent) and brain (material, embodied).

55 Emil Kraepelin is the 20th-century German psychiatrist whom contemporary psychiatric researchers credit with developing the basic nosological infrastructure of U.S. psychiatry, in part by dividing mood disorders from thought disorders (Decker 2004). Emily Martin (2007) provides a comprehensive history of how the diagnostic category of bipolar disorder has transformed over time, in terms of its nomenclature, theories of its etiology, and its associated diagnostic criteria.

What is this intuitive audition? What work does it take to design a system that can listen for and distill that ineffable thing some people (loved ones, therapists) can hear? This brings back a question that resonates throughout the dissertation: what exactly does it mean to listen?
In this instance, what kind of listening is required to "teach a computer to listen like a brain"? Although the PI's formulation takes the aural equivalent of the gut instinct or hunch and collapses it back into a biological reduction that the neuroscientists and engineers of East Coast University used regularly (the "listening brain"⁵⁶), the clinicians and engineers working behind the screen on the cell phone study put into practice much more nuanced and complicated conceptualizations of what it means to listen, and what listening entails. The engineers and clinical team members butted up against the limits of what listening can capture from the voice, implicitly challenging the biological essentialism of the "listening brain" in their day-to-day dealings with the research subjects' voice data.

In order to develop the app's predictive capacities, the BPU team would first need to gather and categorize data. This was the stage at which the study stood when I arrived for my fieldwork. The team had developed an app that recorded all of the research subjects' outgoing calls. The engineering team was asking a sub-level question of the audio data gathered using this app: can a non-machine listener consistently identify any common features in the voices of people diagnosed with bipolar disorder? To that end, the research study in its current phase revolves around two different listening practices, assigned to the two teams of experts involved: clinical assessment (conducted weekly by members of the clinical team) and annotation (the quantification of the sounds of research subjects' speech, conducted by members of the engineering team). I will compare these two listening practices, reviewing the skills they require and the assumptions about voice, emotion, truth, and listening itself that are constituted in each, thinking through them as two overlapping but conflicting acoustemological (and ethical) modes of attending to the data. This means, in part, distilling the disciplinary tensions between psychiatry and computer science, and their differing willingness to embrace or contest the "fuzzy" realness of emotions.

To reiterate, part of what is at stake in the BPU's work is the semantic ambiguity and polysemy not only of emotional terms like "mania" and "depression," but of the term "listening" itself, especially with respect to agency and intentionality. One question that remains unanswered is why people like Jacob consented to having their phone calls listened to in the first place. Perhaps their willingness can be attributed to culturally specific expectations, discussed in the Introduction, about the distinction between "hearing" (an unintentional reflex) and "listening" (an intentional action), and the extent to which it was possible for researchers to turn off their hearing in the act of listening. Researchers themselves, especially members of the engineering team, grappled with the extent to which "listening like a computer" (detaching form from content, ignoring the semantic substance of speech altogether) was humanly possible. Training a computer to "listen like a brain" required demanding that the annotators (myself included) listen like a computer, through ultimately insufficient tactics of "pure listening" to sound alone.

56 Recall from Chapter 2 that the listening brain is opposed to the hearing ear. In this formulation, the brain is germinal to and responsible for the act of listening itself, because the brain actively processes and analyzes sound, whereas the ear passively receives and absorbs sound. Therefore, the brain/ear distinction replicates the listening/hearing distinction that sound studies scholars, most notably Jonathan Sterne (2003), have located in strands of early Christian theology.
In lively debates about the intertwined relationship between speech form and speech content, members of the engineering team drew from their own experiences as non-native speakers of English struggling to understand emotional expression among native speakers. In these debates, the BPU engineers theorized listening (and hearing) as culturally mediated capacities rather than reflexive or biological ones alone. Such conversations about the limits of "listening like a computer," and frustrations over the annotation task itself, provided engineers a means through which to subtly and quietly critique the technological prototype the study was supposed to produce, while also critiquing psychiatric conceptualizations of affect and emotion.

The cell phone study did not only require collaboration between clinicians, engineers, PIs, post-docs, and research assistants. It also brought the research subjects and members of the research team into asymmetrical proximity and cooperation with each other. Aside from Jacob, I never met or encountered any of the research subjects whose voices I listened to, as I split my time between shadowing clinicians (listening to their questions to the subjects) and helping the engineering team annotate the audio data (listening to the subjects' responses). The undergraduate researchers and I found ourselves at the center of these faceless research subjects' lives, absorbed in their calls with clinicians and, as the study's scope widened during the course of my fieldwork, their personal phone calls. The nature of the annotation task was, contrary to Jacob's warning, to "forget" the human on the other end, to treat their voice as all form and no content, and to flag segments containing "identifiable" information that would anchor the speech to an individual person.
But if the listening associated with annotation has biopolitical implications (Foucault 1978), in that annotators are to listen to subjects as members of a population rather than as individuals, the listening associated with assessment was not so different. Clinical team members had to listen in a way that would encourage rapport and therefore the disclosure of personal information, information that would help them to calculate assessment scores and place the subject in the category of symptomatic (either manic or depressed) or asymptomatic (i.e., euthymic, neither manic nor depressed). Because clinicians could not conduct psychotherapy over the phone and their relationship with the research subjects had to be "non-therapeutic," their directive was to gather data without providing treatment, avoiding as much as possible feelings of responsibility toward the wellbeing of the research subjects. Taking into consideration Puig de la Bellacasa's (2017) invitation to think through care as a matter of maintaining specific arrangements of relations, whether those relations be liberating, oppressive, or somewhere in between, I close the chapter by returning to my inquiry into the nature of the connections that the cell phone study enables or troubles.

THE GAME-CHANGER

DSM-5 (American Psychiatric Association 2013), the most recent edition of DSM, classifies bipolar disorder as a mood disorder, characterized by oscillation between two mood states, often referred to as "mood episodes" (Martin 2007:47), that reach pathological levels in their depths and heights. The PI, an Icelandic man in his 60s who is also the director of the entire BPU, likes to say that the two "poles" of bipolar represent the poles of possible human experience: depression (devastating, debilitating sadness) and mania (soaring, incandescent euphoria).
The basic diagnostic criteria for bipolar disorder stipulate that depression and mania must present in succession, sometimes spaced out across months, in order for a patient to be given a bipolar diagnosis. As Martin writes in Bipolar Expeditions (2007), this makes bipolar disorder a "meta-state" (47): not a member of a class of affective experiences but a condition that includes classes of affective experiences typically thought to stand in opposition to each other.⁵⁷ As Martin notes, while bipolar is defined in DSM and in American psychiatry writ large by the conjoined presence of these opposing affective states and subsequent behaviors (like inhibition and excitement), basic diagnostic criteria for bipolar leave "emotion" and "mood" undefined. The black-boxing of these terms had consequences for the research team. As we shall see, the distinguishing boundary between mood and emotion, the universality of these affective states, and their "fuzzy" ontological status are issues central to the disciplinary frictions between the engineers and clinically trained professionals working on the cell phone project. Nevertheless, the team did share working, relational definitions of mood and emotion, which they put into practice in the research design, data gathering, and data filtering practices.

Given my background in writing and my lack of training in either engineering or computer science, engineering team members called upon me to help them write a training video and a training guidebook for the annotation task. Co-writing the guidebook was an exercise in distilling what mood and emotion meant for the team. However, since neither I nor my engineering co-authors had significant clinical training or could claim any expertise on human emotion, we relied on a series of texts from the Internet and a collection of the lead engineer's grant proposals to write the guidebook.⁵⁸ Over several weeks, we worked on the same Google Document in the engineering office on our individual laptops with our backs turned to each other, never meeting eyes but occasionally sharing a few appraising chuckles at our collective ability to bricolage and bullshit. As the guidebook states, for the purpose of the study, mood is a "deep and lasting, enduring cognitive state." Emotion, on the other hand, is fleeting and reactive, "action-oriented, observable expressed behavior that can be described in terms of valence (positive vs. negative) and activation (calm vs. excited)." Emotion changes from moment to moment; it is more superficial than mood and therefore easier to detect in speech. The lead engineer, the PI, and other bipolar researchers theorize that mood and emotion are interrelated, such that changes in emotion precipitate changes in mood.

57 Although previous editions of DSM posited a stark distinction between the disorder's associated affective poles, DSM-5 and contemporary research suggest that the division is not so easily identifiable, and that people can experience "mixed" pathological mood states. Some research even suggests that, because there is such wide variety in disease manifestation and because of the limitations of traditional diagnostic methods, people diagnosed with bipolar disorder are actually experiencing biologically distinct, heterogeneous disorders that cut across the singular category (see, for example, Clementz et al. 2016). Researchers often cite the heterogeneity of bipolar disorder as evidence for the necessity of an RDoC approach to studying mental illness.

58 The engineering team did not consult the clinical team in the drafting of the guidebook. Clinical team members tended to take on more administrative tasks and were stretched thin across several different BPU projects. For this reason, the engineering team was hesitant to contact them for assistance; they wanted to avoid adding more work to their already heavy load. Best intentions aside, the guidebook remained a source of unspoken tension between the two teams.
Therefore, and since emotion is supposedly more observable (including more audible) than mood, these researchers theorize that detecting changes in emotion might be a way to anticipate and predict the extreme changes in mood that define bipolar disorder. In other words, as the guidebook explained, "emotion is a useful meta-feature for detecting changes in mood state using the speech signal."

The cell phone study is the newest layer of the multi-modal "data onion," as researchers call it, of the BPU's ten-year longitudinal study, all part of an effort to gather and analyze high volumes of data. Researchers hope such Big Data holds the key to novel understandings of the relationship between bipolar symptom expression and genetic predispositions for the disorder, the kinds of findings that traditional, smaller-scale research projects have hitherto been unable to produce. The active research cohort of over a thousand subjects includes entire families, because bipolar disorder is thought to have a strong hereditary component, and the PI, trained in genetic psychiatry, has been pursuing the genetic basis of bipolar since the start of his career. The longitudinal study requires multiple, multidisciplinary teams to gather, manage, sort, and analyze its various streams and types of data: the neuropsychiatric team, the stem cell research team, the microbiome team (looking at brain-gut interactions), and, for the cell phone study, a team of engineers, a data scientist, and a mathematician, known throughout the BPU as the "engineering team." The clinical team, comprising around 15 members, provides assistance and administrative support to each of these other teams.
While all other research teams are made up of MWU faculty and students (including undergraduates, graduate students, post-docs, and visiting research students), the clinical team is made up mostly of female "staff" rather than students: people who have just completed their BS in psychology or BA in social work, or once-practicing licensed clinical social workers and psychiatric nurses who have shifted from practice to research work.⁵⁹ As at East Coast University and West Coast University, clinically trained team members, or members with more qualitative than quantitative training, tended to be predominantly women, and were responsible for the face-to-face interactions and emotional labor necessary to the research. Before potential research subjects can enroll in the longitudinal study, members of the clinical team psychiatrically assess them to confirm their diagnosis of bipolar disorder. Subjects must agree to the collection of a wide variety of biological samples (blood, urine, saliva, skin, feces). In addition to providing this biological data, longitudinal subjects also participate in a lengthy yearly life history interview and psychiatric assessment with a BPU clinical staff member, usually conducted over the phone. These interviews produce assessment scores that quantify the participants' fluctuations in bipolar symptoms from year to year. Although subjects do not always speak to the same clinical team member every year, the length of the longitudinal study and the depth of these phone calls mean that research subjects become fairly used to sharing personal information over the phone with someone they have never met and may never speak to again. Thus, for the cell phone study, which requires subjects to undergo weekly phone assessments with a BPU clinician, the BPU primarily recruits subjects from the longitudinal study cohort who already feel comfortable with phone-based assessments. Another advantage of drawing from the active subject cohort is that these subjects had been definitively diagnosed with bipolar disorder.

The cell phone study's basic research question resembles the question that research teams at the other two sites are pursuing, with a unique emphasis on prediction: are there acoustic features in the speech of people diagnosed with bipolar disorder that indicate a pathological mood episode is forthcoming? In other words, are there vocal-acoustic harbingers of mania or depression? The BPU's PI envisions the study producing a kind of early warning system: a cell phone application that detects these predictive acoustic features using an algorithm that the engineering team has designed, trained on numerical ratings of the sound of subjects' speech and their weekly psychological assessment scores. If the app detects these telltale warning sounds, it will send a "warning signal" (e.g., a text notification) that a mood episode is imminent to a designated list of individuals, like the user's clinician or their family and friends. Beyond research questions and the desired end use of the application, BPU researchers suggested to me, behind closed office doors and in the privacy of carpooled rides between MWU and the Depression Center, that the PI, with his fiery, entrepreneurial charm, was seeking a big breakthrough, a disciplinary "game-changer," a term he used often.

59 The three or four student members of the clinical team present during my time there were visiting students. Two thirds of them were men pursuing research-based graduate degrees either in clinical social work or clinical neuroscience.
He was nearing the end of his career, and his pursuit of the "gene for bipolar," which had once been his life's project, had proved fruitless.⁶⁰ Moreover, because of the BPU's reliance on philanthropic funding, and because the PI was constantly searching for commercial sponsorship, some guessed that the PI cooked up the cell phone study in order to produce an innovative and flashy prototype that would satisfy donors and attract a high-powered sponsor in the technology or health insurance sector. Cynicism and office gossip aside, in our interview the PI told me, with his customary frankness, that his ultimate goal with the cell phone study was to acquire the funding and produce basic science findings that would help his patients as much as possible. To the PI, impressing donors and sponsors, accruing research funds, building a game-changer prototype, and helping his patients (and anyone else diagnosed with bipolar) were pragmatically and indissolubly entangled.

60 In the shadow of the Human Genome Project, scientists have come to accept that the relationship between the genetic code and genetic expression is far less linear and much more stochastic than initially anticipated. This is especially the case when it comes to mental illness, since the methods for phenotypic description of mental illness remain hotly debated. Hence, anthropologists and historians of the life sciences refer to the contemporary moment as the "postgenomic" era (e.g., Richardson and Stevens 2015).

LISTENING/NOT LISTENING

Just as the cell phone study's existence cannot be traced back to a singular, motivating force "internal" to science but rather to a series of interlinked initiatives (attract funders, acquire funds, help patients, innovate), the project's sole emphasis on acoustic features of speech was likewise not driven by pure scientific curiosity alone.
Yes, the BPU was interested in better understanding the connection between speech, emotion, and mood in people living under the diagnosis of bipolar disorder. The lead of the engineering team, Meredith, an intense but kind-hearted New Englander in her late thirties, was especially interested in this connection and in the role that machine learning could play in locating it. But the emphasis on acoustic features rather than the semantic or syntactical structure of language (i.e., speech form rather than linguistic content) had much to do with the team's interpretations of their own IRB protocol and concerns about violating subjects' privacy.61

61 Recall that, within the communication engineering and speech signal processing community, syntax is a language-dependent component of communication. Syntax provides structure to semantic meaning, and so it closely hews to speech content. Because syntax varies cross-culturally, it lacks a stable biological basis. Communication engineers and speech signal processors tend to regard the Chomskian notion of a universal grammar apparatus as a cognitivist historical relic with no scientifically observable basis. On the other hand, paralinguistic variations in the production of speech have a traceable, material existence: these features can be distilled down to sound waves, which have properties that can be analyzed mathematically without regard to the sound's association with the meaning of an utterance.

Meredith did her graduate training at WCU and completed a post-doctoral fellowship at East Coast University under Ted, the prestigious behavioral speech signal processing scholar. In our one-on-one interview, she explained to me that the cell phone study was exciting from a speech engineering point of view because it offered an opportunity to capture relatively unstructured, "natural" speech, "in the wild" or in situ.62 That being said, she confessed that she herself would not consent to the ubiquitous monitoring and passive audio recording that the study required; she was a self-professed conservative when it came to data privacy issues. In the early days of the cell phone study, Meredith and the engineering team operated under the assumption that no humans could listen directly to any of the audio recordings of subjects' phone calls, and that the content could not be analyzed in any capacity, even by an automatic speech recognition algorithm.

62 The majority of other data sets that speech signal processing researchers use to study the relationship between acoustic features and psychological or affective states are "unnatural": either structured responses to interviews between two people with no known psychiatric diagnosis, or actors acting out lines or scenarios, rather than someone with a clinician-confirmed diagnosis responding to an open-ended question or talking informally to someone they know (a form of "natural" speech).

Proceeding with the interpretation that any form of analyzing audio content would infringe upon research subjects' privacy, the team used an "off-the-shelf" (preassembled, gold-standard) algorithm to analyze the rhythmic patterns of research subjects' speech in the assessment phone calls with the BPU clinicians. Two or three years into the study, however, Adele and others on the clinical team pointed out that, technically speaking, members of the clinical team could probably overhear one another's phone calls with research subjects, possibly even the research subjects' end of the conversation. The desks in the office where most of the clinical staff sat were spaced tightly together. The sounds of the calls flooded this shared space in a way that no one could control.
It was inevitable, Adele and others argued, that members of the research team were "listening" in some capacity to the phone calls, simply because no one could control when or if they overheard the subjects speaking on the other end of the phone, and because there was no way of knowing if the officemates were actively (and secretively) attending to or processing whatever content of the calls resounded throughout the cloistered office. The clinicians suggested, then, that the clinical researchers on the team might already be taking part in the very act that Meredith worried would overstep research subjects' privacy: listening to their phone calls. In other words, in the cramped space of the clinicians' office, the calls (in terms of their semantic content) were never really private to begin with, so there would be no ethically significant breach of privacy if the engineering team members were to start listening to the calls as well. This argument probed and dissolved the distinction between hearing (a happenstance absorption of sound that occurs automatically, simply due to being in the presence of sound) and listening (an agentive, intentional, and directed form of auditory processing), underscoring that it is difficult to say when hearing alone morphs into listening. Heeding Adele and her colleagues, Meredith and the PI petitioned the MWU IRB to add a line to the study's protocol and informed consent form specifying that, at some point down the research pipeline, members of the study team might listen to subjects' phone calls, though they would only listen to the sounds of speech and not the speech's content. The amendment passed without specifying how researchers might simultaneously listen (to sound) and not listen (to content), or which research team members would be doing the listening/not listening. Beyond the lack of details, however, the amendment pried listening apart from hearing once again.
Meredith and the PI's edits distinguished listening from hearing by re-inscribing agency into the auditory act, splicing speech into sound and meaning. Specifying that researchers would direct auditory attention to speech sound while consciously redirecting attention away from speech meaning imputes agency and intentionality to the listener: the listener must work to break speech apart into distinct components, rather than absorb speech holistically in the way that "overhearing" implies. Anthropologist and sound studies scholar Tom Rice notes that different uses of the term listening in the U.S. and U.K. "imply subtle shifts in acoustical agency, which references nuanced varieties of active-receptive and passive-receptive auditory agency" (Rice 2015:100). The IRB amendment illustrates how rhetorically tweaking the valences of passivity, agency, and activeness involved in the auditory uptake of sound can be used to manage expectations for privacy and the team's commitment to it, in part by reinforcing the notion that the most meaningful core of speech resides in semantic content alone. Adele and others seized upon the vagueness of the hearing/listening distinction to push for making the recorded assessment calls available to the engineers' ears. Meanwhile, in the revised IRB protocol, "listening" loses its vagueness and comes packaged with an implicit ethics: researchers will attend not to what subjects say but to how they say it, giving the impression that the content of speech will be protected or somehow blocked out, and that form and content are distinct entities. Thus, while Meredith and the PI made the revision because clinicians realized that the calls, as an auditory event, could never be contained or kept away from the ears of others, the revision itself gives the impression that the analysis of speech data will somehow offer privacy by keeping the real meat of conversation, its content, out of audible reach.
In so doing, the IRB revision reinforces the semiotic ideology (Keane 2003) that language is primarily referential: that speech form is an accessory to real speech meaning, and that the sounds of speech are a superficial "garb" that can be stripped away from signification (Keane 2005). Immediately following IRB approval of the amendment, the engineering team's own attention shifted to the kind of data that humans, rather than preassembled algorithms, could gather if they listened to the calls. Hassan, a post-doc in his early 30s who worked under Meredith and had completed his PhD in the Middle East, insisted that the team recruit a group of undergraduates to join the engineering team and begin labeling the audio recordings so that they could start to build a predictive algorithm based on "human judgment." Making an application that captured and could replicate an "average" person's intuitive interpretation of how a voice sounds required a complex, heteromated assemblage and the coordination of many moving parts. There were the cell phones themselves, the databases through which the calls passed, and the audio-recorded call files, which had to be sorted, filtered, and categorized. There were the diagnostic inventories for interpreting the content of subjects' speech and determining how "bipolar" they were that week, the systems for labeling the sounds of subjects' speech, and the different and sometimes competing modes of listening and interpretation that these two quantification processes entailed. To lay out the stakes of the cell phone project and foreshadow the issues probed in subsequent sections, the following section reviews the data collection process, tracing the research subject's speech as they utter it into their phone's microphone, as it moves across databases, between BPU offices, and through different classificatory regimes.
MAPPING THE DATA PIPELINE

Subjects enrolled in the cell phone study are given a retrofitted smart phone on which the engineering team has installed the study app. The app, which runs continuously while the phone is powered on, appears discreetly on the home screen of the study phone as a simplified version of the MWU logo. During the 6 to 12 months in which subjects are active in the study, the team encourages subjects to use the study phone in place of their personal phone as often as possible. In addition to making phone calls and sending text messages, this includes using the social media applications installed on the phone and the phone's web browser. For some subjects, enrolling in the study afforded access to a smart phone for the first time, or access to their own smart phone for the first time. Many had only ever owned a flip phone with no Internet capabilities or had a single smart phone that they shared with spouses, partners, or dependents. The BPU paid for unlimited cellular and data plans on the study phones, and so most research subjects were enthusiastic about using the phone and its Internet capabilities without having to worry about running out of data or minutes. This was indeed one of the allures of participating in the study. It was not uncommon for research subjects to try to strike a bargain at the end of the study, proposing that they keep the cell phone in place of collecting the stipend they had earned for their participation.63

63 Participation in medico-scientific trials often unlocks access to resources that are unevenly distributed, particularly medical resources (Rapp 2000; Petryna 2009; Nguyen 2010; Benton 2015). Cell phone study subjects' pleas to forgo cash payments in exchange for continued use of the Internet-enabled study phone speak to how desirable, and how unevenly distributed, digital communication resources are in the United States.

In addition to using the study phone as often as possible, subjects had to participate in a weekly phone call with a BPU clinician, which the team referred to as "assessment calls." During these 20- to 30-minute assessment calls, the clinical team member or staff-person (typically, staff-woman) would ask the subject a series of questions based on two gold-standard assessment scales: the Hamilton Depression Rating Scale (HAM-D) and the Young Mania Rating Scale (YMRS). Based on the subject's answers to questions from these scales, clinical team personnel would assign two numerical scores to the call, quantifying how manic or depressed the subject was that week. These scales are so-called clinician-rated inventories rather than patient-rated inventories like the Beck Depression Inventory (discussed in Chapter 2). In other words, clinical professionals (social workers, diagnosticians, clinicians, etc.) generate a score according to their interpretation of a patient's responses, rather than patients themselves generating the score. The accuracy of the scores generated through clinician-rated inventories hinges on the clinical professionals' interpretive abilities as well as their verbal strategies for establishing trust and rapport with research subjects. The clinical team's scores had broad implications for the study as a whole, and for the design specifications of the study's eventual outcome: the cell phone application. Assessment calls were divided into three "mood" classes, in accordance with the scoring criteria associated with the YMRS and HAM-D. If the call had a YMRS score of 10 or more and a HAM-D score of less than 10, then the call fell into the "manic" category.
If the call had a YMRS score lower than 10 and a HAM-D score greater than 10, then the call was "depressed." If both the YMRS and HAM-D scores were less than 7, then the call was "euthymic," meaning asymptomatic or "normal," neither definitely manic nor depressed. Although all of the subjects needed to have a diagnosis of bipolar disorder in order to participate in the study, for their calls to be included in the dataset a clinical team member had to score at least two of their calls in the "symptomatic" range. Otherwise, the subject and their hundreds of phone calls were excluded from the dataset altogether.64

64 Calls were also excluded if the audio quality was poor, that is, if there was too much background noise for the engineering team members to hear the subject's voice clearly. For instance, if the subject used headphones during the call or made the call on speakerphone, the microphone picked up more background noise than usual.

In this way, the clinical team played a crucial role. Their judgment of the subjects' weekly symptoms determined which calls were bipolar enough to even count as data. Their scores determined which of the calls the engineering team would annotate, which had implications for the capabilities and limitations of the predictive cell phone app's algorithmic infrastructure. If the data upon which the app was built only included extreme cases of bipolar symptom manifestation, then the app itself would only be capable of identifying these extreme cases. The clinical team's interpretive practices and the broader team's classificatory strategies also concretized the image of bipolar disorder as a disorder of extremes and of clear-cut binaries, an image that does not align neatly with the experiences of all people living under the diagnosis of bipolar disorder.
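The three mood classes and the inclusion rule amount to a small decision procedure. The Python sketch below encodes the thresholds exactly as stated above, assuming each call is represented by a pair of integer (YMRS, HAM-D) scores; the function names and the "unclassified" label for score combinations that meet none of the three rules are hypothetical conveniences, not the BPU's code.

```python
"""Hypothetical sketch of the cell phone study's call-labeling and inclusion rules.

Thresholds follow the prose description; function names and the
"unclassified" label are illustrative assumptions.
"""

SYMPTOMATIC = {"manic", "depressed"}


def classify_call(ymrs: int, hamd: int) -> str:
    """Label one assessment call from its YMRS and HAM-D scores."""
    if ymrs >= 10 and hamd < 10:
        return "manic"
    if ymrs < 10 and hamd > 10:
        return "depressed"
    if ymrs < 7 and hamd < 7:
        return "euthymic"
    # Assumption: the text names no class for any other combination
    # (e.g., both scores at 8, a HAM-D of exactly 10 with a low YMRS,
    # or both scales at 10 or above).
    return "unclassified"


def include_subject(calls: list[tuple[int, int]], min_symptomatic: int = 2) -> bool:
    """Keep a subject only if at least `min_symptomatic` calls score as manic or depressed."""
    n_symptomatic = sum(
        1 for ymrs, hamd in calls if classify_call(ymrs, hamd) in SYMPTOMATIC
    )
    return n_symptomatic >= min_symptomatic


if __name__ == "__main__":
    print(classify_call(12, 4))   # manic
    print(classify_call(3, 15))   # depressed
    print(classify_call(8, 8))    # unclassified
    print(include_subject([(12, 4), (2, 5), (8, 8)]))  # False: only one symptomatic call
```

Spelling the rules out this way shows how much of the score space, including calls in which both scales are elevated, falls outside the three classes as described; the prose does not say how such calls were treated, so the sketch simply labels them.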
The team excluded what one of Emily Martin's ethnographic interlocutors referred to as the "white spaces" of being bipolar: moments of stasis or in-betweenness that do not fall clearly at one pathological pole or the other, or lapses in symptomal experiences altogether (2007:187).

As soon as the research subject presses the "call" button on their BPU-supplied phone, either answering or initiating a call, the app begins recording the sound that the phone's microphone captures. The app is designed to record only the sound captured by the study phone's microphone. It does not record the audio of the subject's interlocutor or the voice of whoever else is on the line (because only the research subject has consented to have their calls recorded, and it would be prohibitive to try to obtain consent from everyone the subject spoke to over the phone). After the call ends and either the subject or their interlocutor presses the "end" button, the recorded audio file is encrypted, transformed into a data file that cannot be listened to, and then stored directly on the phone. Once the phone is connected to Wi-Fi, the encrypted file is uploaded from the phone's storage to a secure server somewhere on the main MWU campus. The files are then decrypted and arrive at their final destination in the form of listenable audio files: a Depression Center database managed by Chen, a Chinese data scientist in his late twenties on the engineering team. The uploading and decryption process takes 24 hours, after which subjects are given the option to delete the data file from their phone's storage folder. When clinical personnel first sign research subjects on to the study, reviewing the IRB protocol and securing their informed consent in one of the BPU's windowless offices, Chen joins them, explaining this whole process and demonstrating how to delete the files and how to connect to Wi-Fi if the subject has never done so.
If they like, he helps them set up a passcode on their phone to help ensure their privacy. This is one of the rare occasions on which an engineering team member interacts face-to-face with research subjects. Otherwise, engineering team members interact only with recordings of the research subjects' speech. A considerable amount of time passes between the point at which the calls arrive in Chen's database and the point at which engineering team members begin listening to and annotating them. Only when the subject has completed the study, and there is an entire 6- to 12-month corpus of their calls, does Hassan begin processing the calls of subjects who fit the "symptomatic" criteria. Hassan uses an algorithm of his choosing (the COMB-SAD algorithm) to comb through the hundreds of calls and cut the audio into short, 3- to 30-second segments that contain continuous speech. Despite the aid of the algorithm, Hassan's task of filtering and splitting the phone calls was time-consuming and laborious. He would work through the day and into the night until dawn, monitoring the algorithm as he ran it over a single research subject's audio files and checking that it had segmented the calls correctly by listening to a sample of segments to confirm that they contained more speech than other sounds. As a final step, and in an effort to discourage annotators from paying attention to the content of the call, Hassan shuffled the segments out of chronological order. Even after his long hours and careful efforts, Hassan and his algorithm did not always catch all of the errant, noisy, or speech-less segments. Sometimes, annotators would come across segments that contained no speech and only ambient sounds (the sound of a car backing up, the sound of a door opening and closing, the sound of a dog barking).
One subject had a large collection of pet birds. In many of that subject's segments, the overlapping conversations of their pet birds overpowered the sound of the subject's one-sided conversation with the clinical team member; the human's speech was inaudible over the birds' speech. Other segments contained elongated sighs, laughs, or coughs. Hassan and Chen instructed annotators to manually mark these kinds of segments for exclusion from the dataset, categorizing them as either "too noisy" or "not enough speech." There had to be enough speech-like sounds for the annotators to listen/not listen to and judge; it was too difficult to determine the emotional feel of a sigh, cough, or laugh in abstraction, without the presence of other forms of speech. Thus, while the clinical team members controlled the boundaries and definition of what constituted symptomatic speech, the engineering team was responsible for drawing the boundary between significant and insignificant sound, defining what counted as "noise." It was up to the annotators, myself included, to demarcate the threshold between meaningless speech (containing only sounds) and meaningful speech (containing enough words). All of this interviewing, scoring, categorizing, filtering, excluding, and shuffling produced the segments that I helped to listen to and score yet again alongside two MWU undergraduates, Josh and Aubrey, in our role as annotators on the engineering team. When I arrived at the BPU in May 2017, MWU's IRB had already approved the revised protocol, and the engineering team had begun analyzing and adding their own numerical values to segments of the assessment calls that were symptomatic enough. In total, at the time of my fieldwork, the dataset contained audio-recorded calls from 43 participants, with about 21 weeks per subject, totaling 39,445 calls featuring over 2,880 hours of speech.
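The segmentation, shuffling, and exclusion steps described above can be sketched in Python under stated assumptions. This is not COMB-SAD or Hassan's actual tooling: it assumes a speech activity detector has already returned the (start, end) times, in seconds, of stretches of continuous speech, and the function names and the dictionary of annotator flags are hypothetical. The sketch splits long stretches into pieces of at most 30 seconds, discards anything shorter than 3 seconds, shuffles the result out of chronological order, and drops segments flagged "too noisy" or "not enough speech."

```python
"""Hypothetical sketch of segmenting, shuffling, and filtering call audio.

Assumes a speech activity detector has already returned (start, end)
times of continuous speech in seconds; this is not the COMB-SAD algorithm.
"""
import random

MIN_SEC = 3.0   # shortest segment kept (3- to 30-second range from the text)
MAX_SEC = 30.0  # longest segment kept

EXCLUSION_LABELS = {"too noisy", "not enough speech"}


def segment_speech(intervals, min_sec=MIN_SEC, max_sec=MAX_SEC):
    """Cut speech intervals into consecutive segments of min_sec to max_sec seconds.

    Leftover pieces shorter than min_sec are discarded.
    """
    segments = []
    for start, end in intervals:
        t = start
        while end - t >= min_sec:
            stop = min(t + max_sec, end)
            segments.append((t, stop))
            t = stop
    return segments


def shuffle_segments(segments, seed=None):
    """Return a copy of the segments out of chronological order."""
    shuffled = list(segments)
    random.Random(seed).shuffle(shuffled)
    return shuffled


def drop_flagged(segments, flags):
    """Remove segments an annotator flagged with one of the exclusion labels.

    `flags` maps a (start, end) segment to a label string (hypothetical format).
    """
    return [s for s in segments if flags.get(s) not in EXCLUSION_LABELS]


if __name__ == "__main__":
    speech = [(0.0, 65.0), (70.0, 71.5), (80.0, 92.0)]
    segments = segment_speech(speech)
    print(segments)  # [(0.0, 30.0), (30.0, 60.0), (60.0, 65.0), (80.0, 92.0)]
    queue = shuffle_segments(segments, seed=7)
    kept = drop_flagged(queue, {(60.0, 65.0): "too noisy"})
    print(len(kept))  # 3
```

Splitting at fixed 30-second boundaries is only one simple choice; a real pipeline might cut at pauses instead, and the text does not say how COMB-SAD decides where a segment ends.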
Within the dataset of symptomatic subjects, audio recordings were divided into two categories: assessment calls and personal calls. Assessment calls, which made up 933 items in the dataset, are calls conducted with a BPU clinical team member.65 Personal calls, on the other hand, are all other calls made with the study phone. Toward the end of my fieldwork, the BPU submitted yet another IRB amendment, asking for permission to access the personal phone calls of consenting participants. After a clinical staff person individually called and re-consented all 43 of the participants in the dataset, during my last month at the BPU, the engineering team and I began to sift through these personal phone calls, processing them and preparing them for annotation by removing segments that contained identifiable information from the dataset.

65 These calls were conducted around 2015, when the IRB passed the amendment. A number of the BPU clinicians who conducted them had long since left the BPU to pursue graduate careers.

[Figure: user interface of the annotation software, showing the activation and valence rating scales]

The user interface of the annotation software features two scales, visualized with the schematized outline of a person. On the activation scale, a small grain within the center of the person's torso grows more and more erratic as the numbers increase. The person above the 9 rating, the highest possible rating for activation, has exploded. On the valence rating scale, the person above the number one is frowning deeply. The frown slowly transforms into a smile, with the broadest smile occurring above 9.
Although both modes of listening (annotation and assessment) were necessary for assembling the basic skeleton of what would one day be the predictive algorithm, clinical and engineering team members listened with different and at times conflicting ethical and acoustemological imperatives, ideas about the relationship between mood, emotion, and speech, and standards for achieving objectivity. The teams were physically disconnected, sat in separate offices, and did not cross paths often, aside from Chen's brief interface with research subjects during the consenting process. Although the PI held monthly meetings for everyone at the BPU to update each other on their work, the engineering team rarely if ever attended these meetings. Members of the clinical team involved in the cell phone study also attended infrequently. Josh, Aubrey, Hassan, Chen, and I all sat in the same office, talked often, and at times passed the headphones around so that we could all listen to and discuss the same troublesome, noisy, or difficult-to-annotate audio segment. We would eat lunch together in the office or at a picnic table overlooking the rippling grass fields behind the geriatric hospital. Hassan would lead mini-lectures on machine learning, guiding Josh, Aubrey, and Chen through calculating the agreement across our labels, and helping Chen troubleshoot data management issues. Afterwards, I would brew the team a new pot of coffee in the communal kitchen, ferrying it back to the engineering office in old BPU mugs and single-use paper cups. The clinical team, on the other hand, was far larger and far less cohesive, consisting primarily of university employees rather than students, post-docs, or faculty. Though many junior clinical team members (like Lauren, whom I introduce in the next section) sat in the same shared, open-plan office, they were all responsible for different tasks.
More senior team members (Adele, Cheryl, and Rochelle) had individual offices and held managerial or supervisory positions over the junior women. If they visited the open-plan office, it was to delegate tasks to the junior women who worked there, or to partake in the snacks that women in the office left on a communal table in the middle of the room.

ASSESSMENT: PROFESSIONAL LISTENING

In this section, I focus on the work of conducting weekly assessments. This serves two purposes. First, it sets up a comparison between assessment and annotation, two ways of listening to research subjects' speech that are necessary to building the app's predictive algorithm. Second, it offers an expanded case study of the expertise and skills that conducting effective psychiatric assessment requires, a counterpoint to the treatment of assessment as unskilled labor. I home in on the assessment call techniques of Lauren and Rochelle, two clinical team members at different stages of their careers with different degrees of professional experience. In so doing, I underline that assessment is a skilled practice, while distilling the ethical commitments that the listening of psychiatric assessment (versus the listening involved in psychiatric treatment) entails for seasoned clinicians on the teams. I find the term "professional listening," a play on Goodwin's "professional vision" (1994), useful for making sense of what BPU clinical team members do when they lead an interview over the phone with a research subject whom they may have never met and may never speak to again. Echoing Haraway's (1988) contention that there is no "god's eye view" (no objective or neutral standpoint from which to interpret the world around us), Goodwin defines professional vision as "socially organized ways of seeing and understanding events that are answerable to the distinctive interests of a particular social group" (606).
As Cristina Grasseni (2004) argues, seeing is always looking: the pointed directing of sensory attention to some components of a phenomenon, marking them as significant while letting other components fall into the background and to the edges of attention. Goodwin is not only interested in asserting that all sensory interpretation is situated and "perspectival" (606). His work is often concerned with detailing how ways of interpreting streams of sensory phenomena and making them meaningful are "lodged within endogenous communities of practice": how they are indebted to professional norms and chained to historically articulated disciplinary conventions (ibid.). In other words, he writes against the privileging of expert interpretations (like the testimonies of expert witnesses in court) as superior and objective simply because they bear the moniker of expertise. At the same time, Goodwin puts forward an anti-essentialist approach to studying how professional frameworks of interpretation are related to a professional's object of scrutiny. He emphasizes that professional modes of sensory interpretation are reinforced and relayed through material artifacts and materially grounded practices (like coding schemes, pointing, and highlighting), which play a pivotal role in guiding, hewing, and regimenting what the professional expert ultimately sees or hears. Tom Rice (2010), for instance, provides an excellent example of the formation of "professional listening" as it unfolds in the day-to-day practices of medical auscultation training, in which students learn how to interpret the meaning of the body's internal sounds as mediated by a stethoscope. For seasoned physicians and apprentices of medical auscultation alike, as for my informants, biological sounds are not inherently meaningful on their own. Instructors must work to make bodies audible to their students, directing them on how to listen, and how to listen through technical apparatuses.
In the same way, clinical team members at the BPU always listen to research subjects' calls through their disciplinary frameworks, such as psychiatric conceptualizations of mood, emotion, and personhood, and through the discipline's tools for quantifying fluctuations in affective states. Thus, "professional listening" is an especially useful analytic for understanding how the assessment inventories used in the interviews, HAM-D and YMRS, structure and guide how clinical team members attend to and interpret speech, shaping what the clinician is listening for. HAM-D and YMRS are tools of standardization and alignment, technologies for converting subjects' responses into numerical values that represent how depressed, manic, or "normal" the subject is in any given week. To complete the task of assessment, clinicians must learn to interpret subjects' responses through the inventories, and through the dual, opposing lenses of "mania" and "depression." Inventories not only shape what clinical team members listen for through the phone (what information is salient, what is less important, and what should be foregrounded); inventories and the task of assessment itself also shape what the clinician says and guide how the clinician converses with the subject. Because clinical team members must figure out how to encourage the disclosure of personal information that will help them assess the subject's mood, they must learn to deploy verbal tactics for establishing rapport and trust in the absence of any cues beyond their own voice.

Lauren, a junior member of the clinical team in her early 20s who was new to conducting assessment calls, had begun working part-time at the BPU while finishing her Bachelor of Science degree in psychology at MWU. She had graduated in the spring and had transitioned to a full-time position in the summer of 2015, planning to take a year off from school while applying to graduate programs in social work.
Outgoing and not the least bit self-conscious, Lauren agreed right away to let me sit next to her at her desk and observe her conducting assessment calls, scheduled for Wednesday afternoons and early Friday mornings. Lauren sat in the open-plan office alongside five or six other junior clinical staff. Like Lauren, most of the women were white, recent college graduates who had majored in psychology. Their desks were positioned closely together, offering little personal space. All at once and without appearing distracted, Lauren and her officemates would take phone calls related to the longitudinal study: making or canceling appointments, clearing up billing errors, scheduling life history interviews, helping with recruitment, and so on. At any given time during the day the office was buzzing with their various phone conversations, or with their casual talk as they stood around a long, rectangular table in the center of the office, labeling and packing research subjects' blood samples into Styrofoam containers filled with dry ice. This table sat next to the table where they would gather snacks to be shared with everyone else at the BPU. Aside from their desks, the table for sorting blood, and the snack table, the room was filled with filing cabinets containing reams of old paperwork and the "swag" subjects received commemorating various participation milestones (a pen for one year, a water bottle for five years, a tote bag for ten years). I struggled to concentrate on Lauren's voice alone as I listened alongside her every Wednesday and Friday. It was easy to become preoccupied with the other, unrelated calls or conversations going on around me, including the conversations of people milling in and out of the office to peruse the snacks or chat with the women sorting blood. Lauren's calls were short, sometimes under 20 minutes. Lauren attributed the short length of the calls to the subjects she was assigned: two curt, middle-aged men who were not very symptomatic.
She said the men seemed annoyed with the calls, which I found odd. If they didn't like these weekly interviews, why had they signed on for the study? Eventually, one of the men dropped out, telling Lauren that he found it tiring to constantly answer the same questions about how his week had been and how he was feeling, especially since he felt that he was doing quite well. Only when this subject exited the study did I start to consider how difficult it might be to keep an interlocutor engaged in a conversation that followed the same sequence and the same series of questions every time, including questions that covered extremely personal topics (bowel movements, thoughts of self-harm, shifts in libido, among other things). As time went on, I found that Lauren followed more or less the same pattern for each call, day after day, week after week, subject after subject. She would ask the questions in the numerical order in which they appeared in the HAM-D, pause to wait for the subject to answer, and move on to the next question, asking a follow-up if the subject gave a one-word response. Adele, Cheryl, and Rochelle told me that this rigid, recipe-like approach was a necessary step in the process of memorizing and internalizing the inventory. Novices had to make sure that the conversation was as standardized as possible and that they asked all of the questions. Only after they had committed the inventory to memory by rehearsing and repeating it in the same generic format could they afford to be a little more creative with their approach. To conduct the assessment call, clinical team members used a supplemental HAM-D guide called the SIGH-D (Structured Interview Guide for the Hamilton Depression Rating Scale). HAM-D contains 21 "areas" associated with DSM-defined symptoms of depression (such as depressed mood, insomnia, feelings of guilt, work and interests, motor control, suicide).
The SIGH-D breaks these areas up into categories of questions and follow-up questions. For example, for the HAM-D "feelings of guilt" area, SIGH-D instructs the interviewer to ask the interviewee, verbatim, "Have you been especially critical of yourself this past week, feeling you've done things wrong, or let others down?" Underneath the main questions, which should be asked using the exact language that appears on SIGH-D, is a series of suggested follow-up questions, or "anchors." The interviewer can use the anchors to encourage the interviewee to respond more specifically to the primary question, for example: "if YES [to the first question]: what have your thoughts been?" "Have you thought that you've brought (THIS DEPRESSION) on yourself in some way?" "Do you feel you're being punished by being sick?" Anchors are tied directly to the scoring guidelines. For "feelings of guilt," a score of zero indicates the absence of feelings of guilt, a score of 1 indicates "self-reproach, feel he [sic] has let people down," a score of 2 is "ideas of guilt or rumination over past errors, sinful deeds," while a score of 3 is "present illness is a punishment, delusion of guilt." Note that the guidelines for a score of 3 mirror the final anchor for the "feelings of guilt" area ("do you feel you're being punished by being sick?"). Cheryl, Adele, and Rochelle told me that novices tend to rely more heavily on the anchors precisely because these questions prompt the interviewee to respond in a way that corresponds directly with a score. While bordering on the tautological, this correspondence helps train the novices in interpreting the subjects' responses.
[Figure: The SIGH-D (HAM-D interview guide), featuring anchors in the left column and scoring criteria on the right.]
Because they had mastered the HAM-D, senior staff took a more holistic approach to the assessment calls. They were required to ask questions in the exact language that appears on the SIGH-D at some point during the call. But in so doing, they would collaborate with the subject to craft a narrative[66] about the past week that just so happened to contain answers to the questions, the information that they needed. This is what I heard when shadowing Rochelle, a senior clinical team member in her forties who was previously employed as a social worker.
She had her own private office, situated one long, air-conditioned hallway away from the junior staff office. On the wall outside her office door hung a single painting, Magritte's Golconda, an eerie and lonely image that contrasted so strongly with Rochelle's warm personality that it seemed unlikely to have been hung there by her choice. Like Lauren, Rochelle was responsible for conducting assessment calls with two research subjects. Rochelle would use the initial questions on the SIGH-D but never used the anchors. She would weave through the SIGH-D, flipping back and forth between the pages of the guide as the conversation progressed, filling out the score as the subject spoke without interrupting.
She would always respond to whatever the research subject said, and would ask follow-up questions by inviting the subject to tell a story, even if what the subject had said had nothing to do with the question she was focused on ("What happened next? How did you feel after your boss canceled the meeting?"). She opened and closed the calls with questions about things that had happened to the subjects the week before, or with topics that had nothing to do with mental health at all. When she learned that one of her subjects had the same breed of dog as she did, she interlaced bits and pieces about their dogs into the call, sharing training tips or stories about trips to the dog park. Although I knew that Rochelle was using the SIGH-D and HAM-D to structure the conversation, it was hard for me to keep track of the inventory and guide as I listened; they would melt away. Senior clinical team members like Rochelle and Adele would tweak and manage the impression of what they were looking for, carrying on the conversation by responding to the subject's speech as if it were tied to the subject's self rather than to data or information pertinent to the assessment scales. Adele had been a social worker for years, not only caring for patients in a state psychiatric hospital but also conducting field research and site visits for a federal research institute. She was the most experienced interviewer on the team, and so she oversaw assessment call training, which involved junior members shadowing senior members and then conducting a mock interview with a senior staff member. Adele admitted that novices struggled the most with quickly establishing trust and rapport with a research subject, which could have significant consequences. Without a sense of rapport, in her experience, the subject was more likely to give single-word, recalcitrant responses to the interview questions.
[66] Mattingly uses the term "therapeutic emplotment" (1994) to describe how clinicians and patients co-construct meaning out of injury by collaboratively creating a narrative trajectory of the patient's experiences from illness to wellness. Discursively inserting the patient into this narrative structure and then accounting for their place within it throughout the process of treatment, Mattingly argues, is central to the healing process in and of itself. While interviewers like Rochelle cannot conduct therapy with research subjects over the phone, they display techniques of therapeutic emplotment in order to make meaning out of the subject's report of their symptoms over the past week. This is yet another method through which they establish rapport. Co-narrating the subject's experience transforms the conversation from a one-sided interview into a more collaborative endeavor, giving the sense that the interviewer and the subject are working together, rather than that the interviewer is guiding the interaction in pursuit of the information she needs.
In my conversations with senior team members about the techniques I had observed, they would cite the notion of "rapport" as the grounds for ensuring that research subjects disclose private, personal information. In so doing, they invoked the ideology of inner reference (Carr 2010), a language ideology that circulates in American mental health care contexts and that is linked to, as we saw in Chapter 3, Social Penetration Theory (the "onion" theory of the self). As discussed in the Introduction, according to the ideology of inner reference, speech is primarily referential and expresses a speaker's otherwise interior, hidden self. Adele often described personal details as interior, as hidden, or as lying below the easily observable surface, and described the interactional achievement of trust and closeness in language that echoed that of the psychology team at the West Coast University Research Institute. As Adele put it, the goal of rapport is to coax the speaker to "open up," to create conditions that enable the plumbing and excavation of personal details buried "deep" within the subject. One way to open a subject up was to draw on what you knew about them (their age, their gender) and go "off script" by asking them about a subject matter that they might find interesting and that had nothing to do with their mental health or the SIGH-D questions. If the research subject was male and the same age as Adele's two sons, then she would draw on conversations she had overheard between them about the hottest, newest video game to make small talk with the subject. This, she explained, would impress the subject and put them at ease, giving them a chance to talk about something that they knew and enjoyed: something that was personal but positive. Adele and others also explained that an interviewer could achieve rapport through the performance of symmetrical transparency of self, an explicit reference to techniques of Rogerian psychotherapy (see Smith 2005).
According to Adele, the best way to get a subject to give the information that an interviewer needed was to "meet them halfway" in their excavation of self by "giving a little bit of yourself" in return, that is, by sharing personal details. This could be as simple as mentioning that you have sons, or as benign as noting that you have a dog of a certain breed. Rochelle and Adele both told me that mentioning these small details helped make the conversation feel less one-sided, giving the sense that both parties were sharing personal information, rather than one party asymmetrically extracting data from the other. Note that these are the same verbal practices that animated the bodily movements and non-verbal communication of Abby, the virtual human interface of the system built at West Coast University. This highlights, once again, that empathy is not necessarily an affectively motivated state alone; it can be formulated through linguistic practices that shape a speaker's interpretation of the listener's subject position. At the same time, Adele was aware that sharing personal information to encourage disclosure was only viable because clinical team members did not have a "clinical relationship" with the subject. In the context of psychotherapy or psychiatric treatment, clinical professionals must tightly guard and maintain the boundary between their self and the subject's self, always attentive to how the professional's conceptualization of self might torque or contour the terms of their relationship and therefore the nature of the conversation and the act of therapy itself. Since the clinical team at BPU was not allowed to perform psychotherapy over the phone, they were in no way responsible for or accountable to the subject's mental health and how the weekly phone conversations might impact it. Their task at hand was to gather the data they needed for that week: the HAM-D and YMRS scores.
They could of course care about the subjects as people, but the nature of their task at hand discouraged them from caring about the subjects as patients, or as expressly clinical subjects. Listening alongside the clinical team members as they conducted assessment calls demonstrates that psychiatric assessment is a skilled activity rather than a mechanical one. Indeed, assessment is most successful when the inventories (the tools that guide and shape what the listener is listening for) melt away into the background, giving the impression that the conversation is "just talk" rather than a genre of interaction. Assessment is a complex practice that requires training, can be done poorly or well, and depends just as much on the interviewer's ability to internalize the assessment scales as it does on the willingness or openness of the interviewee. Observing and conversing with senior BPU clinical team members also makes clear that conducting assessment over the phone requires constructing oneself as a very specific kind of listening subject vis-à-vis the research subject. The listening subject of psychotherapy or other forms of psychiatric treatment is not the same as the listening subject of psychiatric assessment. Unlike psychotherapy, assessment requires a shallow investment in the research subject's wellbeing, not just at BPU but also across the discipline and practice of psychiatry in general.[67] Clinical team members could deploy verbal strategies for encouraging disclosure, such as the strategic performance of disclosing their own personal details, precisely because they bore no responsibility for the terms of their relationship with the research subjects, or for how this relationship might impact their mental health. It wasn't that they didn't care about research subjects. To say that they have a shallow investment in subjects' wellbeing is not meant to be an indictment of their individual moral characters. Rather, as a matter of doing their job, and just as other people conducting psychiatric assessment do, they had to follow the moral framework that their professional task at hand dictated: guiding the interview in order to gather the data that they had been instructed to gather. In order to listen professionally, they had to avoid listening personally. Here, "thin listening" is less about respecting the speaker's privacy, and more about the listener taking care not to overstep professional boundaries, or perhaps, to protect their own psychological wellbeing: a form of distancing to avoid getting overly attached to the many research subjects whose troubles they cannot soothe.
[67] See, for example, the way assessment and diagnosis unfold in resource-poor public health settings, like the emergency psychiatric unit that is the subject of Lorna Rhodes's ethnography, Emptying Beds (1991). In these kinds of settings, clinical personnel do not treat assessment or diagnosis as epistemological practices but as bureaucratic ones, aimed at moving patients through (or out of) the hospital. Given the uneven ratio of patients to personnel and hospital rooms in public mental health contexts, diagnosis and assessment become tools for divvying up vital resources, rather than for illuminating the inner truths of psychic pathology.
ANNOTATION: LISTENING LIKE A COMPUTER
Recall the end goal of the cell phone study: map speech to emotion, and then map emotion to mood, so that BPU researchers can track changes in vocal emotional patterns to anticipate changes in mood. The purpose of assessment is not to provide treatment to research subjects but to calculate, and numerically represent, fluctuations in research subjects' mood states between mania and depression.
In turn, when annotating the very same calls in which clinical team members interview research subjects, annotators are supposed to calculate the emotional nuances audible in research subjects' voices across the entire corpus of calls produced during their enrollment in the study. While clinical team members focus on the semantic content of research subjects' speech (their responses to the assessment questions), annotators are directed to ignore semantic content and assign a rating that corresponds to the emotional "activation" (energy) and "valence" (positivity or negativity) of the sound of the research subjects' speech. Altogether, working on the same calls yet listening to them in different ways and with different quantification scales, the engineering team and the clinical team render mood, emotion, and the relationship between the two calculable. As I have argued above, conducting assessment requires "professional listening" because the professional norms, guidelines, definitions, and tools of psychiatry dictate how clinical team members direct their conversations with research subjects, and how they coax out and interpret subjects' speech. Assessment also requires professional listening because team members are calculating changes in mood rather than changes in emotion, and mood occupies a more stable, well-defined position within psychiatry than emotion does. Bipolar disorder is defined by, and diagnosed on the basis of, radical changes in mood state, and mood as a category of experience therefore requires more professional expertise to spot and understand. By contrast, the listening associated with annotation, geared toward quantifying changes in emotion, is decidedly un-professional. As the guidebook I helped the engineers write states, emotion (in the context of psychological research on bipolar disorder) is easier to observe than mood.
More than that, since the PI wanted to capture and operationalize a "gut instinct," the lack of professional attunement to emotion was a benefit rather than a barrier. The PI's directive of "training a computer to listen like a brain" suggests a mode of listening that is radically transparent, a mode of listening that is im-mediate. Training in clinical psychology and a familiarity with the various scales and instruments for measuring mood might present another layer of mediation. Thus, under the PI's direction, Hassan had selected two undergraduate interns in computer science to assist with the annotation task: Aubrey, a Chinese-American junior at MWU majoring in computer science, and Josh, her white American cohort-mate. As with Aubrey and Josh, my own lack of training in psychology is part of what made me such a viable candidate to lend another set of ears to the annotation task. In order to teach a computer to listen like a brain, however, the annotators found themselves in the position of having to "listen like computers," a compelling phrase that Chen first invoked when directing us to avoid paying attention to speech content and to base our ratings as much as possible on speech sound alone. Thus, annotation differs not only from assessment, but also from the other listening practices in operation at the two other fieldsites. For instance, at West Coast University, the goal was to build a machine interface that convincingly performed a certain image of "empathic" listening as an automatic act, an image that depended upon keeping hidden the young female researchers whose own listening animated and guided subjects' interactions with the VHI, and upon reproducing a professionally and racio-ethnically marked listening habitus.
At the BPU, on the other hand, rather than building a machine that appears to be listening like a human, human annotators were asked to build a machine that listens like a human by listening like machines: attending to sound without processing, internalizing, or understanding content, disavowing the person attached to the voice. This presented an insurmountable tension: the directive was to listen intuitively, to go with our guts, but also to listen to speech in a way that was completely alien if not altogether impossible. In trying (and failing) to listen like computers, the annotators eroded the model of language upon which the entire project rests: a model of language in which form can be wrested apart and held separate from meaning. In their trials and tribulations of trying to listen like a computer, the engineering team also deflated the lofty promise of an im-mediate "listening brain." Not unlike Meredith, Hassan was a self-described conservative engineer. In his opinion, before the app could be built or deployed in a clinical context outside of an experimental set-up, the team would first need to assess whether or not human listeners could consistently, auditorily interpret emotion in a uniform way. Thus, the many labels that Aubrey, Josh, and I added to the audio segments would not be used to build the cell phone app; Hassan had us label the data instead to calculate the statistical agreement across our labels (quantifying the extent to which we all agreed with each other's labels). Within the first few weeks of my meeting him, Hassan admitted to me that he had put together the annotation task precisely because he thought it was impossible. He did not anticipate that Aubrey, Josh, and I would agree with each other's labels in a statistically significant way, and his hunch proved correct.
If humans were incapable of consistently agreeing upon the emotional texture of the sounds they heard in speech, Hassan would say, it was unreasonable to expect that an algorithm could identify these features in a robust, meaningful, or accurate way. His cynicism about the project's overall goal was born of skepticism about the status of emotions as stable objects of analysis. He would compare automated emotion recognition with voice-to-text transcription and with speaker recognition, the two main problem spaces of his doctoral thesis work. Unlike automated emotion recognition, automated speaker recognition "can be done, because at least we know that the speaker actually exists and we know who he is": the speaker's identity, and his existence, can be confirmed. Emotions, on the other hand, reside in a world beyond the calculable, material realm. He would gesture to the door of the engineering office, which he left half-open if we weren't having a meeting: "We cannot say that the door is open or closed. It is something in between. It is fuzzy." To Hassan, the existence of emotions was an ontologically uncertain matter, not fit for computer science. Nevertheless, if Hassan was going to prove the entire premise of the study wrong, he was going to do it in a consistent, systematic way. Hassan designed the annotation task and the software interface we used to rate segments in an effort to make emotion less fuzzy. Just as HAM-D and YMRS make mood tractable, stable, and quantifiable, holding it in place, the annotation task required a technology through which the team could attempt to pin down, concretize, and reify emotion. Hassan opted to use the dimensional model of emotion for the annotation task, a model popular among computer scientists and speech signal processing experts studying the relationship between emotion and speech quality.
In the dimensional model of emotion, all potential human emotions can be plotted within four quadrants, defined by an X-axis of activation (speech energy) and a Y-axis of valence (speech "color" or "charge").[68] For the task, Aubrey, Josh, and I listened to the same set of audio segments and rated the "activation" and "valence" of each segment on a scale of one to nine. For example, a "one" corresponded to low activation and low valence (low levels of energy with direly negative sounding speech) while a "nine" corresponded to high activation and high valence (high levels of energy with effusively positive sounding speech). Over time, as we annotated more and more segments and even began to re-annotate segments we had already listened to, we began to hear subjects' speech through these numbers, rather than listening to the speech and figuring out how it might fit into the rating scale. What do activation and valence mean, and how did the engineering team come to make activation and valence meaningful for themselves? Early one morning about a month into annotating the audio segments, the PI walked in through the office's half-open door just as Aubrey and I had taken our respective seats in front of the two desktop computers designated for annotation. The PI had come looking for Hassan, who had not yet arrived, and in Hassan's absence he struck up a conversation with Aubrey and me. Groggy from the hour-long bus ride from the main campus area to the BPU, Aubrey and I vaguely relayed that our work was difficult and that we were often unsure of ourselves.
[68] The dimensional model of emotion offers a more analog, less binary, and broader space for defining emotion as opposed to the more traditional linear model that posits a set series of possible emotions ("anger," "happiness," "disgust," etc.).
It wasn't easy to concretely quantify activation and valence, two scales of qualification that we had never used (at least not consciously) when making sense of speech in our day-to-day lives. The PI wondered if we knew the historical origins of these terms and asked for a pen and a piece of paper to diagram a lesson for us. He explained that the terms can be traced back to the criteria that Kraepelin had used to plot fluctuations in his patients' moods in his long-term study of "manic-depressive insanity." He suggested we read Kraepelin's book of the same title to help us swim through the annotation task with more ease. Kraepelin's category of "volition," he told us, is related to "activation," and what Kraepelin called "emotion" is linked to "valence." Satisfied with his work, the PI left, and I set aside the college-ruled diagrams he drew for us. When Hassan arrived two hours later, we presented him with the drawings and asked if he knew of these origin stories, or if he had ever read Kraepelin's book. He took the ruled paper in his hands, turning it upside-down. "I myself have no idea what these words mean, activation and valence, and I can't even read this. Aubrey, when I hired you, did I draw this diagram and make you read some 300-page book from 200 years ago?" "All I remember," answered Aubrey, "is that you showed up like twenty minutes late." Despite his uncertainty, as the most senior member of the engineering team, Hassan trained incoming annotators. It fell upon him to define these terms for the people working under him. As with "emotion" and "affect," the team had its own working definitions of activation and valence. When I first met Hassan, he explained that "activation means excitement: does the speech sound calm or excited? And valence means the negativity or positivity of the speech signal: is the emotion in the signal negative, neutral, or positive?"
He would demonstrate the distinction between activation and valence, as captured in the "speech signal," using himself and his own voice as an example. "For instance," he would say, "you might not know that I am depressed, because you can hear, right now, that I sound happy, relaxed. Low activation. High valence." His example resembled something he said often when he arrived in the office for the day, usually after a night of working late until sunrise and video-chatting with his wife, who was finishing her PhD in computer science many states and time zones away. He'd throw up his hands, smile, and announce as he stood in the doorway, "Hey guys, life is great! I am miserable!" Considering what I knew about him (he was always working, he had to live a three-hour plane ride away from his wife, his future job prospects were uncertain, and he and his wife could not return to Iran for the foreseeable future and struggled financially), it was hard to determine when his sarcasm veered into genuine truth. I do know that his declaration was meant to make us laugh rather than pity him. Hassan's humor tended toward the dire, and often hinged on a rift between what he said, how he said it, and what he believed. This disconnect, he would tell the engineering team, was characteristic of what he called "Persian humor," akin to sarcasm in U.S. American English. He would often state something very serious and sincere-sounding, in a grim, austere tone, only later to reveal that he had been joking, and that he did not hold dear whatever he had said. For example, although he had consented to be my research subject and agreed to allow me to record our day-to-day activities and conversations in the engineering office, every now and then he would pick up my audio recorder, point to it, and ask, in an astonished, accusatory tone, "What is this? You are recording me? I never agreed to this. Turn it off. This is a disgrace. How can you ask me to participate and give me nothing in return?"
When I would frantically apologize and jump from my seat to turn off the recorder, he would say, quietly, with a smile, "Beth, Beth. I'm kidding. I am joking," and everyone else in the office would groan and laugh. Like his demonstration of activation and valence, Hassan's joke inadvertently challenges the idea that a listener, such as an annotator, can arrive at the sincere, authentic core of speech. It advances a kind of opacity claim about emotion: there can be a discord between what a person says, how they sound, and how they feel or what they truly believe. Indeed, Hassan's participation in the project as a whole demonstrated that the correspondence between belief and practice can be shaky and murky: he did not believe its central mission was possible, and yet he helped the team pursue it nonetheless. Hassan and Chen also offered critiques of the project's central claims through their own inability to participate in the annotation task, which they found impossible to do. Both of these men were English-language learners. They had only recently come to the U.S. and were required to speak English in order to get through their days and move about the world. Discussions about whether it was possible for them to "listen like a computer" enacted a critique of the universal subject position that the formulation of "the listening brain" implies. One such conversation came up during one of Hassan's machine learning training sessions. We were all huddled around the small whiteboard in the engineering office, taking notes as Hassan explained how to calculate the concordance correlation coefficient (or CCC) of all the annotation ratings that Aubrey, Josh, and I had produced (in the service of calculating the extent to which we agreed with each other).
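For readers unfamiliar with the measure, Lin's concordance correlation coefficient for two raters' scores x and y is 2·s_xy / (s_x² + s_y² + (mean_x − mean_y)²). Unlike Pearson's correlation, it rewards raters not only for rising and falling together but also for placing their ratings at the same point on the scale, so an annotator who is consistently one point higher than another is penalized. The following is a minimal sketch under stated assumptions, not the engineering team's code: the helper names `concordance_ccc` and `pairwise_ccc` are hypothetical, population (divide-by-n) variances are assumed as in Lin's original definition, and agreement is computed pair by pair, since how the team aggregated across three annotators is not described here.

```python
from itertools import combinations
from statistics import fmean


def concordance_ccc(x, y):
    """Lin's concordance correlation coefficient for two raters' scores.

    Hypothetical helper for illustration only; uses population
    (divide-by-n) variances and covariance.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two ratings")
    n = len(x)
    mean_x, mean_y = fmean(x), fmean(y)
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    # The squared mean difference is what penalizes a constant offset.
    denom = var_x + var_y + (mean_x - mean_y) ** 2
    if denom == 0:
        # Both raters gave the same constant score to every segment.
        return 1.0
    return 2 * cov_xy / denom


def pairwise_ccc(ratings):
    """CCC for every pair of annotators; `ratings` maps a name to a score list."""
    return {
        (a, b): concordance_ccc(ratings[a], ratings[b])
        for a, b in combinations(sorted(ratings), 2)
    }


if __name__ == "__main__":
    # Made-up 1-9 activation ratings for five segments (not study data).
    demo = {
        "rater_a": [3, 5, 7, 2, 6],
        "rater_b": [4, 6, 8, 3, 7],  # same ups and downs, one point higher
        "rater_c": [7, 2, 3, 8, 4],
    }
    for pair, value in pairwise_ccc(demo).items():
        print(pair, round(value, 3))
```

With these invented ratings, rater_b tracks rater_a exactly but one point higher, so its Pearson correlation with rater_a would be 1 while its CCC falls below 1. That sensitivity to where raters sit on a shared scale is the usual reason CCC is chosen for inter-rater agreement on numeric ratings.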
During a lull in the conversation, I suggested that it might be an interesting experiment, just for fun, to have Hassan and Chen annotate the segments, calculate their agreement, and then calculate their agreement with the three native English-speaking annotators. Though Hassan replied with his usual pessimism, Chen saw an interesting opportunity in my thought experiment (my emphasis added):
Hassan: of course it's make a difference [if it is Chen and Hassan annotating], I mean...
Beth: ((laughs))
Hassan: It [the agreement] will be zero
Aubrey, Chen: ((laughing))
Hassan: "The agreement will be zero"...I cannot even understand that sentence
Chen: That's good!!! We don't want you [to] understand it!
Beth: But that's what I mean! Maybe that's
Chen: We don't want to [understand]!
Hassan: --even, I cannot understand emotion
Beth: yeah
Hassan: I, I listened to a couple of them [segments], I ((ughh)) what is this? ((laughing))
Chen: That's computer!
Hassan: for example Josh and Aubrey--
Chen: You're a good computer system!
By Hassan's estimation, he and Chen will not agree with each other at all because of his difficulties with English. He jokes that he struggles to understand the semantic content of descriptive sentences ("the agreement will be zero"), and so understanding the emotional nuances of a sentence, an even higher-order and more demanding challenge, is out of the question. However, Chen recognizes Hassan's lack of understanding, especially of semantic content, to be a tantalizing opportunity, a resource for "pure" listening to sound alone. Chen imagines that it would be relatively easy for Hassan to disentangle sound from content, since he does not intuitively combine the two when he listens to and interprets streams of speech. Chen was obsessed with the idea of purity and achieving "pure" listening without "cheating," i.e., without paying attention to content.
Chen's vision of unmediated, pure listening takes the form of what Chion (1990), Schaeffer (1966), and others call "acousmatic listening": attention to sound without regard to its source, the cause of the sound, or the force motivating it (see also Kane 2014). Listening to the excerpts acousmatically was central to the task of building the cell phone study's algorithm, which breaks the acoustic components of discourse apart from its denotational components. Chen was constantly chiding Josh, Aubrey, and me for cheating, especially when we began discussing segments that we were struggling to annotate. Our conversations made clear to Chen that we were indeed listening to, and absorbing, the content of segments, since we would use contextual information about the speakers to refer to their vocal characteristics and to the segment in question, like "the guy who works for Uber," or "the woman who owns many pet birds," or "the guy who has a difficult relationship with his mom and went to see a psychic about her." Chen would intercede in a hushed voice, "cut it out guys. Stop that. No more of that," jerking his head in the direction of the office door left ajar, worried that someone in the hallway might hear our frank discussion of not just the content of the segments but also the research subjects uttering them. Chen imagined that an English-language learner could turn this "understanding" off and exit the realm of semantic meaning much more easily, protecting the privacy of the subject and preventing agentive, focused listening from sliding into the unfocused absorption of hearing. Yet as our conversation in the engineering office progressed, Hassan shattered Chen's dream of the learner of American English as a "good computer system."
Instead, Hassan began to suggest that the capacity to identify and characterize "emotional" components in speech, or even to discern the difference between sad speech and angry speech, depends on one's native language. In this view, being a non-native speaker is a hindrance rather than an advantage. This also suggests that emotional features of speech are neither universally produced nor universally understood (emphasis added):

Beth: So like [...] let's say you overhear Adele talking in her office you can't hear what she's saying, but you can like...would you be able to tell, oh she's having a good conversation. Or oh she's [angry] somebody's in trouble, like...
Hassan: Ehh actually initially I suggested this to Meredith, I suggested that let's uh because we don't wanna concentrate on content, so let's ask some...non-native--
Beth: Yeah
Hassan: --Speakers to listen to it
Aubrey: Mm
Hassan: Then I, uh listened to a couple of them [segments]
Beth: And it was too--
Hassan: And was like....I have no idea--
Beth: --hard ((laughs))
Hassan: --have no idea what that...so it seems that
Chen: Maybe you're just being honest
Hassan: Yeah we are not focusing on content
Beth: Yeah
Hassan: But we are not eh still...able to focus on acoustic eh acoustic features of emotion
Beth: Yeah...because it's, because it's not, I think you were right
Chen: --is
Hassan: Because it's, it's behind the phonemes...I cannot pronounce phonemes correctly, so I don't know...the correct place of this phoneme
Beth: Yeah
Hassan: how can I know the correct place of [the] angry version of this phoneme?

Hassan had initially shared Chen's hunch that a non-native speaker might be uniquely situated to perform the annotation task, but he found himself falling short when he sat down and tried to listen to and rate the segments for activation and valence.
Nevertheless, for a brief instant, Hassan is seduced by Chen's insistence that perhaps Hassan struggled with the annotation task because he was too "honest," that is, because he was doing such a good job of not understanding the content, which is the basic requirement of annotation. This reading turns the typical connotations of "honesty" in Euro-American English on their head: rather than corresponding with transparency, being "honest" in listening to sound rather than content keeps the semantic, referential meaning of speech opaque and inaccessible. But Hassan returns to his firm position that he and Chen cannot identify acoustic features of emotion so long as they do not know the standardized "placement" [69] of regular phonemes. If they themselves struggle with producing standard pronunciation, then they will struggle to identify and interpret the meaning of non-standard pronunciation (an "angry" phoneme) in another speaker. Hassan turned the thought experiment onto us, stating that "if I talk in my language you cannot say [if] I'm happy." Even if he were to laugh, this would not be a sure-fire indication of the affective charge of the conversation. He reiterated to us his struggles to understand the emotional nuances of paralinguistic components of speech, even when it was socially incumbent upon him to do so. This was a steep hurdle when he first arrived in the United States:

Hassan: [...] When I first came here, ah, I talk to you know American people and I was like....the person is mad at me?
Beth and Aubrey: ((laughing))
Hassan: Somehow this person is- is-- it looks like the person is not really happy ((laughs)). Maybe he's happy maybe he's not happy so, so, it's all-- I always have this problem that-- still I have this problem that sometimes...I can't understand.

Even when Hassan needed to interpret people's expressions of emotion to get through his day, he struggled to distinguish angry speech from so-called neutral speech, and he still struggles with this today. Much to the delight of the anthropologist, our conversation was edging toward an exciting conclusion: the engineers conceding that emotions are not universally expressed, might not be universal, and therefore cannot be universally interpreted. It was Chen, rather than Hassan, who put this breakthrough into words, describing what Hassan's experience implies for the system that the engineers are building:

Chen: Yes, but then, I have a question to your computer system. Is your computer system, your neural network, [it] has American culture knowledge?
Aubrey: ((laughing))
Beth: that's the--
Hassan: Yeah, yeah, yeah, exactly! It's language dependent
Beth: --but that's the, that's the point
Hassan: --language dependent yeah
Chen: so it has the culture, in it?
Beth: yeah
Hassan: Yeah it has a language specific knowledge [...] for example if you train a system based on, for example, English language? Then you test it for Chinese language--
Chen: yeah
Aubrey: I guess [laughs]
Hassan: --it doesn't work, I mean
Aubrey: yeah
Chen: yeah

Finally, Chen comes out and says it: the neural network, the basis of the cell phone study's predictive algorithm, does not have general knowledge and does not "listen like a brain." Instead, the system will have "American culture knowledge" in it, because identifying emotion is a culturally specific ability.

[69] Standardized placement here refers to the oral production of speech sounds in a way that corresponds with their representation in vowel and consonant charts, in which the sounds of American English are plotted according to the positioning of the lips and tongue associated with their production. For instance, a "low back vowel" is a vowel produced with the tongue positioned low in the mouth (relative to the roof of the mouth) and bunched toward the back of the mouth (relative to the mouth's opening).
Hassan adamantly agrees, and Aubrey, at first finding the idea funny, concedes that technically Chen is correct because the system has "language specific knowledge," later agreeing that she and I have access to knowledge about the link between emotion, sound, and speech that the two men do not. The cell phone app, if it ends up being built, will be limited by the knowledge and experience of the people responsible for building it, and so it will identify, and therefore define, emotion according to the limits of this knowledge. As Hassan notes, the system could not identify the emotions of Mandarin or Cantonese speakers, because it is based on the language-specific knowledge of native speakers of English. Together, the engineers unpacked the "human," challenging the cell phone study's biological essentialism. Pushing this line of thinking a step further implies that, if the vocalized expression of emotion is not universal, then perhaps what the ECU team calls "vocal biomarkers" may not even exist. Even acoustic features of speech are wrapped up in cultural mediation from which they cannot be disentangled. Moreover, this implies that not everyone has access to the same "intuitive hearing," because listening is a cultural practice, and listening to (and distinguishing) emotion requires cultural or "language-specific" knowledge. Chen and Hassan insinuate that form and content are connected through cultural mediation that requires communicative competence (which is more than just a "gut feeling") to grapple with and weed through. This implies that the central project of the BPU cell phone study, to splice form apart from meaning, is an impossible one; this goal ignores the very nature of language as both material and semiotic. Finally, Chen and Hassan's ruminations strike a chord with the observations of earlier anthropological studies of technologists and the automated systems they build.
Algorithmic systems, then, are not capable of recognizing patterns beyond human capacities; they merely reiterate patterned associations between qualities and types that already have a sociopolitical life and historical trajectory (see Noble 2018). Engineers and computer scientists, the professional practitioners deep in the weeds of building these systems, have a keen understanding of the systems' limitations, and of the fact that they are subjective rather than objective eyes or ears from nowhere.

INFRASTRUCTURES OF FEELING

Josh, Aubrey, and I were asked to listen to the calls and "follow our guts," using our "best judgment" and "intuition" when scoring the activation and valence of a speech sound. We internalized this language of intuition, and when debating over what score to assign to a segment, we would tell each other that it really just felt like a 7 or a 3 or a 2. At the same time, Chen and Hassan ordered us over and over again to focus on speech sound alone and to eschew, or at least avoid discussing, the content of the segments, an altogether counterintuitive mode of listening to speech. In other words, our task was to treat the speech as familiar, drawing upon our intuitive sense of its emotional sound, while simultaneously treating it as unfamiliar, as pure, acousmatic sound rather than speech at all. In this section, I expand upon the discussion above about the "cultural knowledge" that annotators might encode into the infrastructure of the cell phone study's app. I also show the ethical, affective tensions that attempting to listen like a computer can bring, through the struggle of both "getting to know" how a research subject's speech sounded (getting a sense of what their neutral, asymptomatic, or 5-level speech was like) and disavowing the particularities and personal details of their conversations and circumstances.
The team selected Aubrey, Josh, and me to annotate the segments in part because the task was so time-consuming, and we were all relatively unskilled and lacked the technical training that would have enabled us to help out with less menial tasks. Moreover, we were selected due to our simultaneous (lay) expertise (in American English) and in-expertise (in psychiatry and psychology). Nevertheless, the annotation task was extremely difficult. Even as we began to internalize something about the relationship between vocal qualities and the pathological states of bipolar disorder, we struggled to put into words what exactly we were rating: what about the subjects' voices motivated our decisions to assign them the ratings that we did. When I first began the annotation task, I spent a long time on each segment, replaying a single segment over and over again, mulling over my choice. As time went on and the rating scale began to inhabit my understanding of how bipolar disorder sounds, I could rate many more segments per day than I had initially been able to. I resigned my judgment to the scales.

Annotation ratings were supposed to be "subject dependent." The numbers had to be specific to an individual subject's speech patterns, which posed a problem. Ratings were not supposed to depend on some external, universal standard, such as a general understanding of how speakers of American English typically speak (in terms of the activation and valence of their speech). Instead, whenever it was time to begin annotating a new subject, we had to spend a good amount of time clicking through and replaying the segments without assigning a rating, all in order to develop a sense of what activation and valence sounded like for that particular person. This required determining how their most neutral speech sounded: speech that was neither extremely energized nor lethargic, and neither clearly exuberant nor sad.
We referred to this as figuring out the subject's "five" speech: speech that was rated at a five for activation and a five for valence. The concept of five speech ratifies the notion that, for people experiencing bipolar disorder, non-pathological speech coincides with the absence of emotion, which insinuates that there is something inherently pathological about any sort of emotional experience. The annotation software interface reinforced this notion that five ratings coincide with the absence of activation and valence, that is, the absence of emotion, while also visually reinforcing the meaning of the numbers themselves and a model of emotions emanating internally, from a person's self. The schematized human figure over the number five on the valence scale wears a blank expression, and the grain of activation in the center of the torso of the figure above the five on the activation scale is a reasonable size (unlike the figure above 9, which has been enveloped by an explosive cloud of energy).

Five speech became an anchoring point for helping us figure out how to annotate segments in which the activation and valence were unclear. We would pass the headphones to each other, describing our sense of the segments' proximity to or distance from the feeling of their five. Because we all annotated the same segments, we could consult each other about troublesome ratings. Aubrey might insist, "well her five speech is kind of activated-sounding," or Josh would assert, "he sounds pretty close to neutral." The headphones and the annotation software also helped to calcify the feeling of fiveness. The ability to focus intently on the subjects' speech, blocking out all other sounds in the office with the noise-canceling headphones, along with the ability to pause, rewind, and replay the segment an endless number of times, allowed us to scrutinize the speech auditorily in a way that the team members making assessment calls never could.
In addition to passing the headphones around and using the affordances of the annotation software, we would leave notes for each other on a communal notepad as we annotated, like notes along a map: this particular subject was "good for depression" (they had many segments with low activation and low valence); this other subject had lots of noisy segments that were difficult to hear. In this way, we began to collectively establish the meaning of fiveness, forging tacit knowledge about the data set. We brought the rating scale into existence by turning annotation into a social activity. By the end of my four months at the BPU, we could describe subjects' speech to each other using the numbers alone. A calm and contented subject was a 6-8 (relatively neutral energy level in the voice, relatively positive sound of the voice). A disgusted or annoyed subject was an 8-4 (fired up with aggravated energy, slightly perturbed coloring to the voice). We would often tease each other by rating each other's speech, or process a tense moment at a BPU-wide meeting by later joking with each other about the activation and valence of two people who had been arguing. I once said something incredibly embarrassing to the PI in the hallway and later, as I rolled around on the office floor in shame, the others stood around me laughing and debating about my speech: my activation was definitely at a 9, but valence was confusing, since I was mortified but also, humbly, laughing at myself. The numbers took on an affective charge, a sensation rather than, ironically, something that we could quantify. The annotation software with its 1-9 scales, our conversations, our notes, and our own ideas about the slowness of depressed speech and the quickness of manic speech scaffolded and constituted our "intuition." Altogether, these technologies co-constituted the affective texture of the subjects' speech.
This process of internalization underscores that affect is not something inevitable or pre-lingual, but a feeling that must be held together by systems of quantification. If assessment depends on professional listening, then annotation depends on annotators collectively building and fortifying infrastructures of feeling: practices and scales that make sound meaningful but that can also meld successfully into the background, disappearing altogether.

In addition to a feeling for the five of the research subjects' speech, Aubrey, Josh, and I shared something else, something much more intimate: an understanding of just how sick some of the research subjects were, how much some of them struggled, and how much some of them appreciated their weekly calls with the psychology team member. We might also learn about how well they were doing, about high points in their lives, or about how much they wanted the phone at the end of the study rather than the participation stipend. While we could strive as much as possible to listen like a computer, it was impossible to fully reject the presence of the person uttering the sound. There was something weighty about the whole process, and while Aubrey and Josh appeared to be managing better than I was, at times the research subjects' audio excerpts would arrest me, piling up on me in an invisible way. One subject had such frenetic, frenzied speech that only hours later, when trying to fall asleep in my apartment, did I realize with an ache that I had been clenching my jaw all day while listening, hiking my shoulders up to my neck. The subject's anxiety had made its way into my head, into my body. Some subjects spoke frankly about their desire to take their own lives, with organized, detailed plans.
For those subjects, I was relieved that the BPU is a mental health care facility and that, while its staff cannot provide psychotherapy over the phone, the team does have trained clinicians who can and will assist subjects who are in this much distress. These kinds of segments were particularly hard for me to forget; it was hard to pull the sound away from the semantics, the sentiment away from the person. Kraepelin wrote of his patients, during bouts of mania, hearing auditory hallucinations, voices they described as emanating from God, from spirits, or as if coming through a telephone. During my four months annotating segments at the BPU, it was I who heard telephone voices at night, voices I was supposed to be ignoring and forgetting but from which I could not fully sever my connection. As I wrote one sleepless evening in my fieldnotes, "like a drop in barometric pressure, they squeeze me, I am contained by them" (July 19, 2017). I lacked the kind of training that Adele and Rochelle had, the kind that enabled them to keep a part of themselves closed off and safe from the affective weight of conducting psychiatric assessment. The very thing that made me an excellent in-expert subject for annotation was also what kept me up at night. During these moments, I would remember what Jacob had told me, like an incantation: do not forget about the people on the other end. Even if absorbing the audio segments' contents was "cheating" and I was potentially violating the study's IRB protocol during these late-night remembrances, perhaps I was also honoring Jacob's request, holding space for the humanness that would be built into the study's algorithmic system.
CONCLUSION: TECHNOLOGIES OF CARE

Some scholars argue that the ubiquitous presence of sensors and personal computers that track and capture vast volumes of data, or communication technologies that offer always-open channels of contact, leads to a world of disconnection, in which people lose the capacity to feel authentic, genuine intimacy (Turkle 2011). But, as I have hoped to show, technologies like cell phones offer a form of connectivity that can be life-saving, as was the case with Jacob, as well as with the research subject whose participation in the study gave them the opportunity to chat with a mental health care worker once a week. That so many of the research subjects wanted to keep the BPU's data- and internet-enabled smart phone in lieu of the payment speaks to the fact that, like so many other resources, the hyper-connectivity of communication technologies is asymmetrically distributed. Access to a smart phone is a luxury for some before it even has the chance of transforming into a source of pathology. At other times, the kind of connectivity that these technologies (and the making of them) require is too close. In the annotation task, Josh, Aubrey, and I had to fight a losing battle with disconnection. We were supposed to keep ourselves separate from the lives and stories of the research subjects, but listening/not listening to the calls granted us a strange, inescapable intimacy. The problem, then, is not that cell phone applications disconnect people. It is that, in building the kinds of apps and devices that my informants at the BPU sought to develop, users and builders become radically connected, strung together in a relation that (at least for me) can feel overwhelming. By ending on an ambivalent note and speaking honestly about how the annotation task at times disturbed me, I hope to emphasize the extent to which building any sort of voice-analysis technology, whether for mental health interventions or not, is no light or inconsequential matter.
Annotating the audio segments fundamentally changed my perspective on devices like Google Home or the Amazon Echo. I now understand that, regardless of what the companies producing these devices insist, human listening plays an unavoidable role in their development. The presence of a human listener somewhere in the data pipeline, who listens to and annotates audio segments, is a design feature. Recent investigative reporting (Day et al 2019; Van Hee et al 2019; Vincent 2019) has indeed revealed that both Google and Amazon rely on outsourced laborers to auditorily weed through and annotate the audio segments that users freely pass on to these companies through their use of the technologies: as users interact with and speak to the devices, the devices capture and process their voices. In this coverage, the "eavesdropping human" is contrasted with the "listening machine." I have hoped to show that this opposition is a false one. The two are one and the same: in order to make machines listen, you need human listeners. Unlike my informants at the BPU, who are ultimately committed to improving the lives of people living with bipolar disorder and interrupting pathological experiences before they can begin, Amazon and Google have no ethical review structure, no IRB to answer to. Under a neoliberal model of consumer choice, in which consent is far murkier and the terms of service are buried in pages of text, the protection of users' privacy (and of the mental health of the annotators who listen to their speech) is far shakier. Within this unregulated space, the outsourced annotators are at great risk as well.
Just as scholars conducting ethnographic research with the content moderators who keep social media sites like Facebook safe, clean, and free of disturbing images have called for greater regulation and access to mental health services for content moderators (Roberts 2019), my fieldwork suggests the dire need for occupational hazard oversight in commercial applications of machine listening. Strides toward this goal are indeed being made in academia. Several researchers have testified that building voice analysis technologies for mental health applications does indeed carry the potential for psychological harm (see Wolters, Mkulo and Boyton 2017). For example, writing in an article that reviews efforts to develop voice analysis technologies for depression and suicide risk assessment, a group of computer scientists and engineers warn:

"There are a range of potential health risks to investigators associated with collection of severely depressed and suicidal speech...direct interaction with depressed and suicidal individuals during collection or subsequent exposure to recorded data during tasks such as annotation can lead to research health risks including vicarious trauma and depression. The risk is magnified in researchers with non-clinical backgrounds, who might be unfamiliar with either condition" (Cummins et al 2015: 37-38).

The authors suggest a variety of best practices that involve explaining mental health risks to investigators, minimizing exposure to audio recordings, and avoiding the use of headphones. They also suggest that investigators preview the excerpts and consult with a trauma psychologist before agreeing to take on the work, and regularly consult with psychologists and colleagues while conducting the work.
My fieldwork in the academic realm, with its ethically squeamish moments and my complicity in them, is a microcosm of the troubles and perils at play on the global scale of so-called smart speakers, listening devices, and the outsourced listening/not listening that enables them. The connectivity and closeness that unsettled me should spur us into action: to call for the reform of Big Tech, and to suggest that Big Tech look to the academic realm for insight on how to build these technologies with greater concern for the privacy and safety of everyone involved.

Moreover, it was only in shadowing the members of the psychology team at the BPU that I came to better understand, and respect, the complex skills that psychiatric assessment requires. This respect for the job informed how I looked at the data gathered at my other sites. It was only after meeting and shadowing Adele, Rochelle, and Lauren that I began to realize the extent to which automating psychiatric assessment delegitimizes this job. Learning from them led me to double back on the data I had gathered at the other sites and deepened my analysis of the VHI. Even though they were not in a position to conduct psychotherapy over the phone, to care for the subjects as patients, they were all extremely committed to the larger project of finding a way to help people who live under the diagnosis of bipolar disorder. I often felt during my fieldwork that I was in no place to critique the cell phone study and some of its more ethically questionable components, such as listening to people's phone calls. Everyone on the team was committed to making a material difference in the lives of their patients via the cell phone study. Like the team members at other sites, people working at the BPU tended to be motivated by their own encounters with mental illness among family members, classmates, siblings, partners, and friends, encounters that made their work on the cell phone study quite literally close to home.
For instance, one of Adele's first jobs was at a long-since-closed state mental hospital. She witnessed the treacherous depths and teetering, dangerous heights of bipolar disorder firsthand while working this job. As she went about her day-to-day tasks, sometimes administering injections of the anti-psychotic drug Thorazine, she encountered severe cases of patients whose conditions were full-blown. She returns to these encounters, she told me, to keep her motivated. These memories compel her to work at the BPU and to assist with the cell phone study. At the same time, it is key to avoid sentimentalizing efforts to provide care and to resist taking for granted that well-intentioned motivations absolve care providers, and those wrapped up in building mental health care interventions, from critique. If anything, what my fieldwork at the BPU shows is that care itself is ambivalent, murky, polyvocal, and contradictory. Adele and others could not, technically or legally, care for the research subjects on the phone. The annotators and I, technically, were not supposed to care about the content of the calls we rated. In an effort to pin down what "care" can contain and contradict, Martin, Myers and Viseu (2015) write frankly about the ambiguous and sometimes violent "politics of care in technoscience":

acts of care are always embroiled in complex politics. Care is a selective mode of attention: it circumscribes and cherishes some things, lives, or phenomena as its objects. In the process, it excludes others. Practices of care are always shot through with asymmetrical power relations: who has the power to care? Who has the power to define what counts as care and how it should be administered? Care can render a receiver powerless or otherwise limit their power. It can set up conditions of indebtedness or obligation. It can also sediment these asymmetries by putting recipients in situations where they cannot reciprocate.
Care organizes, classifies, and disciplines bodies. Colonial regimes show us precisely how care can become a means of governance. It is in this sense that care makes palpable how justice for some can easily become injustice for others (627).

The fact that Adele and her colleagues did not invest themselves in the research subjects' emotional wellbeing is part and parcel of care. By this, I mean that neglect and harm are not opposed to care; they are among care's constituents. To parse out what we might call the attentional mechanisms of the two modes of listening involved in building the cell phone study's predictive algorithm (annotation and assessment, both of which require selectively ignoring some components of speech while attending to others) is not to diminish the study's harmful consequences but merely to "stay with the trouble" (Haraway 2016), with care. Thus, I hope to have made a case for the fruitfulness of "unsettling care": poking at its taken-for-granted implications and, through my ethnography, seeking to "situate affection, attention, attachment, intimacy, feelings, healings, and responsibility as non-innocent orientations circulating within larger formations, instead of as attributes of individual scientists" (Murphy 2015: 6).

References

American Psychiatric Association. 2013. Diagnostic and Statistical Manual of Mental Disorders (5th ed.). Arlington: American Psychiatric Publishing.
Barthes, Roland. 1977. Image-Music-Text. Stephen Heath, trans. New York: Hill and Wang.
Chion, Michel. 1990. Audio-Vision: Sound on Screen. Claudia Gorbman, trans., ed. New York: Columbia University Press.
Clementz, B., J.A. Sweeney, J.P. Hamm, E.I. Ivleva, L.E. Ethridge, G.D. Pearlson, M.S. Keshavan, and C.A. Tamminga. 2016. "Identification of Distinct Psychosis Biotypes using Brain-Based Biomarkers." American Journal of Psychiatry 173(4): 373-384.
Cummins, Nicholas, Stefan Scherer, Jarek Krajewski, Sebastian Schnieder, Julien Epps, and Thomas F. Quatieri. 2015. "A review of depression and suicide risk assessment using speech analysis." Speech Communication 71: 10-49.
Day, Matt, Giles Turner, and Natalia Drozdiak. 2019. "Amazon Workers Are Listening to What You Tell Alexa." Bloomberg Technology, April 10. <https://www.bloomberg.com/news/articles/2019-04-10/is-anyone-listening-to-you-on-alexa-a-global-team-reviews-audio> (accessed July 23, 2019).
Decker, Hannah. 2004. "The Psychiatric Works of Emil Kraepelin: A Many-Faceted Story of Modern Medicine." Journal of the History of the Neurosciences 13(3): 248-276.
Dror, Otniel. 2001. "Counting the Affects: Discoursing in Numbers." Social Research 68(2): 357-378.
Duranti, Alessandro. 1992. "Intentions, Self, and Responsibility: An Essay in Samoan Ethnometapragmatics." In Responsibility and Evidence in Oral Discourse. Jane H. Hill and Judith T. Irvine, eds. Pp. 24-47. Cambridge: Cambridge University Press.
Ekman, Paul, and W.V. Friesen. 1971. "Constants across cultures in the face and emotion." Journal of Personality and Social Psychology 17: 124-129.
Ekman, Paul. 1989. "The argument and evidence about universals in facial expressions of emotion." In Handbook of social psychology (Vol. 2). H. Wagner and A. Manstead, eds. Pp. 143-164. Chichester: Wiley.
Ekman, Paul. 1999. "Basic Emotions." In Handbook of Cognition and Emotion. T. Dalgleish and M. Power, eds. Pp. 45-60. Sussex: John Wiley and Sons.
Foucault, Michel. 1978. The History of Sexuality. New York: Pantheon Books.
Goffman, Erving. 1981. Forms of Talk. Philadelphia: University of Pennsylvania Press.
Goodwin, Charles. 1994. "Professional Vision." American Anthropologist 96(3): 606-633.
Haraway, Donna. 1988. "Situated Knowledges: The Science Question in Feminism and the Privilege of Partial Perspective." Feminist Studies 14(3): 575-599.
Haraway, Donna. 2016. Staying With the Trouble: Making Kin in the Chthulucene. Durham: Duke University Press.
Insel, Thomas R. 2017. "Digital Phenotyping: Technology for a New Science of Behavior." JAMA 318(13): 1215-1216.
Kane, Brian. 2014. Sounds Unseen: Acousmatic Sound in Theory and Practice. Oxford, UK: Oxford University Press.
Keane, Webb. 2003. "Semiotics and the social analysis of material things." Language and Communication 23: 409-425.
Keane, Webb. 2005. "Signs are Not the Garb of Meaning: On the Social Analysis of Material Things." In Materiality. Daniel Miller, ed. Pp. 182-205. Durham: Duke University Press.
Keane, Webb. 2008. "Others, Other Minds, and Others' Theories of Other Minds: An Afterword on the Psychology and Politics of Opacity Claims." Anthropological Quarterly 81(2): 473-482.
Kraepelin, Emil. 2002[1921]. Manic-Depressive Insanity and Paranoia. Reprint, Bristol, U.K.: Thoemmes Press.
Lutz, Catherine, and G.M. White. 1986. "The Anthropology of Emotions." Annual Review of Anthropology 15: 405-436.
Lutz, Catherine, and Lila Abu-Lughod, eds. 1990. Language and the Politics of Emotion. Cambridge: Cambridge University Press.
Martin, Aryn, Natasha Myers, and Ana Viseu. 2015. "The Politics of Care in Technoscience." Social Studies of Science 45(5): 625-641.
Martin, Emily. 2007. Bipolar Expeditions: Mania and Depression in American Culture. Princeton: Princeton University Press.
Mattingly, Cheryl. 1994. "The concept of therapeutic 'emplotment.'" Social Science & Medicine 38(6): 811-822.
Murphy, Michelle. 2015. "Unsettling care: Troubling transnational itineraries of care in feminist health practices." Social Studies of Science 45(5): 717-737.
Noble, Safiya Umoja. 2018. Algorithms of Oppression: How Search Engines Reinforce Racism. New York: New York University Press.
Onnela, J., and S. Rauch. 2016. "Harnessing Smartphone-Based Digital Phenotyping to Enhance Behavioral and Mental Health." Neuropsychopharmacology 41: 1691-1696.
Puig de la Bellacasa, Maria. 2011. "Matters of care in technoscience: assembling neglected things." Social Studies of Science 41(1): 85-106.
Puig de la Bellacasa, Maria. 2017. Matters of Care: Speculative Ethics in More Than Human Worlds. Minneapolis: University of Minnesota Press.
Rhodes, Lorna. 1991. Emptying Beds: The Work of an Emergency Psychiatric Unit. Oakland: University of California Press.
Rice, Tom. 2010. "Learning to listen: auscultation and the transmission of auditory knowledge." Journal of the Royal Anthropological Institute 16(s1): 41-61.
Richardson, Sarah S., and Hallam Stevens. Postgenomics: Perspectives on Biology after the Genome. Durham: Duke University Press.
Robbins, Joel. 2008. "On Not Knowing Other Minds: Confession, Intention, and Linguistic Exchange in a Papua New Guinea Community." Anthropological Quarterly 81(2): 421-429.
Roberts, Sarah T. 2019. Behind the Screen: Content Moderation in the Shadows of Social Media. New Haven: Yale University Press.
Rosaldo, Michelle Zimbalist. 1982. "The things we do with words: Ilongot speech acts and speech act theory in philosophy." Language in Society 11(2): 203-237.
Rosaldo, Michelle Zimbalist. 1984. "Toward an anthropology of self and feeling." In Culture Theory: Essays on Mind, Self and Emotion. R. Shweder and R. LeVine, eds. Pp. 137-157. Cambridge, U.K.: Cambridge University Press.
Schaeffer, Pierre. 1966. Traité des objets musicaux. Paris: Le Seuil.
Silverstein, Michael. 2001[1981]. "The Limits of Awareness." In Linguistic Anthropology: A Reader. Alessandro Duranti, ed. Pp. 382-401. Malden: Blackwell Publishing.
Smith, Benjamin. 2005. "Ideologies of the speaking subject in the psychotherapeutic theory and practice of Carl Rogers." Journal of Linguistic Anthropology 15: 258-272.
Sterne, Jonathan. 2003. The Audible Past: Cultural Origins of Sound Reproduction. Durham: Duke University Press.
Throop, Jason. 2003. "Articulating experience." Anthropological Theory 3: 219-241.
Throop, Jason. 2010. Suffering and Sentiment: Exploring the Vicissitudes of Experience and Pain in Yap. Berkeley: University of California Press.
Torous, John, and Adam C. Powell. 2015.
"Current research and trends in the use of smartphone applications for mood disorders." Internet Interventions 2(2):169-173. Turkle, Sherry. 2011. Alone Together: Why We Expect Morefrom Technology and Lessfrom Each Other. New York: Basic Books. Van Hee, Lente, Ruben Van Den Heuvel, Tim Verheyden, and Denny Baert. 2019. "Google employees are eavesdropping, even in your living room, VRT NWS has discovered." VRTNWS July 10. < https://www.vrt.be/vrtnws/en/2019/07/10/google-employees-are-eavesdropping-even- in-flemish-living-rooms/> (accessed July 23, 2019). Vincent, James. 2019. "Yep, human workers are listening to recordings from Google Assistant, too." The Verge, July 11. (accessed July 23, 2019). Wolters, Maria K, Zawadhafsa Mkulo, and Petra M. Boynton. 2017. "The Emotional Work of Doing eHealth Research." Proceedings of the '17 CHI Conference Extended Abstracts of Human Factors in Computing System. Pp. 826-846. Denver, CO, June 5. Wynne, Brian. 1996. "May the Sheep Safely Graze? A Reflexive View of the Expert-Lay Knowledge Divide." In Risk, Environment and Modernity: Towards a New Ecology. Scott Lasch, Bronislaw Szerszynski and Brian Wynne, eds. Pp. 44-83. London: Sage. Zhan, Andong, and Srihari Mohan, Christopher Tarolli, Ruth B. Schneider, Jamie L. Adams, Saloni Sharma, Molly J. Elson, Kelsey L. Speaker, Alistair M. Glidden, Max A. Little, Andreas Terzis, E. Ray Dorsey, Suchi Saria. 2018. "Using Smartphones and Machine Learning to Quantify Parkinson Disease Severity: The Mobile Parkinson Disease Score." JAMA Neurology 75(7):876-880. 324 Conclusion: An Ironic Dream of a Common Language "He listened with grave interest. 'It is strange to see the mysteries of my discipline from outside, through your eyes. I've only seen them from within, as a discipline.' 'If you permit-if you wish, Faxe, I should like to communicate with you in mindspeech.' 
I was sure now that he was a natural Communicant; his consent and a little practice should serve to lower his unwitting barrier. 'Once you did that, I should hear what others think?' 'No, no. No more than you do already as an empath. Mindspeech is communication, voluntarily sent and received.' 'Then why not speak aloud?' 'Well, one can lie, speaking.' 'Not mindspeaking?' 'Not intentionally.'"
(Ursula K. Le Guin, The Left Hand of Darkness, 1969: 56)
In the exhibition hall where I stood with Hillary (WCU) to demonstrate the android for the study that never happened, we were met with many questions. Many of the attendees were horrified, their disgust showing on their faces as they listened to Hillary and me rattle off the script we had agreed to use to describe the android, its relationship to the VHI, and the study. These attendees accused us of attempting to build a therapeutic system that would replace humans, destroying job opportunities and outsourcing the fragile work of psychiatric care to a cold, uncaring, inert object. Sometimes I shared people's disdain and felt dissatisfied with the answers I was supposed to give as a temporary member of the team, working alongside Hillary and lending her a hand. Secretly, I agreed with some people's moral outrage; I knew from our private conversations during the long car rides we shared that Hillary did, too. Although we were merely low-level representatives of the team, attendees questioned the ethics of the entire VHI system during the hours we stood flanking the android. They would ask us how the privacy of research subjects could ever be fully protected; some guessed that Hillary and I had listened to the research subjects' stories (which we had, because the team's IRB protocol allowed us to, and because we had to in order to "get to know the data"). Others questioned whether it was right to use research subjects as data without providing them access to mental health resources.
One person wondered why we were not focused on eliminating war and imperial occupation itself, the root cause of veterans' mental health issues. We were scratching the surface, they implied, without remedying the underlying cause. One woman wearing a synthetic fur coat kept returning to ask these kinds of questions, over and over again, a gadfly disrupting our script. She expressed open dissatisfaction with the sound bites Hillary and I offered, which promoted the study while redirecting people's fears and anxieties. She told us that she did not buy our answers; she did not believe what we were saying. It was her final remark that day, however, that stunned me the most, shorting my ethnographic circuits and transporting me out of the thicket of my fieldwork and back into the broader contemporary moment in which it was unfolding. "You're going to try to make them our slaves," she said, venom in her eyes as she jerked her chin toward the robot's placid face. "I've read enough sci-fi to know what happens next." Unlike the vast majority of the other attendees, and unlike anyone else I encountered in my fieldwork and the years following it, this woman was concerned not with the humans whom robots will "replace," but with the robots whom humans create to take up humankind's boring, dirty work. The woman's sentiment resonates with that of Rick Deckard, the protagonist of Do Androids Dream of Electric Sheep?, an android bounty hunter who develops empathy for the androids who have sought sovereignty from their human oppressors and whom he is tasked with killing. This is his job, the source of income that provides for him and his wife, and hence the central ethical conundrum that fuels the story. As Haraway notes, "the boundary between science fiction and social reality is an optical illusion" (1991: 149); here, perhaps, it is an auditory hallucination.
The accusations from the woman in the synthetic fur coat were perhaps the most accurate of all, inasmuch as we take the figure of the robot to align with an abject genre of the human: dehumanized, unskilled, suited for repetitive labor and servitude. Nevertheless, this woman most likely relies on and interacts with heteromated systems as a seamless part of her life. Like so many of us, she no doubt depends upon and benefits from mechanized labor and hidden, dehumanizing work, whether from the content moderators who keep social media spaces like Facebook free of graphic images (Roberts 2019) or from the miners who pry from the earth the bits of minerals that will be used to form the minuscule batteries and lenses of an iPhone (Joler and Crawford 2018). This is precisely the point: the kinds of technologies my informants are building are an abundant feature of contemporary life in the United States. Even if we find them, and what it takes to make them, morally abhorrent, they are impossible to avoid, and it is imperative to understand them from the ground up, ethnographically, to do the demystifying work of showing the humans in the loop. To conclude, I settle upon this seeming contradiction: the woman, like Deckard, holding sympathy for the robots and, by extension, for the people who run and resemble them. I meditate on some of the dissertation's larger themes, stumbling toward a diagnosis of the present state of language, linguistic labor, psychiatry, automation, and care in the United States. Or perhaps I offer not a diagnosis but an assessment. After all, the purpose of psychiatric assessment is to determine what kinds of questions to ask next. Given these readings of the data, where do we go from here? What is the next move? If care is a selective mode of attention, where (and with whom) should we direct our attention?
UNCANNY VALLEYS
Part of the tension I felt as an ethnographer at the symposium, and throughout my fieldwork, came from my inability to answer some of the questions posed to me, and from a realization that the accusations thrown at my informants implicated my own actions as well. My position as a meta-scientist, a hybrid ethnographer-researcher, was not an innocent one. As a member of the team, I was implicated in their work, including its ethically squeamish portions. In the exhibition hall, I could not respond authentically or honestly to the attendees' commentary; I could not respond the way that I normally would. This was not the time to be openly critical of my interlocutors' work. My job at the symposium was to assist Hillary and make life easier for her and everyone else back at WCU, to do my part in avoiding bad publicity or negative press coverage of the project or the Institute. Given the precarious nature of funding at WCU and, to an extent, across all three sites, people's livelihoods, and sometimes also their immigration statuses, were on the line. Still, people held critical opinions about the very work that sustained their livelihoods and kept them in the country in which they wanted to live. They shared these critiques with me, either explicitly or implicitly, through study design choices, small acts of refusal, or our everyday conversations about living in the United States. I have captured some of these voices throughout the dissertation. Sometimes, for the sake of my interlocutors, I speak them in my own voice. When my interlocutors shared their critiques with me explicitly, the disclosure was often followed by an insistence that I dig critically into the very work in which we were all participating.
As one informant remarked to me, there were many difficult stories to tell about their technologies, the teams, the treatment of research subjects in psychiatric research, and the institutions with which they were entangled, but the stories needed to be told. The question that some people I interviewed at WCU would ask ("am I allowed to say this?") itself discloses much. The question was a rhetorical one, because they were going to say whatever "this" was anyway, regardless of my answer. People told me their secrets knowing that it was my job to tell others, or at least to tell anyone who reads this thesis. They trusted that I would do so in a way that would protect them from individual criticism as best I could. Thus, to conclude from reading this thesis that my interlocutors are all bad people, doing bad things, is to miss the point, and to let everyone else (myself, you, dear reader) off too easily. To return to Goffman's participation framework: in fieldwork, as a member of the teams, I was a mouthpiece, an animator, of their logic and of the technologies' proposed positive impacts. I am also an animator of the critiques that they preferred I utter for them. Whether I was promoting the study in the exhibition hall, volunteering at the community forum in the Midwest, or tucking subjects into the scanner on the East Coast, my language was not always my own. To assist the teams with their research in exchange for participant observation, to learn alongside them, required adopting and becoming conversant in my interlocutors' epistemological life worlds (regarding psychiatry, assessment, language, signals) as well as their ethico-moral life worlds (regarding the distinction between hearing and listening, the pragmatic function of the IRB protocol, research subjects' sensitive, intimate stories).
Much as Carol Cohn (1987) describes catching herself understanding, getting along with, and even liking the nuclear strategic analysts among whom she did fieldwork, I often caught myself in moments of slippage, in which the initial absurdity of my interlocutors' research melted away. In trying to follow and understand vocal biomarkers, virtual humans, and telephone voices, I would internalize the researchers' worldviews. I would rationalize that listening to people's phone calls was not spying, because they had consented and the data had to be gathered somehow; that it was inevitable to laugh at subjects in the scanner; and that I could watch as many videos of research subjects as I deemed necessary because I was a member of the team. I truly felt the feeling of a five. Writing fieldnotes in the various apartments I occupied over those twelve months, I'd drop my pen in revelation, wondering: maybe there are indeed biologically universal components of mental illness. If so many therapists report that "everyone knows" depressed people speak more slowly, then maybe there are universal vocal biomarkers, and unlocking them really could improve the lives of thousands, if not millions. My interlocutors were trying to make a difference in the world, while I merely watched, an outsider come to gawk. Who was I to learn from them and then walk away, only to critique the very people who had shown me kindness and trust, with whom I had built rapport? Michael M.J. Fischer uses the concept of the "ethical plateau" to describe "domains of ethical challenge" in which it is difficult to know which direction to go; ethical plateaus arise when "new technological politics that initially seem like warning flags rapidly become absorbed into routine markers of a changed common sense" (2001: 362).
Think, for instance, of smart listening devices like the Amazon Echo, discussed in Chapter 4, and how my informants' work can be read as a sentinel, warning us of these far less regulated devices and attuning us to the chains of labor and histories of dehumanization to which they are attached. These kinds of technologies form "the ladder of the ethical plateau," which, Fischer suggests, "might provide a way to think about how traditional critical social theories are being challenged to evolve in new directions" (2001: 368). In addition to the ethical plateau, I encountered another topological formation in my fieldwork, what I call ethical uncanny valleys, in which ethical frameworks are at once familiar and strange. As in the example Freud uses in his original essay (2003[1919]), being lost and returning to the same place, again and again, in an attempt to find our way provokes a feeling of the uncanny: a sense of strange return, of "I've been here before." My fieldwork was full of eerily familiar terrain, including the crisscrossing histories of psychiatry and computing discussed in Chapter 1. My use of "uncanny valley" offers a playful stretching of the original meaning of the term. Masahiro Mori, a Japanese roboticist, coined the term in 1970, and its translation into English as "uncanny" linked this concept with Freud's (Mori 2012).70 Mori developed the term to describe the relationship between human-likeness and human affinity for non-human objects. Past a certain point, the closer a non-human object approaches a living, healthy human being in its movements, appearance, and sound, the more grotesque it becomes, plunging below the level of neutral affinity and into the negative zone. In Mori's uncanny valley, we find horrific distortion with recognition at its center. For Freud, the uncanny is about a confrontation with the darkest, seamiest, and most inescapable parts of the self.
70 Norri Kageki (2012) suggests that the relationship between Mori's original essay and Freud's has been overdetermined, due to a less-than-accurate translation.
[Figure: Mori's uncanny valley graph. Horizontal axis: human likeness (50%, 100%); vertical axis: affinity. Labeled points: industrial robot, toy robot, prosthetic hand, healthy person; the dip just short of full human likeness is labeled "uncanny valley."]
Mori's graph of the uncanny valley, depicting "the proposed relation between the human likeness of an entity and the perceiver's affinity for it" (2012: 2). At a crucial point, as an object approaches almost complete human likeness, the perceiver's affinity for it plummets. The uncanny valley exists in this below-zero zone; the resemblance provokes horror and disgust. In the ethically uncanny valleys of my fieldwork, there were gravitational wells that pulled me deeper in, places in which to get stuck. Once stuck, I could better attune myself to the forces that had drawn me there while also better understanding my own position and the surrounding architecture. In these sinking spaces that are simultaneously home and not home, where fieldwork is also homework, it was difficult to tell: was I studying up? Studying sideways? Who wielded power over whom? My informants occupied this position with me, a position that is complicit but also ironic: not entirely sincere. People (myself as an ethnographer, my informants as research subjects with research subjects of their own) do not always say what they mean or mean what they say. Through their actions, their silences, and their own misdirection, they can enact subtle critiques, subversions from the inside, flipping the script. Moreover, rather than "studying those who study us," to use Forsythe's (2001) description of doing ethnography with computer scientists who employ social science methods, I was studying those who study like us.
In observing my interlocutors build devices that captured people's speech and enabled its circulation far beyond its context of utterance, subjecting it to analytic and theoretical re-mediations that the speaker could never have anticipated, I felt as though I were looking in a mirror, seeing my own discipline (anthropology) from the outside in, even as I participated in it.
ETHICAL SOUNDSCAPES AND THE GOOD LISTENER
How might one pull oneself out of an uncanny valley, that is, recognize that alternative arrangements are possible, ones that are new rather than oddly familiar? The first step, I believe, is to sit with the queasiness these moments cause, taking them as opportunities for reflection rather than cause for recoil. This means recognizing that the formations of these dips and dark places are not of individual doing, but are structural, epochal, and tied to broader forces. As Puig de la Bellacasa notes, "the purpose of showing how things are constructed" (and connected) "is not to dismantle things" by denying their reality (2011: 82). To show how things (facts, affects, ideologies, connections between states and sounds) are constructed rather than existing ab ovo is neither to reject their reality nor to "undermine...the powerful (human) interests they might reflect and convey" (ibid). Instead, showing these connections is to "affirm their reality by adding further articulations" (ibid). In this spirit, Hirschkind's (2006) notion of an ethical soundscape, a sonic landscape that surrounds us all, offers not an exit strategy from ethically uncanny valleys per se, but an invitation to tune in to the ways that modes of listening and modes of self-fashioning run together with politics and power, and how they shape our interactions with and responses to the people with whom we share our spaces. Hirschkind uses the ethical soundscape to describe how aural media (in his case, cassette-recorded Islamic sermons) contribute to the "shaping of the contemporary moral and political landscape" (2).
He invites us to think of listening not as passive reception but as an active process that can be shaped by, and also shape, one's "ethical sensibilities under-gridding moral action" (9). The ethical soundscape does not just surround us; it is more than an ineffable milieu. Like habitus, it is constituted, circulated, and shaped through repetitive practices, attuning one's mind, body, and affect in a way that encourages a "technique of self-fashioning" (22) and "ethical sedimentation" (28). Think, for instance, of Nava and Taylor repeatedly listening to research subjects' stories, and how this experience informed the way in which they would interact with subjects after the interview. Think as well about the other annotators and me, whose listening/not listening suggests the impossibility of protecting user privacy in machine listening systems. What else was so familiar, in its strangeness, about the teams' efforts to develop speech analysis technologies for psychiatric screening, and about the crashing together of language ideologies and listening practices that they imply? What other modes of listening and ethical sedimentations do the listening practices of the teams, and the intended mode of listening of their technologies, point to and reproduce? As already noted, there is Alexa, the voice-activated assistant of the Amazon Echo. The gendering of Alexa (the work it takes to avoid calling the device a "she" or referring to what the device does as "listening") displays the same coalescence of gender and labor engineered into figures like Abby. Moreover, scholars like Virginia Eubanks (2017) argue that the automation of decision-making work in the service sector, and the corresponding devaluing of the people conducting that labor (and of those on the receiving end of the service work), is a feature rather than a bug in the United States.
Her discussion of how the automation of the welfare eligibility process changed the relationship between caseworkers and their clients is another warning sign of what the automation of psychiatric assessment might look like. Under eligibility automation, caseworkers were no longer assigned cases based on their location and the client's location; the loss of shared locality led to a loss of contextual information about the client and their case. Caseworkers were instead assigned cases through a workflow management system. As one of the caseworkers Eubanks interviewed remarked, "If I wanted to work in a factory, I would have worked in a factory...You were expected to produce, and you couldn't do that if you listened to the client's story" (Eubanks 2017: 63). Tweak the terms of the relationship, and the caseworker must listen in a different way: listening to move the call along, listening pragmatically and strategically rather than "to the client's story," to its personalized, narrative texture. The duplicitous nature of the listening involved in building speech analysis technologies is also not unique to my informants' projects. They would prompt subjects to participate in producing speech or interactional encounters by emphasizing the very component of speech they sought to downplay: its referential function. The study and development of deception work in psychiatry has a rich and varied history in the United States. Like signal processing, psychiatry is also wrapped up in the military-industrial complex and the extraction of "intelligence" (meaningful enemy data) from information. In 1997, reporters from the Baltimore Sun obtained the KUBARK Counterintelligence Interrogation manual through a Freedom of Information Act (FOIA) request.
Originally produced in 1963, the manual (KUBARK was the CIA's codename for itself) references the rapport-building skills of psychotherapists as a source of inspiration for the interrogation tactics described in its pages. For instance, in the annotated bibliography entry for Harry Stack Sullivan's guidebook on the psychiatric interview (1954), the KUBARK authors note,
Any interrogator reading this book will be struck by the parallels between the psychiatric interview and the interrogation. The book is also valuable because the author, a psychiatrist of considerable repute, obviously had a deep understanding about the nature of the inter-personal relationships and of resistance.
The release of the Hoffman report71 in 2015 detailed, without mincing words, the extent to which the CIA relied on the American Psychological Association to justify and promote torture interrogation tactics that violated international human rights standards. The APA also provided the government with psychologists-in-training (its younger members) to conduct interrogation interviews. Under the Obama Administration, starting in 2009, the CIA moved away from coercive (i.e., torture-driven) interrogation practices and toward non-coercive, research-based strategies that emphasized rapport building, listening rather than questioning, and guiding a suspect's impression of how they were being listened to (Watkins 2017).
71 On July 2, 2015, David H. Hoffman of Sidley Austin LLP published an independent review that he conducted with his legal team. The investigation uncovered, among other things, that the APA had rewritten its code of ethics to enable its members to participate in Bush Administration-sanctioned torture, and that influential members had been tapped to lead interrogation training efforts and develop torture tactics. See: https://www.apa.org/independent-review/revised-report.pdf
This invites a whole new, suspicious reading of the emphasis on rapport building and trust that is built into the VHI's interface. Researchers designed their studies, as I have shown, in an effort to grasp the truth of speech through practices that restrain the speaker's agency. The research validates a language ideology, and an attendant set of practices, which implies that the heart of language lies within a secreted space that must sometimes be wrenched open against the speaker's will. We should be cautious about how this research might be taken up to justify tactics that the researchers themselves would not agree with, in ways that disregard their agency and their desire to help rather than harm people. If a central point of my ethnography has been to explore what it means to listen, issues of ethics and responsibility raise a related question: what makes a good listener? I invoke "goodness" here with reference both to measures of skill and expertise within a professional framework and to measures of moral and ethical goodness (the "good listener" in pursuit of eudemonia, or human flourishing, "the good life"). Within the cultural legacy of psychotherapy in the United States, these two are interlinked. That is to say, the ideology of inner reference (psychiatry's hegemonic language ideology) implies a moral framework and a set of attendant linguistic and listening practices, with the implication that language (being primarily referential and anchored in a speaker's self) is the ground of intersubjectivity, and therefore leads to empathy. At the same time, as the skilled listening of interviewers like Adele and Rochelle, as well as Abby, illustrated, the intersubjective sharing of empathy can be an illusion, a strategic performance. Being a good listener is both a professional practice and a skill that needs to be socially reproduced.
In my fieldsites, where the linguistic labor of being an empathic listener is assigned to less experienced, lower-level researchers because it is a task that "anyone can do," we can see the consequences of these two frameworks fusing together. To be a good listener as a social worker becomes indistinguishable from being a good listener as a human being. Listening to the content of speech (and giving the impression that this is how speech is being listened to) no longer appears to be a professional practice, or a professional skill that must be cultivated. Good listening appears as a human capacity: something anyone who is human can do, and yet, at the same time, something that can easily be replicated, mimed, and performed by a non-human machine.
FURTHER ARTICULATIONS
This dissertation has shown how dominant Euro-American language ideologies (of speech's relationship to interior states) are the guiding logics behind the automation of psychiatric screening, even as these ideologies gradually unravel in the building of the technologies. Through these technologies and their attendant labor relations, researchers seek to peel from the core of speech a "layer" of pure affect, of the most universal components of mental illness. On the one hand, machine listening implies a radical mode of im-mediation, one that defeats the human exceptionalism of language. To attempt to render speech into mere sound (a wave) is to equalize it, to empty it of its species-specific particularities, to equate it with any other kind of sound. On the other hand, what we might call "human listening" (listening to language semantically, as an emanation of the speaker's self) is posed as the unique domain of the human, a mode of listening that a machine might mimic but never fully recreate.
This contradiction highlights something crucial about the nature of language ideologies, which are, as Kathryn Woolard writes, "the mediating link between social form and forms of talk," between ideals and practices (1998: 3). Woolard reminds us that the "ideology" of language ideologies has multiple meanings. One such meaning implies the commonsense, that which is known and taken for granted about the world, "derived from, rooted in, reflective of, or responsive to the experiences or interest of a particular social position," although ideologies often move through the world as if "universally true" (6). There is also a Marxist tradition in the study of language ideologies, in which language ideologies provide distorted, illusory rationalizations of how language works, including the relationship between language and interaction. They are, as Foucault might put it, "power-linked discourses" that map incompletely onto the world as people experience it (Woolard 1998: 7). Thus, language ideologies justify practices that are attached to power, but they are not inevitable. My dissertation has largely argued that the language ideologies my interlocutors pursue, and that fuel their efforts, are entrenched in histories and hierarchies of value that stretch beyond the individual projects, and that these ideologies are calcified and continually reinforced in psychiatric practices and in the practices involved in gathering and classifying research subjects' speech data. But are there any new, further articulations to be found? What are the contemporary inflections of these ideologies at a time when speech has moved from the medium of the air to the medium of the Internet, in an era of "alternative facts," fake news, deepfakes, and conspiracy theorists' "crisis actors"? And in an era in which the relationship between utterances, intentions, authenticity, and truth is newly under discussion?
Perhaps the projects do indeed point to something new.
But I also believe that what feels new to some about the current state of things in the United States feels old to others. Maybe what is new is the dawning realization that the relationship between utterances and action, speech and sincerity, is a figment of the liberal democratic imagination that has never universally held true for all. Following the election of Donald Trump, Indigenous STS scholar Kim TallBear (2017) asserted that Native people living in the place sometimes referred to as the United States have been living in a post-truth world since the arrival of settlers, and since the blatant, open, and continual violation of land treaties. Likewise, for so many people living in the United States (disabled, trans, gender non-binary, non-white), the word of the law extends asymmetrically. My informants, in their technological intervention into the relationship between speech and agency, show the fabricated nature of this connection. There is justice work to be done in exposing these seams. This is also why the invocation of "post-humanism" gives me such caution and pause. The post-ness of the post-human implies a completion, a sense of finishing. In the words of Ruha Benjamin, "posthumanist visions assume that we all have had a chance to be human. How nice it must be...to be so tired of living mortally that one dreams of immortality. Like so many other 'posts' (post racial, postcolonial, etc.), post humanism grows out of the Man's experience. This means that, by decoding the racial dimension of technology and the ways in which different genres of humanity are construed in the process, we gain a keener sense of the architecture of power-and not simply as a top-down story of powerful tech companies imposing coded inequality onto an innocent public. This is also about how we (click) submit, because of all that we seem to gain by having our choices and behaviors tracked, predicted, and racialized" (Benjamin 2019: 32; emphasis original).
Looking ahead and imagining alternative visions of the present is a productive exercise. Sci-fi (science fiction, speculative fabulation) invites us to imagine life in the future. It can also provide a means of doubling back (have I been here before?), of reflecting on the connections between what seems new and unprecedented and the lesser-examined portions of the past.
References
Benjamin, Ruha. 2019. Race After Technology: Abolitionist Tools for the New Jim Code. Cambridge, UK: Polity Press.
Cohn, Carol. 1987. "Sex and Death in the Rational World of Defense Intellectuals." Signs 12(4): 687-718.
Dick, Philip K. 1968. Do Androids Dream of Electric Sheep? New York: Random House.
Eubanks, Virginia. 2017. Automating Inequality: How High-Tech Tools Profile, Police, and Punish the Poor. New York: St. Martin's Press.
Fischer, Michael M.J. 2001. "Ethnographic Critique and Technoscientific Narratives: The Old Mole, Ethical Plateaux, and the Governance of Emergent Biosocial Polities." Culture, Medicine, and Psychiatry 25: 355-393.
Forsythe, Diana. 2001. Studying Those Who Study Us: An Anthropologist in the World of Artificial Intelligence. Stanford: Stanford University Press.
Freud, Sigmund. 1919 [2003]. The Uncanny. D. McLintock, trans. New York: Penguin.
Haraway, Donna. 1991. Simians, Cyborgs, and Women: The Reinvention of Nature. London: Routledge.
Hirschkind, Charles. 2006. The Ethical Soundscape: Cassette Sermons and Islamic Counterpublics. New York: Columbia University Press.
Irani, Lilly. 2015. "The Cultural Work of Microwork." New Media and Society 17(5): 720-739.
Joler, Vladan, and Kate Crawford. 2018. "Anatomy of an AI System: The Amazon Echo As An Anatomical Map of Human Labor, Data and Planetary Resources." AI Now Institute and Share Lab, September 7. (accessed August 6, 2019).
Kageki, Norri. 2012. "An Uncanny Mind: Masahiro Mori on the Uncanny Valley and Beyond." IEEE Spectrum, 12 June. (accessed August 7, 2019).
Le Guin, Ursula K. [1969] 2016. The Left Hand of Darkness. New York: Penguin Books.
Mori, Masahiro. 2012. "The Uncanny Valley." K.F. MacDorman and Norri Kageki, trans. IEEE Robotics and Automation Magazine 19(2): 98-100.
Puig de la Bellacasa, Maria. 2011. "Matters of Care in Technoscience: Assembling Neglected Things." Social Studies of Science 41(1): 85-106.
Roberts, Sarah T. 2019. Behind the Screen: Content Moderation in the Shadows of Social Media. New Haven: Yale University Press.
Stack Sullivan, Harry. 1954. The Psychiatric Interview. New York: W.W. Norton and Co.
TallBear, Kim. 2017. "Interrogating 'the Threat'." Presidential Plenary, Society for Social Studies of Science Annual Meeting, Boston, MA, August 30.
Watkins, Ali. 2017. "Elite Terrorist Interrogation Team Withers under Trump." Politico, December 5. (accessed August 7, 2019).
Woolard, Kathryn. 1998. "Language Ideology as a Field of Inquiry." In Language Ideologies: Practice and Theory. Bambi B. Schieffelin, Kathryn A. Woolard, and Paul V. Kroskrity, eds. Pp. 3-47. New York: Oxford University Press.