Hearing you unmuted: thoughts on voice in Silicon Valley

Berenice Leung
12 min read · Feb 7, 2022

Remember Paul Marcarelli? “No” is what you’re thinking, but I’m sure you do, as long as you were born earlier than Gen Z and watched early-21st-century American TV. From 2002–11 he starred in Verizon’s “Test Man” ad campaign, repeating the “Can you hear me now?” phone phrase and representing the telecom giant’s real network reliability testing (via 100,000+ miles driven each year). Now you’re probably remembering that Paul switched to Sprint in 2016; besides being perhaps the biggest marketing shake-up of the decade, the contrast between his older and more recent commercials shows how far the technology has advanced. Whereas Verizon Paul championed reliability and spoke into a flip phone pressed to his ear, Sprint Paul touted reliability and more: specifically, unlimited smartphone data plans so users could “text, ping, post, tweet, snap” away.

Modern technology has unboxed many more ways to communicate than phone calls and in-person conversation; yet two full years of a pandemic have limited in-person interactions and increased virtual meetings, especially for once-office, now-remote workers. I myself worked fully remote (i.e. away from Silicon Valley, where I’m typically hybrid) to be with family from this past Thanksgiving through MLK Jr. Day; and, like Paul, I parroted “Can you hear me now?” to coworkers daily to ensure my voice was heard. Since flying back to the Bay Area and returning to the hybrid work week, I’ve had the idea to write down my personal thoughts on voice-only interactions. So here I’m unmuting to say that they’re equally alienating, beautiful, and suggestive.

Paul explaining in a 2016 commercial that he’s switched from Verizon to Sprint

The camera-off culture

My current workplace culture (at least among the teams I work with) has cameras turned off in nearly every meeting. The few exceptions are rare virtual team socials (think videoconference happy hours and Jeopardy! nights) or, most recently, my team’s inaugural “show and tell” of each member’s Q4 project highlight. One teammate I’ve not yet met in person was reduced to a top-corner thumbnail as he screen-shared presentation slides; and blame it on the faint lighting or the camera quality, but I wouldn’t easily recognize this person if our non-virtual paths ever crossed. Even so, this is more familiarity than I ever had with another [ex-]coworker whose entire tenure on the team was during the pandemic and without video. Had we been contestants on [my favorite game show] Family Feud, we would have shaken hands and met each other’s families in the presence of Steve Harvey, and I still might not have realized that this opponent was a former colleague.

Capturing this same sense of work disconnect, The New York Times published “If You Never Met Your Co-Workers in Person, Did You Even Work There?” last fall with various remote workers’ opinions. One interviewee was Joanna, who recalled her former job at PwC: “‘You know people’s motivation is low when their cameras are all off… There was clear disinterest from everyone to see each other’s faces.’” Surely every company has a different remote culture, but I lingered on this concept that meetings once brimming with faces and varied backdrops reverted to names and microphone icons. Did my coworkers default to cameras-off because of disinterest in each other? My hypothesis is that we do it for the exact opposite reason: we’re highly interested in engaging human interaction, but seeing each other’s video would be distracting.

One viral example is a Zoom meeting/virtual hearing in Texas’s 394th Judicial District Court last year when lawyer Rod Ponton joined with a video filter on, his human visage replaced with that of a wide-eyed kitten. The forty-two seconds spent trying to remove this unintended filter escalated to Rod’s ultimate clarification that he himself “is here live. [He’s] not a cat.” Ha, of course we knew that Rod was a human, but the video (or maybe the lack of an accurate one) kept us engaged. And maybe you missed it the first time (like I did from laughing too hard), but replay the clip and you’ll notice that the other lawyer in the top-right corner wasn’t paying attention until Rod tried to reclaim his video identity.

Lawyer Rod Ponton with an unintended cat filter turned on during a Feb’21 Zoom/virtual hearing (394th Judicial District Court of Texas)

The magnetic draw towards human connection is not just in viral Zoom mishaps. When I’m working remote, the rarity of seeing a teammate’s camera on pulls me to focus on their video thumbnail over the unmoving icon/initials of another coworker with camera off. And then in the more common scenario that nobody’s video is on, I refocus my detail-grubbing on coworkers’ sounds. In-person meetings also involve listening to other coworkers, but remote/no-video meetings are unique in that voices might suddenly drop out mid-sentence (due to a spotty Wi-Fi connection or unintentional muting) or suddenly drop in (due to reconnected Wi-Fi or intentional unmuting). And upon unmuting, other sounds might share a snippet of participants’ different surroundings.

Listening through wired headphones, I hear a baby crying in the background. Dog barking. Meal prepping (the beeps of an Instant Pot and squishes from tomato slicing, as I once clarified with a coworker). Birds chirping (in the heart of winter? They must be in a warmer part of the world than I am). Drinking (water?) between sentences. These sounds may be simple in isolation, but they help ease the isolation of remote work — offering some more humanity and context beyond an otherwise purely transactional call. What about, then, coworkers with neither video nor background noise and who don’t small talk — with nearly every human dimension stripped away, can human connection be found? I believe it can be, as I will further turn to the specialty of voice alone.

Working at a large company with 70k+ full time employees, I join meetings and have first-encounters with new people — names and voices — every week. And to better explain that voices are quite special, I will first establish that names (statistically speaking) are actually not.

Common nomenclature

Upon meeting a colleague, the combination of someone’s first and last name can be novel to me yet not necessarily to the HR database. I can immediately think of three coworkers (let’s call them Danielle H., Lee L., and Michaela E.) who share the same first-last name pairing as another at the company. (Admittedly, I once cc’ed the wrong Lee in an Outlook email chain.) The probability of two people having the same name, and also having the same birth date, is actually higher than what we might guess. Thanks to a STAT 101 lesson (“The Birthday Paradox”) in undergrad, I learned that you don’t need hundreds of people before identifying two people who share the same birthday (month + date); rather, you need just 23 random folks to yield a roughly 50/50 chance of a birthday match. Within that same group of 20-ish people, I’m sure there’s also a high chance of overlapping names; but let’s now reference a real-world event to demonstrate that shared names are largely common and unexceptional.
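For anyone who wants to check that 23-person figure, here is a minimal Python sketch of the arithmetic. It assumes 365 equally likely birthdays (no leap years, no seasonal skew), which is a simplification; the function names are my own and purely illustrative. The trick is to compute the chance that everyone's birthday is different and subtract that from 1.

```python
def shared_birthday_probability(group_size: int, days_in_year: int = 365) -> float:
    """Chance that at least two people in the group share a birthday.

    Assumes every day of the year is an equally likely birthday.
    """
    if group_size > days_in_year:
        return 1.0  # more people than days, so a match is guaranteed
    p_all_distinct = 1.0
    for already_taken in range(group_size):
        # Each new person must avoid the birthdays already taken.
        p_all_distinct *= (days_in_year - already_taken) / days_in_year
    return 1.0 - p_all_distinct


def smallest_group_for(target: float, days_in_year: int = 365) -> int:
    """Smallest group size whose match probability reaches the target."""
    group_size = 1
    while shared_birthday_probability(group_size, days_in_year) < target:
        group_size += 1
    return group_size


if __name__ == "__main__":
    print(smallest_group_for(0.5))                        # 23
    print(round(shared_birthday_probability(23), 3))      # 0.507
```

Real birthdays are not spread evenly across the calendar, and uneven spreads make a match slightly more likely, so the 50/50 threshold of 23 is if anything a conservative estimate.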

Nearly a year after the Nov’04 New Jersey election with ~3.6 million voters, a voter fraud report was submitted to the state’s Attorney General; it included a list of 4,397 names of people who allegedly double-voted by going to different polling locations on the same day. This prompted an investigation into the voter fraud report by the NYU School of Law’s Brennan Center for Justice whose findings were that:

The majority/70% of that list was likely the same individual, but duplicated by data-processing and data-entry errors (and mostly attributable to Middlesex County)

  • 41% had the same name, same birth date, and same address but two separate entries (e.g. incorrect 1901 vs. corrected 1993 voter registration year)
  • 29% had a different name (e.g. Jr. vs Sr.) but same birth date and same address

The minority/remaining 30% was likely different individuals with coincidentally overlapping attributes

  • 18% had a different name, same birth date, and different address
  • 5% had missing information (e.g. no birth date) but the report assumed that people with the same name and same birth month + year had to be the same person
  • 7% had the same name and same full birth date, a match the report treated as proof of a single person, ignoring the aforementioned STAT 101 lesson/“Birthday Problem” and the realities of society (e.g. popular baby names in any given year, skewed birth distributions since “obstetricians are more likely to induce labor during the work week”)

Ah, the joys of learning something from school that has real-world application. And while I go down memory lane of past academia, I also think back to my high school literature class assignment that further ingrained the lesson: names are not that special. Specifically, from Act II, Scene II of Shakespeare’s Romeo and Juliet, I had memorized Juliet’s voicing, “What’s in a name? That which we call a rose / By any other name would smell as sweet.” Modern translation: Birth names ain’t unique identifiers for who you really are.

So even if you find yourself as one of thousands of people with the same name (you can search your name frequency, based on the U.S. census, on HowManyOfMe.com)… you can still celebrate your individuality. After all, you have unique fingerprints! You have unique dental records! And in theme with forensic science, you also have a unique voice (although “earwitness”/voice recognition is usually a last-resort option to accurately ID suspects).

An earwitness identification scene from a Brooklyn Nine-Nine episode in which 5 men are asked to sing the opening to the Backstreet Boys’ “I Want It That Way” (https://youtu.be/E1tofEyT8Jg)

What’s behind a voice

Perhaps because of the pandemic context or my selective memory in writing this blog, I had never heard anyone say they wanted “to put a voice to the name” until a few weeks ago. I heard this phrase upon answering an incoming call, which turned out to be from the contact assigned to my brokerage account. The brief over-the-phone intro was a nice gesture, but I didn’t think much of it at the time; looking back, though, it really did provide a more human connection when I otherwise only saw his name and contact information above my Accounts Summary. Also, HowManyOfMe reports that there are 6 people in the U.S. with his name, so he distinguishes himself with his voice (which was calm and full, even if I don’t remember exactly what it sounded like).

Describing someone else’s voice and vocal impressions is tricky, but humans (and non-human animals, per ongoing studies) have innate vocal recognition skills. This is how we can hear and recognize family members even with our backs turned or how we can name the singer in a new song on the radio. A voice is as unique an identifier as a fingerprint, but the two differ in various ways:

  • Voices can change (think puberty or other physical changes to one’s vocal cords) while fingerprints are unchanging
  • Voices are created by vibrations of the vocal cords/folds inside the larynx/voice box while fingerprints are created by the pressure of amniotic fluid during fetal development/inside the womb
  • Voices are readily projected while fingerprints are not (i.e. nobody carries around an ink pad and paper to share their fingerprint with others)

The biology behind the human voice is quite interesting, especially as I think about it as the sound emanating from the length and windings of an internal instrument: some voices low like tubas, others high like piccolos, some nasal like the bassoon or oboe. And now that we’ve covered the biological basis of voices, we can appreciate the geographic basis of accents.

Global accentuation

In work calls of 50+ people (especially engineering team meetings) I don’t have to read the attendee list to acknowledge that I work at a very global company. Beyond teammates’ already unique voices, I also hear various accents that help signal what areas of the world shaped the way they pronounce their vowels and consonants. It’s again tricky to describe in words, but I can hear both intra- and inter-continental accent patterns suggesting that all continents — with the exception of Antarctica? — of the world are represented at my workplace. Hearing someone’s accent gives no surefire indication of appearance or nationality but I can gauge geographic influences, e.g. somebody who pronounces water as “wooder” has spent significant time in Philadelphia (for me, the city closest to home).

I find this patchwork quilt of voices and accents in the workplace to be energizing and beautiful; and in a setting where most people typically spend 40+ hours per week, it helps train the mind to hear and process different pronunciations of the same word/meaning. Some accents are more intuitive (i.e. more similar to my own pronunciation) than others; yet I’ve never once been completely unable to understand somebody as long as we were speaking the same language (…and dialect, since multiple encounters have proven that Mandarin-only speakers don’t understand my heavily American-accented Cantonese). This is why I was surprised to once hear (years ago) during a group interview feedback circle: “It was hard to understand the candidate’s accent.” In that moment, I stayed muted, not knowing how to respond. I think the other interviewers were thinking similarly, since everyone (including our HR partner) remained muted for a few seconds before discussing other candidate considerations that we did actually weigh for the hiring decision.

That specific remark was in no way discriminatory — the opinion of whether someone’s accent is easier or harder to hear is completely subjective/relative to the listener; still, I did read further into the U.S. Equal Employment Opportunity Commission’s website for some more official guidance on the relation between one’s accent and employment:

“Generally, an employer may only base an employment decision on accent if effective oral communication in English is required to perform job duties and the individual’s foreign accent materially interferes with his or her ability to communicate orally in English… If a person has an accent but is able to communicate effectively and be understood in English, he or she cannot be discriminated against.”

The US EEOC shares customer service as one example role in which accent can impact employment. In the case of a Guatemalan Stanford student who also worked at a call center to help support his family, he had to quit after poor CSAT scores and customer complaints about his accent. This helped prompt three of his friends (also international Stanford students, from China, Russia, and Venezuela) to found real-time accent translator Sanas. Having raised $6M in seed funding so far, this AI startup intends to be a “‘tool that helps people with human-to-human interaction, without hurting their cultural identity.’” I think that the ideal world would be everyone tolerating and recalibrating to accents different from their own; however, if the steady-state alternative is an angry customer and a berated call center employee both set in their respective accents, then I do think that this developing technology can achieve more good than bad. The big caveat, though, is to not let such technology wire us into thinking that accents “foreign” to one’s own are intolerable. The study “The Role of Accent and Ethnicity in the Professional and Academic Context” finds that people with the native area’s accent were preferred for higher-status jobs over non-native counterparts.

Programmable voices and thinking

Undoubtedly, humans have biases. And Sanas’s altering of real-human voices might help neutralize that bias in some settings. So wouldn’t we think, then, that AI technology (e.g. voice recognition and virtual assistants) could be less biased?

A few years back the Harvard Business Review published “Voice Recognition Still Has Significant Race and Gender Biases,” which discussed voice AI’s high growth but also the skewed datasets used for machine learning. For example, if the data uses speech patterns of TED Talk speakers then the data is skewed because 70% are male; this leads to higher accuracy rates in recognizing white male voices than voices of other races and genders.

Despite this, I’m encouraged that ongoing challenges to the status quo can bring about positive change. The most jarring AI example I read is a 2020 Brookings study paired with a 2017 study by Leah Fessler to compare the responses of virtual assistants (i.e. Amazon’s Alexa, Microsoft’s Cortana, Apple’s Siri, and Google Assistant) to gender- or sexual-based harassment. The differences in 2020 (e.g. “I won’t respond to that”) vs 2017 (e.g. “I’d blush if I could”) replies show that tech developers made a conscious effort to recalibrate how humans interact with robot (mostly female-default) voices. Humans still monitor and manage AI voices, so feeling a smidge of human connection when talking to voice assistants (e.g. “hey Siri, call home”) is also possible.

I want to conclude by zooming back out: robot voice to human voice to human video to human in person. Because while organizing these thoughts, I found the common denominator has been appreciating the joys of humanity. We are humans, we are curious; our voices are our own, our creations are our responsibility. Even when the world spins sometimes with uncertainty and technology seems to erode our humanness, I think of the massive, incredibly diverse marching band that we all form; how boring life would be if everyone played the same instrument.
