Zoom and the Segmented Subject
This text was originally published in Volume 8 of Inflection Journal, alongside images by Ciro Miguel. The issue was edited by Michaela Prunotto, Kate Donaldson, and Manning McBride and published in December 2021 by Melbourne Books.
↓
Zoom was never intended to foster intimacy. Founded in 2011 as an enterprise tool to monitor and optimise the productivity of a distributed workforce, it has transcended the narrow scope of its original design in a way that parallels the accelerated pandemic-era collapse of divisions between labour and leisure, digital and physical, public and private, in which it has played a key role. Between December 2019 and March 2020, when global lockdowns and social distancing became an instant new feature of everyday life, Zoom’s daily users exploded from 10 million to 200 million. Even early in the pandemic—a crisis “tailor-made for Zoom”—commentators opined that “we live in Zoom now.”¹ Today, the company is actively building towards a pervasive platform future “where you live and work and spend your day” in Zoom; where its proprietary interface and architecture control time, perspective and participation.²
Hybridised lifestyles predated the pandemic, driven by the non-stop connectivity of the smartphone as digital beacon and bodily appendage. But Zoom is hastening this migration to a cloud-first world. Lydia Kallipoliti has described the city today as “a vast array of disconnected bedrooms, microcosms that come together in an abstract digital space.”³ It is Zoom that constructs the architecture of this space, an architecture that places us in “multiple different rooms at once.”⁴ Remoteness is recast as a simulacrum of face-to-face proximity, until the grid multiplies and we are reminded that IRL interaction rarely involves a collective constant gaze. That is, if we even see beyond our room at all. Research suggests that people spend most of their time on a video call distracted by their own face. Or—to maintain the illusion of eye contact—staring at the small, glowing green light above their computer screen, “completely alone.”⁵
The primacy of the distorted and disembodied face in a world of virtual meetings is shaping a new sense of self. Physicians have reported a surge in individuals with ‘Zoom dysmorphia’ seeking plastic surgery to alter their minutely examined appearance.⁶ Three years ago, French cosmetics giant L’Oréal acquired an augmented reality (AR) filter company called Modiface, ostensibly to develop a ‘try on’ tool for future purchases. Last November, L’Oréal launched Signature Faces, its first line of virtual makeup. The software offers ten products compatible with a range of videoconferencing platforms (including Zoom), allowing customers to “sign [their] digital look with confidence and audacity.” While the advent of AR-enhanced face filters on platforms like Snapchat had already led to the phenomenon of users wanting to edit their physical appearance to match their augmented image, their application in this context constitutes a more habitual blurring of physical and digital identity.
If the face is the currency of Zoom, the background is where the framing of the everyday veers from reality to representation. The unprecedented incursion by Zoom into our private sphere is emblematic of a domesticity in flux. As technology and the demands of 24/7 availability undermine the idea of the home as a space of autonomy and disconnection, Zoom has erased the last pretence that domestic life exists as a world apart.⁷ Amidst performative ‘credibility bookshelves’ and how-to guides to curating your ‘Zoom corner,’ the radical visibility at the edges of a video call reveals the vast economic and social disparities in the capacity for individuals and households to seamlessly adjust to the new normal of working from home.⁸ These disparities are often expressed spatially: in the gloom of a light-starved bedroom, a housemate wandering through the frame, a living room strewn with toys.
Zoom’s answer to the disorder of the home interior is the virtual background. First launched five years ago—though requiring a standalone green screen setup until early 2019—this now ubiquitous feature allows users to resist a forced intimacy by blurring their background, or replacing it with an image or video integrated in real time. An advance on simplified edge detection algorithms pioneered in the 1980s, the technology relies on AI-based neural networks trained to isolate a person in an image from the surrounding background through a computer vision technique called semantic segmentation. Each video frame must be extracted, segmented and composited onto the virtual background. To ensure a relatively high degree of accuracy in identifying the subject and maintaining frame-to-frame continuity, deep learning models are trained on large datasets of thousands of annotated images containing pixel-accurate locations of human bodies.
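The underlying operation can be sketched in a few lines of code. What follows is a minimal illustration of the general technique rather than Zoom’s proprietary pipeline, assuming the open-source MediaPipe selfie-segmentation model and OpenCV; the 0.5 mask threshold and the background file name are illustrative choices.

```python
# Minimal sketch of per-frame person segmentation and background replacement.
# Uses MediaPipe's selfie segmentation model and OpenCV; not Zoom's own pipeline.
import cv2
import numpy as np
import mediapipe as mp

background = cv2.imread("golden_gate.jpg")   # illustrative replacement image
capture = cv2.VideoCapture(0)                # default webcam

with mp.solutions.selfie_segmentation.SelfieSegmentation(model_selection=1) as segmenter:
    while capture.isOpened():
        ok, frame = capture.read()
        if not ok:
            break
        # The model returns a per-pixel probability that the pixel belongs to the person.
        result = segmenter.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        mask = result.segmentation_mask > 0.5   # illustrative threshold
        # Composite: keep the person where the mask is true, the new background elsewhere.
        scene = cv2.resize(background, (frame.shape[1], frame.shape[0]))
        output = np.where(mask[..., None], frame, scene)
        cv2.imshow("virtual background", output)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break

capture.release()
cv2.destroyAllWindows()
```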
The role of AI in enacting our shared realities extends to the stock photography supplying the generic scaffolding for countless calls. Zoom offers three default virtual background options: the Golden Gate Bridge, dewy grass, and the earth seen from space. Much of the nondescript imagery that saturates contemporary online culture, designed for universal appeal, is sourced from constantly updated ‘microstock’ databases. As their scale grows exponentially, these databases rely on machine learning to remain searchable, using pixel patterns to identify visually or thematically similar photos. Such spatially aware visual search tools sort images based on abstract composition and, increasingly, on predictive pre-screening for ‘high performing’ content, determined by aesthetic and technical parameters. The mass outsourcing of image curation to AI represents a surrender to computer vision over human ways of seeing, in the face of inconceivably vast streams of visual media.
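As a rough sketch of how such similarity search works in principle (not the pipeline of any particular stock library), each image is reduced to a compact feature vector, and ‘similar’ photos are those whose vectors sit closest together. In the hypothetical example below, a simple colour histogram stands in for the learned embeddings a production system would use; the function names and eight-bin quantisation are assumptions made for illustration.

```python
# Sketch of embedding-based visual similarity search: images become fixed-length
# vectors, and nearness between vectors is read as visual similarity. A colour
# histogram stands in for the learned features a real stock library would compute.
import numpy as np

def embed(image: np.ndarray, bins: int = 8) -> np.ndarray:
    """Reduce an RGB image (H, W, 3, values 0-255) to a normalised joint colour histogram."""
    pixels = (image.reshape(-1, 3) // (256 // bins)).astype(np.int64)  # quantise each channel
    indices = pixels[:, 0] * bins * bins + pixels[:, 1] * bins + pixels[:, 2]
    histogram = np.bincount(indices, minlength=bins ** 3).astype(float)
    return histogram / histogram.sum()

def most_similar(query: np.ndarray, library: list[np.ndarray], k: int = 5) -> list[int]:
    """Indices of the k library images whose vectors lie closest to the query (cosine similarity)."""
    q = embed(query)
    vectors = np.stack([embed(img) for img in library])
    q = q / np.linalg.norm(q)
    vectors = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    scores = vectors @ q
    return list(np.argsort(scores)[::-1][:k])

# e.g. ranking a small library against a query frame:
# ranked_indices = most_similar(query_image, [image_a, image_b, image_c], k=2)
```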
Last November, Nvidia introduced a new platform, Maxine, built on a machine learning technique called generative adversarial networks (GANs), which can produce real-time video content (for instance, rotating a person’s face to correct for off-centre camera angles in video calls). In January, OpenAI announced DALL-E, a text-to-image engine capable of generating plausible images from simple text prompts. Each represents a more fundamental shift towards computational photography—a form of photography both “speculative and relational.”⁹ Trevor Paglen has suggested that at this moment in history “most of the images made in the world are made by machines for other machines,” the learning fodder of so many AI datasets.¹⁰ As systems improve, unreal images will inevitably be made by machines for humans. These photorealistic renderings of digital dreams will become the backdrop to video calls where participants willingly untether themselves from spatial reality.
Despite the sophistication of its underlying technology, the Zoom virtual background embodies a narrowing but persistent lack of contiguity in how we experience the intersection of the digital and physical. Smartphones and other devices increasingly challenge this distinction, through a layering effect enabled by pervasive networked communications infrastructures, which results in an urbanism shaped by “augmented cognition.”¹¹ In contrast, within the static frame of the Zoom window, the frayed edge of the segmented subject produces a kind of cognitive dissonance—clumsy cut-outs and warped glitches that betray presence, experienced as a tear in the virtual fabric. As gaps open and resolve themselves, exposing otherwise invisible algorithmic fingerprints, we are afforded fleeting glimpses of a closed-off world. These glimpses heighten the sense of interacting in a liminal territory, not fully rooted in the physical, and not yet entirely floating in cyberspace.
The economist Edward Glaeser defined cities as an absence of space between people that produces “proximity, density, and closeness.”¹² But architect Andrés Jaque has highlighted how contemporary social settings are not defined by physical space, rather by technological networks of exchange and interaction.¹³ On March 29th, 2020, Tinder users swiped 3 billion times—the most the dating app has ever recorded in a single day—at the exact moment when cities across the world were imposing open-ended lockdowns and strict social distancing measures. Jaque has argued that platforms like Tinder and Grindr have “become the city,” a form of architecture operating at multiple scales that has “redefined what being in a room means, the notions of proximity we live by, what density is about.”¹⁴ Unlike the atomised cells of Zoom, these dating apps construct proximity between strangers, collapsing perceptions of intimacy and distance.¹⁵
Evidence of widespread ‘Zoom fatigue’ is mounting.¹⁶ The next generation of online meeting platforms is focused on breaking free from the grid to recreate experiences of serendipity and immersive spatiality closer to real life. Gather, which recently announced a $35 million investment from Sequoia Capital—the same firm that backed Zoom and Slack—allows users to pilot a personalised digital avatar around a scrolling 2D environment, designed to evoke nostalgia for the pixelated aesthetics of early video games. As you walk towards another avatar, a live video chat window appears within the screen, simulating a more ‘fluid’ form of conversation. The integration of spatial audio technology means that a person’s voice seems to emanate from a defined location in space (including during more complex group interactions), growing louder and softer as your avatar approaches, shifts orientation or retreats.
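The arithmetic behind that spatial audio effect is straightforward to approximate. The sketch below is an illustration rather than Gather’s actual implementation: a speaker’s volume falls off linearly with distance from the listener’s avatar, and the voice is panned between left and right channels according to where the speaker sits relative to the direction the listener faces; the audible radius and the constant-power panning law are assumptions.

```python
# Illustrative distance- and direction-based voice mixing for avatars on a 2D map.
# Not Gather's implementation; falloff and panning choices are assumptions.
import math

def spatial_gains(listener_xy, listener_facing_rad, speaker_xy, audible_radius=9.0):
    """Return (left_gain, right_gain) in [0, 1] for one speaker's voice."""
    dx = speaker_xy[0] - listener_xy[0]
    dy = speaker_xy[1] - listener_xy[1]
    distance = math.hypot(dx, dy)

    # Louder when close, silent beyond the audible radius (linear falloff).
    loudness = max(0.0, 1.0 - distance / audible_radius)

    # Pan by where the speaker sits relative to the listener's facing direction
    # (coordinates: x to the right, y up, angles counter-clockwise in radians).
    pan = math.sin(listener_facing_rad - math.atan2(dy, dx))  # -1 = left, +1 = right

    # Constant-power panning keeps perceived volume steady as the pan shifts.
    theta = (pan + 1.0) * math.pi / 4.0
    return loudness * math.cos(theta), loudness * math.sin(theta)

# A speaker two tiles to the right of a listener facing 'up' the map:
# slightly attenuated, and mixed almost entirely into the right channel.
print(spatial_gains((0.0, 0.0), math.pi / 2, (2.0, 0.0)))
```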
In 2018, Space Popular’s Lara Lesmes and Fredrik Hellberg declared 10 Propositions for Virtual Architecture as part of the exhibition Value in the Virtual at ArkDes in Stockholm. Contending that as the virtual world gains a third dimension it becomes a matter of architectural concern, the pair proposed, among other things, that “Virtual worlds will intensify our interest and appreciation of physical environments” and that “Planetary scale virtual worlds will coexist with their physical counterparts.” Lesmes and Hellberg have suggested that when interacting in social virtual spaces, it is not our image that is the priority, but rather a combination of the natural speech patterns supported by spatial audio and the formal and gestural body language facilitated by inhabiting non-realistic avatars.¹⁷ In this reading, the removal of the body from the gathering experience can create a more equitable space, while building empathy and understanding.
The use of the word ‘avatar’ to describe onscreen virtual bodies originated in 1986 with the massively multiplayer online role-playing game Habitat, a first attempt at a large-scale commercial virtual community. The term was famously popularised by Neal Stephenson in Snow Crash—his 1992 science-fiction novel that also introduced the concept of the ‘metaverse.’ A Silicon Valley obsession, the metaverse represents a collective, interactive and immersive virtual space—an always-on virtual reality that subsumes the mobile internet of 2D web pages and apps. In April, Epic Games, creator of the online game and cultural phenomenon Fortnite, developer of the Unreal Engine (one of two dominant platforms for building virtual worlds) and owner of the pandemic hit Houseparty, announced it had raised $1 billion towards constructing its version of the metaverse. Epic is already integrating features into Fortnite to bolster its use as a social platform, including Party Royale, a gathering space designed explicitly for shared experiences outside gaming.
While Fortnite recently hosted a live in-world concert by rapper Travis Scott ‘attended’ by 12 million people, players could only see and interact with a group of 50 people at one time. Current network technology is incapable of hosting an entire synchronous metaverse, separating users into ‘shards’—siloed sections that limit the population of a server-defined area. Truly planetary-scale virtual worlds will require a programming model that does not yet exist.¹⁸ Such a decentralised and collaborative approach, reflecting the open standards and protocols crucial to the development of the internet itself, is antithetical to the proprietary ‘walled gardens’ produced by the monopolising tendencies of platform capitalism.¹⁹ Even as Zoom aspires to become the singular digital setting for our embryonic hybrid lives, the race to control the emergent architecture of the metaverse reflects a more all-encompassing idea of captive markets.
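The logic of sharding itself is simple to caricature, as in the toy sketch below: once an area instance reaches its population cap, new arrivals are routed to a fresh parallel copy rather than a single shared world. The 50-player cap echoes the Fortnite figure above; the class and names are purely illustrative and correspond to no engine’s actual architecture.

```python
# Toy illustration of 'sharding': capping the population of a server-defined area
# by spawning parallel copies. Names and structure are illustrative only.
SHARD_CAPACITY = 50

class ShardedArea:
    def __init__(self, area_name: str):
        self.area_name = area_name
        self.shards: list[set[str]] = []   # each shard holds the ids of players who can see each other

    def join(self, player_id: str) -> int:
        """Place a player in the first shard with room, creating a new one if all are full.
        Returns the shard index; players in different shards never encounter one another."""
        for index, shard in enumerate(self.shards):
            if len(shard) < SHARD_CAPACITY:
                shard.add(player_id)
                return index
        self.shards.append({player_id})
        return len(self.shards) - 1

# 120 players 'attending' the same event end up split across three parallel shards.
plaza = ShardedArea("party_royale_main_stage")
for n in range(120):
    plaza.join(f"player_{n}")
print(len(plaza.shards), [len(s) for s in plaza.shards])   # 3 [50, 50, 20]
```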
In the field of immersive technology, ‘presence’ refers to the experience of believing you occupy a virtual world. This is connected to individual agency: a sense of control and an ability to influence that world. Presence does not explicitly require agency, but agency enables higher levels of presence. For many, Zoom will be an involuntary first step towards a more immersive convergence of the digital and physical, where new forms of digital labour, the creeping influence of AI, and the personal and societal implications of platform logics driving wholesale world-building will continue to raise countless issues, including questions of individual and collective agency. If tech platforms have become the city, operating simultaneously at multiple scales (from the bedroom to the planetary), then we must be alert to how they are restructuring what it means to inhabit that city, and the role of ‘architecture’ in the future spaces of everyday life.
* * * * *
1. Taylor Lorenz et al, ‘We Live in Zoom Now’, New York Times (17 March 2020).
2. Eric J Savitz, ‘Zoom is Adding New Features to Prepare for a Return to Offices’, Barron’s (3 February 2021); Jeremy Neideck et al, ‘The Iconography of Digital Windows—Perspectives on the Pervasive Impact of the Zoom Digital Window on Embodied Creative Practice in 2020’, Body, Space & Technology 20, no.1 (2021).
3. Lydia Kallipoliti, ‘Zoom In, Zoom Out’, e-flux Architecture (April 2020).
4. T. Nikki Cesare Schotzko, ‘A Year (in Five Months) of Living Dangerously: Hidden Intimacies in Zoom Exigencies’, International Journal of Performance Arts and Digital Media 16, no.3 (2020) 277.
5. Sherry Turkle quoted in Victoria Turk, ‘Zoom Took Over the World. This is What Will Happen Next’, Wired (August 2020).
6. Shauna M Rice et al, ‘Zooming Into Cosmetic Procedures During the COVID-19 Pandemic: The Provider’s Perspective’, International Journal of Women’s Dermatology 7, no.2 (2021).
7. See Jonathan Crary, 24/7: Late Capitalism and the Ends of Sleep (2013).
8. Amanda Hess, ‘The Credibility Bookcase is Quarantine’s Hottest Accessory’, New York Times (1 May 2020).
9. See Hito Steyerl, ‘Proxy Politics: Signal and Noise’, e-flux (December 2014).
10. Quoted in ‘The Autonomy of Images, Or We Always Knew Images Can Kill, But Now Their Fingers Are on the Triggers’, Hito Steyerl: I Will Survive (2021) 240.
11. Benjamin Bratton, The Stack: On Software and Sovereignty (2016) 148.
12. Edward Glaeser, The Triumph of the City: How Our Greatest Invention Makes Us Richer, Smarter, Greener, Healthier and Happier (2011) 6.
13. Andrés Jaque, ‘The Agency of Networks’, Volume 53: Civic Space (2018) 64.
14. Andrés Jaque, ‘Grindr Archiurbanism’, Log 41 (2017) 84.
15. See Alexis Kalagas, ‘Satellites of Love’, trans 26 (2015).
16. See Jeremy N Bailenson, ‘Nonverbal Overload: A Theoretical Argument for the Causes of Zoom Fatigue’, Technology, Mind and Behaviour 2, no.1 (2021).
17. Lara Lesmes & Fredrik Hellberg, ‘From Scrolls to Strolls’ in Guillermo Fernandez-Abascal & Urtzi Grau (eds), Learning to Live Together: Humans, Cars, and Kerbs in Solidarity (2021) 134-135.
18. Dean Takahashi, ‘Tim Sweeney: The Open Metaverse Requires Companies to Have Enlightened Self Interest’, VentureBeat (27 January 2021).
19. See Nick Srnicek, Platform Capitalism (2016).