The original plate of “View from the Window at Le Gras”, a heliograph made by Nicéphore Niépce around 1827.

Although it isn’t normally thought of in these terms, taking a photograph involves recording a four-dimensional block of space-time and projecting it down to a two-dimensional representation. With sufficiently sensitive material (and a fast enough shutter), one can produce images more or less instantaneously, but longer exposures reveal the inherent temporality of this process, showing us something that is clearly based on the world, yet quite different from our experience of it. Today, the ability to create images is so commonplace, of course, that we easily take it for granted, but early commentaries on photography reveal just how extraordinary it once was. Indeed, the history of photography provides both a compelling example of the power of representation, and a useful parallel to more recent forms of technological magic, especially that of machine learning.

As with many inventions, the birth of photography is impossible to pinpoint precisely, as many of the component pieces, such as lenses, chemical manufacturing, and the camera obscura, had existed for many years before they were brought together as a coherent image-making system. Nevertheless, 1839 — the year in which the methods of both Daguerre and Talbot were first publicized — provides a convenient marker for when practical photography became widely known and available to the world.

A Daguerreotype of the Boulevard du Temple in Paris, made by Louis Daguerre in 1838. Due to the long exposure, nothing in motion was captured, but a figure who remained stationary (while having his boots polished) can be seen.

The reaction among the public was swift and dramatic. Within months of Daguerre’s announcement, the first advertisements for home photography kits were appearing in magazines. Dozens of affordable portrait studios quickly sprang up around cities like Paris, undercutting the small number of professional portrait painters who had previously subsisted on commissions from an elite clientele. Within a few years, millions of Daguerreotypes were being produced annually.

Further refinements and mechanization of the process gave photographers the ability to make images ever more cheaply, under a greater variety of conditions, and photographs soon began to circulate widely, especially in the form of collectible cartes de visite. While this new technological marvel gave almost anyone the ability to preserve the faces of loved ones or create their own personal archives, photography was also rapidly incorporated into government bureaucracies, as in Alphonse Bertillon’s method for systematically documenting and preserving a record of everyone who was arrested. Before long, as historian John Tagg observes in The Burden of Representation, “it was no longer a privilege to be pictured but the burden of a new class of the surveilled.” [1]

Part of a composite image made by Alphonse Bertillon in an attempt to catalogue variation in features. “Tableau synoptique des traits physionomiques: pour servir à l’étude du ‘portrait parlé’” (1909).

Beyond photography’s remarkable ability to capture seemingly infinite detail, the feature most widely noted by early commentators was its supposedly scientific or objective nature. Although skilled draftsmen could create remarkably detailed hand-drawn illustrations of a scene, there was no guarantee that their depictions would be free from bias or subjectivity. Photography, by contrast, was seen as pure and pristine, simply the action of sunlight. To Daguerre, the process gave nature “the power to reproduce herself.” [2] Talbot similarly described his images as being “impressed by the agency of Light alone.” [3]

Such notions now seem quaint, of course, especially in the era of “deep fakes” and other artificial but deceptively realistic images and video. Even before Photoshop, however, it was obvious that photography was far less objective than people initially supposed. Almost immediately, clever artists began to manipulate images in entirely mechanical ways, either by using multiple exposures, or by combining multiple negatives into a single print, as in Oscar Rejlander’s The Two Ways of Life (1857).

More importantly, as has been noted by everyone from Susan Sontag to Errol Morris, even unadulterated images are clearly the product of human decisions, including choices about light levels, exposure, framing, and so on. Beyond explicit manipulation, a photograph is always a view from somewhere, a particular angle with a particular focus. Despite their fidelity to the original source, photographs quite obviously remain a representation, a partial view which necessarily elides some details and obscures others.

If there is one technology today which can claim similarly mythical status and broad societal effects, it is surely that of machine learning. Much like photography, this relatively new technology is essentially an assemblage of components that have either existed or been in development for many years, including the principles of statistical inference, advances in algorithms and computer hardware, and the normalization of mass data collection. Moreover, like photography, machine learning comes clothed in the robes of witchcraft, seemingly giving rise to the impossible, such as cars that can drive themselves, computers which can translate between languages, and programs which can defeat their own inventors at the most challenging human games.

Indeed, the metaphorical links run deep. While photography was initially hailed as a tool which could be used to document the wrongs of the world, expose hypocrisy, and make visible the conditions of the worst off, machine learning has similarly been promoted as a way to extend human life, replace fallible judgment, and prevent atrocities. Similarly, just as photography was quickly adopted by state interests for the purposes of surveillance and control, new monitoring systems tied to predictive models are now being built on an unprecedented scale.

Claims about the supposed objectivity of algorithms are also easy to find, and, of course, equally misleading. It is true that any appropriate machine learning algorithm will faithfully compress a dataset into a model, just as a camera will faithfully refract the light bouncing off a scene so as to produce an image. The choices made during this process, however, cannot be ignored. While a photographer makes choices about light levels, camera position, exposure time, and, most importantly, what to photograph, the use of machine learning involves choices about what sort of model to use, what assumptions to make, where the data will come from, and, above all, what models are worth making.

Rather than capturing a moment from a single vantage point, machine learning takes data extracted from multiple sources, distributed through space and time, and crystallizes them into a static construction, a kind of miniature from which predictions can then be made. While it is possible to continue updating models with new data ad infinitum, most real-world systems eschew this more arduous task, and simply generate a frozen representation which reflects the data and assumptions from which it was built. Extending the metaphor, one could say that a machine learning model is a photograph, but one made from data instead of light.
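The idea of a frozen representation can be made concrete with a minimal sketch (using invented synthetic data and scikit-learn, not any system from the text): a model is fit once on a snapshot of data, and its predictions then stay fixed no matter how the world changes.

```python
# A minimal sketch of a "frozen" model: fit once on a snapshot of
# data, after which its predictions reflect only that snapshot.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Snapshot of the world at training time: one feature, one binary label.
X_snapshot = rng.normal(loc=0.0, size=(500, 1))
y_snapshot = (X_snapshot[:, 0] + rng.normal(scale=0.5, size=500) > 0).astype(int)

model = LogisticRegression().fit(X_snapshot, y_snapshot)  # the "photograph"

# The world may drift, but unless someone deliberately refits the model,
# this probability is fixed forever, reflecting the data and assumptions
# from which the model was built.
x_new = np.array([[0.0]])
p_then = model.predict_proba(x_new)[0, 1]
```

The frozen model is, in this sense, a trace of the moment of its creation, just as the essay suggests.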

The first published image of a camera obscura, from “De Radio Astronomica et Geometrica” by Gemma Frisius (1545).

Many of the early debates about photography are now somewhat dated, but one question endures, namely, what exactly is it that photographs tell us? On one hand, they quite explicitly represent only the surface of whatever was in front of the camera. Nevertheless, it is commonly asserted that photographs can reveal something deeper, somehow capturing the true nature of the subject. While the best photographs do seem to reveal and preserve something unique about those we know best, this line of thinking was also taken to its extreme by eugenicists such as the statistician Francis Galton, who created multiply-exposed composite portraits in hopes of identifying the essential character of various groups, such as criminals, or the “Jewish type”.

Multiply-exposed composite photographs made by Francis Galton.

In machine learning, the modeling of latent properties is much more explicit. One of the fundamental ideas in statistics is to assume the existence of “latent variables” which explain the observed data. By combining observations from many instances, we can infer values for these unknowns, even if they are never directly observed. The validity of these inferences depends, of course, on whether or not we have specified the correct model. Even improperly specified models can be enormously useful for making more accurate predictions, but it is easy to forget that high accuracy does not in itself prove the validity of the underlying assumptions.
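The latent-variable idea can be illustrated with a small sketch (using invented synthetic data and scikit-learn's Gaussian mixture model): we never observe which hidden group generated each point, yet, given the right model, the hidden group means can be inferred from the pooled observations alone.

```python
# A minimal sketch of latent-variable inference: the group membership
# of each observation is never observed, but expectation-maximization
# recovers the hidden group means from the pooled data.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(42)

# Two hidden groups; only the pooled values are observed.
group_a = rng.normal(loc=-3.0, scale=1.0, size=300)
group_b = rng.normal(loc=3.0, scale=1.0, size=300)
observed = np.concatenate([group_a, group_b]).reshape(-1, 1)

gm = GaussianMixture(n_components=2, random_state=0).fit(observed)
means = sorted(gm.means_.ravel())
# The inferred means sit near the true (never-observed) values of -3 and 3,
# but only because we specified the right model: two Gaussian components.
# A misspecified model could fit the data well while telling a different story.
```

Note that the inference succeeds here precisely because the assumed model matches the data-generating process, which is the caveat the passage above raises.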

Although the recent rise of deep learning has entailed a step away from this sort of explicit latent variable modeling (replacing it with layers of nonlinear functions that have no specific intended meaning), even deep models nonetheless assume the existence of at least one named variable which is mostly unobserved — the label which the model is trying to predict. While there may be cases in which this sort of classification is relatively benign (identifying, for example, the types of galaxies present in astronomical images), predicting a label, especially when it comes to making predictions about people, entails a class of harms which Kate Crawford has described as “harms of representation”. [4]

Marketing material from the company Faception, which claims to do “Facial Personality Analytics”.

The difficulty lies not only in the possibility of predicting the “wrong” label (such as misgendering people, or even failing to recognize them as human), but also in the very creation and reification of the categories themselves. Gender and race are the examples most readily at hand (each of which is usually treated as a set of discrete, mutually-exclusive categories within machine learning models), but the space of possible harms here is vast. Importantly, it is not just the predictions themselves that matter, but the ways in which they are thought about, described, and in some cases, mythologized.

In Believing is Seeing, one of the best modern commentaries on photography, Errol Morris explores in depth how the meaning of a photograph emerges not just from the image, but from the accompanying text, and the stories we tell about it. [5] Indeed, while people often worry about photographers manipulating their images through mechanical means, it is in describing photographs that the most important manipulations often occur.

One of Morris’ best examples is a photograph taken by Arthur Rothstein during the Great Depression under the auspices of the Farm Security Administration. Shot in South Dakota in 1936, the photograph shows a cow’s skull on a dusty landscape. It was reprinted widely, including in the Washington Post, where it appeared with the headline “Drought Damage Mounting in Western States”. A caption below the image read “From Pennington, S. Dak., comes this photo of a bleached skull on a sun-baked grassless plain, giving solemn warning that here is a land the desert threatens.”

One of Arthur Rothstein’s photographs of a cow’s skull in South Dakota (1936).

However, it turns out that Rothstein took several photographs in the same series, repositioning the skull between shots. Rothstein admitted as much, stating that he was looking to create a particular effect.

Additional photographs from Rothstein showing the same skull in different positions.

Once this became known, the first image was suddenly alleged to be a “fake photograph” or government propaganda, especially by those opposed to the Roosevelt administration. The suggestion was that Rothstein had staged the scene so as to create the false impression of a severe drought. But there was no doubt that a severe drought was in fact taking place. Moreover, nowhere did Rothstein claim that the photograph showed the landscape exactly as he had found it, or that he was trying to provide evidence of the drought. Rather, we make implicit assumptions about how images come into being, and how we are supposed to interpret them. The caption in the Washington Post claims that the image offers a “solemn warning”, but this is not something claimed by the image itself. It is, instead, a meaning projected onto the image by the words we use to talk about it.

As Allan Sekula puts it in Photography Against the Grain, “The only ‘objective’ truth that photographs offer is the assertion that somebody or something … was somewhere and took a picture. Everything else, everything beyond the imprinting of a trace, is up for grabs.” [6] Or, as Rothstein himself has said, “Both photographs were accurate and truthful. The words, written or edited by others, in conjunction with the photographs, were wrong and incorrect.” [7]

My point, of course, is that the same phenomenon holds for machine learning. It is one thing to collect a dataset, train a model, and predict probabilities associated with a set of invented categories. It is quite another to assert that this is a model of recidivism, or of criminality, or of creditworthiness, or of any number of other possible labels. Such simple and suggestive names necessarily elide the important details which contributed to the creation of such a model — where the data came from, how it was collected, what was excluded, and what assumptions were made. In most cases, by the time a model gets created and deployed, the world has moved on, leaving only the model, and its description, as a trace of some data and the choices that were made in turning it into a representation of the world.

Ultimately, the most important impact of a technology lies in its potential to reshape relations of power. As Susan Sontag notes, for example, photography not only gave us the ability to document the lives of others, it also redefined what it is acceptable to observe. [8] Initially, only a small, technically-sophisticated elite had the ability to make photographs. That power quickly became massively distributed, however, with the development of simple and affordable hand-held devices at the end of the nineteenth century, such as Kodak’s early cameras.

Both photography and machine learning have at least some potential to be democratizing technologies — giving more power to individuals to document, infer, and make known the misbehaviour and hypocrisy of the powerful. Early photographs did (at least momentarily) shock elites by showing them the conditions of deprivation in which many lived. Similarly, the mass proliferation of camera phones has meant there is now much more likely to be video documentation of any use of force by the state. Nevertheless, there tends to remain a fundamental power imbalance in image making. Just as we never get to see the person (or computer) behind the surveillance cameras that document our movements through public space, the largest companies now control data about us on a scale that will never be equalled by individuals.

The extremes of this imbalance have shown up many times in the history of photography, such as the well-known photographs of slaves in the Southern United States, taken in the 1850s under the direction of Louis Agassiz, who wanted to prove that there were different races which had been “separately created”. [9] Recent years have seen similarly offensive uses of machine learning, such as a paper proposing to predict the criminality of Chinese men from photographs on government-issued IDs. While the machine learning community has readily condemned these and other transparent wrongs, it is also necessary to guard against the subtle but persistent division of individuals into more or less arbitrary groups.

In the end, whether we are talking about creating images or making models, the most relevant question is still, who has the power to document and define the behaviour of whom? Although he was writing about photography, Sekula could equally have been referring to machine learning when he wrote how it led to “representational projects devoted to new techniques of social diagnosis and control, to the systematic naming, categorization, and isolation of an otherness thought to be determined by biology and manifested through the ‘language’ of the body itself.” [10] It is this sort of “instrumental realism” (as Sekula calls it) that we should continue to resist. Despite the undeniable magic of all these technologies, the map is not the territory, a photograph is not a looking glass, and machine learning is not an oracle.

References

[1] The Burden of Representation: Essays on Photographies and Histories by John Tagg. University of Minnesota Press (1993).

[2] “Daguerreotype” by Louis Jacques Mandé Daguerre. In Classic Essays on Photography. Edited by Alan Trachtenberg. Leete’s Island Books (1980).

[3] Insert to The Pencil of Nature by William Henry Fox Talbot. Longman, Brown, Green & Longmans (1844).

[4] “The Trouble with Bias” by Kate Crawford. Keynote talk at NeurIPS (2017).

[5] Believing is Seeing (Observations on the Mysteries of Photography) by Errol Morris. The Penguin Press (2011).

[6] “Dismantling Modernism, Reinventing Documentary (Notes on the Politics of Representation)” by Allan Sekula. In Photography Against the Grain: Essays and Photo Works 1973–1983. MACK (2016).

[7] Quoted in Believing is Seeing (Observations on the Mysteries of Photography) by Errol Morris. The Penguin Press (2011).

[8] On Photography by Susan Sontag. Picador (1973).

[9] “Black Bodies, White Science: Louis Agassiz’s Slave Daguerreotypes” by Brian Wallis. In American Art, Vol. 9 (1995).

[10] “The Traffic in Photographs” by Allan Sekula. In Photography Against the Grain: Essays and Photo Works 1973–1983. MACK (2016).