Volume 21, Issue 1 | Spring 2022

Practicing Art History
Techniques of the Art Historical Observer

by Elizabeth Mansfield, with Zhuomin Zhang, Jia Li, John Russell, George S. Young, Catherine Adams, and James Z. Wang

Beginning in the nineteenth century, the relation between eye and optical apparatus becomes one of metonymy . . . the limits and deficiencies of one will be complemented by the capacities of the other and vice versa.

– Jonathan Crary, Techniques of the Observer (1990)

Readers of Nineteenth-Century Art Worldwide may recognize the epigraph above and rightly assume that it, along with the title of this essay, signals that the following paragraphs are concerned with epistemologies of sight.‍[1] The relationship that Jonathan Crary identified between popular visual entertainments like dioramas or zoetropes and the emergence of a distinctly modern viewing subject in the nineteenth century may be analogous to changes happening today. Only this time, instead of stereoscopes and kaleidoscopes, the new technology is computer vision. And, in the case of this essay, it is the visual agency of the art historian, specifically, that is of primary interest. But that is not the sole intent of the epigraph. Crary’s observation that “the limits and deficiencies of one will be complemented by the capacities of the other and vice versa” perfectly describes the potential for collaborative research between humanists and scientists. This essay provides an account of a collaborative project from the perspective of one member of the research team, the art historian listed first in the byline, but that perspective is complemented by, and indebted to, the team as a whole.‍[2]

John Constable’s cloud studies perfectly demonstrate the epistemological concerns of nineteenth-century Realism. Or so I wanted to argue. My interest in Constable’s clouds, which grew out of a study of Realism that sought to anchor the movement in the aesthetic imagination of British empiricism, was due to their potential as evidence for a particular approach to lifelikeness in European art of the late eighteenth and early nineteenth centuries. I was seeking examples of visual representation that functioned implicitly as exacting records of observed reality yet were largely if not wholly drawn from the imagination. Illustrative of the phenomenon I was seeking to trace is the enthusiasm for making cloud studies in the decades around 1800. This is especially the case for Constable, whose empirical practice is so well documented: his habit of executing oil sketches of the sky en plein air was described in his correspondence, remarked upon by others, and has been substantiated by technical analysis of his cloud studies. What is more, Constable noted location, time, and meteorological conditions on many of his studies. The cloud studies’ basis in the direct observation of specific clouds is irrefutable. Yet, the very nature of clouds makes the exercise of painting them faithfully from nature impossible.[3] Clouds are constantly changing, their appearance shifting as the water vapor that constitutes them condenses and evaporates unceasingly. It was the widespread appeal of making cloud studies directly from nature as much as the results of these efforts that initially drew me to the subject.

figure 1
Fig. 1, John Constable, Cloud Study, 1822. Oil on paper laid on canvas. Yale Center for British Art, Paul Mellon Collection, New Haven. Photo courtesy of the Yale Center for British Art.

That said, Constable’s clouds remain astonishingly convincing in their uniqueness and naturalism (fig. 1). What impressed his contemporaries continues to impress viewers today. I believed I had found in Constable’s “skying” exercises a consummate display of the kind of empirical imagination I was seeking to document around 1800: instances where visual representation exceeded the then prevailing belief that all pictures—no matter how novel they might appear—were merely rearrangements of forms observed in nature.[4] If each of Constable’s cloud studies really was unique—and produced by an imagination so empirically fluent that it appeared to be directly copied from nature—then my hypothesis might stand. But then I started to have doubts. Am I just so accustomed to perceiving clouds painted the way Constable paints them as realistic that I am naively trusting their accuracy? Or was Constable relying on a discrete set of patterns or formal structures that successfully captured the appearance of a cloud, simply repeating them in slightly different arrangements to give the effect of particular clouds while in fact painting essentially the same few clouds again and again? Confronted with these doubts, I wondered how to assess the degree to which Constable’s striking effects of naturalism might be due either to pictorial conventions that had become naturalized or to formal repetitions indiscernible to most observers. Computer vision struck me as uniquely suited to the task.

Computer vision has proven a useful tool for the technical analysis of art, yielding valuable information about condition, artistic processes, and materials, as well as insights into provenance and authenticity.‍[5] When it comes to art historical interpretation—by which I mean the diverse practices used to elicit historical meaning from works of visual or material culture—computer vision has been applied more tentatively, at least by art historians.‍[6] Certainly, access to the resources necessary to pursue this research—especially opportunities for collaboration with computer scientists—is an obstacle for many. But few seem to be lamenting these barriers, suggesting that computer vision research is not seen as especially promising in contrast to other emerging approaches grounded in technical analysis.‍[7] This apparent apathy is likely due, at least in part, to a disciplinary ambivalence toward computer vision’s key affordance: pattern recognition. The ability to show convincingly that one thing looks like another thing is not incidental to the practice of art history: the discipline earned its academic credibility in the late nineteenth and early twentieth centuries precisely by deploying methodologies that revealed and sought to explain formal relationships, whether in a given artist’s oeuvre, within national or ethnic or racial borders, or across cultures and through time.‍[8] The ability to recognize and compare formal features was essential for stylistic and morphological analysis, as foundational for Heinrich Wölfflin’s principles and Alois Riegl’s concept of Kunstwollen as it was for Giovanni Morelli’s method of connoisseurship. It is easy to understand why computer vision might be viewed as a belated tool for art historical investigation. Connoisseurship is now rarely viewed as an art historical end in and of itself but rather as subsidiary to historical interpretation and analysis. 
The theories of Wölfflin, Riegl, and other ambitious thinkers of a similarly structuralist turn of mind were largely shunted aside by the generation of the New Art History as either hopelessly naive or dangerously essentializing. Given this historiography, suspicion of interpretive methodologies that rely on schema and patterns is understandable. Art historians are wary of becoming reenchanted by a reductive formalism.‍[9] Even so, to replace intellectual curiosity with methodological vigilance risks normalizing a narrow understanding of both the practice of art history and the research potential of computer vision.‍[10]

My own epiphany regarding the research potential for computer vision occurred at a 2018 symposium hosted by the Digital Art History Lab at the Frick Art Reference Library. Searching Through Seeing: Optimizing Computer Vision Technology for the Arts included a presentation by Elizabeth Honig, an expert on the Brueghel family and one of the first art historians to apply computer vision to her own research. Honig demonstrated the Image Investigation Tool (IIT), initially developed in collaboration with computer scientist Charles Henderson, which uses computer vision to identify formal relationships among the thousands of images comprising the Brueghel dataset assembled by her and her research group.‍[11] For Honig, computer vision facilitated a better understanding of Netherlandish and Flemish workshop practices by bringing to her attention previously overlooked instances of exact repetitions of figures or other compositional elements in paintings that emerged from the various Brueghel ateliers.‍[12] She also described an experiment with machine learning undertaken in collaboration with computer science colleagues at the University of California, Berkeley.

When applied to image analysis, recent machine learning involves preparing a Convolutional Neural Network (CNN) by “teaching” it to recognize salient features shared across different images.[13] This class of techniques is often referred to as deep learning. Applied to art historical research, deep learning can readily be used for iconographic analysis or to identify recurrent motifs. For instance, Honig showed how a CNN trained to look for windmills in the Brueghel dataset identified several repetitions where the same windmill appears—often at a different scale or even in mirror-reverse—in paintings and drawings attributed to the workshops of different members of the Brueghel family. What would have taken a human viewer hours of work sorting and comparing images took the computer seconds. It is easy to imagine scaling up and identifying similarities across millions of images, which could be a real boon to scholars engaged with iconographic studies or Bildwissenschaftliche research.[14] As interesting as all of this was, the most illuminating moment for me came at the end of Honig’s talk. One of the machine-learning experiments focused on a narrative subject, the Adoration of the Magi. Trained to find likely scenes of the Adoration, the CNN included among its highly probable matches not only six other Adorations but also a landscape showing a few cows in an otherwise vacant forest. “The computer did pick up a cow in the last run,” Honig observed, suggesting that the CNN might have settled on the presence of a cow or two as a defining feature of the crowded Brueghel Adoration scenes, “so it’s not yet perfect.”[15]
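For readers curious about the mechanics, the pattern matching at the heart of such networks can be sketched in a few lines of code. The example below is purely illustrative and reproduces nothing of the Berkeley team’s actual work: the filter is set by hand, whereas a trained CNN learns thousands of such filters from labeled images. The principle it shows is simply that a small filter slid across an image responds strongly wherever the image locally matches the filter’s pattern.

```python
def convolve2d(image, kernel):
    """Slide a small filter across the image; large responses mark
    spots where the image locally resembles the filter's pattern."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    out = [[0.0] * (w - kw + 1) for _ in range(h - kh + 1)]
    for i in range(h - kh + 1):
        for j in range(w - kw + 1):
            out[i][j] = sum(image[i + a][j + b] * kernel[a][b]
                            for a in range(kh) for b in range(kw))
    return out

# A toy 6x6 grayscale "image": a bright vertical stripe (say, a
# windmill post) against a dark ground.
image = [[1.0 if col == 3 else 0.0 for col in range(6)] for _ in range(6)]

# A hand-set vertical-edge filter; a real CNN learns such filters from
# labeled examples rather than having them written by hand.
kernel = [[-1.0, 1.0],
          [-1.0, 1.0]]

response = convolve2d(image, kernel)
peak = max(max(row) for row in response)
print(peak)  # the strongest response sits on the dark-to-bright edge
```

A classifier like the windmill detector stacks many layers of such learned filters and then learns which combinations of their responses distinguish one label from another.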

Here is the tricky thing about deep learning: there is no way of knowing exactly how the CNN is doing its task. When we ask for Adorations and get Adorations, we think the CNN is successfully trained; when we ask for Adorations and get cows, we see it as a mistake. What I realized when Honig showed the results from the machine learning experiment was that the CNN had not made a mistake (of course—computers cannot really “make mistakes”). The apparent outlier actually shared a major formal feature with the Adorations: all of the compositions had a brightly illuminated clearing in the fictive middle ground. We humans were so busy looking for mangers and magi that we missed the subtlety of the CNN’s operation. It was at this moment that I realized the potential for computer vision not only as a means for efficient formal analysis at scale but also as a gauge for art historians’ own perceptual biases and blind spots—a kind of disinterested but visually attentive interlocutor.

A few months later, a colleague familiar with my curiosity about the morphology of cloud studies introduced me to computer scientist James Wang at Penn State, where I had just started working.‍[16] There are only a handful of computer vision and machine-learning experts who are actively engaged in research directly related to art historical study. Two barriers stand in the way of computer scientists’ pursuit of academic research in the arts. The first is profit motive, or, rather, the lack of profit potential in most cultural heritage research. Within an academic setting, where grant funding is essential for scientists who need to maintain and staff laboratories, “translational research” that promises commercial, industrial, or military use is what gets funded most often. Contributions to art historical research are unlikely candidates for external funding at the scale needed to sustain science research. The second reason is that the research problems presented by art historians are sometimes not technically challenging enough to push the boundaries of computer vision or machine-learning research. Scientists—especially early career researchers—cannot afford to devote much time to projects that may not result in papers that can be presented at top conferences or published in high-impact journals. Wang, who has long made visual arts research one of his areas of specialization, is among the exceptions. Even so, I was uncertain whether my research on paintings of clouds by Constable would offer enough of a challenge to pique his interest. Luckily for me, clouds are a tricky thing for computers to recognize and analyze. Their edges are indistinct and their forms are infinitely diverse. Wang was intrigued. My pitch was further aided by the fact that one of his regular collaborators, Jia Li, specializes in statistical modeling and has applied her techniques to weather systems.

Art history research involving digital or computational methods is, by necessity, a collaborative endeavor. Diverse sorts of specialized knowledge are required.[17] Even with the scientific and technical expertise of Wang and Li, the project needed collaborators with experience managing digital humanities projects and image datasets. Penn State’s Digital Humanities Librarian John Russell and Assistant Visual Resources Curator Catherine Adams thus completed the initial roster of collaborators on the Seeing Constable’s Clouds project.[18] At our first few meetings, along with discussing how computer vision could be used to analyze Constable’s clouds, the group spent a lot of time just looking at his paintings and oil sketches. These close-looking sessions began in a seminar room with projections of high-resolution digital images and eventually took us to the Yale Center for British Art, where we could see some of Constable’s cloud studies in person, examine the skies in several of his finished landscapes, and consult with additional specialists.[19] Concerns about the reliability of the images in our dataset as faithful records of Constable’s clouds were alternately amplified and assuaged by Mark Aronson, the deputy director and chief conservator, and Paul Messier, Pritzker Director of Yale’s Lens Media Lab. For instance, it became clear that relying on color as a salient feature of Constable’s clouds was going to be challenging if not impossible given the vagaries of impermanent pigments, relative exposure to light, historic retouchings, and conservators’ interventions.[20] Our best bet would be to focus on compositional features other than color, such as line and modeling.
Aronson and Messier also helped us think through our plan to use photographs of clouds as training images, a conversation that ultimately led us to use “naive” photographs commissioned for scientific purposes by the National Oceanic and Atmospheric Administration (NOAA), as opposed to deliberately artistic photographs of clouds.

Doctoral candidate Zhuomin Zhang, a member of Wang’s research group, conducted the training of the CNN.‍[21] To test the CNN, we needed a research problem that was both reasonably challenging and verifiable by other means.‍[22] As already noted, a CNN does not show its work after solving a problem. It is therefore helpful to start by training the CNN to perform a task that a human viewer can reliably verify (e.g., windmill, not windmill). It quickly became clear that my original research question was not well suited for preliminary trials. To train the computer to find possible instances of repetition of forms across different clouds, we would first need to identify the substructures that Constable might have used to assemble his clouds and then define the salient features of the substructures for the CNN to seek. To use a Morellian analogy: it was not enough to train the CNN to recognize artists’ depictions of fingernails; to answer the kind of question I was asking, the CNN would need to analyze the dataset for similarities in the rendering of the lunula, the little sliver-of-a-moon at the base of the fingernail. We were not even sure yet whether the CNN could discern a painted cloud from a painted sky. So training the computer to identify substructures of clouds before it could recognize clouds was clearly not a reasonable starting point.

This moment in our collaboration is worth highlighting here. Flexibility and patience are essential for collaborations between humanists and scientists. Our research cultures as well as our methodologies are quite different. To make any progress, everyone needs to be willing to depart from the accustomed paths for research in their disciplines, and the collaborative endeavor takes precedence over individual research aims. This can be frustrating, especially for humanities scholars who are used to working alone and having complete control over their research agenda.‍[23] As counterintuitive as it sounds, one of the first things I had to do in order to advance our project was to leave aside, at least for a time, the very question that led me to computer vision in the first place. Fortunately, a research question more suitable for testing Zhang’s training model readily emerged from the existing scholarship on Constable’s clouds.

An intriguing (and contested) explanation for the striking naturalism of Constable’s clouds was put forward by Kurt Badt in his 1950 book, John Constable’s Clouds.[24] Badt argues that Constable’s ability to paint clouds improved dramatically around 1821 and that this change was due to the artist’s belated introduction to cloud taxonomy. The classification of clouds into different types—cumulus, cirrus, stratus, etc.—was proposed by British chemist Luke Howard at an 1802 lecture in London and published the following year (figs. 2, 3).[25] In Badt’s view, the difference in quality he discerned in clouds painted before 1821 and those made later could only be explained by a profound change in Constable’s understanding of clouds around that time—a new understanding based on a belated encounter with Howard’s cloud taxonomy. The concentrated season of “skying” Constable undertook from 1821 to 1822 was not sufficient to effect the improvement in his painted clouds, according to Badt; what mattered was an epistemological shift in Constable’s relationship to clouds, not a change in technique or a consequence of sustained empirical study.[26] Badt had no direct evidence to support his contention. Constable does not mention Howard’s meteorological studies in his correspondence and there is no evidence that he ever possessed Howard’s book or attended his lectures.[27] Given the rapid and widespread popularization of Howard’s system in the years after its 1803 publication, it is altogether possible that Constable was familiar with the taxonomy well before 1820. But it is just as possible that he learned about it later.

figure 2
Fig. 2, Title page for Luke Howard, “On the Modifications of Clouds, and on the Principles of Their Production, Suspension, and Destruction, Being the Substance of an Essay Read before the Askesian Society in the Session 1802–3,” from The Philosophical Magazine, October 1803, n.p. Image courtesy of the British Library.
figure 3
Fig. 3, [?] Lewis, Illustration for Luke Howard, “On the Modifications of Clouds, and on the Principles of Their Production, Suspension, and Destruction, Being the Substance of an Essay Read before the Askesian Society in the Session 1802–3,” from The Philosophical Magazine, October 1803, n.p. Aquatint. Image courtesy of the British Library.

Over the years, Badt’s theory has been alternately dismissed and accepted by art historians.[28] Badt’s argument relies mostly on his own visual assessment of the relative naturalism of Constable’s clouds before and after 1821, so it is precisely the kind of art historical conversation to which computer vision might helpfully contribute. To test Badt’s theory, we first sought to determine whether Constable’s clouds could be readily classified by type. In other words, do Constable’s clouds exhibit the defining features that Howard ascribed to cumulus, stratus, and cirrus? The first step in this experiment was to train the CNN to distinguish clouds by type using a large dataset of cloud photographs created by the National Oceanic and Atmospheric Administration (NOAA) that is labeled according to the taxonomy first proposed by Howard and still in use today. The classification capability of a well-trained CNN depends on the learned relation between an image’s extracted visual features and its corresponding cloud type. These features can be low-level image characteristics, such as color, edges, or location, as well as higher-level semantic information. Since the classification model is trained using photographs, painted clouds can be accurately classified only if they follow the same feature distribution documented in the photographs. Once Zhang had trained the CNN to accurately distinguish “cumulus” from “cirrus” from “stratus,” the CNN was ready to assess Constable’s clouds by scoring each painted cloud as a probable match with one of the standard types. But, before putting the trained CNN to work on Constable’s clouds, we needed to create a ground truth against which to compare the CNN’s output. Otherwise, we could not confidently assess the CNN’s results for accuracy. For this, we enlisted the expertise of a meteorologist to identify—if possible—Constable’s clouds by type.
George Young, professor of meteorology and atmospheric science at Penn State, assessed each painting. Along with indicating whether the cloud or clouds represented conformed to a standard type, Young described the weather conditions under which the cloud represented would have been formed—to the extent that he could derive such information from the painting.‍[29] He also ascribed to each cloud painting a subjective score for its “realism,” that is to say, its overall accuracy as a convincing representation of an actual meteorological event.‍[30] Young’s cloud identifications served as the ground truth against which the CNN’s performance was measured.
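The role Young’s labels played in the experiment can be pictured as a simple agreement check between the meteorologist’s classifications and the CNN’s predictions. The sketch below is hypothetical: the labels are invented for illustration and are not the project’s actual data.

```python
def agreement(expert_labels, model_labels):
    """Fraction of paintings on which the model's predicted cloud type
    matches the expert's label, i.e., accuracy against ground truth."""
    matches = sum(1 for e, m in zip(expert_labels, model_labels) if e == m)
    return matches / len(expert_labels)

# Invented labels for five paintings; not the project's real data.
young_labels  = ["cumulus", "stratus", "cirrus", "cumulus", "stratus"]
cnn_predicted = ["cumulus", "stratus", "cirrus", "stratus", "stratus"]

print(agreement(young_labels, cnn_predicted))  # 0.8
```

A high agreement score indicates that the CNN has learned the typology well enough to be trusted on paintings the expert has not labeled.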

The CNN then took its turn assessing the paintings. Constable’s clouds conformed well to typological characteristics, and the computer’s assignment of cloud types largely matched Young’s. So Zhang’s training of the CNN appeared to be successful. The results suggested that Constable recognized the distinctive features of cumulus, stratus, and cirrus clouds and noted those features in his paintings.[31] While these results might weigh in favor of Badt’s argument, the CNN (like Young) did not register a change in Constable’s performance in this area around 1821.[32] In other words, Constable’s clouds could be classified by type even before the period of intense skying in 1821.

To add texture to the experiment, we expanded our dataset to include another two hundred images of clouds painted by Constable’s known emulators and other contemporaries noted for their illusionistic skies and their plein-air paintings of clouds.[33] The former included the artist’s son, Lionel Constable (1828–87); Frederick R. Lee (1798–1879); and Frederick W. Watts (1800–1870). Pierre-Henri de Valenciennes (1750–1819), David Cox (1783–1859), and Eugène Boudin (1824–98) made up the latter. Young labeled these images according to cloud type—when discernible—and apparent conditions along with his subjective “realism” score. The results produced by the CNN again aligned with Young’s ground truth: Constable’s clouds were “better” than his contemporaries’ in that they could be classed pretty consistently by their features into Howard’s typology. Interestingly, Valenciennes’s clouds (mostly painted in the 1780s and 1790s) scored nearly as well as Constable’s in terms of typological consistency. This result, while not dispositive of Badt’s argument, does suggest that empirical study and long practice of plein-air painting of clouds could equip artists striving for a certain kind of naturalism with the ability to depict clouds accurately by type—even if they were not familiar with Howard’s taxonomy.

Now confident in the CNN’s ability to compare individual clouds—both photographic and painted—across our dataset, we decided to embark on an experiment that engaged directly with the concept of Realism. The new experiment also introduced a more qualitative aspect to our research with computer vision. As already stated, our CNN was trained to recognize clouds using photographs of clouds. We therefore decided to run an experiment in which the painted clouds were assessed according to their similarity to or difference from photographic clouds from NOAA, proceeding from the hypothesis that photographs of clouds are the most realistic images of clouds. Here, it is worth pausing a moment to acknowledge what is presumed by this hypothesis. Photographs—even unartful, climate research photographs—are no less forms of representation than are paintings. NOAA’s cloud photographs may not involve a human photographer (many are taken automatically), but they are nonetheless mediated by the positions of the cameras, by the decisions that led to the photographic project in the first place, by whatever editing may have been done, and by the technology of digital photography itself. This list of ways in which photographs of clouds are different from real clouds could go on. That said, our experiment was founded on what I think is a reasonable belief that photographic clouds are the most empirically faithful visual representations of clouds we have. Additionally, the representational biases of photography are not irrelevant to the study of the naturalism of cloud paintings by Constable and his contemporaries. The invention of photography in the 1820s and 1830s was a response to already emergent European notions about what representation could or should be as much as it was an outcome of specific technological innovations and aspirations. Naturalism emerged from this same milieu.
That artists like Constable might have been working toward a mode of representation in some ways akin to that produced by photography is probable.‍[34] So to posit photographs as a kind of standard for comparison raised interesting possibilities for computer vision as one way to analyze immediately prephotographic and postphotographic pictorial strategies.

Zhang ran this experiment, too, and her results proved interesting. During the image-translation process from paintings to photographs, the CNN-based encoder can disentangle the “style” features from each painting. Here, “style” is used in reference to a set of formal features that can be described mathematically and thus compared. For any given set of images (e.g., NOAA photographs of clouds, painted clouds by Constable, or painted clouds by all artists in the dataset), distinctive formal features are identified by the CNN, and these features can then be used to measure formal similarity between paintings and photographs.[35] Valenciennes’s clouds were scored by the CNN as the most similar to photographic clouds (fig. 4). Constable ranked close behind Valenciennes, with Constable’s emulators Lionel Constable, Watts, and Lee trailing them slightly.[36] Historians of nineteenth-century art will perhaps not be surprised to learn that the computer deemed Eugène Boudin’s clouds comparatively “unrealistic” (that is to say, unphotographic) and, in the first experiment, accorded Boudin’s clouds a lower probability of classification as a specific cloud type. Painting later in the century, Boudin is famous today as the teacher of impressionist Claude Monet, and Boudin’s looser technique can hardly be described as photographic (fig. 5). Even so, his contemporaries saw great truth in his views, and he was known in his lifetime as the “king of skies.”
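One standard way to quantify the similarity between two feature vectors (not necessarily the measure Zhang used) is cosine similarity, the cosine of the angle between them. The sketch below uses invented four-dimensional “style” vectors purely for illustration; real CNN embeddings run to hundreds or thousands of dimensions, and nothing here reproduces the project’s actual code.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors: values near 1.0
    mean the two sets of features point in nearly the same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Invented four-dimensional "style" vectors, for illustration only.
photo_centroid = [0.8, 0.1, 0.5, 0.2]  # average of the photo features
painting_a     = [0.7, 0.2, 0.5, 0.3]  # close to the photographic norm
painting_b     = [0.1, 0.9, 0.1, 0.8]  # far from the photographic norm

print(cosine_similarity(painting_a, photo_centroid) >
      cosine_similarity(painting_b, photo_centroid))  # True
```

Ranking each artist’s paintings by a score of this kind is one way a “most photographic” ordering, such as Valenciennes ahead of Constable, could be produced.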

figure 4
Fig. 4, Pierre-Henri de Valenciennes, Classical Landscape with Figures and Sculpture, 1788. Oil on panel. J. Paul Getty Museum, Los Angeles. Digital image courtesy of the Getty’s Open Content Program.
figure 5
Fig. 5, Eugène Boudin, On the Beach, Sunset, 1865. Oil on wood. The Metropolitan Museum of Art, New York. Image courtesy of the Metropolitan Museum of Art.

What have these experiments with computer vision contributed to my practice as an art historian? For one thing, I have become more attentive to my own habits of looking, with a sharpened awareness that my perception is inflected by both disciplinary and personal biases. I am trying to be more alert to what is just outside my perceptual field—to look past the Adorations, so to speak, to see what else may be in plain view without being apparent. I also have a renewed appreciation of the cognitive as well as historical distance separating me from the subjects I study. Of course, narrowing that distance is arguably the aim of art history, and my own attraction to computer vision is found in its potential to help me see a little more historically. Do I believe computer vision has the potential to serve as a kind of prosthetic “period eye”?‍[37] Not exactly. But I do think there’s potential for computer vision to aid art historians in gaining a better understanding of perception as well as vision.

Second, our experimentation with computer vision has brought to my attention new ways to study Realism and its relationship to modernism. Helpful in this respect have been experimental results like the comparable scores given to Valenciennes and Constable in terms of their approximation to photographs of clouds. The degree to which this finding provides accurate information about Valenciennes’s and Constable’s relative “realism” matters less, I think, than the invitation it extends to look more closely, not just at their art but also at their respective historiographies. Art historical narratives tend to treat the English “Romantic” Constable quite apart from the French academician Valenciennes, aligning Constable with the origins of naturalism and modernism in French landscape painting. Valenciennes, by contrast, is remembered as a rigorous theoretician devoted to classicism and as a stalwart academician in the postrevolutionary era. In other words, he is regarded as a bit behind the times in contrast to the progressive Constable. Yet, as the experiments with computer vision remind us, the two painters share a good deal in the way of formal concerns and representational strategies. If computer vision can point toward avenues for further inquiry—in this case, ongoing transnational research on European art around 1800 and a reconsideration of Constable not just as a precursor to modernism and impressionism but as a culminating painter of the eighteenth century—then it has already arrived as a complement to existing techniques of art historical analysis.


[1] Epigraph: Jonathan Crary, Techniques of the Observer (Cambridge, MA: MIT Press, 1990), 129.

[2] Authorship, like vision, is an expression of subjectivity in a particular moment and place. Likewise, the modality of scholarly discourse reflects disciplinary norms and cultural assumptions. This essay departs from the norms of humanistic discourse by listing all seven members of the research team, some of whom did not participate directly in writing or editing it but who contributed to the research it describes. The essay also falls outside conventions for scientific discourse in its use of the first-person perspective. The difficulty in accommodating scholarly conventions across several disciplines (art history, computer science, meteorology, digital humanities) and academic roles (tenured or tenure-line professors, PhD candidate, technical support staff) is an indication of some of the well-known barriers to collaborative research among humanists, scientists, and technologists. But the bending of these norms here is also suggestive of a growing flexibility in at least some venues for research. On the history of authorship in relation to humanistic and scientific discourse, the standard account remains Michel Foucault, “What Is an Author?,” originally delivered in French as “Qu’est-ce qu’un auteur?” The standard English translation appears in Paul Rabinow, ed., The Foucault Reader (New York: Pantheon, 1984), 101–20.

[3] Constable’s plein-air cloud studies were drawn and painted in a variety of media, though he most often executed them in oil on paper.

[4] For more on the concept of invention in relation to the practice of making cloud studies, see Elizabeth Mansfield, “Cloud Studies as Romantic (and Realist) Fragment,” Word & Image 37, no. 1 (2021), esp. 56–58, https://www.tandfonline.com/.

[5] The following provide excellent summaries of the history of computer vision as applied to art history: James Z. Wang, Baris Kandemir, and Jia Li, “Computerized Analysis of Paintings,” in Digital Humanities and Art History, ed. Kathryn Brown (New York: Routledge, 2020), 299–312; and Amalia Foka, “Computer Vision Applications for Art History: Reflections and Paradigms for Future Research,” Proceedings of EVA London 2021, Science Open, July 2021, https://www.scienceopen.com/. For a more evaluative, and critical, assessment of some recent projects that apply computer vision to the study of art, see Sonja Drimmer, “How AI Is Hijacking Art History,” The Conversation, November 1, 2021, https://theconversation.com/.

[6] Scientists, by contrast, have been energetically applying computer vision to evaluate works of art. Examples are too numerous to cite. An indication of the scope of research is suggested by the more than 1,500 results produced by a search of Penn State University Libraries’ holdings and journal subscriptions using the combined search terms “computer vision” and “artwork.” As the “state of the field” essays by Wang et al. and Foka make clear, projects involving art historians represent a tiny fraction of this research output.

[7] The chief example here is technical art history, which has become a major trend in art historical research in the past decade. Technical art history poses similar barriers to scholars in that it often requires access to expensive instruments and benefits from collaborative research teams with diverse forms of expertise.

[8] It is difficult to reflect on the potential for computer vision to aid in the study of nineteenth-century art without calling to mind Jonathan Crary’s account of the emergence of the modern viewing subject around 1800. In Techniques of the Observer (1990), Crary observed a shift away from a kind of stable, Cartesian conception of vision toward a subjective, somewhat disembodied viewing subject at the start of the nineteenth century, an observer that exemplified Foucault’s “empirico-transcendental doublet.” The art historian—as a distinct viewing subject—arguably came into being in the midst of this transition and has been negotiating the competing allures of objective, authoritative vision and subjective, unique insight ever since. Parallels between the moment Crary identifies around 1800 and our current negotiation of the advent of computer vision and machine learning bear consideration that is beyond the scope of the present essay.

[9] This is not to say that the theories of Wölfflin or Riegl were reductive, only to acknowledge that this is how their methodologies are sometimes characterized today.

[10] To cite just one obvious example, formal comparisons are still used to help identify pottery shards. For a case study on how computer vision was used for this work, see Jun Zhou, Haozhou Yu, Karen Smith, Colin Wilder, Hongkai Yu, and Song Wang, “Identifying Designs from Incomplete, Fragmented Cultural Heritage Objects by Curve-Pattern Matching,” Journal of Electronic Imaging 26, no. 1 (January 5, 2017), https://doi.org/10.1117/1.JEI.26.1.011022.

[11] The software design group Agile Humanities further developed and refined the IIT.

[12] Honig cites the example of a windmill in a painting by Jan Brueghel that is a repetition of a windmill in a painting by his father, Pieter Brueghel the Elder, stating “This was news to us.” Elizabeth Honig, “Human Vision, Computer Technology and the Image Investigation Tool,” The Frick Collection (website), April 13, 2018, https://www.frick.org/.

[13] For instance, a computer scientist might train a CNN to recognize horses by tagging the salient features of a horse in hundreds or thousands of images containing horses. When correctly programmed and trained, the CNN develops an algorithm on its own to satisfy the task “find horses.” The CNN also provides information about the closeness, or probability, of an exact match: in other words, within the universe of horse pictures, how close a match is this image of a horse to that image of a horse? These are the rudiments of machine learning, which is now being explored for use with image datasets derived from commercial and military contexts, among others. There are abundant sources on these topics. See, for instance, Gopal Ratnam, “Pentagon Aims to Spread Artificial Intelligence across Military Services,” Roll Call, January 14, 2021, https://www.rollcall.com/; Sara Castellanos, “Pinterest Harnesses AI for Visual-Based Shopping; Company Says Its AI Technology Can Identify More than 2.5 Billion Objects in Photos of Fashion and Home Décor,” Wall Street Journal, September 19, 2019, https://www.wsj.com/.
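The notion of “closeness” invoked in the preceding note can be made concrete with a toy sketch in plain Python. The feature vectors below are invented stand-ins for the numerical descriptions a trained CNN would extract from images, and cosine similarity is one common closeness measure (real classifiers typically report a softmax probability instead); this is an illustration of the idea, not of any system used in the project described here.

```python
import math

def cosine_similarity(a, b):
    """Score how close two feature vectors are (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical feature vectors a CNN might produce for three images.
horse_1 = [0.9, 0.8, 0.1, 0.2]    # a photograph of a horse
horse_2 = [0.85, 0.75, 0.15, 0.3] # another horse, seen from a different angle
bicycle = [0.1, 0.2, 0.9, 0.8]    # not a horse at all

print(cosine_similarity(horse_1, horse_2))  # near 1.0: a close match
print(cosine_similarity(horse_1, bicycle))  # much lower: a poor match
```

The two printed scores answer the question posed above in miniature: within this tiny universe of pictures, the two horse images score as near matches while the bicycle does not.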

[14] A pioneer in the area of Big Data as it relates to visual culture research is Lev Manovich. His projects are described here: “Selected Projects & Exhibitions,” Lev Manovich (website), accessed January 17, 2022, http://manovich.net/.

[15] Honig, “Human Vision.”

[16] I am indebted to Andy Schulz, former associate dean for research in the College of Arts and Architecture at Penn State, for introducing me to James Wang in 2018.

[17] The likelihood that a single researcher might possess all the requisite art historical, technical, and scientific knowledge and skills needed to carry out this kind of research is very small, hence the expression “DH unicorn.” Such researchers exist, but they are rare. To move research forward in this area, collaborations are essential.

[18] What makes a dataset suitable for this kind of research is the number and quality of images. The more images you have, the better the training of the CNN can be. In terms of quality, the goal is to reduce as much as possible any image “noise” (digital artifacts of the imaging process itself) that could inadvertently become salient features for the CNN. Our dataset was created by Penn State’s Visual Resources Centre, which secured nearly one hundred high-resolution images of clouds painted by Constable. By machine-learning standards, this is a tiny dataset.

[19] One of the benefits of these kinds of collaborations is that, contrary to what one might assume, they actually promote a kind of slow, close looking that can be revelatory in and of itself. Our March 2019 visit to the Yale Center for British Art also benefitted from conversations with Damon Crockett, Jessica David, Nicholas Robbins, and Scott Wilcox.

[20] Commissioning hyperspectral images of all the paintings we wanted to analyze was judged prohibitively costly and time-consuming.

[21] Wang’s Penn State research group is Modeling Objects, Concepts, Aesthetics and Emotions in Big Visual Data. See “About Us,” James Wang Research Group, accessed January 17, 2022, http://wang.ist.psu.edu/.

[22] Ultimately, because we are working with machine learning, it is only the results that are verifiable at this point, not the methodology.

[23] For an insightful account of interdisciplinary, collaborative work with scientists written by an anthropologist of research cultures, see Park Doing, Velvet Revolution at the Synchrotron: Biology, Physics, and Scientific Change (Cambridge, MA: MIT Press, 2009).

[24] Kurt Badt, John Constable’s Clouds (London: Routledge and Kegan Paul, 1950).

[25] Luke Howard’s taxonomy was first disseminated in print as a short article in a journal read by natural scientists and keen amateurs: “On the Modifications of Clouds, and on the Principles of Their Production, Suspension, and Destruction: Being the Substance of an Essay Read before the Askesian Society in the Session 1802–3,” The Philosophical Magazine, October 1803, 5–11. See D. E. Pedgley, “Luke Howard and His Clouds,” Weather 58, no. 2 (February 2003), https://doi.org/10.1256/wea.157.02.

[26] “after Constable had become familiar with this classification based on visible differences of shape, it could not help but influence his own observations and the painter’s vision in him, since it superimposed a comprehensive and distinguishing principle on the haphazard observation of constantly changing phenomena; or, to put it differently,—because it had made unreflecting and amateurish observation of clouds impossible.” Badt, John Constable’s Clouds, 50–51.

[27] As evidence that Constable was aware of Howard’s taxonomy, Badt cites the artist’s December 12, 1830, letter to George Constable that apparently accompanied a “book you lent me long ago.” The letter then discusses “My observations of clouds and skies” and observes, perhaps in reference to the book he has just returned, “Forster’s is the best book; he is far from right, but still has the merit of breaking much ground.” Quoted in Badt, 50. The first chapter of Thomas Forster’s Researches about Atmospheric Phenomena (1813) is an explication of “Mr. Howard’s Theory of the Origin and Modifications of Clouds.” But Badt himself raises doubts about Forster being the source of Constable’s knowledge of the taxonomy, instead speculating that Constable most likely read Luke Howard’s book, The Climate of London, Deduced from Meteorological Observations (London: W. Phillips, 1818), around the time it was published—not long before the painter embarked on his most sustained period of skying in 1821–22. Badt cites no evidence for this other than the book’s appeal to “a wider public” than Howard’s Essay on the Modifications of Clouds, first published in 1803. Badt, 50–61.

[28] Nicholas Robbins is the most recent scholar to engage with Badt’s theory, and Robbins provides an excellent summary of the relevant literature. See his “John Constable, Luke Howard, and the Aesthetics of Climate,” Art Bulletin 103, no. 2 (2021): 50–76, esp. note 7, https://doi.org/10.1080/00043079.2021.1847578.

[29] Young did not have access to Constable’s notations, but Young’s descriptions of the conditions suggested by Constable’s cloud studies generally accorded with the notes made by the artist.

[30] Young recorded his assessments as audio files, ranging in length from ca. 1 to 4 minutes depending on the complexity of the painting.

[31] Of course, Young’s assessment of Constable’s depiction of specific types of clouds was the first measure of the artist’s accuracy in this vein; the CNN confirmed Young’s analysis.

[32] Our dataset of Constable clouds includes eighty-six individual paintings comprising both cloud studies and studio paintings where at least one-third of the composition is given to a cloudy sky. Of these, twenty-one were created before 1821. We plan to analyze our results further both in relation to chronology and also in relation to studio versus plein-air execution.

[33] We limited our dataset to representations of clouds executed in oil paint for the sake of consistency in medium.

[34] The standard studies of the cultural and epistemological history of photography are Joel Snyder, “Picturing Vision,” Critical Inquiry 6, no. 3 (1980): 499–526; and Geoffrey Batchen, Burning with Desire: The Conception of Photography (Cambridge, MA: MIT Press, 1997), along with Crary’s Techniques of the Observer.

[35] Y. Bar, N. Levy, and L. Wolf, “Classification of Artistic Styles Using Binarized Features Derived from a Deep Neural Network,” in European Conference on Computer Vision (Cham, Switzerland: Springer, 2014): 71–84, https://link.springer.com/; W.-T. Chu and Y.-L. Wu, “Image Style Classification Based on Learnt Deep Correlation Features,” IEEE Transactions on Multimedia 20, no. 9 (2018): 2491–502, https://ieeexplore.ieee.org/.

[36] Zhang’s account of the results as well as precise (including mathematical) descriptions of the programming of the CNN, her analysis of the results, and the steps she took to confirm the results can be found in Zhuomin Zhang, Elizabeth C. Mansfield, Jia Li, John Russell, George S. Young, Catherine Adams, and James Z. Wang, “A Machine Learning Paradigm for Studying Pictorial Realism: Are Constable’s Clouds More Real than His Contemporaries?,” arXiv preprint arXiv:2202.09348 (2022), https://doi.org/10.48550/arXiv.2202.09348.

[37] I refer here of course to Michael Baxandall’s concept of the “period eye,” introduced in Painting and Experience in Fifteenth-Century Italy: A Primer in the Social History of Pictorial Style (Oxford: Clarendon Press, 1972).