Archivism in the Context of AI
Preventing Distortion of the Past Through Analog and Digital Stewardship
Part 1 of "Archives and AI"
CICR-ICRC-PublicArchives HQ-Geneva RomanDeckert09062020.jpg was taken by RomanDeckert and shared under a CC BY SA 4.0 International License.
Why Archives and AI?
This series of posts was inspired by several occurrences in the past week. Firstly, one of my colleagues recommended me as a speaker for an excellent podcast, Shifting Schools, hosted by Tricia Friedman and Jeff Utecht. Both facilitators are educators, and so I was expecting that they wanted me to focus on the typical “teachers using AI to create and grade assessments” topic.
However, their first and strongest request was that I focus on “how the role of the archivist might be evolving due to AI” and student research incorporating AI tools.
The other impetus was a post by Leon Furze on LinkedIn. A week ago, he discussed the “future of looking back.” Google’s AI Overviews had a well-documented, appalling start. They gave misleading, harmful, and in some cases illegal advice. While the current iteration of Overviews has improved, it is not perfect. I tell everyone who asks not to trust summaries created by any AI tool, Google or otherwise.
Comparable issues could occur in regard to history, Furze worries. Overreliance on AI, which seems to be a potential future norm, could result in a distortion of public narratives about past events. We could quite literally believe in a false version of the past because a network of computers is telling us a certain narrative, while historical records tell quite a different story.
“How,” Furze asks, “can we prevent the technologies of the near future from steamrolling over the past?”
Archives, and the archivist mindset, are the answer I propose to any fears about historical revisionism. All types of false narratives, whether AI-generated or not, can be refuted by authentic artifacts that are interpreted correctly, in an informed and articulate manner.
I have written in several posts, most prominently in “The Danger of Complete AI Automation,” about research and information literacy with AI tools. Additionally, my work in other contexts discusses AI-enhanced research strategies.
I have never, however, written on the topic of AI and Archives before. The intersection of these facets of my work has always intrigued me, but I never thought to write about it and really flesh out my perspective on the subject. This series of at least two posts will give me the opportunity to do so.
What are Archives?
Archives are repositories for collections of historical materials. These collections include both records (text, images, sound recordings, audiovisual materials), or “manuscripts” (“written [created] by hand [human]”), and artifacts, which are physical objects (“created through art [or skill]”).
Archival materials are preserved because of the enduring value of the information they contain or as evidence of the functions and responsibilities of their creator. In my archives, I focus on the value of the information, the cost of retaining the material, and the implications of selecting or discarding the material on the archives and the collection as a whole. I also analyze the uniqueness of the material.
What is Archivism?
Archivism is the selection, curation, arrangement, management, preservation, and presentation of archives collections according to:
field best practices,
organizational procedures,
internal procedures, and
expectations of various stakeholders.
Generally speaking, if an archivist is following best practices, which are determined by the Society of American Archivists in the United States, they are fulfilling their duty as an archivist and should meet most expectations of stakeholders.
Archivists are typically involved in helping to create general organizational procedures (if their archives is part of an educational institution or organization) and internal procedures for their archives (as the manager). Related functions are performed for private institutions by “records managers,” and they are distinct from archivists in some ways, but we will just refer to archivists here.
An archivist is responsible for maintaining the integrity of the individual records and the collections in which they are kept. They are also responsible for communicating to the public about these records’ existence. Furthermore, they are expected to provide reasonable access to archival records for research purposes. Archivists can also arrange and present exhibits with archival records on any number of topics. In a way, the archivist serves as the mediator between the record and the researcher (whether an academic writer or a casually interested member of the public).
All of these functions are carried out according to established procedures, many of which are communicated to the public. The public can also interact with archival records according to strict policies, which they usually agree to by signing a document (although this practice has been challenged on several ethical bases in recent years).
While archivists are responsible for mediating interactions between researchers and historical artifacts, they stay as far away as possible from interpreting and appraising those materials. These last two activities would involve assigning historical or financial value to a record, which is something an archivist never does.
How Do Archivists Influence Historical Interpretation?
There are two ways that archivists, while not meant to be the primary interpreters, can influence the interpretation of historical materials found in their archives. An archivist can create exhibits on a variety of topics, individuals, and institutions. They also create interpretive records of a limited scope when they create archival records for their collections.
Exhibits
An obvious exception regarding interpreting is when the archivist prepares exhibits. Even this activity, however, is tempered by having peers or historians review the exhibits’ interpretations for historical validity or acknowledgement of nuance. Sometimes, archivists have historians create all interpretive materials so they are not responsible for them.
Archives exhibits can focus on a variety of topics, depending on issues important to the local community or a subject that is covered in multiple archival materials. An exhibit can center on the contents of one collection. It can cover a single subject or event using materials from multiple collections. Or it can cover the history of an important community institution using materials from multiple collections and other materials on loan (or donated) from external sources.
Through proper historical interpretation of these materials, provided by the archivists, historians, and other stakeholders, archivists communicate the complex truths of many topics as illustrated by the contents of their collections. Ideally, an archivist only uses archival materials so that the interpretations will have internal verification. An interpretation of archival materials ideally is corroborated in the following order:
by the material itself,
by other materials found in the same collection, and
by other materials found in the same archives.
Scope and Content, Biographical/Administrative History Notes
The other way that archivists affect the public’s knowledge of history and the complex truth is through the creation of contextual notes in the metadata records of their collections. There are two types of “Notes” in every record: Scope and Content Notes and Biographical (human collection) or Administrative History (institution collection) Notes.
These notes should not be too long. Some archives limit these notes to a paragraph. Others let the Biographical Note go to two or three paragraphs.
The main requirements are that the Scope and Content Note should cover the provenance of the documents, the internal relationship between its contents, their physical characteristics, and the information they hold. The Biographical Note should only have enough information to supply context to a researcher’s understanding of the contents of a collection. Ideally, the Biographical Note should only contain information that is explicitly stated in the documents, or that can be inferred from the collection as a whole. External sources should be explicitly stated, and those sources should be interpreted according to the material in the archives, not the other way around.
In all aspects of an archivist’s work, from selecting to arranging to describing to exhibiting, their purpose is to ensure that vital historical resources remain available for others to interpret and use as sources for their historical arguments. Archival materials can also be useful in resolving legal, cultural, and social problems. In this way, archivists serve to preserve not a particular historical narrative, but the sources from which narratives are derived.
What Potential Does AI Have to Alter Historical Narratives?
As Leon Furze said in his LinkedIn post, AI (like other technologies) has potential uses that can distort history, whether or not the user intends to do so. This could be a result of deliberate programming on the part of an AI creator who wishes to support a particular historical narrative. It could be because of faults in the prompts or resources given by the user to the AI tool. It could be because of erroneous information on a website that the AI tool has crawled.
Lastly, the AI could “hallucinate,” or fabricate, information. Even if the information or data is valid, the AI tool will most likely give the uninformed and amateur historian a generalized, high-level, or inaccurate interpretation of historical information. These potentialities are all the more reason for the user to “bring the human” to human-AI interactions.
I had an alarming experience about a year ago, when I heard about the “English Channel” test that users were giving AI tools. As a historian and archivist myself, I was examining ChatGPT and decided to test it out myself.
I asked ChatGPT, “What is the world record for crossing the English Channel on foot?” ChatGPT responded with a fictitious record holder, date, time, and a brief anecdote about the Channel being very windy that day, which made the walking very difficult. I could not believe this, so I tried it also with Claude and again with another ChatGPT conversation. All three conversations resulted in hallucinations. In every case, I corrected the AI tool, and in every case the AI tool repeated back the correction and apologized for making so egregious a mistake.
The next day, they did it again.
I realize that this is anecdotal evidence. However, there are a multitude of examples of AI fabrications. If any subject can be distorted through hallucinations, history is as much a victim as any other.
One of the commenters on Furze’s post was surprisingly incredulous that AI could ever have a negative impact on historical narratives. They seemed to believe that because historical recording was not a main focus of AI tools, they would never have anything to do with history. “No one suggested” the malicious or ineffective use of AI in history to support false narratives, they argued. So any fear or preparation against this hazard was unfounded.
Both theory and evidence support fears of AI-enabled manipulation of historical facts. In addition to Leon Furze and myself, UNESCO seems to agree that historical records are at risk of distortion. Some are referring to this as “artificial history.”
Also, I have not even mentioned deepfakes, which are notoriously easy to make (and seem more authentic every day). Bad actors do not care about the intentions of the designer, or about best practices. If they can do something to support their own goals at the expense of others, they will.
Archival Best Practices in the Age of AI
Digital archival materials include digitized “analog” materials, those that were originally physical, and “born-digital” materials. All digital materials can be altered through multiple technologies, including AI. Born-digital materials carry the added risk of being inauthentic creations of artificial intelligence. This reality presents new hazards. Fortunately, by applying existing archival best practices we can navigate all of these hazards.
Archivism of digital materials relies upon the analysis of “significant properties” of the document, as well as the four qualities of all records, whether analog or digital:
Authenticity - An authentic record is one that can be proven to be what it professes to be, to have been created or sent by the person claiming to have created or sent it, and to have been created or sent at that time.
Integrity - A document with integrity has never been corrupted through time, transportation, or migration. The ultimate test of integrity is the verifiability of checksums and the ability of a qualified system to open the file.
Reliability - A reliable record is one whose contents can be trusted as a full and accurate representation of the transactions, activities, or facts to which they attest.
Usability - A usable record is able to communicate all of its information to its users. Any reasonable aspect or significant property of the document can be accessed by a typical user.
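The checksum test mentioned under Integrity can be sketched in a few lines of Python. This is only a minimal illustration, not a description of any particular archival system: it assumes SHA-256 as the algorithm and assumes a checksum was recorded when the file entered the archives. The function names `sha256_of` and `verify_fixity` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path: Path, recorded_checksum: str) -> bool:
    """Return True if the file's current checksum matches the recorded one.

    Assumption: the recorded checksum is a SHA-256 hex string stored
    when the file was first taken into the archives.
    """
    return sha256_of(path) == recorded_checksum.strip().lower()
```

If `verify_fixity` returns `False` after a transfer or migration, the file's bytes have changed since the checksum was recorded. A matching checksum only shows that the bytes are unchanged; it says nothing about whether the record was authentic or reliable to begin with.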
The significant properties of digital records are their:
Content: text, image, slides, code, etc.
Context: the “wh-” questions (who, what, when, where, why) and symbols: signs, metadata, creator, etc.
Appearance: font, size, color, layout, etc.
Structure: pagination, embedded files, headings, etc.
Behavior: hypertext links, embedded feeds, dynamic calculations, etc.
Artificial intelligence materials will probably be able to mimic the significant properties of authentic materials in the future. These guidelines, however, will help archivists to sift through fabricated and corrupted materials.
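One way to picture how the five significant properties support this sifting is to describe a record before and after an event such as a migration, then list which properties changed. The sketch below is a simplification under stated assumptions: the class `SignificantProperties` and the function `changed_properties` are hypothetical, and each property is reduced to a short text summary that an archivist (or a tool) would have to supply.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class SignificantProperties:
    """Hypothetical summary of a digital record's significant properties."""
    content: str     # e.g. a checksum of the extracted text or image data
    context: str     # e.g. creator, date, and other descriptive metadata
    appearance: str  # e.g. fonts, colors, layout
    structure: str   # e.g. page count, headings, embedded files
    behavior: str    # e.g. hyperlinks, dynamic calculations


def changed_properties(before: SignificantProperties,
                       after: SignificantProperties) -> list[str]:
    """Return the names of the properties that differ between two summaries."""
    return [f.name for f in fields(before)
            if getattr(before, f.name) != getattr(after, f.name)]


# Illustrative values only: a report whose font changed during migration.
original = SignificantProperties(
    content="text-hash-a1", context="Registrar, 1998",
    appearance="Times 12pt", structure="14 pages", behavior="no links")
migrated = SignificantProperties(
    content="text-hash-a1", context="Registrar, 1998",
    appearance="Arial 11pt", structure="14 pages", behavior="no links")

print(changed_properties(original, migrated))  # ['appearance']
```

A change in appearance alone may be acceptable after a format migration, while an unexplained change in content or context is a reason to question the record. Deciding which changes matter remains the archivist's judgment, not the script's.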
How Can AI Enhance Archivism?
This section will be shorter than others, because I am going to cover this in the second post in this series.
Generative AI can have a remarkable impact on archival processes, and responsible archivists should be able to understand which tasks can utilize artificial intelligence and which tasks should be performed only by human effort. This concept, called “AI Feasibility,” is covered in an earlier blog post, “AI Feasibility (and How Open AI Tools and Workflows Affect It).”
Just as artificial intelligence can be a great hazard to accurate historical narratives, it can also be put to use in conducting archival work and in drawing historical knowledge from archival materials.
References
Boles, F., & Young, J. (1985). Exploring the black box: The appraisal of university administrative records. The American Archivist, 48(2), 121–140. https://doi.org/10.17723/aarc.48.2.1414g624328868vw
Eisikovits, N. (2024, June 5). The slippery slope of using AI and deepfakes to bring history to life. The Conversation. https://theconversation.com/the-slippery-slope-of-using-ai-and-deepfakes-to-bring-history-to-life-166464
Forman, C., & Neikrie, J. (2024, May 15). The rise of artificial history. Tech Policy Press. https://www.techpolicy.press/the-rise-of-artificial-history/
ISO 15489-1:2016. ISO. (2021, November 9). https://www.iso.org/standard/62542.html
Library 2.0. (2024). ChatGPT & AI Bootcamp. https://www.library20.com/chatgpt-ai-bootcamp
Milmo, D., & Hern, A. (2024, April 8). “Inceptionism” and Balenciaga Popes: A brief history of deepfakes. The Guardian. https://www.theguardian.com/technology/2024/apr/08/inceptionism-and-balenciaga-popes-a-brief-history-of-deepfakes
O’Toole, J. (1994). On the idea of uniqueness. The American Archivist, 57(4), 632–658. https://doi.org/10.17723/aarc.57.4.6l8x444kn3966v00
UNESCO. (2024, July 5). New UNESCO report warns that Generative AI threatens Holocaust memory. UNESCO.org. https://www.unesco.org/en/articles/new-unesco-report-warns-generative-ai-threatens-holocaust-memory
Wilson, A. (2008, April 7). Significant properties of digital objects. DPC Online. https://www.dpconline.org/docs/miscellaneous/events/142-presentation-wilson/file