Part 2 of "Archives and AI"
CICR-ICRC-PublicArchives HQ-Geneva RomanDeckert09062020.jpg was taken by RomanDeckert and shared under a CC BY SA 4.0 International License.
Acknowledgment
I would like to thank the Facebook Archivists Think Tank for jumpstarting many of my ideas in this post. I prompted a discussion regarding AI in archives, and the general consensus was that AI automation, even for OCR and item transcription, is not feasible, even as a first draft. Therefore, I am not going to talk about any AI tools as the definitive answer to problems. Instead, I talk about potential workflows and processes. It may be that AI tools are more useful on the patron discovery side of archives than anything else.
Recapitulation
In the previous post, we talked about the importance of archivism in the age of generative artificial intelligence. The best practices of archivism are meant to assist information professionals in selecting, organizing, recording, preserving, providing access to, and (to some, not to others) interpreting historical records and artifacts. As such, it is a bastion against many physical, cultural, political, and digital threats, including those posed by artificial intelligence. I discussed how proper archivism has the potential to prevent revisionist and erasure threats posed by deliberate and unintentional bad actors.
In this article, I will discuss how a best-practices-informed archivist can harness the technological features of generative artificial intelligence. Archivists can use generative AI tools to enhance accessioning and processing decisions, record creation, interpretation, and other aspects of archives management.
Why Generative AI Is Useful in Archives
The purpose of generative AI tools is to generate output. In the context of archives, that means communicating, representing, and exploring ideas with users.
Large Language Models generate text in response to prompts that users give them. These responses are conditioned according to the context given in the prompt, the objectives in the prompt, and any other information and data contained in documents.
All of the archives functions listed below are related in some way to communication of data, information, and knowledge. As an archivist fulfills these functions according to their institutional policies, and according to the ideas suggested in Part 1 of “Archives and AI,” they will be able to use AI without being ensnared by any of its weaknesses. They can control the collaboration to ensure that the eventual product is high-quality.
Each of the processes listed below is communication-based, whether internal communication through accession and processing decisions or external communication through metadata records and exhibits.
While I was looking at other sources for inspiration and other use cases, I found the National Archives and Records Administration’s priorities and potential use cases for artificial intelligence. Many of these use cases involve the same ideas I had when thinking about this post, so it’s nice to know that I am on the right track!
At the moment, I have not created any custom AI tools, but I probably should make a tool for each of these processes.
Accessioning
Because archives can receive many donations each year, there must be some way to narrow the selection process. After all, not all materials are related to the mission and vision of an archives. Every archives has its own scope: limits on what it should preserve and provide access to. Artificial intelligence can assist the archivist in assessing metadata and context to make suggestions regarding acquisitions.
The National Archives of the United Kingdom recently published a research paper on Using AI for Digital Selection in Government. They found that “while AI cannot replace the expertise of Records Managers [or archivists],… AI tools and pipelines [or workflows] can be successfully applied to aid the task of records selection in semi-structured and unstructured collections.”
This report follows the idea that AI-enabled automation is not feasible or helpful, and can actually be harmful. Collaboration with AI tools, heavily moderated by humans, is the most feasible, effective, and useful way to integrate AI with archives processes.
So, how do we use AI when accessioning records?
As in any interaction with an AI tool, I do a fair share of work before ever prompting.
I put the qualities and relevant metadata of a document into a prompt for an AI tool. Even for digital documents, I do this. I would not put an archival document directly in an AI tool for privacy and confidentiality reasons.
Then, I ask the AI tool to examine the aspects discussed in the last article:
Authenticity
Integrity
Reliability
Usability
Once that is done, as I said in the last post, I examine the value of the information, the cost of retaining the material, and the implications of selecting or discarding the material on the archives and the collection as a whole. I also analyze the uniqueness of the material.
The AI model serves as an excellent sounding board when considering all aspects of these documents, individually and as a whole. While I would caution against using the reply of an AI tool as the only reason for accessioning or deaccessioning, the interaction can help you come to your own ideas. Again, you have to do the work before, during, and after this process.
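The prompt-building step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the metadata field names, and the prompt wording are hypothetical, and the sketch deliberately sends only archivist-written description, never the archival document itself.

```python
# Hypothetical helper: turns archivist-written metadata into a review prompt.
# Only descriptive fields are included; no archival document content is sent.
CRITERIA = ["Authenticity", "Integrity", "Reliability", "Usability"]


def build_accession_prompt(metadata: dict[str, str], scope: str) -> str:
    """Assemble a plain-text prompt from metadata and the collecting scope."""
    lines = [
        "You are assisting an archivist with an accessioning decision.",
        f"Collecting scope of the archives: {scope}",
        "Metadata recorded by the archivist:",
    ]
    for field, value in metadata.items():
        lines.append(f"- {field}: {value}")
    lines.append("For each criterion below, list questions or concerns to investigate:")
    for number, criterion in enumerate(CRITERIA, start=1):
        lines.append(f"{number}. {criterion}")
    # The tool is a sounding board; the decision stays with the archivist.
    lines.append("Do not make the final decision; the archivist will.")
    return "\n".join(lines)


# Illustrative values only; they do not describe a real collection.
example = build_accession_prompt(
    {"Title": "Meeting minutes", "Date range": "1952-1960", "Format": "Typescript"},
    scope="Records of local civic organizations",
)
print(example)
```

The resulting text can be pasted into whichever AI tool you use. Keeping the four criteria in one list makes it easy to ask the same questions of every prospective accession.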
Processing
The most time-consuming part of processing a collection is sorting the materials into series, or categories, that exist according to the functions or processes they facilitated when they were in use. If the function is not clear, they can be sorted by format. The materials in these series are sorted chronologically, and the series are organized according to a certain order (sometimes determined by the institution but other times according to the archivist’s preference).
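The arrangement logic in the paragraph above can be sketched as follows. This is a minimal sketch, not a processing tool: the `Item` fields and the `arrange` function are hypothetical names, and ordering the series alphabetically is only a placeholder for whatever order the institution or archivist prefers.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Item:
    title: str
    created: date
    item_format: str
    function: Optional[str] = None  # None when the original function is unclear


def arrange(items: list[Item]) -> dict[str, list[Item]]:
    """Group items into series by function, falling back to format."""
    series: dict[str, list[Item]] = defaultdict(list)
    for item in items:
        series[item.function or item.item_format].append(item)
    # Items are sorted chronologically within each series. Series are ordered
    # alphabetically here only as a stand-in for the institution's own order.
    return {
        name: sorted(members, key=lambda member: member.created)
        for name, members in sorted(series.items())
    }


# Illustrative items only.
items = [
    Item("Letter to the board", date(1954, 3, 2), "Correspondence", "Governance"),
    Item("Unlabeled photograph", date(1949, 7, 15), "Photographs"),
    Item("Board minutes", date(1952, 1, 10), "Minutes", "Governance"),
]
arranged = arrange(items)
for name, members in arranged.items():
    print(name, [member.title for member in members])
```

Deciding which function each item served is still the archivist's judgment; the code only applies that decision once it has been recorded.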
In a similar way to making decisions about accessioning collections and materials, archivists can benefit from the (more) objective viewpoint of an AI tool. Ask the tool about which functions a specific document fulfilled. Enquire about the relationships of series to each other. Ask about best practice when ordering series.
Note Creation (Scope/Content and Biography)
This is my personal favorite process in which to collaborate with AI tools. However, it can also present a breach of privacy or confidentiality, so you must be careful. The safest approach is to gather the context manually.
Retain all of the contextual knowledge you learned from the archival materials about subjects, events, functions, individuals, and institutions. Take note of the time range. Ask the AI tool to put all of these aspects into relevant notes. Also ask about the most notable and pervasive topics in the collection. This will be helpful in the next phase of archivism.
If you do need to use external sources, and if you would like help using search engines, I built a search query optimizer that can help you search complicated topics.
Metadata Record Creation
Now we come to what, for many people, is their least favorite aspect of archives work: creating an XML record.
AI is most helpful in this aspect when selecting subject headings and name authority records. I have tried to use it to navigate the Library of Congress system for me, but it cannot. However, I have built an AI tool to help the user navigate the system.
When I tried to generate an entire XML record with all of the best-practice elements of an archival record, it failed miserably. It also did not fail consistently; that is, it did not make the same mistakes every time. If you would like to try to use it to create records, I welcome your experience and advice. I prefer to make them manually, but fill the fields with AI-generated content (if AI use is even necessary in the first place).
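One way to picture the "structure by hand, content reviewed separately" approach is the sketch below, which uses Python's standard `xml.etree.ElementTree` module. The element names (`record`, `title`, `dates`, `scopecontent`, `subjects`, `subject`) are placeholders chosen for illustration and do not follow EAD or any other schema; the `build_record` function is hypothetical as well.

```python
import xml.etree.ElementTree as ET


def build_record(fields: dict[str, str], subjects: list[str]) -> str:
    """Build a small XML record whose structure is fixed by the archivist.

    The text values may be AI-drafted, but they should be reviewed by a
    person before they are passed in. Keys must be valid XML element names.
    """
    record = ET.Element("record")  # placeholder root element, not EAD
    for name, text in fields.items():
        ET.SubElement(record, name).text = text
    subjects_element = ET.SubElement(record, "subjects")
    for heading in subjects:
        ET.SubElement(subjects_element, "subject").text = heading
    ET.indent(record)  # pretty-printing; requires Python 3.9 or later
    return ET.tostring(record, encoding="unicode")


# Illustrative values only.
xml_text = build_record(
    {
        "title": "Civic League records",
        "dates": "1952-1960",
        "scopecontent": "Minutes & correspondence of the league.",
    },
    ["Civic leagues", "Municipal government"],
)
print(xml_text)
```

Because the library writes the tags and escapes special characters such as `&`, the record stays well-formed no matter what text goes into the fields. That avoids the inconsistent structural mistakes described above.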
Collections Interpretation and Exhibition
After you have accessioned, processed, recorded, and stored the archives collections, the next step is to interpret (to a limited extent) and exhibit the collection materials. Many of the workflows from my post “Human-Centered Considerations When Creating and Using AI Tools” can be used when creating exhibit placards, images, flyers, advertisements, and color schemes. Use Ideogram for images, ChatGPT for text, Udio for music, Text-to-Speech.Online for spoken audio, and Kapwing or Capcut for video.
Before, during, and after the process of creating these materials, you should keep the human-centered concepts in mind. Make sure that your exhibits are truly reflective of the most accurate historical narrative. Also, make sure that you are being sensitive to the community and stakeholders’ needs. You can use ChatGPT to analyze the effectiveness and ethical aspects of your exhibits and interpretations.
Patron Services
I was only going to write this post about archivist uses for materials, but part of archivism is providing opportunities for patrons to interact with archives materials. It just so happens that Sara Brumfield at “From the Page,” a transcription service for archives materials, wrote a blog post titled “10 Ways AI Will Change Archives.” While the title seems a little clickbait-like, the post does offer some ways that AI can change patron interactions with archives.
I would like to focus on two aspects of Brumfield’s list: “Entities” and “Discovery.”
Entities
“Entities” are similar to the subject heading and name authority records I wrote about earlier. Patrons, or archivists, can use AI to discuss the individuals, groups, events, and other important subjects in materials.
Discovery
If an AI tool is trained on all materials in an archives collection (or archives as a whole), patrons can also use AI’s capabilities to find one or multiple materials that discuss an entity. This can help improve researchers’ understanding of the relationship between these materials. This is similar to the Book Finder AI tool I created.
Discovery is enhanced by image analysis and optical character recognition (OCR). These can aid in transcription and caption writing, although the research performed by the National Archives of the United Kingdom says that AI is not adept enough.
With the image, textual, and data analysis abilities of AI tools, they could help promote “conversational” discovery. This is similar to the function of the “Talk to a Book” AI tool that Steve Hargadon and I created.
AI Can Be Useful and Harmful… You Are Responsible!
In the first post of this series, I talked mostly about the issues that AI poses to historical authenticity and correct narratives. One might call that post “Archives vs. AI.” This post, by contrast, one might call “Archives with AI.” While we all must be wary of the weaknesses and dangers of overreliance on AI, we should also harness the usefulness of new technologies, including AI, as we create modern archives. Just remember, as T. R. Schellenberg said, “The use of modern gadgetry cannot supplant the use of proper techniques and principles.”
References
The National Archives. (2021, November 11). Using AI for digital selection in Government. https://www.nationalarchives.gov.uk/information-management/manage-information/preserving-digital-records/research-collaboration/using-ai-for-digital-selection-in-government/
Schellenberg, T. R. (2003). Modern archives: Principles and techniques. Society of American Archivists.