Back from the 2024 AECT International Convention! I presented four times and examined multiple issues, mostly related to generative artificial intelligence.
However, this post is not directly related to artificial intelligence.
Why Does Digital Preservation Matter in the Context of AI?
Why post it on this blog? My Archives and AI series discussed ideas related to digital preservation, and AI may become one of the tools used for it in the future.
Another reason for posting this on the CollaborAItion blog is that data is often preserved so that it can be re-used, including in generative artificial intelligence tools. If digital preservation is a myth, then relying on that data for long-term future use is a risky proposition.
The third reason relates to why I wrote the Archives and AI series in the first place. If digital preservation does not exist, then neither does the authenticity or integrity of digital data (because those are largely why we try to preserve digital objects). In that world, what is real does not matter, and neither does whether something is artificial. Digital preservation is not only about historical materials. It is about trusting what we see on our devices. More than that, since our digital and physical worlds are so intricately intertwined, it is about trusting anything at all.
I am focused on archival digital preservation, but keep in mind that you can preserve things that were created just seconds ago. For example, I am going to preserve this webpage after I publish this post. If digital preservation does not exist, then nothing digital can be trusted.
The Impetus
The above image is my reaction to the meme that was posted in the Archivists Think Tank Facebook group about a month ago. It was posted by a well-known and respected member of the archival community. I was quite taken aback by his post, as I did not think that he was so antagonistic toward digital archivists.
However, he came out swinging with the “Crowder-table” meme and the text, “Digital preservation is a myth. Change my mind.” He captioned this, “There, I said it!”
My Response
Below is the comment I posted in response to this meme. I will also include snippets of his reply and our resulting conversation in an effort to begin a thoughtful consideration of the nature, reality, and best practices of digital preservation. It may be helpful to review my series on archives and AI first.
“As a Digital Curation expert, Archives Manager, and Digital Librarian, this is untrue in so many ways.
I'm assuming you're conflating ‘digital preservation,’ ‘digitization,’ and born-digital and analog records.
As long as significant properties are maintained, digital preservation works.
BUT, that doesn't mean that you should give up physical preservation.”
My correspondent replied that digital preservation did not truly exist because archivists were inherently required to perform “endless migrations,” to prevent and recover from “potential data loss” (I would even revise that: data loss is inevitable), and to contend with a potential lack of “interoperability” (files or programs may require certain software or hardware to work and may not be retrievable on alternate devices or programs) and with the obsolescence of the tools required to provide access to the materials and resources.
Well, as I told him, that’s the job. We are stewards of information and data. As I stated in the Archives and AI series, as long as we are preserving the significant properties (at a minimum, not a maximum), authentic and valuable digital preservation of a material is absolutely possible.
The Surrounding Discussion
Some archivists agreed with my point of view, and some agreed with the original poster’s. One individual put my point of view in other words, bringing up the difference between preserving the original object (which digital preservation does not do if the original is physical) and preserving a copy of the object.
Another commenter noted the viability of “migration,” or “format migration.” This term refers to preserving a copy of a digital object by converting it to a nonproprietary, more stable, or otherwise more viable format. There are multiple reasons for migration, and no single format for a given medium is optimal in every case. For example, you could convert an image to a TIFF file if you want the greatest fidelity to the original image. Or, you could convert the same image to a JPEG, a lossy format, if what you want is a reasonable facsimile of the image that takes up minimal storage space.
This account would not be complete if I did not include some more oppositional points of view. One commenter compared digital preservation to previous “fads” of the archival world, such as photocopying materials or committing them to microfilm. They implied that within a few decades (which is how long both of those methods lasted), digitization will not be used.
In two ways, I agree with this point of view. First of all, as I stated earlier, digitization is not the same as digital preservation. The first is the process of creating a digital copy of an originally physical object. The second is the act of taking preventative measures against the erosion and degradation of digital objects, whether they originated digitally (born-digital) or physically (and digitized).
I also agree in another way. An increasing share of records and data is born digital; the only way they become physical is if someone prints them out. I hardly think that is a preservation solution. How many trees would die if we refused to enact digital preservation measures?
What Is the Argument Against Digital Preservation?
The main argument against the reality of digital preservation is that “no efforts or tools used in digital preservation are absolutely perfect, so we might as well stop trying.” On that note, no physical preservation is perfect either, so should we just throw all of our physical collections away and delete all of our files?
What Is the Argument For Digital Preservation?
The essential parts of documents are the information and data contained in them, not their format. As long as the data is preserved in a nonproprietary and preservable format, future researchers will be firmly connected to the knowledge, priorities, and social functions of actors in the past.
According to ISO 15489-1, all records, including electronic records, have four major aspects: authenticity, reliability, integrity, and usability. Digital preservation involves attempting to maintain all four of these aspects in perpetuity. I explored these aspects in my first Archives and AI post, and I will repeat them here.
Authenticity - An authentic record is one that can be proven to be what it professes to be, to have been created or sent by the person claiming to have created or sent it, and to have been created or sent at that time.
Integrity - A document with integrity has not been corrupted through time, transfer, or migration. The ultimate test of integrity is whether its checksums can still be verified and whether a qualified system can open the file.
Reliability - A reliable record is one whose contents can be trusted as a full and accurate representation of the transactions, activities, or facts to which they attest.
Usability - A usable record is able to communicate all of its information to its users. Any reasonable aspect or significant property of the document can be accessed by a typical user.
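To make the checksum test for integrity a bit more concrete, here is a minimal Python sketch of a fixity check: compute a digest when a file is ingested, store it, and later recompute it to see whether the file has changed. It assumes SHA-256 as the checksum algorithm (repositories also use others, such as MD5 or SHA-1), and the function names and the sample file `letter.txt` are hypothetical, made up for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # iter() with a sentinel keeps reading until read() returns empty bytes (end of file)
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path: Path, recorded_checksum: str) -> bool:
    """Compare the file's current digest with the one recorded at ingest."""
    return sha256_of(path) == recorded_checksum.strip().lower()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical record, used only for this demonstration
        record = Path(tmp) / "letter.txt"
        record.write_bytes(b"Dear archivist,")
        checksum_at_ingest = sha256_of(record)            # stored alongside the file's metadata
        print(verify_fixity(record, checksum_at_ingest))  # True: file unchanged
        record.write_bytes(b"Dear archivist!")            # simulate silent corruption
        print(verify_fixity(record, checksum_at_ingest))  # False: integrity check fails
```

A matching checksum only shows that the bits are unchanged. Whether a qualified system can still open and render the file is the second half of the test and needs a separate check.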
What those who claim that digital preservation is a myth forget is that nothing is perfect. As migration occurs and as files and tools become obsolete, we are here to ensure that there are alternate ways to recover the information and data in the files. If we cannot retain the four aspects above, we can approximate them (and record the changes in the metadata). There must be a balance among all four aspects. A record in pristine condition is of no use to any archivist or researcher if it cannot be accessed because keeping it pristine requires storing it in a proprietary format. Migration may mean losing data, and it will mean losing at least a little authenticity. However, the important part is always the information, explicit or contextual, contained in the items. And those, incidentally, are the aspects that generative AI tools can alter and mimic the most.
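Recording changes in the metadata could look something like the minimal Python sketch below, which documents one migration (for example, TIFF to JPEG) by fingerprinting both versions and describing in plain language what was lost. The `MigrationEvent` fields and the `record_migration` helper are hypothetical and are not taken from any particular metadata standard such as PREMIS. The placeholder bytes stand in for real files; the conversion itself is not performed here.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class MigrationEvent:
    """Hypothetical metadata record of one format migration (field names are illustrative)."""
    source_format: str
    target_format: str
    source_sha256: str
    target_sha256: str
    note: str  # plain-language account of what changed or was lost
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def record_migration(source: bytes, target: bytes,
                     source_format: str, target_format: str,
                     note: str) -> MigrationEvent:
    """Document a migration by fingerprinting both versions and describing the change."""
    return MigrationEvent(
        source_format=source_format,
        target_format=target_format,
        source_sha256=hashlib.sha256(source).hexdigest(),
        target_sha256=hashlib.sha256(target).hexdigest(),
        note=note,
    )


if __name__ == "__main__":
    # Placeholder bytes stand in for real files; no actual image conversion happens here
    event = record_migration(
        source=b"<original TIFF bytes>",
        target=b"<derived JPEG bytes>",
        source_format="image/tiff",
        target_format="image/jpeg",
        note="Lossy JPEG compression; fine detail reduced, content and context retained.",
    )
    print(json.dumps(asdict(event), indent=2))
```

Keeping both checksums lets a later archivist confirm which version is which, and the note preserves the reasoning behind the trade-off between authenticity and usability.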