College of University Libraries and Learning Sciences News

Web Archiving: It Takes a Library

by Amy Winter on 2020-06-11 in Information Science, Library

Despite my background in records management and freelance web design, and despite working with multiple digital platforms all day, web archiving was barely on my radar when I first registered for "Introduction to Web Archiving" through the University of Wisconsin-Madison. I didn't know if or when the UL would have the resources to adequately archive its own online offerings, let alone any part of UNM's wider web presence.

A mind map about web archiving and information science

The course provided a great overview of the web archiving process. We learned how tools and services like Archive-It, Webrecorder, and Perma.cc create lasting versions of web resources, and we read about developers working on innovations like browser add-ons that let users access archived versions of sites while viewing the live version. We also discussed creating collection policies and writing good metadata so users can easily explore saved content.

Now that the class has ended, though, what continues to fascinate me most is how skills from so many different library specialties converge in the big picture of web archiving.

Like physical archivists and collections managers, web archivists have to ask: what is worth saving and collecting? And what boundaries and limitations must be considered when crawling web sites?


(Did you know Twitter alone creates 12 terabytes of data per day?)


Like information managers and data curators, web archivists need to understand the available tools for capturing web content, and which ones work best for different purposes.  They also need to arrange and describe stored data so researchers can easily access and use them in the future.


(Did you know users have preserved 1,685,000 links and counting?)


The web archiving process can inform data literacy instruction by helping users understand the importance of stewarding their own web content, and how to participate in archiving their own and others' data.


(Did you know anyone can enter a URL and have the Wayback Machine capture and preserve it?)
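For readers who want to try this from a script rather than the web form, here is a minimal Python sketch. It assumes the Wayback Machine's public availability endpoint (`https://archive.org/wayback/available`) and the "Save Page Now" address pattern (`https://web.archive.org/save/` followed by the page URL); the function names are my own, purely for illustration, and the snippet only builds the request addresses and reads a response, leaving the actual network call to you.

```python
import json
from urllib.parse import urlencode

# Assumed public Wayback Machine addresses; check the Internet Archive's
# own documentation before relying on them.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"
SAVE_PREFIX = "https://web.archive.org/save/"


def availability_query(url, timestamp=None):
    """Build the address that asks whether `url` has an archived snapshot.

    `timestamp` is optional and uses the YYYYMMDD (or longer) form.
    """
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def save_request_url(url):
    """Build the address that asks the Wayback Machine to capture `url`."""
    return SAVE_PREFIX + url


def closest_snapshot(response_text):
    """Return the archived copy's URL from an availability response, or None."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None
```

Opening the address returned by `save_request_url("https://example.com")` in a browser should request a fresh capture, and fetching the address from `availability_query("example.com")` returns a small JSON document that `closest_snapshot` can read.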


Incomplete crawls of web pages can help web developers understand what technologies make web resources hard to archive, and advocate for platforms that are easier to preserve -- which could have other advantages as well.


(Did you know JavaScript often creates holes in archival copies of web sites?)


In the interest of inspiration, I offer this real-life example of how important web archiving could become. A while ago, a patron contacted us. She was applying for a government position, and needed to provide the official description of a course she took as part of her master's program -- in 1990. Naturally, the historical print course catalogs are stored in the UNM Archives, and have even been scanned and made available online, so this was a pretty easy request to fulfill. I couldn't help but wonder, though: now that the course catalog and schedule are fully digital, how will we help the student who needs this information twenty or thirty years from today? Will we help her by creating and storing digital versions of the course catalog right now? Will we educate her about the need for, and the process of, creating those records for herself?

If, as a library, we don't collaborate in considering these issues and taking the necessary steps now, we might not have what we need to help her later.

Further reading

"How an Archive of the Internet Could Change History" - The New York Times Magazine

"Back to the Cave:  Communicating the Importance of Web Archiving" - Digital Preservation Coalition

"On the Importance of Web Archiving" - Items:  Insights from the Social Sciences
