Archive-a-thon
The archive-a-thon used materials from DataRefuge to nominate federal climate data websites for archiving in the Internet Archive. It was a central focus of the hackathon: more attendees chose this workshop than any other.
The archive-a-thon opened with an orientation to the Internet Archive nomination tool. During this time we also gave participants a chance to connect to wifi (each participant received a guest access code), download the tool, gather links to datasets, and ask questions about the nomination process. All participants were added to a Slack group so they could share links and communicate easily. The group focused on nominating websites from a list of Department of Energy (DOE) datasets provided through the DataRefuge resources. After lunch, attendees in this group began nominating websites.
We added 568 new URLs to the nomination tool spreadsheet; a quick check for duplicates suggests that 528 of them are unique. Roughly 12 to 16 people worked in the seeding group, starting at about 12:45 pm and mostly finishing by 3:00 pm. Over those roughly 135 minutes, that works out to approximately 4.2 URLs added per minute. We began with the DOE primer identified as the highest priority in the DOE set and even started on the second DOE primer on the list.
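The figures above come from simple arithmetic over the session window. Below is a minimal Python sketch of both numbers, assuming the nominated URLs are available as a list of strings. The report does not say how the quick duplicate check was done. The light normalization in `count_unique` (trimming whitespace and a trailing slash) is therefore an assumption for illustration, and both function names are hypothetical.

```python
from datetime import datetime

def count_unique(urls):
    """Count distinct URLs after light normalization.

    Assumption: trimming whitespace and a trailing slash is enough; the
    event's actual duplicate check is not documented.
    """
    return len({u.strip().rstrip("/") for u in urls if u.strip()})

def urls_per_minute(count, start, end):
    """Average rate between two 24-hour 'HH:MM' times on the same day."""
    fmt = "%H:%M"
    elapsed = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return count / (elapsed.total_seconds() / 60)

# Figures from the report: 12:45 pm to 3:00 pm (15:00) is 135 minutes.
print(f"{urls_per_minute(568, '12:45', '15:00'):.1f} URLs/min (all)")     # 4.2
print(f"{urls_per_minute(528, '12:45', '15:00'):.1f} URLs/min (unique)")  # 3.9
```

A stricter duplicate check might also lowercase hostnames or drop query strings. Those choices change the unique count, so whichever rule is used is worth recording alongside the number.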
A number of participants also experimented with scraping, and a couple downloaded datasets; one got as far as working out the bagging step. Shortly after the event, we notified DataRefuge of where we had left off at each of these stages. Anyone interested in picking up where we left off can contact DataRefuge for direction on which datasets to focus on next.
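For anyone picking up the bagging step: "bagging" here presumably refers to packaging downloaded files in the BagIt format (RFC 8493). A bag stores the payload under `data/` alongside a `bagit.txt` declaration and a checksum manifest. The sketch below builds such a bag using only the Python standard library. `make_minimal_bag` is a hypothetical helper, not the tooling used at the event. It also skips optional tag files such as `bag-info.txt` and the spec's percent-encoding of unusual filenames.

```python
import hashlib
import shutil
from pathlib import Path

def make_minimal_bag(src_dir, bag_dir):
    """Package src_dir as a minimal BagIt 1.0 bag at bag_dir (hypothetical helper).

    bag_dir must not already exist. Only the required bagit.txt and a
    SHA-256 payload manifest are written; optional tag files are omitted.
    """
    bag = Path(bag_dir)
    data = bag / "data"
    # Copy the payload into data/ (copytree also creates bag_dir itself).
    shutil.copytree(Path(src_dir), data)
    # Required bag declaration.
    (bag / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8"
    )
    # One "checksum  path" line per payload file, with paths relative to the bag root.
    lines = []
    for path in sorted(p for p in data.rglob("*") if p.is_file()):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        lines.append(f"{digest}  {path.relative_to(bag).as_posix()}\n")
    (bag / "manifest-sha256.txt").write_text("".join(lines), encoding="utf-8")
    return bag
```

In practice, the Library of Congress `bagit-python` package handles the optional metadata, encoding rules, and validation that this sketch leaves out.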