FAQs/Host An Event
1) How did you organize the UCLA data rescue event?
We got in touch with the University of Toronto, which took part in the first "guerrilla archiving" event, held in response to concerns about climate data. We also talked to the #DataRefuge project at UPenn's Program in the Environmental Humanities to learn best technical practices, get guidelines for planning an event, and obtain a list of the data sets people had already worked on and those that still needed work. (I'd strongly suggest visiting their site and perhaps contacting Bethany and Laurie.) We are still in contact with the project's organizers and used many of the resources they developed, which are housed on the #DataRefuge and EDGI websites (EDGI has a really useful toolkit and GitHub repository for coordinating events).
Here's a webinar on coordinating your own DataRescue.
2) What resources did you need to save the data, from wired connections to memory storage requirements to servers?
We didn't need much in terms of tech. Everyone brought their own laptop, which provided enough storage until the data was uploaded. We also made sure everyone had guest Wi-Fi passes through our department so they could handle the heavy downloading and uploading involved. Only about 40 people worked on the data rescue itself. The others focused on non-tech work, such as creating advocacy toolkits, drafting research proposals to send to the Environmental Data Governance Institute (EDGI), which is now partnered with #DataRefuge, and working with scientists to develop best practices for data management plans.
3) What techniques are recommended for data scraping?
Here is our GitHub site, which outlines the best practices we learned from #DataRefuge and other related events. I could get a member of my team to talk with you in more depth if you are interested. If you don't already have one, I'd suggest partnering with a tech person who has some experience with this kind of work.
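Whatever scraping tools you use, it helps to be able to show that the files sitting on a laptop are the same files that were later uploaded. The Python below is a minimal sketch of that idea, not taken from our GitHub site: it walks a hypothetical local download folder and records a SHA-256 checksum for every file, in the same two-space format the `sha256sum` command produces. The function names and folder layout are illustrative assumptions.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large downloads never need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root, POSIX-style) to its SHA-256 hex digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def write_manifest(root: Path, out: Path) -> None:
    """Write '<digest>  <relative path>' lines, the format `sha256sum -c` can verify."""
    lines = [f"{digest}  {name}" for name, digest in build_manifest(root).items()]
    out.write_text("\n".join(lines) + "\n", encoding="utf-8")
```

For example, `write_manifest(Path("downloads"), Path("downloads.sha256"))` writes the manifest next to the folder; on a system with GNU coreutils, running `sha256sum -c ../downloads.sha256` from inside `downloads` re-checks every file after it has been copied or uploaded.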
4) Is there a repository anywhere to add additional scraped data, besides Internet Archive?
Yes, there are many options. We uploaded to #DataRefuge's server, an Amazon Web Services instance integrated with CKAN (an open-source data catalog), which will be available to DataRefuge events for storing copies of data and making them accessible.
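If you are adding copies to a CKAN-based catalog like this one, CKAN's action API can register them programmatically. The sketch below is an illustration under stated assumptions, not how we uploaded: it builds, without sending, a `resource_create` call that attaches a link to an already-hosted copy to an existing dataset. The base URL, dataset id, and token are placeholders, and it assumes your account on that CKAN instance has an API token and permission to edit the dataset.

```python
import json
import urllib.request


def build_resource_create_request(
    ckan_base_url: str,
    api_token: str,
    package_id: str,
    resource_url: str,
    name: str,
) -> urllib.request.Request:
    """Build (but do not send) a POST to CKAN's action API `resource_create`."""
    endpoint = ckan_base_url.rstrip("/") + "/api/3/action/resource_create"
    payload = {
        "package_id": package_id,  # the existing dataset the resource belongs to
        "url": resource_url,       # where the rescued copy is now hosted
        "name": name,              # human-readable label shown in the catalog
    }
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": api_token,  # CKAN reads the API token from this header
        },
        method="POST",
    )
```

To actually send it, pass the request to `urllib.request.urlopen`; CKAN's action API replies with a JSON object whose `success` field says whether the call worked and whose `result` holds the new resource. Registering a URL keeps the request small; uploading the file itself would need a multipart form request instead.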
5) Is there a catalogue anywhere that indicates which data has already been archived, in order to avoid duplication of efforts?
Yes, I believe the #DataRefuge project (PPEH Lab) at Penn has one, but it isn't online; we had to contact them to get it. So, again, I would definitely get in touch with them.
6) Anything else I'm forgetting to ask you about that you think is important?
You've found our website, which outlines our rationale, publicity efforts, and the organization and schedule of the event. We will be uploading outcomes and a report soon. Contact us at [email protected] with any further questions.