Data, Science, & Librarians, Oh My!

My thoughts as I navigate the world of data librarianship.

Getting Use Cases is Hard

One of my big tasks coming into NYU last August was to work on the ReproZip

Rémi Rampin, the current developer of ReproZip, and Fernando Chirigati, the former developer, made this great GitHub repository called ReproZip Examples, dedicated to showcasing examples and use cases of ReproZip from different domains.

In May, Rémi and I will be at the Data and Software Preservation for Open Science (DASPOS) workshop, Container Strategies for Data & Software Preservation that Promote Open Science. I'm serving as an external organizer, but the two of us will also be doing some extensive work with ReproZip while there. NSF funded, DASPOS is committed to

Image from the DASPOS website.

The DASPOS project "represents a collective effort to explore the realization of a viable data, software, and computation preservation architecture for High Energy Physics (HEP)." We use the term preservation to mean “Ensuring the continued usability of the data and software necessary to conduct science.” Preservation has many elements, including a physical archival system for storing data, an organization and policy for deciding what to store and for how long, and technical means for organizing and representing data and software so that they remain usable. This project will focus primarily on the latter two elements by building a community understanding of the organizational needs and by building software prototypes to address the most critical technical problems.

In addition to a talk/demo during the conference proceedings, we are leading three breakout sessions that will allow people to try out ReproZip for themselves, using their own research if they brought some. I'm hoping that, with the new ReproZip-Examples repo, we can get some people at the DASPOS workshop to add their own .rpz packages for us to try to reproduce! This would be the best-case scenario, but it depends a lot on the research of the participants.