This year's Moore/Sloan Data Science Environments Summit was held in the beautiful Cascade Mountains at the Suncadia Resort in Cle Elum, Washington.
RT @uwescience: #DSESummit is off to a sunny start! pic.twitter.com/fQi3EZpSdL
— Vicky Steeves (@VickySteeves) October 5, 2015
A number of sessions here were fairly typical “data science-y” fare: Data Structures for DS, Astrophysics Software, and a Big Data Systems Tutorial. What I found most interesting at the summit, though, was the pervasive discussion of ethics and social good. I was pleasantly surprised that the participants were eager to engage with topics so far outside the usual purview of coding problems, data analysis methods, and data gathering. It's another testament to the great multidisciplinary field that is Data Science and the wonderful people who populate it.
I was really inspired by a Monday morning lightning talk by Ariel Rokem of the University of Washington’s eScience Institute on their Data Science for Social Good (DSSG) program, which held its inaugural summer session this past June. Modeled on the program of the same name at the University of Chicago, the eScience Institute DSSG program aims “to enable new insight by bringing together data and domain scientists to work on focused, collaborative projects that are designed to impact public policy for social benefit.”
RT @uwescience: @arokem presenting #DSSG2015 at #DSESummit pic.twitter.com/pTYrxmq7Zv
— Vicky Steeves (@VickySteeves) October 6, 2015
The eScience Institute hosted four projects focused on urban environments and urban science, spanning topics such as transportation, social justice, and sustainable urban planning. Each project was assigned a mentor from the eScience Institute, and each team was made up of a project lead, DSSG fellows, and students from Alliances for Learning and Vision for underrepresented Americans (a post-freshman-year internship program). The whole point was to bring together Data Science fellows and faculty, project leads from industry, and undergraduate students.
The four projects, as described on the eScience Institute’s DSSG webpage, were:
I love these projects. These students are committed to improving their communities by integrating what they know from all the fields that make up data science. The real-world applications of their work are just incredible. I think this speaks to something close to a moral obligation for science: not only to contribute to the greater body of human knowledge, but also to improve the standard of living globally. For more on this, I’d point you to a great article by Alan Fritzler, project manager for the DSSG program at the University of Chicago.
I had a total Twilight Zone moment on Tuesday during a session entitled “Semantics of Data: Integrating Across Tools.” I attended because I thought the discussion would be about how the data scientists here could communicate their tools, or possibly create a cross-institutional directory of tools to track the outputs of the MSDSE.
I was pretty wrong. These scientists talked for AN HOUR AND A HALF about building standard vocabularies, ontologies, and metadata schemas, using JSON Schema, and the semantic web (read: linked data). I was near faint with surprise.
I think I hit my head on my way to this talk because scientists are talking about ontologies and linked data...willingly... #DSESummit
— Vicky Steeves (@VickySteeves) October 6, 2015
The tone of the conversation also left me wondering: where else do science needs and library services overlap? In the LIS field we’ve identified that things like infrastructure (institutional repositories, etc.) are research resources that should be housed in the library, but where are the boots-on-the-ground librarians? These collaborations are tricky, but maybe they are reaching the point of critical mass where we just have to get down to it. Where are my science metadata librarians at? I smell a new field...
"my field is changes too much to use a standard vocab" feels like a cop out. Talk to librarians & let us help you make ontologies #DSESummit
— Vicky Steeves (@VickySteeves) October 6, 2015
With that tweet in mind, I firmly believe this is an area where librarians (especially those of us into metadata; here’s looking at you, Peggy) can collaborate with scientists to build these vocabularies and schemas. The plain fact of the matter is that the everyday researcher is not equipped to build these ontologies, nor do they really want to, and frankly I don’t blame them. Librarians (read: information professionals) have these skills and want to do the work, and LIS is a service industry. Take advantage of us, science!
On a lighter note, there was plenty of room in the schedule for hilarity. Between David Hogg’s constant delight in our “obedience” in following the lunch seating directions and another great Monday morning lightning talk on improving the quality of the field (see tweet below), the tone of this conference was jovial, scholarly, and just plain fun. I’m excited for NYU to host next year’s! Here’s hoping we get a place in the Catskills...
Critical Thinking Pyramid *coughcoughI'mCallingBULLSHITcoughcough* #DSESummit pic.twitter.com/ab6TIB0s6H
— Vicky Steeves (@VickySteeves) October 5, 2015