The Google Doc for our notes.
The short presentation and discussion sessions will focus on these questions (collaboratively developed via our mailing list):
- In what respects do data and software serve common functions in science?
- In what respects do data and software serve competing functions in science?
- In studying software practices, how should the role of data practices be considered?
- In studying data practices, how should the role of software practices be considered?
- What actions and practices on the part of data collectors, curators, and subsequent users lead to successful data sharing?
- What actions and practices on the part of software developers, repository owners, and subsequent users lead to successful software sharing?
- How can the usefulness of data be sustained over time? How can we anticipate its actual utility in order to determine how much to invest and what actions to take to support sustainability? (Same question for software.)
- To what extent can lessons from data sustainability and sharing inform practices surrounding software, and vice versa?
- Pushing our discussion closer to the sciences, objects of study, and phenomena: data as about something, data as from somewhere, data as for something (think: sites of collection, instruments, and data as standing in for some/many things). Scientific software too is often about something, and it does something (think: modelling). In some cases, software is not intended to be about something (‘domain specific’) but to be a general analytic tool, which is interesting too.
- Pushing our discussion to consider the bigger picture by placing data and software in their institutional settings and making activities: the drive to archive, share, and interoperate data, and the drive to systematize software, its production, and its accountability, are both increasingly policy mandated. We should also consider the institutional actors, such as Sloan and NSF, who are shaping and funding a vision for data science, as well as industry’s push/pull to reshape university curricula and educate a new workforce (something that universities are responding to with gusto, even if differently at each one).
- Lastly, thinking temporally: neither software nor data are new; both have long, heterogeneous lineages of production, care, sharing, and use. Some of these lineages have lock-in or legacy effects; even new software and data are ‘born’ into these legacies. Thinking forward rather than historically: many contemporary architectures will have legacy or lock-in effects in the future too, and some institutional actors are thinking specifically about that, enacting long-term visions for the future of data, software, etc.
Some answers or factors to consider in addressing the above questions:
- How do notions of non-proprietary software and non-proprietary data, and proprietary software and proprietary data, intersect (or not)?
- What is the role of data and software in relation to publications?
- What workforce factors should be considered in data and software intersections?
- Who are the players in scholarly credit, and are they the right players?
- Software citation practices being promoted by non-librarians who are not familiar with the functions of citation in scholarship; as with data citation, many years now of people thinking they can set a simple or single standard that all will adopt
- Data as complex objects that require linking or integration with software, documentation, calibration, etc.; they cannot be described alone
- Data and software both are based on models, sets of assumptions and goals that are often left implicit
- Invisible workarounds
- Code has many components, as does data. When do you cite the whole archive and when do you cite the cell? Similarly, when do you cite the one tool and when the whole package? How are people making these decisions?