In the context of these Principles and Guidelines, “research data” are defined as factual records (numerical scores, textual records, images and sounds) used as primary sources for scientific research, and that are commonly accepted in the scientific community as necessary to validate research findings. A research data set constitutes a systematic, partial representation of the subject being investigated.
This term does not cover the following: laboratory notebooks, preliminary analyses, and drafts of scientific papers, plans for future research, peer reviews, or personal communications with colleagues or physical objects (e.g. laboratory samples, strains of bacteria and test animals such as mice). Access to all of these products or outcomes of research is governed by different considerations than those dealt with here.
Physical stuff is excluded; "factual stuff" is included-->of course, "factual" is subjective
MT: the "commonly accepted" standards part of the definition conflicts with the exclusions
most of these policies talk about standards within your community, but ignore the fact that people are members of multiple communities with different standards
the exclusions in the OECD document may have been the result of negotiation: getting the document adopted required at least some consensus
they had 40 years of practice publishing this type of data, so they refer to a pool of data they are accustomed to; but since the policy is intended to apply broadly, it comes up short.
highly contentious terms: validate, replicate, reproduce, verification
how they are interpreted in other fields
but recreating the same conditions to reproduce research is extremely difficult; and even if you can, journals won't accept the result because it's not novel
ROI is the contemporary language of data monetization, which wasn't considered when the OECD policy guidelines were created
library aphorism about discovery: "30 hours in the lab will save you one hour in the library"
if we documented data better so people could find it, we'd get better ROI
1960s policy documents and pouring $ into info retrieval systems-->the same argument for better discovery so we can afford to duplicate the effort
in most countries facts cannot be copyrighted, but a database's configuration may be
framing the idea of data in the knowledge commons: the packaging contributes as much to the value as what's in it
Feist case (Feist v. Rural Telephone, 1991)?-->phone numbers versus the directory as a medium
definition of science in the OECD context: narrow meaning
definition of science in the Riding the Wave document: Wissenschaft-->much broader
Principles -- international, interdisciplinary, different contexts, different stakeholder interests; data should be shared openly--within these constraints; the principles that most relate to library professionals are formal responsibility and sustainability
researchers are more concerned about openness
- Legal conformity
- Protection of intellectual property
- Formal responsibility
"Free speech vs. free beer"
fundamental principles-->Peter Suber
the basis of open access
1. authors generally hold the IP until released
2. authors want impact, not revenue
these are specific to scholarly publishing and do not hold for other types of publications
Borgman argues that neither of these holds for data***
capture data upstream--all members of the academic senate grant a non-exclusive license to the UC
I would argue that this is also financially motivated, as tenure is dependent on these things
the tension between openness and privacy/confidentiality in the social sciences
data--you can't pin down who owns it
the incentive structure could be redesigned to encourage sharing
MT: this is very similar to the law stuff Rebecca is studying
journal impact factors are easily manipulated, and the editorial policies for getting into the ISI list affect what gets cited
this also relates to what Leah said about journals only publishing confirmatory articles
China ISI manipulation: selling authorship for a year's salary (in Science)
when you force data sharing, there is sabotage
public goods and common-pool resources (this is where sustainability issues arise)
Berman & Lavoie on digital sustainability issues
the Riding the Wave doc starts with developing the infrastructure
the Australians are doing it this way, as well
to accommodate sustainability, you need to start with the infrastructure
there are a lot of claims that making data citable will create incentives, but Borgman doesn't think this is the case
- All stakeholders, from scientists to national authorities to the general public, are aware of the critical importance of conserving and sharing reliable data produced during the scientific process.
- Researchers and practitioners from any discipline are able to find, access and process the data they need. They can be confident in their ability to use and understand data, and they can evaluate the degree to which that data can be trusted.
- Producers of data benefit from opening it to broad access, and prefer to deposit their data with confidence in reliable repositories. A framework of repositories is guided by international standards, to ensure they are trustworthy.
- Public funding rises, because funding bodies have confidence that their investments in research are paying back extra dividends to society, through increased use and re-use of publicly generated data.
- The innovative power of industry and enterprise is harnessed by clear and efficient arrangements for exchange of data between private and public sectors, allowing appropriate returns to both.
- The public has access to and can make creative use of the huge amount of data available; it can also contribute to the data store and enrich it. Citizens can be adequately educated and prepared to benefit from this abundance of information.
- Policy makers are able to make decisions based on solid evidence, and can monitor the impacts of these decisions. Government becomes more trustworthy.
- Global governance promotes international trust and
data as infrastructure
challenges to overcome (22)
trust fabric: having a policy; being part of a group with specific practices; the Data Seal of Approval; certifications-->all recognized by the general community as good. but she's found a lot of holes: you need a succession plan that examines possible futures for your data
ARL (Association of Research Libraries)-->a policy group comprising ~100 of the largest research libraries (based on bad states) -- they present a united front on policy issues
Scholarly Publishing and Academic Resources Coalition (SPARC) is part of ARL
no single institution to give a seal of approval
those kinds of relationships affect infrastructure and sustainability
MT: and the fact that other places implement their policy reaffirms their authority
readings for next week assume that the data life cycle is discrete, when it is really a more amorphous process
the many ways in which data will be used are not accounted for in the life cycle model of information/archives-->more like a continual iteration
MT: this has a ton to do with the uncertainty of ways the data could be used in the future
MT: is there a new underrepresented minority in people who do not participate in social media?
MT: ask Dad what happened to the audio cassettes of his recorded interviews
MT: I should ask Chris what people will do with their inherited digital collections over time
"floss the teeth you want to keep"
read the lifecycle readings with the different stakeholder concerns in mind
next week due the 29th
Assignment 1 is due: assess the data archiving needs of a research community
address these questions and answer each question discretely
MT: find out about transplantation data
if it's something that dovetails with your term project, cool, but it doesn't have to
double-spaced, 12pt font
community that publishes journal articles and collects data
look for posted policies and journal data repository requirements
repositories, embargo periods,