IS289.3 Week 3 Class Notes

In the context of these Principles and Guidelines, “research data” are defined as factual records (numerical scores, textual records, images and sounds) used as primary sources for scientific research, and that are commonly accepted in the scientific community as necessary to validate research findings. A research data set constitutes a systematic, partial representation of the subject being investigated.


This term does not cover the following: laboratory notebooks, preliminary analyses, and drafts of scientific papers, plans for future research, peer reviews, or personal communications with colleagues or physical objects (e.g. laboratory samples, strains of bacteria and test animals such as mice). Access to all of these products or outcomes of research is governed by different considerations than those dealt with here.

Physical stuff is excluded, "factual stuff" is included-->of course, factual is subjective

MT: the "commonly accepted" standards part of the definition conflicts with the exclusions
most of these policies talk about standards within your community, but ignore the fact that people are members of multiple communities with different standards

the exclusions in the OECD document may have been the result of negotiation: a compromise to get the document adopted with at least some consensus

they had 40 years of practice publishing this type of data, so they refer to a pool of data to which they are accustomed. but since the policy is intended to apply broadly, it comes up short.

highly contentious terms: validate, replicate, reproduce, verification
how they are interpreted in other fields
but getting the same conditions to reproduce research is extremely difficult; and even if you can do it, journals won't accept the result because it's not novel

ROI is contemporary language of monetization of data, which wasn't considered in the creation of the OECD policy guidelines

library aphorism about discovery: "30 hours in the lab will save you one hour in the library"

if we documented data better so people could find it, you'd have better ROI

1960s policy documents and pouring $ into info retrieval systems-->the same argument for better discovery so we can afford to duplicate the effort

in most countries facts cannot be copyrighted, but the configuration of a database may be

framing the idea of data in the knowledge commons: the packaging matters as much in the value as what's in it

Feist v. Rural Telephone case?-->phone numbers (facts) versus the medium of the directory

definition of science in the OECD context: narrow meaning

definition of science in the Riding the Wave document: Wissenschaft-->much broader

Principles -- international, interdisciplinary, different contexts, different stakeholder interests; data should be shared openly--within these constraints; the formal responsibilities are the ones that most relate to library professionals (formal responsibility and sustainability)

the researchers are more concerned about openness

  1. Openness
  2. Flexibility
  3. Transparency
  4. Legal conformity
  5. Protection of intellectual property
  6. Formal responsibility
  7. Professionalism
  8. Interoperability
  9. Quality
  10. Security
  11. Efficiency
  12. Accountability
  13. Sustainability

"Free speech vs. free beer"

fundamental principles-->Peter Suber
the basis of open access
1. authors generally hold the IP until released
2. authors want impact, not revenue

these are specific to scholarly publishing and do not hold for other types of publications

Borgman argues that neither of these hold with data***

capture data upstream--all members of the academic senate grant a non-exclusive license to the UC

I would argue that this is also financially motivated, as tenure is dependent on these things

openness versus privacy and confidentiality tension in the social sciences

data--you can't pin down who owns it

the incentive structure could be made to create incentives to share

MT: is very similar to the law stuff Rebecca is studying

the impact factors for journals are easily manipulated, and the editorial policy for getting into the ISI list has an effect on what gets cited

this also relates to what Leah said about journals only publishing confirmatory articles

China ISI manipulation: selling authorship for a year's salary in Science

when you force data sharing, there is sabotage

public goods        common pool resources (this is where sustainability issues arise)

club goods          private goods


Berman & Lavoie on digital sustainability issues

the Riding the Wave doc starts with developing the infrastructure

the Australians are doing it this way, as well

to accommodate sustainability, you need to start with the infrastructure

a lot of claims that making data citable will create incentives, but Borgman doesn't think this is the case

Wave principles

  1. All stakeholders, from scientists to national authorities to the
    general public, are aware of the critical importance of conserving
    and sharing reliable data produced during the scientific process.
  2. Researchers and practitioners from any discipline are able to find,
    access and process the data they need. They can be confident in
    their ability to use and understand data, and they can evaluate
    the degree to which that data can be trusted.

  3. Producers of data benefit from opening it to broad access, and
    prefer to deposit their data with confidence in reliable
    repositories. A framework of repositories is guided by
    international standards, to ensure they are trustworthy.

  4. Public funding rises, because funding bodies have confidence that
    their investments in research are paying back extra dividends to
    society, through increased use and re-use of publicly generated
    data.

  5. The innovative power of industry and enterprise is harnessed by
    clear and efficient arrangements for exchange of data between
    private and public sectors, allowing appropriate returns to both.

  6. The public has access to and can make creative use of the huge
    amount of data available; it can also contribute to the data store
    and enrich it. Citizens can be adequately educated and prepared
    to benefit from this abundance of information.

  7. Policy makers are able to make decisions based on solid
    evidence, and can monitor the impacts of these decisions.
    Government becomes more trustworthy.

  8. Global governance promotes international trust and
    interoperability.


data as infrastructure

challenges to overcome (22)


trust fabric: has a policy; part of a group that has specific practices; data seal of approval; certifications-->all recognized by the general community as being good. but she's found that there are a lot of holes: need a succession plan to examine potentialities for the future of your data

ARL (Association of Research Libraries)-->policy group comprises ~100 of the largest research libraries (based on bad stats) -- they present a united front on policy issues

Scholarly Publishing and Academic Resources Coalition (SPARC) is part of ARL

no single institution to give a seal of approval

those kinds of relationships affect infrastructure and sustainability

MT: and the fact that other places implement their policy is reaffirming their authority

Uhlir slides


readings for next week assume that the data life cycle is discrete, when it is really a more amorphous process

the many ways in which data will be used are not accounted for in the life cycle model of information/archives-->more like a continual iteration

MT: this has a ton to do with the uncertainty of ways the data could be used in the future

MT: is there a new underrepresented minority in people who do not participate in social media?

MT: ask Dad what happened to his recorded interviews audio cassettes

MT: I should ask Chris what people will do with their inherited digital collections over time

"floss the teeth you want to keep"

read the lifecycle thinking about the different stakeholder concerns

next week due the 29th
Assignment 1 is due: assess the data archiving needs of a research community
address these questions and answer each question discretely


MT: find out abt transplantation data
if it's sthg that dovetails with your term project, cool, but doesn't have to
double-spaced, 12pt font

community that publishes journal articles and collects data
look for posted policies and journal data repository requirements

repositories, embargo periods,