Data Collection and Privacy
American Libraries Magazine, September 5, 2018
The University of Arizona in Tucson made big news earlier this year when it revealed that it was tracking swipes of ID cards given to every student and used at almost 700 campus locations in an attempt to predict which students are likely to drop out.
It’s an example of learning analytics, the use of data to understand and optimize learning and learning environments. The general concept isn’t new—the university’s announcement noted that student retention has been studied for more than 30 years—but the amount of data that is easy to generate with card swipes has exploded in recent years. And while the goals of learning analytics projects may be noble, the practice has raised alarms among privacy advocates.
“I find the idea of constant surveillance, particularly of adult students, problematic,” says Deborah Caldwell-Stone, deputy director of the American Library Association’s Office for Intellectual Freedom. “When it starts to include what’s going on in the library, it raises questions about free expression, because if you’re being tracked and you know it, you’re less likely to conduct research that might raise questions about you.”
Students may not be informed about the data that is being collected or why. They may not be given an opportunity to opt out. And if the data is not properly managed, it may ultimately be used for purposes beyond its original intent—or by vendors completely outside of the university’s control.
Nevertheless, libraries of all types can easily be tempted to collect data for learning analytics projects. “When people are fighting for budgets, they need to show that the money is well placed, and one of the ways of achieving that is by showing data,” says Michael Zimmer, associate professor in the University of Wisconsin–Milwaukee (UWM) School of Information Studies and director of the Center for Information Policy Research.
Zimmer is leading a project funded by an Institute of Museum and Library Services (IMLS) grant to develop field guides for librarians on data privacy and security issues. In many cases, he notes, library staff won’t be the final audience for these guides. “It’s the administrators and boards of trustees or city managers who librarians need to convince why privacy needs to be maintained as a core value,” Zimmer says. “We’re trying to provide a road map of the questions to ask and the factors to weigh.” He hopes initial drafts will be available for feedback by the end of the year.
In 2014, Seattle Public Library (SPL) undertook a learning analytics project with the aim of increasing the use of library resources by millennials. That project evolved into a data warehouse, containing information from several library sources. Library Applications and Systems Manager Becky Yoose is responsible for making sure the data collection respects patron privacy. She has been investigating methods to deidentify data—removing or modifying information that could be used to pinpoint a patron—collected for the library’s data warehouse and used for a variety of analytics projects.
Instead of storing patron birthdates, for example, the warehouse will only note the age of the patron at the time of a transaction. Or instead of tracking a checked-out item’s full call number, it will store a truncated version to identify the general category but not a specific title.
Creating this type of data warehouse requires high-level technical expertise, both to build the architecture and to run the extract, transform, and load (ETL) process in a way that keeps personally identifiable information out of the warehouse.
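To make the idea concrete, here is a minimal Python sketch of a transform step along the lines described above. It is not SPL's actual pipeline: the field names (`patron_id`, `birthdate`, `call_number`, `checkout_date`) are hypothetical, and the truncation rule assumes Dewey-style call numbers, keeping only the digits before the decimal point.

```python
from datetime import date


def age_at(birthdate: date, on: date) -> int:
    """Return the patron's age in whole years on the transaction date."""
    years = on.year - birthdate.year
    # Subtract one if the birthday has not yet occurred in the transaction year.
    if (on.month, on.day) < (birthdate.month, birthdate.day):
        years -= 1
    return years


def truncate_call_number(call_number: str) -> str:
    """Keep only the broad class of a call number.

    Assumption: Dewey-style numbers such as '641.5945 HAZ'. Other
    classification schemes would need their own truncation rule.
    """
    parts = call_number.split()
    if not parts:
        return ""
    return parts[0].split(".")[0]


def deidentify(record: dict) -> dict:
    """Build the warehouse row from a raw transaction (hypothetical fields).

    The patron ID and birthdate are read but never copied into the output.
    """
    return {
        "patron_age": age_at(record["birthdate"], record["checkout_date"]),
        "item_category": truncate_call_number(record["call_number"]),
    }
```

Under these assumptions, a checkout of `641.5945 HAZ` on September 5, 2018, by a patron born on September 6, 1990, would be stored only as age 27 and category `641`.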
And even so, no universally effective process exists for deidentifying data. “SPL can use various methods of deidentification because our population is large enough and we don’t have a lot of outliers,” Yoose says. In libraries with smaller population segments, identifying a library user from just a few data points may still be possible. For example, in an academic library, a user’s year and major may be enough to detect who makes a transaction. “And once you start collecting more information, the accuracy of reidentification increases,” she observes.
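One rough way to see the outlier problem is to count how many records share each combination of attributes and flag the combinations held by only a few people. The Python sketch below does this in the spirit of a k-anonymity check. It is an illustration, not a method attributed to Yoose; the function name `small_groups`, the threshold `k`, and the sample fields are all assumptions.

```python
from collections import Counter


def small_groups(records, quasi_identifiers, k=5):
    """Return attribute combinations shared by fewer than k records.

    `quasi_identifiers` lists the fields that could identify someone
    when combined (for example, class year and major).
    """
    counts = Counter(
        tuple(record[field] for field in quasi_identifiers)
        for record in records
    )
    return {combo: n for combo, n in counts.items() if n < k}


# Hypothetical academic-library transactions.
transactions = [
    {"year": "senior", "major": "Biology"},
    {"year": "senior", "major": "Biology"},
    {"year": "senior", "major": "Classics"},
    {"year": "junior", "major": "Biology"},
    {"year": "junior", "major": "Biology"},
]
```

With this sample data and `k=2`, grouping by `year` alone flags nothing, while grouping by `year` and `major` together singles out the lone senior Classics major. Each added field makes such unique combinations more likely.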
Kristin Briney, data services librarian at UWM, has observed misconceptions about data privacy in her research into learning analytics projects. “A lot of libraries say that they’re anonymizing data, but you look at their procedures and it’s not actually,” she says. “People think their data is more secure than it is.”
In the IMLS grant–funded Data Doubles project, she and other researchers will investigate how students feel about the ways in which libraries and universities use their data. “I think that’s a big missing voice,” she says.
Responsible data management principles
Pressure to justify library investments—and the temptation to collect and analyze data—is unlikely to abate any time soon. “It’s easy if you have access to data to run with it and learn new things without thinking about privacy,” Briney says. “But I think as libraries we have an ethical obligation to step back and think about it.”
Since the regulatory framework related to private data hasn’t kept up with the ability to collect it, privacy efforts related to library data will have to come from librarians. “Here in the US, we have sort of a wild, wild West—if you can get your hands on the data, you can pretty much treat it as your property unless there’s a contract between an institution and the vendor to prevent it,” Caldwell-Stone observes.
Privacy regulations such as the European Union’s General Data Protection Regulation (GDPR), which took effect in May 2018, could have a significant impact on US institutions, although it’s too early to know precisely how. Similar privacy legislation “has affected learning analytics work,” Briney says, “mostly making libraries careful to have a valid justification to retain learning analytics data.”
In Seattle, Yoose says the library has created a data governance committee to shape how it will use data and ensure responsibility. General principles that Yoose recommends include being clear about what data is needed and why, and not casting a wider net just in case. “We have a huge case of data FOMO [fear of missing out], and it’s common in a lot of professions,” she says. By asking why data is being collected—repeatedly if necessary to get to its actual purpose—alternatives to risky data collection can often be found, such as recording that a valid ID was checked rather than collecting and storing driver’s license numbers in patron records.
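The ID example can be sketched as a patron record that has no field for the license number at all. The following Python is a hypothetical illustration, not any library system's real schema; the class and function names are assumptions.

```python
from dataclasses import dataclass, asdict
from datetime import date


@dataclass(frozen=True)
class IdCheck:
    """What gets stored: the fact that an ID was checked, and when."""
    patron_id: str
    id_verified: bool
    verified_on: date


def record_id_check(patron_id: str, staff_confirmed: bool, today: date) -> IdCheck:
    """Record the outcome of a desk check (hypothetical workflow).

    Staff look at the ID in person and report only yes or no, so the
    license number never enters the system.
    """
    return IdCheck(patron_id=patron_id, id_verified=staff_confirmed, verified_on=today)
```

Because the record type has no place to put the number, later reports and exports cannot leak it.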