Jos Berens, Stefaan Verhulst
An analysis of terms and conditions present in a diversity of data-driven prizes and challenges to better understand governance frameworks of data sharing practices.
Published in The GovLab in 2015
Blog Post
Governance and Operations
World Economic Forum
An overview report from the World Economic Forum on the existing data deficit and the value and impact of big data for sustainable development.
Published in 2015
Report
Benefits
Incentives
Governance and Operations
This report captures an overview of the existing data deficit and the value and impact of big data for sustainable development.
The authors of the report focus on four main priorities towards a sustainable data revolution: commercial incentives and trusted agreements with public- and private-sector actors; the development of shared policy frameworks, legal protections and impact assessments; capacity building activities at the institutional, community, local and individual level; and lastly, recognizing individuals as both producers and consumers of data.
Frederika Welle Donker, Bastiaan van Loenen, Arnold K. Bregt
A case study examining the opening of private data by Dutch energy network administrator Liander.
Published in International Journal of Geo-Information in 2016
Case Study
In Practice
Governance and Operations
This research has developed a monitoring framework to assess the effects of open (private) data using a case study of a Dutch energy network administrator Liander.
Focusing on the potential impacts of open private energy data – beyond ‘smart disclosure’ where citizens are given information only about their own energy usage – the authors identify three attainable strategic goals:
The authors propose a seven-step framework for assessing the impacts of Liander data, in particular, and open private data more generally:
While the authors note that the true impacts of this open private data will likely not come into view in the short term, they argue that, “Liander has successfully demonstrated that private energy companies can release open data, and has successfully championed the other Dutch network administrators to follow suit.”
Nicholas Vogel, Christopher Theisen, Jonathan P. Leidig, Jerry Scripps, Douglas H. Graham, Greg Wolffe
A paper on the use of mobile call records to enable predictive action around Ebola diffusion.
Published in Procedia Computer Science in 2015
Paper
Benefits
In Practice
The paper presents a research study based on the mobile call records that the mobile operator Orange shared with researchers as part of the Data for Development (D4D) Challenge.
The study discusses the data analysis approach in relation to developing a simulation of Ebola diffusion built around “the interactions of multi-scale models, including viral loads (at the cellular level), disease progression (at the individual person level), disease propagation (at the workplace and family level), societal changes in migration and travel movements (at the population level), and mitigating interventions (at the abstract government policy level).”
The authors argue that the use of their population, mobility, and simulation models provides more accurate simulation details than high-level analytical predictions, and that the D4D mobile datasets provide high-resolution information useful for modeling developing regions and hard-to-reach locations.
Stefaan Verhulst, Iryna Susha, Alexander Kostura
A report describing emerging practice, opportunities and challenges in data collaboratives as identified at the International Data Responsibility Conference.
Published in 2016
Essay
General
Benefits
Incentives
In Practice
This piece articulates a set of key lessons learned during a session at the International Data Responsibility Conference focused on identifying emerging practices, opportunities and challenges confronting data collaboratives.
The authors list a number of privately held data sources that could create positive public impacts if made more accessible in a collaborative manner, including:
To the end of identifying and expanding on emerging practice in the space, the authors describe a number of current data collaborative experiments, including:
In order to capitalize on the opportunities provided by data collaboratives, a number of needs were identified:
Elizabeth Stuart, Emma Samman, William Avis, Tom Berliner
The Overseas Development Institute’s annual report focused on solutions toward a sustainable data revolution.
Published in 2015
Report
Benefits
Risks and Challenges
Data Responsibility
In Practice
Governance and Operations
The authors of this Overseas Development Institute report highlight the need for good quality, relevant, accessible and timely data for governments to extend services into underrepresented communities and implement policies towards a sustainable “data revolution.”
The report explores solutions focused on capacity-building activities of national statistical offices (NSOs), alternative sources of data (including shared corporate data) to address gaps, and building strong data management systems.
Stefaan Verhulst, David Sangokoya
An essay on leveraging the potential of data to solve complex public problems through data collaboratives and four critical accelerators towards responsible data sharing and collaboration.
Published in 2015
Essay
General
Benefits
Incentives
In Practice
Governance and Operations
The essay refers to data collaboratives as a new form of collaboration involving participants from different sectors exchanging data to help solve public problems. These forms of collaboration can improve people’s lives through data-driven decision-making; information exchange and coordination; and shared standards and frameworks for multi-actor, multi-sector participation.
The essay cites four activities that are critical to accelerating data collaboratives: documenting value and measuring impact; matching public demand and corporate supply of data in a trusted way; training and convening data providers and users; experimenting and scaling existing initiatives.
Stefaan Verhulst, David Sangokoya
This essay describes an emerging taxonomy of activities involving corporate data sharing for public good, an emerging trend in which companies share anonymized and aggregated data with third-party users towards data-driven policymaking and greater public good.
Published in Internet Monitor 2014: Reflections on the Digital World: Platforms, Policy, Privacy, and Public Discourse in 2014
Paper
General
Benefits
In Practice
Governance and Operations
This essay, included in the Harvard Berkman Center’s 2014 Internet Monitor, describes a taxonomy of current corporate data sharing practices for public good: research partnerships; prizes and challenges; trusted intermediaries; application programming interfaces (APIs); intelligence products; and corporate data cooperatives or pooling.
Examples of data collaboratives discussed in the piece include: Yelp Dataset Challenge, the Digital Ecologies Research Partnership, BBVA Innova Challenge, Telecom Italia’s Big Data Challenge, NIH’s Accelerating Medicines Partnership and the White House’s Climate Data Partnerships.
The authors highlight important questions to consider towards a more comprehensive mapping of these activities.
Willem G van Panhuis, Proma Paul, Claudia Emerson, John Grefenstette, Richard Wilder, Abraham J Herbst, David Heymann, Donald S Burke
A literature review of potential barriers to public health data sharing.
Published in BMC Public Health in 2014
Journal Article
Risks and Challenges
Data Responsibility
The authors of this report provide a “systematic literature of potential barriers to public health data sharing.” These twenty potential barriers are classified in six categories: “technical, motivational, economic, political, legal and ethical.” In this taxonomy, “the first three categories are deeply rooted in well-known challenges of health information systems for which structural solutions have yet to be found; the last three have solutions that lie in an international dialogue aimed at generating consensus on policies and instruments for data sharing.”
The authors suggest the need for a “systematic framework of barriers to data sharing in public health” in order to accelerate access and use of data for public good.
Linnet Taylor, Ralph Schroeder
A paper describing how data, such as privately held mobile phone data, could improve development policy.
Published in GeoJournal in 2014
Journal Article
General
Benefits
Risks and Challenges
This journal article describes how privately held data – namely “digital traces” of consumer activity – “are becoming seen by policymakers and researchers as a potential solution to the lack of reliable statistical data on lower-income countries.”
They focus especially on three categories of data collaborative use cases:
They note, however, that a number of challenges and drawbacks exist for these types of use cases, including:
Nicholas Robin, Thilo Klein, Johannes Jütting
A working paper describing how privately held data sources could fill current gaps in the efforts of National Statistics Offices.
Published in OECD Development Co-operation Working Papers in 2016
Paper
General
Benefits
Risks and Challenges
Data Responsibility
In Practice
Governance and Operations
This working paper acknowledges the growing body of work on how different types of data (e.g., telecom data, social media, sensors and geospatial data) can address data gaps relevant to National Statistical Offices (NSOs).
Four models of public-private interaction for statistics are described: in-house production of statistics by a data provider for an NSO, transfer of datasets to NSOs from private entities, transfer of data to a third-party provider to manage the NSO and private entity data, and the outsourcing of NSO functions.
The paper highlights challenges to public-private partnerships involving data (e.g., technical challenges, data confidentiality, risks, limited incentives for participation); suggests that deliberate and highly structured approaches to such partnerships require enforceable contracts; and emphasizes both the trade-off between data specificity and accessibility and the importance of pricing mechanisms that reflect the capacity and capability of national statistical offices.
Case studies referenced in the paper include:
Markus Perkmann, Henri Schildt
A paper highlighting the advantages of third-party organizations enabling data sharing between industry and academia to uncover new insights to benefit the public good.
Published in Research Policy in 2015
Journal Article
General
Incentives
Risks and Challenges
Governance and Operations
This paper discusses the concept of a “boundary organization” in relation to industry-academic partnerships driven by data. Boundary organizations perform mediated revealing, allowing firms to disclose their research problems to a broad audience of innovators and simultaneously minimize the risk that this information would be adversely used by competitors.
The authors identify two especially important challenges that private firms face in opening their data or participating in data collaboratives with the academic research community, both of which could be addressed through more involvement from boundary organizations:
David Pastor-Escuredo, Alfredo Morales-Guzmán, Yolanda Torres-Fernández, Jean-Martin Bauer, Amit Wadhwa, Carlos Castro-Correa, Liudmyla Romanoff, Jong Gun Lee, Alex Rutherford, Vanessa Frias-Martinez, Nuria Oliver, Enrique Frias-Martinez, Miguel Luengo-Oroz
An analysis of aggregated and anonymized call detail records (CDRs) conducted in collaboration with the UN, Government of Mexico, academia and Telefonica suggests high potential in using shared telecom data to improve early warning and emergency management mechanisms.
Published in Proceedings of the IEEE Global Humanitarian Technology Conference (GHTC) in 2014
Paper
Benefits
Incentives
In Practice
This report describes the use of mobile data to understand the impact of disasters and improve disaster management. The study was conducted in the Mexican state of Tabasco in 2009 by a multidisciplinary, multi-stakeholder consortium involving the UN World Food Programme (WFP), Telefonica Research, Technical University of Madrid (UPM), Digital Strategy Coordination Office of the President of Mexico, and UN Global Pulse.
Telefonica Research, a division of the major Latin American telecommunications company, provided call detail records covering flood-affected areas for nine months. This data was combined with “remote sensing data (satellite images), rainfall data, census and civil protection data.” The results demonstrated that “analysing mobile activity during floods could be used to potentially locate damaged areas, efficiently assess needs and allocate resources (for example, sending supplies to affected areas).”
In addition to the results, the study highlighted “the value of a public-private partnership on using mobile data to accurately indicate flooding impacts in Tabasco, thus improving early warning and crisis management.”
Gideon Mann
The transcript of a keynote talk on the potential of leveraging corporate data to help solve public problems.
Published in 2016
Blog Post
General
Benefits
Risks and Challenges
Data Responsibility
Governance and Operations
This Medium post from Gideon Mann, the Head of Data Science at Bloomberg, shares his prepared remarks given at a lecture at the City College of New York. Mann argues for the potential benefits of increasing access to private sector data, both to improve research and academic inquiry and also to help solve practical, real-world problems. He also describes a number of initiatives underway at Bloomberg along these lines.
Mann argues that data generated at private companies “could enable amazing discoveries and research,” but is often inaccessible to those who could put it to those uses. Beyond research, he notes that corporate data could, for instance, benefit:
Mann recognizes the privacy challenges inherent in private sector data sharing, but argues that it is a common misconception that the only two choices are “complete privacy or complete disclosure.” He believes that flexible frameworks for differential privacy could open up new opportunities for responsibly leveraging data collaboratives.
Sharona Hoffman, Andy Podgurski
A journal article primarily focused on the risks involved in health data pooling.
Published in American Journal of Law & Medicine in 2013
Journal Article
Risks and Challenges
Data Responsibility
This journal article explores the benefits and, in particular, the risks related to large-scale biomedical databases bringing together health information from a diversity of sources across sectors. Some data collaboratives examined in the piece include:
Hoffman and Podgurski note that biomedical databases have many potential uses, with those likely to benefit including: “researchers, regulators, public health officials, commercial entities, lawyers,” as well as “healthcare providers who conduct quality assessment and improvement activities,” regulatory monitoring entities like the FDA, and “litigants in tort cases to develop evidence concerning causation and harm.”
They argue, however, that risks arise based on:
Harlan M. Krumholz, Cary P. Gross, Katrina L. Blount, Jessica D. Ritchie, Beth Hodshon, Richard Lehman, Joseph S. Ross
A review of industry-led efforts and cross-sector collaborations to share data from clinical trials to inform clinical practice.
Published in Circulation: Cardiovascular Quality and Outcomes in 2015
Journal Article
Incentives
In Practice
This article provides a comprehensive overview of industry-led efforts and cross-sector collaborations in data sharing by pharmaceutical companies to inform clinical practice.
The article details the types of data being shared and the early activities of GlaxoSmithKline (“in coordination with other companies such as Roche and ViiV”); Medtronic and the Yale University Open Data Access Project; and Janssen Pharmaceuticals (Johnson & Johnson). The article also describes the range of involvement in data sharing among pharmaceutical companies including Pfizer, Novartis, Bayer, AbbVie, Eli Lilly, AstraZeneca, and Bristol-Myers Squibb.
Silja M. Eckartz, Wout J. Hofman, Anne Fleur Van Veenstra
A paper proposing a decision model for data sharing arrangements aimed at addressing identified risks and challenges.
Published in International Conference on Electronic Government in 2014
Report
Risks and Challenges
Data Responsibility
This paper proposes a decision model for sharing public and private data, based on a literature review and three case studies in the logistics sector.
The authors identify five categories of barriers to data sharing and offer a decision model for identifying potential interventions to overcome each barrier:
Cameron F. Kerry, Jake Kendall, Yves-Alexandre de Montjoye
An issues paper from the Brookings Institution on leveraging the benefits of mobile phone data for humanitarian use while minimizing risks to privacy.
Published in 2016
Report
Benefits
In Practice
Governance and Operations
Using Ebola as a case study, the authors describe the value of using private telecom data for uncovering “valuable insights into understanding the spread of infectious diseases as well as strategies into micro-target outreach and driving update of health-seeking behavior.”
The authors highlight the absence of a common legal and standards framework for “sharing mobile phone data in privacy-conscientious ways” and recommend “engaging companies, NGOs, researchers, privacy experts, and governments to agree on a set of best practices for new privacy-conscientious metadata sharing models.”
Matthew Brack, Tito Castillo
A Chatham House report describing the need for data sharing and collaboration for global public health emergencies and potential lessons learned from the commercial sector.
Published in Chatham House in 2015
Report
Benefits
Risks and Challenges
Governance and Operations
This Chatham House report provides an overview of public health surveillance data sharing, highlighting the benefits and challenges of shared health data and the complexity of adapting technical solutions from other sectors for public health.
The report describes data sharing processes from several perspectives, including in-depth case studies of actual data sharing in practice at the individual, organizational and sector levels. Among the key lessons for public health data sharing, the report strongly highlights the need to harness momentum for action and maintain collaborative engagement: “Successful data sharing communities are highly collaborative. Collaboration holds the key to producing and abiding by community standards, and building and maintaining productive networks, and is by definition the essence of data sharing itself. Time should be invested in establishing and sustaining collaboration with all stakeholders concerned with public health surveillance data sharing.”
Examples of data collaboratives include H3Africa (a collaboration between NIH and Wellcome Trust) and NHS England’s care.data programme.
Chris Ansell, Alison Gash
A journal article describing the emerging practice of public-private partnerships, particularly those built around data sharing.
Published in Journal of Public Administration Research and Theory in 2007
Journal Article
Governance and Operations
This article describes collaborative arrangements that include public and private organizations working together and proposes a model for understanding an emergent form of public-private interaction informed by 137 diverse cases of collaborative governance.
The article suggests that factors significant to successful partnering processes and outcomes include:
The authors provide a “contingency theory model” that specifies relationships between different variables that influence outcomes of collaborative governance initiatives. Three “core contingencies” for successful collaborative governance initiatives identified by the authors are:
Bellagio Big Data Workshop Participants
A white paper describing the potential of big data, and corporate data in particular, to positively benefit development efforts.
Published in 2014
Report
Benefits
Risks and Challenges
This white paper, produced by “a group of activists, researchers and data experts,” explores the potential of big data to improve development outcomes and spur positive social change in low- and middle-income countries. Using examples, the authors discuss four areas in which the use of big data can impact development efforts:
The authors argue that in order to maximize the potential of big data’s use in development, “there is a case to be made for building a data commons for private/public data, and for setting up new and more appropriate ethical guidelines.”
They also identify a number of challenges, especially when leveraging data made accessible from a number of sources, including private sector entities, such as:
Institute of Medicine
A consensus, peer-reviewed IOM report recommending how to promote responsible clinical trial data sharing and minimize risks and challenges of sharing.
Published in 2015
Report
Benefits
Risks and Challenges
Data Responsibility
Stefaan Verhulst
An essay offering a new understanding of data responsibility comprising a duty to share, a duty to protect, and a duty to act.
Published in The Conversation in 2016
Essay
General
Benefits
Data Responsibility
In Practice
Iryna Susha, Marijn Janssen, Stefaan Verhulst
A research paper providing a new taxonomy for types of data collaboratives.
Published in Proceedings of the 50th Hawaii International Conference on System Sciences in 2017
Paper
General
Governance and Operations
Monica Bulger, Patrick McCormick, Mikaela Pitcan
A report exploring the history of inBloom, an education technology and data platform launched in 2013 and ended a year later.
Published in Data & Society Working Paper Series in 2017
Report
Risks and Challenges
In Practice
Stefaan G. Verhulst
This article investigates how companies use their data for social good, and identifies an emerging field of “data responsibility” where proprietary data can be used as a social asset.
Published in Stanford Social Innovation Review in 2017
Essay
Magazine Article
Benefits
Data Responsibility
In Practice
In this article for the Stanford Social Innovation Review, Stefaan Verhulst analyses how an increasing number of companies are using their data for social good, evidence for a new concept of “data responsibility” where data and information are used to reach positive public ends. Data responsibility is defined as steps companies can take to open their proprietary data to external bodies, who can use this information to confront humanitarian emergencies and public problems. Central to this is the creation of “Data Collaboratives”: data-sharing models between corporations and public institutions, NGOs, and academic bodies. The article goes on to detail the “Three Pillars of Data Responsibility,” which include:
• Share: the duty for data-holders to share their data.
• Protect: the need to protect data through adequate anonymization techniques.
• Act: the need for policy makers and leaders to act upon the information and data shared.
The article ends by calling for a “culture shift” to incorporate concepts of data responsibility into day-to-day company activities, and outlining four practical steps to make this happen.
Iryna Susha, Marijn Janssen, Stefaan Verhulst
Focuses on the operational challenges of implementing data collaboratives.
Published in Transforming Government: People, Process and Policy, Vol. 11 Issue: 1 in 2017
Journal Article
Risks and Challenges
Operations
In Practice
Freddy De Meersman, Gerdy Seynaeve, Marc Debusschere, Patrick Lusyne, Pieter Dewitte, Youri Baeyens, Albrecht Wirthmann, Christophe Demunter, Fernando Reis, Hannes I. Reuter
This paper, presented at the 2016 European Conference on Quality in Official Statistics, assesses whether mobile phone data in Belgium can be used to produce official statistics, comparing data collected from mobile phones with official census data.
Published in European Conference on Quality in Official Statistics (Q2016) in 2016
Journal Article
Benefits
In Practice
Governance and Operations
Jake Porway
This article provides nonprofits with three guiding principles on how to effectively incorporate data science into their operations.
Published in Stanford Social Innovation Review in 2017
Magazine Article
Operations
In Practice
Governance and Operations
This article provides nonprofits with three guiding principles on how to effectively incorporate data science into their operations: (1) Collaborate with data science experts to define your project. (2) Collaborate across your organization to “build with, not for.” (3) Collaborate across your sector to move the needle.
UN Global Pulse, GSMA
“This report outlines the value of harnessing mobile data for social good and provides an analysis of the gaps. Its aim is to survey the landscape today, assess the current barriers to scale, and make recommendations for a way forward.”
Published in 2017
Report
Case Study
Benefits
Risks and Challenges
Data Responsibility
In Practice
– “The report reviews the challenges the field is currently facing and discusses a range of issues preventing mobile data from being used for social good. – It continues by providing a set of recommendations intended to move beyond short-term and ad hoc projects to more systematic and institutionalized implementations that are scalable, replicable, sustainable and focused on impact. – Finally, the report proposes a roadmap for 2018 calling all stakeholders to work on developing a scalable and impactful demonstration project that will help to establish the value of mobile data for social good. – The report includes examples of innovation projects and ways in which mobile data is already being used to inform development and humanitarian work. It is intended to inspire social impact organizations and mobile network operators (MNOs) to collaborate in the exploration and application of new data sources, methods and technologies.”
Future of Privacy Forum
“In this report, we aim to contribute to the literature by seeking the ‘ground truth’ from the corporate sector about the challenges they encounter when they consider making data available for academic research.”
Published in 2017
Report
Benefits
Incentives
Risks and Challenges
Data Responsibility
In Practice
– “The report seeks to provide insights about why companies share data with academic researchers, how they make that data available for research, the perceived risks from sharing data, and the strategies that companies employ to address those risks. – Participants that provided information for the report came from 19 companies across diverse sectors, such as high-tech manufacturing, workforce, education, healthcare, telecommunications, real estate, ecommerce, data-related services, transportation, consumer genetics testing, and online services. – Future of Privacy Forum researchers use the insights gathered from interviews with these companies to identify opportunities for private data sharing. Some recommendations include the potential to enhance a positive public profile of a company sharing data, to increase peer-to-peer knowledge sharing networks, to create a clearinghouse to match academics to companies with the research that they need, and to develop safeguards to mitigate perceived risks.”
Andrew Young, Stefaan Verhulst
An article describing lessons learned regarding data collaboratives built on leveraging social media intelligence.
Published in Harvard Business Review in 2018
Magazine Article
Benefits
Data Responsibility
In Practice
Iryna Susha, Marijn Janssen, Stefaan Verhulst
Published in Transforming Government: People, Process and Policy in 2017
Journal Article
Data Responsibility
Purpose: In “data collaboratives”, private and public organizations coordinate their activities to leverage data to address a societal challenge. This paper aims to focus on analyzing challenges and coordination mechanisms of data collaboratives.
Design/methodology/approach: This study uses coordination theory to identify and discuss the coordination problems and coordination mechanisms associated with data collaboratives. The authors also use a taxonomy of data collaborative forms from a previous empirical study to discuss how different forms of data collaboratives may require different coordination mechanisms.
Findings: The study analyzed data collaboratives from the perspective of organizational and task levels. At the organizational level, the authors argue that data collaboratives present an example of the bazaar form of coordination. At the task level, the authors identified five coordination problems and discussed potential coordination mechanisms to address them, such as coordination by negotiation, by third party, by standardization, to name a few.
Research limitations/implications: This study is one of the first few to systematically analyze the phenomenon of “data collaboratives”.
Practical implications: This study can help practitioners better understand the coordination challenges they may face when initiating a data collaborative and to develop successful data collaboratives by using coordination mechanisms to mitigate these challenges.
Originality/value: Data collaboratives are a novel form of data-driven initiatives which have seen rapid experimentation lately. This study draws attention to this concept in the academic literature and highlights some of the complexities of organizing data collaboratives in practice.
Gary King, Nate Persily
Published in 2018
Paper
General
The mission of the academic social sciences is to understand and ameliorate society’s greatest challenges. The data held by private companies holds vast potential to further this mission. Yet, because of its interaction with highly politicized issues, customer privacy, proprietary content, and differing goals of firms and academics, these data are often inaccessible to university researchers. We propose here a new model for industry-academic partnerships that addresses these problems via a novel organizational structure: Respected scholars form a commission which, as a trusted third party, receives access to all relevant firm information and systems, and then recruits independent academics to do research in specific areas following standard peer review protocols organized and funded by nonprofit foundations. We also report on a partnership we helped forge under this model to make data available about the extremely visible and highly politicized issues surrounding the impact of social media on elections and democracy. In our partnership, Facebook will provide privacy-preserving data and access; seven major politically and substantively diverse nonprofit foundations will fund the research; and the Social Science Research Council will oversee the peer review process for funding and data access.
Stefaan Verhulst, Andrew Young
Data collaboratives can help harness the value of data held by the private sector and create new value that can address various public issues. This article delineates the potential of public-private partnerships for data sharing and proposes a data responsibility framework that serves as both a guideline and a safeguard against the risks involved in data sharing.
Published in Harvard Business Review in 2018
Essay
Data Responsibility
Stefaan Verhulst
Stefaan Verhulst’s latest work centers on how technology can improve people’s lives and on the creation of more effective and collaborative forms of governance. Specifically, he is interested in the perils and promise of collaborative technologies and how to harness the unprecedented volume of information to advance the public good.
Published in TEDxMidAtlantic in 2017
Paper
Data Responsibility
Skipper Seabold, Andrea Coppola
This study assesses the possibility of using Google Trends data to forecast price series in Central America. It discusses some of the challenges inherent in working with such data in the context of developing countries. It finds that adding the Internet search index improves forecasting over benchmark models in about 20 percent of the series, and it discusses the reasons for the varied success and potential avenues for future research.
Published in World Bank in 2015
Paper
General
Risks and Challenges
European Commission
The European Commission published this document to accompany the Communication “Towards a common European data space”. This “Staff Working Document aims to provide a toolbox for companies that are data holders, data users, or both at the same time. For this purpose, it contains a ‘How to’ guide on legal, business and technical aspects of data sharing that can be used in practice when considering and preparing data transfers between companies coming from the same or different sectors.”
Published in European Commission in 2018
Paper
General
Data Responsibility
Governance and Operations
The purpose of this Staff Working Document is “to provide a toolbox for companies that are data holders, data users, or both at the same time. For this purpose, it contains a ‘How to’ guide on legal, business and technical aspects of data sharing that can be used in practice when considering and preparing data transfers between companies coming from the same or different sectors.”
This Staff Working Document defines the following principles to ensure fair markets: transparency, shared value creation, respect for each other’s commercial interests, undistorted competition, and minimised data lock-in.
It also provides a guide to data sharing for businesses and guidance on how to make data sharing successful.
European Commission
The European Commission published this document for the following purpose: “With this Communication, the Commission proposes a package of measures as a key step towards a common data space in the EU - a seamless digital area with the scale that will enable the development of new products and services based on data.”
Published in European Commission in 2018
Paper
General
Benefits
Governance and Operations
In building towards a common data space, which is “a seamless digital area with the scale that will enable the development of new products and services based on data,” the European Commission proposes the following steps:
Alberto Alemanno
This article identifies the major challenges of unlocking privately held data for the benefit of society and sketches a research agenda for scholars interested in collaborative and regulatory solutions aimed at unlocking privately held data for good.
Published in European Journal of Risk Regulation in 2018
Journal Article
General
Benefits
Risks and Challenges
Municipality of Copenhagen and Capital Region of Denmark
The City Data Exchange (CDE) is a collaborative project created by the Municipality of Copenhagen, the Capital Region of Denmark, and Hitachi to create a marketplace for public and private organizations to take part in data exchange.
Published in 2018
Report
Paper
General
Data Responsibility
Governance and Operations
The CDE has created a platform for both public and private organizations to sell and purchase data in an effort to create a data exchange between the two sectors. One of the datasets in highest demand is what the CDE calls “people movement patterns” data, which describes how people in a given area move around at different times and places.
Gabriel Popkin
“In the past few years, technology and satellite companies’ offerings to scientists have increased dramatically. Thousands of researchers now use high-resolution data from commercial satellites for their work. […] Researchers use the new capabilities to track and visualize forest and coral-reef loss; monitor farm crops to boost yields; and predict glacier melt and disease outbreaks.”
Published in Nature in 2018
Journal Article
General
Benefits
In Practice
In this article, Popkin examines a number of examples of technology companies initiating data collaboratives, including:
Jamie Holton
From the abstract: “This research looks at the emerging phenomenon of data collaboratives, specifically in the ‘crisis response’ sector, with which the private sector assists the public sector’s data-driven efforts to prevent or respond to humanitarian emergencies. This research explores and explains why the private sector participates in crisis response data collaboratives.”
Published in Leiden University in 2018
Paper
Benefits
Incentives
Data Responsibility
From the abstract: “This research explores and explains why the private sector participates in crisis response data collaboratives.
“Through secondary literature analysis, and primary survey and interview analysis of three case studies, this research provides new insights into data collaborative objectives, the private sector’s activities, the incentives and risks these collaboratives present for the private sector, and how it mitigates such risks.
“The research concludes that the private sector enters crisis response data collaboratives to help the public sector address one or more of its obstacles to creating data-driven solutions to societal problems, and occasionally to achieve additional objectives for the public good.
“Although the private sector is motivated by various incentives, sufficient mitigation of presented risks, especially risks to data subjects’ privacy and security, is a precondition to joining a crisis response data collaborative.”
Spyratos Spyridon, Vespe Michele, Natale Fabrizio, Weber Ingmar, Zagheni Emilio, Rango Marzia
While research on the use of big data sources for migration is in its infancy, and the diffusion of internet technologies in less developed countries is still limited, the use of big data sources can unveil useful insights on quantitative and qualitative characteristics of migration.
Published in Joint Research Centre, European Commission in 2018
Report
In Practice
In this report, the authors examine social media data to study migration patterns:
Richard Beckwith, John Sherry, David Prendergast
From the chapter: “This paper explores the complex relationship between cities and data or, more accurately, the way that the citizens of a city want data about their community to be managed.”
Published in Springer in 2019
Journal Article
General
From the abstract: “Much of the recent excitement around data, especially ‘Big Data,’ focuses on the potential commercial or economic value of data. How that data will affect people isn’t much discussed. People know that smart cities will deploy Internet-based monitoring and that flows of the collected data promise to produce new values. Less considered is that smart cities will be sites of new forms of citizen action—enabled by an ‘economy’ of data that will lead to new methods of collectivization, accountability, and control which, themselves, can provide both positive and negative values to the citizenry. Therefore, smart city design needs to consider not just measurement and publication of data but also the implications of city-wide deployment, data openness, and the possibility of unintended consequences if data leave the city.”
Daniel Kondor, Behrooz Hashemian, Yves-Alexandre de Montjoye, Carlo Ratti
The authors “present the first large-scale analysis of user matchability in real mobility datasets on realistic scales, i.e. among two datasets that consist of several million people’s mobility traces, coming from a mobile network operator and transportation smart card usage.”
Published in IEEE Transactions on Big Data in 2018
Journal Article
Benefits
From the abstract: “The problem of unicity and reidentifiability of records in large-scale databases has been studied in different contexts and approaches, with focus on preserving privacy or matching records from different data sources. With an increasing number of service providers nowadays routinely collecting location traces of their users on unprecedented scales, there is a pronounced interest in the possibility of matching records and datasets based on spatial trajectories. Extending previous work on reidentifiability of spatial data and trajectory matching, we present the first large-scale analysis of user matchability in real mobility datasets on realistic scales, i.e. among two datasets that consist of several million people’s mobility traces, coming from a mobile network operator and transportation smart card usage. We extract the relevant statistical properties which influence the matching process and analyze their impact on the matchability of users. We show that for individuals with typical activity in the transportation system (those making 3-4 trips per day on average), a matching algorithm based on the co-occurrence of their activities is expected to achieve a 16.8% success only after a one-week long observation of their mobility traces, and over 55% after four weeks. We show that the main determinant of matchability is the expected number of co-occurring records in the two datasets. Finally, we discuss different scenarios in terms of data collection frequency and give estimates of matchability over time. We show that with higher frequency data collection becoming more common, we can expect much higher success rates in even shorter intervals.”
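The idea of matching users by co-occurring activity can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes each record is a hypothetical `(user, place, time_bin)` tuple, counts the distinct place-and-time cells that a user in one dataset shares with a user in the other, and picks the candidate with the most shared cells. All function names and the toy data are invented for illustration.

```python
from collections import Counter


def cooccurrence_counts(records_a, records_b):
    """Count distinct (place, time_bin) cells shared by each (user_a, user_b) pair."""
    # Index dataset B by cell so each A record only meets users seen in the same cell.
    index_b = {}
    for user_b, place, time_bin in records_b:
        index_b.setdefault((place, time_bin), set()).add(user_b)

    counts = Counter()
    seen = set()
    for user_a, place, time_bin in records_a:
        key = (user_a, place, time_bin)
        if key in seen:
            continue  # count each cell at most once per A user
        seen.add(key)
        for user_b in index_b.get((place, time_bin), ()):
            counts[(user_a, user_b)] += 1
    return counts


def best_matches(records_a, records_b):
    """For each user in A, return the B user with the most shared cells.

    Ties are broken by whichever candidate is encountered first.
    """
    best = {}
    for (user_a, user_b), n in cooccurrence_counts(records_a, records_b).items():
        if user_a not in best or n > best[user_a][1]:
            best[user_a] = (user_b, n)
    return best


# Toy data (hypothetical): phone traces and smart-card taps as (user, place, time_bin).
phone = [("p1", "A", 1), ("p1", "B", 2), ("p1", "C", 3),
         ("p2", "A", 1), ("p2", "D", 2)]
card = [("c1", "A", 1), ("c1", "B", 2), ("c1", "C", 3),
        ("c2", "A", 1), ("c2", "D", 2), ("c2", "E", 5)]

matches = best_matches(phone, card)
```

In this toy case both phone users share cell `("A", 1)` with both card users, so a single observation is ambiguous; the additional shared cells are what separate the candidates. That is consistent with the abstract's point that the expected number of co-occurring records drives matchability.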
Yves-Alexandre de Montjoye, et al.
From the article: “The breadcrumbs we leave behind when using our mobile phones—who somebody calls, for how long, and from where—contain unprecedented insights about us and our societies. Researchers have compared the recent availability of large-scale behavioral datasets, such as the ones generated by mobile phones, to the invention of the microscope, giving rise to the new field of computational social science.”
Published in Nature in 2018
Journal Article
Risks and Challenges
Bram Klievink, Haiko van der Voort, Wijnand Veeneman
From the abstract: “This article looks at the idea of data collaboratives as a form of cross-sector partnership to exchange and integrate data and data use to generate public value.”
Published in Information Polity Journal in 2018
Journal Article
Benefits
Full abstract: “Driven by the technological capabilities that ICTs offer, data enable new ways to generate value for both society and the parties that own or offer the data. This article looks at the idea of data collaboratives as a form of cross-sector partnership to exchange and integrate data and data use to generate public value. The concept thereby bridges data-driven value creation and collaboration, both current themes in the field. To understand how data collaboratives can add value in a public governance context, we exploratively studied the qualitative longitudinal case of an infomobility platform. We investigated the ability of a data collaborative to produce results while facing significant challenges and tensions between the goals of parties, each having the conflicting objectives of simultaneously retaining control whilst allowing for generativity. Taken together, the literature and case study findings help us to understand the emergence and viability of data collaboratives. Although limited by this study’s explorative nature, we find that conditions such as prior history of collaboration and supportive rules of the game are key to the emergence of collaboration. Positive feedback between trust and the collaboration process can institutionalise the collaborative, which helps it survive if conditions change for the worse.”
Bjorn Lundqvist
From the abstract: “This article will discuss what implications combining data in data pools by firms might have on competition, and when competition law should be applicable. It develops the idea that data pools harbour great opportunities, whilst acknowledging that there are still risks to take into consideration, and to regulate.”
Published in Faculty of Law, Stockholm University Research Paper in 2018
Journal Article
Governance and Operations
Jennifer Shkabatur
This paper makes the case for a global commons of data, proposing alternatives that address concerns about data sharing and presenting a policy framework that would allow the global commons of data to work.
Published in Stanford Technology Law Review in 2018
Journal Article
General
Operations
Abstract:
Data platform companies (such as Facebook, Google, or Twitter) amass and process immense amounts of data that is generated by their users. These companies primarily use the data to advance their commercial interests, but there is a growing public dismay regarding the adverse and discriminatory impacts of their algorithms on society at large. The regulation of data platform companies and their algorithms has been hotly debated in the literature, but current approaches often neglect the value of data collection, defy the logic of algorithmic decision-making, and exceed the platform companies’ operational capacities.
This Article suggests a different approach — an open, collaborative, and incentives-based stance toward data platforms that takes full advantage of the tremendous societal value of user-generated data. It contends that this data shall be recognized as a “global commons,” and access to it shall be made available to a wide range of independent stakeholders — research institutions, journalists, public authorities, and international organizations. These external actors would be able to utilize the data to address a variety of public challenges, as well as observe from within the operation and impacts of the platforms’ algorithms.
After making the theoretical case for the “global commons of data,” the Article explores the practical implementation of this model. First, it argues that a data commons regime should operate through a spectrum of data sharing and usage modalities that would protect the commercial interests of data platforms and the privacy of data users. Second, it discusses regulatory measures and incentives that can solicit the collaboration of platform companies with the commons model. Lastly, it explores the challenges embedded in this approach.
Meg Young, Luke Rodriguez, Emily Keller, Feiyang Sun, Boyang Sa, Jan Whittington, Bill Howe
From the abstract: “We find that the liberal use of synthetic data, in conjunction with strong legal protections over raw data, strikes a tunable balance between transparency, proprietorship, privacy, and research objectives; and that the legal-technical framework we describe can form the basis for organizational data trusts in a variety of contexts.”
Published in Proceedings of ACM in 2019
Paper
Risks and Challenges
Bianca Wylie, Sean McDonald
The authors propose a new concept called “Data Trust” to “steward, maintain and manage how data is used and shared — from who is allowed access to it, and under what terms, to who gets to define the terms, and how.”
Published in Centre for International Governance Innovation in 2018
Blog Post
Data Responsibility
Governance and Operations
Claire Borsenberger, Mathilde Hoang, Denis Joram
This paper seeks to answer the following research questions:
Published in Springer in 2019
Essay
General
Abstract: “Thanks to appropriate data algorithms, firms, especially those on-line, are able to extract detailed knowledge about consumers and markets. This raises the question of the essential facility character of data. Moreover, the features of digital markets lead to a concentration of this core input in the hands of few big “superstars” and arouse legitimate economic and societal concerns. In a more and more data-driven society, one could ask if data openness is a solution to deal with power derived from data concentration. We conclude that only a case-by-case approach should be followed. Mandatory open data policy should be conditioned on an ex-ante cost-benefit analysis proving that the benefits of disclosure exceed its costs.”
Natalia Adler, Ciro Cattuto, Kyriaki Kalimeri, Daniela Paolotti, Michele Tizzoni, Stefaan Verhulst, Elad Yom-Tov, Andrew Young
“As the product of a data collaborative, this paper leverages private-sector search engine data toward gaining a fuller, more accurate picture of the suicide issue among young people in India.”
Published in Journal of Medical Internet Research in 2019
Journal Article
In Practice
Abstract:
“Background: India is home to 20% of the world’s suicide deaths. Although statistics regarding suicide in India are distressingly high, data and cultural issues likely contribute to a widespread underreporting of the problem. Social stigma and only recent decriminalization of suicide are among the factors hampering official agencies’ collection and reporting of suicide rates.
“Objective: As the product of a data collaborative, this paper leverages private-sector search engine data toward gaining a fuller, more accurate picture of the suicide issue among young people in India. By combining official statistics on suicide with data generated through search queries, this paper seeks to: add an additional layer of information to more accurately represent the magnitude of the problem, determine whether search query data can serve as an effective proxy for factors contributing to suicide that are not represented in traditional datasets, and consider how data collaboratives built on search query data could inform future suicide prevention efforts in India and beyond.
“Methods: We combined official statistics on demographic information with data generated through search queries from Bing to gain insight into suicide rates per state in India as reported by the National Crimes Record Bureau of India. We extracted English language queries on “suicide,” “depression,” “hanging,” “pesticide,” and “poison”. We also collected data on demographic information at the state level in India, including urbanization, growth rate, sex ratio, internet penetration, and population. We modeled the suicide rate per state as a function of the queries on each of the 5 topics considered as linear independent variables. A second model was built by integrating the demographic information as additional linear independent variables.
“Results: Results of the first model fit (R²) when modeling the suicide rates from the fraction of queries in each of the 5 topics, as well as the fraction of all suicide methods, show a correlation of about 0.5. This increases significantly with the removal of 3 outliers and improves slightly when 5 outliers are removed. Results for the second model fit using both query and demographic data show that for all categories, if no outliers are removed, demographic data can model suicide rates better than query data. However, when 3 outliers are removed, query data about pesticides or poisons improves the model over using demographic data.
“Conclusions: In this work, we used search data and demographics to model suicide rates. In this way, search data serve as a proxy for unmeasured (hidden) factors corresponding to suicide rates. Moreover, our procedure for outlier rejection serves to single out states where the suicide rates have substantially different correlations with demographic factors and query rates.”
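The modeling step described in the Methods can be illustrated with a simplified sketch. The study models suicide rates per state from five query topics and, in a second model, demographic variables; the example below uses a single predictor and made-up numbers, purely to show how a least-squares line and its coefficient of determination (R²) are computed. The function names and data are hypothetical and are not taken from the paper.

```python
def fit_line(x, y):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(x, y, slope, intercept):
    """Share of the variance in y explained by the fitted line."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Made-up values for five hypothetical states (not data from the study):
# x = fraction of queries on one topic (arbitrary units), y = reported rate.
query_fraction = [1.0, 2.0, 3.0, 4.0, 5.0]
reported_rate = [2.0, 4.0, 5.0, 4.0, 5.0]

slope, intercept = fit_line(query_fraction, reported_rate)
fit = r_squared(query_fraction, reported_rate, slope, intercept)
```

Extending this to several predictors, as the study does, means solving a multiple regression rather than a single line, but the R² is read the same way: the closer to 1, the more of the variation across states the predictors account for.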
Kieron O'Hara
“This paper defends the following thesis: A data trust works within the law to provide ethical, architectural and governance support for trustworthy data processing.”
Published in 2019
Paper
Governance and Operations
Abstract: “In their report on the development of the UK AI industry, Wendy Hall and Jérôme Pesenti recommend the establishment of data trusts, “proven and trusted frameworks and agreements” that will “ensure exchanges [of data] are secure and mutually beneficial” by promoting trust in the use of data for AI. This paper defends the following thesis: A data trust works within the law to provide ethical, architectural and governance support for trustworthy data processing. Data trusts are therefore both constraining and liberating. They constrain: they respect current law, so they cannot render currently illegal actions legal. They are intended to increase trust, and so they will typically act as further constraints on data processors, adding the constraints of trustworthiness to those of law. Yet they also liberate: if data processors are perceived as trustworthy, they will get improved access to data. The paper addresses the areas of: trust and trustworthiness; ethics; architecture; legal status.”
Shannon Lefaivre, Brendan Behan, Anthony Vaccarino, Kenneth Evans, Moyez Dharsee, Tom Gee, Costa Dafnas, Tom Mikkelsen, Elizabeth Theriault
“The aim of this report is to highlight these best practices and develop a key open resource which may be referenced during the development of similar open science initiatives.”
Published in Frontiers in Genetics in 2019
Journal Article
In Practice
Governance and Operations
André Corrêa d'Almeida, Caroline McHeffey, Nilda Mesa, Arnaud Sahuguet, Stefaan G. Verhulst, Andrew Young, Andrew J. Zahuranec
“This inaugural edition of New Lab’s Research Journal (i) describes the process of developing and launching New Lab’s The Circular City program, (ii) introduces circular city data as the first exploration of this program, and (iii) investigates and methodologically tests the value of circular data applied to three urban challenges: economic development, mobility, and resilience.”
Published in New Lab in 2019
Journal Article
General
Benefits
Incentives
Risks and Challenges
In Practice
From the introduction: “Coming out of ten months of applied, participatory, and multidisciplinary research, this journal presents:
A. “One case study developed to document and explain how the program was conceived, designed, and implemented, with the goal of offering lessons for scalability at New Lab and replicability in other cities around the world. The key questions explored in the case study are:
B. “Three research papers developed to investigate three urban challenges:
Ron S. Jarmin
“In this essay, [Jarmin] describe[s] some work underway that hints at what 21st century official economic measurement will look like and offer some preliminary comments on what is needed to get there.”
Published in Journal of Economic Perspectives in 2019
Journal Article
Benefits
Geoff Mulgan, Vincent Straub
“Here we attempt to open up part of the debate on data governance; suggesting how to address the twin goals of greater control for citizens, and greater value for the public as a whole. We argue that there are a variety of different solutions that need to be designed, and experimented with.”
Published in Nesta in 2019
Essay
Data Responsibility
“This paper argues that new institutions—an ecosystem of trust—are needed to ensure that uses of data are trusted and trustworthy. It advocates the creation of different kinds of data trust to fill this gap. It argues:
A. Martinez, A. C. Rainie
“The project described here reviewed publicly available data sharing agreements that focus on research with Indigenous nations and communities in the United States. […] The results detail how Indigenous peoples currently use data sharing agreements and potential areas of expansion for language to include in data sharing agreements as Indigenous peoples address the research needs of their communities and the protection of community and cultural data.”
Published in American Geophysical Union in 2018
Journal Article
In Practice
Governance and Operations
Abstract: “Indigenous communities and scholars have been influencing a shift in participation and inclusion in academic and agency research over the past two decades. As a response, Indigenous peoples are increasingly asking research questions and developing their own studies rooted in their cultural values. They use the study results to rebuild their communities and to protect their lands. This process of Indigenous-driven research has led to partnering with academic institutions, establishing research review boards, and entering into data sharing agreements to protect environmental data, community information, and local and traditional knowledges. Data sharing agreements provide insight into how Indigenous nations are addressing the key areas of data collection, ownership, application, storage, and the potential for data reuse in the future. By understanding this mainstream data governance mechanism, how they have been applied, and how they have been used in the past, we aim to describe how Indigenous nations and communities negotiate data protection and control with researchers.
“The project described here reviewed publicly available data sharing agreements that focus on research with Indigenous nations and communities in the United States. We utilized qualitative analysis methods to identify specific areas of focus in the data sharing agreements, whether or not traditional or cultural values were included in the language of the data sharing agreements, and how the agreements defined data. The results detail how Indigenous peoples currently use data sharing agreements and potential areas of expansion for language to include in data sharing agreements as Indigenous peoples address the research needs of their communities and the protection of community and cultural data.”
Deborah Mascalzoni, et al.
“[D]ata deposition requirements and research repositories will have to adapt to the legal and ethical landscape of the GDPR. Noncompliance with the GDPR may incur administrative fines of up to €20 million, and the regulation is enforced by data protection authorities in each EU nation.”
Published in Annals of Internal Medicine in 2019
Essay
Risks and Challenges
Data Responsibility
Governance and Operations
Stefaan G. Verhulst
Testimony before New York City Council Committee on Technology and the Commission on Public Information and Communication (COPIC).
Published in The GovLab in 2019
Essay
General
Steve MacFeely
“This paper examines the opportunities and challenges presented by big data for compiling indicators to support Agenda 2030.”
Published in Global Policy in 2019
Journal Article
General
Benefits
Daniel Kondor, Behrooz Hashemian, Yves-Alexandre de Montjoye, Carlo Ratti
“Extending previous work on reidentifiability of spatial data and trajectory matching, we present the first large-scale analysis of user matchability in real mobility datasets on realistic scales, i.e. among two datasets that consist of several million people’s mobility traces, coming from a mobile network operator and transportation smart card usage.”
Published in IEEE Transactions on Big Data in 2018
Journal Article
General
Benefits
From the abstract:
Yves-Alexandre de Montjoye, et al.
“The breadcrumbs we leave behind when using our mobile phones—who somebody calls, for how long, and from where—contain unprecedented insights about us and our societies. Researchers have compared the recent availability of large-scale behavioral datasets, such as the ones generated by mobile phones, to the invention of the microscope, giving rise to the new field of computational social science.”
Published in Nature in 2018
Journal Article
Risks and Challenges
Data Responsibility
Governance and Operations
GloPID-R
According to the Executive Summary, this “roadmap aims to accelerate effective data sharing by highlighting measures GloPID-R research funders can take to improve research data sharing by their grantees and to advocate for increased research and public health data sharing more widely.”
Published in GloPID-R
Report
General
Incentives
Operations
Recommendations: