Gaining a clear sense of the problem you are seeking to solve is an essential first step in establishing a successful data collaborative. While defining the problem might seem straightforward, achieving the level of precision needed for a well-targeted data collaborative requires an in-depth assessment of the problem space. Zeroing in on the problem can be difficult. Pinpointing an actionable problem with a specific data collaboration solution, rather than a vague issue, requires writing and rewriting a one-page Problem Statement. The Problem Statement articulates the problem with precision and makes assumptions explicit. It asks and answers why the problem has not been solved yet. It might also address who is harmed by the problem, why they are harmed, and what the root causes of the problem are. It might take several drafts to strip the statement down to an actionable problem and its causes. But undertaking this exercise – especially doing so collaboratively with the participation of the key stakeholders involved in the issue – will help to build consensus behind the implementation of the project.
Research and practice in data collaboratives point to a number of societal benefits arising from the cross-sector sharing of data. After clearly defining the problem that a data collaborative will address, organizers should identify the specific benefit the data collaborative could offer. Without assessing the value of the data collaborative, tradeoffs cannot later be measured. To understand whether the use of corporate data is worthwhile despite the risks involved, and to find the proper steps to mitigate risk, it is important to evaluate the context for the use of the data. This might involve an assessment of the urgency of having access to the data. Is there a disaster relief component or other time sensitivity? If it is hard to define the problem or hard to define the value, it will be impossible to evaluate the success of any data project. Justifying the risks and the potential liability that arise from using and analyzing corporate data requires a clearly articulated benefit that can be measured.
What is the intended societal benefit of the data collaborative?
Cross-sector data sharing requires not only the bringing together of diverse datasets, but also collaboration between people and organizations with different skills and institutional norms. An upfront understanding of how human and institutional capacity and cultures either do or do not mesh can help to define optimal roles and responsibilities and identify capacity gaps that need to be filled (through additional partnerships or other mechanisms).
In many ways, data collaboratives are a means of filling gaps in existing institutional data supplies. In order to fill such gaps, organizers should conduct an upfront audit of both the existing internal data supply and the potential supply of data existing in other sectors. Such an audit can provide insight regarding the data relevant to your project, who has it, who needs it, and how it can be used to tackle the problem. Data audits can also help inform and prioritize outreach to external data holders and ensure that newly accessible datasets are well-positioned for filling the most important data gaps. Too often the assumption is that all data must be made available when only a few data points are needed. As part of the data inventory, other considerations may include assessing the reputation and reliability, in terms of security and data responsibility, of those who have the data and those who want it. In and of itself, undertaking the data inventory helps to mitigate risk by informing strategies for how to use the data. A more detailed analysis by interdisciplinary experts may help to identify technical or procedural workarounds for seemingly difficult or expensive tasks.
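The audit described above can be kept as a simple structured inventory. The following is a minimal sketch, not a prescribed method: the `Dataset` record, the `find_gaps` helper, and all dataset, holder, and field names are hypothetical and chosen only for illustration. It lists which fields the project needs but cannot source internally, and which external holders could supply each one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    """One entry in the data inventory (hypothetical structure)."""
    name: str
    holder: str
    internal: bool        # True if the organizers already hold this data
    fields: frozenset     # the data points the dataset contains


def find_gaps(needed_fields, inventory):
    """Map each needed field missing internally to the external holders that have it.

    An empty list means no audited holder can currently fill that gap.
    """
    internal_fields = set()
    for dataset in inventory:
        if dataset.internal:
            internal_fields |= dataset.fields

    gaps = set(needed_fields) - internal_fields
    return {
        field: sorted(
            d.holder for d in inventory if not d.internal and field in d.fields
        )
        for field in sorted(gaps)
    }


# Illustrative inventory; every name below is an assumption.
inventory = [
    Dataset("clinic_visits", "Health Ministry", True,
            frozenset({"visit_date", "district"})),
    Dataset("call_records", "TelcoCo", False,
            frozenset({"district", "mobility_index", "subscriber_id"})),
]
needed = {"visit_date", "district", "mobility_index", "rainfall"}

gaps = find_gaps(needed, inventory)
for field, holders in gaps.items():
    print(field, "->", holders or "no known holder")
```

Because the result lists only the fields that are actually needed and missing, it also supports the point that a request to an external holder can be limited to a few data points (here `mobility_index`, not `subscriber_id`).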
When and why corporations contribute their data differs according to the context in which the data is being requested or shared, the question that access to their data may answer, and the corporate and legal culture of the firm. Different corporations also have different views regarding the expected benefits and risks of sharing their data. As such, when firms extend themselves and share their data, they seek to satisfy a variety of motivations.
Data collaboratives exist in a number of forms, each of which is better positioned for a certain problem or data type. The field of practice currently shows six main types of data collaboratives. The preceding steps in this canvas should help to make clear which of the following collaboration mechanisms is best suited to the opportunity at hand.
Given the insights gained during previous steps, which data collaborative type is best suited to the problem at hand?
The collection, processing, sharing, analysis and use of data introduce a number of risks and challenges for stakeholders involved in data collaboratives. Rather than seeking to mitigate the realized harms arising from those risks after the fact, stakeholders should seek to understand the risks at every stage of the data lifecycle in order to develop well-targeted strategies for mitigating them.
Stakeholders should also understand that risks can be cumulative – i.e., that risks at the collection stage can grow and compound at later stages of the data lifecycle.
InBloom aimed to store and aggregate student data for states and districts but was met with privacy concerns regarding the use and storage of personally identifiable information. The firm shuttered in 2014.
Armed with a better understanding of the risks present in a data collaborative, organizers can develop strategies and responsibility frameworks to help mitigate those risks before they have real-world consequences.
Beyond risks related to the data lifecycle, data collaboratives introduce questions and uncertainties around roles and responsibilities, ownership, intellectual property and other concerns. The creation of a list of agreed-upon terms and conditions can ensure clarity regarding such questions and help to avoid.
Establishing a data collaborative requires a number of upfront efforts across stakeholder groups. The many decisions to be made and responsibilities present in such an arrangement, however, do not end at the implementation stage. An agreed-upon governance structure for the lifespan of the data collaborative can help to ensure that the processes for making important decisions – whether, for example, related to new uses for datasets or unanticipated risks coming into view – are clearly defined and understood by all parties. In addition, for the effort to be seen as legitimate, the process of developing data collaborative policies needs to be collaborative, engaging and consulting with a variety of groups, including both the private sector and impacted citizens. Such consultation is also part of identifying potential benefits, and these steps can be brought together even though we distinguish between them here.
Upon completing the preceding steps, stakeholders in the data collaborative should possess a clear understanding of how the arrangement can be put into practice. With this upfront knowledge, the operational aspects of an effective data collaborative can be defined – with the understanding that specifics can and should be iterated upon as needed going forward.
Establishing a data collaborative is often less expensive than creating the mechanisms to generate and collect data that is already held elsewhere, but there are often still costs involved – including human capital costs for data scientists and stewards. An upfront and realistic assessment of the likely costs of the arrangement can inform strategic funding decisions – whether adopting a tiered pricing model for a B2C, B2B or B2G data collaborative or seeking support from philanthropic and/or governmental grantmakers.
Especially for data collaboratives where corporate data providers were incentivized to participate based on reputational benefits, communicating the objectives and (intended) impacts of the arrangement to the public can be important. In many cases, a high level of specificity in public communications may not be desirable, but promoting the existence of the data collaborative can spur interest and engagement among target audiences (including funders).
Some data collaboratives are primarily or exclusively focused on improving the data capacity of participating institutions. Others, however, have additional, external audiences or user groups. Clearly defining the audience(s) and their needs can enable stakeholders to craft an information-sharing approach that is well-suited to maximizing the usefulness of newly created data-driven offerings.
Building on the work done at the problem definition and data audit stages, defining the baseline of current practice will ensure that the impact of a data collaborative (or the lack thereof) can be meaningfully assessed. Without an understanding of the effectiveness of current efforts to address the problem, measuring success and iterating on new data practices will be challenging.
In order to measure progress throughout the lifespan of the data collaborative, ensure that mechanisms are in place for the consistent generation of data enabling assessment against the baseline. While much of the work of impact assessment is done at the start or conclusion of such an initiative, upfront efforts to create or gain access to data about progress throughout can help to inform iteration and improve the likelihood of success.
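Assessment against the baseline can be as simple as tracking one agreed indicator over time. The following is a minimal sketch under stated assumptions: the `change_from_baseline` helper, the choice of indicator, and the figures are hypothetical, and a real assessment would need to account for factors other than the data collaborative itself.

```python
def change_from_baseline(baseline, observations):
    """Return (period, relative change) pairs measured against the baseline.

    `baseline` is the indicator's value under current practice, before the
    data collaborative starts. `observations` is a list of (period, value).
    """
    if baseline == 0:
        raise ValueError("relative change is undefined for a zero baseline")
    return [(period, (value - baseline) / baseline)
            for period, value in observations]


# Hypothetical indicator: average hours needed to deliver relief supplies.
baseline_hours = 200
observed = [("Q1", 180), ("Q2", 150)]

progress = change_from_baseline(baseline_hours, observed)
for period, change in progress:
    print(f"{period}: {change:+.0%} against baseline")
```

Recording the indicator at regular intervals, rather than only at the start and end, is what allows the results to inform iteration while the data collaborative is still running.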
After an agreed-upon period of time, stakeholders should conduct a detailed impact assessment to determine the real-world impacts of the data collaborative.