The United Nations and World Bank are part of a growing globally-oriented Open Data movement. After World Bank meetings last week in Washington, DC, those involved in defining the international Big Data Revolution asked for inputs. This is what I posted:
There is no doubt that big data offers significant potential for international development work across many fields. That this Independent Advisory Group has been formed is clear evidence that the United Nations is ready to see how big data can help, an exciting initiative for those of us whose life's work has straddled the worlds of international development and big data technologies.
In this initiative, I believe we are seeking paths for a robust collaboration that brings local, regional, and global community groups, national governments, and international representatives together with the gamut of current and future big data technologies that will enable them to do better, live longer, be happier, or otherwise make a difference. There is no end to the technologies, platforms, or research initiatives available to enable robust use of big data within international development. And likewise there is no end of international development use cases to which big data technologies could be applied.
But as we move forward, we must remember that much of our success will rest on a common understanding of the potential and the current limitations of big data. To that end, I would like to offer four key insights from my last 10 years of working on the technical side of massive open and public data projects for use in decision-making circles.
1) Data Quality and Assurance Matters
It is commonly known that roughly 80% of a data scientist's, engineer's, or analyst's time can be spent cleaning up data so it can be used with a software platform to produce a compelling story. Of course, resolving this issue varies greatly depending on the size of the project and whether the project is being produced on a single-license platform from a laptop or an enterprise-level solution in which multiple users are networked. It can also depend on the level of technical skill in databases and Extract, Transform, and Load (ETL) processes, the mechanisms for otherwise ingesting and aggregating data, the access and security protocols in place for assuring that data is protected, and the multitude of other issues involved in using big data to create good narratives. But bottom line: if the data isn't workable to begin with, the user will lose trust in the data and in the process, and we will end up with the phenomenon frequently referred to as "garbage in, garbage out". This benefits no one.
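To make the "80% of the time is cleanup" point concrete, here is a minimal sketch of the kind of normalization that dominates that work. The field names, rules, and sample rows are hypothetical illustrations, not drawn from any real project.

```python
# A hypothetical cleanup pass over messy survey-style records:
# trim whitespace, normalize casing, parse numbers, and drop rows
# that cannot be repaired ("garbage in, garbage out").

def clean_record(raw):
    """Normalize one raw record; return None if it is unusable."""
    country = raw.get("country", "").strip().title()
    if not country:
        return None  # missing country: the row cannot be used
    try:
        population = int(raw.get("population", "").replace(",", ""))
    except ValueError:
        return None  # unparseable number: drop rather than guess
    return {"country": country, "population": population}

raw_rows = [
    {"country": "  kenya ", "population": "47,564,296"},
    {"country": "", "population": "1000"},          # missing country
    {"country": "Nepal", "population": "unknown"},  # unparseable number
]

cleaned = [r for r in (clean_record(row) for row in raw_rows) if r]
print(cleaned)  # only the first row survives, normalized
```

The decisions buried in a function like this (drop or repair? trust which format?) are exactly where analysts spend their time, and they scale up painfully on enterprise platforms with many networked users.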
2) Data Provenance Matters
In many data science circles, and particularly in many data science labs, the source of the data matters not. Those working on the methods and algorithms that make the data sciences come to life don't care, because the source is not important to making the technology operational. As a result, few platforms provide the means for documenting data provenance from the inception of a data science project. While provenance may not matter in the research world, for the governments and international organizations who would use big data outputs to make policy, it matters greatly. Indeed, many big data projects have failed to be adopted in policy-making circles because big data platform developers did not understand how important it is to be able to trace a figure back to its source and determine its reliability when making life-affecting decisions.
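One way to bake provenance in from the inception of a project is to carry source metadata alongside every value ingested, so any derived indicator can be traced back. This is a minimal sketch; the record structure and the source named in it are hypothetical illustrations.

```python
# A hypothetical provenance-aware record: every observation carries
# who published it, when it was ingested, and how it was collected.

from dataclasses import dataclass

@dataclass(frozen=True)
class Observation:
    value: float
    source: str     # who published the figure
    retrieved: str  # when it was ingested (ISO date)
    method: str     # how it was collected

obs = Observation(
    value=12.5,
    source="Hypothetical National Statistics Office, 2014 survey",
    retrieved="2014-10-01",
    method="household survey, n=4,000",
)

# A policy maker reviewing a derived indicator can now ask
# "where did this number come from?" and get an answer:
print(f"{obs.value} <- {obs.source} ({obs.method})")
```

The point is not the particular structure but the discipline: if provenance is attached at ingestion, it survives aggregation; if it is bolted on later, it is usually already lost.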
3) Context and Use Cases Matter
There is no one-size-fits-all approach to big data, especially when it comes to international development. And much of the success of adopting big data approaches will depend on providing experts and world citizens with the right tools to match a need. For example, current analytics software and data visualization packages are great for producing nationalized, aggregated demographic data or metadata outputs and beautiful visuals. But they will not likely be as useful as a geospatial mapping platform for tracking the moment-by-moment developments of humanitarian disaster relief, as crowdsourcing solutions for tracking agricultural price fluctuations in a micro-community, or as a mechanism for the International Criminal Court to upload and sift through the enormous amounts of physical data gathered during a field mission to find the nuggets relevant to its cases. These are all different types of big data problems, each of which requires an understanding and formalization of existing processes: what parts can be automated, which will remain manual until advances in big data enabling technology can catch up to needs, and where humans will continuously need to intercede in the process.
4) Flexible Open Data Standards Matter
As discussions move forward on the foundations for a legal framework for agreement, I urge stakeholders to ensure that whatever outcomes are produced do not restrict innovation in big data technologies. The international community is ready to adopt technologies as they stand now, but there is a lot that big data innovators can learn about how the international community uses, and needs to use, big data. Aside from some forays into international business marketing, international data and how it is used in different countries and within different domains is a largely unexplored area. Certainly, the technology that exists today can provide significant insight, but there is so much more to learn. To that end, I hope that whatever standards are developed take into account that the standards that will assure legal, international collaboration are not necessarily the same as those that will enable collaboration for the purpose of technological innovation. Big data innovators will need open standards for data formats, data access, API development, and other considerations that fall under the auspices of the GNU Free Documentation License developed by the Free Software Foundation or a Creative Commons license.
The pendulum has swung, dear readers. Big data technologies have reached a moment where input from international stakeholders can help fuel a new generation of advancements, many of which have yet to be defined. And Open Data initiatives such as the UN Secretary General's Independent Expert Advisory Group are in a great position to help frame the myriad international problems needing practical solutions. I look forward to a process that enables the adoption of big data technologies and open data initiatives within broader international circles, and to the big data innovations that could emerge as a result.