Contributed Article from MIT Professor Devavrat Shah.
According to analyst firm IDC, the amount of data being generated in our digital universe is expected to double every two years, producing more than 40,000 exabytes by 2020. And while this seemingly endless torrent of data holds enormous potential for organizations, it also presents incredible challenges, the most critical of which is figuring out how to effectively convert the unprecedented volumes and varieties of data into meaningful business insights.
For those that harness it successfully, big data has the potential to spark innovation, support decision making, predict unmet needs and inform the development of new products, services and revenue sources. However, it will also require organizations to transform into efficient “Data Science Machines” with not only the tools, but also the skills to perform data processing and computation at a massive scale.
How can organizations close this gap in tools and skills and successfully convert big data into real-world decisions? Below are some key techniques and emerging technologies that will be vital to success in our new big data world.
Provide Ongoing Education: Organizations in every industry need to understand that converting big data into meaningful information begins with a team of skilled professionals trained across disciplines as both data scientists and statisticians. Unfortunately, the value of professional education continues to be underrated, and as a result many businesses are likely to miss out on the benefits big data has to offer. To improve their chances of success, organizations must invest in ongoing education through institutions with multidisciplinary programs that include elements from engineering, mathematical sciences and social sciences.
Focus on Infrastructure: Over the next few years, the volume and variety of data will continue to increase, which will fuel demand for information that can be used in real time. This in turn will require more agile systems with better computational and data storage architecture. Organizations need to prepare and build for the future by focusing on three key elements:
This is where business analysts and data scientists in companies of all sizes can benefit from programming languages such as Python. Python and its ecosystem can be a powerful tool for structuring, manipulating, querying, analyzing and visualizing data.
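As a rough illustration of that workflow, the sketch below walks a tiny data set through structuring, querying, analysis and a text-based visualization using only Python's standard library. The sample records, the column names (region, product, revenue) and the function names are hypothetical and chosen purely for illustration; a real project would more likely load data from files or a database and lean on the wider Python ecosystem.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical sample data; in practice this would come from a file or database.
RAW_CSV = """region,product,revenue
north,widget,1200
north,gadget,800
south,widget,450
south,gadget,1500
west,widget,900
"""


def load_records(text):
    """Structure: parse CSV text into a list of dicts with numeric revenue."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["revenue"] = float(row["revenue"])
        rows.append(row)
    return rows


def query(records, min_revenue):
    """Query: keep only records at or above a revenue threshold."""
    return [r for r in records if r["revenue"] >= min_revenue]


def revenue_by_region(records):
    """Analysis: total and mean revenue per region."""
    grouped = defaultdict(list)
    for r in records:
        grouped[r["region"]].append(r["revenue"])
    return {
        region: {"total": sum(values), "mean": mean(values)}
        for region, values in grouped.items()
    }


def text_bar_chart(summary, unit=100):
    """Visualization: one '#' per `unit` of total revenue, regions sorted by name."""
    lines = []
    for region, stats in sorted(summary.items()):
        bar = "#" * int(stats["total"] // unit)
        lines.append(f"{region:<6}{bar} {stats['total']:.0f}")
    return "\n".join(lines)


records = load_records(RAW_CSV)
summary = revenue_by_region(records)
print(text_bar_chart(summary))
```

Each function maps to one of the activities named above, so a step can be swapped out independently, for example replacing the text chart with a plotting library once the analysis itself is settled.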
Make Interfaces & Algorithms Matter: Data-processing algorithms are what transform the raw data collected into valuable insights and decisions. But appropriate models are needed to connect that data to decision variables. Standard interfaces or a generic computation and storage architecture simply won’t survive the impending big data deluge. Forward-looking organizations that want to better understand and analyze large data sets are now developing personalized data-processing architectures, with algorithms that can not only predict the unknowns but also deliver insights via user-friendly interfaces.
While it is easy to get caught up in the hype around big data, the reality is that turning data into actionable insights poses huge challenges, the least of which are technological. Over the past few decades, we have built infrastructure that can sort and process massive amounts of data. However, we still lack the ability to stitch together the various pieces to deliver meaningful insight. Operationally, doing so will require building data-processing systems that can operate at scale and in real time with three high-level components: interfaces, infrastructure, and personalized algorithms. But building this type of system depends primarily on having access to a skilled team of data scientists and statisticians who are trained to work with modern tools, can identify appropriate methods and models, and can design human-friendly interfaces. Organizations that focus on these components and invest in educating their workforce can successfully convert all forms of data into decisions, and as a result will likely outperform their competition in revenue, growth and efficiency.
Devavrat Shah, co-director of the Data Science: Data to Insights course, is a professor in MIT’s Department of Electrical Engineering and Computer Science, director of the SDSC, and a core faculty member at the IDSS. He is also a member of MIT’s Laboratory for Information and Decision Systems (LIDS) and the Operations Research Center (ORC).