Data Scientists 4.0
The Fourth Industrial Revolution was publicly announced in 2011 at the Hannover Fair (1). Since then, many resources have appeared around the so-called Industry 4.0. Elements such as Digital Twins, the Industrial Internet of Things and Cyber-Physical Systems have come onto the scene as inseparable elements, providing the necessary ingredients for a paradigm shift in many manufacturing areas. Of all these components, the most pioneering are those related to predictive analytics and artificial intelligence, applied directly to real use cases that had no solution just a few years ago. Predictive maintenance, artificial vision and pattern recognition mechanisms that identify potential failures in real time are some examples of the many use cases being applied in the new industry.
Data science is therefore becoming a valuable tool for industry, transforming information into knowledge using algorithms created in the 1960s (2). Data scientists play a fundamental role in this new way of understanding enterprise activity. Nevertheless, a new challenge appears when data engineers must face a problem in industry: they have to deal with unfamiliar processes, procedures, operations, science and case-specific particularities, and that is where the answers they are looking for lie. Data scientists thus become like detectives who develop their skills at a different crime scene in each project. A new set of multidisciplinary skills must be added to the data scientist's expertise to provide the added value required in the specialized analysis of each problem.
Every industry requires different machine learning and deep learning systems to benefit from the available information. The retail sector is entirely customer oriented, and the opportunity lies in the buyer's preferences and needs, just-in-time delivery and stock minimization. Natural language processing and media analytics are widely accepted tools for making attractive suggestions to the final consumer. In fintech, the data scientist must adapt specific algorithms for fraud detection, for the aggregation and anomaly detection of tons of commercial transactions in real time, or for insurance policies based on a large variety of medical and health risk indicators. The manufacturing activity defined by Industry 4.0 is, however, even more complex. For instance, the automotive sector interacts with suppliers and with the assembly line (humans and robots), reducing warehouse costs while satisfying market demand with the latest technology in self-driving support, intelligent security and comfort elements.
The biotech and pharmaceutical industries are a special case within the wide manufacturing spectrum. The manufacture of medicines and drugs adds, on top of the existing complexity, regulatory requirements that to some extent slow down technological advances and innovative paradigm shifts. The reason is that human safety must always be the first priority in this sector. When processes are designed and implemented in the factory, they are subject to strict quality controls based, in many cases, on the assumption that variability must be minimized. From this perspective, it is difficult to validate operations whose results come from algorithms such as Random Forest. Despite this reality, it is worth noting that some machine learning algorithms have been accepted by the European Pharmacopoeia as valid chemometric techniques for processing analytical data sets (3).
On the other side of the scene, public administrations and regulatory institutions are cautiously observing and drafting a first approach to the set of laws that must govern the data, the processes and the results obtained through AI algorithms. A new generation of data scientists and technology experts is thus appearing, clearly oriented towards regulatory advice and consultancy for legislators, opening the path to consensual data management. When results coming from AI algorithms are used in aeronautical, automotive, medical or pharmaceutical activities with a direct impact on people’s health, the full process must be supervised in order to guarantee personal integrity.
The Smart Industry concept arises as an evolution pushed from society into the manufacturing and business sphere. The tons of data generated by human beings and devices forced the emergence of specific technological breakthroughs to manage and process the heterogeneous information available on the network. Images, geolocations, social media data, traffic status, forecasts, stock trends and documents must be accessible in real time, along with suggestions related to the topic being searched for. This continuous flow of knowledge, ready to be consumed in real time, has become a commodity that is impossible to give up. The particular characteristics of Industry 4.0 require that the specialists who supply goods to a society used to consuming knowledge also take care of a new kind of product: one wrapped in the data science sauce as added value to the distributed goods. Data scientists 4.0 must know which algorithms to apply on top of the data ecosystem, and this requires them to be multidisciplinary and specialized at the same time. Their huge mission is to satisfy the growing demand for data science services applied everywhere.
(1) “Industrie 4.0: Mit dem Internet der Dinge auf dem Weg zur 4. indus…”. Vdi-nachrichten.com (in German). 1 April 2011. Retrieved 2018-03-04.
(2) The term artificial intelligence was first used in 1956 as the topic of the Dartmouth Conference, organized by John McCarthy. In 1963, Edward Feigenbaum and Julian Feldman published Computers and Thought, the first collection of articles about artificial intelligence.
(3) European Pharmacopoeia 9.0. Chapter 5.21, Chemometric methods applied to analytical data. 04/2016:52100