Professor Bart Baesens is a professor at KU Leuven (Belgium) and a lecturer at the University of Southampton (United Kingdom). He has done extensive research on analytics, customer relationship management, web analytics, fraud detection, and credit risk management. His findings have been published in well-known international journals (e.g. Machine Learning, Management Science, IEEE Transactions on Neural Networks, IEEE Transactions on Knowledge and Data Engineering, IEEE Transactions on Evolutionary Computation, Journal of Machine Learning Research, …) and presented at top international conferences. He is also the author of the books Credit Risk Management: Basic Concepts (Oxford University Press, 2008) and Analytics in a Big Data World (Wiley, 2014). His research is summarized at www.dataminingapps.com. He also regularly tutors, advises, and provides consulting support to international firms on their analytics and credit risk management strategy.
Q: Why did you decide to write a book on Analytics in a Big Data World?
During the past few years, I have been teaching many courses and doing a lot of consulting around the globe on the topic of Big Data & Analytics. Having talked to many business professionals and worked on many projects, I wanted to write a book relevant to the decisions all businesses will need to make in the coming years. As the number of practical applications for data skyrockets, learning how to extract business value from big data becomes a competitive requirement. Big data sets are assets that can be leveraged quickly and inexpensively, if tackled wisely! My book Analytics in a Big Data World addresses this seemingly Herculean task of coming to grips with multiple channels of data and sculpting them into quantifiable value. The book is for business professionals who want a focused, practical approach to big data and analytics. I focus on case studies, real-world applications, and steps for implementation, using theory and mathematical formulas only when strictly necessary!
Q: What best practices do you recommend, when starting and working on enterprise analytics projects?
In fact, I would recommend a couple of things. The first one is to set up a multidisciplinary analytics team. Analytics touches upon every aspect of a business setting, and it is of key importance that all of these aspects are appropriately represented in the team. In other words, the team should be made up of database administrators, business experts, legal experts, data scientists, and tool vendors. Next, the involvement of senior management is important. The strategic impact of analytics is now bigger than ever before, and it is crucial that senior management is aware of all analytical efforts throughout the enterprise. This will allow them to put in place the right logistics and procedures to fully leverage the power of analytics. It will also facilitate the coordination of all analytical efforts enterprise-wide instead of working with isolated islands of analytical expertise. The creation of a company-specific center of analytical excellence, potentially headed by a CAO (Chief Analytics Officer), could be an interesting competitive asset. Finally, continuous education is highly recommended. The world is changing at a rapid pace, and this also applies to analytics. New techniques are being developed on an ongoing basis, and it is important to keep up with these evolutions in order to see how they can be used to create competitive advantage.
Q: What are the bottlenecks and other issues that prevent analytic projects from reaching their full potential?
A first important bottleneck relates to the key ingredient of any analytical model: data! In order to have successful analytical models, the data should be of good quality. This is commonly referred to as the GIGO principle: Garbage In, Garbage Out; in other words, bad data yields bad models. Hence, every company should continuously invest in diagnosing data quality issues and come up with ways to improve quality, using, for example, master data management programs. Another bottleneck is insufficient focus on the business. Far too often, analytical models are developed which look nice at first sight but do not actually solve the business problem. For example, a fraud detection model can perform well in a statistical sense, but it should also detect fraud as quickly as possible. The latter is often referred to as operational efficiency. Hence, besides statistical performance, operational efficiency, economic cost, and interpretability should also be taken into account when gauging the performance of an analytical model.
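To make the GIGO point concrete, here is a minimal sketch of the kind of data-quality diagnostic a team might run before any modeling. The customer records and validity rules below are hypothetical illustrations, not taken from the book.

```python
# Hypothetical validity rules per field: a record "fails" a field if the
# value is missing or outside a plausible range.
RULES = {
    "age": lambda v: v is not None and 18 <= v <= 100,
    "income": lambda v: v is not None and v >= 0,
}

def quality_report(records):
    """Count, per field, how many records violate the validity rule."""
    issues = {field: 0 for field in RULES}
    for rec in records:
        for field, rule in RULES.items():
            if not rule(rec.get(field)):
                issues[field] += 1
    return issues

# Illustrative raw customer data with typical quality problems.
customers = [
    {"age": 34, "income": 52000},
    {"age": None, "income": 47000},   # missing age
    {"age": 290, "income": -10},      # impossible values
]

print(quality_report(customers))  # {'age': 2, 'income': 1}
```

A report like this is only the diagnostic step; fixing the issues (imputation, source-system corrections, master data management) is where the real investment lies.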
Q: What are the most important trends and challenges in analytics?
Well, let me discuss some trends which I consider important, based upon both my industry and research experience. First of all, I believe analytics is about being actionable and simple. It's not about complex numbers, black-box models, or statistics. In our analytics projects, we have found that simple analytical models (e.g. regression models, decision trees) typically perform well in many settings, such as credit scoring, response and retention modeling, customer lifetime value modeling, and segmentation. Hence, the best investment firms can make to boost the performance of their analytical models is not buying expensive software and trying out complex techniques, but rather investing in data and improving data quality! That's why in my book I also devoted a whole section to this topic. From a technical perspective, next to the analytical models themselves, firms should also thoroughly consider how to appropriately monitor, backtest, and integrate these models with their other applications, such as advertisement, new product development, next-best-offer campaigns, … Closing this loop poses quite a few challenges, which are also addressed in the book! Finally, data and analytics are everywhere and all around us. It speaks for itself that this creates huge challenges from a privacy perspective. Firstly, data about individuals can be collected without these individuals being aware of it. Secondly, people may be aware that data is collected about them, but have no say in how the data is analyzed and used. Hence, regulatory authorities have to think about new regulations, whereas researchers should focus more on the development of privacy-friendly analytical techniques.
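As an illustration of how simple a well-performing, interpretable model can be, the sketch below fits a one-feature logistic regression by gradient descent in plain Python. The churn data, the feature, and the learning settings are illustrative assumptions; a real project would use richer data and a standard library.

```python
import math

# Hypothetical data: months of inactivity vs. whether the customer churned.
X = [1.0, 2.0, 3.0, 8.0, 9.0, 10.0]
y = [0, 0, 0, 1, 1, 1]

# Train weight w and intercept b by batch gradient descent.
w, b, lr = 0.0, 0.0, 0.1
for _ in range(2000):
    gw = gb = 0.0
    for xi, yi in zip(X, y):
        p = 1.0 / (1.0 + math.exp(-(w * xi + b)))
        gw += (p - yi) * xi
        gb += (p - yi)
    w -= lr * gw / len(X)
    b -= lr * gb / len(X)

def predict(x):
    """Churn probability for a customer inactive for x months."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))
```

The single coefficient w is directly interpretable: it says how much each extra month of inactivity raises the churn odds, which is exactly the kind of transparency that matters in settings like credit scoring.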
Q: What is the most efficient way to become a real data scientist, based on your career path and history?
Let me again answer this question based upon my consulting and research experience. A good data scientist should possess various skills. A first important one is programming. Although many powerful analytical software tools exist, real-life data typically requires tailored processing, and this can only be done by developing customized software programs. Next, a data scientist should have a good quantitative background in statistics, optimization, machine learning, … This will allow him/her to immediately recognize whether a given data set could be analyzed using, e.g., predictive, descriptive, or even social network analytics. Finally, a data scientist should be creative and communicative. Data scientists never work individually, but are part of a team. Communicating the results of analytical models in a user-friendly and transparent way to the various stakeholders involved is hence also very important.
Q: What are your top 3 research projects in analytics at this moment?
First, I would say fraud detection. Fraud is an important phenomenon encountered in various settings. Popular examples are insurance fraud, credit card fraud, social security fraud, identity theft, … In our research, we have recently developed some exciting new algorithms based upon social networks to detect fraud. We have benchmarked our techniques for both social security and credit card fraud detection and found some amazing results. Fraud is really a social phenomenon! Another important topic we are currently working on is survival analysis. This is a set of (statistical) techniques aimed at predicting the timing of events. In fact, many popular classification problems (e.g. credit scoring, churn detection, response modeling, …) have a time dimension associated with them (e.g. when a customer defaults, churns, responds, …). Using survival analysis techniques, it becomes possible to predict when customers default, churn, or respond, which is typically very useful information for, e.g., profit scoring or customer lifetime value calculation. Finally, we do a lot of research on process analytics. Today's organizations use a plethora of information systems to support their business processes. Such support systems often record and log an abundance of data, containing a variety of events that can be linked back to the occurrence of a task in an originating business process. Process analytics starts from these event logs as the cornerstone of analysis and aims to derive knowledge to model, improve, and extend operational processes "as they happen" in the organization. I believe this will be a very popular application of analytics in the next 5 years!
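To sketch what survival analysis looks like in practice, here is a minimal Kaplan-Meier estimator in plain Python: it turns (time, event) observations into a survival curve, handling customers whose event has not yet happened (censored observations). The customer data below are hypothetical, and a real project would use a dedicated survival analysis library.

```python
def kaplan_meier(observations):
    """Return [(time, survival_probability)] at each observed event time.

    Each observation is (time, event): event=1 means the event (e.g. churn)
    occurred at that time; event=0 means the customer was still active when
    observation stopped (right-censored).
    """
    obs = sorted(observations)
    n_at_risk = len(obs)
    surv, curve, i = 1.0, [], 0
    while i < len(obs):
        t = obs[i][0]
        deaths = sum(1 for tt, e in obs if tt == t and e == 1)
        at_t = sum(1 for tt, _ in obs if tt == t)
        if deaths:
            surv *= 1.0 - deaths / n_at_risk   # KM product-limit step
            curve.append((t, surv))
        n_at_risk -= at_t                      # drop everyone observed at t
        i += at_t
    return curve

# (months observed, churned?) for six hypothetical customers
data = [(2, 1), (3, 0), (5, 1), (7, 1), (8, 0), (12, 1)]
print(kaplan_meier(data))
```

Reading the curve off at a given time gives the estimated probability that a customer is still active then, which feeds naturally into customer lifetime value calculations.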