Book Site: http://bigdataminingbook.info
Target Audience: This book is intended for a variety of audiences:
(1) There are many people in the technology, science, and business disciplines who are curious to learn about big data analytics in a broad sense, combined with some historical perspective. They may intend to enter the big data market and play a role. For this group, the book provides an overview of many relevant topics. College and high school students who have interest in science and math, and are contemplating about what to pursue as a career, will also find the book helpful.
(2) For the executives, business managers, and sales staff who also have an interest in technology, believe in the importance of analytics, and want to understand big data analytics beyond the buzzwords, this book provides a good overview and a deeper introduction of the relevant topics.
(3) Those in classic organizations—at any vertical and level— who either manage or consume data find this book helpful in grasping the important topics in big data analytics and its potential impact in their
(4) Those in IT benefit from this book by learning about the challenges of the data consumers: data miners/scientists, data analysts, and other business users. Often the perspectives of IT and analytics users are different on how data is to be managed and consumed.
(5) Business analysts can learn about the different big data technologies and how it may impact what they do today.
(6) Statisticians typically use a narrow set of statistical tools and usually work on a narrow set of business problems depending on their industry. This book points to many other frontiers in which statisticians can continue to play important roles.
(7) Since the main focus of the book is high-performance data mining and contrasting it with big data analytics in terms of commonalities and differences, data miners and machine learning practitioners gain a holistic view of how the two relate.
(8) Those interested in data science gain from the historical viewpoint of the book since the practice of data science—as opposed to the name itself—has existed for a long time. Big data revolution has significantly helped create awareness about analytics and increased the need for data science professionals.
Intro: The use of machine learning and data mining to create value from corporate or public data is nothing new. It is not the first time that these technologies are in the spotlight. Many remember the late '80s and the early '90s when machine learning techniques-in particular neural networks-had become very popular. Data mining was at a rise. There were talks everywhere about advanced analysis of data for decision making. Even the popular android character in "Star Trek: The Next Generation" had been named appropriately as "Data." Data mining science has been the cornerstone of many data products and applications for more than two decades, e.g., in finance and retail. Credit scores have been in use for decades to assess credit worthiness of people when applying for credit or loan. Sophisticated real-time fraud scores based on individual's transaction spending patterns have been used since early '90s to protect credit cardholders from a variety of fraud schemes. However, the popularity of web products from the likes of Google, Linked-in, Amazon, and Facebook has helped analytics become a household name. While a decade ago, the masses did not know how their detailed data were being used by corporations for decision making, today they are fully aware of that fact. Many people, especially the millennial generation, voluntarily provide detailed information about themselves. Today people know that any mouse click they generate, any comment they write, any transaction they perform, and any location they go to, may be captured and analyzed for some business purpose.
Every new technology comes with lots of hype and many new buzzwords. Often, fact and fiction get mixed-up making it impossible for outsiders to assess the technology's true relevance. I wrote this book to provide an objective view of analytics trends today. I have written it in complete independence, and solely as a personal passion. As a result, the views expressed in this book are those of the author and do not necessarily represent the views of, and should not be attributed to, any vendor or employer.
Due to the exponential growth of data, today there is an ever increasing need to process and analyze big data. High-performance computing architectures have been devised to address the need for handling big data, not only from a transaction processing standpoint but also from a tactical and strategic analytics viewpoint. The success of big data analytics in large web companies has created a rush toward understanding the impact of new big data technologies in classic analytics environments that already employ a multitude of legacy analytics technologies. There is a wide variety of readings about big data, high-performance computing for analytics, massively parallel processing (MPP) databases, Hadoop and its ecosystem, algorithms for big data, in-memory databases, implementation of machine learning algorithms for big data platforms, and big data analytics. However, none of these readings provides an overview of these topics in a single document. The objective of this book is to provide a historical and comprehensive view of the recent trend toward high-performance computing technologies, especially as it relates to big data analytics and high-performance data mining. The book also emphasizes the impact of big data on requiring a rethinking of every aspect of the analytics life cycle, from data management, to data mining and analysis, to deployment.
As a result of interactions with different stakeholders in classic organizations, I realized there was a need for a more holistic view of big data analytics' impact across classic organizations, and also the impact of high-performance computing techniques on legacy data mining. Whether you are an executive, manager, data scientist, analyst, sales or IT staff, the holistic and broad overview provided in the book will help in grasping the important topics in big data analytics and its potential impact in your organizations.