After a few biostatistics classes, I began fitting my first logistic regression model using my physician friend’s data on tumors excised from skin cancer patients. I realized that although we were very clear about the dependent variable we were trying to predict – a certain feature of the tumor – I really did not know how to pick the independent…Continue
Summary: A little history lesson about all the different names by which the field of data science has been called, and why, whatever you call it, it’s all the same thing.
Our profession of…Continue
Added by William Vorhies on December 4, 2019 at 3:12pm — No Comments
Ecommerce sites generate tons of web server log data which can provide valuable insights through analysis. For example, if we know which users are more likely to buy a product, we can perform targeted marketing, improve relevant product placement on our site and lift conversion rates. However, raw web logs are often enormous and messy, so preparing the data to train a predictive model is time-consuming for data scientists.…
Added by Ayumi Owada on July 18, 2019 at 2:00pm — No Comments
I thought I would follow on my first blog posting with a follow-up on a claim in the post that going returns followed a truncated Cauchy distribution in three ways. The first way was to describe a proof and empirical evidence to support it in a population study. The second was to discuss the consequences by performing simulations so that financial modelers using things such as the Fama-French, CAPM or APT would understand the full consequences of that decision. The third was to discuss…Continue
Added by David Harris on December 27, 2018 at 7:32pm — No Comments
In 1963 Benoit Mandelbrot published an article called “The Variation of Certain Speculative Prices.” It is a response to the forming theory that would become Modern Portfolio Theory. Oversimplified, Mandelbrot’s argument could be summarized as “if this is your theory, then this cannot be your data, and this is your data.” This issue has haunted models such as Black-Scholes, the CAPM, the APT and Fama-French. None of them have survived validation tests. Indeed, a good argument can be…Continue
Added by David Harris on December 10, 2018 at 2:00pm — No Comments
For most businesses, machine learning seems close to rocket science, appearing expensive and talent demanding. And, if you’re aiming at building another Netflix recommendation system, it really is. But the trend of making everything-as-a-service has affected this sophisticated sphere, too. You can jump-start an ML initiative without much investment, which would be the right move if you are new to data science and just want to grab the low hanging fruit.
One of ML's…Continue
Added by Olexander Kolisnykov on September 18, 2018 at 2:52am — No Comments
The insurance industry – one of the least digitalized – is not surprisingly one of the most ineffective segments of the financial services industry. Internal business processes are often duplicated, bureaucratized, and time-consuming. As the ubiquity of machine learning and artificial intelligence systems increases, they have the potential to automate operations in insurance companies thereby cutting costs and increasing productivity. However, organizations have plenty of reasons to resist…Continue
Added by Denys Harnat on August 28, 2018 at 3:35am — No Comments
The best trained soldiers can’t fulfill their mission empty-handed. Data scientists have their own weapons — machine learning (ML) software. There is already a cornucopia of articles listing reliable machine learning tools with in-depth descriptions of their functionality. Our goal, however, was to get the feedback of industry experts.
And that’s why we interviewed data science practitioners (gurus, really) regarding the useful tools they…Continue
Added by Kateryna Lytvynova on July 13, 2018 at 2:00am — No Comments
R has spread deep into the private sector and can be found in the production pipelines at some of the most advanced and successful enterprises.
Learn the fundamentals of data analysis in the second edition of Data Analysis with R, authored by data scientist…Continue
Added by Packt Publishing on May 8, 2018 at 10:30pm — No Comments
Added by Peter Bruce on January 14, 2018 at 11:00am — No Comments
Finding out the difference between data scientists, data engineers, software engineers, and statisticians can be confusing and complicated. While all of them are linked to data in a way, there is an underlying difference between the work they do and manage.
The growth of data and its usage across…Continue
Summary: Which is more important, the data or the algorithms? This chicken and egg question led me to realize that it’s the data, and specifically the way we store and process the data that has dominated data science over the last 10 years. And it all leads back to Hadoop.
Today, data scientists are generally divided among two languages — some prefer R, some prefer Python. I will not try to explain in this article which one is…Continue
Added by Marija Zoldin on September 26, 2017 at 10:00am — No Comments
Data is important. That is no secret to anybody. We can even paraphrase the famous saying: “who owns the data, owns the world”. And if you are a business person, you should know this better than anyone. Your business can change for the better if you use Big Data sources. Sales growth, a clever marketing strategy: you can achieve both using Big Data. Let’s look at what Big Data is and how you can make use of it.
In fact, Big Data…Continue
Added by Nataliia Kharchenko on August 14, 2017 at 7:00am — No Comments
The images on this blog are from an algorithmic environment that I first developed about 15 years ago - rendered using a graphical system that I wrote in Java. A “differential lattice” is a structured array of differences between two points: e.g. the difference between the closing price of a stock on day T-0 (today) and T-6 (a week ago). Consequently, if the closing prices are $10.10, $10.20, $10.30, $10.40, and $10.50 (today), then 0/3 is from T-0/T-3 or $10.50 less $10.20 = $0.30. A…Continue
Added by Don Philip Faithful on August 12, 2017 at 5:30am — No Comments
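The differential lattice arithmetic in the excerpt above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `differential_lattice`, the `(i, j)` keying, and the restriction to `i < j` are hypothetical choices made here, not the author's Java implementation, which the excerpt does not show.

```python
from decimal import Decimal


def differential_lattice(closes):
    """Build a lattice of price differences from closing prices.

    `closes` is ordered oldest to newest. The result maps (i, j) with
    i < j to the close at T-i minus the close at T-j, so (0, 3) is
    today's close less the close three days ago.
    Hypothetical helper: the keying scheme is an assumption.
    """
    newest_first = list(reversed(closes))  # index 0 is T-0 (today)
    n = len(newest_first)
    return {
        (i, j): newest_first[i] - newest_first[j]
        for i in range(n)
        for j in range(i + 1, n)
    }


# Closing prices from the excerpt, oldest first; Decimal avoids
# binary floating-point rounding on currency values.
closes = [Decimal(p) for p in ("10.10", "10.20", "10.30", "10.40", "10.50")]
lattice = differential_lattice(closes)

print(lattice[(0, 3)])  # 0.30, matching the 0/3 example in the excerpt
```

With five closing prices the lattice holds ten entries, one for each pair of days.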
For statistical models, selecting those predictors is what tests the mettle of data scientists. It is really challenging to lay out the steps, because at every step they must evaluate the situation and decide on the next steps. It is a completely different story when running predictive models: if the relationship among the variables is not the main focus, things get easier. Data analysts can go ahead and run step-wise regression models, empowering the data to give best…Continue
Added by Chirag Shivalker on July 31, 2017 at 10:30pm — No Comments
The following links describe a set of free SAS tutorials that help you learn SAS programming online on your own. They include tutorials for data exploration and manipulation, predictive modeling and some scenario-based examples.
SAS (Statistical Analysis System) is one of the most popular software packages for data analysis. It is widely used for various purposes such as data management, data mining, report writing, statistical analysis, business modeling, applications development and data…Continue
Added by Deepanshu Bhalla on June 27, 2017 at 9:00am — No Comments
Summary: Quantum computing is already being used in deep learning and promises dramatic reductions in processing time and resource utilization to train even the most complex models. Here are a few things you need to know.
Added by William Vorhies on June 13, 2017 at 8:00am — No Comments
The R language is the world's most widely used programming language for statistical analysis, predictive modeling and data science. Its popularity is claimed in many recent surveys and studies. The R programming language is getting more powerful day by day as the number of supported packages grows. Some big IT companies, such as Microsoft and IBM, have also started developing R packages and offering enterprise versions of R.
Added by Deepanshu Bhalla on June 12, 2017 at 12:30am — No Comments
This article explains how to select important variables using the Boruta package in R. Variable selection is an important step in a predictive modeling project. It is also called 'feature selection'. Every private and public agency has started tracking data and collecting information on various attributes. This results in access to too many predictors for a predictive model. But not every variable is important for a particular prediction task. Hence it is essential to…Continue