After a few biostatistics classes, I began fitting my first logistic regression model using my physician friend’s data on tumors excised from skin cancer patients. I realized that although we were very clear about the dependent variable we were trying to predict – a certain feature of the tumor – I really did not know how to pick the independent…
Added by Monika Wahi on November 14, 2020 at 10:30am — 1 Comment
Summary: A little history lesson about all the different names by which the field of data science has been called, and why, whatever you call it, it’s all the same thing.
A little reminiscence, or for those of you who are only recently data scientists, a little history lesson.
Our profession of…
Added by William Vorhies on December 4, 2019 at 3:12pm — No Comments
Ecommerce sites generate tons of web server log data which can provide valuable insights through analysis. For example, if we know which users are more likely to buy a product, we can perform targeted marketing, improve relevant product placement on our site and lift conversion rates. However, raw web logs are often enormous and messy so preparing the data to train a predictive model is time consuming for data scientists.…
Added by Ayumi Owada on July 18, 2019 at 2:00pm — No Comments
I thought I would follow up, in three ways, on a claim in my first blog posting that going returns followed a truncated Cauchy distribution. The first way was to describe a proof and empirical evidence to support it in a population study. The second was to discuss the consequences by performing simulations so that financial modelers using things such as the Fama-French, CAPM or APT would understand the full consequences of that decision. The third was to discuss…
Added by David Harris on December 27, 2018 at 7:32pm — No Comments
In 1963 Benoit Mandelbrot published an article called “The Variation of Certain Speculative Prices.” It is a response to the forming theory that would become Modern Portfolio Theory. Oversimplified, Mandelbrot’s argument could be summarized as “if this is your theory, then this cannot be your data, and this is your data.” This issue has haunted models such as Black-Scholes, the CAPM, the APT and Fama-French. None of them have survived validation tests. Indeed, a good argument can be…
Added by David Harris on December 10, 2018 at 2:00pm — No Comments
For most businesses, machine learning seems close to rocket science, appearing expensive and talent demanding. And, if you’re aiming at building another Netflix recommendation system, it really is. But the trend of making everything-as-a-service has affected this sophisticated sphere, too. You can jump-start an ML initiative without much investment, which would be the right move if you are new to data science and just want to grab the low hanging fruit.
One of ML's…
Added by Olexander Kolisnykov on September 18, 2018 at 2:52am — No Comments
The insurance industry – one of the least digitalized – is not surprisingly one of the most ineffective segments of the financial services industry. Internal business processes are often duplicated, bureaucratized, and time-consuming. As the ubiquity of machine learning and artificial intelligence systems increases, they have the potential to automate operations in insurance companies thereby cutting costs and increasing productivity. However, organizations have plenty of reasons to resist…
Added by Denys Harnat on August 28, 2018 at 3:35am — No Comments
The best trained soldiers can’t fulfill their mission empty-handed. Data scientists have their own weapons — machine learning (ML) software. There is already a cornucopia of articles listing reliable machine learning tools with in-depth descriptions of their functionality. Our goal, however, was to get the feedback of industry experts.
And that’s why we interviewed data science practitioners — gurus, really — regarding the useful tools they…
Added by Kateryna Lytvynova on July 13, 2018 at 2:00am — No Comments
R has spread deep into the private sector and can be found in the production pipelines at some of the most advanced and successful enterprises.
Learn the fundamentals of data analysis in the second edition of Data Analysis with R, authored by data scientist…
Added by Packt Publishing on May 8, 2018 at 10:30pm — No Comments
In the article, Data Science Should Monitor Big Brother by Arjan Haring, one important…
Added by Peter Bruce on January 14, 2018 at 11:00am — No Comments
Finding out the difference between data scientists, data engineers, software engineers, and statisticians can be confusing and complicated. While all of them are linked to data in a way, there is an underlying difference between the work they do and manage.
The growth of data and its usage across…
Added by Ronald van Loon on December 19, 2017 at 1:00am — 1 Comment
Summary: Which is more important, the data or the algorithms? This chicken and egg question led me to realize that it’s the data, and specifically the way we store and process the data that has dominated data science over the last 10 years. And it all leads back to Hadoop.
Recently I was challenged to speak on the role of data in data…
Added by William Vorhies on November 28, 2017 at 10:36am — 1 Comment
Today, data scientists are generally divided between two languages — some prefer R, some prefer Python. I will not try to explain in this article which one is…
Added by Marija Zoldin on September 26, 2017 at 10:00am — No Comments
Data is important; that is no secret to anybody. We can even paraphrase a famous saying: “who owns the data, owns the world”. And if you are a business person, you should know this better than anyone. Your business can change for the better if you use Big Data sources. Sales growth, a clever marketing strategy: you can achieve both using Big Data. Let’s look at what Big Data is and how you can make use of it.
In fact, Big Data…
Added by Nataliia Kharchenko on August 14, 2017 at 7:00am — No Comments
The images on this blog are from an algorithmic environment that I first developed about 15 years ago - rendered using a graphical system that I wrote in Java. A “differential lattice” is a structured array of differences between two points: e.g. the difference between the closing price of a stock on day T-0 (today) and T-6 (a week ago). Consequently, if the closing prices are $10.10, $10.20, $10.30, $10.40, and $10.50 (today), then 0/3 is from T-0/T-3 or $10.50 less $10.20 = $0.30. A…
Added by Don Philip Faithful on August 12, 2017 at 5:30am — No Comments
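The "0/3" arithmetic in the excerpt above can be sketched in a few lines of Python. This is only an illustration of the worked example, not the author's Java system: the function names `lattice_difference` and `differential_lattice` are hypothetical, and reading "a/b" as "close at T-a minus close at T-b" is an assumption drawn from the single case the excerpt gives.

```python
from decimal import Decimal

# Closing prices from the excerpt, oldest first; the last entry is T-0 (today).
closes = [Decimal("10.10"), Decimal("10.20"), Decimal("10.30"),
          Decimal("10.40"), Decimal("10.50")]


def lattice_difference(prices, newer, older):
    """Return the close at T-newer minus the close at T-older.

    Hypothetical helper: offsets count back from the last element,
    so offset 0 is today and offset 3 is three days ago.
    """
    if not 0 <= newer < older < len(prices):
        raise ValueError("need 0 <= newer < older < len(prices)")
    return prices[-1 - newer] - prices[-1 - older]


def differential_lattice(prices):
    """Build every (newer, older) difference available in the series."""
    n = len(prices)
    return {
        (a, b): lattice_difference(prices, a, b)
        for a in range(n)
        for b in range(a + 1, n)
    }


lattice = differential_lattice(closes)
print(lattice[(0, 3)])  # 0.30, i.e. $10.50 less $10.20
```

`Decimal` is used here so that the price differences come out exactly as in the excerpt rather than with binary floating-point rounding.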
For statistical models, selecting predictors is what tests the mettle of data scientists. It is really challenging to lay out the steps because, at every step, they must evaluate the situation and make decisions for the upcoming steps. It is a completely different story when running predictive models: if the relationship among the variables is not the main focus, the situation gets easier. Data analysts can go ahead and run stepwise regression models, empowering the data to give best…
Added by Chirag Shivalker on July 31, 2017 at 10:30pm — No Comments
The following links describe a set of free SAS tutorials which help you learn SAS programming online on your own. They include tutorials for data exploration and manipulation, predictive modeling and some scenario-based examples.
SAS (Statistical Analysis System) is one of the most popular software packages for data analysis. It is widely used for various purposes such as data management, data mining, report writing, statistical analysis, business modeling, applications development and data…
Added by Deepanshu Bhalla on June 27, 2017 at 9:00am — No Comments
Summary: Quantum computing is already being used in deep learning and promises dramatic reductions in processing time and resource utilization to train even the most complex models. Here are a few things you need to know.
So far in this series of articles on Quantum computing we showed that…
Added by William Vorhies on June 13, 2017 at 8:00am — No Comments
The R language is the world's most widely used programming language for statistical analysis, predictive modeling and data science. Its popularity is claimed in many recent surveys and studies. The R programming language is getting more powerful day by day as the number of supported packages grows. Some big IT companies such as Microsoft and IBM have also started developing packages for R and offering enterprise versions of R.
Table of…
Added by Deepanshu Bhalla on June 12, 2017 at 12:30am — No Comments
This article explains how to select important variables using the Boruta package in R. Variable selection is an important step in a predictive modeling project. It is also called 'feature selection'. Every private and public agency has started tracking data and collecting information on various attributes. This results in access to too many predictors for a predictive model. But not every variable is important for predicting a particular outcome. Hence it is essential to…
Added by Deepanshu Bhalla on June 1, 2017 at 9:00am — 1 Comment
© 2021 TechTarget, Inc.