All Blog Posts (6,251)

Data Quality Case Studies: How We Saved Clients Real Money Thanks to Data Validation

Machine learning models grow more powerful every week, but the earliest models and the most recent state-of-the-art models share the exact same dependency: data quality. The maxim “garbage in, garbage out,” coined decades ago, continues to apply today. Recent examples of data verification shortcomings abound, including JP Morgan/Chase’s 2013 fiasco and this lovely…

Added by Michał Frącek on July 4, 2019 at 4:21am — No Comments

Lightweight but effective way of documenting a group of Jupyter Notebooks

My app Qubiter has a folder full of Jupyter notebooks (27 of them, in fact). Opening a notebook takes a short while, which is slightly annoying. I wanted to give Qubiter users the ability to peek inside all the notebooks at once, without having to open all of them. Qubiter’s new SUMMARY.ipynb notebook allows the user to do just that.

SUMMARY.ipynb scans the directory in which it lives to find all Jupyter notebooks (other than itself) in that directory. It then prints for every…
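The scanning step described above can be sketched roughly as follows. This is a minimal illustration, not Qubiter's actual code: the function name `find_sibling_notebooks` is hypothetical, and because the excerpt does not say what is printed for each notebook, the sketch only lists file names.

```python
from pathlib import Path


def find_sibling_notebooks(directory, self_name="SUMMARY.ipynb"):
    """Return the sorted names of all .ipynb files in `directory`,
    excluding the summary notebook itself.

    `self_name` defaults to the file name mentioned in the post; the
    function name and signature are assumptions made for illustration.
    """
    return sorted(
        path.name
        for path in Path(directory).glob("*.ipynb")
        if path.name != self_name
    )


if __name__ == "__main__":
    # List the notebooks that sit next to the summary notebook.
    for name in find_sibling_notebooks("."):
        print(name)
```

Using `glob("*.ipynb")` rather than a recursive search matches the description of scanning only the directory in which the summary notebook lives.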

Added by Robert R. Tucci on July 4, 2019 at 3:08am — No Comments

How the Mathematics of Fractals Can Help Predict Stock Markets Shifts

In financial markets, two of the most common trading strategies used by investors are the momentum and mean reversion strategies. If a stock exhibits momentum…

Added by Marco Tavora on July 4, 2019 at 12:30am — 1 Comment

Data Science Central Thursday Digest, July 4

Here is our selection of featured articles and resources posted since Monday:

Technical resources 

Added by Vincent Granville on July 3, 2019 at 7:30pm — No Comments

New Book: Data Science for Healthcare - Methodologies and Applications

This Springer book seeks to promote the exploitation of data science in healthcare systems. The focus is on advancing the automated analytical methods used to extract new knowledge from data for healthcare applications. To do so, the book draws on several interrelated disciplines, including machine learning, big data analytics, statistics, pattern recognition, computer vision, and Semantic Web technologies, and focuses on their direct application to…

Added by Sergio Consoli on July 3, 2019 at 5:45am — No Comments

Multilevel Modelling of U.S. Home Loan Data

The housing market has undergone quite a change in the past decade, with more stringent lending criteria for housing having been enforced.

A key objective of financial institutions is to minimise the risk of mortgage lending by ensuring that the debtor is ultimately able to repay the loan.

In this example, multilevel modelling techniques are used to analyse data from the Federal Home Loan Bank…

Added by Michael Grogan on July 3, 2019 at 3:01am — No Comments

Writing/Reading Large R dataframes/data.tables -- Addendum.



After posting my most recent blog using …

Added by steve miller on July 2, 2019 at 9:00am — No Comments

Predicting Hotel Cancellations with Support Vector Machines and SARIMA

Hotel cancellations can cause issues for many businesses in the industry. Not only is there the lost revenue as a result of the customer cancelling, but this can also cause difficulty in coordinating bookings and adjusting revenue management practices.

Data analytics can help to overcome this issue, in terms of identifying the customers who are most likely to cancel – allowing a hotel chain to adjust its marketing strategy accordingly.

To investigate how machine learning can…

Added by Michael Grogan on July 2, 2019 at 3:00am — No Comments

How To Choose An NLP Vendor For Your Organization

Added by Shaily Baheti on July 2, 2019 at 12:30am — No Comments

How Long Does It Take to Learn Python for Data Science?

Python is among the most loved, dreaded, and wanted programming languages, according to the Stack Overflow survey. Popular among professional software developers, Python was ranked the world’s seventh most popular programming language.

A study by the PYPL Popularity of Programming Language Index (which monitors the frequency of searches about popular programming languages to learn) showed that there was a growth of 17.1% during the last…

Added by Yoey Thamas on July 2, 2019 at 12:29am — No Comments

Open-source Logistic Regression FPGA core for accelerated Machine Learning

Machine learning algorithms are extremely computationally intensive and time consuming when they must be trained on large amounts of data. Typical processors are not optimized for machine learning applications and therefore offer limited performance. As a result, both academia and industry are focused on the development of specialized…

Added by Chris Kachris on July 1, 2019 at 10:27pm — No Comments

Critical skills set to make or break a data scientist

A data scientist must know how to scope any problem: identifying features and figuring out how to frame the question so that it yields the desired answer is the key to becoming the most wanted data scientist. …

Added by Nisha Dhiman on July 1, 2019 at 9:00pm — No Comments

How Data Science is Playing a Big Role in Higher Education?



Data science is a growing and promising discipline that has impacted various domains, including higher education. Owing to its ability to use precise methods and platforms to extract insights from data, several academic institutions are incorporating data science into their operations and educational curriculum. This helps them engage students, improve educational…

Added by Gaurav Belani on July 1, 2019 at 8:53pm — No Comments

Workload Optimized Compute Servers Are Creating the Need for Converged Clusters

Pooled, also referred to as “converged”, clusters in a unified data environment support disparate workloads better than separate, siloed clusters. Vendors now provide direct support for converged clusters to run key HPC-AI-HPDA (AI, HPC, and High Performance Data Analytics) workloads.

The success of workload optimized compute servers has created the need for converged clusters as organizations have generally added workload optimized clusters piecemeal to support their disparate AI, HPC,…

Added by Rob Farber on July 1, 2019 at 8:52am — No Comments

Where’s the Love – Trends in Data Science Career Opportunities

Summary:  The annual Burtch Works salary survey tells us a lot about which industries are using the most data scientists and the difference between higher and lower skilled data scientists.  Salary increases show us whether demand is increasing, and finally we take a shot at determining which skills are most in demand.

 …

Added by William Vorhies on July 1, 2019 at 8:00am — No Comments

Role Conflicts and Deviant Behaviours

It has been suggested that role conflicts can lead to poorer performance in the workplace.  Below I present the general dynamics: more role conflicts equate to less performance.

Performance can be expressed empirically - as in the case above using a formal scoring scheme.  On the other hand, a qualitative approach can be used:…

Added by Don Philip Faithful on July 1, 2019 at 6:50am — No Comments

Fine grained analysis of K- mean clustering and where we are using it

K-means is a centroid-based algorithm, which means that points are grouped into clusters according to their distance (usually Euclidean) from a centroid.
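As a rough illustration of grouping points by distance to a centroid, here is a minimal pure-Python sketch of the standard K-means loop (assign each point to its nearest centroid, then move each centroid to the mean of its members). The function names, the fixed iteration count, and the choice to leave an empty cluster's centroid unchanged are assumptions made for this example, not details taken from the post.

```python
import math


def assign_to_centroids(points, centroids):
    """Return, for each point, the index of its nearest centroid (Euclidean)."""
    labels = []
    for point in points:
        distances = [math.dist(point, centroid) for centroid in centroids]
        labels.append(distances.index(min(distances)))
    return labels


def update_centroids(points, labels, centroids):
    """Move each centroid to the mean of the points assigned to it."""
    updated = []
    for k, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == k]
        if not members:
            # Assumption: an empty cluster keeps its previous centroid.
            updated.append(tuple(old))
            continue
        updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
    return updated


def kmeans(points, initial_centroids, iterations=10):
    """Alternate the assignment and update steps a fixed number of times."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(iterations):
        labels = assign_to_centroids(points, centroids)
        centroids = update_centroids(points, labels, centroids)
    # Final assignment against the last set of centroids.
    return assign_to_centroids(points, centroids), centroids
```

For example, `kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], [(0, 0), (10, 10)])` separates the two points near the origin from the two points near (10, 10). A production implementation would normally add a convergence check and a smarter initialization.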

Centroid-based…

Added by satyajit maitra on July 1, 2019 at 6:30am — No Comments

Microsoft Azure ML Studio – A Tutorial on How to Create a Churn Model in No Time

In this article, we will see how to implement a simple customer churn model built using Azure Machine Learning Studio. This article will give us a starting point to understand how Azure ML based models are created and deployed in the most easy to understand manner. The experiment (Azure ML Model terminology) that I will refer to is based on dummy customer data and is readily available on Azure ML Studio AI Gallery.

Understanding the Churn…

Added by Sunil Kappal on July 1, 2019 at 3:02am — No Comments

Why is it hard for AI to detect human bias?

AI bias is in the news, and it’s a hard problem to solve.

But what about the other way round?

When AI engages with humans, how does AI know what humans really…

Added by ajit jaokar on June 30, 2019 at 9:19am — No Comments

Data Science Central Monday Digest, July 1

Monday newsletter published by Data Science Central. Previous editions can be found here. The contribution flagged with a + is our selection for the picture of the week. To subscribe, follow this…

Added by Vincent Granville on June 29, 2019 at 3:30pm — No Comments
