Featured Blog Posts – December 2019 Archive (68)

Math for Data Science in One Picture: What do you REALLY need to study?

In my last blog post, I covered the statistics you need to know for data science. But of course, stats isn't the only math-related knowledge you need. Rather than offer my own biased opinion about the importance of this subject vs. that one, I performed a meta-analysis of popular opinion to see what data scientists and educators are saying (see…


Added by Stephanie Glen on December 14, 2019 at 7:00am — No Comments

Rule of thumb: Which AI / ML algorithms to apply to business problems


How do you know which AI/ML algorithm to apply to which business problem?

This is a common question.

I found a good reference for it –…


Added by ajit jaokar on December 13, 2019 at 10:18am — 2 Comments

Using "record id's" to facilitate processing in Python-Pandas and R-data.table.

Both R and Python-Pandas are array-oriented platforms that support fast filtering through vectors of record-id's. In Python-Pandas, such vectors are implemented via Pandas's powerful index construct; in R-data.table, they're accessible through the "which" and "row.name" functions. In both instances, joins to record-id vectors generate fast subsetted access.
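As a rough illustration of the Pandas side of this idea, the sketch below builds record-id vectors from an index and combines them to subset a frame. The data, column names, and id values are hypothetical and are not taken from the original post; the post's own code may look quite different.

```python
import pandas as pd

# Hypothetical toy data; column names and record ids are illustrative only.
df = pd.DataFrame(
    {"state": ["CA", "NY", "CA", "TX", "NY"], "sales": [120, 80, 200, 50, 95]},
    index=pd.Index([101, 102, 103, 104, 105], name="record_id"),
)

# Encapsulate each filter as a reusable vector of record ids.
ca_ids = df.index[(df["state"] == "CA").to_numpy()]
big_ids = df.index[(df["sales"] > 90).to_numpy()]

# Combine the id vectors with a set operation, then subset by label.
ca_and_big = ca_ids.intersection(big_ids)
subset = df.loc[ca_and_big]

print(ca_and_big.tolist())
print(subset["sales"].sum())
```

Keeping the filters as id vectors means they can be stored, intersected, or unioned later without re-evaluating the original conditions.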

How is the record-id vector approach helpful? For starters, the analyst can encapsulate common…


Added by steve miller on December 13, 2019 at 5:51am — No Comments

Sports Telemetry in Real-Time

The use of telemetry in F1 motor racing to monitor car setup and performance dates back to the 80s. The first electronic systems, installed onboard the car, collected information for only one lap, and the data were downloaded once the car was back in the garage. The explosion of computing capabilities in the 90s contributed to the growth of intelligent data usage in F1 and the…


Added by Valeria on December 13, 2019 at 1:00am — 2 Comments

2020 Trends, Predictions and Challenges for Data Management and Privacy

As we move into 2020, data management will continue to advance and develop efficiencies that will make the job of having data ready for business purposes faster and more reliable than ever. While the data management space is a diverse field in its practices, there are four trends that will be at the forefront in 2020:

  • Data Orchestration – The uniting of data integration, API integration, and data movement to support DataOps techniques. This involves combining multiple…

Added by Todd Wright on December 12, 2019 at 6:00am — No Comments

Performance evaluation of cloud computing platforms for Machine Learning

A use case on Logistic regression training

Over the last few years there have been several efforts to build more powerful computing platforms to meet the challenges posed by emerging applications like machine learning. General-purpose CPUs have gained specialized ML modules, and GPUs and FPGAs with specialized engines are…


Added by Chris Kachris on December 12, 2019 at 4:30am — No Comments

It Is Never Too Late To Learn!

The article by Stephanie Glen in the November 30 DSC Newsletter is spot on! I am a 77-year-old Data Scientist, and I have done my best work since I “retired” in 2009. Since then, I have published 3 books on Data Science topics with Academic Press, and a 4th book is in press at Cambridge University Press. I began teaching Data Science at the University of California at Irvine in 2012. All of my students are international (in an international program at UCI), and almost all of them…


Added by Robert Nisbet on December 10, 2019 at 6:14pm — No Comments

Make Crucial Predictions as Data Comes

Walk down the hottest streets of IT these days and you have likely heard about Streaming Machine Learning, i.e. moving AI towards a streaming scenario and exploiting real-time capabilities along with new Artificial Intelligence techniques. You will also have noticed the lack of research on this topic, despite the growing interest in it.

If we investigate a little deeper, we realize that…


Added by Valeria on December 10, 2019 at 7:30am — No Comments

Why Event Stream Processing Is Leading the New Big Data Era

Big Data is probably one of the most misused words of the last decade. It was widely promoted, discussed, and spread around by business managers, technical experts, and experienced academics. Slogans like “Data is the new oil” were widely accepted as unquestionable truth.

These beliefs pushed Hadoop technologies forward. Its stack, originally developed by Yahoo! and now owned by the Apache Software Foundation, was recognized as “The” Big Data…


Added by Valeria on December 10, 2019 at 7:21am — No Comments

Deep Analytics: Risk Management with AI

We first provide a mini-tutorial on Adjoint Algorithmic Differentiation (AAD) (also known as back-propagation in machine learning). We then illustrate how neural networks may be used to compute dynamic values and risks of trading books with applications to risk management of derivatives, valuation adjustments (XVA), counterparty credit risk, FRTB and SIMM margin valuation adjustments (MVA). We also describe new techniques to substantially improve deep learning on simulated data, and…


Added by Antoine Savine on December 10, 2019 at 1:30am — No Comments

Fun with maps: Part 2

Last time we created a beautiful map with a lot of features, see here. This time I will show you how to customize different things. I use the same data.

import folium
map1 = folium.Map(location=[10, 20], zoom_start=2, tiles='http://tile.stamen.com/toner-lite/{z}/{x}/{y}.png', attr="Dr.Katharina Glass")

Let’s start with marker.…


Added by Dr. Katharina Glass on December 9, 2019 at 10:00pm — No Comments

CPU Vendors Compete Over Memory Bandwidth to Achieve Leadership in Real-World Application Performance

By Rob Farber

Now is a great time to be procuring systems, as vendors are finally addressing the memory bandwidth bottleneck. Succinctly, memory performance dominates the performance envelope of modern devices, be they CPUs or GPUs. [i] It does not matter if…


Added by Rob Farber on December 9, 2019 at 10:00am — No Comments

Statistics for Data Science in One Picture

There's no doubt about it: probability and statistics is an enormous field, encompassing topics from the familiar (like the average) to the complex (regression analysis,…


Added by Stephanie Glen on December 9, 2019 at 7:30am — No Comments

Self Aware Streaming

1. Problem Statement

By processing data in motion, real-time stream processing enables you to get insight into your business and make vital decisions.

Challenges in stream processing:

  • Over-provisioning of resources for…

Added by Daljeet Kaur on December 9, 2019 at 2:30am — No Comments

Weekly Digest, December 9

Monday newsletter published by Data Science Central. Previous editions can be found here. The contribution flagged with a + is our selection for the picture of the week. To subscribe, follow this link.  …


Added by Vincent Granville on December 8, 2019 at 3:30pm — No Comments

List of Time Series Methods in One Picture

The picture below was found in some tweets posted by top data science influencers, though its origin is somewhat obscure. 

Many of these methods are described in Wikipedia. Many are also described on Data Science Central, see for instance…


Added by Capri Granville on December 8, 2019 at 12:30pm — No Comments

Creating Assets that Appreciate, Not Depreciate, in Value Thru Continuous Learning – Part II

“If you buy a Tesla today, I believe you're buying an appreciating asset, not a depreciating asset” – Elon Musk

OK, I realize it’s on me for not explaining it well. Let me try again at explaining this game-changing concept of leveraging massive amounts of operational data with Artificial Intelligence (AI) and Deep Learning to create assets that appreciate, not depreciate, in value through usage.

What Elon Musk is saying is that Tesla…


Added by Bill Schmarzo on December 8, 2019 at 6:30am — No Comments

Regression analysis using Python

This article was written by Stuart Reid. 


This tutorial covers regression analysis using the Python StatsModels package with Quandl integration. For motivational purposes, here is what we are working towards: a regression analysis program which receives multiple data-set names from Quandl.com, automatically downloads the data, analyses it, and plots the results in a new window.…


Added by Andrea Manero-Bastin on December 8, 2019 at 6:30am — No Comments

Event Distribution as a Subject of Ontological Recognition Criteria

The “Diagnostic and Statistical Manual of Mental Disorders” produced by the American Psychiatric Association is an interesting document from a conceptual standpoint. In order to count the number of individuals with a particular disorder and to make the numbers comparable regardless of source, there have to be clear criteria guiding the ontology. This document therefore serves an important ontological purpose. By the way, given that some dictionaries don’t…


Added by Don Philip Faithful on December 8, 2019 at 6:18am — No Comments

Fighting Overfitting in Deep Learning


While training a model, we want to get the best possible result according to the chosen metric. At the same time, we want to keep a similar result on new data. The cruel truth is that we can’t get 100% accuracy. And even if we did, the result is still not…


Added by Igor Bobriakov on December 6, 2019 at 9:00am — No Comments
