John Thuma's Blog (16)

Video: IoT and the self healing power grid

PREAMBLE:  

This video came out of our internal hackathon, in which we used Teradata Listener to ingest small real-time messages from transformers and other devices on the power grid in Southern California. The video demonstrates real-time predictive analytics that support proactive repairs of the power grid, reducing costs and avoiding disruptions of power service.…

Added by John Thuma on June 9, 2016 at 1:00am — No Comments

VIDEO: Aster On Hadoop - Introduction and Demo

Please watch my new video on Aster on Hadoop and get a simple demonstration of how easy it is to perform advanced analytics with Hadoop data!

Added by John Thuma on November 4, 2015 at 9:49am — No Comments

Aster on Hadoop is Hadoop for Everyone!

This announcement is a very exciting prospect for some but may strike fear into others. In this post, I explore some of the interesting prospects of bringing these technologies together. I also hope to allay some fears.

One of the biggest announcements at Teradata Partners 2015 is that Aster will run on Hadoop. Many of our customers have already…

Added by John Thuma on October 19, 2015 at 5:30am — No Comments

Assembling The Data Team: Part 2: Traits to Avoid

In part one of this blog post we discussed the traits of people you should look for when building a data science team. Of course, technical, statistical, programming, and mathematical skills are needed. However, it takes much more. We discussed the following traits: The Pioneer, The Cattle Herder, The Muscle, and The Story Teller. See below for that blog post:

 …

Added by John Thuma on August 22, 2015 at 11:00pm — No Comments

VIDEO: Aster Tango - SQL-MR/GR Scripting Tool

Writing SQL-MR and SQL-GR statements is made much easier by a tool I wrote while on vacation.  As an Aster Data Scientist I needed a tool that would enable me to focus on the 'WHAT' and not the 'HOW!'  I needed a tool to write the code for me.   So I wrote Aster Tango.

Added by John Thuma on August 17, 2015 at 4:12am — No Comments

Aster and Text Analysis (TextChunker, Vector Distance, Levenshtein Distance, Text_Parser, and TF_IDF)

Some of you may not know that Aster provides deep capabilities in text analysis.  These functions are easy to use.  They also allow you to perform text analysis at scale.  What does this mean?  This means that I am able to take billions of customer service notes from a CRM system and perform text analysis.  Here is a sample of some of the Aster Text Analytic Functions:  (TextChunker, Vector Distance, Levenshtein Distance, Text_Parser, and TF_IDF)
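As a rough illustration of one of the measures named above, here is a minimal Python sketch of Levenshtein distance, the number of single-character insertions, deletions, or substitutions needed to turn one string into another. This is not Aster's implementation or its SQL-MR syntax; the function name `levenshtein` and the sample strings are assumptions made up for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


# Hypothetical customer-service note fragments with a typo.
print(levenshtein("acount locked", "account locked"))  # 1
print(levenshtein("kitten", "sitting"))                # 3
```

A small distance like the first one is the kind of signal that lets misspelled terms in free-text notes be matched to their intended form.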

TextChunker:…

Added by John Thuma on August 3, 2015 at 7:30am — No Comments

Aster and Generalized Linear Model Functionality

Summary:  The generalized linear model (GLM) extends the general linear model to accommodate dependent variables that are not normally distributed.  GLM is a methodology for modeling relationships between variables.
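To make the summary concrete, here is a small Python sketch of one common GLM: logistic regression, where the dependent variable is binary (so not normally distributed) and a logit link connects it to a linear predictor. It is an illustration only, not Aster's GLM function; the data, the function names, and the plain gradient-ascent fitting loop are all assumptions chosen for brevity.

```python
import math


def sigmoid(z: float) -> float:
    """Inverse of the logit link: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit P(y=1) = sigmoid(b0 + b1*x) by gradient ascent on the log-likelihood."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = y - sigmoid(b0 + b1 * x)  # residual on the probability scale
            g0 += err
            g1 += err * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


def predict(b0: float, b1: float, x: float) -> float:
    """Predicted probability of the positive class for input x."""
    return sigmoid(b0 + b1 * x)


# Invented data: x is some risk score, y is whether a loss occurred (1) or not (0).
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 1, 0, 1, 1, 1]

b0, b1 = fit_logistic(xs, ys)
print(f"intercept={b0:.2f}, slope={b1:.2f}")
print(f"P(loss | x=1.0) = {predict(b0, b1, 1.0):.2f}")
print(f"P(loss | x=3.5) = {predict(b0, b1, 3.5):.2f}")
```

Swapping the link function and the response distribution (for example, a log link with a Poisson response for claim counts) gives other members of the GLM family.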

Use Cases:  

  -  Insurance and Loss Prediction…

Added by John Thuma on July 30, 2015 at 1:00pm — No Comments

List of All Aster Videos on Advanced Analytics

Please see the following list of videos that I have created to document some of Aster's 120+ analytic functions:

Aster Analytic Learning Series

Text_Parser…

Added by John Thuma on July 21, 2015 at 4:30pm — No Comments

Teradata Aster: Multi-Channel Churn Prediction in Banking

Please watch this video to learn how we combine multiple channels of data (bank systems, IVR, call center notes, clickstream, and others) to perform behavioral churn prediction.  We do this at full data scale, not on samples.

Added by John Thuma on July 11, 2015 at 9:30am — No Comments

See What You Can do With One Aster Command....

We would like to know whether customer discounts are having any effect on customer visits. We'll look to see if having a large discount (greater than .10 cents) leads to a greater number of additional purchases made at the store. Specifically, we want to know the date of the first large discount event (> .10), the size of the discount, and the total number of unique products purchased after that discount. First, construct an nPath query that returns the total number of products…

Added by John Thuma on July 3, 2015 at 2:30pm — No Comments

Aster: Video: Using Confusion Matrix in Machine Learning

Genre: Statistical Analysis (Machine Learning)



Background: Learn how easy it is to leverage Aster for implementing confusion matrices. A confusion matrix provides a visual representation of the performance of a supervised machine learning algorithm. It makes it easy to determine whether a model is confusing or mislabeling classes. We also go over some of the math involved and help you understand how confusion matrices are used in supervised machine learning.
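
For readers who want to see the idea outside of Aster, here is a minimal Python sketch of a confusion matrix: rows are actual classes, columns are predicted classes, and each cell counts how often that pairing occurred. The labels and predictions are invented for illustration, and `confusion_matrix` is a hypothetical helper, not an Aster function.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Return (labels, matrix) where matrix[i][j] counts actual=labels[i], predicted=labels[j]."""
    labels = sorted(set(actual) | set(predicted))
    counts = Counter(zip(actual, predicted))
    matrix = [[counts[(a, p)] for p in labels] for a in labels]
    return labels, matrix


# Invented example: true customer outcomes versus a model's predictions.
actual = ["churn", "churn", "stay", "stay", "stay", "churn"]
predicted = ["churn", "stay", "stay", "stay", "churn", "churn"]

labels, matrix = confusion_matrix(actual, predicted)
print(labels)   # ['churn', 'stay']
print(matrix)   # [[2, 1], [1, 2]]

# Accuracy is the share of observations on the diagonal.
correct = sum(matrix[i][i] for i in range(len(labels)))
print(correct / len(actual))
```

The off-diagonal cells show where the model confuses one class for another: here one churner was predicted to stay, and one staying customer was predicted to churn.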



Use Cases:

-…

Added by John Thuma on July 3, 2015 at 8:30am — No Comments

Teradata Aster: Principal Component Analysis and Unsupervised Machine Learning

Please watch my video on Aster's principal component analysis (PCA). I show how Aster performs this analysis and also try to explain how PCA works, including eigenvectors and eigenvalues.

Genre: Statistical Analysis (Unsupervised Learning)

Background: A process used to emphasize variability and bring out strong patterns in a dataset. This variability is expressed by principal components, which are the directions of greatest variance. The first several principal components…

Added by John Thuma on June 29, 2015 at 2:31am — No Comments

Demystifying Teradata AsterR - R in Parallel, R at Scale

How is AsterR used in the data discovery process?

AsterR is a Teradata-produced package installed within the R client application.  This package is distinct from, but complements, the installation of R within Aster.  Together, the AsterR package and the R installation in Aster create a rich environment that gives the R user the normal look and feel of R while maintaining the power and speed of Aster.  There is a great deal of…

Added by John Thuma on June 22, 2015 at 6:00am — No Comments

Your Math Is All Wrong: Flipping The 80/20 Rule For Analytics

I was deep into a presentation at a major retailer. In the darkened room, a lone hand shot up. “John, we spend 80% of our time on data load and prep. Only 20% is used to produce analytics. We don’t like that ratio.”

The speaker was right. About 80% of the analytics process is spent on data preparation and loading. Numerous examples come to mind. I remember a project for an auto insurance company using telematics and driver behavior. The one-off code to prepare the data took three days…

Added by John Thuma on June 8, 2015 at 3:00am — 2 Comments

Optimizing your search functionality on your website

Your website’s search capabilities may be a potential customer’s first (or only) interaction with your website.  Customers who can’t find relevant products based on how they search are likely to abandon your site and go to competitors’ websites.  For many retailers, 30% to 40% of search queries underperform, and those queries cost you sales and customers.

Some examples of underperforming search queries are:

  • Queries that return zero…

Added by John Thuma on June 1, 2015 at 7:30am — 1 Comment

Machine Learning is Not the Boogie Man! Gates and Musk Are Wrong.

Humanity is going to be okay!   The big bad robots are not going to come and get you...

In a recent Reddit AMA session, Bill Gates commented, “First the machines will do a lot of jobs for us and not be super intelligent… A few decades after that though the intelligence is strong enough to be a concern. I agree with Elon Musk and some others…

Added by John Thuma on May 29, 2015 at 7:30am — 2 Comments

© 2019   Data Science Central ®