In this post, I'll explore the new AWS Machine Learning service.
The problem we are trying to solve is classifying auto accident severity given a set of features. I won't go into the details of the data set or the classification algorithms here, since the goal of this post is to explore the new AWS Machine Learning service step by step.
In the next blog post, I'll explore another service: Microsoft Azure Machine Learning.
Let's get started by logging into the AWS Console.
Now select the Machine Learning service:
The opening screen comes up. Select "Get Started".
Let's click on Explore model performance to see the details. It looks too good to be true.
Oh, wow! Wait a minute... Something is amiss!
The model has 100% classification accuracy across all three accident severity types?! Something is wrong. For more details on how to read and interpret the matrix above, check out this documentation here.
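To make the "too good to be true" reading concrete, here is a minimal sketch of how a confusion matrix and overall accuracy are computed for a three-class problem. The label names (`minor`, `moderate`, `severe`) and the toy predictions are hypothetical stand-ins, not the data set or the output from this post, and the helper functions are my own illustration rather than anything AWS Machine Learning exposes. A perfect score means every count sits on the diagonal; a single misclassified row is enough to put a nonzero count off the diagonal.

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels):
    """Return matrix[true_label][predicted_label] -> count."""
    counts = Counter(zip(actual, predicted))
    return {t: {p: counts[(t, p)] for p in labels} for t in labels}


def accuracy(matrix):
    """Fraction of rows on the diagonal, i.e. correct predictions."""
    total = sum(sum(row.values()) for row in matrix.values())
    correct = sum(matrix[label][label] for label in matrix)
    return correct / total if total else 0.0


# Hypothetical severity labels and toy data (assumptions for illustration).
LABELS = ["minor", "moderate", "severe"]
actual = ["minor", "minor", "moderate", "severe", "severe", "moderate"]

# Case 1: every prediction matches the true label.
perfect = list(actual)
# Case 2: two of the six rows are misclassified.
imperfect = ["minor", "moderate", "moderate", "severe", "minor", "moderate"]

perfect_matrix = confusion_matrix(actual, perfect, LABELS)
imperfect_matrix = confusion_matrix(actual, imperfect, LABELS)

for name, matrix in [("perfect", perfect_matrix), ("imperfect", imperfect_matrix)]:
    print(name, "accuracy:", round(accuracy(matrix), 3))
    for true_label in LABELS:
        print("  ", true_label, [matrix[true_label][p] for p in LABELS])
```

On real accident data, a matrix with nothing at all off the diagonal is unusual enough that it is worth checking how the data was prepared before trusting the number.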
It was fun to experiment with the new wave of Machine Learning services. As a data scientist, I still prefer R, a powerful language that lets me know exactly what I put into a model, tune it, and understand its output. Yes, these GUI-based machine learning services can be easier for novices, but it's not obvious whether they do exactly what one wants or whether they are flexible enough for fine-tuning. Perhaps I need to spend more time with the documentation. These are just first impressions, and I'm sure these things will improve over time.
Additionally, it takes what seems like a VERY LONG time to process a relatively small data file. We are talking about 43K rows of data. R can rip through that very quickly, but I waited 15-20 minutes for the entire sequence to process on AWS Machine Learning.
So AWS Machine Learning ONLY makes sense as a use case if one has REALLY large-scale data that needs cloud computing infrastructure. Otherwise, it's really slow. It's like using Hadoop to process 1MB of data. Not a good use case. :)
As a professional data scientist, I find this canned service rather limiting; it does not offer the full flexibility of a true data science computing environment. To be fair, I'm sure it will improve.
Well, that's all for now folks.
Next time, I'll explore another Machine Learning service: the new Microsoft Azure Machine Learning.
Originally posted here.