We wanted to predict the median rent of a place from three attributes: the median home price, the median household income, and the percentage of vacant homes in that place. The data can be downloaded from here.
The steps to follow are:
Create data source
Train the model
Evaluate the model
To get started, log in to the AWS console and click Machine Learning. The dashboard shows all your entities by default. An entity can be an ML model, a datasource, an evaluation, etc.
Creating a Datasource
To create a datasource, click the “Create new” drop-down button and select “Data Source”.
Your data file must already be present in either Amazon S3 or Amazon Redshift.
If you are getting the data from S3, provide the location of the data in your S3 bucket, then click “Verify”.
Amazon ML validates the datasource during this step.
A schema is composed of all attributes in the input data and their corresponding data types. Amazon ML uses the information in the schema to correctly read and interpret the input data, compute statistics, apply the correct attribute transformations, and fine-tune its learning algorithms.
You can provide a separate schema file when you upload your data to S3. Here we let Amazon ML infer the attribute types and create the schema.
On the schema page, set the “Does the first line in your CSV contain the column names?” option to “Yes”.
Make sure that each attribute in the file is assigned the correct data type.
Review the types carefully and click “Continue”.
On the next page, for “Do you want to use this dataset to create and/or evaluate a ML model?”, choose “Yes”.
This will let us select the target attribute
The “Target” is the attribute that the model must learn to predict. Here we want to predict the median rent of a place, so we select it as the target.
On the next page, for “Do you want to select an identifier?”, choose “Yes”.
On the page after that, check “Geo_ID” and click “Review”.
On the next page, review the attributes and click “Finish”.
Once you click “Finish”, the datasource shows as “initialized”. It takes some time to reach the “Completed” status.
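The console choices above (header row, target, identifier) correspond to fields in the JSON schema file that Amazon ML accepts alongside the data. The following is a minimal sketch of building such a schema in Python. Only “Geo_ID” comes from the walkthrough; the other column names and all of the attribute types are assumptions standing in for whatever your CSV actually contains.

```python
import json

# Hypothetical column names and types: only "Geo_ID" appears in the
# walkthrough; replace the rest with the columns of your own CSV.
COLUMNS = {
    "Geo_ID": "CATEGORICAL",
    "median_home_price": "NUMERIC",
    "median_household_income": "NUMERIC",
    "pct_vacant_homes": "NUMERIC",
    "median_rent": "NUMERIC",
}


def build_schema(columns, target, row_id):
    """Return a dict laid out like an Amazon ML schema file."""
    if target not in columns or row_id not in columns:
        raise ValueError("target and row_id must both be listed in columns")
    return {
        "version": "1.0",
        "rowId": row_id,                  # the identifier chosen in the console
        "targetAttributeName": target,    # the attribute the model predicts
        "dataFormat": "CSV",
        "dataFileContainsHeader": True,   # first CSV line holds column names
        "attributes": [
            {"attributeName": name, "attributeType": kind}
            for name, kind in columns.items()
        ],
    }


schema = build_schema(COLUMNS, target="median_rent", row_id="Geo_ID")
print(json.dumps(schema, indent=2))
```

Saving this output next to the data file in S3 is one way to pin the attribute types instead of relying on inference.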
Training ML model
Amazon ML supports 3 types of ML models, namely
Binary classification
Multi class classification
Regression
The type of model depends on the type of data you want to predict
For binary classification, Amazon ML uses the logistic regression algorithm; for multi class classification and regression, it uses multinomial logistic regression and linear regression, respectively.
Since we want to predict the rent at a particular place, which is a number, we use a regression model. Based on the training data, the model computes one weight for each feature so that it can predict or estimate the target value.
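To make "one weight for each feature" concrete, here is a rough sketch of how a linear regression model turns features into a prediction. The feature names, weights, and bias are invented for illustration and are not values learned by Amazon ML. The service also applies its own feature transformations before fitting, which this sketch ignores.

```python
def predict_rent(features, weights, bias):
    """Linear prediction: bias plus the weighted sum of the features."""
    return bias + sum(weights[name] * value for name, value in features.items())


# Hypothetical weights and inputs, chosen only to show the arithmetic.
example_weights = {
    "median_home_price": 0.002,
    "median_household_income": 0.005,
    "pct_vacant_homes": -3.0,
}
example_place = {
    "median_home_price": 200000,
    "median_household_income": 50000,
    "pct_vacant_homes": 10,
}

print(predict_rent(example_place, example_weights, bias=100.0))
```

Training amounts to choosing the weights and bias that make predictions like this one as close as possible to the known rents in the training data.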
Create an ML model
You can create an ML model either from the datasource or from the “Create new” drop-down button on the dashboard, the same way you created the datasource.
If you use the “Create new” drop-down button, you have to provide the name of the datasource on which the model will train.
Click “Continue”. On the next page, give the model a name.
On the next page, for “Training and evaluation settings”, choose “Default”,
because it is best to start with the simple, default options.
Selecting this option automatically generates an evaluation: 70% of the data is used for training and the remaining 30% for evaluation.
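The 70/30 split can be pictured with a few lines of Python. This sketch assumes a sequential split, where the first 70% of rows go to training and the rest to evaluation. Whether the service splits your file exactly this way is an assumption here, so check the datasource details in the console.

```python
def split_sequential(rows, train_percent=70):
    """Split rows in order: the first train_percent% for training, the rest for evaluation."""
    if not 0 <= train_percent <= 100:
        raise ValueError("train_percent must be between 0 and 100")
    cut = len(rows) * train_percent // 100  # integer arithmetic avoids float rounding
    return rows[:cut], rows[cut:]


rows = list(range(10))  # stand-in for ten CSV records
train_rows, eval_rows = split_sequential(rows)
print(len(train_rows), len(eval_rows))
```

If the file is sorted (for example by Geo_ID), a sequential split can make the two parts unrepresentative of each other, so shuffling the rows before upload is worth considering.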
Evaluating the model
Once the model is built, it can be run on data it has not seen, and the predicted values can be compared with the actual values to evaluate the model's performance.
Since we selected the “Default” option, an evaluation is automatically generated.
For regression tasks, the root mean square error (RMSE) is used to evaluate accuracy. A lower RMSE means the predictions are closer to the actual values.
The RMSE for our model is 278
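For reference, RMSE is the square root of the average squared difference between actual and predicted values, so it is expressed in the same units as the target (here, rent). A minimal sketch of the calculation follows. The rent figures are made up and are not taken from the evaluation above.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences of numbers."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up rents: each prediction is off by 300, so the RMSE is 300.
print(rmse([1000, 1200], [1300, 900]))
```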
We can also see the distribution of errors in the estimates. To view it, select
Evaluations -> Explore Performance
Generating Batch Predictions
You can start generating batch predictions and real-time predictions immediately.
To generate batch predictions, you first need to create a datasource containing the records you want predictions for.
Amazon Machine Learning charges an hourly rate for the compute time used to build predictive models, and then you pay for the number of predictions generated for your application. For real-time predictions you also pay an hourly reserved capacity charge based on the amount of memory required for your model.
For data analysis and model building, Amazon charges $0.42 per hour.
For batch predictions, $0.10 per 1,000 predictions.
For real-time predictions, $0.0001 per prediction, rounded up to the nearest penny.
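As a rough worked example, the rates above can be turned into a small cost estimate. This sketch uses only the three rates quoted here. It leaves out the hourly reserved capacity charge for real-time predictions because no rate is given for it, and the usage figures are hypothetical.

```python
from decimal import Decimal, ROUND_CEILING

# Rates quoted in the text above.
COMPUTE_PER_HOUR = Decimal("0.42")
BATCH_PER_1000 = Decimal("0.10")
REALTIME_PER_PREDICTION = Decimal("0.0001")
CENT = Decimal("0.01")


def batch_cost(compute_hours, predictions):
    """Compute charge plus batch prediction charge, in dollars."""
    compute = Decimal(str(compute_hours)) * COMPUTE_PER_HOUR
    batch = Decimal(predictions) / 1000 * BATCH_PER_1000
    return compute + batch


def realtime_prediction_cost(predictions):
    """Per-prediction charge, rounded up to the nearest cent.

    Excludes the reserved capacity charge, whose rate is not given here.
    """
    raw = Decimal(predictions) * REALTIME_PER_PREDICTION
    return raw.quantize(CENT, rounding=ROUND_CEILING)


# Hypothetical usage: 2 compute hours and 50,000 batch predictions.
print(batch_cost(2, 50000))
print(realtime_prediction_cost(12345))
```

Decimal is used instead of float so that rounding up to the nearest penny is not thrown off by binary floating-point error.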