
Using Python on Azure Machine Learning Studio

AzureML is the cloud-hosted machine learning platform built on top of Microsoft’s cloud platform. Readers of Data Science Central will know that AzureML has hosted a few webinars about the platform. This tutorial walks you through integrating Python with AzureML.


Business Case

You are planning to move out of the place where you currently live and are looking for a new place that is similar to it. How will you decide where to go?

 

The example data set below is given here. We used cosine similarity to determine the closest places, with the cosine similarity code written in Python and run on AzureML. The testing data set for machine learning is given here.

 

| Place in California | Montecito | Atherton | Tiburon | Los Altos Hills |
|---|---|---|---|---|
| Median Home Value | $1000001 | $1000001 | $1000001 | $1000001 |
| % of Homes Built 2000 to 2009 | 10 | 11 | 8 | 11 |
| % of Homes Built 1990 to 1999 | 11 | 6 | 11 | 10 |
| % of Homes Built 1980 to 1989 | 13 | 5 | 19 | 20 |
| % of Homes Built 1970 to 1979 | 11 | 10 | 28 | 23 |
| % of Homes Built 1960 to 1969 | 16 | 31 | 22 | 18 |
| % of Homes Built 1950 to 1959 | 6 | 13 | 3 | 5 |
| % of Homes Built 1940 to 1949 | 23 | 14 | 4 | 5 |
| % of Homes Built 1939 or earlier | 10 | 10 | 6 | 9 |
| % of Homes No Bedrooms | 2 | 0 | 2 | 0 |
| % of Homes 1 Bedroom | 7 | 2 | 9 | 0 |
| % of Homes 2 Bedrooms | 21 | 3 | 25 | 6 |
| % of Homes 3 Bedrooms | 37 | 21 | 35 | 20 |
| % of Homes 4 Bedrooms | 22 | 39 | 19 | 40 |
| % of Homes 5 or more Bedrooms | 11 | 35 | 10 | 35 |
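To make the idea concrete, here is a minimal sketch of cosine similarity applied to the four places in the table above. It uses only the percentage rows; the median home value is left out because it is identical for all four places. The names `places`, `cosine_similarity` and `most_similar` are illustrative and not part of the AzureML script.

```python
import math

# Percentage rows from the table above, in order:
# eight year-built bands followed by six bedroom-count bands.
places = {
    "Montecito":       [10, 11, 13, 11, 16, 6, 23, 10, 2, 7, 21, 37, 22, 11],
    "Atherton":        [11, 6, 5, 10, 31, 13, 14, 10, 0, 2, 3, 21, 39, 35],
    "Tiburon":         [8, 11, 19, 28, 22, 3, 4, 6, 2, 9, 25, 35, 19, 10],
    "Los Altos Hills": [11, 10, 20, 23, 18, 5, 5, 9, 0, 0, 6, 20, 40, 35],
}


def cosine_similarity(v1, v2):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    return dot / (norm1 * norm2)


def most_similar(target):
    """Name of the place whose vector is closest to the target's vector."""
    scores = {
        name: cosine_similarity(places[target], vec)
        for name, vec in places.items()
        if name != target  # a place is always identical to itself
    }
    return max(scores, key=scores.get)


for name in places:
    print(name, "->", most_similar(name))
```

On these four columns alone, Tiburon comes out as the closest match to Montecito, and Atherton and Los Altos Hills pair with each other. The full experiment compares against every place in the data set, so its answer can differ.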

 

Python + Azure ML: 

Python scripts can be embedded in machine learning experiments in Azure Machine Learning Studio. The scripts can be used to manipulate data and even to generate visualizations. They are executed in Azure Machine Learning Studio using the “Execute Python Script” module, which is listed under “Python language modules”.

 

The module can take 3 optional inputs and give 2 outputs.

The 3 inputs are:

  1. Dataset 1: the first data input file from the workspace

  2. Dataset 2: the second data input file from the workspace

  3. Script bundle: a zip file from the workspace. The zip can contain any files that the script needs to reference at runtime

 

The 2 outputs are:

  1. Dataset: the dataset output, in the internal dataset format

  2. Python Device: shows the console output as well as PNG graphics

 

To get going, log in to Azure Machine Learning Studio.

 

For the current experiment we need to import data on the variables described above for different cities in California.

 

Step 1: Import a dataset. Start by clicking “New” at the bottom left corner of the page and select the “Dataset” option. Here you can upload a local file and use it in the machine learning studio.

 

Step 2: Once the upload is done, you will see the available data sets. Now start the experiment by clicking on the experiment icon (the experiment flask), then click New -> Experiment -> Blank Experiment.

 

Click on the saved data sets and choose the file, or search for the dataset by name. Then drag and drop it onto the canvas.



Step 3: Drag and drop the “Execute Python Script” module, which is listed under “Python language modules”, onto the canvas. This module can take 3 inputs and return 2 outputs.


To use the dataset imported from the local machine in the Python script, connect the dataset to the left bubble of the module.

Step 4: Click on the “Execute Python Script” module and you will see the script on the right side. Click anywhere in the text box to start editing the Python script.

 

As you can see, the text box contains heavily commented boilerplate code.

The function azureml_main is the entry point of the code. Two dataframes can be passed as inputs. The imported data that we connected to the first bubble of this module is available inside this function as a pandas DataFrame.
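The entry-point contract can be tried out locally before pasting a script into the module. The sketch below is only an illustration, assuming pandas is installed on your machine: the `row_number` column and the sample data are made up, and the `None` check reflects the default arguments in the boilerplate signature rather than documented AzureML behavior.

```python
import pandas as pd


def azureml_main(dataframe1=None, dataframe2=None):
    # Assumption: an input that is not connected arrives as None,
    # as the default arguments in the signature suggest.
    if dataframe1 is None:
        return pd.DataFrame()

    # Work on a copy so the caller's dataframe is left unchanged.
    result = dataframe1.copy()

    # Hypothetical derived column, only to show that the returned
    # dataframe can differ from the input.
    result["row_number"] = range(1, len(result) + 1)

    # Anything printed here would show up in the console output.
    print("Received %d rows" % len(dataframe1))
    return result


# Local smoke test with a tiny stand-in for the uploaded dataset.
sample = pd.DataFrame({
    "Place": ["Montecito", "Atherton"],
    "Median Home value": [1000001, 1000001],
})
output = azureml_main(sample)
print(output)
```

Calling the function directly like this mimics what the module does with the dataset connected to the first bubble; the second argument is simply left at its default.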

 

The Python code to calculate cosine similarity, which we inserted into the AzureML module, is given below:

 

# The script MUST contain a function named azureml_main
# which is the entry point for this module.
#
# The entry point function can contain up to two input arguments:
#   Param<dataframe1>: a pandas.DataFrame
#   Param<dataframe2>: a pandas.DataFrame
def azureml_main(dataframe1 = None, dataframe2 = None):
    import numpy
    from numpy.linalg import norm
    from pandas import Series

    # Calculate the percentile home value to normalize the home values.
    # sort_values replaces DataFrame.sort, which was removed in pandas 0.20.
    dataframe1 = dataframe1.sort_values('Median Home value')
    # The divisor 1440 is hardcoded as in the original script.
    # Positions start at 1 so that the first percentile is positive.
    dataframe1['percentile home value'] = Series(
        [100.0 * (pos - 0.5) / 1440 for pos in range(1, len(dataframe1) + 1)],
        index=dataframe1.index)

    # 16000us0623182 is the Geo-ID of Fairfield
    place_vec = dataframe1[dataframe1['Geo-ID'].isin(['16000us0623182'])].iloc[0]

    # Function to calculate the cosine similarity of two vectors
    def cal_cosine(v1, v2):
        v1 = numpy.asarray(v1, dtype=float)
        v2 = numpy.asarray(v2, dtype=float)
        return numpy.dot(v1, v2) / (norm(v1) * norm(v2))

    # Compare everything from the fourth column onward, as in the original
    cosine = [cal_cosine(place_vec.values[3:], vec[3:]) for vec in dataframe1.values]
    dataframe1['cosine'] = Series(cosine, index=dataframe1.index)

    dataframe1 = dataframe1.sort_values('cosine')

    # The last row is the place itself (cosine 1), so the
    # second-to-last row is the closest other place.
    closest = dataframe1.iloc[-2]
    print("The place very similar to %s is %s; the cosine similarity is %s"
          % (place_vec['Place'], closest['Place'], closest['cosine']))

    return dataframe1
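Two steps in the script are easy to miss: the percentile normalization of the home values, and the choice of the second-to-last row after sorting. The sketch below isolates both in plain Python. It is an illustration under assumptions: the home values and similarity scores are made up, and the percentile divides by the list length where the script uses a hardcoded 1440.

```python
def percentile_ranks(values):
    """Map each value to 100 * (rank - 0.5) / n, with rank 1 for the smallest."""
    n = len(values)
    # Indices of the values in ascending order of value.
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    for rank, i in enumerate(order, start=1):
        ranks[i] = 100.0 * (rank - 0.5) / n
    return ranks


def runner_up(scores):
    """Name with the second-highest score; the highest is the target itself."""
    ranked = sorted(scores, key=scores.get)  # ascending by score
    return ranked[-2]


# Made-up home values, only to show the effect of the normalization.
home_values = [400000, 250000, 900000, 600000]
print(percentile_ranks(home_values))

# Made-up similarity scores; the target place scores 1.0 against itself.
scores = {"Fairfield": 1.0, "Place A": 0.97, "Place B": 0.91}
print(runner_up(scores))
```

Replacing raw home values with percentiles puts them on the same 0 to 100 scale as the percentage columns, so one large dollar figure does not dominate the cosine.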

To start running the experiment, click the “Run” button at the bottom. Once the experiment has finished, you can see the console output by clicking on the bottom right bubble of the “Execute Python Script” module and selecting “Visualize” from the dropdown.

The output of the script is:

 

 

The dataframe returned by the script can be accessed by clicking the bottom left bubble of the module. It can be downloaded as a CSV file by adding a “Convert to CSV” module, connecting it to that bubble and running the experiment once again. Once done, the CSV can be downloaded by clicking on the bottom bubble of the “Convert to CSV” module and selecting “Download”.
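Once the CSV is on your machine, the closest matches can be read back with the standard library. This is a rough sketch that assumes the file keeps the `Place` and `cosine` columns produced by the script; the `top_matches` helper and the sample rows are hypothetical.

```python
import csv
import io


def top_matches(csv_file, n=3):
    """Return (place, cosine) pairs for the n highest cosine values."""
    rows = list(csv.DictReader(csv_file))
    rows.sort(key=lambda row: float(row["cosine"]), reverse=True)
    return [(row["Place"], float(row["cosine"])) for row in rows[:n]]


# Stand-in for the downloaded file; the values are made up.
sample = io.StringIO(
    "Place,cosine\n"
    "Place A,0.91\n"
    "Fairfield,1.0\n"
    "Place B,0.97\n"
)
result = top_matches(sample, n=2)
print(result)
```

With a real download you would pass an open file instead of the `StringIO` object, for example `open("similar_places.csv", newline="")`, where the file name is whatever you chose when saving.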

 

 


© 2017 Data Science Central