
Introduction to Machine Learning / Data Mining

Machine Learning? Data Mining?

Well, strictly speaking there is a slight difference between machine learning and data mining, although in practice I don’t see much of a difference between them.

See the Stack Exchange debate on the difference between machine learning and data mining.

In the end, it is about training the machine to recognize patterns in the data and then predict the future (or unknown variables) from that training. I’ll use both terms interchangeably. Please feel free to challenge me if I am wrong.

How does it work?

Well, seeing is believing. I had been searching for a better explanation, and Professor Keating at the University of Notre Dame has a really great one. You’ll see just two pictures, each with its painter’s name. Next, I am going to show you pictures without names and ask, “Who is the painter?” I swear you can answer the question 100% correctly.

<Claude Monet>

[Image: Claude_Monet_023]

<Van Gogh>

[Image: print-starry-night]

Now, who painted these pictures?

<1>
[Image: van]
<2>
[Image: Claude_Monet_Saint-Georges_majeur_au_crepuscule]
<3>
[Image: monet-madame-monet-and-her-son]
<4>
[Image: Vincent_Willem_van_Gogh_128]
<5>
[Image: gogh.olive-trees]
<Answers>
1 – Van Gogh
2 – Monet
3 – Monet
4 – Van Gogh
5 – Van Gogh
<How did your brain work?>
As soon as you saw those pictures, you already had a formula in your mind:
Monet: uses bright colors; the pictures feel feminine and dreamlike.
Van Gogh: uses simple colors; the pictures feel masculine, powerful, and rough.
Although some pictures do not fall exactly into one of those two categories, we can still get a broad sense of which picture was painted by whom.
Machine learning does the same thing. It learns from the data given by the user, which we call the “training set.” Then it applies the formula it built while analyzing the training set to the data we want to forecast, which we call the “test set.” It can be wrong, but generally, the better the quality of the training data we give the machine, the better the predictions we get.
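To make the training set / test set idea concrete, here is a minimal sketch in Python. It is only an illustration, not how any real system recognizes paintings. It assumes each painting has already been boiled down to two made-up numbers, brightness and roughness, and every value in it is invented. The "formula" here is simply the average painting for each painter (a nearest-centroid classifier), which is one simple technique among many.

```python
# Toy illustration: every number below is made up for this sketch.
# Each painting is described by (brightness, roughness), both on a 0-1 scale.
training_set = [
    ((0.80, 0.20), "Monet"),
    ((0.90, 0.30), "Monet"),
    ((0.75, 0.25), "Monet"),
    ((0.40, 0.80), "Van Gogh"),
    ((0.30, 0.90), "Van Gogh"),
    ((0.35, 0.70), "Van Gogh"),
]
test_set = [
    ((0.85, 0.25), "Monet"),
    ((0.30, 0.85), "Van Gogh"),
]

def train(examples):
    """Build the 'formula': the average feature vector (centroid) per painter."""
    sums, counts = {}, {}
    for features, label in examples:
        total = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            total[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}

def predict(model, features):
    """Pick the painter whose centroid is closest to the new painting."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(model, key=lambda label: squared_distance(model[label]))

# Learn from the training set, then check the formula on the test set.
model = train(training_set)
predictions = [predict(model, features) for features, _ in test_set]
accuracy = sum(p == label for p, (_, label) in zip(predictions, test_set)) / len(test_set)
print(predictions, accuracy)
```

The machine never sees the test-set labels while training; they are only used afterwards to check how often the formula was right.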
Where can we apply it?
<Sales>
You are a salesperson at an insurance company. You’ve just got a list of potential customers. It has information on their income, age, location, and job. If you are a good salesperson, you would have a gut feeling for which customers are willing to sign up for the new insurance plan. However, with machine learning, you don’t need any gut feeling. If you have past transaction records, it tells you which customers are likely to sign up.
<Card company>
Suppose that you are in charge of issuing cards. You don’t want to issue cards to people who are highly likely not to pay their card bill on time. In this case, you can figure out who is likely to default based on age, income, job, and savings. Actually, credit card companies adopted this technique a long time ago. If you have ever received a “your card application has been rejected” message, you probably did not pass this test.
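As a rough sketch of the card example, the Python snippet below looks up the past customers most similar to a new applicant and takes a majority vote (a k-nearest-neighbors classifier). This is an assumption chosen for illustration, not the method card companies actually use. All records are hypothetical, and the job field is left out to keep the example numeric.

```python
from collections import Counter

# Hypothetical past records, invented for this sketch:
# (age, yearly income in $1000s, savings in $1000s) -> what happened
past_records = [
    ((25, 30, 1), "default"),
    ((32, 28, 0), "default"),
    ((41, 35, 2), "default"),
    ((29, 60, 15), "paid"),
    ((45, 85, 40), "paid"),
    ((52, 70, 30), "paid"),
    ((38, 95, 25), "paid"),
]

def make_scaler(rows):
    """Return a function that rescales each feature to the 0-1 range seen in rows."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    spans = [(max(c) - min(c)) or 1 for c in columns]  # avoid dividing by zero
    return lambda row: [(v - lo) / span for v, lo, span in zip(row, lows, spans)]

def predict(records, applicant, k=3):
    """Majority vote among the k past customers most similar to the applicant."""
    scale = make_scaler([features for features, _ in records])
    target = scale(applicant)
    def distance(record):
        return sum((a - b) ** 2 for a, b in zip(scale(record[0]), target))
    nearest = sorted(records, key=distance)[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

young_low_income = predict(past_records, (27, 29, 1))
older_high_income = predict(past_records, (48, 80, 35))
print(young_low_income, older_high_income)
```

Rescaling matters here because income and savings are much larger numbers than age; without it, one feature would dominate the notion of "similar customer".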
Next, I want to lead this conversation into real applications of data mining.