Tom Lous
  • Male
  • Berkel en Rodenrijs
  • Netherlands

Tom Lous's Friends

  • Stefano Ghiotto


Tom Lous's Page

Latest Activity

Stefano Ghiotto commented on Tom Lous's blog post Record linking with Apache Spark’s MLlib & GraphX
"Absolutely yes Tom! Thank you. Regards"
May 22
Tom Lous commented on Tom Lous's blog post Record linking with Apache Spark’s MLlib & GraphX
"Sure, Stefano. In the meantime I have been improving this approach as well. It would be nice to compare notes later on."
May 22
Stefano Ghiotto commented on Tom Lous's blog post Record linking with Apache Spark’s MLlib & GraphX
"Hi Tom, Sorry I haven't got back to you sooner than this. I would like to go over your approach again and hopefully start working with my data soon so that I might come up with more comments or questions :) Thank you very much for your…"
May 15
Tom Lous commented on Tom Lous's blog post Record linking with Apache Spark’s MLlib & GraphX
"Hi Stefano, Great question. Full disclosure: I'm an engineer with data science aspirations :-) I was aware, but didn't really adjust for/analyse covariates in this model. I did however incorporate an elastic net regularisation that…"
Apr 22
Stefano Ghiotto commented on Tom Lous's blog post Record linking with Apache Spark’s MLlib & GraphX
"Hi Tom, Very interesting job! I don't know anything about Sparks therefore I couldn't interpret all the lines of your scripts, so I was wondering, which covariates did you use in your logistic model? Were they all similarity measures on a…"
Apr 20
Tom Lous liked Raghavan Madabusi's blog post Text Normalization with Spark – Part 2
Apr 17
Stefano Ghiotto liked Tom Lous's blog post Record linking with Apache Spark’s MLlib & GraphX
Apr 10
Robert de Munter liked Tom Lous's blog post Record linking with Apache Spark’s MLlib & GraphX
Apr 6
Tom Lous liked Tom Lous's blog post Record linking with Apache Spark’s MLlib & GraphX
Apr 5
Tom Lous's blog post was featured

Record linking with Apache Spark’s MLlib & GraphX

The challenge: Recently a colleague asked me to help her with a data problem that seemed very straightforward at a glance. She had purchased a small set of data from the chamber of commerce (Kamer van Koophandel: KvK) that contained roughly 50k small companies (5–20 FTE), which can be hard to find online. She noticed that many of those companies share the same address, which makes sense, because a lot of those companies tend to cluster in business complexes. However, she also found that many…
Apr 5
Tom Lous posted a blog post

Apr 5
Tom Lous updated their profile
May 4, 2016

Profile Information

Short Bio
ML is my middle name... well, at least the middle part of my name.
My Web Site Or LinkedIn Profile
http://about.me/tomlous
Field of Expertise
Analytics, Data Integration, Visualization, BI, Other, Big Data, Data Science
Professional Status
Technical
Years of Experience:
17
Your Company:
GraphIQ / Datlinq
Industry:
IT
Your Job Title:
Big Data Software Engineer
How did you find out about DataScienceCentral?
Prismatic
Interests:
Finding a new position, Networking, New venture, Recruiting, Other
What is your Favorite Data Mining or Analytical Website?
http://www.kdnuggets.com/

Tom Lous's Blog

Record linking with Apache Spark’s MLlib & GraphX

Posted on April 4, 2017 at 11:00pm 5 Comments

The challenge

Recently a colleague asked me to help her with a data problem that seemed very straightforward at a glance.

She had purchased a small set of data from the chamber of commerce (Kamer van Koophandel: KvK) that contained roughly 50k small companies (5–20 FTE), which can be hard to find online.

She noticed that many of those companies share the same address,…


© 2017 Data Science Central
