When building machine learning models, data scientists spend most of their time on two main tasks.
Pre-processing and Cleaning
Most of that time goes into collecting, understanding, analysing, and cleaning the data, and then building features. All of these steps are critical to building successful machine learning…
Recently a colleague asked me to help her with a data problem that seemed very straightforward at first glance.
She had purchased a small data set from the Chamber of Commerce (Kamer van Koophandel, KvK) that contained roughly 50k small companies (5–20 FTE), which can be hard to find online.
She noticed that many of those companies share the same address,…
Apache Spark, a data analytics favorite, is becoming a reference standard for Big Data and a “fast and general engine for large-scale data processing”. In our previous post, we detailed how to expand ML tools using a PySpark kernel and leverage the …
Added by Marc Borowczak on June 9, 2016 at 10:30am