
Zygimantas Jacikevicius's Blog Posts Tagged 'Hadoop' (3)

The Hadoop Ecosystem: HDFS, YARN, Hive, Pig, HBase and growing...

Hadoop is the leading open-source software framework for scalable, reliable and distributed computing. With the world producing data in the zettabyte range, there is a growing need for cheap, scalable, reliable and fast computing to process and make sense of all this data. The underlying technology for the Hadoop framework was created by Google as there…


Added by Zygimantas Jacikevicius on November 25, 2015 at 1:20am — 4 Comments

Introduction to Apache Spark

New technologies continue to emerge, enabling faster data processing and advanced analytics. The Hadoop platform was a great breakthrough in this space because it solved many of the storage and retrieval challenges posed by very large and varied datasets by dividing the data and processing it across multiple machines. This was faster, more cost-effective, and less prone to failures than…


Added by Zygimantas Jacikevicius on October 14, 2015 at 4:06am — No Comments

10 tools and platforms for data preparation

Traditional approaches to enterprise reporting, analysis and Business Intelligence, such as Data Warehousing, upfront modelling and ETL, have given way to new, more agile tools and ideas. Within this landscape, Data Preparation tools have become very popular, and for good reason. Data preparation has traditionally been a very manual task that consumed the bulk of most data projects' time. Profiling, standardising and transforming data have typically been manual and error…


Added by Zygimantas Jacikevicius on September 16, 2015 at 3:00am — 6 Comments


© 2019 Data Science Central ®
