Celebrate the Big Data Problems – #2
How to identify the number of buckets for a Hive table while executing the HiveQL DDLs?
The dataottam team has come up with a blog-sharing initiative called “Celebrate the Big Data Problems”. In this series of blogs we will share our big data problems using the CPS (Context, Problem, Solutions) framework.
Bucketing is another…Continue
Added by Kumar Chinnakali on January 21, 2016 at 7:41pm — No Comments
Hadoop is the leading open-source software framework developed for scalable, reliable and distributed computing. With the world producing data in the zettabyte range, there is a growing need for cheap, scalable, reliable and fast computing to process and make sense of all of this data. The underlying technology for the Hadoop framework was created by Google as there…Continue
A reference to Hadoop implies a huge amount of data. The intent of the data is, of course, to derive insights that will help businesses stay competitive. "Scoring" the data is a common exercise in use cases such as customer churn, fraud detection, and risk mitigation. It is one of the slowest analytics activities, especially when a very large data set is involved. There are various fast scoring products on the market, but they are very specialized and/or are provided by one vendor, usually…Continue
Added by Michael Walker on September 3, 2013 at 9:31pm — No Comments