It will come raw, naked and dirty. But before you clean, clothe and tame this beast, it would be a good idea to assess whether it is worth domesticating at all.
The Big Data Junk Yard strategy lets you play around with big data in its natural form. The approach gives technology enough runway to ramp up infrastructure and gives the business time to find the right use cases through data discovery. It is an easy, economical and quick way to get on the big data bandwagon (see Big Data: Junk Yard to Gold Mine).
That said, there needs to be some method to the madness. If you want your Big Data Junk Yard to be productive, it takes some discipline to nurture and grow it so that you can start mining valuable gold soon enough. Here are three recommended basic steps:
An Invisible Fence
It is important to draw a line in the sand. In my last blog post, I strongly advocated against spending too much time thinking about use cases up front. However, that does not mean investing no time at all and hoarding every data set that is available.
Aligning strategic business goals with future data needs will help define the boundaries. This is also a great opportunity to engage the business in the big data initiative. Start with a laundry list of data sources that have some perceived value to the business over the next 3-5 years. Keep the list alive and agile, but also use it as a guideline for your hoarding strategy. This will help you manage the mix and keep your junk yard from getting out of control.
Know your Junk
Keeping the yard organized and tagged will save a tremendous amount of time and effort for data scientists looking for gold nuggets in the data junk yard. Keep all the Chevys in one corner and the Hondas in another, and the mechanic scavenging for 1999 Chevy parts will be really happy. Start with a comprehensive tag list to mark your data sets and schemas. There are multiple technology options in the Hadoop ecosystem that can help keep the yard in order.
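To make the "Chevys in one corner" idea concrete, here is a minimal in-memory sketch of a tag catalog, assuming each data set in the yard gets a name, a storage path and a set of tags drawn from an agreed tag list. `DataSet`, `TagCatalog` and the sample paths and tags are hypothetical illustrations, not any Hadoop ecosystem tool's API; a real deployment would use one of the catalog tools mentioned above.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class DataSet:
    """Hypothetical catalog entry for one data set landed in the yard."""
    name: str
    path: str  # storage location; the paths used below are illustrative only
    tags: set[str] = field(default_factory=set)


class TagCatalog:
    """Minimal in-memory tag index: tag -> names of data sets carrying it."""

    def __init__(self, allowed_tags):
        # The "comprehensive tag list" agreed up front; unknown tags are rejected.
        self.allowed_tags = set(allowed_tags)
        self.data_sets = {}
        self.index = defaultdict(set)

    def register(self, data_set):
        unknown = data_set.tags - self.allowed_tags
        if unknown:
            raise ValueError(f"unknown tags for {data_set.name}: {sorted(unknown)}")
        self.data_sets[data_set.name] = data_set
        for tag in data_set.tags:
            self.index[tag].add(data_set.name)

    def find(self, *tags):
        """Return the sorted names of data sets that carry every requested tag."""
        if not tags:
            return sorted(self.data_sets)
        matches = set.intersection(*(self.index.get(t, set()) for t in tags))
        return sorted(matches)


catalog = TagCatalog({"clickstream", "crm", "raw", "pii"})
catalog.register(DataSet("web_logs_jan", "/yard/raw/web_logs/2015-01", {"clickstream", "raw"}))
catalog.register(DataSet("customer_dump", "/yard/raw/crm/customers", {"crm", "raw", "pii"}))
print(catalog.find("raw"))         # ['customer_dump', 'web_logs_jan']
print(catalog.find("raw", "pii"))  # ['customer_dump']
```

Rejecting tags that are not on the agreed list is a deliberate choice in this sketch: it keeps the vocabulary from drifting, which is what makes searching the yard by tag reliable.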
Ingest. Digest. Divest
Do not forget to spring clean. To enjoy the freedom of bringing home new toys, we need to let go of the old stuff we know for sure we will never play with again. Hadoop may be a cheaper alternative to our darling traditional databases, but it is still a finite resource.
It is a good idea to form a joint governance group of business and technology stakeholders. The group can review data discovery findings on a regular basis and help align the big data strategy on what to ingest, what to digest and what to divest. This will be the first step toward refining the junk into gold.
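As an illustration of the digest and divest half of such a review, here is a minimal sketch that could prepare input for the governance group. It assumes each data set has a size, a last-access date and a flag saying whether data discovery found anything of value. `YardItem`, `review`, `reclaimable_gb` and the 365-day idle threshold are all hypothetical choices for the example. The actual rules belong to the governance group, and the sketch only proposes candidates without deleting anything.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class YardItem:
    """Hypothetical usage record for one data set, fed to the governance review."""
    name: str
    size_gb: float
    last_accessed: date
    has_findings: bool  # did data discovery turn up anything of value?


def review(items, today, idle_days=365):
    """Split the yard into 'digest' (worth refining) and 'divest' (removal candidates).

    The idle threshold is an illustrative assumption, not a recommendation.
    Items that are neither stay in the yard until the next review.
    """
    cutoff = today - timedelta(days=idle_days)
    digest, divest = [], []
    for item in items:
        if item.has_findings:
            digest.append(item.name)
        elif item.last_accessed < cutoff:
            divest.append(item.name)
    return digest, divest


def reclaimable_gb(items, divest_names):
    """Storage freed if every divest candidate is actually removed."""
    names = set(divest_names)
    return sum(item.size_gb for item in items if item.name in names)


items = [
    YardItem("web_logs_2013", 800.0, date(2014, 2, 1), False),
    YardItem("customer_dump", 120.0, date(2015, 5, 20), True),
    YardItem("sensor_feed", 300.0, date(2015, 4, 2), False),
]
digest, divest = review(items, today=date(2015, 6, 1))
print(digest)                         # ['customer_dump']
print(divest)                         # ['web_logs_2013']
print(reclaimable_gb(items, divest))  # 800.0
```

Reporting reclaimable storage next to the candidate list gives the business side of the group a concrete number to weigh against the (usually small) chance that an idle data set will turn out to be useful later.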