I have tried to put together a 10-step process for big data projects. Please correct it or suggest any additions or changes. Your input is highly appreciated.
1. You really need to have a problem (or problems) whose solution you cannot find directly with your existing metrics and reports.
2. The data can be any size; however, if it is small, you don't need to build a complex model around it. It helps if the volume is large enough, and never use just a sample of the data.
3. It becomes more interesting if the data is unstructured.
4. Tool evaluation - there are a lot of options here; select the best one that you can afford and that suits your problem, based on your CTQs (critical-to-quality requirements).
5. Expertise - many people are still learning, but you will need a good mix of programming and SQL experts in addition to data scientists.
6. Infrastructure - with commodity storage, you should be able to afford good capacity with your selected tech stack.
7. Ingestion, storage, and access should all be fast; you can adopt different approaches here (see the sketch after this list).
8. Start with flowcharts of the algorithms behind the model, and go back to the drawing board as and when required.
9. Have a robust tool that can be used to build your analytics model.
10. Have a good visualization tool for presentation.
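
For step 7, here is a minimal sketch of one such approach: chunked ingestion plus an index for fast access. The CSV file, the SQLite store, and the user_id column below are hypothetical stand-ins for whatever stack you actually select.

```python
# Sketch of step 7: keep ingest, store and access fast by ingesting in
# chunks and indexing the main lookup column. All names are hypothetical.
import sqlite3

import pandas as pd

conn = sqlite3.connect("events.db")  # stand-in for your storage layer

# Ingest in chunks instead of loading the whole file into memory.
for chunk in pd.read_csv("events.csv", chunksize=100_000):
    chunk.to_sql("events", conn, if_exists="append", index=False)

# An index on the main lookup column keeps access fast.
conn.execute("CREATE INDEX IF NOT EXISTS idx_user ON events (user_id)")
conn.commit()
conn.close()
```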

Tags: bigdata

Comment by Vincent Granville on July 23, 2014 at 5:11am

I believe smaller data sets require more complex models, as they lack statistical power. It depends on what you mean by small. Do you have a big data set, but build a statistical model on summary statistics (aggregated tables) only? That's a good idea.
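
A minimal sketch of that aggregate-then-model idea in Python (the file and column names are made up for illustration): reduce the big raw table to per-customer summary statistics, then fit a simple model on the small aggregated table.

```python
# Aggregate big raw data, then model on the summary table.
# transactions.csv, customer_id, order_id and amount are hypothetical.
import numpy as np
import pandas as pd

raw = pd.read_csv("transactions.csv")

# One row per customer: summary statistics instead of raw records.
agg = raw.groupby("customer_id").agg(
    n_orders=("order_id", "count"),
    avg_amount=("amount", "mean"),
    total_amount=("amount", "sum"),
)

# Ordinary least squares of total spend on order count, fitted on the
# small aggregated table rather than the big raw one.
X = np.column_stack([np.ones(len(agg)), agg["n_orders"]])
coef, *_ = np.linalg.lstsq(X, agg["total_amount"], rcond=None)
print("intercept, slope:", coef)
```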

At the other end of the spectrum, very big data sometimes requires special techniques to discriminate between the few genuinely valuable signals and the abundant noise. Read my article on detecting spurious correlations for details.
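
A quick simulation makes the point (this is just an illustration, not the technique from the article): with enough independent noise variables, some pair will look strongly correlated purely by chance.

```python
# Spurious correlations from pure noise: many variables, few observations.
import numpy as np

rng = np.random.default_rng(0)
n_obs, n_vars = 50, 1000
data = rng.standard_normal((n_obs, n_vars))  # every true correlation is 0

corr = np.corrcoef(data, rowvar=False)  # 1000 x 1000 correlation matrix
np.fill_diagonal(corr, 0.0)             # ignore self-correlations
print("max |correlation| among pure noise:", np.abs(corr).max())
# Typically prints around 0.6, even though no real signal exists.
```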
