I have tried to put together a 10-step process for big data projects. Please correct it or suggest any additions/changes. Your inputs are highly appreciated.
1. You really need to have a problem (or problems) that you cannot solve directly with your existing metrics and reports.
2. The data can be of any size; however, if it is small, you don't need to build a complex model around it. It helps if the dataset is reasonably large. Never use just a sample of the data.
3. It becomes more interesting if the data is unstructured.
4. Tool evaluation - there are lots of options here; select the best one that you can afford and that suits your problem, based on your CTQs (critical-to-quality requirements).
5. Expertise - many teams are still learning; however, you will need a good mix of programming and SQL experts in addition to data scientists.
6. Infrastructure - with commodity storage you should be able to afford good capacity with the selected tech stack.
7. Ingest, storage and access should be fast; you can adopt different approaches for each.
8. Start with flow charts for the model's algorithms, and go back to the drawing board as and when required.
9. Have a robust tool that can be used to build your analytics model.
10. Have a good visualization tool for presentation.
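To make steps 7-10 concrete, here is a minimal end-to-end sketch in plain Python (standard library only; the column names and data are hypothetical, just to illustrate the flow): ingest a small CSV, fit a simple least-squares model, and print a summary a visualization tool could later render.

```python
import csv
import io
import statistics

# Hypothetical toy data standing in for an ingested file (step 7)
raw = """ad_spend,revenue
10,25
20,41
30,58
40,79
50,102
"""

# Ingest: parse the CSV into numeric columns
rows = list(csv.DictReader(io.StringIO(raw)))
x = [float(r["ad_spend"]) for r in rows]
y = [float(r["revenue"]) for r in rows]

# Model (steps 8-9): ordinary least-squares fit of y = a + b*x
mx, my = statistics.mean(x), statistics.mean(y)
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
a = my - b * mx

# Presentation (step 10): a plain-text summary a viz tool would render
print(f"revenue ~= {a:.2f} + {b:.2f} * ad_spend")
```

In a real project each stage would of course use the tools chosen in steps 4-6 (for example a distributed ingest layer and a proper modeling library), but the shape of the pipeline stays the same.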