In a world that churns out and records enormous volumes of data every second, and where dependency on that data is rising steeply, the need for a Big Data architecture follows naturally.
Big Data solutions address specific big data problems and requirements: data capture, curation, storage, transfer, sharing, searching, querying, analysis, visualization and information privacy.
An implementation of big data architecture has to have a well-defined purpose. To derive the greatest value from a Big Data implementation at the enterprise level, you must bear several things in mind when choosing the parts, so that you understand what fits the bill for your solution. One way to do that is to explore the commonly used architectures and patterns, and the various tools and options available in the big data space.
Architects also need to understand that a Big Data application is not just Hadoop. These applications are generally built to analyze data, whether it arrives in real time or already sits in your storage system, and they then run analytical algorithms on it.
Viewed as a structure or flow, a Big Data application typically has multiple data sources, such as your enterprise's operational data sources or file-based data you are interested in.
The data is ingested first, i.e., you extract it from the sources and put it on your storage system. The application then processes the data held in storage. Data can also be processed in real time as it arrives.
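The flow described above (sources, ingestion, storage, then batch or real-time processing) can be sketched in a few lines of Python. This is only an illustrative sketch: the source functions, the record fields and the in-memory list standing in for a storage system are all hypothetical, and a real deployment would use dedicated ingestion and storage tools instead.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List

Record = Dict[str, str]


def operational_source() -> Iterator[Record]:
    """Hypothetical operational data source, e.g. rows from an order system."""
    yield {"source": "orders", "event": "purchase"}
    yield {"source": "orders", "event": "refund"}


def file_source() -> Iterator[Record]:
    """Hypothetical file-based source, e.g. lines parsed from a log file."""
    yield {"source": "logs", "event": "purchase"}


def ingest(sources: Iterable[Iterator[Record]], storage: List[Record]) -> int:
    """Extract records from every source and land them in storage."""
    count = 0
    for source in sources:
        for record in source:
            storage.append(record)
            count += 1
    return count


def batch_process(storage: List[Record]) -> Counter:
    """Batch path: analyze everything already at rest in storage."""
    return Counter(record["event"] for record in storage)


def stream_process(records: Iterable[Record]) -> Iterator[Counter]:
    """Real-time path: update a running count as each record arrives."""
    running: Counter = Counter()
    for record in records:
        running[record["event"]] += 1
        yield running.copy()


# A plain list stands in for the storage system (an assumption for this sketch).
storage: List[Record] = []
ingested = ingest([operational_source(), file_source()], storage)
batch_counts = batch_process(storage)
stream_snapshots = list(stream_process(file_source()))
```

The batch path reads data after it has landed in storage, while the streaming path produces an updated result for each record as it arrives, without waiting for the full data set.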
However, in a real-time scenario, a single server may not be able to scale to the volume of data you have to process. A distributed framework like Hadoop therefore becomes vital.
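To illustrate why distributing the work helps, here is a minimal MapReduce-style word count in Python. It is a sketch under stated assumptions, not Hadoop's API: threads on one machine stand in for the nodes of a cluster, and the function names (`partition`, `map_partition`, `merge_counts`, `distributed_word_count`) are made up for this example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import List


def partition(records: List[str], num_partitions: int) -> List[List[str]]:
    """Split the input into roughly equal chunks, one per worker."""
    return [records[i::num_partitions] for i in range(num_partitions)]


def map_partition(chunk: List[str]) -> Counter:
    """Map step: each worker counts words in its own chunk only."""
    counts: Counter = Counter()
    for line in chunk:
        counts.update(line.split())
    return counts


def merge_counts(left: Counter, right: Counter) -> Counter:
    """Reduce step: combine the partial counts from two workers."""
    return left + right


def distributed_word_count(records: List[str], num_workers: int = 3) -> Counter:
    """Run the map step in parallel, then reduce the partial results."""
    chunks = partition(records, num_workers)
    # Threads stand in for cluster nodes here (an assumption for this sketch).
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(map_partition, chunks))
    return reduce(merge_counts, partials, Counter())


lines = ["big data", "data lake", "big big data"]
totals = distributed_word_count(lines)
```

Because each worker only sees its own partition, adding workers lets the same job handle more data than one server could process alone. The partial results are merged in a final step.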
Abundant experience in conventional solutions architecture is a prerequisite for a Big Data Solutions Architect. Beyond that, significant knowledge of major big data technologies such as Hadoop, MapReduce, Hive, HBase, Impala, Oozie, Mahout, Flume, ZooKeeper, Sqoop, Cassandra and MongoDB can be a big help.
Understanding the architecture is imperative for technical leads, senior developers and QA professionals. It's also quite useful for pre-sales professionals, who go out and sell solutions to customers.
For detailed understanding of the topic, watch this video.
Key takeaways would be:
Originally posted here.