Until recently, about three or four years ago, processing a large amount of textual data or web logs meant mobilizing large servers and implementing substantial SQL programs that took a long time to develop and a long time to produce results. Fortunately, such requests were few, and volumes were generally measured in terabytes at most. Since then, e-commerce and social media have grown enormously, and many companies now see their customer relationships, and therefore their survival, as entirely dependent on the ability of their IT resources to analyze web logs and text data. Moreover, for many of them, volumes now reach hundreds of terabytes or even petabytes.
Most young companies in e-commerce or social media did not have the resources to implement the solutions mentioned above, yet they needed them. Experts therefore sought other approaches and developed new solutions that are more efficient and less expensive, notably those based on distributed file systems (DFS) and MapReduce programs. In this context, Hadoop, the open-source implementation written in Java, has been a great success. As a result, companies that need to process large volumes of textual data or web logs now cost-effectively complement their decision-support information system with a specialized analytical platform.
Some predict the demise of the enterprise data warehouse as we know it today, especially as vendors offer cloud solutions. This will undoubtedly not happen, even in the medium term; instead, companies will manage several specialized systems, internal or external. What is coming to an end, however, is the single, centralized data warehouse that handles all corporate data, a model that very few companies have actually implemented.
The pioneers are clearly showing the way forward: the solution is to work with several specialized systems, a new one for multi-structured data and a traditional one for structured data, in a private or public cloud model. Solutions are now offered in three forms: software only, appliance, or cloud, and the pioneers are opting for hybrid combinations. The choice among these options must be based on each company's specific requirements: regulatory constraints, industry, business function, customer relationships (privacy), available expertise, security, the impact of data location, etc.
One of the major short-term difficulties the pioneers face is the lack of Big Data skills. Exploiting Big Data is the domain of what is called Data Science, a discipline that combines mathematics, programming, and business acumen. To take advantage of Big Data, a company must invest in a team with these skills and have it work closely with the business units and IT. It is indeed possible to discover previously unknown trends, patterns, segments, etc., but these findings change nothing on their own: they must be turned into business opportunities and ultimately into concrete actions in the market. Data Science experts can pave the way, but they cannot go all the way alone.
The pioneers include companies of very different sizes: large groups such as Wal-Mart, Wells Fargo, and Boeing; many web companies such as eBay, Google, Amazon, and Yahoo; and much smaller companies such as Facebook (3,000 people) and LinkedIn (1,700 people). To explore this subject further, see the following link: http://www.asterdata.com/customers/index.php