Which IT infrastructure for Big Data?

A Big Data decision support system requires particular capabilities in terms of volume, variety of data and processing speed.

Today, to improve their knowledge models and forecasts, companies do not hesitate to take hundreds of factors into account or to adopt new means of analysis that can handle large volumes of data. But processing large volumes of data is a challenge for traditional BI infrastructure. Storing large volumes is not in itself a problem, but processing them requires massively parallel architectures, such as those offered by Teradata, or "MapReduce solutions" such as Hadoop or Aster Data. The choice of solution depends on the variety of data types to be handled and on the expected velocity. MapReduce is better suited than a relational database to unstructured data, and where Hadoop is batch-oriented, Aster Data works in real time. Since there is no silver bullet, large companies combine resources so that they can enjoy the benefits of different types of solutions.
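To make the MapReduce idea concrete, here is a minimal single-process sketch in plain Python. It only illustrates the map, shuffle and reduce steps on a word count; it is not Hadoop's or Aster Data's API, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical names chosen for this example. In a real cluster, each phase would run in parallel across many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would do between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # Each document could be mapped on a different node; here we loop locally.
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data big volume", "data velocity"])
print(counts)
```

Because the map and reduce functions work on independent pieces of data, the same logic can be spread over many nodes, which is what makes the model attractive for large volumes.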

As soon as you want to take all kinds of data into account (text, data from various sensors, geolocation data, data from social networks, pictures, etc.), you find that these data do not arrive perfectly ordered and are not immediately ready for analytical use. Even data from the web are not perfect from the start. A common task of Big Data systems is to take unstructured or multi-structured data and process them so that they can be consumed by humans or analytical applications. A classic example in text processing is determining what a word refers to: is Paris the capital of France? The city in Illinois? The celebrity? With Big Data you also have to find the best way to store data, and relational databases are not always the best fit, in particular for XML data or network (graph) data. Even where no data type is incompatible, a drawback of the relational database is the static nature of its schemas; semi-structured NoSQL databases may be preferable, since they provide enough structure to organize data but do not require an exact schema before the data are stored.
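The contrast between a static relational schema and a semi-structured store can be sketched as follows. This is an illustrative assumption, not a description of any particular NoSQL product: the relational side uses Python's built-in `sqlite3` module, and the "document store" is simply a list of JSON strings. The table name `events`, the sample records and the helper `insert_fixed` are all hypothetical.

```python
import json
import sqlite3

# Heterogeneous records, as they might arrive from different sources.
records = [
    {"type": "tweet", "user": "alice", "text": "Visiting Paris"},
    {"type": "sensor", "device_id": 17, "temperature_c": 21.5},
    {"type": "checkin", "user": "bob", "lat": 48.8566, "lon": 2.3522},
]

# Relational side: the schema must be declared before any data is stored.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (type TEXT, user TEXT, text TEXT)")


def insert_fixed(connection, record):
    """Insert a record using its own keys as column names.

    Column names are interpolated directly, which is acceptable only
    because this sketch uses trusted, hard-coded sample data.
    """
    columns = ", ".join(record)
    placeholders = ", ".join("?" for _ in record)
    connection.execute(
        f"INSERT INTO events ({columns}) VALUES ({placeholders})",
        list(record.values()),
    )


rejected = []
for record in records:
    try:
        insert_fixed(conn, record)
    except sqlite3.OperationalError:
        # The record has a field the fixed schema does not know about.
        rejected.append(record["type"])

stored_rows = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]

# Semi-structured side: every record is accepted as a self-describing document.
document_store = [json.dumps(record) for record in records]
documents = [json.loads(doc) for doc in document_store]
with_user = [doc["type"] for doc in documents if "user" in doc]

print(stored_rows, rejected, with_user)
```

Only the record that matches the declared columns reaches the table, while the document list keeps all three and can still be filtered on whatever fields each record happens to carry.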

Speed requirements for data processing have increased in recent years along with volumes. This no longer concerns only a few specialized companies such as financial operators (traders), but most key economic sectors. In the era of the mobile Internet, the pace of business has accelerated: we consume differently, and the forms of competition and the flows of information have changed as well. For example, online retailers are able to track each customer's clicks from the first interaction to the final sale. Those who can use this information quickly, for example by recommending additional purchases, gain a distinct competitive advantage.

The challenge is not only to handle the volume of incoming data, but above all to analyze it quickly and trigger relevant actions. The freshness of the information delivered is paramount. For example, would you cross a street without looking, relying only on a view of the traffic taken five minutes earlier? Fast feedback is a source of competitive advantage, especially for web activities.

Faced with such needs, conventional decision support technologies are unable to keep pace, and only a mix of solutions enables businesses to meet expectations. Enterprises such as eBay and LinkedIn, for example, use a mix of traditional DBMSs and new NoSQL solutions.
