Traditional approaches to enterprise reporting, analysis and Business Intelligence, such as Data Warehousing, upfront modelling and ETL, have given way to new, more agile tools and ideas. Within this landscape, Data Preparation tools have become very popular, and for good reason. Data preparation has traditionally been a manual, error-prone task that consumed the bulk of most data projects’ time: data had to be profiled, standardised and transformed by hand. This has derailed many Data Warehousing and analysis projects, which became bogged down with infrastructure and consistency issues rather than focusing on the true value add: producing good quality analysis.
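To make the manual effort concrete, here is a minimal Python sketch of profiling, standardising and transforming a tiny extract by hand. Everything in it is invented for illustration: the sample rows, the column names and the `COUNTRY_MAP` lookup are assumptions, and it is not how any of the tools listed below work internally.

```python
import csv
import io
from collections import Counter

# Hypothetical raw extract with inconsistent formatting (illustrative data only)
RAW = """customer,country,revenue
Acme Ltd,UK,1200
acme ltd ,United Kingdom,300
Globex,uk,
Initech,DE,950
"""

# Hypothetical lookup used to standardise country values
COUNTRY_MAP = {"uk": "GB", "united kingdom": "GB", "de": "DE"}


def profile(rows):
    """Profile: count missing and distinct values per column."""
    report = {}
    for col in rows[0].keys():
        values = [r[col].strip() for r in rows]
        report[col] = {
            "missing": sum(1 for v in values if v == ""),
            "distinct": len({v for v in values if v != ""}),
        }
    return report


def standardise(row):
    """Standardise: trim whitespace, normalise case, map countries to codes."""
    country = row["country"].strip()
    revenue = row["revenue"].strip()
    return {
        "customer": row["customer"].strip().title(),
        "country": COUNTRY_MAP.get(country.lower(), country),
        "revenue": float(revenue) if revenue else None,
    }


def total_revenue_by_customer(rows):
    """Transform: aggregate revenue per customer, skipping missing values."""
    totals = Counter()
    for r in rows:
        if r["revenue"] is not None:
            totals[r["customer"]] += r["revenue"]
    return dict(totals)


rows = list(csv.DictReader(io.StringIO(RAW)))
report = profile(rows)
clean = [standardise(r) for r in rows]
totals = total_revenue_by_customer(clean)
```

Even on four rows, the profile reports a missing revenue value and four distinct country values where there are really only two. Writing and maintaining rules like these for every source is the kind of work that consumes project time.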

Fortunately, the latest generation of tools, typically powered by NoSQL technologies, takes a lot of this pain away. These tools enable users with reasonable technical skills to rapidly explore, understand and analyse datasets ranging from small data to data that is petabytes in scale. Most tools also feature a range of adaptors, so a variety of structured and semi-structured sources such as spreadsheets, database tables and XML / JSON content can also be explored and analysed.

It’s never been easier to rapidly derive value from disparate data. Here are 10 top tools that have impressed the consultants at Data to Value.

For more blogs, webinars, videos or data management solutions please visit our website www.datatovalue.co.uk

 1. Paxata

Paxata is a self-service adaptive data preparation platform that lets analysts quickly and painlessly collect, explore, combine and transform data. It is highly flexible, requiring no pre-defined models when analysing raw data. It also works with a wide variety of formats and data management systems, so users can easily see relationships across various data-sets.

 2. Alteryx

Alteryx is a tool that enables a user to blend data from different sources in one seamless workflow. Alteryx minimises the need for extensive data preparation, enabling a user to easily access the data they need. It can handle structured and unstructured data in different formats and from various sources. It also makes it easy for users with different expertise to collaborate on a single workflow and solve problems more efficiently.

3. Lavastorm

Lavastorm Analytics Engine lets business users work with large data-sets from virtually any source and in any format on a self-service basis, making quick business decisions easier without rigorous modelling, scripting or planning. Users can quickly build and automate data blending with a wide variety of options, without IT support. It also supports sharing for even greater productivity.

 4. SAP Lumira

SAP Lumira helps to acquire, manipulate and visualise complex and large data-sets across a wide range of sources and formats in the same view. This allows users to produce useful analytics in beautiful visualisations that Business Objects users will be very familiar with. A good choice for those seeking an enterprise-strength tool.

 5. Platfora

Platfora is a visually rich and very advanced end-to-end solution for business analysis, built on the Hadoop infrastructure with features such as in-memory computing. It makes use of many partner tools within the Big Data ecosystem and enables users to explore data quickly and efficiently without custom code. This saves time and ensures that insights reflect the most recent data. Users can interact with various sets of multi-structured data and ask emerging questions in a seamless manner.

 6. Teradata Loom

Teradata Loom provides a data management tool for the Hadoop data lake. Loom enables users to rapidly find, prepare and analyse data within a Hadoop cluster. With Loom you can reuse existing data filters and use a framework called “Active Scan”, which constantly catalogues and profiles data in HDFS and Hive.

 7. DataWatch

DataWatch provides a visual platform for business analytics. It offers an all-in-one tool for data cleansing, transformation and preparation from structured and unstructured datasets. It allows users to discover data in real time and execute dynamic queries according to business needs.

 8. Datameer

Datameer is a big data analytics platform purpose-built for Hadoop. It combines self-service data, analytics and infographics in a useful way that is easy for stakeholders to interpret. It provides an end-to-end single workflow to simplify the big data analytics process.

 9. Tamr

Tamr connects and enriches data, allowing users to leverage it quickly and reducing the effort needed to access it. It uses advanced algorithms, machine learning and human guidance to resolve any uncertainty. It continually builds a data inventory and an expert directory while enhancing data assets for useful insights.

 10. Rapidminer Studio

Rapidminer Studio is a popular open-source predictive analytics platform that grew out of the Data Mining community. The platform provides all of the necessary tools for a mature data mining process. It provides accurate pre-processing, supports multiple interfaces and executes a wide range of operations, from data preparation to model building and validation.

Comment by Amy Flippant on June 5, 2017 at 12:36am

Thanks for the information Zygimantas! Another platform which enables users to derive value from their disparate data sources is the Denodo Platform. Data virtualization connects to all disparate sources, making them available from one single virtual layer in real time. It aids self-service BI and predictive analytics, and greatly reduces the time previously needed for data preparation. Thank you!

Comment by Natalya St. Clair on July 12, 2016 at 12:57pm

 Zygimantas, this is a great article that includes a thorough and thoughtful list. Thank you for putting it together.

One more platform I would like to add for grades 5–14 is a free one called CODAP, though I imagine adults might like getting their feet wet with it, too. I'm happy to chat more about it if you would like.

Comment by Brad Kolarov on January 19, 2016 at 7:02am

These are all great, recently I became part of a public beta for a tool called Stackspace. It does the data prep, but takes care of all the infrastructure provisioning and data ingest, so all I had to do was point and click. I think it is going live later this spring for folks who can't get on the beta.

Comment by Vincent de Stoecklin on December 29, 2015 at 6:17am

Thanks for the interesting read! The rise of Data Science Platforms is clearly a strong trend on the market and there are several I hadn't identified.

Personally I'm a big fan of Dataiku's Data Science Studio, which, from what I read in your description, has a similar value proposition to Datameer, allowing for a single tool to encompass all the steps necessary to build and deploy a data-driven application.

Comment by Devon Guerrero on September 22, 2015 at 9:26am

Why not Trifacta?

Comment by Zygimantas Jacikevicius on September 18, 2015 at 12:27am

Hello Denis, thank you for your comment and suggestion, we might have a think about that one :) Off the top of my head, if you are looking for open source solutions I am sure Apache will have something to suit your needs!

Comment by Denis Rasulev on September 17, 2015 at 8:30pm

Good reading, thank you! Is there a chance to enrich this information with two subsets... (OMG, I speak datascience language.... :))) - commercial tools/platforms and open-source ones? Thanks.

© 2017 Data Science Central
