Datameer, an end-to-end big data analytics platform, is built on Apache Hadoop to perform integration, analysis, and visualization of massive volumes of both structured and unstructured data. It can be rapidly integrated with both new and existing data sources to deliver an easy-to-use, cost-effective, and sophisticated solution for big data analytics.
It simplifies data extraction, data transformation, data loading, and…Continue
Added by Raghavan Madabusi on June 19, 2017 at 6:30pm — No Comments
Dataiku Data Science Studio (DSS), a complete data science software platform, is used to explore, prototype, build, and deliver data products. It significantly reduces the time taken by data scientists, data analysts, and data engineers to perform data loading, data cleaning, data preparation, data integration, and data transformation when building powerful predictive applications.
It is easier and more user-friendly to explore the data and…Continue
Added by Raghavan Madabusi on June 15, 2017 at 2:30pm — No Comments
Nowadays, bank loans carry numerous risks for both the banks and the borrowers. Risk analysis for bank loans requires an understanding of the risk and its level. Banks need to analyze their customers for loan eligibility so that they can specifically target those customers.
Banks wanted to automate the loan eligibility process (real time) based on customer details such as Gender, Marital Status, Age,…Continue
A Call Detail Record (CDR) is the information captured by telecom companies during the call, SMS, and Internet activity of a customer. This information provides greater insight into the customer’s needs when used with customer demographics. Most telecom companies use CDR information for fraud detection by clustering user profiles, reducing customer churn by usage activity, and targeting profitable customers by using RFM…Continue
In the customer management lifecycle, customer churn refers to a customer's decision to end the business relationship. It is also referred to as loss of clients or customers. Customer loyalty and customer churn always add up to 100%. If a firm has a 60% loyalty rate, then its churn rate is 40%. As per the 80/20 customer profitability rule, 20% of customers generate 80% of revenue. So, it is very important…Continue
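The loyalty/churn arithmetic in the teaser above can be sketched in a few lines of Python. The function name `churn_rate` is hypothetical and chosen only for illustration; it just encodes the stated rule that loyalty and churn add up to 100%.

```python
def churn_rate(loyalty_rate_pct):
    """Return the churn rate in percent, given the loyalty rate in percent.

    Assumes, as the text states, that loyalty and churn sum to 100%.
    """
    if not 0 <= loyalty_rate_pct <= 100:
        raise ValueError("loyalty rate must be between 0 and 100")
    return 100 - loyalty_rate_pct


# The example from the text: a 60% loyalty rate implies a 40% churn rate.
print(churn_rate(60))
```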
This is the second post in a two-part series on Text Normalization using Spark. In this blog post, we are going to understand the jargon of Apache Spark (jobs, stages, and executors) through a Text Normalization application, using the Spark history server UI.
Web UI (aka Application UI or webUI or Spark UI) is the web interface of a running Spark application to…Continue
Added by Raghavan Madabusi on April 12, 2017 at 12:30am — No Comments
Data matching is the task of identifying, matching, and merging records that correspond to the same entities from several source systems. The entities under consideration most commonly refer to people, places, publications or citations, consumer products, or businesses. Besides data matching, the names most prominently used are record or data linkage, entity resolution, object identification, or field matching.
A major challenge in data matching is the lack of common entity…Continue
Added by Raghavan Madabusi on February 20, 2017 at 2:30pm — No Comments
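As a rough illustration of the data matching idea described in this teaser, the sketch below pairs records from two source systems by fuzzy name similarity. It is a minimal example, not the method used in the full post: the record layout (`id`, `name`), the function names, and the 0.85 threshold are all assumptions, and the similarity measure is the standard library's `difflib.SequenceMatcher`.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Similarity in [0, 1] between two strings, ignoring case and outer spaces."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def match_records(left, right, threshold=0.85):
    """Return (left_id, right_id, score) for every pair whose names look alike.

    Compares every left record with every right record, which is fine for
    small inputs but quadratic in general.
    """
    matches = []
    for l_rec in left:
        for r_rec in right:
            score = similarity(l_rec["name"], r_rec["name"])
            if score >= threshold:
                matches.append((l_rec["id"], r_rec["id"], score))
    return matches


# Hypothetical records from two source systems.
system_a = [{"id": "A1", "name": "Jon Smith"}, {"id": "A2", "name": "Acme Corp"}]
system_b = [{"id": "B1", "name": "John Smith"}, {"id": "B2", "name": "Globex Inc"}]

for left_id, right_id, score in match_records(system_a, system_b):
    print(left_id, right_id, round(score, 2))
```

Real data matching systems usually add blocking to avoid comparing every pair, and combine several fields rather than relying on a single name comparison.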
Metabase, an open source, easy-to-use database visualization tool, is built and maintained by a dedicated Metabase team and comes with a Crate driver. It is written in Clojure and offers multiple options such as Mac application, Docker image, cloud images, and a jar file, which are specifically designed for particular use cases.
Metabase is mainly used for analyzing your existing data on a daily basis by quickly fetching answers to your common queries without dealing with complex…
Added by Raghavan Madabusi on August 21, 2016 at 12:30am — No Comments
Apache Zeppelin, a web-based notebook, enables interactive data analytics including Data Ingestion, Data Discovery, and Data Visualization all in one place. The Zeppelin interpreter concept allows any language or data-processing backend to be plugged into Zeppelin. Currently, Zeppelin supports many interpreters such as Spark (Scala, Python, R, SparkSQL), Hive, JDBC, and others. Zeppelin can be configured with an existing Spark ecosystem and share SparkContext across Scala, Python, and…Continue
A data-driven organization uses data as critical evidence to help inform and influence strategy. To be data-driven means cultivating a mindset throughout the business to continually use data and analytics to make fact-based business decisions. Becoming a data-driven organization is no longer a choice, but a necessity. Making decisions based on data-driven approaches not only increases the accuracy of results but also provides consistency in how the results are interpreted and fed…Continue
Added by Raghavan Madabusi on February 11, 2016 at 5:00am — No Comments
The advent of NoSQL databases has led many application developers, designers, and architects to apply the most appropriate means of data storage to each specific aspect of their systems, and this may involve implementing multiple types of databases and integrating them into a single solution. The result is a polyglot solution.
Designing and implementing a polyglot system is not a straightforward task and there are a number of questions that need to be addressed…Continue
Added by Raghavan Madabusi on February 1, 2016 at 7:41am — No Comments
Apache Drill is a low-latency distributed query engine for large-scale datasets, including structured and semi-structured/nested data. Inspired by Google’s Dremel, Drill is designed to scale to several thousands of nodes and query petabytes of data at interactive speeds that BI/Analytics environments require.
Apache Drill includes a distributed execution environment, purpose-built for large-scale data processing. At the core of Apache Drill is the “Drillbit” service which…Continue
Added by Raghavan Madabusi on March 31, 2015 at 3:10pm — No Comments
This blog post is a follow-up to Embrace Relationships with Neo4J, R & Java.
Neo4j Cypher is a declarative graph query language that allows for expressive and efficient querying and updating of the graph store. Cypher is a relatively simple but still very powerful language. Very complicated database queries can easily be expressed through Cypher. This allows…Continue
Added by Raghavan Madabusi on September 26, 2014 at 2:17am — No Comments
Graphs are everywhere, used by everyone, for everything. Neo4j is one of the most popular graph databases and can be used to make recommendations, get social, find paths, uncover fraud, manage networks, and so on. A graph database can store any kind of data using Nodes (graph data records), Relationships (connect nodes), and Properties (named data values).
A graph database can be used for connected data which is otherwise not possible with either relational or other NoSQL databases…Continue
Added by Raghavan Madabusi on September 19, 2014 at 6:31pm — No Comments
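The three building blocks named in this teaser (Nodes, Relationships, and Properties) can be pictured with a small in-memory sketch. This is plain Python for illustration only and is not the Neo4j API; the class names, the `neighbors` helper, and the sample data are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A graph data record with a label and named data values (properties)."""
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    """A typed, directed connection between two nodes, with its own properties."""
    start: Node
    type: str
    end: Node
    properties: dict = field(default_factory=dict)


def neighbors(node, relationships):
    """Return the nodes reachable from `node` by following one outgoing relationship."""
    return [rel.end for rel in relationships if rel.start is node]


# Hypothetical sample data: two people connected by a KNOWS relationship.
alice = Node("Person", {"name": "Alice"})
bob = Node("Person", {"name": "Bob"})
knows = Relationship(alice, "KNOWS", bob, {"since": 2014})

for person in neighbors(alice, [knows]):
    print(person.properties["name"])
```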