Datameer, an end-to-end big data analytics platform, is built on Apache Hadoop to perform integration, analysis, and visualization of massive volumes of both structured and unstructured data. It integrates rapidly with both new and existing data sources to deliver an easy-to-use, cost-effective, and sophisticated solution for big data analytics.
It simplifies data extraction, data transformation, data loading, and…
Added by Raghavan Madabusi on June 19, 2017 at 6:30pm — No Comments
Dataiku Data Science Studio (DSS), a complete data science software platform, is used to explore, prototype, build, and deliver data products. It significantly reduces the time taken by data scientists, data analysts, and data engineers to perform data loading, data cleaning, data preparation, data integration, and data transformation when building powerful predictive applications.
It is easy and more user-friendly to explore the data and…
Added by Raghavan Madabusi on June 15, 2017 at 2:30pm — No Comments
Bank loans carry numerous risks for both the banks and the borrowers. Analyzing these risks requires an understanding of both the type of risk and its level. Banks need to analyze their customers for loan eligibility so that they can specifically target those customers.
Banks wanted to automate the loan eligibility process (real time) based on customer details such as Gender, Marital Status, Age,…
Added by Raghavan Madabusi on May 10, 2017 at 4:30pm — 1 Comment
A Call Detail Record (CDR) is the information captured by telecom companies during a customer’s call, SMS, and Internet activity. This information provides greater insight into the customer’s needs when used with customer demographics. Most telecom companies use CDR information for fraud detection by clustering user profiles, reducing customer churn by analyzing usage activity, and targeting profitable customers by using RFM…
Added by Raghavan Madabusi on May 10, 2017 at 2:00pm — 1 Comment
In the customer management lifecycle, customer churn refers to a customer's decision to end the business relationship. It is also referred to as loss of clients or customers. Customer loyalty and customer churn always add up to 100%. If a firm has a 60% loyalty rate, then its churn rate is 40%. As per the 80/20 customer profitability rule, 20% of customers generate 80% of revenue. So, it is very important…
Added by Raghavan Madabusi on April 13, 2017 at 6:00pm — 7 Comments
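The loyalty and churn arithmetic in the excerpt above can be sketched in a few lines of Python. This is only a minimal illustration of the "add up to 100%" relationship; the function name `churn_rate` is hypothetical and does not come from the original post.

```python
def churn_rate(loyalty_rate_pct: float) -> float:
    """Return the churn rate (in percent) for a given loyalty rate (in percent).

    Assumes, as the excerpt states, that loyalty and churn sum to 100%.
    """
    if not 0 <= loyalty_rate_pct <= 100:
        raise ValueError("loyalty rate must be between 0 and 100")
    return 100 - loyalty_rate_pct


# The example from the excerpt: a 60% loyalty rate implies a 40% churn rate.
print(churn_rate(60))
```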
This is the second in a two-part series on Text Normalization using Spark. In this blog post, we are going to understand the jargon (jobs, stages, and executors) of Apache Spark with a Text Normalization application, using the Spark history server UI.
Web UI (aka Application UI or webUI or Spark UI) is the web interface of a running Spark application to…
Added by Raghavan Madabusi on April 12, 2017 at 12:30am — No Comments
Data matching is the task of identifying, matching, and merging records that correspond to the same entities from several source systems. The entities under consideration most commonly refer to people, places, publications or citations, consumer products, or businesses. Besides data matching, the names most prominently used are record or data linkage, entity resolution, object identification, or field matching.
A major challenge in data matching is the lack of common entity…
Added by Raghavan Madabusi on February 20, 2017 at 2:30pm — No Comments
Metabase, an open source, easy-to-use database visualization tool, is built and maintained by a dedicated Metabase team and comes with a Crate driver. It is written in Clojure and offers multiple options such as Mac application, Docker image, cloud images, and a jar file, which are specifically designed for particular use cases.
Metabase is mainly used for analyzing your existing data on a daily basis by quickly fetching answers to your common queries without dealing with complex…
Added by Raghavan Madabusi on August 21, 2016 at 12:30am — No Comments
Apache Zeppelin, a web-based notebook, enables interactive data analytics including Data Ingestion, Data Discovery, and Data Visualization all in one place. The Zeppelin interpreter concept allows any language or data-processing backend to be plugged into Zeppelin. Currently, Zeppelin supports many interpreters such as Spark (Scala, Python, R, SparkSQL), Hive, JDBC, and others. Zeppelin can be configured with an existing Spark ecosystem and share SparkContext across Scala, Python, and…
Added by Raghavan Madabusi on May 23, 2016 at 1:30am — 1 Comment
A data-driven organization uses data as critical evidence to help inform and influence strategy. To be data-driven means cultivating a mindset throughout the business to continually use data and analytics to make fact-based business decisions. Becoming a data-driven organization is no longer a choice, but a necessity. Making decisions based on data-driven approaches not only increases the accuracy of results but also provides consistency in how the results are interpreted and fed…
Added by Raghavan Madabusi on February 11, 2016 at 5:00am — No Comments
The advent of NoSQL databases has led many application developers, designers, and architects to apply the most appropriate means of data storage to each specific aspect of their systems. This may involve implementing multiple types of databases and integrating them into a single solution. The result is a polyglot solution.
Designing and implementing a polyglot system is not a straightforward task and there are a number of questions that need to be addressed…
Added by Raghavan Madabusi on February 1, 2016 at 7:41am — No Comments
Apache Drill is a low-latency distributed query engine for large-scale datasets, including structured and semi-structured/nested data. Inspired by Google’s Dremel, Drill is designed to scale to several thousand nodes and query petabytes of data at the interactive speeds that BI/Analytics environments require.
Apache Drill includes a distributed execution environment, purpose-built for large-scale data processing. At the core of Apache Drill is the “Drillbit” service which…
Added by Raghavan Madabusi on March 31, 2015 at 3:10pm — No Comments
This blog post is a follow-up to Embrace Relationships with Neo4J, R & Java.
Neo4j Cypher is a declarative graph query language that allows for expressive and efficient querying and updating of the graph store. Cypher is a relatively simple but still very powerful language. Very complicated database queries can easily be expressed through Cypher. This allows…
Added by Raghavan Madabusi on September 26, 2014 at 2:17am — No Comments
Graphs are everywhere, used by everyone, for everything. Neo4j is one of the most popular graph databases and can be used to make recommendations, get social, find paths, uncover fraud, manage networks, and so on. A graph database can store any kind of data using Nodes (graph data records), Relationships (connect nodes), and Properties (named data values).
A graph database can be used for connected data which is otherwise not possible with either relational or other NoSQL databases…
Added by Raghavan Madabusi on September 19, 2014 at 6:31pm — No Comments
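The Nodes, Relationships, and Properties model described in the excerpt above can be illustrated with a small in-memory sketch in plain Python. This is not Neo4j's API; the class names, the `neighbors` helper, and the sample data are all hypothetical and serve only to show how the three concepts fit together.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Node:
    """A graph data record with a label and named data values (properties)."""
    label: str
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Relationship:
    """A typed, directed connection between two nodes, with its own properties."""
    start: Node
    end: Node
    rel_type: str
    properties: Dict[str, Any] = field(default_factory=dict)


def neighbors(node: Node, relationships: List[Relationship]) -> List[Node]:
    """Return the nodes reachable from `node` via one outgoing relationship."""
    return [r.end for r in relationships if r.start is node]


# Hypothetical sample data: two people connected by a KNOWS relationship.
alice = Node("Person", {"name": "Alice"})
bob = Node("Person", {"name": "Bob"})
knows = Relationship(alice, bob, "KNOWS", {"since": 2014})

for friend in neighbors(alice, [knows]):
    print(friend.properties["name"])
```

A real graph database adds persistence, indexing, and a query language such as Cypher on top of this basic model.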
© 2019 Data Science Central ®