
Luba Belokon's Blog (16)

The True Way to Define Business Metrics for Startups

Data has become a cargo cult. Collect more data, calculate more metrics, hire more analysts, let them figure out what this is all for – and you’re considered to be data driven. I've had it up to here while consulting startups over the past three years and helping them to define business…

Continue

Added by Luba Belokon on June 8, 2018 at 1:00am — No Comments

Simple Tips for PostgreSQL Query Optimization

A single query optimization tip can boost your database performance by 100x. At one point, we advised one of our customers that had a 10TB database to use a date-based multi-column index. As a result, their date range query sped up…

Continue

Added by Luba Belokon on May 17, 2018 at 2:30am — No Comments

Open Source ETL: Apache NiFi vs Streamsets

After reviewing 8 great ETL tools for fast-growing startups, we got a request to tell you more about open source solutions. There are many open source ETL…

Continue

Added by Luba Belokon on April 26, 2018 at 2:30am — No Comments

Choosing Between Modern Data Warehouses



When our customers ask us what the best data warehouse is for their growing company, we consider the answer based on their specific needs. Usually, they need nearly real-time data for a low price without the need to maintain data warehouse…

Continue

Added by Luba Belokon on April 19, 2018 at 5:30am — No Comments

ETL vs ELT: Considering the Advancement of Data Warehouses

ETL stands for Extract, Transform, Load. It has been the traditional way to manage analytics pipelines for decades. With the advent of modern cloud-based data warehouses, such as BigQuery or Redshift, the traditional concept of ETL is shifting towards ELT, where you run transformations right in the data warehouse. Let’s see why it’s happening, what it means to have ETL vs ELT, and what we can expect in the future.

ETL is hard and outdated

ETL arose to solve a problem of…

Continue

Added by Luba Belokon on April 6, 2018 at 7:30am — No Comments

Data Collection Tools for Events Analytics

One of the first things we do after launching a website nowadays is connect to Google Analytics. A little bit down the road we’ll connect more “out-of-box” analytics tools to calculate funnels, retention, A/B tests, and more.

These tools are great and work fine until a company gets bigger and analytics requirements get more sophisticated. It’s time to set up a data infrastructure, which means selecting a data collection tool, ETL tool, data warehouse, and BI tool on top of…

Continue

Added by Luba Belokon on March 30, 2018 at 3:30am — No Comments

Event Analytics: How to Define User Sessions with SQL

Many “out-of-the-box” analytics solutions come with automatically defined user sessions. They’re a good starting point, but as your company grows, you’ll want to have your own session definitions based on your event data. Analyzing user sessions with SQL gives you flexibility and full control over how metrics are defined for your unique business.

What is a session and why should I care?



The session is usually defined as a group…

Continue

Added by Luba Belokon on March 27, 2018 at 1:00am — No Comments

Event Analytics: How to Define User Sessions with SQL

We recently built event analytics for our team and wanted to share the experience with you in this post.

Many “out-of-the-box” analytics solutions come with automatically defined user sessions. They’re a good starting point, but as your company grows, you’ll want to have your own session definitions based on your event data. Analyzing user…

Continue

Added by Luba Belokon on February 8, 2018 at 7:30am — No Comments

Using SQL to Estimate Customer Lifetime Value (LTV) without Machine Learning

The Statsbot team estimated LTV 592 times for different clients and business models. 

Customer lifetime value, or LTV, is the amount of money that a customer will spend with your business in their “lifetime,” or at least, in the portion of it that they spend in a relationship with you. It’s an important indicator of how much you can spend on acquiring new customers. For example, your customer acquisition cost (CAC) is $150, and LTV is…

Continue

Added by Luba Belokon on February 1, 2018 at 8:30am — No Comments

Machine Learning Algorithms: Which One to Choose for Your Problem

When I was starting out in data science, I often faced the problem of choosing the most appropriate algorithm for my specific problem. If you’re like me, when you open an article about machine learning algorithms, you see dozens of detailed descriptions. The paradox is that they don’t make the choice any easier.

In this article, I will try to explain basic concepts and give some intuition of using different…

Continue

Added by Luba Belokon on October 26, 2017 at 6:00am — No Comments

Improving Real-Time Object Detection with YOLO

In recent years, the field of object detection has seen tremendous progress, aided by the advent of deep learning. Object detection is the task of identifying objects in an image and drawing bounding boxes around them, i.e. localizing them. It’s a very important problem in computer vision due to its numerous applications, from self-driving cars to security and tracking.

Prior approaches of object detection…

Continue

Added by Luba Belokon on October 19, 2017 at 8:30am — No Comments

Bayesian Nonparametric Models

Bayesian Nonparametrics is a class of models with a potentially infinite number of parameters. High flexibility and expressive power of this approach enables better data modelling compared to parametric methods.

Bayesian Nonparametrics is used in problems where a dimension of interest grows with data, for example, in problems where the number of features is not fixed but allowed to vary as we observe more…

Continue

Added by Luba Belokon on October 12, 2017 at 3:00pm — No Comments

DevOps Pipeline for a Machine Learning Project

Machine learning is getting more and more popular in applications and software products, from accounting to hot dog recognition apps. When you add machine learning techniques to exciting projects, you need to be ready for a number of difficulties. The Statsbot team asked Boris Tvaroska to tell us how to prepare a DevOps pipeline for an ML…

Continue

Added by Luba Belokon on October 4, 2017 at 7:30am — No Comments

Generative Adversarial Networks (GANs): Engine and Applications


Generative adversarial networks (GANs) are a class of neural networks that are used in unsupervised machine learning. They help to solve such tasks as image generation from descriptions, getting high resolution images from low resolution ones, predicting which drug…

Continue

Added by Luba Belokon on August 17, 2017 at 6:30am — No Comments

Machine Learning Translation and the Google Translate Algorithm

Years ago, translating text from an unknown language was very time-consuming. Using simple vocabularies with word-for-word translation was hard for two reasons: 1) the reader had to know the grammar rules and 2) the reader had to keep all language versions in mind while translating the whole sentence.

Now, we…

Continue

Added by Luba Belokon on August 1, 2017 at 5:00am — No Comments

Recommendation System Algorithms

Today, many companies use big data to make super relevant recommendations and grow revenue. Among a variety of recommendation algorithms, data scientists need to choose the best one according to a business’s limitations and requirements.

To simplify this task, my team has prepared an overview of the main existing recommendation system…

Continue

Added by Luba Belokon on July 28, 2017 at 4:00am — No Comments

© 2019 Data Science Central ®
