One of the first things we do after launching a website nowadays is connect it to Google Analytics. A little further down the road, we’ll connect more “out-of-the-box” analytics tools to calculate funnels, retention, A/B tests, and more.
These tools are great and work fine until a company gets bigger and its analytics requirements get more sophisticated. At that point, it’s time to set up a data infrastructure, which means selecting a data collection tool, an ETL tool, a data warehouse, and a BI tool on top of that.
In the startup world this usually happens when a company has raised Series A and has around 25-50 employees. Google Analytics and other web analytics tools are not enough anymore. Their costs are rising, but requirements-wise they are not delivering what you need. Also, at this point you probably have a lot of data in other places, such as production databases and marketing and sales tools, and you want your reports to consolidate data from all of them.
For the scope of this blog post, we’ll give you an overview of the best data collection tools specifically for events data.
In a couple of hours, you can probably build a simple proof of concept with AJAX requests sending events to your server, which then writes them to your database. But for a production-ready solution at scale, it could easily become a full-time job for several engineers at your company.
We’ve seen companies that run an in-house setup and love it, and others that suffer from the maintenance and hope to move to a data collection tool at some point.
We’d recommend almost never doing it yourself, unless you have very specific requirements and/or a use case where it’s impossible (or highly inefficient) to use the options available on the market.
Pros: It is fun to write code.
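To make the proof of concept concrete, here is a minimal sketch of the server side, assuming Python’s standard library and SQLite as the database. The table layout, the `name`/`user_id`/`properties` payload fields, and the function names are illustrative assumptions, not a recommended design.

```python
import json
import sqlite3
import time
from http.server import BaseHTTPRequestHandler, HTTPServer


def open_store(path=":memory:"):
    """Open (or create) the SQLite database that holds raw events."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        "received_at REAL NOT NULL, "
        "user_id TEXT, "
        "name TEXT NOT NULL, "
        "properties TEXT NOT NULL)"
    )
    return conn


def store_event(conn, payload):
    """Validate one JSON-decoded event and append it to the events table."""
    if not isinstance(payload, dict) or not isinstance(payload.get("name"), str):
        raise ValueError("event must be a JSON object with a string 'name'")
    user_id = payload.get("user_id")
    conn.execute(
        "INSERT INTO events VALUES (?, ?, ?, ?)",
        (
            time.time(),
            None if user_id is None else str(user_id),
            payload["name"],
            json.dumps(payload.get("properties", {})),
        ),
    )
    conn.commit()


class CollectorHandler(BaseHTTPRequestHandler):
    conn = None  # assigned in serve() before the server starts

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            # json.JSONDecodeError is a subclass of ValueError.
            store_event(self.conn, json.loads(self.rfile.read(length)))
            self.send_response(204)
        except ValueError:
            self.send_response(400)
        self.end_headers()


def serve(port=8000):
    """Run a single-threaded collector that writes to events.db."""
    CollectorHandler.conn = open_store("events.db")
    HTTPServer(("", port), CollectorHandler).serve_forever()
```

Calling `serve()` starts the collector, and the browser would POST JSON such as `{"name": "signup", "user_id": "u1"}` to it. Everything that makes this production-ready (batching, retries, authentication, schema evolution, loading into a warehouse) is missing, which is where the full-time engineering effort goes.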
Cons: You probably need to focus on your core business instead.
Segment is quite popular for organizing events that stream to services such as Google Analytics, Mixpanel, etc. We see it become part of the stack early on at a lot of companies. If you already have Segment, it could be one of the most painless ways to “upgrade to SQL.” You just need to enable your warehouse as a destination and you’re good to go.
Pricing depends on monthly active users, which Segment calls Monthly Tracked Users (MTU). If you have a lot of users, which is usually the case for fast-growing B2C startups, Segment could become quite expensive: 100,000 MTU is ~ $1,000 per month.
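For a rough feel of how the bill grows, the figure above works out to about $0.01 per tracked user per month. The sketch below simply extrapolates that rate linearly. Linear pricing is an assumption made for illustration only; actual plans may be tiered or negotiated, so treat the result as an order-of-magnitude estimate.

```python
# Reference point taken from the text: ~$1,000 per month at 100,000 MTU.
REFERENCE_MTU = 100_000
REFERENCE_COST_USD = 1_000


def estimate_monthly_cost(mtu):
    """Extrapolate the monthly bill linearly from the reference point.

    Assumption: cost scales in direct proportion to MTU, which real
    pricing plans do not necessarily do.
    """
    if mtu < 0:
        raise ValueError("mtu must be non-negative")
    return mtu * REFERENCE_COST_USD / REFERENCE_MTU


for users in (25_000, 100_000, 500_000):
    print(f"{users:>7} MTU -> ~${estimate_monthly_cost(users):,.0f} per month")
```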
Segment could be a good option if you already use it to route events to different destinations and you don’t expect a high volume of monthly active users.
Pros: Easy to start if you already use Segment; good ecosystem with a lot of guides and ready to go solutions.
Cons: Vendor lock-in; your bill could go crazy.
Snowplow is an open source web, mobile, and event analytics platform. Since it’s open source, you don’t have vendor lock-in here and should not have to worry about bills getting crazy. However, the initial implementation could be pretty expensive, and you will probably need to hire a consultant if your team doesn’t have enough experience.
There are some options to make it easier, such as hosting your Snowplow collector at third party providers. As you scale, you can always host it yourself at some point.
Besides the initial cost, you should also consider future maintenance, since you’re going to host it yourself. Snowplow itself is battle-tested and production-ready at large scale. It is more a question of having enough expertise to implement it and maintain it later.
Pros: No vendor lock-in, good ecosystem with technology and consultant partners.
Cons: Initial implementation could be quite expensive.
Firebase started as a realtime backend-as-a-service. After it was acquired by Google in 2014, Firebase evolved into a bigger platform providing more features besides realtime backend, such as crash reporting, push notifications, and analytics.
With Firebase Analytics you can collect events data and assign properties to users. But the reason we mentioned it in our data collection tools overview is that it has native BigQuery integration, which makes it very convenient to load your data from Firebase to BigQuery.
It’s a go-to option if you already use other Firebase features, have BigQuery, and are ready for a long-term relationship with Google’s infrastructure.
Pros: Easy to start if you have Firebase already; scales perfectly; affordable pricing.
Cons: Lock-in to Google Cloud products; iOS/Android only.
Heap is a mobile and web analytics platform, similar to Mixpanel or Amplitude. The main difference is that Heap tracks everything automatically, so you don’t need to specify in your app which events you want to send. You create new events in the Heap interface by setting some rules. For example, a click on a specific button on a specific page could be considered a “Purchase event.”
What also makes Heap different, and why it is here in the data collection tools list, is that Heap provides a SQL feature, which is basically a managed Redshift instance. Heap “owns” the warehouse, but you can connect your BI tool by requesting credentials from the Heap team.
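The idea of defining events after the fact can be sketched as follows. This is a conceptual illustration only, not Heap’s implementation or API: the `RawInteraction` and `EventRule` types, their fields, and the selector strings are all hypothetical. Every interaction is captured as-is, and a named event is just a rule applied to the captured stream later.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RawInteraction:
    """One automatically captured interaction (hypothetical shape)."""
    kind: str    # e.g. "click" or "pageview"
    page: str    # URL path where it happened
    target: str  # selector of the element involved


@dataclass(frozen=True)
class EventRule:
    """A named event defined as an exact match on kind, page, and target."""
    name: str
    kind: str
    page: str
    target: str

    def matches(self, interaction):
        return (interaction.kind, interaction.page, interaction.target) == (
            self.kind, self.page, self.target)


def label(interactions, rules):
    """Return (event name, interaction) pairs for every rule match."""
    return [(rule.name, i) for i in interactions for rule in rules
            if rule.matches(i)]


captured = [
    RawInteraction("pageview", "/checkout", "body"),
    RawInteraction("click", "/checkout", "button#buy"),
    RawInteraction("click", "/pricing", "button#buy"),
]
purchase = EventRule("Purchase", "click", "/checkout", "button#buy")
print(label(captured, [purchase]))
```

Because the rule is applied to data that was already collected, adding a new rule labels past interactions as well, with no change to the application code.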
The “codeless” event creation is probably not a huge bonus here, since you want your data team to control your raw data and expose transformed data models to end users. However, not placing snippets of Heap’s code across your application to send events could make life easier in case of migration away from Heap.
Pros: Easy to start for Heap users.
Cons: Vendor lock-in both on data collection and data warehousing.