HACKER Q&A
📣 zrannie

Do we have a metrics store?


As data scientists, we spend much of our time creating and optimizing features. That's why many tech companies have feature stores: they let people build on previous, crowdsourced experience.

When it comes to metrics design, especially with data coming from different areas like ads and content in a video, measuring "success" is no longer as simple as CTR.

Ask: is there also a metrics store where we can pick the brains of different data scientists/companies? Would love to hear if there's one.


  👤 valyala Accepted Answer ✓
Take a look at ClickHouse [1] and VictoriaMetrics [2]. Both solutions share architectural details and are optimized for high performance and low resource usage. They can handle trillions of rows (i.e. more than 10^12 rows) on a single node and can scale to multiple nodes.

[1] https://clickhouse.tech/

[2] https://github.com/VictoriaMetrics/VictoriaMetrics


👤 gas9S9zw3P9c
There is Prometheus [1]. It has bindings for pretty much all languages and integrates nicely with Grafana [2], which is used to plot Prometheus queries onto nice-looking dashboards. I have been using Prometheus for many projects and couldn't be happier with it. It's incredibly small, fast, reliable, and memory-efficient. On one of my clusters, it has been running for 2 years without any Prometheus downtime.

Most commonly used infra services, e.g. databases like PostgreSQL, Minio, Kafka, Docker, or k8s, ship with Prometheus metrics out of the box, so you don't need to set up anything to monitor these. Just enable the metrics endpoint. It also integrates with long-term storage (like TimescaleDB) for more advanced queries on historical data.
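For anyone wondering what a "metrics endpoint" looks like in practice, here is a rough sketch using only the Python standard library. It serves a couple of counters at /metrics in the Prometheus text exposition format. The counter names, the `render_metrics` helper, and the port are made up for illustration; in a real service you would more likely use an official Prometheus client library rather than hand-rolling this.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process counters; a real service would update these as it works.
COUNTERS = {"app_requests_total": 0, "app_errors_total": 0}


def render_metrics(counters):
    """Render a dict of counters in the Prometheus text exposition format."""
    lines = []
    for name, value in sorted(counters.items()):
        lines.append(f"# TYPE {name} counter")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Only /metrics is served; everything else is a 404.
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(COUNTERS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Block forever, serving /metrics on the given (assumed) port."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling `serve()` starts the endpoint; Prometheus would then be pointed at it with a scrape target for that host and port.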

Alternatively, there is InfluxDB. I haven't used it, so I can't speak to the differences. These seem to be the two big ones, but I'm sure there is a whole range of smaller "startup-ish" competitors/projects as well. Splunk or other log analysis systems are another option I see commonly used. They are not metrics stores, but many companies seem to "abuse" their logs to extract metrics. IMO not a good idea.

[1] https://prometheus.io/

[2] https://grafana.com/