HACKER Q&A
📣 markus_zhang

Learning material on storing game telemetry in a columnar store?


Hi experts,

I'm looking for learning material on the following topics:

- Event-based social game telemetry ETL at high frequency (let's say 50K JSON strings per second as a start) and high volume

(Think some events like game_load, level_beat, purchase_made)

- Must be loaded into a columnar database (think Vertica)

- Data modelling of the telemetry for business/data analysis

I can skip the first two points as I don't have control over them, though it would be interesting to know the technology considerations. I think "Designing Data-Intensive Applications" would be a good read for those two.

I'm mostly interested in the last point -- how should I model the telemetry data so that it can be easily used for business analysis? For games we typically look for answers to questions like:

- How does an A/B test fare? (engagement/revenue)

- How difficult is each game level?

- Are we giving out too many coins? (Game economy)

My main concern is: with the introduction of columnar stores, I see more and more often that we model the data as very wide tables. I understand this is done to speed up queries (fewer joins), but data modelling is also about the business side, so how should we approach the problem?
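To make the concern concrete, here is a minimal Python sketch of the two shapes I mean. Everything in it is made up for illustration: the field names (user_id, ab_group, amount_usd, ...), the sample events, and the helper functions are my own assumptions, not from any real pipeline or from Vertica. It answers the same A/B revenue question once with a narrow fact table plus a user dimension (star-schema style) and once with a single wide table.

```python
import json
from collections import defaultdict

# Hypothetical raw telemetry; field names are illustrative assumptions.
RAW_EVENTS = [
    '{"event": "game_load", "user_id": 1, "level": 1}',
    '{"event": "purchase_made", "user_id": 1, "level": 3, "amount_usd": 4.99}',
    '{"event": "level_beat", "user_id": 2, "level": 2}',
    '{"event": "purchase_made", "user_id": 2, "level": 2, "amount_usd": 0.99}',
    '{"event": "purchase_made", "user_id": 3, "level": 5, "amount_usd": 9.99}',
]

# Hypothetical dimension table: one row per user.
DIM_USER = {
    1: {"ab_group": "A", "country": "CA"},
    2: {"ab_group": "B", "country": "US"},
    3: {"ab_group": "A", "country": "DE"},
}


def build_fact_rows(raw_events):
    """Star-schema style: narrow fact rows that reference users by key."""
    rows = []
    for line in raw_events:
        e = json.loads(line)
        rows.append({
            "event": e["event"],
            "user_id": e["user_id"],
            "level": e["level"],
            "amount_usd": e.get("amount_usd", 0.0),  # non-purchase events carry no amount
        })
    return rows


def build_wide_rows(fact_rows, dim_user):
    """Wide-table style: copy every user attribute onto every event row."""
    return [{**row, **dim_user[row["user_id"]]} for row in fact_rows]


def revenue_by_group_star(fact_rows, dim_user):
    """Revenue per A/B group, looking up the group through the dimension."""
    totals = defaultdict(float)
    for row in fact_rows:
        if row["event"] == "purchase_made":
            group = dim_user[row["user_id"]]["ab_group"]  # this lookup is the "join"
            totals[group] += row["amount_usd"]
    return dict(totals)


def revenue_by_group_wide(wide_rows):
    """Revenue per A/B group, reading the group straight off the row."""
    totals = defaultdict(float)
    for row in wide_rows:
        if row["event"] == "purchase_made":
            totals[row["ab_group"]] += row["amount_usd"]
    return dict(totals)


fact_rows = build_fact_rows(RAW_EVENTS)
wide_rows = build_wide_rows(fact_rows, DIM_USER)
star_result = revenue_by_group_star(fact_rows, DIM_USER)
wide_result = revenue_by_group_wide(wide_rows)
```

Both give the same numbers. The wide version has no lookup at query time, but every user attribute is repeated on every event row, and the meaning of "user" is no longer kept in one place. That trade-off is exactly what I'm unsure how to reason about.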

Sorry for posing such a vague problem, but that's the best I can do...


  👤 markus_zhang Accepted Answer ✓
The reason for this question is that I see a lot of books talking about the ETL part (e.g. Designing Data-Intensive Applications, and tons of others), and also a lot of books talking about the business analysis part (data science/analysis/mining), but except for Inmon/Kimball I see very few books talking about data modelling FOR COLUMNAR DB, FOR GAME TELEMETRY DATA, etc. And I'm really not sure whether star schema/dimensional modelling still makes sense, as everyone I talked to replied that in the age of big data and columnar databases, we need very wide tables where we can find every fact and every dimension we need.