I'm looking for learning material regarding the following information:
- Event-based social game telemetry ETL with high frequency (say 50K JSON strings per second as a start) and high volume
(think events like game_load, level_beat, purchase_made)
- Must be loaded into a columnar database (think Vertica)
- Data modelling of the telemetry events for business/data analysis
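For concreteness, here is roughly what I mean by one of these events — a sketch with made-up field names, not our real schema:

```python
import json

# A hypothetical raw telemetry event (field names are illustrative
# assumptions, not a real schema):
raw_event = {
    "event_name": "purchase_made",
    "event_time": "2024-01-15T12:34:56Z",
    "player_id": "p-12345",
    "session_id": "s-67890",
    "ab_test_group": "B",
    # event-specific payload; varies per event type
    "properties": {
        "item_id": "coin_pack_small",
        "price_usd": 0.99,
        "coins_granted": 500,
    },
}

# At 50K events/sec these arrive as JSON strings on the wire:
wire_payload = json.dumps(raw_event)
```

Each event type carries its own `properties` payload, which is part of why the modelling question gets tricky downstream.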
I can skip the first two points since I don't have control over them, but the technology considerations could still be interesting to know. I think "Designing Data-Intensive Applications" would be a good read for those two.
I'm mostly interested in the last point -- how should I model the telemetry data so that it can be easily used for business analysis? For games we typically look for answers to questions like:
- How does an A/B test fare (engagement/revenue)?
- How difficult is each game level?
- Are we giving out too many coins? (Game economy)
My main concern is this: with the introduction of columnar stores, I see more and more often that we model the data into very wide tables. I understand this is done to speed up queries (fewer joins), but data modelling is also about the business side, so how do we approach the problem?
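To make the wide-table pattern concrete, here is a sketch (with hypothetical field and column names) of how a nested event gets flattened into the kind of one-row-per-event wide table a columnar store ingests:

```python
def flatten_event(event: dict, prefix: str = "", sep: str = "_") -> dict:
    """Flatten a nested JSON event into a single wide row: one column
    per leaf field, e.g. properties.level -> properties_level."""
    row = {}
    for key, value in event.items():
        col = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested payloads, prefixing column names
            row.update(flatten_event(value, col, sep))
        else:
            row[col] = value
    return row

# Illustrative event, same made-up shape as above
event = {
    "event_name": "level_beat",
    "player_id": "p-12345",
    "properties": {"level": 7, "attempts": 3, "duration_sec": 42.5},
}
wide_row = flatten_event(event)
# Columns: event_name, player_id, properties_attempts,
#          properties_duration_sec, properties_level
```

The upside is that analysts query one table with no joins; the downside is that the business meaning (player, level, experiment) is smeared across column names instead of living in conformed dimensions, which is exactly the tension I'm asking about.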
Sorry for throwing out such a vague problem, but that's the best I can do...