HACKER Q&A
📣 cetaphil

What DB should I use for a web service which will serve around 5B rows?


What DB should I use for a web service serving 5B rows of data? The data is queried using UUIDs.

The data keeps growing at around 500k rows per week, and the service receives around 800 queries per second.

Management wants to put it all on AWS. I need help. Any recommendations? I'd really appreciate it. Thanks in advance.


  👤 davismwfl Accepted Answer ✓
Kinda hard to answer without a bunch more detail.

In general, you could easily serve this out of MySQL or Postgres (or Aurora on AWS) if you partition and structure the data properly. If the queries are mainly key lookups that pull back a single row, and there isn't a lot of joining going on, performance would be good too. I've had SQL Server and Postgres databases both larger than that and with considerable complexity, and both did a solid job, mainly because we knew what we were doing (and found people who knew more to teach us).
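To make the partitioning point concrete, here is a minimal sketch of how hash partitioning routes a UUID key lookup to exactly one partition. The partition count and function names are illustrative, not from the question; Postgres/MySQL hash partitioning does the equivalent internally.

```python
import uuid

NUM_PARTITIONS = 64  # illustrative; choose based on data size and growth


def partition_for(key: uuid.UUID, num_partitions: int = NUM_PARTITIONS) -> int:
    """Route a UUID to a partition by hashing its 128-bit value.

    The point: a key lookup only ever touches one partition, so per-query
    work stays flat even as the total row count climbs into the billions.
    """
    return int.from_bytes(key.bytes, "big") % num_partitions


row_id = uuid.uuid4()
print(partition_for(row_id))  # always in range(NUM_PARTITIONS)
```

The same UUID always lands on the same partition, which is what makes single-key reads cheap under this layout.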

Assuming it is mainly key/value type data, you could also use something like Dynamo, but keep an eye on pricing: Dynamo is pretty cool, but it can get super expensive fast as your data grows.
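Back-of-envelope capacity math for the numbers in the question, assuming items fit in 4 KB (one DynamoDB read capacity unit covers one strongly consistent read per second of an item up to 4 KB, and an eventually consistent read costs half that). These are rough estimates; check current AWS docs and pricing before committing.

```python
# Rough DynamoDB capacity estimate for ~800 reads/sec and ~500k new rows/week.
# Assumes items <= 4 KB so each strongly consistent read costs 1 RCU.

reads_per_sec = 800
rows_per_week = 500_000

rcu_strong = reads_per_sec * 1.0    # 1 RCU per strongly consistent read/sec
rcu_eventual = reads_per_sec * 0.5  # eventually consistent reads cost half

# Write load is tiny by comparison: ~500k rows spread over a week.
writes_per_sec = rows_per_week / (7 * 24 * 3600)

print(f"RCUs (strongly consistent):   {rcu_strong:.0f}")    # 800
print(f"RCUs (eventually consistent): {rcu_eventual:.0f}")  # 400
print(f"average writes/sec:           {writes_per_sec:.2f}")
```

The takeaway: the read side dominates the bill here, and accepting eventually consistent reads halves it.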

There are still a number of other options too, but a lot depends on what type of queries are happening (access patterns), the data itself, security, etc. For example, S3 can easily be used if the record is stored by a UUID and that is the only way you need to access the data: just store one object per UUID in S3 and you are done. Lookups are nearly instant, the data can grow forever, and it is super reliable.
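The S3-by-UUID idea can be sketched as a simple key-naming convention. The bucket layout, prefix, and file extension below are made-up examples, not anything from the question; the short hex prefix just keeps listings manageable within one logical folder per prefix.

```python
import uuid


def s3_key_for(record_id: uuid.UUID) -> str:
    """Build an S3 object key from a record's UUID.

    One object per record; a two-character hex prefix groups objects
    so no single "folder" listing has billions of entries.
    Prefix and extension are illustrative choices.
    """
    h = record_id.hex
    return f"records/{h[:2]}/{h}.json"


rid = uuid.UUID("12345678-1234-5678-1234-567812345678")
print(s3_key_for(rid))
# records/12/12345678123456781234567812345678.json
```

Reading a record is then a single GET on a deterministic key, with no database in the path at all.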

FWIW: I have done large data in corporate data centers and in AWS and Azure (not GCP yet). I absolutely favor using the cloud for these things, since there are usually more options, it is easier to experiment, and in the end I don't have to fight with IT about getting the right resources.


👤 alttab
Dynamo.