I've done a couple of projects where we kicked off with Postgres using JSONB columns for early iteration, then gradually migrated to normal SQL columns as the product matured and our design decisions crystallized. That gave us basically all the benefits of MongoDB but with a very smooth path toward classical database semantics as we locked down features and scaled.
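A minimal sketch of that migration path, assuming a hypothetical `orders` table and a `customer_email` field whose shape has stabilized (all names here are made up for illustration):

```python
import psycopg2

conn = psycopg2.connect("dbname=app")  # hypothetical connection string
cur = conn.cursor()

# Early iteration: everything lives in one JSONB blob.
cur.execute("""
    CREATE TABLE IF NOT EXISTS orders (
        id   bigserial PRIMARY KEY,
        data jsonb NOT NULL
    )
""")

# Once the shape has crystallized, promote a field to a real, typed column
# and drop it from the blob.
cur.execute("ALTER TABLE orders ADD COLUMN IF NOT EXISTS customer_email text")
cur.execute("UPDATE orders SET customer_email = data->>'customer_email'")
cur.execute("UPDATE orders SET data = data - 'customer_email'")
conn.commit()
```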
If we were going to start from scratch today, we'd probably use Postgres. But, realistically, the primary motivation behind that decision would be that Postgres is available on AWS, which would centralize more of our operations. (DocumentDB is, of course, available. It's not Mongo. I'd be curious to hear from people who actually had Mongo deployments and were able to move to DocumentDB; it's missing so many of MongoDB's APIs that we simply couldn't, our applications would not run.)
Mongo isn't that bad. It has limitations. You work within the limitations... or you don't. But I really don't think a valid option is "mongodb fuckin sucks m8, shit tier db". We're not going to be migrating terabytes of data and tens of thousands of lines of code when the benefit is tenuous for our business domain.
Should you use MongoDB today? I'll say no, but not for the reasons anyone else is giving. MongoDB's cloud-provider agreement has decimated the cloud marketplace for the database. Essentially, if you want to run a version released in the past few years (4.2+), you need to be on their first-party Atlas product. Many other parties, especially the big clouds, are stuck on 3.6 (or have compatibility products like DocumentDB/CosmosDB which target 3.6). Atlas is great. It's fairly priced and has a great UX and operations experience. But I don't feel comfortable with there being political reasons why I couldn't change providers if that changes. If you have business requirements which demand, say, data in a specific region, or government-class infra, or specific compliance frameworks, Atlas may not be able to meet them.
> Jepsen evaluated MongoDB version 4.2.6, and found that even at the strongest levels of read and write concern, it failed to preserve snapshot isolation. Instead, Jepsen observed read skew, cyclic information flow, duplicate writes, and internal consistency violations. Weak defaults meant that transactions could lose writes and allow dirty reads, even downgrading requested safety levels at the database and collection level.
What always irks me is when somebody suggests PostgreSQL's json (or jsonb) types as an alternative to MongoDB. It tells me the person hasn't really invested much time in MongoDB, because there are things PostgreSQL simply cannot do over a json type, especially when it comes to data updates. Or it can do them, but the query is overly complicated and often includes subqueries just to get the indexes of the array elements you want to update. All of that is simple in MongoDB, which is not really a surprise: that's exactly what it was made for. The last time I worked with PostgreSQL's json I sometimes ended up pulling the value out of the column entirely, modifying it in memory, and writing it back to the db, because that was either way easier or the only way to do the operation I wanted (needless to say, there are only exceptional cases where you can do that safely).
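As a rough illustration of the gap (a sketch only; the `events` collection/table, the `participants` array, and the id 42 are all hypothetical), renaming one participant in place:

```python
from pymongo import MongoClient
import psycopg2

# Hypothetical handles; db/collection/table names are made up.
mongo = MongoClient()["app"]
pg = psycopg2.connect("dbname=app")
cur = pg.cursor()

# MongoDB: the positional $ operator targets the matching array element directly.
mongo.events.update_one(
    {"_id": 42, "participants.name": "Old Name"},
    {"$set": {"participants.$.name": "New Name"}},
)

# PostgreSQL jsonb: jsonb_set needs the element's index, so you first have to
# dig it out with a WITH ORDINALITY subquery.
cur.execute("""
    UPDATE events e
    SET    data = jsonb_set(e.data,
                            ARRAY['participants', sub.idx::text, 'name'],
                            '"New Name"'::jsonb)
    FROM (
        SELECT (pos - 1)::int AS idx
        FROM   events,
               jsonb_array_elements(data -> 'participants')
                   WITH ORDINALITY arr(participant, pos)
        WHERE  events.id = 42
          AND  participant ->> 'name' = 'Old Name'
    ) sub
    WHERE  e.id = 42
""")
pg.commit()
```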
Lastly, if you can easily replace MongoDB with PostgreSQL and its json types, or you're missing joins a lot (MongoDB does have a left join via $lookup, but it's rarely needed), chances are you haven't really designed your data in a "document"-oriented way, and there's no reason to use MongoDB in that case.
I didn't like the project attitude of a database being so lax with persistence, so I never used it again.
I do like how easy it is to get a Mongo instance up and running locally. I've found maintenance tasks for Mongo are much easier than for Postgres.
One thing you still need to do is manage indexes for performance; I've had to spend many a day tuning these.
I have come across some rather frustrating issues. For example, a countDocuments call is executed as an aggregate call, but it doesn't do a projection using your filters. Say you want to count how many times the name 'hacker' appears: it will do the search against name, then do the $count, but because it doesn't do a projection it reads the whole document in to do this. Which is not good when the property you're searching against has an index, so it shouldn't have to read the document at all.
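Roughly the shape of what's being described (a sketch; the pipeline shown is only an approximation of what the driver issues under the hood, and the collection/field names are illustrative):

```python
from pymongo import MongoClient

coll = MongoClient()["app"]["users"]   # hypothetical collection
coll.create_index("name")

# The convenient call...
n = coll.count_documents({"name": "hacker"})

# ...is translated into an aggregation along these lines, with no projection,
# which is where the "reads whole documents despite the index" complaint comes in.
pipeline = [
    {"$match": {"name": "hacker"}},
    {"$group": {"_id": 1, "n": {"$sum": 1}}},
]
n2 = next(coll.aggregate(pipeline))["n"]   # assumes at least one match
```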
- The document model is a no-brainer when working with JS on the front-end. I have JSON from the client and dictionaries on the backend (Flask), so it's as easy as dumping into the DB via the pymongo driver, with no object-relational mapping (see the sketch after this list).
- Can scale up/down physical hardware as needed so we only pay for what we use
- Sharding is painfully easy, with one click
- Support has been incredible
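A minimal sketch of that "no ORM" flow, assuming a hypothetical Flask endpoint and an `events` collection (the names are illustrative only):

```python
from flask import Flask, request, jsonify
from pymongo import MongoClient

app = Flask(__name__)
events = MongoClient()["app"]["events"]   # hypothetical db/collection names

@app.route("/events", methods=["POST"])
def create_event():
    doc = request.get_json()          # client JSON arrives as a plain dict...
    result = events.insert_one(doc)   # ...and is stored as-is, no mapping layer
    return jsonify(id=str(result.inserted_id)), 201
```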
I'm a bit surprised that developers and systems engineers get burned to the point that they disconnect from a daily reality of their occupation: software is often shaky in its infancy but almost always improves over time.
Worked for a valley startup back in 2013 that picked Mongo as primary data store because the CEO liked the simplicity and was too busy with the future to learn anything more complicated.
I only implemented a couple of features before I got out of that mess. But from my experience, compared to the dozens of SQL and NoSQL databases I've worked with, it was definitely the worst option. We spent way too much time dealing with Mongo-specific issues and limitations.
I like SQL because it is way more expressive and helps you answer questions you didn't know you would have. I find that incredibly valuable. Pretty tough to do in Mongo. Generally BQ is good for this (in addition to your app DB), but if you use Mongo you're gonna have a hell of a time stuffing that sloppy schema into any column-oriented DB.
I like some of the serverless GCP dbs like Datastore and Firestore over Mongo. They index every field and force you to add composite indices on the first query run by spitting out an error with a link to create them. If you understand their unique but simple API, limits, and quotas, they work predictably and scale nearly limitlessly.
For testing/TDD I use mongo-memory-database. It creates an isolated in-memory instance for each of my test suites, so there is no need for mocking.
The issue with most of this stuff is that a lot of projects just need some persistence and basic querying, and almost any database can do that equally fine. At that point, maintenance, ops workload in general or reliability are the differentiators and most of those go away when you run them as a service with your cloud provider of choice. Which specific one you use is practically selected for you: take all the ones that are compatible with your application or framework and sort by price.
There was a short period of time (1-2 years or so?) where I did my projects with Mongo, but right now I don't see what it can offer over something extremely stable like postgres, which also handles JSON amazingly well.
In our last 3 years of usage, we never faced any issues from the DB. It's improved a lot in the last 2-3 years, and I can say it's rock solid right now.
I am basically finding no reason to switch to any other DB.
If MongoDB came up with a better sharding experience, where every box is equal and you don't need the dance with shards on top of replica sets plus mongos plus config servers plus arbiters, I might consider it again.
That is, data which can be reconstructed from a consistent event store at any point, and is organized into structures for fast querying and loading for display on the client.
- Backend load -> update -> store updates must be atomic and consistent
- Client reads are fine being eventually consistent
- Need to store structured data (e.g. a calendar event with the names of all participants) but also find and update nested data (e.g. a participant's name changes).
- Client queries need indexed filter and sort capability on specific document fields.
Mongo or similar databases seem like they might be ideal for this use case, since they allow storage of nested document data while still being able to index and perform updates on that nested data, but I haven't really seen a deep dive from anyone using it for this purpose.
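For what it's worth, the nuts and bolts of that read-model side are straightforward in pymongo. A sketch, assuming a hypothetical `calendar_events` collection (field names are made up):

```python
from pymongo import ASCENDING, DESCENDING, MongoClient

events = MongoClient()["app"]["calendar_events"]   # hypothetical collection

# Index nested fields so client-side filter/sort stays fast.
events.create_index([("participants.name", ASCENDING), ("start", DESCENDING)])

# Client read model: filter on nested data, sort on an indexed field.
upcoming = events.find({"participants.name": "Ada"}).sort("start", DESCENDING)

# Update nested data in place when a participant's name changes everywhere.
events.update_many(
    {"participants.name": "Ada"},
    {"$set": {"participants.$[p].name": "Ada Lovelace"}},
    array_filters=[{"p.name": "Ada"}],
)
```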
One of the main cons I've experienced with it is that its beginner-friendly nature and docs lead you to a non-optimal data schema for NoSQL. Even the way it does pagination with skip is not the performant way to do it.
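The usual alternative is range (cursor) pagination: sort on an indexed field and continue from the last value you saw, instead of making the server walk and discard the skipped documents. A sketch with hypothetical collection and page sizes:

```python
from pymongo import ASCENDING, MongoClient

posts = MongoClient()["app"]["posts"]   # hypothetical collection; _id is indexed by default

# Skip-based pagination: the server still scans and discards the skipped docs.
page = list(posts.find().sort("_id", ASCENDING).skip(200).limit(100))

# Range-based pagination: resume from the last _id of the previous page
# (assumes the previous page was non-empty).
last_id = page[-1]["_id"]
next_page = list(
    posts.find({"_id": {"$gt": last_id}}).sort("_id", ASCENDING).limit(100)
)
```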
As areas in our business mature and scale we suffer bottlenecks and have pretty big changes to optimize those areas of the data model.
Is MongoDB the new mini-Oracle?
What do I define as "document-DB-appropriate"? Blobbed stuff where you want it all in the same document-sized doses, and always the same dose, to the point where you feel ridiculous putting bits together in the same shape over and over and notice that you CRUD the same dose/shape all day... Objects...
Don't get me wrong, there's times in computer data land when the first 128 bytes are the header and that the data is multiplexed in 32bit chunks with 24 bits zero-padded and this padding tells you "channel 2 is starting now buddy!"
But SQL is certainly a viable option for many things and rather standard and known and supported and stuff...
My team used Mongo, but we have a classic relational use case. We're being bitten by application-code joins, no cascade delete, and sloppy schema changes.
When a document database makes sense (it happens), I go with CouchDB. Its multi-primary architecture is attractive compared to MongoDB with a single primary node. But I'm thinking about using PostgreSQL and jsonb next time.
And the recent change to a restrictive license is worrisome as well. I have been thinking of forking 3.4 and making it back into "true" open source with awesome performance. (If any C++ devs want to help out, reach out to me! username @gmail.com)
Against a specific use case. I was never burned by Mongo, but I wouldn’t choose it if I had to have one backend only. It lends itself nicely as part of an ecosystem imho
We have used SQL databases in the past, and we've had to vertically scale them with incredibly expensive hardware.
Our use case is dynamic dashboards generation where a document contains multiple components for our frontend to render. Having a simple unstructured database really helps us build the dashboard efficiently. Using a relational database would increase the development time of each dashboard tremendously and having relational integrity would make it even worse.
Having a simple document with everything needed is a much nicer experience. Granted, our use case is very limited and it is read only.
I also think the "cool" factor has worn off. It was very chic to use Mongo back in 2010 when everybody else was struggling to scale with SQL. Nowadays, DynamoDB/CosmosDB/Cassandra eats Mongo for lunch.
My favorite DB is RethinkDB. It's a shame that the company behind it fizzled out and was absorbed by Stripe. I still cannot wrap my mind around why it's not more popular. It's similar to MongoDB but much better. It's the perfect database. It adds constraints which improve the quality of your code. Also RethinkDB scales very well and the control panel that comes with it is mind-bendingly powerful, I'm not kidding; you can click a button to shard or replicate a table over multiple hosts! WTF! I can't say the same about Postgres unfortunately. There is nothing truly remarkable about it.
I use Postgres for one of my projects today but purely because of compatibility reasons with an existing system. I don't understand what all the hype is with Postgres.
I'd be curious to hear from folks who have used both: how do they compare?
I'd put SNMP telemetry data in Mongo, but only because that kind of recording can tolerate some loss. Plainly, I just don't trust Mongo with consistency, availability, or partition tolerance.
And because Mongo's backup facilities suck (they require taking the DB into read-only mode, or accepting no point-in-time consistency, egads), the only good way to do a backup is to put the DB on LVM and take an LVM snapshot.
Has anyone looked at or used ArangoDB as a NoSQL solution? It seems pretty decent at first glance with some interesting flexibility.
There are probably good use cases for Mongo, but I haven't found one yet.
Firestore from Firebase is pretty similar and nice to work with, they also have a very generous free tier
Honestly, I get everything I need out of a JSONB field in postgres, so no, not for greenfield projects
Not worth migrating off it for small internal apps though.
There was a time when JSON wasn't that integrated into databases; it is now. That was really MongoDB's killer feature.
PostgreSQL does most of this better now, and more robustly and reliably.
I only used RethinkDB and DynamoDB.
Mongo pushes the idea of keeping related data in a single document. So if you have a hierarchy of data, keep it all in a nested document under a 'parent' concept, say an 'Account'. The problem with this is that there is a document limit of 16MB and key overhead is high. At one point we had to write a sharding strategy whereby data would be sharded across multiple documents due to this limit. This also broke atomic updates, so we had to code around that. We also ran into a problem where, for some update patterns, Mongo would read the entire document, modify it, then flush the entire document back to disk. For large documents, this became extremely slow. At one point this required an emergency migration of a database to TokuMX, which has a fast-update optimization that avoids this read-modify-write pattern in many cases; as I recall it was something like 25x faster in our particular situation. This same issue caused massive operations issues any time Mongo failed over, as the RAM state of the secondary isn't kept up to date with the primary, so updates to large docs would result in massive latency spikes until the secondary could get its RAM state in order. In general we just found that Mongo's recommended pattern of usage didn't scale well at all, which is in contrast to its marketing pitch.
I think at one point we had something around 6TB of data spread across 3 Mongo clusters. After migrating most of that to PG or other stores and reworking various processes that could now use SQL transactions and views, the data size is a small fraction of what it was in Mongo and everything is substantially faster. In one extreme example there was a process that synced data to an external store as the result of certain updates. Because we couldn't use single documents and had no cross-document update guarantees, we had to walk almost the entire dataset for this update to guarantee consistency. It got to the point that this process took over 24 hours, and we would schedule these updates to run over a weekend as a result. With the data moved to PG, that same process is now implemented as a materialized view that takes ~20 seconds to build, and we sync every 15 minutes just to be sure. Granted, this improvement isn't just a database change but rather an architectural change; however, Mongo's lack of multi-document transactions and its document size limit are what drove the architectural design in the first place.
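The materialized-view replacement is essentially Postgres doing the bookkeeping for you. A minimal sketch, with entirely hypothetical table and column names:

```python
import psycopg2

conn = psycopg2.connect("dbname=app")   # hypothetical connection string
cur = conn.cursor()

# Build the derived dataset once...
cur.execute("""
    CREATE MATERIALIZED VIEW IF NOT EXISTS sync_snapshot AS
    SELECT account_id,
           sum(amount)     AS total,
           max(updated_at) AS last_update
    FROM   orders
    GROUP  BY account_id
""")
# A unique index is required for CONCURRENTLY refreshes.
cur.execute(
    "CREATE UNIQUE INDEX IF NOT EXISTS sync_snapshot_pk ON sync_snapshot (account_id)"
)
conn.commit()

# ...then, on a 15-minute schedule (cron, etc.), rebuild it without blocking readers.
cur.execute("REFRESH MATERIALIZED VIEW CONCURRENTLY sync_snapshot")
conn.commit()
```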
Then there are the bugs, of which there were many; the worst was a situation where updates claimed to succeed but actually just dropped the data on the floor. I found a matching issue in Mongo's bug tracker that had been open for years at that point. Ultimately, I just can't trust a datastore that has open data-loss bugs for years, regardless of its current state.
Their "we will try some different form of Open Source (which really isn't open source at all, but we still want you to think so, because we know open source is popular) also AWS did something evil by using our software accoding to the license that we used for our software" thing really didn't inspire any trust.