MongoDB One-to-Many Relationship tutorial with Mongoose examples
Model One-to-Many Relationships in MongoDB
Assume that you want to design a Tutorial Blog data model. Here are some relationships that you can think about:
A Tutorial has some Images (15 or fewer)
A Tutorial has many Comments
A Category has a lot of Tutorials
We call them One-to-Many relationships.
Based on how many items sit on the "many" side, we can distinguish between three types of One-to-Many relationships:
One-to-Few
One-to-Many
One-to-aLot
Depending on the type of relationship, the data access patterns, and the data cohesion, we decide how to implement the data model; in other words, whether we should denormalize or normalize the data.
In the next sections, I will show you how to represent related data in a referenced (normalized) form or in an embedded (denormalized) form.
Reference Data Models (Normalization)
In the MongoDB referenced form, we keep all the documents ‘separated’, which is exactly what ‘normalized’ means.
For example, we have documents for Tutorials and Comments. Because they are completely separate documents, the Tutorial needs a way to know which Comments it contains. That’s where the IDs come in. We’re going to use the Comments’ IDs to make references in the Tutorial document.
// Tutorial
{
  _id: "5db579f5faf1f8434098f7f5",
  title: "Tutorial #1",
  author: "bezkoder",
  comments: [ "5db57a03faf1f8434098f7f8", "5db57a04faf1f8434098f7f9" ]
}
// Comments
{
  _id: "5db57a03faf1f8434098f7f8",
  username: "jack",
  text: "This is a great tutorial.",
  createdAt: 2019-10-27T11:05:39.898Z
}
{
  _id: "5db57a04faf1f8434098f7f9",
  username: "mary",
  text: "Thank you, it helps me a lot.",
  createdAt: 2019-10-27T11:05:40.710Z
}
You can see that in the Tutorial document, we have an array where we stored the IDs of all the Comments so that when we request Tutorial data, we can easily identify its Comments.
This type of referencing is called Child Referencing: the parent references its children.
Let’s think about the array with all the IDs. The problem is that this array can become very large if there are lots of children. An array that grows without bound is an anti-pattern in MongoDB: every document is limited to 16 MB, and a huge array makes the parent document costly to load and update, so we should avoid it.
That’s why we have Parent Referencing: in each child document we keep a reference to the parent document.
For example, a Category could have a lot of Tutorials. We don’t want to keep a tutorials array with 200-500 items in the Category document, so we normalize the data with Parent Referencing.
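To make Parent Referencing concrete, here is a minimal sketch in plain JavaScript that uses in-memory arrays as stand-ins for the two collections. The `category` field name, the sample IDs, and the `findTutorialsByCategory` helper are assumptions made for illustration only; with a real database the lookup would be a query on the child collection.

```javascript
// In-memory stand-ins for two collections (illustrative data only).
const categories = [{ _id: "c1", name: "Node.js" }];

// Parent Referencing: each child (Tutorial) stores the ID of its parent (Category).
const tutorials = [
  { _id: "t1", title: "Tutorial #1", category: "c1" },
  { _id: "t2", title: "Tutorial #2", category: "c1" },
  { _id: "t3", title: "Tutorial #3", category: "c2" }
];

// Find the children of a parent by filtering on the parent reference.
// The Category document itself never has to grow.
function findTutorialsByCategory(categoryId) {
  return tutorials.filter((tutorial) => tutorial.category === categoryId);
}

console.log(findTutorialsByCategory("c1").map((t) => t.title));
// prints [ 'Tutorial #1', 'Tutorial #2' ]
```

Adding a Tutorial only inserts a new child document, so the Category document stays the same size no matter how many Tutorials it has.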
Embedded Data Models (Denormalization)
We can also denormalize the data simply by embedding the related documents right into the main document.
So now we have all the relevant data about Comments right inside one Tutorial document, without the need for separate documents, collections, and IDs.
// Tutorial
{
  _id: "5db579f5faf1f8434098f7f5",
  title: "Tutorial #1",
  author: "bezkoder",
  comments: [
    {
      username: "jack",
      text: "This is a great tutorial.",
      createdAt: 2019-10-27T11:05:39.898Z
    },
    {
      username: "mary",
      text: "Thank you, it helps me a lot.",
      createdAt: 2019-10-27T11:05:40.710Z
    }
  ]
}
Because we can get all the data about a Tutorial and its Comments at the same time, our application needs fewer queries to the database, which improves read performance.
So how do we actually decide whether to normalize the data (keep the documents separate and reference them) or denormalize it (embed the data)?
When to use References or Embedding for MongoDB One-to-Many Relationships
As I’ve said before, we decide how to implement the data model depending on the type of relationship that exists between collections, on the data access patterns, and on the data cohesion.
To actually make the decision, we need to combine all three criteria, not just use one of them in isolation.
Types of Relationships
– Usually, when we have a one-to-few relationship, we embed the related documents into the parent document. For example, a Tutorial has some Images (15 or fewer).
– For a one-to-many relationship, we can either embed or reference according to the other two criteria.
– With a one-to-aLot relationship, we always use data references, that is, we normalize the data. That’s because if we embedded a lot of documents inside one document, the document could quickly become too large. For example, you can imagine that a Category has 300 Tutorials.
So the solution for that is, of course, referencing.
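As an illustration of the one-to-few case above, a Tutorial with its Images embedded might look like the following plain JavaScript object. This is only a sketch: the `url` and `caption` field names and the sample values are assumptions, not a fixed schema.

```javascript
// One-to-few: the small, bounded list of Images lives inside the Tutorial.
const tutorial = {
  _id: "5db579f5faf1f8434098f7f5",
  title: "Tutorial #1",
  author: "bezkoder",
  images: [
    { url: "/images/mongodb.png", caption: "MongoDB Database" },
    { url: "/images/one-to-many.png", caption: "One to Many Relationship" }
  ]
};

// Everything needed to render the Tutorial is available from one document.
const captions = tutorial.images.map((image) => image.caption);
console.log(captions.length); // prints 2
```

Because the number of Images stays small, the embedded array does not threaten the document size the way 300 embedded Tutorials would.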
Data access patterns
Now, with a one-to-many relationship, should we embed the documents or use data references? We consider how often the data is read and written, along with the read/write ratio.
– If the collection that we’re deciding about is mostly read and the data is not updated a lot, so that there is a lot more reading than writing (a high read/write ratio), then we should probably embed the data.
The reason is that with embedding we only need one trip to the database per query, while with referencing we need two trips. Saving one trip on every query makes the entire process more efficient.
For example, a blog Post that has about 20-30 Images would be a good candidate for embedding, because once these Images are saved to the database they are rarely updated.
– On the other hand, if our data is updated a lot, then we should consider referencing (normalizing) the data. That’s because the database engine does more work to update an embedded document than a standalone document. Our main goal is performance, so we use referencing for the data model.
Now let’s assume that each Tutorial has many Comments. Each time someone posts a Comment, we need to update the corresponding Tutorial document. The data can change all the time, so this is a great candidate for referencing.
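The "one trip versus two trips" argument above can be sketched with a tiny in-memory stand-in for a database that counts how many reads it serves. Everything here (`createFakeDb`, `findById`, `findByIds`, the sample IDs) is hypothetical and only models the counting; it is not a MongoDB or Mongoose API.

```javascript
// A fake database that counts round trips (illustration only).
function createFakeDb(collections) {
  let trips = 0;
  return {
    findById(name, id) {
      trips += 1;
      return collections[name].find((doc) => doc._id === id);
    },
    findByIds(name, ids) {
      trips += 1;
      return collections[name].filter((doc) => ids.includes(doc._id));
    },
    get trips() {
      return trips;
    }
  };
}

// Embedded form: Comments come back together with the Tutorial.
const embeddedDb = createFakeDb({
  tutorials: [
    { _id: "t1", title: "Tutorial #1", comments: [{ username: "jack", text: "This is a great tutorial." }] }
  ]
});
const embeddedTutorial = embeddedDb.findById("tutorials", "t1");

// Referenced form: read the Tutorial first, then read its Comments by ID.
const referencedDb = createFakeDb({
  tutorials: [{ _id: "t1", title: "Tutorial #1", comments: ["c1", "c2"] }],
  comments: [
    { _id: "c1", username: "jack", text: "This is a great tutorial." },
    { _id: "c2", username: "mary", text: "Thank you, it helps me a lot." }
  ]
});
const referencedTutorial = referencedDb.findById("tutorials", "t1");
const referencedComments = referencedDb.findByIds("comments", referencedTutorial.comments);

console.log(embeddedDb.trips, referencedDb.trips); // prints 1 2
```

The extra read is the price of referencing. In exchange, adding a Comment in the referenced form writes a small standalone document instead of rewriting part of a growing Tutorial.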
Data cohesion
The last criterion is just a measure of how closely the data is related.
If two collections really intrinsically belong together, then they should probably be embedded into one another.
In our example, every Tutorial can have many Images, and every Image intrinsically belongs to a Tutorial. So Images should be embedded into the Tutorial document.
If we frequently need to query both collections on their own, we should normalize the data into two separate collections, even if they are closely related.
Imagine that in our Tutorial Blog we have a widget called Recent Images, and the Images could belong to different Tutorials. This means that we’re going to query Images in their own collection without necessarily querying for the Tutorials themselves.
So, applying this third criterion, we come to the conclusion that we should actually normalize the data.
Another option is to still embed Images (with the appropriate fields) in the Tutorial document, but also create a separate Images collection.
All of this shows that we should really look at all three criteria together rather than at just one of them in isolation. There are no completely right or completely wrong ways of modeling our data.
Let’s implement each of them in a Node.js app using Mongoose.
Mongoose One-to-Many Relationship example
Setup Node.js App
Install mongoose with the command:
npm install mongoose
Create the project structure like this:
src
  models
    Category.js
    Tutorial.js
    Image.js
    Comment.js
    index.js
  server.js
package.json
Open server.js, where we import mongoose and connect the app to the MongoDB database.
That’s the first step. Now we’re going to create the appropriate models and use mongoose to interact with the MongoDB database. We will cover three cases, one for each type of one-to-many relationship:
Tutorial-Images: One-to-Few
Tutorial-Comments: One-to-Many
Category-Tutorials: One-to-aLot
Case 1: Mongoose One-to-Many (Few) Relationship
Now we will represent the relationship between Tutorial and its Images.
Let’s create the Tutorial model with the mongoose.Schema() constructor function.
In models/Tutorial.js, define Tutorial with 3 fields: title, author, images.
Now, if we want to embed Images (with the appropriate fields) in the Tutorial document, but also want to query Images in their own collection without necessarily querying for the Tutorials themselves, we can define the Image model in Image.js like this:
The comments array field of the Tutorial document now contains reference IDs to Comments.
Now it is time to use the populate() function to get the full Tutorial data. Let’s create a getTutorialWithPopulate() function like this: