Deep Dive Into MongoDB And Its Architecture
MongoDB is a widely adopted NoSQL database that doesn’t follow the traditional relational approach. In this blog, we’ll cover the architecture of MongoDB, how it works, and what makes it good at what it does.
Let’s jump right into it.
Databases are used to store and manage data on disk. A database management system organizes data on disk, much as a data structure organizes data in memory; in both cases we structure the data so that it can be accessed efficiently.
MongoDB is a NoSQL database that uses a document-oriented data model, which means data is stored as documents in BSON (Binary JSON) format.
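To make the BSON format more concrete, here is a minimal sketch of how a small document is laid out as bytes. The encodeBson function is a hypothetical helper written for this post, not part of the MongoDB driver, and it handles only string and 32-bit integer values; a real application would rely on the BSON library that ships with the driver.

```javascript
// Minimal BSON encoder sketch (hypothetical helper, strings and int32 only).
function encodeBson(doc) {
  const parts = [];
  for (const [key, value] of Object.entries(doc)) {
    // Element names are stored as NUL-terminated C strings
    const name = Buffer.from(key + "\0", "utf8");
    if (typeof value === "string") {
      const str = Buffer.from(value + "\0", "utf8");
      const len = Buffer.alloc(4);
      len.writeInt32LE(str.length); // byte length, including the trailing NUL
      parts.push(Buffer.from([0x02]), name, len, str); // 0x02 = string type tag
    } else if (Number.isInteger(value) && value >= -(2 ** 31) && value < 2 ** 31) {
      const num = Buffer.alloc(4);
      num.writeInt32LE(value);
      parts.push(Buffer.from([0x10]), name, num); // 0x10 = int32 type tag
    } else {
      throw new TypeError(`unsupported value for "${key}"`);
    }
  }
  const body = Buffer.concat(parts);
  const size = Buffer.alloc(4);
  size.writeInt32LE(4 + body.length + 1); // size prefix + elements + terminator
  return Buffer.concat([size, body, Buffer.from([0x00])]);
}

// Usage: print the encoded bytes of a one-field document as hex
console.log(encodeBson({ hello: "world" }).toString("hex"));
```

Each element carries a type tag and its field name, and the document starts with its total size, which is what lets a reader skip over fields without parsing them.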
MongoDB groups documents into collections, which are similar to tables in Postgres or MySQL.
Before jumping into MongoDB, let’s understand how disk storage works and how a database stores data on disk.
In a unix file system, we have a hierarchical file system that organizes files and directories into a tree-like structure. In the Unix file system, each file and directory has a unique path, starting from the root directory represented by the “/” symbol.
When a file is created in the Unix file system, it is stored as a sequence of bytes on disk. The disk is divided into fixed-size blocks, and each file is stored in one or more blocks. Some file systems also group blocks into larger allocation units.
The Unix file system uses an inode (short for index node) to manage the storage of a file. An inode is a data structure that contains information about a file, such as its size, creation time, and permissions.
When a file is created, the Unix file system assigns an inode to the file and stores information about the file in the inode. The inode also contains pointers to the disk blocks that store the actual data for the file.
When a file is read, the Unix file system uses the inode to locate the disk blocks that contain the file’s data. The data is then read from disk and returned to the application that requested it.
The Unix file system uses a block map, also known as a bitmap, to keep track of which disk blocks are in use and which are free. The block map is a data structure that contains a bit for each disk block.
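The relationship between inodes, disk blocks, and the block bitmap can be sketched with a small in-memory model. TinyDisk and its methods are hypothetical names invented for this illustration; a real file system is far more involved (indirect blocks, directories, journaling), so treat this only as a picture of the bookkeeping described above.

```javascript
// Toy model of a block device with inodes and a free-block bitmap.
class TinyDisk {
  constructor(blockCount, blockSize) {
    this.blockSize = blockSize;
    this.blocks = Array.from({ length: blockCount }, () => Buffer.alloc(blockSize));
    this.bitmap = new Array(blockCount).fill(false); // true = block in use
    this.inodes = new Map(); // path -> inode
  }

  // Find the first free block in the bitmap and mark it as used
  allocateBlock() {
    const free = this.bitmap.indexOf(false);
    if (free === -1) throw new Error("disk full");
    this.bitmap[free] = true;
    return free;
  }

  // Split the data into blocks and record the block numbers in the inode.
  // (Simplification: overwriting a path does not free its old blocks.)
  writeFile(path, data) {
    const bytes = Buffer.from(data, "utf8");
    const inode = { size: bytes.length, mode: 0o644, blockPointers: [] };
    for (let off = 0; off < bytes.length; off += this.blockSize) {
      const blockNo = this.allocateBlock();
      const end = Math.min(off + this.blockSize, bytes.length);
      bytes.copy(this.blocks[blockNo], 0, off, end);
      inode.blockPointers.push(blockNo);
    }
    this.inodes.set(path, inode);
    return inode;
  }

  // Follow the inode's block pointers and trim to the recorded file size
  readFile(path) {
    const inode = this.inodes.get(path);
    if (!inode) throw new Error(`no such file: ${path}`);
    const chunks = inode.blockPointers.map((n) => this.blocks[n]);
    return Buffer.concat(chunks).subarray(0, inode.size).toString("utf8");
  }
}

// Usage: an 11-byte file on a disk with 4-byte blocks spans three blocks
const demoDisk = new TinyDisk(8, 4);
console.log(demoDisk.writeFile("/notes.txt", "hello world").blockPointers);
console.log(demoDisk.readFile("/notes.txt"));
```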
Now we have an idea of how a file is stored and read in the Unix file system. Let’s take a look at how MongoDB handles these things.
How MongoDB works
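MongoDB’s original storage engine, MMAPv1, used memory-mapped files to store its data. The data is stored on disk in a binary format, and the operating system maps portions of the data files into the process’s address space, allowing MongoDB to access the data as if it were already in memory.
This avoids explicit read calls and lets the operating system’s page cache keep frequently used data in memory, although a page that is not yet cached must still be read from disk on first access. MMAPv1 was removed in MongoDB 4.2; the current default engine, WiredTiger, manages its own in-memory cache instead.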
Let’s walk through the life cycle of a document as it is written to and read from MongoDB.
1. Document validation: Document validation is the process of checking the structure and contents of a document to ensure that it meets the specified requirements. In MongoDB, document validation can be applied to collections, which are equivalent to tables in a relational database.
For example, consider a collection for storing blog posts in a blogging platform. To ensure that all blog posts have a title, author, and body, you could specify a validation rule for the collection, such as:
const { MongoClient } = require('mongodb');

const uri = "mongodb+srv://<username>:<password>@cluster0.mongodb.net/test?retryWrites=true&w=majority";
const client = new MongoClient(uri);

async function main() {
  try {
    await client.connect();
    const db = client.db("test");

    // Define the validation rule when the collection is created.
    // (createCollection fails if "blog_posts" already exists; use the
    // collMod command to change the validator of an existing collection.)
    const blog_posts = await db.createCollection("blog_posts", {
      validator: {
        $jsonSchema: {
          bsonType: "object",
          required: ["title", "author", "body"],
          properties: {
            title: {
              bsonType: "string",
              description: "must be a string and is required"
            },
            author: {
              bsonType: "string",
              description: "must be a string and is required"
            },
            body: {
              bsonType: "string",
              description: "must be a string and is required"
            }
          }
        }
      }
    });

    // Insert a document that satisfies the validation rule
    await blog_posts.insertOne({
      title: "My First Blog Post",
      author: "John Doe",
      body: "This is my first blog post"
    });
    console.log("Document inserted");
  } catch (err) {
    // Connection failures and validation errors both end up here
    console.error("Operation failed:", err.message);
  } finally {
    await client.close();
  }
}

main();
This validation rule specifies that documents in the collection must contain three required fields: title, author, and body. Each field must be a string. If a document is inserted into the collection that does not meet these requirements, MongoDB will reject the insertion and return an error.
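To show what such a rule checks, here is a simplified stand-in for the validator written in plain JavaScript. It is not MongoDB's implementation: validateDocument and blogPostSchema are hypothetical names, and the function understands only the required list and the "string" bsonType used in this post.

```javascript
// Same shape as the $jsonSchema rule for blog posts
const blogPostSchema = {
  bsonType: "object",
  required: ["title", "author", "body"],
  properties: {
    title: { bsonType: "string" },
    author: { bsonType: "string" },
    body: { bsonType: "string" }
  }
};

// Returns a list of problems; an empty list means the document is accepted.
function validateDocument(doc, schema) {
  const errors = [];
  for (const field of schema.required ?? []) {
    if (!(field in doc)) errors.push(`missing required field: ${field}`);
  }
  for (const [field, rule] of Object.entries(schema.properties ?? {})) {
    if (field in doc && rule.bsonType === "string" && typeof doc[field] !== "string") {
      errors.push(`${field} must be a string`);
    }
  }
  return errors;
}

// Usage: the first document passes, the second is rejected
console.log(validateDocument({ title: "Hi", author: "John Doe", body: "Text" }, blogPostSchema));
console.log(validateDocument({ title: 42, author: "John Doe" }, blogPostSchema));
```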
2. Document Storage: The validated document is then stored in the collection on disk in a binary representation. In the code snippet above, we used the insertOne method to insert the document into the collection. If the insert operation succeeds, the message “Document inserted” is printed. If it fails, the driver reports an error.
3. Index Creation: If there are any indexes defined for the collection, MongoDB creates an entry for the new document in each of the indexes. This allows for fast and efficient querying of the data. Let’s say we have defined an index on the title field.
// Create an index on the title field (run this inside an async function)
const indexName = await blog_posts.createIndex({ title: 1 });
console.log("Index created on title field:", indexName);
Every newly inserted document will be added to the index as well for faster access at read time.
MongoDB uses B-trees to implement its indexes.
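The idea behind an index can be sketched without a full B-tree. The example below swaps in a sorted array with binary search, which gives the same ordered-lookup behaviour on a small scale; SortedIndex is a hypothetical class for illustration and says nothing about how MongoDB lays out its B-tree pages.

```javascript
// Sorted-array index: a simplified stand-in for a B-tree index on one field.
class SortedIndex {
  constructor() {
    this.entries = []; // [{ key, id }] kept sorted by key
  }

  // Binary search for the first position whose key is >= the given key
  lowerBound(key) {
    let lo = 0;
    let hi = this.entries.length;
    while (lo < hi) {
      const mid = (lo + hi) >>> 1;
      if (this.entries[mid].key < key) lo = mid + 1;
      else hi = mid;
    }
    return lo;
  }

  // Called for every inserted document so the index stays in sync
  insert(key, id) {
    this.entries.splice(this.lowerBound(key), 0, { key, id });
  }

  // Return the ids of all documents whose indexed field equals key
  find(key) {
    const ids = [];
    for (let i = this.lowerBound(key); i < this.entries.length && this.entries[i].key === key; i++) {
      ids.push(this.entries[i].id);
    }
    return ids;
  }
}

// Usage: index three posts by title, then look one up
const titleIndex = new SortedIndex();
titleIndex.insert("My First Blog Post", 1);
titleIndex.insert("Another Post", 2);
titleIndex.insert("Zebra Facts", 3);
console.log(titleIndex.find("Another Post"));
```

A lookup touches only a logarithmic number of entries instead of scanning every document, which is the property a B-tree also provides while remaining efficient on disk.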
4. Memory Allocation: MongoDB also keeps the new document in memory, in the storage engine’s cache, so that it can be accessed without first being read back from disk.
Take the insertOne call above as an example. When we insert a new document into the blog_posts collection, MongoDB writes the document into its in-memory cache and persists it to disk afterwards. The insertOne method returns an object that contains information about the operation, including insertedId, the unique identifier (_id) of the inserted document. Note that insertedId identifies the document; it does not describe where the document lives in memory.
Because recently written documents are already in the cache, MongoDB can serve reads for them without going to disk, which is a comparatively slow operation. This allows for fast read operations and improved overall performance.
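The insertedId returned by insertOne is the document's _id, which by default is an ObjectId generated by the driver. An ObjectId is 12 bytes: a 4-byte timestamp in seconds, a 5-byte random value, and a 3-byte counter. The sketch below splits the 24-character hex form into those parts; objectIdParts is a hypothetical helper, and the sample id is just an example value.

```javascript
// Split an ObjectId hex string into timestamp, random value, and counter.
function objectIdParts(hex) {
  if (!/^[0-9a-f]{24}$/i.test(hex)) {
    throw new TypeError("expected a 24-character hex string");
  }
  const seconds = parseInt(hex.slice(0, 8), 16); // first 4 bytes
  return {
    timestamp: new Date(seconds * 1000), // creation time, second precision
    random: hex.slice(8, 18), // next 5 bytes
    counter: parseInt(hex.slice(18), 16) // last 3 bytes
  };
}

// Usage: inspect an example id
const parts = objectIdParts("507f1f77bcf86cd799439011");
console.log(parts.timestamp.toISOString(), parts.random, parts.counter);
```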
5. Document retrieval: When a query is executed to retrieve the document, MongoDB first checks its memory to see if the document is already in memory. If the document is in memory, MongoDB returns the data directly from memory. If the document is not in memory, MongoDB reads the document from disk into memory and then returns the data.
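The retrieval path described here is a read-through cache. The following is a minimal sketch that assumes a least-recently-used eviction policy; ReadThroughCache and the readFromDisk callback are hypothetical, and the real cache in MongoDB's storage engine works on pages rather than whole documents.

```javascript
// Read-through cache with LRU eviction (a Map keeps insertion order).
class ReadThroughCache {
  constructor(capacity, readFromDisk) {
    this.capacity = capacity;
    this.readFromDisk = readFromDisk; // fallback used on a cache miss
    this.cache = new Map();
    this.diskReads = 0;
  }

  get(id) {
    if (this.cache.has(id)) {
      // Cache hit: move the entry to the "most recently used" end
      const doc = this.cache.get(id);
      this.cache.delete(id);
      this.cache.set(id, doc);
      return doc;
    }
    // Cache miss: read from disk, then keep a copy in memory
    const doc = this.readFromDisk(id);
    this.diskReads++;
    this.cache.set(id, doc);
    if (this.cache.size > this.capacity) {
      // Evict the least recently used entry (the first key in the Map)
      this.cache.delete(this.cache.keys().next().value);
    }
    return doc;
  }
}

// Usage: a Map stands in for the on-disk collection
const onDisk = new Map([[1, { title: "A" }], [2, { title: "B" }], [3, { title: "C" }]]);
const docCache = new ReadThroughCache(2, (id) => onDisk.get(id));
docCache.get(1);
docCache.get(1); // served from memory, no second disk read
console.log(docCache.diskReads);
```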
Storage Engine
A storage engine needs to do two things:
- when you give it some data, it should store that data
- when you ask for that data, it should give you that data back
The storage engine is responsible for these two things. We must choose the storage engine that will serve us best according to our data and workload.
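Those two responsibilities can be sketched as a toy key-value engine: an append-only log file plus an in-memory map of offsets. LogEngine is a hypothetical class for illustration only; it has nothing to do with WiredTiger's internals and skips durability, compaction, and recovery entirely.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Toy storage engine: append records to a log, remember where each one is.
class LogEngine {
  constructor(file) {
    this.file = file;
    this.offsets = new Map(); // key -> { pos, len } of the latest record
    this.end = 0; // current length of the log in bytes
    fs.writeFileSync(file, ""); // start from an empty log
  }

  // "When you give it some data, it should store that data"
  put(key, value) {
    const record = Buffer.from(JSON.stringify(value), "utf8");
    fs.appendFileSync(this.file, record);
    this.offsets.set(key, { pos: this.end, len: record.length });
    this.end += record.length;
  }

  // "When you ask for that data, it should give you that data back"
  get(key) {
    const loc = this.offsets.get(key);
    if (!loc) return undefined;
    const fd = fs.openSync(this.file, "r");
    try {
      const buf = Buffer.alloc(loc.len);
      fs.readSync(fd, buf, 0, loc.len, loc.pos);
      return JSON.parse(buf.toString("utf8"));
    } finally {
      fs.closeSync(fd);
    }
  }
}

// Usage: store a document in a temporary file and read it back
const engine = new LogEngine(path.join(os.tmpdir(), "toy-engine-demo.log"));
engine.put("post:1", { title: "My First Blog Post" });
console.log(engine.get("post:1"));
```

Real engines differ mainly in how they organize the data between these two calls, which is why the choice of engine depends on the workload.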
MongoDB uses the WiredTiger Storage Engine
MongoDB has supported multiple storage engines over time. WiredTiger has been the default since MongoDB 3.2; the older MMAPv1 engine was deprecated in MongoDB 4.0 and removed in 4.2.
WiredTiger offers better performance and scalability than MMAPv1 did. MMAPv1 was simpler, but it is no longer an option on current versions.
Here are a few characteristics of the WiredTiger storage engine:
- WiredTiger uses document-level concurrency control, so multiple clients can modify different documents in the same collection at the same time.
- It supports multi-document transactions, allowing you to make multiple updates to different documents in a single atomic operation.
- It uses a cache for frequently accessed data.
- Data is stored in B-trees, which provide efficient indexing and search capabilities.
- It uses checkpoints and a journal for crash recovery, and also supports hot backups.
- With WiredTiger, MongoDB supports compression for all collections and indexes. Compression minimises storage use at the expense of additional CPU.
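The compression trade-off in the list above is easy to observe on repetitive document data. This sketch uses Node's built-in zlib module purely as a stand-in, since it ships with Node; it does not reproduce WiredTiger's block compressors or their settings, and compressDocs and decompressDocs are hypothetical helper names.

```javascript
const zlib = require("zlib");

// Serialize a batch of documents and compress it, reporting both sizes.
function compressDocs(docs) {
  const raw = Buffer.from(JSON.stringify(docs), "utf8");
  const compressed = zlib.deflateSync(raw); // costs CPU, saves space
  return { rawBytes: raw.length, compressedBytes: compressed.length, compressed };
}

// Reverse the process when the data is read back
function decompressDocs(compressed) {
  return JSON.parse(zlib.inflateSync(compressed).toString("utf8"));
}

// Usage: blog posts share field names and similar values, so they compress well
const samplePosts = Array.from({ length: 200 }, (_, i) => ({
  title: `Post ${i}`,
  author: "John Doe",
  body: "This is my first blog post"
}));
const result = compressDocs(samplePosts);
console.log(`raw: ${result.rawBytes} bytes, compressed: ${result.compressedBytes} bytes`);
```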
Benefits of MongoDB over SQL:
- Flexible Schema: MongoDB uses a document-based data model, which allows for flexible and nested data structures that can better match the structure of the objects in your application.
- Scalability: MongoDB is designed to scale horizontally by adding more nodes to a cluster, allowing you to handle increasing amounts of data and traffic.
- High performance: MongoDB keeps the most recently used data in an in-memory cache, which helps read and write performance for working sets that fit in memory.
- Strong consistency: Since version 4.0, MongoDB supports multi-document ACID transactions (extended to sharded clusters in 4.2), and by default reads and writes go to the primary, so clients see a consistent view of the data.
- Rich querying: MongoDB provides a rich query language and supports indexing, allowing for efficient and flexible querying of data.
- Efficient use of resources: MongoDB stores documents in BSON, a binary format that is quick to parse and traverse and that supports types JSON lacks, such as dates and binary data.
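The flexible schema and query language can be illustrated with a very small matcher over nested documents. This is a simplified sketch, not MongoDB's query engine: matches and getPath are hypothetical helpers that support only equality, $gt, $lt, $in, and dot notation on nested objects (array semantics are left out).

```javascript
// Resolve a dotted path such as "author.name" inside a nested document
function getPath(doc, path) {
  return path.split(".").reduce((v, k) => (v == null ? undefined : v[k]), doc);
}

// Check whether a document satisfies every condition in the query
function matches(doc, query) {
  return Object.entries(query).every(([path, cond]) => {
    const value = getPath(doc, path);
    if (cond !== null && typeof cond === "object" && !Array.isArray(cond)) {
      // Operator form, for example { likes: { $gt: 5 } }
      return Object.entries(cond).every(([op, arg]) => {
        switch (op) {
          case "$gt": return value > arg;
          case "$lt": return value < arg;
          case "$in": return arg.includes(value);
          default: throw new Error(`unsupported operator: ${op}`);
        }
      });
    }
    return value === cond; // plain equality
  });
}

// Usage: documents in one collection do not need identical fields
const posts = [
  { title: "A", author: { name: "John" }, likes: 10 },
  { title: "B", author: { name: "Jane" }, likes: 3, tags: "mongodb" }
];
console.log(posts.filter((p) => matches(p, { "author.name": "John", likes: { $gt: 5 } })));
```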
MongoDB is a growing database that provides many flexible features that are useful in today’s fast-changing world. I have tried to give you an understanding of how MongoDB works and how the underlying storage engine works. You can use it according to your requirements.