Unlocking the Power of NoSQL Databases: A Dive into Scalability and Flexibility

Gaurav Kumar
10 min read · Apr 11, 2024


This is part of the Data Engineering Roadmap.

1. Foundational Knowledge — NoSQL.

Introduction to NoSQL

NoSQL refers to a category of database management systems (DBMS) designed to manage large volumes of unstructured or semi-structured data. Unlike relational databases, which store data in tables with fixed schemas, NoSQL databases use flexible data models that can absorb changes in data structure. Most of them can also scale horizontally, adding nodes to handle growing data volumes.

Initially coined as “non-SQL” or “non-relational” databases, the term NoSQL has evolved to signify “not only SQL.” This evolution acknowledges the broadening scope of NoSQL databases, which now encompass a diverse array of database architectures and data models.

NoSQL databases encompass a spectrum of categories, typically grouped into four primary classifications:

  1. Document databases: These repositories organize data in semi-structured documents like JSON or XML, offering flexibility in schema and enabling retrieval through document-oriented query languages.
  2. Key-value stores: Within these databases, data is structured as pairs of keys and corresponding values, prioritizing rapid and straightforward read and write operations.
  3. Column-family stores: These databases organize data into column families, treating them as cohesive units for optimized querying of extensive datasets, ensuring swift and efficient data retrieval.
  4. Graph databases: Specifically designed to manage intricate data relationships, these databases store data as interconnected nodes and edges, adeptly handling the complexities inherent in relational data structures.
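
To make the four categories above more concrete, here is a minimal sketch of one made-up "user follows user" record expressed in each model. It uses plain JavaScript structures only, no database is involved, and every name and value in it is hypothetical.

```javascript
// 1. Document model: a self-contained, nested, JSON-like document.
const userDocument = {
  _id: "u1",
  name: "Asha",
  address: { city: "Pune" },
  follows: ["u2"],
};

// 2. Key-value model: an opaque value looked up by a single key.
const keyValueStore = new Map();
keyValueStore.set("user:u1", JSON.stringify({ name: "Asha", city: "Pune" }));

// 3. Column-family model: row key -> column family -> columns.
const columnFamilyRow = {
  rowKey: "u1",
  profile: { name: "Asha", city: "Pune" },
  activity: { last_login: "2024-04-11" },
};

// 4. Graph model: nodes plus edges that name the relationship.
const graph = {
  nodes: [
    { id: "u1", name: "Asha" },
    { id: "u2", name: "Ravi" },
  ],
  edges: [{ from: "u1", to: "u2", type: "FOLLOWS" }],
};

// "Who does u1 follow?" is answered by walking the edges.
const followedByU1 = graph.edges
  .filter((edge) => edge.from === "u1" && edge.type === "FOLLOWS")
  .map((edge) => graph.nodes.find((node) => node.id === edge.to).name);

console.log(followedByU1);
```

The same facts appear in all four shapes; what changes is which question each shape answers cheaply (whole-record lookup, key lookup, column scans, or relationship traversal).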

Advantages of Utilizing NoSQL Databases

Embracing NoSQL databases offers a plethora of advantages that cater to the evolving needs of modern data management. Here are several key benefits of incorporating NoSQL databases into your technology stack:

  1. Scalability: Most NoSQL databases are designed for horizontal scalability, so capacity can be added by adding nodes as data volumes grow. This is particularly useful in Big Data and high-velocity streaming scenarios, where a single relational database server can become a bottleneck.
  2. Flexibility in Data Modeling: Unlike rigidly structured relational databases, NoSQL databases provide the flexibility to store and retrieve unstructured or semi-structured data with ease. This adaptability enables organizations to accommodate diverse data types and evolving data schemas without the need for extensive schema modifications.
  3. High Performance: Many NoSQL databases are optimized for high throughput and low latency on simple read and write operations. This makes them well-suited for use cases requiring real-time data processing, such as IoT applications, streaming analytics, and high-frequency trading platforms.
  4. Support for Distributed Architectures: Many NoSQL databases are designed with distributed architectures in mind, allowing them to seamlessly distribute data across multiple nodes or clusters. This distributed nature enhances fault tolerance, resilience, and high availability, ensuring continuous operations even in the event of node failures or network partitions.
  5. Cost-Effectiveness: NoSQL databases can be more cost-effective than traditional relational databases, especially in large-scale deployments. Many run on commodity hardware and are available as open-source software, which reduces infrastructure costs, and horizontal scaling lets organizations add capacity in small increments as demand grows.
  6. Handling of Big Data: With the exponential growth of data in the digital age, NoSQL databases excel in managing vast amounts of data efficiently. Whether it’s storing petabytes of unstructured data or processing millions of transactions per second, NoSQL databases offer the scalability and performance required to handle the challenges of Big Data effectively.
  7. Support for Modern Applications: NoSQL databases are well-suited for powering modern, data-intensive applications such as social networks, e-commerce platforms, content management systems, and real-time analytics dashboards. Their ability to handle diverse data types, support high concurrency, and scale horizontally makes them the preferred choice for building agile and responsive applications.
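
Horizontal scaling, which several of the points above rely on, comes down to spreading records across nodes. The sketch below shows one simple way to picture it: hash-based partitioning in plain JavaScript. The shard names and the `hashKey` function are hypothetical, and real databases use their own, more sophisticated partitioning schemes.

```javascript
// Hypothetical shard names; a real cluster manages its own node list.
const shards = ["shard-a", "shard-b", "shard-c"];

// Simple deterministic string hash (illustrative only).
function hashKey(key) {
  let hash = 0;
  for (const ch of key) {
    // Multiply-and-add, then force the result into an unsigned 32-bit value.
    hash = (hash * 31 + ch.codePointAt(0)) >>> 0;
  }
  return hash;
}

// Route a record key to one shard; the same key always maps to the same shard.
function shardFor(key, shardList = shards) {
  return shardList[hashKey(key) % shardList.length];
}

for (const key of ["user:1", "user:2", "order:77"]) {
  console.log(key, "->", shardFor(key));
}
```

A plain modulo like this remaps many keys whenever a shard is added, which is why production systems typically rely on techniques such as consistent hashing or range-based chunks instead.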

Environment Setup for NoSQL:

There are many NoSQL databases you can get started with. Some of the popular ones are:


  1. MongoDB
  2. Cassandra
  3. Redis
  4. Couchbase
  5. Amazon DynamoDB
  6. Neo4j
  7. Apache HBase

For this article, we will focus on MongoDB: its installation, usage, and a couple of notable features.

Setting up a NoSQL environment such as MongoDB involves different procedures on different operating systems. Here’s a guide to installing and configuring MongoDB on Windows:

Installing MongoDB on Windows:

  1. Download MongoDB: Visit the official MongoDB website (the “Try MongoDB Community Edition” download page) and download the appropriate installer for Windows.

2. Installation Steps:

  • After the download finishes, double-click the installation file to start the setup, then follow the on-screen instructions, clicking Next to move through each step.
  • Now, choose Complete to install MongoDB completely.
  • Then, select the radio button “Run services as Network service user.”
  • The installation process will also offer you the option to install MongoDB Compass, the official graphical user interface (GUI) provided by MongoDB. You can select the checkbox to include this during installation if desired.

3. Starting MongoDB:

After the installation process is finished, you’ll need to initiate MongoDB. Here’s how to do it:

  • Launch Command Prompt.
  • Navigate to the directory where MongoDB is installed, typically located at “C:\Program Files\MongoDB\Server\7.0\bin”.
  • Simply enter the command ‘mongod’ to commence the server.

Step-by-Step Guide: Accessing the Terminal for Executing Commands

To access the terminal where you can execute MongoDB commands, follow these steps:

  1. Open Command Prompt (Windows):
  • Press the Windows key on your keyboard.
  • Type “Command Prompt” into the search bar.
  • Press Enter to open the Command Prompt.

2. Navigate to the MongoDB Bin Directory (Optional):

  • If MongoDB is installed and its bin directory is not in your system’s PATH variable, you’ll need to navigate to the MongoDB bin directory.
  • Use the cd command to change directories. For example:
cd C:\Program Files\MongoDB\Server\7.0\bin

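3. Start the MongoDB Server and Connect (Optional):

  • If MongoDB is not running as a service, you’ll need to start it manually.
  • Use the mongod command to start the MongoDB server. For example:
mongod
  • With the server running, open a second Command Prompt and use the mongosh command to open the MongoDB shell, where you can run the commands shown in the rest of this article:
mongosh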

Database Create and Drop in MongoDB:

In MongoDB, creating and dropping databases is a straightforward process. Below are the steps to create and drop databases in MongoDB:

Creating a Database:

To create a new database in MongoDB, you don’t need to explicitly create it. MongoDB creates databases automatically when you first store data in them. However, you can switch to a non-existing database and insert data into it, and MongoDB will create the database for you. Here’s how to do it:

  1. Switch to the Database: Use the use command to switch to the desired database. If the database doesn't exist yet, MongoDB creates it when you first store data in it.
use my_database

2. Insert Data: You can now start inserting documents into collections within the database. MongoDB will create the collections as well if they don’t exist.

db.my_collection.insertOne({ "name": "John", "age": 30 })

3. Verify Database Creation: You can verify that the database has been created by listing all databases using the show dbs command.

show dbs
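
The lazy creation described above can also be checked programmatically. The helper below is a minimal sketch, assuming it is pasted into mongosh; `databaseExists` is a hypothetical name, and it relies on the `listDatabases` admin command. The guarded block at the bottom only runs where mongosh's `db` object is defined.

```javascript
// `database` is a mongosh database handle, such as the predefined `db` object.
function databaseExists(database, name) {
  // listDatabases returns a document with a `databases` array of { name, ... } entries.
  const result = database.adminCommand({ listDatabases: 1, nameOnly: true });
  return result.databases.some((entry) => entry.name === name);
}

// Only runs inside mongosh, where `db` is defined.
if (typeof db !== "undefined") {
  const target = db.getSiblingDB("my_database");
  // Reports false only if nothing has been stored in my_database yet.
  print(databaseExists(db, "my_database"));
  target.my_collection.insertOne({ name: "John", age: 30 });
  // The database is listed once it holds at least one document.
  print(databaseExists(db, "my_database"));
}
```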

Dropping a Database:

To drop a database in MongoDB, you can use the dropDatabase() method. Be cautious when dropping databases as this action cannot be undone and will delete all data within the database. Here's how to do it:

  1. Switch to the Target Database: db.dropDatabase() drops the current database, so first switch to the database you want to remove.
use my_database

2. Drop the Database: Use the dropDatabase() method to drop the current database.

db.dropDatabase()

Alternatively, you can drop a database by name without switching to it.

db.getSiblingDB('my_database').dropDatabase()

3. Verify Database Deletion: You can verify that the database has been dropped by listing all databases using the show dbs command.

show dbs

By following these steps, you can create and drop databases in MongoDB according to your requirements. Remember to exercise caution when dropping databases, as it will permanently delete all data within them.
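
Because dropDatabase() acts on whichever database is current, a small guard can help avoid dropping the wrong one. The following is a minimal sketch, assuming mongosh; `dropDatabaseByName` and the protected list are hypothetical choices for illustration, not part of MongoDB itself.

```javascript
// Databases MongoDB uses internally; refusing to drop them is this sketch's own policy.
const PROTECTED_DATABASES = ["admin", "local", "config"];

function dropDatabaseByName(database, name) {
  if (PROTECTED_DATABASES.includes(name)) {
    throw new Error(`Refusing to drop system database "${name}"`);
  }
  // getSiblingDB returns a handle to another database without changing the current one.
  return database.getSiblingDB(name).dropDatabase();
}

// Only runs inside mongosh, where `db` is defined. This permanently deletes my_database.
if (typeof db !== "undefined") {
  printjson(dropDatabaseByName(db, "my_database"));
}
```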

MongoDB Datatypes

In MongoDB, data is stored in flexible, schema-less documents in BSON format (Binary JSON). BSON is a binary representation of JSON-like documents and is designed to be lightweight, traversable, and efficient. MongoDB supports various data types for storing different kinds of data. Here’s a list of MongoDB data types along with query examples. In these examples, Testdb is the name of a collection:

1. String:

  • Represents UTF-8 encoded strings.
db.Testdb.insertOne({"string data type" : "This is a sample message."})
db.Testdb.find({"string data type" : "This is a sample message."})

2. Integer:

  • Represents 32-bit signed integers.
db.Testdb.insertOne({"Integer example": 62})
db.Testdb.find({"Integer example": 62})

3. Double:

  • Represents 64-bit floating-point numbers.
db.Testdb.insertOne({"double data type": 3.1415})
db.Testdb.find({"double data type": 3.1415})

4. Boolean:

  • Represents true or false values.
db.Testdb.insertOne({"Nationality Indian": true})
db.Testdb.find({"Nationality Indian": true})

5. Object:

  • Represents embedded documents.
db.Testdb.insertOne({ address: { city: "New York" } })
db.Testdb.find({ address: { city: "New York" } })

6. Array:

  • Represents arrays or lists of values.
db.Testdb.insertOne({ tags: ["mongodb", "database"] })
db.Testdb.find({ tags: { $in: ["mongodb", "database"] } })

7. Date:

  • Represents date and time values.
db.Testdb.insertOne({ created_at: ISODate("2022-01-01") })
db.Testdb.find({ created_at: { $gte: ISODate("2022-01-01") } })

8. ObjectId:

  • Represents a unique 12-byte identifier. Every document has an _id field, and MongoDB generates an ObjectId for it by default if you do not supply one. This identifier can be used to fetch or locate the document.
db.Testdb.insertOne({ _id: ObjectId("61523b525b772a08d0939f4e") })
db.Testdb.find({ _id: ObjectId("61523b525b772a08d0939f4e") })

9. Null:

  • Represents null or empty values.
db.Testdb.insertOne({ status: null })
db.Testdb.find({ status: null })

10. Regular Expression:

  • Represents regular expression patterns.
db.Testdb.insertOne({ email: /@example\.com$/ })
db.Testdb.find({ email: /@example\.com$/ })

These are some of the commonly used data types in MongoDB along with corresponding query examples. Understanding these data types and their usage will help you effectively model and query data in MongoDB databases.
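
As a rough way to see how the JavaScript values used in the examples above line up with these types, the sketch below classifies a value using the string aliases accepted by MongoDB's $type operator. `bsonTypeFor` is a hypothetical helper written for this article, not a MongoDB API. Its integer rule is an assumption that mirrors how mongosh is documented to store whole numbers that fit in 32 bits as Int32, and it leaves out ObjectId, which needs the shell's or a driver's own class.

```javascript
// Map a plain JavaScript value to the BSON type alias it would most likely be stored as.
function bsonTypeFor(value) {
  if (value === null) return "null";
  if (Array.isArray(value)) return "array";
  if (value instanceof Date) return "date";
  if (value instanceof RegExp) return "regex";
  switch (typeof value) {
    case "string":
      return "string";
    case "boolean":
      return "bool";
    case "number": {
      // Assumption: whole numbers within the signed 32-bit range become "int".
      const fitsInt32 =
        Number.isInteger(value) && value >= -(2 ** 31) && value <= 2 ** 31 - 1;
      return fitsInt32 ? "int" : "double";
    }
    case "object":
      return "object"; // embedded document
    default:
      return "unsupported";
  }
}

const sample = {
  message: "This is a sample message.",
  count: 62,
  ratio: 3.1415,
  active: true,
  address: { city: "New York" },
  tags: ["mongodb", "database"],
  created_at: new Date("2022-01-01"),
  status: null,
  pattern: /@example\.com$/,
};

for (const [field, value] of Object.entries(sample)) {
  console.log(field, "->", bsonTypeFor(value));
}
```

Inside mongosh, the stored type can be checked directly with a $type query, for example db.Testdb.find({ "Integer example": { $type: "int" } }).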

MongoDB - Create Collection

What is a Collection?

A collection is a container for documents. It is analogous to a table in a relational database. Collections do not enforce a schema, meaning that documents within a collection can have different fields and data types.

Create a collection in mongodb

To establish a collection in MongoDB, you can utilize the db.createCollection(name, options) method. However, it's worth noting that you often won't need to explicitly create collections as MongoDB handles this process automatically when you begin inserting documents into a database. Here's a syntax example demonstrating how to create a collection in MongoDB:

db.createCollection(name, options)

In MongoDB, the `db.createCollection()` method explicitly creates a new collection. Here, “name” is a string giving the name of the collection to create, and “options” is an optional document that configures the collection, for example a maximum size for a capped collection or validation rules for its documents.

Below is an illustration demonstrating the syntax of the `createCollection()` method along with its optional options parameter:

db.createCollection(<collection_name>, {
  capped: <boolean>,
  autoIndexId: <boolean>,
  size: <number>,
  max: <number>,
  storageEngine: <document>,
  validator: <document>,
  validationLevel: <string>,
  validationAction: <string>,
  indexOptionDefaults: <document>,
  viewOn: <string>,
  pipeline: <pipeline>,
  collation: <document>,
  writeConcern: <document>
})

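Here are some of the important fields that can be used as options in the createCollection() method:

  • capped: If true, creates a capped (fixed-size) collection; the size field must then also be set.
  • size: The maximum size, in bytes, of a capped collection.
  • max: The maximum number of documents allowed in a capped collection.
  • validator: A document of validation rules that inserted and updated documents must satisfy.
  • validationLevel and validationAction: Control how strictly the validator is applied and whether invalid documents are rejected or only logged as warnings.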

Let us take an example to see how to implement the command in MongoDB:

Example:

db.createCollection("MyCollection")
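
The options parameter is easier to follow with a concrete value. Below is a minimal sketch of an options document that uses the validator field with a $jsonSchema rule set. The collection name validated_books and the schema fields are illustrative assumptions, and the createCollection() call only runs inside mongosh, where `db` is defined.

```javascript
// Options document for db.createCollection(); the schema fields are illustrative.
const bookCollectionOptions = {
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: ["title", "author"],
      properties: {
        title: { bsonType: "string", description: "must be a string" },
        author: { bsonType: "string", description: "must be a string" },
        // Accept any numeric BSON type so the rule does not depend on how a client encodes numbers.
        publication_year: { bsonType: ["int", "long", "double"], minimum: 0 },
      },
    },
  },
  validationLevel: "strict", // apply the rules to all inserts and updates
  validationAction: "error", // reject documents that fail validation
};

// Only runs inside mongosh, where `db` is defined.
if (typeof db !== "undefined") {
  db.createCollection("validated_books", bookCollectionOptions);
}
```

With these options in place, an insert that omits title or author would be rejected, whereas a collection created without a validator accepts any document shape.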

MongoDB can also create collections automatically when documents are inserted. Suppose you want to add a document via the `insertOne()` method to a collection named “books”. The command could look like this:

db.books.insertOne({
"title": "To Kill a Mockingbird",
"author": "Harper Lee",
"publication_year": 1960
})

The above operation will automatically create a collection if the collection with this name does not currently exist. Also, if you want to check an inserted document, you can use the find() method. Its syntax is:

Syntax:

db.books.find()
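
Called with no arguments, find() returns every document in the collection. It also accepts a filter and a projection, as in the sketch below. `findBooksPublishedBefore` is a hypothetical helper name, and it assumes documents shaped like the books example above; the guarded call only runs inside mongosh.

```javascript
// Return title and author for books published before the given year.
// `database` is a mongosh database handle, such as the predefined `db` object.
function findBooksPublishedBefore(database, year) {
  return database
    .getCollection("books")
    .find(
      { publication_year: { $lt: year } }, // filter: only earlier publication years
      { _id: 0, title: 1, author: 1 } // projection: keep title and author, hide _id
    )
    .toArray();
}

// Only runs inside mongosh, where `db` is defined.
if (typeof db !== "undefined") {
  printjson(findBooksPublishedBefore(db, 1970));
}
```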

MongoDB - Drop Collection

To remove a collection from MongoDB, use the `collection.drop()` method. It removes the collection entirely from the database, along with all of its indexes. No arguments are required; recent MongoDB versions also accept an optional options document (for example, a write concern).

The syntax for using this method is:

Syntax:

db.collection_name.drop()

Here is an example showing the drop() method in use.

It continues the earlier example and assumes the books collection was created inside the my_project_db database. Switch to that database, drop the books collection, and list the remaining collections to see the change:

Example:

use my_project_db
db.books.drop()
show collections
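
When a script may run more than once, it can help to check that the collection exists before dropping it. The following is a minimal sketch, assuming mongosh; `dropCollectionIfExists` is a hypothetical helper built on getCollectionNames() and drop().

```javascript
// Drop a collection only when it is present in the given database.
// `database` is a mongosh database handle, such as the predefined `db` object.
function dropCollectionIfExists(database, collectionName) {
  if (!database.getCollectionNames().includes(collectionName)) {
    return false; // nothing to drop
  }
  return database.getCollection(collectionName).drop();
}

// Only runs inside mongosh, where `db` is defined.
if (typeof db !== "undefined") {
  print(dropCollectionIfExists(db.getSiblingDB("my_project_db"), "books"));
}
```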


Conclusion

To understand these concepts fully, it is highly recommended to work through a project in MongoDB. You can find many project walkthroughs on YouTube, and once you are comfortable, try building your own. Stay tuned for more such articles, and happy learning!
