Agata Werszler, 2020-12-15

Which DB to choose for your specific data type in your software project

Creating a new project includes choosing the right database to store your data. Many developers that I know choose the relational database by default from the beginning. But is it the best decision? Of course, it depends on many factors. In this article, I would like to introduce you to other types of databases to make your choices easier and help you be prepared in your future endeavors.

The type of your database is not the only topic to consider. There are many other issues to think about, e.g., how many active users the application may have? Do you need strong consistency everywhere? Will the eventual consistency suffice in some cases? There are so many questions without straightforward answers as “the more you get into it, the more complicated it becomes.” So, please, make note that this article is focused only on database types.

Take a cup of coffee and enjoy this reading.

General database classification

At the beginning, it is good to know that there are two main types of databases: relational (SQL) and non-relational (NoSQL).

  • The SQL databases are structured in a relational way, which means that you store data in tables and keep relations between them.
  • The NoSQL (Not only SQL) databases, unlike the relational ones, are not well-structured and, thus, allow more adaptability and flexibility.

There is one more type apart from the two mentioned above, namely an in-memory database. It cannot be classified neither as relational nor non-relational because it relates to where the data is physically stored. Every database could be stored on a disk or in memory.

Types of databases

1. Relational

In my opinion, it is the most popular type of database. It works well with structural data where you want to keep relations between the records. The database structure is described in a schema.

The main advantages regard transactions (they help to ensure data integrity and follow the ACID rules) and the ability to handle lots of complex queries.

When to choose it?

It is useful for keeping data that does not change structurally very often and that you need to store permanently, for example:

  • CRM (Customer Relationship Management),
  • Order management,
  • ERP (Enterprise Resource Planning),
  • data warehousing or inventory management,
  • accounting or finance.

Examples:

Amazon Aurora, Microsoft Azure SQL Database, PostgreSQL, MySQL.

Relational databases are insufficient for many new applications and you need to have more than one database. In the next part of the article, I will focus on the non-relational databases.

2. Key-value

It stores each data value with a unique key. It means that data is accessed by a single key, just like it is done in a hash map. In contrast to relational databases, it neither enforces the scheme nor relationships between records. Most of these databases do not ordinarily support update operations. To modify data, you have to overwrite the whole existing set.

When to choose it?

It is useful for data that you want to read/write fast (but do not update very often):

  • real-time bidding, ad serving,
  • data caching,
  • session management,
  • shopping carts,
  • customer preferences or profile management.

Examples:

Memcached, Amazon DynamoDB, Azure Cosmos DB, Redis.

3. Document

It stores collections of documents. Every document contains fields with data, which may be simple values or complex elements, like lists or child collections. It is important to know that every document may have a different structure, even if they represent the same thing (each document is unique and evolves over time). For example, the first customer document contains less info than the second:

{
 "FirstName": "John",
 "LastName": "Fake",
 "Motorcycles:" [
  {
    "Model": "BMW",
    "Year": 2020
  }
 ]
}
{
 "FirstName": "Alex",
 "LastName": "Nolastname",
 "Age": 15,
 "Address": {
    "Country": "Poland",
    "City": "Somewhere"
  },
 "Motorcycles:" [
  {
    "Model": "BMW",
    "Year": 2020
  }
 ]
}

When to choose it?

It is useful for data that require a flexible schema for fast processing:

  • product catalogs,
  • CMS (content management system),
  • user profiles and personalization.

Examples:

Amazon DocumentDB, Azure Cosmos DB, MongoDB, Redis.

4. Graph

It uses a graph structure and is built of two elements: nodes and edges. Nodes are analogous to table rows or JSON documents. Edges are relationships between the nodes – they are as important as nodes. Both of them can have properties. Moreover, edges can have a defined direction of a relationship.

When to choose it?

It is useful when your data is similar to a graph, i.e., relationships between data items are dynamic and change over time. Furthermore, it is a good choice for when a business or technical team need to understand relationships within their data. Some prominent examples include:

  • organization charts,
  • fraud detection,
  • social graphs/networking,
  • recommendations engines,
  • knowledge graphs.

Examples:

Amazon Neptune, Neo4j, ArangoDB, Titan.

5. Time series

It stores data organized by time. Typically, it accumulates enormous amounts of data in real-time. It is most often used to save data, though updating is very rare. Generally, a timestamp is used as the primary key and/or sorting data. Some databases allow defining tags to be included as additional information, like data origin or type.

When to choose it?

It is useful to store small amounts of data appended sequentially in chronological order, for example in:

  • DevOps,
  • application monitoring,
  • monitoring and event telemetry,
  • IoT applications (like data collections from device sensors).

Examples:

Azure Time Series Insights, Amazon Timestream, InfluxDB.

6. Ledger

It provides an immutable, transparent, and cryptographically verifiable transaction log owned by a central authority. – Amazon’s QLDB Overview

Let's shortly explain every keyword in the above quotation:

  • immutable – means that a record created in this database cannot be deleted, modified, or even overwritten,
  • transparent – it tracks and keeps a sequence record of each change in your data,
  • cryptographically verifiable – data created in this database is verified by cryptographic hashing techniques, similar to blockchains (using the SHA-256 hash function).

When to choose it?

It is useful to store accurate history, e.g., logging sequenced entry of every data change, as in:

  • finances (history of debit or credit transactions),
  • manufacturing (track where parts were sourced),
  • insurance,
  • HR and payroll,
  • retail,
  • supply chains.

Examples:

Amazon QLDB

Conclusions

There is no simple answer to the question posed in the title of this article. The only way to choose the right database is to learn more about your data. Answer the question: “What kind of data does your application generate?”, and you will be able to make the right choices.

Furthermore, you should know the business requirements and the application domain very well. You need to know how you will use the data, what queries you will send to the database, how many times you will keep, read, update, or delete your data. All of these things matter, but not all developers pay enough attention to these areas.

Please think about your data in the application you develop to improve/create better software. Overall, I hope your will get to know your data well enough to store it in a place where it will be happy.

Read more:

A few tricks to speed up your JavaScript application

Ways to increase your Rails performance

Facts and myths about cooperating with external software development partner