June 28, 2016
Object-relational mapping, and its NoSQL cousin, Object-document mapping, are techniques for interacting with a database in an object-oriented fashion. I’ll refer to ORMs and ODMs interchangeably, although ODM is typically used with a NoSQL where data is stored in documents rather than tables.
As a contributor to the BlueOak Server framework, I’m often asked whether we plan to add an ORM to BlueOak Server. I’ll explain why we don’t include an ORM within BlueOak and talk through some of the potential pitfalls of using an ORM, particularly with NoSQL databases. If you do decide to use an ORM with BlueOak, or use another framework that comes with an ORM, you can make an informed decision about whether an ORM is right for you.
BlueOak Server was created out of the need to quickly stand up REST APIs for our mobile applications. It turns out that the majority of the REST APIs we build are part of what is referred to as an aggregation tier. An aggregation tier doesn’t typically talk directly to a database. Instead, it interfaces with existing services that often are part of a system of record. Often times these services are backed by legacy SOAP or REST APIs that do talk to a database. Our goal is primarily to transform these APIs into something more easily consumable by mobile applications, for example turning what would be a dozen requests into a single REST call.
The Benefits of an ORM
ORMs allow developers to use familiar programming language constructs to interact with a database rather than complicated SQL queries. For example, let’s say there’s a database for storing vehicle information. A vehicle has a unique identification number (VIN), as well as a make and model. If the make and model are stored in a separate table, adding a new vehicle record might look like:
An ORM, on the other hand, might use function calls and constructors to accomplish the same thing.
It’s easy to see the appeal of using an ORM. Interacting with an ORM-generated object feels more natural to a developer than writing SQL queries. There are many ORM tools and libraries for Node.js. Sails.js and Strongloop/IBM’s Loopback are both popular frameworks for creating ORM-backed APIs and websites. There are also more-specialized ORM libraries such as Mongoose and Sequelize which can be integrated into other frameworks, like BlueOak Server.
ORMs have been around since long before Node.js. It’s a familiar and proven technology to enterprise Java developers, who have long used Entity Beans with Container-Managed Persistence. There’s is, however, one important difference today. The rise in popularity of Node.js has coincided with the popularity of NoSQL databases. Developers are embracing ORM and NoSQL without truly understanding the complexities involved. This can lead to many bugs that are difficult to detect and cause serious problems.
Referential Integrity in NoSQL Databases
Relational databases support the concept of constraints. Constraints are rules that maintain the integrity of data by ensuring that it meet certain criteria. One such constraint checks that references between tables are always correct whenever data is added or removed from the database.
A well-designed relational database is normalized, meaning that certain pieces of related data are split into separate tables and linked together through keys. Imagine a database of an e-commerce site containing product data. Each product has a supplier, and each supplier can supply multiple products. An un-normalized database would include the supplier information within the same row as the rest of the product data.
This presents a problem. If the address of the supplier changes, we need to update every row for every product using that supplier.
Normalizing the database splits the data into two tables, a product table and a supplier table. The product table refers to the supplier by a single ID, known as a foreign key, rather than having to embed all the information about the supplier.
Updating the supplier address is now simply a matter of changing a single field in the supplier table.
If we try to delete a supplier, the database will ensure that the supplier isn’t being referenced by any products. This is known as referential integrity. Without such a check, we can end up with nonsensical data, such as a product without a supplier.
Let’s imagine using a NoSQL database for this same data. There are many types of NoSQL databases, but most of them don’t have the concept of a reference or a constraint. If we want to normalize our data we can invent keys to associate data with each other, but the database won’t enforce that relationships are valid. It is therefore up to the ORM to implement the referential integrity checks.
Implementing referential integrity checks within the ORM adds additional complexity. Deleting a document is no longer a single database operation. Instead the ORM has to first perform a query to ensure that no other document references the document that we intend to delete.
Even when the ORM performs integrity checks, certain race conditions exist which can still break referential integrity. Since our delete operation is now two distinct operations–a query followed by a delete, there’s a small window of time in between when those operations are processed in which bad data can be added to the database. To do this safely we need transactions.
Transactions in NoSQL Databases
A transaction in a series of operations performed such that if one operation fails, none of the subsequent operations complete and the database is returned to its previous state. The classic example of a transaction is transferring money between two bank accounts. A successful transfer requires two database operations–one operation to deduct some amount from the first account and another operation to credit that amount to the second account.
If the first account doesn’t have proper funds to allow such a transfer, then the second operation of depositing the funds shouldn’t occur. Or if a failure occurs trying to deposit the funds to the second account, then the first account should remain unchanged. Failing to enforce the transaction could mean that either both accounts end up with the money or neither account gets the money.
While transactions are a common feature of relational databases, they’re less common in NoSQL databases. There are patterns that can be used in certain NoSQL databases to mimic some of the qualities of transactions, but unless they’re supported by the ORM, they won’t be very useful. The lack of constraints in a NoSQL database elicits the need for transactions, yet transactions are often unsupported. If you need transaction support, take a very close look at both your database and ORM to be sure it’s possible.
Atomic Operations in NoSQL Databases
Atomicity is a characteristic of transactions, meaning that multiple operations can execute as a single operation in an all-or-nothing manner. I mention it separately because there are certain operations that you might believe to be atomic but aren’t handled that way by the ORM.
Let’s imagine using a NoSQL database to store the customer’s cart on an e-commerce site. The cart can be represented by the following JSON document:
The items in the cart are stored in an array. Each element of the array contains a product ID and quantity. Adding an item to the cart seems trivial enough. In a relational database, this would be a single INSERT. In the case of a NoSQL database we simply want to append a new item onto the array.
Suppose a customer clicks on the “Add to Cart” button for two different products in rapid succession. We would expect both those items to end up in the cart. However, that might not be the case.
The problem is that the ORM might not update the item array atomically. Most likely it will fetch the entire cart document, update the item array, and then save the entire document back to the database. If the two add-to-cart requests occur close to each other, the first request can cause the server to fetch the cart document from the database, but before it’s able to update the data and write it back to the database, the server has re-fetched data for the second request. That means that when the ORM writes the data back to the database, the data from one of the requests is overwritten.
To prevent this, the cart document needs to be updated atomically. Some databases, like MongoDB, do provide atomic operations. Instead of fetching the entire cart document, updating it, and then writing it back to the database, the product data can be appended onto the items array in a single operation.
CouchDB takes a different approach. It’s able to detect conflicts when a document is updated with stale data. The data from both operations is preserved, although it’s up to the developer to write code that’s able to resolve the conflict. In the add-to-cart case, it makes sense to merge the items in the cart together. However, the correct way to resolve the conflict is going to depend on the business case.
Deciding on an ORM
Before deciding on an ORM and a NoSQL database for your next application, it’s critical to understand the technology. They do solve a problem, but that problem can come at a cost. Do you need transaction support? Do you care about referential integrity? This is a decision that only you can make on a case-by-case basis. The ability to quickly build an API may outweigh any other risks. If you can’t accept the risk, then a relational database is a safe choice.