Embedding Databases Speeds Time to Market

A lightweight relational database management system can dramatically simplify the building of complex embedded applications, speed time to market, and provide a highly robust data environment.

By Malcolm Colton

When designing a new embedded application, few software engineers look beyond the data management facilities offered by the local file system. For many simple applications these data management services are adequate. Increasingly, however, developers find themselves dealing with environments in which data is shared among several applications. In these cases, the limitations of the file system force developers to write a great deal of data management infrastructure code.

Infrastructure code has the unfortunate quality of being invisible to the end user when it works correctly, but causing serious problems when it fails. As a result, infrastructure code does not add to the perceived value of a product, but does add significantly to the development and maintenance costs of the application. It also slows time to market and increases the customer’s total cost of ownership.

A more cost-effective solution to the problem of sharing data among applications is to use a commercial off-the-shelf (COTS) embedded data management platform. While it may seem strange to consider installing a relational database management system (RDBMS) in an ATCA product, there is no reason an RDBMS cannot run effectively there. Such systems deliver at least as much processing power as is typically needed to run enterprise applications and their RDBMS.

Several platforms exist that are suited to building data management infrastructure applications. They include the database management system (DBMS) itself, as well as functions for managing redundant database pairs and distributing data across a network of databases. Such platforms fit into a few megabytes of memory at run time, and provide sophisticated data management services that can dramatically simplify the task of building complex embedded applications.

Figure 1 - Managing embedded system data using just an ordinary file system may require considerable design effort to implement the necessary infrastructure, diverting resources from the application software that provides most of a product’s perceived value.

There are three basic data management models: network, object-oriented, and relational. Network databases have fallen out of fashion for all but LDAP directories, because they lack flexibility. Object-oriented DBMS (ODBMS) have found niches in some specialized applications, but the lack of a common query language has precluded the development of the rich array of third-party tools that surrounds relational DBMS. The vast majority of application developers needing database management turn to an RDBMS because it offers flexibility, speed, standard interfaces, a large community of trained software engineers, and a plethora of third-party tools.

A relational DBMS offers three fundamental features: flexible and self-documenting data structures that offer content-based search, multi-user shared data access, and guaranteed data integrity even after a crash. All three contribute to simpler application development.

Flexible Data Structures Simplify Access

A perennial problem in any application is maintaining documentation about data structures. An RDBMS automatically maintains system catalogs that describe the current structure of the data. Any application can query the catalogs to determine the current structure of the data, removing much of the need to maintain external documentation. An RDBMS also allows for the data structures to evolve over time without impact on existing applications.
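As a minimal sketch of catalog queries and schema evolution, the following uses SQLite through Python's standard sqlite3 module as a stand-in for an embedded RDBMS. The article does not name a product, so SQLite, the subscribers table, and its columns are assumptions made only for illustration. Other systems expose their catalogs through different views.

```python
import sqlite3

# In-memory database; the table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE subscribers ("
    " phone_number TEXT PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " service_plan TEXT)"
)


def describe_table(connection, table_name):
    """Return (column name, declared type, is primary key) from the catalog."""
    # table_name is trusted here; PRAGMA arguments cannot be bound as parameters.
    rows = connection.execute(f"PRAGMA table_info({table_name})").fetchall()
    # PRAGMA table_info yields: cid, name, type, notnull, dflt_value, pk
    return [(row[1], row[2], bool(row[5])) for row in rows]


# sqlite_master is SQLite's catalog of tables, indexes, and other schema objects.
table_names = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]
columns = describe_table(conn, "subscribers")

# Schema evolution: add a column; queries that name the old columns still work.
conn.execute("ALTER TABLE subscribers ADD COLUMN roaming_enabled INTEGER DEFAULT 0")
columns_after = describe_table(conn, "subscribers")
```

An application written against the original three columns keeps running after the ALTER TABLE, and any tool can discover the new column by reading the catalog rather than external documentation.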

In an RDBMS, data resides in named tables made up of rows that in turn contain columns. Each row contains the data that defines a given entity and has a primary key that uniquely identifies that row. For example, a Home Location Registry (HLR) table might use a phone number as the primary key, and each row would contain information related to the subscriber having that phone number. All data access is through SQL language calls that retrieve data by means of its contents and operate on one or more rows in the database.
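The HLR example might look roughly like this. It is a sketch only: SQLite (via Python's sqlite3 module) stands in for the embedded RDBMS, and the hlr table layout, column names, and sample rows are invented for illustration rather than taken from any real HLR schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical HLR table keyed by phone number.
conn.execute(
    "CREATE TABLE hlr ("
    " phone_number TEXT PRIMARY KEY,"
    " subscriber_name TEXT NOT NULL,"
    " current_cell TEXT,"
    " roaming INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO hlr (phone_number, subscriber_name, current_cell, roaming)"
    " VALUES (?, ?, ?, ?)",
    [
        ("+15550100", "A. Example", "cell-17", 0),
        ("+15550101", "B. Example", "cell-17", 1),
        ("+15550102", "C. Example", "cell-42", 0),
    ],
)
conn.commit()

# Look up one subscriber by primary key.
subscriber = conn.execute(
    "SELECT subscriber_name, current_cell FROM hlr WHERE phone_number = ?",
    ("+15550101",),
).fetchone()

# Content-based search: every subscriber currently in a given cell.
in_cell_17 = [
    row[0]
    for row in conn.execute(
        "SELECT phone_number FROM hlr WHERE current_cell = ? ORDER BY phone_number",
        ("cell-17",),
    )
]

# A single statement can operate on several rows at once.
rows_updated = conn.execute(
    "UPDATE hlr SET roaming = 0 WHERE current_cell = ?", ("cell-17",)
).rowcount

# The primary key guarantees uniqueness: a second row for the same number is refused.
try:
    conn.execute(
        "INSERT INTO hlr (phone_number, subscriber_name) VALUES (?, ?)",
        ("+15550100", "Duplicate"),
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```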

An RDBMS offers ways to automatically associate tables of data with one another by means of foreign keys. A foreign key is the primary key from another table included as a data element in a row. For example, in a unified billing system, a user may have a fixed phone number, a fax number and a mobile number. Each of these numbers can be stored in a row that is accessed by user ID, to provide links to the rows in other tables defined by the different phone numbers. The RDBMS automatically ensures the integrity of such linked structures without the need for application code. For example, when a user is deleted, an RDBMS whose foreign keys are declared with cascading deletes automatically removes the associated rows.
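A simplified sketch of the billing example follows, again using SQLite through Python's sqlite3 module as an assumed stand-in. The users and phone_numbers tables are hypothetical, and the layout is reduced to one parent and one child table. Two SQLite-specific details matter: foreign key enforcement must be switched on per connection, and the automatic delete happens only because the key is declared with ON DELETE CASCADE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this is enabled on the connection.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE phone_numbers ("
    " phone_number TEXT PRIMARY KEY,"
    " kind TEXT NOT NULL,"
    " user_id INTEGER NOT NULL"
    "   REFERENCES users(user_id) ON DELETE CASCADE)"
)

conn.execute("INSERT INTO users (user_id, name) VALUES (1, 'A. Example')")
conn.executemany(
    "INSERT INTO phone_numbers (phone_number, kind, user_id) VALUES (?, ?, ?)",
    [
        ("+15550100", "fixed", 1),
        ("+15550101", "fax", 1),
        ("+15550102", "mobile", 1),
    ],
)
conn.commit()

# A number that points at a non-existent user is refused by the database itself.
try:
    conn.execute(
        "INSERT INTO phone_numbers (phone_number, kind, user_id)"
        " VALUES ('+15550199', 'mobile', 999)"
    )
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

numbers_before = conn.execute("SELECT COUNT(*) FROM phone_numbers").fetchone()[0]

# Deleting the user removes the dependent rows; no application cleanup code runs.
conn.execute("DELETE FROM users WHERE user_id = 1")
conn.commit()

numbers_after = conn.execute("SELECT COUNT(*) FROM phone_numbers").fetchone()[0]
```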

Figure 2 - Database management is critical for maintaining data synchronization in high-availability applications where a standby node must be ready to take over quickly if the primary fails. An RDBMS can maintain synchronization as well as maintain local data copies on individual nodes for speedy access.

Transactions Protect Data Integrity

Applications often need to make multiple data changes as a unit of work. In database terms, this is a transaction. A partial transaction should never occur; partial transactions damage the data’s consistency. The classical example of a transaction is transferring money from a checking account to a savings account. It is not acceptable that only the debit or the credit take place: either both must occur, or neither.

In database terms, transactions are defined as having ACID characteristics: Atomic, Consistent, Isolated and Durable.

  • Atomic - Even under failure and fault conditions, a transaction takes place either completely or not at all. If the RDBMS goes down during a transaction, when it is restarted it will roll back whatever changes were made by the incomplete transaction, returning the data to a consistent state.
  • Consistent - Every transaction, successful or not, leaves the database in a consistent state.
  • Isolated - Multiple transactions can be taking place simultaneously, but they do so in isolation. Intermediate states are invisible to other applications until the transaction commits or rolls back. This is important because an application that reads data part way through a transaction would get an inconsistent view of the data: the money has left the checking account, but not yet been credited to the savings account, for example.
  • Durable - A committed transaction is guaranteed not to be lost, even if the database or its machine fails.

Without an RDBMS enforcing these transaction rules, each application must be relied on to lock and unlock data, and to recover properly from all error conditions, including outages that occur during a transaction. It is easy to make mistakes in a system that does not rely on an RDBMS, resulting in a crippling loss of data integrity.
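The checking-to-savings transfer can be sketched as follows, with SQLite (Python's sqlite3 module) assumed as the database. The accounts table, the CHECK constraint that forbids negative balances, and the transfer and balances helpers are all hypothetical. The credit is deliberately applied before the debit so that a failed debit shows the already-applied credit being rolled back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO accounts (name, balance) VALUES (?, ?)",
    [("checking", 100), ("savings", 50)],
)
conn.commit()


def transfer(connection, source, target, amount):
    """Move amount from source to target as one transaction; True if committed."""
    try:
        # 'with connection' commits if the block succeeds and rolls back
        # every change made inside it if an exception is raised.
        with connection:
            connection.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, target),
            )
            connection.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        # The debit would overdraw the account, so the credit was undone too.
        return False


def balances(connection):
    """Return the current balances as a {name: balance} dictionary."""
    return dict(connection.execute("SELECT name, balance FROM accounts"))


small_ok = transfer(conn, "checking", "savings", 30)   # within the balance
large_ok = transfer(conn, "checking", "savings", 500)  # would overdraw checking
final_balances = balances(conn)
```

After both calls the balances reflect only the first transfer. The application never wrote any undo logic for the half-finished second one.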

Once an application becomes complex and the data stores grow large, the problem of quickly finding data becomes significant. Without an RDBMS, applications need to maintain complex data structures that enable fast searching. All applications must be relied on to update these structures whenever they insert or delete data, because any error or failure on the part of any application will result in a loss of data integrity that will impact all the applications.

An RDBMS maintains and uses highly efficient index structures automatically. The indexes are defined declaratively as part of the database definition, and the RDBMS selects the best index access to the data automatically. Applications enjoy rapid access to the data without the need to write a single line of access method code.
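A brief sketch of declarative indexing, with SQLite (Python's sqlite3 module) assumed as the database. The subscribers table, the index name idx_subscribers_cell, and the query_plan helper are made up for the example. EXPLAIN QUERY PLAN is SQLite's way of reporting the access path; its wording varies between versions, so the sketch only inspects whether the index is mentioned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE subscribers ("
    " phone_number TEXT PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " current_cell TEXT)"
)
conn.executemany(
    "INSERT INTO subscribers (phone_number, name, current_cell) VALUES (?, ?, ?)",
    ((f"+1555{n:04d}", f"Subscriber {n}", f"cell-{n % 50}") for n in range(1000)),
)
conn.commit()


def query_plan(connection, sql, params=()):
    """Return SQLite's description of how it would execute the statement."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


lookup = "SELECT phone_number FROM subscribers WHERE current_cell = ?"

plan_before = query_plan(conn, lookup, ("cell-7",))

# The index is declared once; the query text above does not change.
conn.execute("CREATE INDEX idx_subscribers_cell ON subscribers (current_cell)")
conn.commit()

plan_after = query_plan(conn, lookup, ("cell-7",))
uses_index = "idx_subscribers_cell" in plan_after

matches = conn.execute(lookup, ("cell-7",)).fetchall()
```

The application code that runs the lookup is identical before and after the index exists. Only the plan chosen by the database changes.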

Databases Enhance System Reliability

An RDBMS has the ability to recover its database to a consistent state after an unexpected halt. In fact every time an RDBMS starts up, it runs recovery algorithms that ensure that all committed transactions are properly recorded in the database, and all uncommitted transactions are completely undone. This means that an application can begin using the database immediately on start-up, confident that data integrity has been preserved.
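The effect of recovery can be approximated in a few lines, assuming SQLite through Python's sqlite3 module; the events table and file name are hypothetical. This is only a simulation: closing the connection before COMMIT stands in for an unexpected halt, and it does not exercise the journal replay that a real power loss or process kill would trigger on the next start-up.

```python
import os
import sqlite3
import tempfile

db_path = os.path.join(tempfile.mkdtemp(), "embedded.db")

# isolation_level=None: Python issues no implicit BEGIN or COMMIT, so the
# transaction boundaries are exactly the ones written out below.
conn = sqlite3.connect(db_path, isolation_level=None)
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, description TEXT)")

# First transaction: committed, so it must survive.
conn.execute("BEGIN")
conn.execute("INSERT INTO events (description) VALUES ('card inserted')")
conn.execute("COMMIT")

# Second transaction: the connection goes away before COMMIT is reached.
conn.execute("BEGIN")
conn.execute("INSERT INTO events (description) VALUES ('firmware update started')")
conn.close()  # stand-in for an unexpected halt

# On restart the application simply opens the database and reads it.
conn = sqlite3.connect(db_path)
recovered = [
    row[0] for row in conn.execute("SELECT description FROM events ORDER BY id")
]
conn.close()
```

Only the committed event is present after reopening; the uncommitted insert has been undone without any recovery code in the application.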

Many modern embedded RDBMS offer features that go beyond plain database management to support highly available and distributed applications, as well. Some mission-critical applications have uptime requirements that exceed the reliability of the hardware on which they run. One example is the chassis controller in an ATCA chassis, where loss of the controller means that the functioning of the entire chassis is lost. Such high availability (HA) applications run on redundant pairs of cards.

In HA systems, the standby card must be able to take over operations from a failed primary within a very short period of time. For telecom applications, this time is measured in tens of milliseconds, not nearly enough time to reboot an application stack. So the standby node must be “hot”, with current data and running applications that need only be told that they are now active.

Some RDBMS can be configured in a hot-standby mode, in which the two database instances cooperate to keep their databases constantly synchronized. The use of an HA RDBMS means that applications need not be aware that they are running in a redundant system, making it much easier to convert an existing application to a higher level of fault tolerance.

Supporting Shared Data

Few applications stand alone. Increasingly they participate in complex, multi-tier distributed architectures, passing operational and control data to and fro among various parts of the system. Even what looks like a single application running in an ATCA chassis may comprise three major tiers: the Element Management System (EMS), the chassis controllers and the operational payload cards. It is the sharing of data between the application elements that provides control and coordination in such systems.

An embedded RDBMS can often be configured to support this kind of environment, replicating data throughout the system while maintaining control over global data integrity. This technique can be used, for example, to move FCAPS (Fault, Configuration, Accounting, Performance and Security) data between operational applications and their controlling applications.

In any distributed system where data is shared, there is a trade-off to be made between the freshness of the data and the amount of bandwidth consumed in communicating data changes. Modern data replication systems offer both compression of the data stream and some form of out-of-band signaling that allows a replica to know when it should refresh itself.

Given the value of an RDBMS, the designer’s decision then becomes whether to build or to buy. In this competitive world, time to market is a key business driver, and engineering managers eagerly seek technologies that can reduce development time and cost. An embedded RDBMS encapsulates many tens or hundreds of person-years of development effort, and can easily be plugged into an application architecture. Incorporating such a component into an embedded application ensures that scarce and expensive development resources can be focused on delivering differentiated value to the customer. This speeds time to market, ensures a higher level of robustness for the application, and lowers the total cost of ownership.

Malcolm Colton is a database expert, having spent the last 15 years at Sybase, Illustra, Informix, Cloudscape and Solid. In his career he has written software, managed engineers, worked as a post-sales field consultant, and written numerous technical articles and presentations about databases. He is now an independent consultant.