Error Prevention Techniques for Embedded Systems

Error prevention in embedded systems is a two-stage “divide and conquer” effort that prevents errors on both the host and target systems.

By Dr. Adam Kolawa

The software used in embedded systems must be thoroughly tested before products are released, both independently and as part of the system hardware. Unfortunately, using traditional debugging techniques to remove errors in embedded systems development is complex, requiring substantial expenditures of time and effort. Error prevention, the process and practices for preventing bugs before they occur, is crucial for the efficient and timely production of reliable embedded products.

Embedded applications are everywhere, from simple household products to vital defense systems and other mission-critical applications. Vendors cannot afford to allow errors to slip into these embedded systems and then face a product recall or, worse, watch a system fail and endanger or take human lives. Traditional debugging techniques are increasingly inadequate for handling today’s complex system designs. Fortunately, a better approach to error removal has evolved to meet the needs of today’s embedded software developers: error prevention.

Error prevention is a two-stage process. The first stage involves preventing errors on the host system. The second is preventing errors on the target system. Both stages are necessary to achieve a thorough cleanup of software errors.

Testing and validating software on the target is both difficult and time-consuming. Fortunately, techniques such as unit testing and coding-standards enforcement can be performed on the host, allowing developers to perform realistic or semi-realistic testing before moving to the target. Starting embedded error prevention on the host saves both time and money. The large variety of analytical tools available for standard software-development platforms makes it much easier to test and validate software on the host system. Moreover, when as many errors as possible are exposed on the host, code corrections can be made and validated immediately, saving valuable time. And by temporarily ignoring the target, error prevention focuses on a software-only environment rather than one where software and hardware are interconnected.

Error prevention on the host comprises two distinct phases: coding standards analysis and unit testing.

Coding Standards Analysis

Coding standards are language-specific programming rules that greatly reduce the probability of introducing errors into software. Coding standards originate from industry experts who analyzed the cause of each error they encountered and correlated these errors to specific coding practices. From these correlations, they designed a set of coding standards that prevents the most common and serious errors. By following these coding standards, developers automatically improve code quality, maintainability, and reusability, as well as ensure that erroneous or dangerous coding practices do not turn into debilitating errors when the code is compiled and run.

In addition to enforcing industry-standard C and C++ coding rules, developers can also enforce rules specially designed for embedded systems. These embedded coding standards can be expanded with extensions that allow developers to graphically create their own company, personal, or target-specific embedded coding standards. For example, target-specific coding standards could enforce target-specific restrictions on memory usage or variable length.

As an example of coding standards, consider the following C code:

 char *substring(char string[80], int start_pos, int length)
 {
      /* ... */
 }

This code declares the magnitude of a single-dimensional array in an argument declaration. The code is dangerous because C passes an array argument as a pointer to the array’s first element, and different invocations of the function may pass array arguments with different magnitudes. If the developer of this code had followed the coding standard “Do not declare the magnitude of a single dimensional array in an argument declaration” (taken from Motorola’s set of C coding standards), the problem would have been avoided and the code would read:

 char *substring(char string[], int start_pos, int length)
 {
      /* ... */
 }

Another coding standard that could help embedded software developers prevent errors is “The const data type should be used for pointers in function calls if the pointer is not to be changed” (also from Motorola’s C coding standards). The const specifier guarantees that the value the pointer refers to cannot be changed; the compiler reports an error if the code attempts to modify it. Following this coding standard prevents developers from mistakenly modifying a value through the pointer, and thus prevents the errors that such modifications can cause.

Another example of code with potential problems that could have been prevented by enforcing coding standards is:

 void Foo(int *ptr1, char *ptr2, float *ptrF)
 {
      *ptr1 = 10;   /* possibly null pointer dereference */
      ptr2 = 0;
      if (ptrF == 0) {
          return;
      }
      *ptrF = 0;    /* OK */
      return;
 }

This code dereferences a possibly null pointer. Such dereferencing is a common cause of program failure. This problem could have been prevented by enforcing the coding standard “Don’t pass possibly null pointers as parameters.” Violations of this coding standard do not always result in an error, but it is much less time-consuming to enforce the coding standard and check if violations will cause a problem than it would be to later trace strange application behavior back to a specific line of code.

Coding standards can also prevent problems that may not surface until code is ported. For example, the following code may work on some platforms, but problems may arise when it is ported to others:

 #include <ctype.h>

 void test(char c) {
     if ('a' <= c && c <= 'z') {      /* Violation */
     }
     if (islower(c)) {                /* OK */
     }

     while ('A' <= c && c <= 'Z') {   /* Violation */
     }
     while (isupper(c)) {             /* OK */
     }
 }

Character tests that do not use the ctype.h facilities (isalnum, isalpha, iscntrl, isdigit, isgraph, islower, isprint, ispunct, isspace, isupper, isxdigit, tolower, toupper) cause the portability problem. The ctype.h facilities for character tests and upper-lower conversions are portable across different character code sets, are usually very efficient, and promote international flexibility.

The fastest and most accurate way to enforce such coding standards is to enforce them automatically with a development tool that checks code against standards. Some things to consider when selecting such a technology include:

  • Does it work with your code and/or compiler?
  • Does it contain a set of meaningful coding standards?
  • Does it let you easily create and enforce custom coding standards (including target-specific coding standards)?
  • Is it easy to customize reports to your team and project priorities?
  • How well would it integrate into your existing development process?

Unit Testing

The second phase of host error prevention focuses on unit testing. Unit testing involves testing software code at its smallest functional point, which is typically a single function.

The objective of unit testing is to test not only the functionality of the code, but also to ensure that the code is structurally sound and robust, and able to respond appropriately in all conditions. Performing thorough host-level unit testing reduces the amount of work needed at the application level and on the target, and drastically reduces the potential for errors.

Ideally, unit testing should apply the following types of tests:

  • Construction testing (also known as white-box testing). This confirms that code is structurally sound, and can process a wide variety of permissible inputs without crashing or encountering serious problems. It is much like posing a series of “what if?” questions that determine whether the unit continues to behave appropriately under unusual or exceptional conditions, and verifies that any inputs thrown at the code will be received and addressed with the proper behavioral response. The success of construction testing hinges on the test inputs’ ability to cover the code as fully as possible and to find inputs that might cause problems.
  • Functional testing (also known as black-box testing). This verifies that software conforms to its specification and that all of the intended functionality is included and working correctly. It is used to validate that software works correctly when users perform a series of actions within the software’s specification. The intent is to confirm expected behavior, from the smallest possible unit of code to the entire application. Because it tests each component both in isolation and as part of the system, it allows developers to frame or isolate the functionality of each piece and isolate any potential errors that could affect system functionality.
  • Regression testing. This tests modified code under the same set of inputs and test parameters used in previous test runs. This testing verifies whether modifications have successfully eliminated errors and have not introduced new errors. It is important to use regression testing as a regular part of any testing process to find new errors during development when they are easiest to fix, and before new lines of code are added that depend on the current code base. Ideally, regression testing is performed nightly (during automated nightly builds) to ensure that errors are detected and fixed as soon as possible.

By automating the unit testing process, developers can maximize the amount of testing that can be performed with existing resources. In addition, automation not only improves test accuracy, but also makes testing efforts repeatable, which helps determine when system modifications introduce problems into previously correct functionality. Some things to look for when selecting a unit testing technology include:

  • Does it work with your code and/or compiler?
  • Does it automatically create test harnesses?
  • Does it automatically create test cases?
  • Does it provide an easy way to enter user-defined test cases and stubs?
  • Does it automate regression testing?

Once testing on the host is completed, error prevention moves to stage two. In stage two, the software and hardware come together on the target system. If the stage one practices were followed successfully, then approximately 70 to 80 percent of potential errors that could have been ported to the target will have already been eradicated. The remaining errors will most likely be hardware-related, but to be sure that no preventable software errors remain, it is necessary to conduct unit tests on the software directly on the target system.

To conduct unit testing on the target, the validated and tested code is instrumented and cross-compiled into an executable, along with a small set of test cases carried over from the unit-test session on the host. These test cases exercise specific input values and other expected data in order to examine the software’s behavior within the actual operational environment.

The target runs all hard-coded tests and reports back to the host via a serial interface or a TCP/IP connection. Unit testing in small hard-coded clusters must be repeated, recompiling the instrumented executable each time, until all necessary unit tests have been compiled and executed on the target. Any code modifications made at this stage must be verified by regression testing, running the same series of hard-coded unit tests to verify the changes.

This “divide and conquer” error prevention strategy, testing both on the host and on the target, makes correcting any remaining hardware-related issues much easier because all potential software errors have already been diagnosed and corrected. The great advantage of adopting this method is that it reduces the amount of testing needed on the target, the more difficult and expensive testing, by several orders of magnitude compared with traditional embedded debugging techniques.


Dr. Adam Kolawa is the co-founder and CEO of Parasoft, a leading provider of Automated Error Prevention (AEP) software solutions. He has contributed to and written more than 100 commentary pieces and technical articles for publication. Kolawa holds a Ph.D. in theoretical physics from the California Institute of Technology.