Architecture Migration to Intel® Multicore Processors Invites Software Implications and Decisions

By Lori Matassa and Max Domeika

The performance headroom offered by multicore Intel® architecture processors is motivating a migration of applications from other processor architectures. Before planning a migration to Intel architecture, the designer must first understand specific software implications and the decisions that need to be made. This article, which is the first in a two-part series, summarizes platform decisions and optimization opportunities. The follow-on article will detail a process to aid such migrations.

Migration: Platform Decisions

An architecture migration begins with an understanding of the hardware differences that affect the correct execution of the software on the actual platform itself. These factors require the designer to make decisions on topics like choice of instruction set, endian architecture, system-initialization options, operating systems, driver support, library support, and tools support.

Instruction Set (32- or 64-bit)

The migration to the Intel architecture includes a choice of instruction set—either IA-32 or Intel® 64 Instruction Set Architecture (ISA). This choice affects several facets of the migration. The 32-bit x86 ISA is a good fit for many embedded applications. However, one might consider employing the Intel 64 ISA for several reasons:

  • Access to more than 2^32 bytes of memory
  • Greater number of internal resources, including 16 general-purpose registers and 16 XMM registers
  • Larger native data size for higher performance on 64-bit operations
  • Greater number of byte registers for higher performance on applications that make frequent use of 8-bit values
  • Instruction relative addressing (RIP addressing) makes position-independent code easier to implement and more efficient

The Intel 64 and IA-32 Architectures Software Developer’s Manuals contain the details for each Intel architecture instruction. Use this set of manuals as a reference for converting (re-writing) native assembly-code instructions to equivalent functionality with Intel architecture instructions.

The following topics also may require specific attention during a translation of assembly code. They may have an impact on the migration of higher-level language source code as well:

  • Alignment: Data-alignment requirements are specific to the processor architecture. Exceptions and incorrect execution may arise if they aren’t taken into account.
  • Vector-oriented instructions: At the assembly language level, translation of vector-oriented code is not 1:1 between architectures. It requires knowledge of both instruction sets. Many higher-level languages enable programming using intrinsics, which also do not translate 1:1.
  • Calling conventions: Some architectures specify that function arguments are passed in registers while others pass arguments on the stack. The translation of the assembly language needs to take this into account. For higher-level languages, a compiler will typically take care of this difference.
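The alignment point above can be illustrated with a short C sketch. Casting an arbitrary byte pointer to a wider type and dereferencing it is undefined behavior in C and can fault on architectures with strict alignment rules, whereas copying the bytes with memcpy is portable. The helper name load_u32_unaligned is hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reads a 32-bit value in host byte order from any
 * byte offset in a buffer, without assuming the address is 4-byte aligned. */
uint32_t load_u32_unaligned(const unsigned char *buf, size_t offset)
{
    uint32_t value;

    assert(buf != NULL);
    /* memcpy has no alignment requirement on its source pointer. */
    memcpy(&value, buf + offset, sizeof value);
    return value;
}
```

Compilers commonly reduce such a fixed-size memcpy to a single load where the target permits unaligned access, so this form rarely costs performance.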

Byte Order (Endian Architecture)

The byte order defines the physical ordering of the component bytes that make up larger elements, such as 16-, 32-, and 64-bit values. A big-endian architecture stores the most significant byte at the lowest memory address. In contrast, a little-endian architecture stores the least significant byte at the lowest address. IA-32 and Intel 64 architecture processors use little-endian byte order, while some other processors use big-endian byte order. This difference can cause problems if the code makes assumptions about the locations of particular bytes within a larger element. Code that abstracts the memory architecture and executes correctly on both big- and little-endian architectures is called endian-neutral.
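A minimal C sketch makes the difference concrete by inspecting which byte of a 32-bit value sits at the lowest address. The function names are hypothetical, and the sketch assumes a target that is purely big- or little-endian.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns the byte stored at the lowest address
 * of a 32-bit value on the current target. */
unsigned char lowest_address_byte(uint32_t value)
{
    unsigned char bytes[sizeof value];

    memcpy(bytes, &value, sizeof value);
    return bytes[0];
}

/* Hypothetical helper: returns 1 on a little-endian target, 0 on a
 * big-endian target. */
int is_little_endian(void)
{
    unsigned char low = lowest_address_byte(UINT32_C(0x0A0B0C0D));

    /* Assumption of this sketch: no mixed-endian layouts. */
    assert(low == 0x0D || low == 0x0A);
    return low == 0x0D;
}
```

On an IA-32 or Intel 64 target, the lowest-address byte of 0x0A0B0C0D is 0x0D; on a big-endian target it is 0x0A. Code that relies on either answer is not endian-neutral.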

Software migrations need to accommodate endian differences by transforming the code to be endian-neutral. This amounts to the following recommendations:

  • Data storage or shared memory: Use a common format for storing data. If the big-endian format is chosen for data storage, a byte-swap macro should be used on the little-endian architecture to read and store data correctly.
  • Data transfer: Employ byte-swap macros to translate data to an agreed-upon network-transfer byte ordering. In networking, the typical agreed-upon format is big-endian.
  • Data type: Data types including unions, pointers, byte arrays, and bit fields present unique challenges because code that uses them often depends on where individual bytes sit in memory.
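The recommendations above can be sketched in C as follows. BSWAP32 is one possible byte-swap macro, and store_be32 and load_be32 are shift-based helpers that produce and consume big-endian data identically on any host. All three names are hypothetical and introduced only for illustration; POSIX systems provide htonl and ntohl in <arpa/inet.h> for the same purpose.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte-swap macro: reverses the four bytes of a 32-bit value. */
#define BSWAP32(x) \
    ((uint32_t)((((uint32_t)(x) & UINT32_C(0x000000FF)) << 24) | \
                (((uint32_t)(x) & UINT32_C(0x0000FF00)) <<  8) | \
                (((uint32_t)(x) & UINT32_C(0x00FF0000)) >>  8) | \
                (((uint32_t)(x) & UINT32_C(0xFF000000)) >> 24)))

/* Writes `value` to `out` in big-endian order, regardless of host order. */
void store_be32(unsigned char *out, uint32_t value)
{
    assert(out != NULL);
    out[0] = (unsigned char)(value >> 24);
    out[1] = (unsigned char)(value >> 16);
    out[2] = (unsigned char)(value >> 8);
    out[3] = (unsigned char)value;
}

/* Reads a big-endian 32-bit value from `in`, regardless of host order. */
uint32_t load_be32(const unsigned char *in)
{
    assert(in != NULL);
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}
```

The macro has to be applied only on hosts whose byte order differs from the stored format, which usually means a conditional definition per target. The shift-based helpers avoid that condition because they operate on values rather than on memory layout.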

For further details on issues around byte order, see the paper by Matassa (Reference i).

System Initialization And Operating System

When moving to multicore Intel architecture processors, two primary decisions must be made: the choice of system-initialization firmware and the choice of operating system (OS). This selection should be guided by how well the features of the firmware and OS map to the needs of the embedded project. Take the OS decision, for example. It involves a natural tradeoff: The more dissimilar the current OS is from the OS under consideration, the more porting work will be required. Another factor in the OS decision is the established software ecosystem available on the OS. An OS with a broader user base tends to have a more diverse, competitive, and complete software ecosystem. Thus, this factor may favor more mainstream desktop and server OSs, such as Linux or Windows.

Aside from these two rather obvious factors (similarity to current OS and popularity of OS), a critical factor is the level of support for current and future Intel architecture features. Processor features with benefits spanning performance, security, multitasking, mobility, manageability, reliability, and flexibility have been and continue to be added to Intel architectures. Many of these features require firmware and OS support. Whether the current and target OSs are the same or different, device drivers, libraries, and software-development tools need to be surveyed, and their availability for the Intel architecture must be determined.

For system-initialization firmware, the options are summarized as follows:

  • Boot loader: Custom firmware that’s usually developed when requirements might include optimization for speed, size, or specific system requirements. It will support minimal upgrade or expansion capabilities. One example that supports the Intel® Atom™ processor is the QNX* fastboot technology.
  • BIOS: Obtain system-initialization products from an independent BIOS vendor. If the design will support multiple standard interfaces and expansion slots, the optimal choice may be a BIOS and/or Unified Extensible Firmware Interface (UEFI) firmware. Alternatively, it might be best to select a host mainstream OS with a broad set of pre-OS features, which are ready to run multiple applications.

For further information on BIOS and boot loaders specific to the embedded Intel architecture, visit the Intel® Embedded Design Center (Reference ii).

Drivers And Libraries

Typically, drivers and libraries that are developed in house will need to be updated for the Intel architecture. Open-source versions of drivers and libraries providing similar functionality may help to guide the changes that are required. Platform- and OS-specific drivers can usually be obtained from various sources. Information on chipset drivers is available online (Reference iii). Depending on the OS, Intel architecture device drivers are available from various providers.

The real-time operating-system (RTOS) board-support packages (BSPs) for Intel® embedded chipset drivers are available from RTOS vendors. The standard desktop, mobile, and server drivers for Microsoft Windows* (XP or Vista*) and Linux* are available for download. In addition, BSPs for Microsoft Windows CE* can be downloaded from third-party vendors like Adeneo Corp.,* BSQUARE,* and Wipro Technologies.*


Tools

Once the architectural and platform software differences are mitigated, one should seek an understanding of the software-development tools that are available for the Intel architecture. The common set of software tools relevant to this discussion comprises compilers, debuggers, performance analyzers, static analyzers, and code-coverage tools. One relatively easy approach is to check whether the current tools vendor offers the same set of tools targeting the Intel architecture. Open-source/GPL tools covering the previous list include GCC, GDB, oprofile, Splint, and gcov.

Intel also offers software-development products including compilers, debuggers, performance analyzers, performance libraries, and threading tools, which make use of the latest Intel architecture features. These tools can be employed early in the migration process. They therefore enable the designer to take advantage of the latest performance-differentiating features.

Migration: Optimization Opportunities

Once the application is executing correctly, performance optimizations can be considered. Parallelism in the form of vector and multicore processing is becoming more widely used.

Vector Instructions

Single Instruction, Multiple Data (SIMD) is a technology used for vector-oriented code. Intel® Streaming SIMD Extensions (Intel® SSE) are extensions to the fundamental processor-architecture instruction set. On PowerPC* (PPC) architectures, AltiVec* is used. As a result, teams porting from PPC commonly want to preserve their existing investment in AltiVec code. For information on translating AltiVec to Intel SSE instructions, see the AltiVec/SSE Migration Guide (Reference iv).
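To give a feel for SIMD code on Intel architecture, the following C sketch adds four single-precision floats with one Intel SSE instruction through compiler intrinsics. The function name add4_f32 is hypothetical. The scalar fallback is an addition for this example so that the code also builds where SSE is not enabled. In AltiVec code, the comparable operation would typically be written with vec_add, although loads, stores, and alignment handling differ between the two instruction sets.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE__)
#include <xmmintrin.h>  /* Intel SSE intrinsics */
#endif

/* Hypothetical helper: out[i] = a[i] + b[i] for i in 0..3. */
void add4_f32(const float *a, const float *b, float *out)
{
    assert(a != NULL && b != NULL && out != NULL);
#if defined(__SSE__)
    /* Unaligned loads and stores, so the arrays need no special alignment. */
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb));
#else
    /* Scalar fallback for builds without SSE enabled. */
    for (int i = 0; i < 4; ++i) {
        out[i] = a[i] + b[i];
    }
#endif
}
```

The unaligned load and store forms are used here for simplicity; aligned variants exist and require 16-byte-aligned addresses.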

To simplify the translation of AltiVec to Intel SSE, Intel is working with N.A. Software Ltd.* They plan to bring a VSIPL library for Intel® processors and AltiVec conversion tools to market for the Linux* and Wind River* VxWorks* operating systems. These tools will reduce the digital-signal-processing (DSP) software-conversion effort:

  • Vector-signal image-processing library (VSIPL): Highly efficient computational middleware for signal- and image-processing applications. VSIPL is an open standard for embedded-signal- and image-processing software and hardware vendors.
  • Altivec.h include file for Intel® architecture: same as the PPC altivec.h, but targets the Intel SSE instruction set instead of AltiVec.
  • Altivec Assembler to Intel® Assembler-Compiler: Translates small blocks of AltiVec assembler into C code, which can then be compiled into IA-32 assembler code. This tool is currently under development. N.A. Software* is targeting mid-2010 for availability.

Multicore Optimization

Multicore processors offer opportunities for higher performance over single-core processors. However, realizing performance gains requires explicit action on the part of the software developer. A good resource for information on embedded multicore processors is the book by Domeika (Reference v). One technique for obtaining performance from multicore processors is multithreading. A development model supplemented with tools is required to help with multithreading.

Multicore Software-Development Cycle

To take advantage of multicore processors using a symmetric-multiprocessing (SMP) solution, follow a development cycle summarized by the following four steps:

  1. Analysis and high-level design: Determine where to enact multithreading in the application.
  2. Implementation and low-level design: Write source code to enact multithreading.
  3. Debug: Find and fix multithreading bugs.
  4. Performance tune: Improve performance.

Software tools and a description of specific multicore features found in these tools follow:

  • Compilers: provide application programming interfaces (APIs) and compiler technology to take advantage of multicore processors. OpenMP* allows programmers to easily communicate parallelism. In addition, technology like automatic parallelization enables the compiler to find and enact parallelism in the code. Intel® Threading Building Blocks is a C++ runtime library that abstracts the low-level threading details necessary for optimal multicore performance. It uses common C++ templates and coding style to eliminate tedious threading-implementation work.
  • Thread verification tools: facilitate the debugging of multithreaded programs by automatically finding common errors, such as storage conflicts, deadlock, API violations, inconsistent variable scope, thread stack overflows, etc.
  • Performance-analysis tools: detail time spent in serial regions, parallel regions, and critical sections and graphically display performance bottlenecks due to load imbalance, lock contention, and parallel overhead.
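As a small illustration of how OpenMP communicates parallelism to the compiler, the following C sketch sums the squares of an array in a parallel loop. The function name sum_of_squares is hypothetical. When the code is compiled with OpenMP enabled (for example, with -fopenmp on GCC), loop iterations are divided among threads and the reduction clause gives each thread a private partial sum that is combined at the end. Without OpenMP support, the pragma is ignored and the loop runs serially with the same result.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns the sum of values[i]^2 for i in 0..count-1. */
double sum_of_squares(const double *values, int count)
{
    double total = 0.0;

    assert(values != NULL || count == 0);

    /* Each thread accumulates into its own copy of `total`; the copies
     * are added together when the loop finishes. */
    #pragma omp parallel for reduction(+:total)
    for (int i = 0; i < count; ++i) {
        total += values[i] * values[i];
    }
    return total;
}
```

Omitting the reduction clause would let several threads update total at the same time, which is the kind of storage conflict that thread verification tools are designed to report.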

A good reference on multicore tools is found in the paper by Brutch (Reference vi).

This article summarizes the areas of software that are affected by the architecture migration to Intel® multicore processors. Keep an eye out for the follow-on article, which will provide guidance for the migration process.

Visit the Intel® Embedded Design Center, the one-stop shop for embedded Intel architecture design information, for more details about migrating to Intel architecture.


i. Matassa, L., “Endianness Whitepaper.”

ii. Embedded Design Center, Development-Tools.

iii. Intel® Embedded and Communications Chipset Drivers, embedded/chipsets.htm?iid=embed_body+chip.

iv. AltiVec/SSE Migration Guide, library/documentation/Performance/Conceptual/Accelerate_sse_migration/Accelerate_sse_migration.pdf.

v. Domeika, M., “Software Development for Embedded Multicore Systems: A Practical Guide Using Embedded Intel Architecture,” Newnes 2008, ISBN: 978-0-7506-8539-9.

vi. Brutch, T., “Parallel Programming Development Life Cycle: Understanding Tools and their Workflow when Migrating Sequential Applications to Multicore Platforms,” USENIX, Oct. 2009.

Lori Matassa is a Staff Technical Marketing Engineer in Intel’s Embedded and Communications Division and holds a BS in Information Technology. She has over 20 years of engineering experience developing software for embedded systems. In recent years at Intel she has contributed to Carrier Grade Linux, as well as the software enablement of multicore adoption and architecture migration for embedded and communication applications.

Max Domeika is a senior staff software engineer in the Developer Products Division at Intel, creating tools targeting the Intel Architecture market. Max earned a BS in Computer Science from the University of Puget Sound, an MS in Computer Science from Clemson University, and an MS in Management in Science & Technology from Oregon Graduate Institute. Max recently authored “Software Development for Embedded Multi-core Systems” from Elsevier Inc.