Heterogeneous computing refers to systems that use more than one kind of processor or cores. These systems gain performance or energy efficiency not just by adding the same type of processors, but by adding dissimilar coprocessors, usually incorporating specialized processing capabilities to handle particular tasks.
Usually heterogeneity in the context of computing referred[when?] to different instruction-set architectures (ISA), where the main processor has one and other processors have another - usually a very different - architecture (maybe more than one), not just a different microarchitecture (floating point number processing is a special case of this - not usually referred to as heterogeneous).
In the past heterogeneous computing meant different ISAs had to be handled differently, while in a modern example, Heterogeneous System Architecture (HSA) systems eliminate the difference (for the user) while using multiple processor types (typically CPUs and GPUs), usually on the same integrated circuit, to provide the best of both worlds: general GPU processing (apart from the GPU's well-known 3D graphics rendering capabilities, it can also perform mathematically intensive computations on very large data-sets), while CPUs can run the operating system and perform traditional serial tasks.
Recent findings show that a heterogeneous-ISA chip multiprocessor that exploits diversity offered by multiple ISAs can outperform the best same-ISA homogeneous architecture by as much as 21% with 23% energy savings and a reduction of 32% in Energy Delay Product (EDP). AMD's 2014 announcement on its pin-compatible ARM and x86 SoCs, codename Project Skybridge,
suggested a heterogeneous-ISA (ARM+x86) chip multiprocessor in the making.
Heterogeneous CPU topology
A system with heterogeneous CPU topology is a system where the same ISA is used, but the cores themselves are different in speed. The setup is more similar to a symmetric multiprocessor. (Although such systems are technically asymmetric multiprocessors, the cores do not differ in roles or device access.) There are typically two types of cores: a higher performance core usually known as the "big" or P-core and a more power efficient core usually known as the "small" or E-core.
A common use of such topology is to provide better power efficiency in mobile SoCs.
ARM big.LITTLE (succeeded by DynamIQ) is the prototypical case, where faster high-power cores are combined with slower low-power cores.
Apple has produced Apple silicon ARM cores with similar organization.
Intel has also produced hybrid x86-64 cores codenamed Lakefield, although not without major limitations in instruction set support. The newer Alder Lake reduces the sacrifice by adding more instruction set support to the "small" core.
Heterogeneous computing systems present new challenges not found in typical homogeneous systems. The presence of multiple processing elements raises all of the issues involved with homogeneous parallel processing systems, while the level of heterogeneity in the system can introduce non-uniformity in system development, programming practices, and overall system capability. Areas of heterogeneity can include:
Compute elements may have different cache structures, cache coherency protocols, and memory access may be uniform or non-uniform memory access (NUMA). Differences can also be found in the ability to read arbitrary data lengths as some processors/units can only perform byte-, word-, or burst accesses.
Compute elements may have differing types of interconnect aside from basic memory/bus interfaces. This may include dedicated network interfaces, Direct memory access (DMA) devices, mailboxes, FIFOs, and scratchpad memories, etc. Furthermore, certain portions of a heterogeneous system may be cache-coherent, whereas others may require explicit software-involvement for maintaining consistency and coherency.
A heterogeneous system may have CPUs that are identical in terms of architecture, but have underlying micro-architectural differences that lead to various levels of performance and power consumption. Asymmetries in capabilities paired with opaque programming models and operating system abstractions can sometimes lead to performance predictability problems, especially with mixed workloads.
While partitioning data on homogeneous platforms is often trivial, it has been shown that for the general heterogeneous case, the problem is NP-Complete. For small numbers of partitions, optimal partitionings that perfectly balance load and minimize communication volume have been shown to exist. 
Heterogeneous computing hardware can be found in every domain of computing—from high-end servers and high-performance computing machines all the way down to low-power embedded devices including mobile phones and tablets.