Radiation hardening is the process of making electronic components and circuits resistant to damage or malfunction caused by high levels of ionizing radiation (particle radiation and high-energy electromagnetic radiation), particularly in outer space (especially beyond low Earth orbit), around nuclear reactors and particle accelerators, and during nuclear accidents or nuclear warfare.
Most semiconductor electronic components are susceptible to radiation damage, and radiation-hardened (rad-hard) components are based on their non-hardened equivalents, with some design and manufacturing variations that reduce the susceptibility to radiation damage. Due to the extensive development and testing required to produce a radiation-tolerant design of a microelectronic chip, the technology of radiation-hardened chips tends to lag behind the most recent developments.
Radiation-hardened products are typically subjected to one or more resultant-effects tests, including total ionizing dose (TID), enhanced low dose rate sensitivity (ELDRS), neutron and proton displacement damage, and single-event effects (SEEs).
See also: Radiation damage
Environments with high levels of ionizing radiation create special design challenges. A single charged particle can knock thousands of electrons loose, causing electronic noise and signal spikes. In the case of digital circuits, this can cause results which are inaccurate or unintelligible. This is a particularly serious problem in the design of satellites, spacecraft, future quantum computers, military aircraft, nuclear power stations, and nuclear weapons. In order to ensure the proper operation of such systems, manufacturers of integrated circuits and sensors intended for the military or aerospace markets employ various methods of radiation hardening. The resulting systems are said to be rad(iation)-hardened, rad-hard, or (within context) hardened.
Typical sources of exposure of electronics to ionizing radiation are the Van Allen radiation belts for satellites, nuclear reactors in power plants for sensors and control circuits, particle accelerators for control electronics (particularly particle detector devices), residual radiation from isotopes in chip packaging materials, cosmic radiation for spacecraft and high-altitude aircraft, and nuclear explosions for potentially all military and civilian electronics.
Two fundamental damage mechanisms take place:
Lattice displacement is caused by neutrons, protons, alpha particles, heavy ions, and very high energy gamma photons. They change the arrangement of the atoms in the crystal lattice, creating lasting damage and increasing the number of recombination centers, which depletes the minority carriers and worsens the analog properties of the affected semiconductor junctions. Counterintuitively, higher doses over a short time cause partial annealing ("healing") of the damaged lattice, leading to a lower degree of damage than the same doses delivered at low intensity over a long time (LDR, or low dose rate). This type of problem is particularly significant in bipolar transistors, which depend on minority carriers in their base regions; increased losses caused by recombination reduce the transistor gain (see neutron effects). Components certified as free of enhanced low dose rate sensitivity (ELDRS-free) do not show damage at dose rates below 0.01 rad(Si)/s = 36 rad(Si)/h.
Ionization effects are caused by charged particles, including the ones with energy too low to cause lattice effects. The ionization effects are usually transient, creating glitches and soft errors, but can lead to destruction of the device if they trigger other damage mechanisms (e.g., a latchup). Photocurrent caused by ultraviolet and X-ray radiation may belong to this category as well. Gradual accumulation of holes in the oxide layer in MOSFET transistors leads to worsening of their performance, up to device failure when the dose is high enough (see total ionizing dose effects).
The effects can vary wildly depending on all the parameters: type of radiation, total dose and radiation flux, combination of types of radiation, and even the kind of device load (operating frequency, operating voltage, actual state of the transistor at the instant it is struck by the particle). This makes thorough testing difficult and time-consuming, and it requires many test samples.
The "end-user" effects can be characterized in several groups:
A neutron interacting with the semiconductor lattice will displace its atoms. This leads to an increase in the count of recombination centers and deep-level defects, reducing the lifetime of minority carriers, thus affecting bipolar devices more than CMOS ones. Bipolar devices on silicon tend to show changes in electrical parameters at levels of 10¹⁰ to 10¹¹ neutrons/cm², whereas CMOS devices are not affected until 10¹⁵ neutrons/cm². The sensitivity of the devices may increase together with increasing level of integration and decreasing size of individual structures. There is also a risk of induced radioactivity caused by neutron activation, which is a major source of noise in high energy astrophysics instruments. Induced radiation, together with residual radiation from impurities in used materials, can cause all sorts of single-event problems during the device's lifetime. GaAs LEDs, common in optocouplers, are very sensitive to neutrons. The lattice damage influences the frequency of crystal oscillators. Kinetic energy effects (namely lattice displacement) of charged particles belong here too.
Total ionizing dose (TID) effects are the cumulative damage of the semiconductor lattice (lattice displacement damage) caused by ionizing radiation over the exposure time. They are measured in rads and cause slow, gradual degradation of the device's performance. A total dose greater than 5000 rads delivered to silicon-based devices in seconds to minutes will cause long-term degradation. In CMOS devices, the radiation creates electron–hole pairs in the gate insulation layers, which cause photocurrents during their recombination, and the holes trapped in the lattice defects in the insulator create a persistent gate biasing and influence the transistors' threshold voltage, making the N-type MOSFET transistors easier and the P-type ones more difficult to switch on. The accumulated charge can be high enough to keep the transistors permanently open (or closed), leading to device failure. Some self-healing takes place over time, but this effect is not too significant. This effect is the same as hot carrier degradation in high-integration high-speed electronics. Crystal oscillators are somewhat sensitive to radiation doses, which alter their frequency. The sensitivity can be greatly reduced by using swept quartz. Natural quartz crystals are especially sensitive. Radiation performance curves for TID testing may be generated for all resultant effects testing procedures. These curves show performance trends throughout the TID test process and are included in the radiation test report.
Transient dose effects arise from a short, high-intensity pulse of radiation, typically occurring during a nuclear explosion. The high radiation flux creates photocurrents in the entire body of the semiconductor, causing transistors to randomly open, changing logical states of flip-flops and memory cells. Permanent damage may occur if the duration of the pulse is too long, or if the pulse causes junction damage or a latchup. Latchups are commonly caused by the X-rays and gamma radiation flash of a nuclear explosion. Crystal oscillators may stop oscillating for the duration of the flash due to prompt photoconductivity induced in quartz.
System-generated electromagnetic pulse (SGEMP) effects are caused by the radiation flash traveling through the equipment and causing local ionization and electric currents in the material of the chips, circuit boards, electrical cables and cases.
Single-event effects (SEE) have been studied extensively since the 1970s. When a high-energy particle travels through a semiconductor, it leaves an ionized track behind. This ionization may cause a highly localized effect similar to the transient dose one: a benign glitch in output, a less benign bit flip in memory or a register or, especially in high-power transistors, a destructive latchup and burnout. Single-event effects are important for electronics in satellites, aircraft, and other civilian and military aerospace applications. Sometimes, in circuits not involving latches, it is helpful to introduce RC time constant circuits that slow down the circuit's reaction time beyond the duration of an SEE.
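The RC mitigation mentioned above can be illustrated with a first-order low-pass model. This is a minimal sketch, assuming an ideal rectangular disturbance and hypothetical component values; none of the numbers come from the source, and real SEE pulse shapes differ.

```python
import math

def rc_peak_response(pulse_amplitude_v, pulse_duration_s, r_ohm, c_farad):
    """Peak output of an ideal first-order RC low-pass filter driven by a
    rectangular pulse, starting from 0 V: V * (1 - exp(-t_pulse / (R*C)))."""
    tau = r_ohm * c_farad  # RC time constant in seconds
    return pulse_amplitude_v * (1.0 - math.exp(-pulse_duration_s / tau))

# Hypothetical values, chosen only for illustration:
# a 1 V, 1 ns disturbance into a filter with tau = 10 kOhm * 1 pF = 10 ns.
peak = rc_peak_response(pulse_amplitude_v=1.0, pulse_duration_s=1e-9,
                        r_ohm=10e3, c_farad=1e-12)
print(f"peak output: {peak:.3f} V")
```

With a time constant ten times longer than the pulse, the filtered output in this model reaches only about a tenth of the disturbance amplitude, at the cost of a correspondingly slower response to legitimate signals.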
A single-event transient (SET) happens when the charge collected from an ionization event discharges in the form of a spurious signal traveling through the circuit. In effect, this is the same as an electrostatic discharge. Soft error, reversible.
Single-event upsets (SEU) or transient radiation effects in electronics are state changes of memory or register bits caused by a single ion interacting with the chip. They do not cause lasting damage to the device, but may cause lasting problems for a system which cannot recover from such an error. Soft error, reversible. In very sensitive devices, a single ion can cause a multiple-bit upset (MBU) in several adjacent memory cells. SEUs can become single-event functional interrupts (SEFI) when they upset control circuits, such as state machines, placing the device into an undefined state, a test mode, or a halt, which would then need a reset or a power cycle to recover.
Single-event latchup (SEL) can occur in any chip with a parasitic PNPN structure. A heavy ion or a high-energy proton passing through one of the two inner-transistor junctions can turn on the thyristor-like structure, which then stays "shorted" (an effect known as latch-up) until the device is power-cycled. As the effect can happen between the power source and substrate, destructively high current can be involved and the part may fail. Hard error, irreversible. Bulk CMOS devices are most susceptible.
Single-event snapback is similar to SEL but does not require the PNPN structure. It can be induced in N-channel MOS transistors switching large currents, when an ion hits near the drain junction and causes avalanche multiplication of the charge carriers. The transistor then opens and stays open. Hard error, irreversible.
Single-event burnout (SEB) may occur in power MOSFETs when the substrate right under the source region gets forward-biased and the drain-source voltage is higher than the breakdown voltage of the parasitic structures. The resulting high current and local overheating may then destroy the device. Hard error, irreversible.
Single-event gate rupture (SEGR) was observed in power MOSFETs when a heavy ion hits the gate region while a high voltage is applied to the gate. A local breakdown then happens in the insulating layer of silicon dioxide, causing local overheating and destruction (looking like a microscopic explosion) of the gate region. It can occur even in EEPROM cells during write or erase, when the cells are subjected to a comparatively high voltage. Hard error, irreversible.
While proton beams are widely used for SEE testing due to availability, at lower energies proton irradiation can often underestimate SEE susceptibility. Furthermore, proton beams expose devices to the risk of total ionizing dose (TID) failure, which can cloud proton testing results or result in premature device failure. White neutron beams, ostensibly the most representative SEE test method, are usually derived from solid target-based sources, resulting in flux non-uniformity and small beam areas. White neutron beams also have some measure of uncertainty in their energy spectrum, often with high thermal neutron content.
The disadvantages of both proton and spallation neutron sources can be avoided by using mono-energetic 14 MeV neutrons for SEE testing. A potential concern is that mono-energetic neutron-induced single-event effects will not accurately represent the real-world effects of broad-spectrum atmospheric neutrons. However, recent studies have indicated that, to the contrary, mono-energetic neutrons (particularly 14 MeV neutrons) can be used to quite accurately understand SEE cross-sections in modern microelectronics.
Hardened chips are often manufactured on insulating substrates instead of the usual semiconductor wafers. Silicon on insulator (SOI) and silicon on sapphire (SOS) are commonly used. While normal commercial-grade chips can withstand between 50 and 100 gray (5 and 10 krad), space-grade SOI and SOS chips can survive doses between 1000 and 3000 gray (100 and 300 krad). At one time many 4000 series chips were available in radiation-hardened versions (RadHard). While SOI eliminates latchup events, TID and SEE hardness are not guaranteed to be improved.
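The paired dose figures above follow from the definition 1 gray = 100 rad. A short sketch of the conversion, with a hypothetical helper name, checks the quoted values:

```python
RAD_PER_GRAY = 100  # 1 Gy = 100 rad by definition

def gray_to_krad(dose_gy):
    """Convert an absorbed dose in gray to kilorad."""
    return dose_gy * RAD_PER_GRAY / 1000

# Dose figures quoted in the text for commercial-grade and space-grade chips.
for dose_gy in (50, 100, 1000, 3000):
    print(f"{dose_gy} Gy = {gray_to_krad(dose_gy)} krad")
```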
Bipolar integrated circuits generally have higher radiation tolerance than CMOS circuits. The low-power Schottky (LS) 5400 series can withstand 1000 krad, and many ECL devices can withstand 10 000 krad.
Magnetoresistive RAM, or MRAM, is considered a likely candidate to provide radiation hardened, rewritable, non-volatile conductor memory. Physical principles and early tests suggest that MRAM is not susceptible to ionization-induced data loss.
Capacitor-based DRAM is often replaced by more rugged (but larger, and more expensive) SRAM.
Choice of substrate with wide band gap, which gives it higher tolerance to deep-level defects; e.g. silicon carbide or gallium nitride.
Shielding the package against radioactivity, to reduce exposure of the bare device.
Shielding the chips themselves (from neutrons) by use of depleted boron (consisting only of isotope boron-11) in the borophosphosilicate glass passivation layer protecting the chips, as the boron-10 present in natural boron readily captures neutrons and undergoes alpha decay (see soft error).
Use of a special process node to provide increased radiation resistance. Due to the high development costs of new radiation hardened processes, the smallest "true" rad-hard (RHBP, Rad-Hard By Process) process was 150 nm as of 2016; however, rad-hard 65 nm FPGAs were available that used some of the techniques used in "true" rad-hard processes (RHBD, Rad-Hard By Design). As of 2019, 110 nm rad-hard processes are available.
Use of SRAM cells with more transistors per cell than usual (which is 4T or 6T), which makes the cells more tolerant to SEUs at the cost of higher power consumption and size per cell.
Use of edge-less CMOS transistors, which have an unconventional physical construction, together with an unconventional physical layout.
Error correcting code memory (ECC memory) uses redundant bits to check for and possibly correct corrupted data. Since radiation's effects damage the memory content even when the system is not accessing the RAM, a "scrubber" circuit must continuously sweep the RAM: reading out the data, checking the redundant bits for data errors, then writing back any corrections to the RAM.
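The read, check and write-back cycle of a scrubber can be sketched in software. The example below is only an illustration, assuming a Hamming(7,4) code over 4-bit values and a Python list standing in for RAM. Real ECC memory is implemented in hardware and typically protects wider words; the function names here are hypothetical.

```python
def hamming74_encode(nibble):
    """Encode 4 data bits as a 7-bit Hamming codeword.
    Bit index i of the result holds codeword position i + 1."""
    d = [(nibble >> i) & 1 for i in range(4)]
    p1 = d[0] ^ d[1] ^ d[3]  # parity over positions 3, 5, 7
    p2 = d[0] ^ d[2] ^ d[3]  # parity over positions 3, 6, 7
    p4 = d[1] ^ d[2] ^ d[3]  # parity over positions 5, 6, 7
    bits = [p1, p2, d[0], p4, d[1], d[2], d[3]]
    return sum(b << i for i, b in enumerate(bits))

def hamming74_correct(word):
    """Return (corrected_word, had_error). The syndrome is the XOR of the
    positions of all set bits; nonzero means that position was flipped."""
    syndrome = 0
    for pos in range(1, 8):
        if (word >> (pos - 1)) & 1:
            syndrome ^= pos
    if syndrome:
        word ^= 1 << (syndrome - 1)
    return word, syndrome != 0

def hamming74_decode(word):
    """Extract the 4 data bits (positions 3, 5, 6, 7) from a codeword."""
    return (((word >> 2) & 1) | (((word >> 4) & 1) << 1)
            | (((word >> 5) & 1) << 2) | (((word >> 6) & 1) << 3))

def scrub(memory):
    """One scrub pass: read every word, check it, write back corrections."""
    corrected = 0
    for addr, word in enumerate(memory):
        fixed, had_error = hamming74_correct(word)
        if had_error:
            memory[addr] = fixed
            corrected += 1
    return corrected

data = [0x3, 0xA, 0xF, 0x0]
memory = [hamming74_encode(n) for n in data]
memory[1] ^= 1 << 4          # simulate an upset flipping one stored bit
repaired = scrub(memory)
recovered = [hamming74_decode(w) for w in memory]
print(repaired, recovered == data)
```

This code corrects only one flipped bit per word. If a second upset hits the same word before the scrubber reaches it, the correction step picks the wrong bit, which is why the sweep has to run continuously rather than only on access.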
Redundant elements can be used at the system level. Three separate microprocessor boards may independently compute an answer to a calculation and compare their answers. Any system that produces a minority result will recalculate. Logic may be added such that if repeated errors occur from the same system, that board is shut down.
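The system-level scheme just described can be modeled as a vote over three results with a per-board error count. This is a simplified sketch: the threshold `MAX_STRIKES` and the function names are assumptions for illustration, and the recalculation step is represented only by reporting which boards dissented.

```python
from collections import Counter

MAX_STRIKES = 3  # hypothetical number of errors before a board is flagged

def vote(results):
    """Return (majority_value, dissenting_board_indices).
    The value is None when no result has a strict majority."""
    value, count = Counter(results).most_common(1)[0]
    if 2 * count <= len(results):
        return None, list(range(len(results)))
    return value, [i for i, r in enumerate(results) if r != value]

def run_rounds(rounds):
    """Vote on each round and flag boards that dissent repeatedly."""
    strikes = [0, 0, 0]
    flagged_for_shutdown = set()
    outputs = []
    for results in rounds:
        value, dissenters = vote(results)
        outputs.append(value)
        for board in dissenters:  # these boards would be asked to recalculate
            strikes[board] += 1
            if strikes[board] >= MAX_STRIKES:
                flagged_for_shutdown.add(board)
    return outputs, flagged_for_shutdown

outputs, flagged = run_rounds([(42, 42, 42), (7, 7, 9), (5, 5, 6), (1, 1, 0)])
print(outputs, flagged)
```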
Redundant elements may be used at the circuit level. A single bit may be replaced with three bits and separate "voting logic" for each bit to continuously determine its result (triple modular redundancy). This increases the area of a chip design by a factor of 5, so it must be reserved for smaller designs. It has the secondary advantage of also being "fail-safe" in real time: in the event of a single-bit failure (which may be unrelated to radiation), the voting logic will continue to produce the correct result without resorting to a watchdog timer. System-level voting between three separate processor systems will generally need to use some circuit-level voting logic to perform the votes between the three processor systems.
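The per-bit voting logic of triple modular redundancy reduces to a two-of-three majority function. The sketch below expresses it with bitwise operators on integers, as a software stand-in for the hardware voter; the stored value is an arbitrary example.

```python
def majority(a, b, c):
    """Bitwise 2-of-3 majority: each output bit equals the value held by
    at least two of the three input copies."""
    return (a & b) | (a & c) | (b & c)

stored = 0b10110010
copy_a = stored
copy_b = stored ^ 0b00001000  # simulate an upset in one copy
copy_c = stored
voted = majority(copy_a, copy_b, copy_c)
print(voted == stored)
```

Because the vote is taken independently per bit, the output stays correct even if each copy has a different bit flipped, as long as no bit position is wrong in more than one copy.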
Hardened latches may be used.
A watchdog timer will perform a hard reset of a system unless some sequence is performed that generally indicates the system is alive, such as a write operation from an onboard processor. During normal operation, software schedules a write to the watchdog timer at regular intervals to prevent the timer from running out. If radiation causes the processor to operate incorrectly, it is unlikely the software will work correctly enough to clear the watchdog timer. The watchdog eventually times out and forces a hard reset of the system. This is considered a last resort, to be relied on after other methods of radiation hardening.
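The watchdog behavior described above can be modeled with a simple tick counter. This is a simulation sketch only: the class, its method names and the timeout of five ticks are hypothetical, and a real watchdog is a hardware peripheral driven by a clock rather than by function calls.

```python
class WatchdogTimer:
    """Simulated watchdog: counts ticks and forces a reset on timeout."""

    def __init__(self, timeout_ticks):
        self.timeout_ticks = timeout_ticks
        self.counter = 0
        self.resets = 0

    def kick(self):
        """Called by healthy software to restart the countdown."""
        self.counter = 0

    def tick(self):
        """Advance one time step; return True if a hard reset was forced."""
        self.counter += 1
        if self.counter >= self.timeout_ticks:
            self.resets += 1
            self.counter = 0  # the system restarts with a fresh countdown
            return True
        return False

wdt = WatchdogTimer(timeout_ticks=5)

# Healthy software: writes to the watchdog every third tick.
for t in range(12):
    if t % 3 == 0:
        wdt.kick()
    wdt.tick()
healthy_resets = wdt.resets

# Upset processor: the software no longer writes to the watchdog.
for _ in range(5):
    wdt.tick()
print(healthy_resets, wdt.resets)
```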
Radiation-hardened and radiation tolerant components are often used in military and aerospace applications, including point-of-load (POL) applications, satellite system power supplies, step down switching regulators, microprocessors, FPGAs, FPGA power sources, and high efficiency, low voltage subsystem power supplies.
However, not all military-grade components are radiation hardened. For example, the US MIL-STD-883 features many radiation-related tests, but has no specification for single event latchup frequency. The Fobos-Grunt space probe may have failed due to a similar assumption.
The market size for radiation hardened electronics used in space applications was estimated to be $2.35 billion in 2021. A new study has estimated that this will reach approximately $4.76 billion by the year 2032.
In telecommunication, the term nuclear hardness has the following meanings: 1) an expression of the extent to which the performance of a system, facility, or device is expected to degrade in a given nuclear environment, 2) the physical attributes of a system or electronic component that will allow survival in an environment that includes nuclear radiation and electromagnetic pulses (EMP).
See also: Comparison of embedded computer systems on board the Mars rovers