The time is right to write this brief blog and take away some of the confusion caused by the term Electromagnetic Resilience.
1. Where did the phrase ‘Electromagnetic Resilience’ first come from?
It was invented by IET Standards’ editors, who thought it sounded cool.
Not for any technical reasons!
As I warned them at the time, this new phrase is causing confusion, which is why I have written this brief blog.
IEC 61508 [4] is the IEC’s Basic Standard on Functional Safety, which all other IEC standards are required to follow when they contain functional safety requirements. It was developed because modern digital systems had too many digital states for them to ever be fully tested even once; and also because digital systems are non-linear which makes it impossible to interpolate between tested states to know if the untested states could cause unacceptable safety risks.
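To get a feel for why full testing is out of reach, here is a rough back-of-envelope calculation. Both input figures (64 bits of internal state, and one state tested every nanosecond) are assumptions chosen purely for illustration, not values taken from [4].

```python
# Rough arithmetic only; both input figures below are assumptions for illustration.
STATE_BITS = 64            # assumed: bits of internal state (registers, memory, etc.)
TESTS_PER_SECOND = 1e9     # assumed: one digital state exercised every nanosecond

SECONDS_PER_YEAR = 365 * 24 * 3600


def years_to_visit_every_state(state_bits: int, tests_per_second: float) -> float:
    """Time needed to exercise each of the 2**state_bits states exactly once."""
    return (2 ** state_bits) / tests_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    years = years_to_visit_every_state(STATE_BITS, TESTS_PER_SECOND)
    print(f"{STATE_BITS} bits of state: about {years:.0f} years to test each state once")
```

Under these assumptions the answer is roughly 585 years, and every additional bit of state doubles it. Real systems hold vastly more than 64 bits of state.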
IEC 61508 developed a risk management approach based on the application of a large number of well-proven “Techniques and Measures” (T&Ms) in management, design, verification, validation, realisation (manufacture, assembly, installation, commissioning, etc.), and for maintaining functional safety risks at low-enough levels throughout the lifecycle. Many of those T&Ms had already been in use by the designers of safety-critical systems for a decade or more.
Unfortunately, 61508’s T&Ms did not address the functional safety risks that can be caused by the effects of Electromagnetic Interference (EMI) on electronic hardware and software. They seem to have assumed that electromagnetic (EM) immunity testing would take care of that – but why would they think that when they already knew that digital systems cannot be proven safe by any amount of testing?
Spotting this large omission in [4], in 1998 I started my first Working Group on filling in the gaps in 61508’s T&Ms, which eventually resulted in IET Standards’ 2017 Code of Practice on Electromagnetic Resilience [1].
Just before [1] was published, the IET Standards’ Editors – who are not technically educated in either EMC or Functional Safety – chose the phrase ‘Electromagnetic Resilience’ (EM Resilience) because they thought it sounded modern and marketable, and presented it to me as a done deal.
I tried to dissuade them, saying it was an undefined term in [4] and so would cause confusion: [1] is all about the achievement of functional safety, but calling it EM Resilience would make people incorrectly assume that it was about EMC or EMI instead.
However, I could not get them to budge at all!
So anyway, we are where we are, and we have to deal with the IET’s unfortunate choice of title somehow, which is why I have written this blog.
When IEEE Std 1848:2020 [2] was first drafted, IEEE Standards’ Editors insisted on using a lengthy title that said what the document was really about: “Techniques and Measures to Manage Functional Safety and Other Risks as Regards Electromagnetic Disturbances”.
To help avoid differences in interpretation arising if [1] and [2] described the exact same thing in different ways, we did our best to make the IEEE’s text in [2] a direct copy of the IET’s text in [1].
The text in [1] uses the phrase EM Resilience very many times in its 200 pages, and we felt that it would harm the readability of [2] if we replaced each instance of it with the phrase: “techniques and measures to manage functional safety and other risks as regards EM disturbances”. So, the IEEE’s text in [2] defines the IET’s invented phrase ‘EM Resilience’ as having the exact same meaning as ‘techniques and measures to manage functional safety and other risks as regards EM disturbances’.
So that’s our starting point for this blog: the term ‘EM Resilience’, which was invented purely for marketing/sales reasons by IET Standards’ editors, is defined in [2] as meaning: ‘Techniques and measures to manage functional safety and other risks as regards EM disturbances’.
2. But before we can get properly started, we need some definitions
Resilience: let’s define the ‘resilience’ of a safety-related system as it is defined in IEC 61508 [4]: “the ability of the system to remain acceptably safe despite unforeseeable events”.
One type of unforeseeable event can be an electromagnetic (EM) disturbance, and we need to agree some definitions for this, too!
EM Phenomenon: IEC 60050 [3] does not have a definition for this phrase, but it is included in its subclause 161-01-01, where it states: “(the) electromagnetic environment (is the) totality of electromagnetic phenomena existing at a given location. Note 1 to entry: In general, the electromagnetic environment is time-dependent and its description can need a statistical approach.”
EM Disturbance: subclause 161-01-05 in IEC 60050 states that this is: “(an) electromagnetic phenomenon that can degrade the performance of a device, equipment or system, or adversely affect living or inert matter. Note 1 to entry: An electromagnetic disturbance can be an electromagnetic noise, an unwanted signal or a change in the propagation medium itself.”
EM Interference: subclause 161-01-06 in IEC 60050 states that EMI: “(is the) degradation in the performance of equipment or transmission channel or a system caused by an electromagnetic disturbance. Note 1 to entry: the terms “electromagnetic disturbance” and “electromagnetic interference” designate respectively the cause and the effect, and should not be used indiscriminately.”
However, IEEE Std 1848:2020 [2] uses the definitions in ANSI C63.14-2014 [5], which has a slightly different definition of EMI in its subclause 4.169: “Any electromagnetic disturbance that interrupts, obstructs, or otherwise degrades or limits the effective performance of electronics and electrical equipment.”
3. What about these two different definitions for EMI?
From section 2 above, we can see that EM phenomena in the EM environment should be called EM disturbances when they are capable of interacting with devices, equipment, or systems in such a manner that they can cause their performance to degrade.
And – when EM disturbances actually do degrade the performance of a device, equipment, or system – [5] says we should call them EMI. But [3] says that we should use the term EMI for the degraded performance that is caused by the EM disturbances.
We are not going to be able to do anything about having these two different definitions for EMI, except to always be aware that people might use one or the other.
Despite the difference in these two definitions, it is quite clear that:
- EMI can only be said to exist when an EM disturbance causes actual degradation in the performance of an electronic device, equipment or system; and,
- The term ‘EM disturbance’ is used to mean a potential cause of performance degradation in an electronic device, equipment or system.
The term ‘EMI’ is used when performance degradation has been caused by an EM disturbance.
EMC engineers often use the phrase EMI when they should use EM disturbance, and this can lead to confusion. (I have often been guilty of this mistake, as anyone who reads my previous material will notice.) Let’s all try to be more rigorous in our use of these two terms in future!
4. Now (finally!) we can tackle the subject of this blog: what is EM Resilience, really?
From the definition of ‘resilience’ in [4], we could say that EM Resilience should mean: ‘the ability of the system to remain acceptably safe despite unforeseeable EM disturbances’.
However, EM Resilience is effectively defined in [2] as meaning: ‘Techniques and measures to manage functional safety and other risks as regards EM disturbances.’
My conclusion is that the definition of EM Resilience in [2] is close enough to how it would probably have been defined in [4]. I think this is a lucky outcome, given the way the phrase was originally invented! So, I am very happy with the definition in [2]. Nevertheless, there are some issues, which I think are very minor, that should be borne in mind and that we can discuss if you want to.
Functional safety academic and engineering experts usually ‘get’ IET 2017 [1] and IEEE Std 1848 [2] instantly. EMC experts often take a while to understand them correctly (it took me over 20 years!), but with [2] now published, they should be able to reach a good understanding of EM Resilience very quickly.
5. But what are the T&Ms for managing functional safety risks as regards EM disturbances, really?
It is very important to understand that EM Resilience is all about keeping functional safety risks acceptably low. This does not necessarily mean preventing the occurrence of EMI, because preventing all EMI is impossible.
Because the lifecycle EM environment is complex and unforeseeable, and because we cannot fully test any modern microprocessor or any software more complex than a printer driver, we must expect that EMI will happen during a lifecycle.
We must expect this, and design accordingly to keep functional safety risks low enough, whatever we do to mitigate EM disturbances, and whatever EM immunity tests we pass.
Luckily, functional safety engineers long ago learned to use T&Ms that deal with unforeseen operational degradations of any type – whether due to component or solder-joint failures, undiscovered software bugs (e.g. caused by the digital states that could not be tested), corroded electrical connections, etc. (even EMI too – to some extent).
These T&Ms are based on the fact that many (not all) types of equipment are functionally safe when they are shut down, or in standby. And many types of equipment have operational modes that cannot cause harm to people.
These are all examples of ‘safe states’, and these T&Ms are used to design safety-related systems that detect performance degradations that could increase safety risks (in themselves and in their EUCs, i.e. the Equipment Under Control) – then either repair the degradation (e.g. by error correction T&Ms) or switch the EUC into one of its ‘safe states’ before it has time to become too dangerous.
Many functional safety experts regard switching the EUC into one of its ‘safe states’ as being ‘job done’.
We can see that it is perfectly feasible – and often less costly too – to design functionally safe equipment without using any EM mitigation T&Ms such as shielding, filtering, transient/surge protection, galvanic isolation, etc. During EM immunity testing, such equipment would be considered to pass as regards functional safety even if it automatically switched into a safe state every time an EM phenomenon was applied to it!
For this reason, the European Union’s Product Liability Directive (PLD) warns us that making products safe enough by relying on switching them into safe states could result in situations where too much operational uptime (hence too much money) is lost in safe states, and the product’s users or owners might foreseeably make unapproved modifications to restore reliable operation and stop losing so much money.
In other words, they might foreseeably modify or defeat certain aspects of its safety-related system, perhaps even by simply ripping the whole thing out! Such behaviour is human nature, and the PLD understands human nature enough to require products to be safe enough even ‘considering foreseeable misuse’.
And anyway, every item of equipment sold or ‘taken into service’ is required by law to achieve EMC, which, if done properly, should ensure that equipment is not switched into its safe states too often.
5.1 An example
Let’s consider an example of a typical signal processing system, implemented by a combination of hardware and software. It can be made much more reliable by using three identical processing channels in parallel, with a ‘voter’ which continually compares all three channel outputs with each other. The voter then outputs a signal that at least two (any two) of the three channels agree upon, ignoring the output of any channel that does not agree with the other two, at any time. This approach is called ‘2oo3’ (two out of three) in IEC 61508 [4].
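The voting rule just described can be sketched in a few lines. This is a minimal illustration, not text from [4]: the function name `vote_2oo3` is hypothetical, and it assumes channel outputs can be compared for exact equality (real analogue or floating-point outputs would need a tolerance band instead).

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def vote_2oo3(outputs: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value that at least two of the three channels agree on.

    Returns None when all three channels disagree, so that the caller can
    treat "no majority" as a detected fault (an assumption of this sketch).
    """
    if len(outputs) != 3:
        raise ValueError("a 2oo3 voter needs exactly three channel outputs")
    # most_common(1) gives the single most frequent value and its count.
    value, count = Counter(outputs).most_common(1)[0]
    return value if count >= 2 else None


if __name__ == "__main__":
    print(vote_2oo3([42, 42, 17]))  # one channel disagrees and is outvoted
    print(vote_2oo3([1, 2, 3]))     # no two channels agree
```

With two agreeing channels the odd one out is simply ignored; with no agreement at all the sketch returns `None` rather than guessing.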
Using such a 2oo3 system should not be expected to improve the immunity of the original single channel to EMI, because all three channels are identical and exposed to the same EM environment, so will most likely suffer identical performance degradations that the voter cannot detect.
A.2.3 in [2] therefore recommends that the individual processing channels in a redundant system be designed to be EM-diverse, so that they will most probably respond differently to any given EM disturbances at any time.
Then, even though at any time, an EM disturbance may be causing one of the three processing channels to experience degradations in its performance (i.e. suffer EMI), the outputs of the other two should be unaffected and agree with each other. Because of the output voter, the 2oo3 processing system can suffer EMI to any of its channels at any time without creating unacceptable functional safety risks as a result.
Note that this well-proven T&M allows EMI to occur. It detects the performance degradation that is the EMI and ensures that it does not increase functional safety risks.
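The difference between identical and EM-diverse channels can be shown with a toy model. Everything in it is an assumption made for illustration: the `Channel` class, the disturbance names, the "correct" processing (doubling the input) and the way a disturbance corrupts an output (flipping the low eight bits). It is not how [2] specifies EM diversity.

```python
from collections import Counter
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Channel:
    """Toy processing channel that is upset by certain named disturbances."""
    susceptible_to: FrozenSet[str]

    def process(self, x: int, disturbance: Optional[str]) -> int:
        correct = 2 * x                    # hypothetical "correct" processing
        if disturbance in self.susceptible_to:
            return correct ^ 0xFF          # toy corruption: flip the low 8 bits
        return correct


def vote(outputs: List[int]) -> Optional[int]:
    """Majority vote: the value at least two channels agree on, else None."""
    value, count = Counter(outputs).most_common(1)[0]
    return value if count >= 2 else None


def run(channels: List[Channel], x: int, disturbance: Optional[str]) -> Optional[int]:
    return vote([ch.process(x, disturbance) for ch in channels])


# Three identical channels: all upset by the same (hypothetical) disturbance.
identical = [Channel(frozenset({"radar_pulse"}))] * 3
# Three EM-diverse channels: each upset by a different disturbance.
diverse = [
    Channel(frozenset({"radar_pulse"})),
    Channel(frozenset({"mains_surge"})),
    Channel(frozenset({"esd"})),
]

if __name__ == "__main__":
    print(run(identical, 21, "radar_pulse"))  # all three agree on a wrong value
    print(run(diverse, 21, "radar_pulse"))    # the one upset channel is outvoted
```

In this model the identical channels all produce the same wrong answer, so the voter passes it on unanimously, whereas the diverse set still delivers the correct value because only one channel is affected.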
But however hard we try, we cannot guarantee that some unforeseen extreme, or complex, EM disturbance will not cause all of our EM-diverse processing channels to fail in the same way at the same time.
So, if 2oo3 is not considered to reduce the safety risks caused by EMI to acceptable levels, why not vote 3oo5, or 5oo7, or even more?
An alternative T&M that might be considered is the use of large/heavy/costly/rugged/ugly military-grade EM mitigation (filtering, shielding, transient/surge suppression, etc., see A.2.11 in [2]), and then spending even more money on ensuring that its high performance is maintained throughout the lifecycle. But even this approach cannot guarantee that EM disturbances during the lifecycle will not cause EMI that increases safety risks. This is at least partly due to the unknowable behaviour of the untested digital states.
So, it is a very good thing that [4] provides many more different types of well-proven T&Ms that we can use in any combinations to help ensure acceptable safety risks throughout the lifecycle!
Let’s consider an EM disturbance that is causing all of our EM-diverse redundant channels to give incorrect outputs, and assume that some of those incorrect outputs happen to be similar enough that the voter picks them for its output: a false positive result that could result in an unsafe operation of the EUC.
Using one or more of the automatic self-test techniques described by A.3.15.1, A.3.15.4, or A.3.18.5 in [2] should detect faulty operation of the whole processing system and trigger actions that maintain low-enough risks despite the false positive output from the voter.
Such actions might be, for example, ignoring the faulty data until the EM disturbance has gone and the processing is working just fine again. If correct processing cannot be resumed soon enough, it would instead switch the EUC into one of its safe states before the EUC became too dangerous. Alternatively, it might be appropriate to switch control to a lower-technology backup signal processor and continue operation with some acceptable degradation in performance and/or short-term-acceptable higher safety risk. The best approach always depends strongly on the individual application.
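The sequence of actions described above can be sketched as a small supervisor. This is only one possible arrangement under stated assumptions: the names `Mode` and `Supervisor`, the `max_hold_cycles` limit, and the choice to latch the fallback until some external reset are all hypothetical and are not taken from [2].

```python
from enum import Enum, auto


class Mode(Enum):
    NORMAL = auto()      # self-test passing, processor output is used
    HOLDING = auto()     # self-test failing, faulty data ignored for now
    SAFE_STATE = auto()  # EUC has been switched into a safe state
    BACKUP = auto()      # control handed to a lower-technology backup


class Supervisor:
    """Toy supervisor driven by a periodic self-test result."""

    def __init__(self, max_hold_cycles: int, fallback: Mode = Mode.SAFE_STATE):
        if fallback not in (Mode.SAFE_STATE, Mode.BACKUP):
            raise ValueError("fallback must be SAFE_STATE or BACKUP")
        self.max_hold_cycles = max_hold_cycles  # assumed application-specific limit
        self.fallback = fallback
        self.mode = Mode.NORMAL
        self._failed_cycles = 0

    def step(self, self_test_ok: bool) -> Mode:
        # Assumption: once the fallback is entered it stays latched.
        if self.mode in (Mode.SAFE_STATE, Mode.BACKUP):
            return self.mode
        if self_test_ok:
            # Processing has recovered soon enough: resume normal operation.
            self._failed_cycles = 0
            self.mode = Mode.NORMAL
            return self.mode
        self._failed_cycles += 1
        if self._failed_cycles <= self.max_hold_cycles:
            self.mode = Mode.HOLDING   # keep ignoring the faulty data
        else:
            self.mode = self.fallback  # recovery took too long
        return self.mode


if __name__ == "__main__":
    sup = Supervisor(max_hold_cycles=2)
    for ok in (True, False, False, True, False, False, False):
        print(ok, sup.step(ok).name)
```

For an application with no safe state, passing `fallback=Mode.BACKUP` models handing control to the backup processor instead. How long faulty data may be ignored is, as noted above, strongly application-dependent.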
There are other T&Ms than self-testing, listed in [2], that may also be used to help ensure that performance degradations, however they are caused, should not propagate through a safety-related system and result in unacceptable safety risks.
It is not always acceptable for EUCs to switch to a safe state. For example, there is no safe state for a heart pacemaker. It has to keep working.
For many years, pacemakers have had to comply with very tough EM immunity test standards, but, as I said before, passing immunity tests cannot (on its own) prove that functional safety risks will remain acceptable throughout the lifecycle. So, pacemaker manufacturers use automatic self-testing to detect when the pacemaker’s digital controller is not working correctly – whether the cause is EM disturbances or anything else – and switch control from the microprocessor to a very simple low-technology relaxation oscillator. The simple oscillator (which could even be electromechanical) keeps pulsing the heart, keeping the user alive – although by all accounts not enjoying the experience very much!
Low-technology back-up T&Ms are, of course, also covered in [2].
References
[1] The IET’s “2017 Code of Practice on Electromagnetic Resilience”, available from: https://shop.theiet.org/code-of-practice-for-electromagnetic-resilience
[2] IEEE Std 1848:2020, “Techniques and measures to manage functional safety and other risks as regards electromagnetic disturbances”, available from: https://standards.ieee.org/standard/1848-2020.html
[3] IEC 60050, “Electropedia: The World’s Online Electrotechnical Vocabulary”, free from: http://www.electropedia.org/iev/iev.nsf/6d6bdd8667c378f7c12581fa003d80e7?OpenForm
[4] IEC 61508:2010, “Functional safety of electrical/electronic/programmable electronic safety-related systems”, in seven parts, from the IEC’s Webshop.
[5] ANSI C63.14-2014, “American National Standard Dictionary of Electromagnetic Compatibility (EMC) including Electromagnetic Environmental Effects (E3)”, from ANSI’s Webshop.
About the Author: Keith Armstrong
Keith Armstrong graduated in 1972 from Imperial College, London, specialising in electronic design and field theory. After working in several countries, he joined the first pro-audio custom-mixing-desk manufacturer to use digital audio. Five years of intense work resulted in the ability to design the lowest-noise A-D convertors correct at the first iteration, with the most sensitive analog and digital circuits on the same ground plane. Later, Keith added switching power converters to his ‘right first time’ approach to Signal Integrity and Power Integrity.
When manufacturers began testing to the EMC Directive, those using Keith’s SI and PI design approach found the EMC tests easy to pass. This became the basis of Keith’s independent company, started in 1990: Cherry Clough Consultants (after his house – a “Cherry Clough” being related to the Norwegian Viking phrase for a valley with cherry trees).
Enough of Cherry Clough’s 800+ satisfied customers worldwide have adopted Keith’s “Good EM Engineering” approach that it is now well-proven to be cost-effective in all applications, with even the very toughest EMC standards, up to 24GHz (so far!).
Since 1987, Keith’s IEE, IET, and IEEE Working Groups have created the brand-new safety-engineering discipline of managing the functional safety risks that can be caused by EMI. This is hugely important for future safety, most especially as regards autonomous systems, such as vehicles.
Keith can be contacted as follows:
Email: keith.armstrong@cherryclough.com
Mobile/text: +44 (0)7785 726 643
LinkedIn: https://www.linkedin.com/in/keith-armstrong-449801172/
Keith also has a Blog, which you can visit at: https://www.emcstandards.co.uk/blog