The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.

OCR for page 518
NCHRP 3-51 Phase 2 Final Report

A.5.5 Reliability

Definition of Reliability

The reliability of a system is defined by a measure of the success with which the system conforms to the performance specification for which it was designed and commissioned (formally tested and accepted). Without a performance specification, there is little basis for defining the reliability of the system. When the performance of a system deviates from that specified, this is called a failure. A failure is thus an event related to specified performance deviation, with reliability of the system being inversely related to the frequency of failure events.

A system has internal states which, in aggregate, support the external state of the system. Thus the external state of a system is an abstraction of its internal states. Usually a specification defines external states of a system, which are the operations performed by the system. Thus, internal state failures cause failures of the external state, unless the system is designed to be "tolerant" of internal state failures. The interrelation of the internal states is considered to be the algorithms of the system, or the methodology by which the system operates.

Errors are not faults as long as errors occur within the bounds of the specification. In communications, bit errors are finite and are a function of the "internal states" of the communications system. The major cause of bit errors in a system can be related to signal level versus noise on a communications channel and may vary with the type of modulation utilized. Timing jitter is another cause of bit errors. With a given signal-to-noise ratio and jitter of timing signals, predictable bit error rates are achievable. A prudent communications equipment designer includes bit error rate in his specifications.

A system is composed of components, which in a non-integrated sense may also be considered as a system. Terminology for this case is subsystem. Subsystems are composed of modules and modules are composed of discrete components.

The system is the interworkings of components, modules, and subsystems to perform the system function. Thus, a communications system is a subsystem of an ITS system. Its function is to provide communications paths from source to destination within specified time periods and within specified error allowances. Failures of subsystems and components may occur; however, as long as the system performs to specifications, no system failure has occurred. Modern communications system architecture supports fault tolerance by providing alternate communications paths. A subsystem failure which does not cause a system failure is, of course, important from a maintenance standpoint. In the sense of fault tolerant design, a subsystem or component failure usually means the loss of redundancy; thus, the probability of system failure increases. Therefore, it is important to restore redundancy to maintain a low probability of failure at the system level.

With properly designed systems, errors should be tolerated, especially human errors. Systems which fail because of errors have not been designed to accommodate a normally occurring phenomenon. For this reason, systems should be designed to validate human entries from the standpoint of reasonable parametric values and entry actions. While it may be impossible to inhibit an error that meets entry validation logic from being propagated through a system, logic should not cause a failure of the system (i.e., a system "crash") but rather an output which is logically related to the errored input (even though the input is in error). In a prudently designed system, algorithms can be employed to test values, tagging abnormal changes that are either impossible or highly unusual.

The medium through which symbolic information is transmitted (fiber, copper, air, etc.) directly relates to bit error rate, as does the structure of the signal and its ability to maintain information content during transmission through the medium. This "signal structure" is referred to as modulation. A third element is "noise" associated with the signal transfer, in relation to signal strength. For example, wireless communications may have an expected bit error rate (BER) of 1 error in 10^5 bits, while a fiber optic communications link may have a BER of 1 error in 10^[exponent illegible in source] bits or less. Being a function of signal to noise, wireless can trade communications distance for BER.
For a typical operating distance with the transmitter effective radiated power within FCC specifications, a typical wireless network is specified with a BER of 10^-5. Errors should be expected in systems and the system design should accommodate expected errors. An error occurrence within specification, again, is not a system failure. An error rate which exceeds specification is a failure as long as all other specifications are met (such as transmitted power, receiving sensitivity, timing stability, etc.).
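As a minimal sketch of the distinction drawn above between errors within specification and a failure, the following Python fragment compares a measured error count against a specified BER. The function names and any sample sizes used with them are illustrative assumptions, not part of this report; only the 10^-5 wireless BER is taken from the text.

```python
# Illustrative helpers (hypothetical names): compare measured bit errors
# against a specified bit error rate (BER).

WIRELESS_SPEC_BER = 1e-5  # 1 error in 10^5 bits, the wireless figure quoted in the text


def expected_bit_errors(bits_sent: int, ber: float) -> float:
    """Mean number of bit errors expected on a link with the given BER."""
    if bits_sent < 0 or ber < 0:
        raise ValueError("bits_sent and ber must be non-negative")
    return bits_sent * ber


def exceeds_ber_spec(errors_seen: int, bits_sent: int, spec_ber: float) -> bool:
    """True when the measured error rate is worse than the specified BER."""
    if bits_sent <= 0:
        raise ValueError("bits_sent must be positive")
    return errors_seen / bits_sent > spec_ber
```

Under these assumptions, a link that shows fewer errors than the specified rate allows is operating within specification even though errors occurred; only a measured rate above the specified BER would count toward a failure, provided all other specifications are met.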

Within a modern communications system, an internal error (assuming a properly designed system) should be detected, tagged, statistically counted, and prevented from propagating through the system. This is accomplished by use of simple parity, checksum parity, and techniques such as forward error correction coding (which is a more sophisticated form of parity). Monitoring the statistics of bit errors can be an indication of a pending failure. An increase in bit error rate indicates a problem in communications links, such as increased interfering noise in wireless transmission or an aging transmitter solid state device causing a decrease in transmit signal power.

Within wireless communications systems, a failure may occur and yet the system may be performing to specification. The reason is that the wireless communications system must operate within an environment which it does not control. Specifications for wireless communication define the maximum noise environment in which the system provides BER to specifications. Design margins for signal strength at receivers, when properly designed, accommodate normal fade conditions based on statistical weather variations. Should a very unusual weather condition occur, or a noise source not considered within the initial installation design be installed near a receiving site, the noise floor will rise or the signal strength will decrease, which negatively impacts the signal-to-noise (S/N) ratio, thus causing the BER to exceed specification. In fact, the wireless system is performing to specification as predicted by the S/N ratio. Thus, in wireless systems, monitoring noise floor and signal strength at the receiver provides a means of evaluating whether the system has failed or the medium environment has changed. This is not true of wire line or optical links, since the cable is part of the system and can be maintained. It is impossible to "maintain" the atmosphere; it can only be monitored as to the conditions supporting communications.
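The detect, tag, count, and contain sequence named at the start of this passage can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 8-bit additive checksum, the class name LinkMonitor, and its methods are hypothetical and are not drawn from any equipment described in this report.

```python
from typing import Optional


def even_parity_bit(byte: int) -> int:
    """Parity bit that makes the total count of 1 bits (data plus parity) even."""
    return bin(byte & 0xFF).count("1") % 2


def checksum8(payload: bytes) -> int:
    """Simple 8-bit additive checksum (an assumed scheme, for illustration only)."""
    return sum(payload) % 256


class LinkMonitor:
    """Counts frames and detected errors so a rising error ratio can be noticed."""

    def __init__(self) -> None:
        self.frames = 0
        self.errors = 0

    def receive(self, payload: bytes, received_checksum: int) -> Optional[bytes]:
        """Return the payload if it checks out; otherwise count it and drop it."""
        self.frames += 1
        if checksum8(payload) != received_checksum:
            self.errors += 1  # detected, tagged, and statistically counted
            return None       # not propagated further into the system
        return payload

    def error_ratio(self) -> float:
        """Fraction of received frames that failed the checksum."""
        return self.errors / self.frames if self.frames else 0.0
```

Tracking error_ratio over time is one way, under these assumptions, to watch for the rising error statistics that the text describes as an indication of a pending failure.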
From an operator's standpoint, an unacceptable BER, causing system response to degrade and responsive "answers" to be unavailable, is a system failure and corrective action is required. For wireless, where an unusual change in medium conditions has occurred, this is not a system failure in the true sense of reliability. Where a system has been specified to need a 15 dB signal margin to accommodate fade and ambient noise changes and where this signal margin is not provided, then the system has failed in the true sense of a failure, since the required margin is "out of specification."
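The reasoning above, that receiver-side monitoring of noise floor and signal strength separates a true system failure from a change in the medium, might be expressed as the following sketch. The LinkSpec fields, the classify function, and the use of a single minimum S/N threshold are simplifying assumptions introduced here for illustration; a real specification would state its own limits.

```python
from dataclasses import dataclass


@dataclass
class LinkSpec:
    max_ber: float     # specified bit error rate
    min_snr_db: float  # assumed lowest S/N at which the specified BER is promised


def classify(measured_ber: float, noise_floor_dbm: float,
             signal_dbm: float, spec: LinkSpec) -> str:
    """Classify a wireless link observation from receiver measurements."""
    if measured_ber <= spec.max_ber:
        return "within specification"
    snr_db = signal_dbm - noise_floor_dbm
    if snr_db < spec.min_snr_db:
        # BER is high, but only as the degraded S/N predicts.
        return "medium environment changed"
    # BER is high even though the environment is inside the specified envelope.
    return "system failure"
```

In this sketch an excessive BER with adequate S/N points at the equipment, while an excessive BER with degraded S/N points at the environment, which matches the distinction the text draws for wireless links.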

In summary, physical components have finite functional life. As components wear and/or age, parameter values may change. Component parameters change with temperature. The failure of the component may vary based on the ability of interconnected components to accommodate normal changes such as wear and/or aging. Poor designs, not employing worst case design principles for component parameter variation, can experience rapid aging and failure.

Failure rate of a component is statistically predictable. Reliability models can be developed to predict failure rate of integrated components, modules, and subsystems. The predicted MTBF of a system, when analytically computed using recognized reliability engineering procedures, is significant to the success of achieving a reliable system when installed.

Formal reliability testing on a subsystem may be conducted. These tests, to be valid, must include a statistically significant sample of equipment and must be conducted over the operational environmental variation expected for the system. This includes over and under power conditions, temperature extremes, and rapid temperature variations. Any reliability information obtained without formal analysis and testing is "highly suspect" and is usually incorrect.

Reliability does not just "happen." It must be designed into equipment, and a Quality Assurance Program must be in place to assure that material, production processes, assembly, and tests are conducted in compliance with the design and associated production drawings and procedures. Reliability is assured with proven design and test approaches. Product "burn-in" (for environmental stress testing) to preclude "infant mortality" of components is a must for quality products.

In the ITS procurement process, reliability has not been adequately stressed and managed. Even with the emerging "Life Cycle Cost" procurements, verifiable MTBF information is necessary; otherwise, the cost analysis is unsubstantiable.
A.5.5.1 How Reliability is Achieved

Reliability in communications equipment starts with quality design. Quality design includes consideration for the statistical variation of signals based on component differences, tolerance differences in assembly, variations over operation temperature, and variations due to prime voltage. A quality product design works with worst case variations anticipated through the manufacturing process and during operations.

Second, product reliability is achieved by assuring that products are manufactured under a well managed quality assurance program such as that recommended by Bellcore or ISO 9000 through 9003. Quality Assurance (QA) involves verifying that correct manufacturing assembly and test processes and correct materials are used in production of the product. This includes attention to Electrostatic Discharge (ESD) and protection of sensitive devices. QA also verifies that the product conforms to specification prior to shipping.

Third, product reliability is achieved through use of Electrical and Environmental Stress (EES) testing over the operational specifications of the equipment. Stress testing, of which "burn-in" is a subset, assures that weak components and marginal electromechanical connections are found. Thus infant mortality of marginal components is precluded and all electromechanical connections are verified to be suitable to support functional operation. Without EES testing at the factory level, the jurisdictional installation becomes the place of burn-in.

Use of large scale integration further supports product reliability. The reason is that mechanical interconnects are minimized and noise interference is more easily managed when protection is considered a part of the integrated circuit design. Perhaps the use of large scale integrated circuits has enhanced reliability of advanced communications products compared with conventional products more so than improved manufacturing methods and testing.

Selection of components which are certified over the operating temperature range is important. Also important is a design which manages heat dissipation from components, especially integrated circuits. Components which are operated at temperatures over certification can have their life expectancy reduced by 50% or more.
Redundancy is another method of improving reliability by increasing fault tolerance. There are various forms of redundancy depending on the desired reliability. Redundancy provides the capability for a backup device or unit to assume operation in case of the primary unit failure. The use of real-time fault tolerance requires failure detection, fault propagation prevention, and switch-over to the redundant unit. Non-real-time redundancy may be used. This involves a standby unit being manually switched into operation upon detection of failure in a primary unit. Very high reliability systems use a two-out-of-three match: where two outputs do not agree, a failure occurs. Some space systems use two-out-of-three matching for reliability. With real-time fault tolerance, as long as the unit continues to perform to specifications, it has not failed. Thus Mean Time Between Failures (MTBF) in the 10s of years are possible through use of real-time fault tolerance.

Network design in communications also supports high reliability. Having alternate communications paths provides a means of communication if a primary path is disrupted. Disruption may be through cable damage or, in the case of wireless, an obstruction in the radio line of sight. Path-switched optical rings and packet radio networks are examples of cable and wireless communications path disruption fault tolerance. In packet radio networks, to achieve fault tolerance a system must be designed with a minimum of two communications paths for each transceiver location. In jurisdictional communications networks, there is a high probability of cable damage or wireless communication path blockage due to construction activity as the jurisdiction grows. Thus, use of communications path redundancy is very important in achieving reliability.

There are a variety of approaches to path redundancy. An unfolded counter rotating ring is perhaps the simplest, with star structures providing more redundancy paths and thus higher path reliability at higher cost per communications path (i.e., cable infrastructure). Reliability must be traded off with cost of more elaborate, redundant network geometries such as star versus ring versus interworking rings. A compromise on star architecture is interworking optical ring topology related to SONET.
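The two-out-of-three matching mentioned earlier in this subsection can be illustrated with a small voter. This is a sketch under assumptions, not a description of any fielded system: the function name and the convention of returning None when no two outputs agree are introduced here for illustration.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def two_out_of_three(outputs: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value at least two of three units agree on, else None."""
    if len(outputs) != 3:
        raise ValueError("exactly three outputs are required")
    value, count = Counter(outputs).most_common(1)[0]
    # A None result signals that no two outputs matched, i.e. a declared failure.
    return value if count >= 2 else None
```

A caller would treat a None result as the failure event and a returned value as valid output, so a single disagreeing unit is tolerated without interrupting operation.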
Interworking rings can sustain more than a single break in a fiber cable (number depending on ring interworking geometry) and can provide more usable geographic area coverage compared with a "star" topology, usually at a lower cost compared with "star" topology within a metropolitan area.

Again, reliability must be designed into equipment and into software. Reliability cost can usually be justified based on life cycle cost analysis which considers cost of emergency repairs and related logistics in large systems. Unfortunately, system reliability objectives may be compromised because of limited funds for system acquisition, since maintenance funds usually come from different funding sources and cannot be used for acquisition.

A.5.5.2 Reliability Considerations of Older Communications Technology Versus Advanced Communications Technology

Older communications technology was designed with discrete components and limited integrated circuitry. Only since 1990 has hybrid integrated circuitry been stressed in systems. Hybrid indicates the use of a combination of digital and analog technology in an integrated circuit. With evolution of integrated digital/analog circuitry in a single "chip," and advances in application-specific integrated circuits, significant component reduction has resulted. MTBFs of communications devices have grown from several thousand hours to tens of thousands of hours. It is not uncommon to find a 20,000 hour MTBF advanced communications product, and new products are emerging with five-year (43,800 hours) MTBFs. Using redundancy, systems availability levels of 99.98% are achievable and are reasonably common with the use of advanced communications technology within the telephone industry (such as SONET).

Older communications technologies generally used analog techniques which are susceptible to noise. Digital technology provides improved noise immunity. This is especially true of digital video communications and optical communications versus twisted pair wire line systems using analog modems. Thus, advanced communications technology, assuming that it has successfully been through beta testing and is in production status, usually exhibits a much higher MTBF than conventional technology.

A.5.5.3 Impact of Installation Design on Reliability

Communications equipment may be designed with high reliability; however, if installation design is improper, reliability can be significantly impacted.
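The MTBF and availability figures quoted in A.5.5.2 can be related by a short calculation. The sketch below uses the common steady-state relation availability = MTBF / (MTBF + MTTR) and assumes independent redundant units; the repair time (MTTR) is an assumption the caller must supply, since this report does not state one, and the function names are hypothetical.

```python
HOURS_PER_YEAR = 8760  # 365 days x 24 hours; five years gives the 43,800 hours quoted


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability of a single repairable unit."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("mtbf_hours must be positive and mttr_hours non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def redundant_availability(unit_availability: float, units: int = 2) -> float:
    """Availability when any one of `units` independent units is sufficient."""
    if not 0.0 <= unit_availability <= 1.0 or units < 1:
        raise ValueError("availability must be in [0, 1] and units >= 1")
    return 1.0 - (1.0 - unit_availability) ** units
```

The independence assumption is the important caveat: a shared point of failure, such as common power or cooling, would lower the availability actually achieved below what this calculation suggests.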
Installation design must properly consider communications path length, signal attenuation, environmental noise, signal distortion caused by marginal bandwidths, modulation types used, propagation time, synchronization, and other factors impacting reliable point-to-point communications. Without adequate margins to accommodate increases in attenuation and noise, communications reliability is impacted.

Similarly, installation design must consider ambient temperature extremes, including internal heat dissipation and required air conditioning. If air conditioning is critical to equipment operation, then it should be fault tolerant and should be backed up with emergency power. Similarly, power outage and the need for an uninterruptible power supply for system reliability must be considered. Batteries must be sized to the maximum outage period acceptable prior to activating a motor/generator. Continuous load factor must be considered to assure reliability because some UPS equipment only allows 80% continuous load.

Proper installation design is just as important as proper equipment selection in ensuring high reliability systems. Attention to professional installation design and quality inspection of the installation provides high confidence that system reliability is not compromised.

A.5.5.4 Environmental Impact on Reliability

Temperature mismanagement is one of the major contributors to equipment failure. It is also one of the least understood requirements by jurisdictional personnel. As we review deployed ITS communications equipment today, one of the major deficiencies noted is a disregard for operating temperature specifications. Jurisdictions will emphasize the need for NEMA TS-2 environmental specifications for field controllers and will interconnect these controllers with communications devices (such as Hayes modems) which are only designed to operate from +10 to +40 C and with humidity of 30% to 85% RH, noncondensing. The argument is that the devices work; however, what is not understood is that reliability is significantly decreased.
In areas where devices are operated in high ambient temperatures (such as 50 C in Arizona, Nevada, or California desert communities), reliability can be impacted, causing failure after a relatively short operating period. Electronic equipment using solid state components (such as integrated circuits and transistors) must be designed to maintain operating temperatures of junctions at an acceptable level to achieve the reliability objective of the equipment within the specified operating temperature range and cooling approach. There are many types of solid state components, from low cost plastic, to industrial grade ceramic, to surface mount MIL-STD-883 compliant. There are many approaches to cooling components, from attachable fins for extended surface and conduction/convection cooling, to use of a ground plane on a printed circuit board which conducts heat to printed circuit board guides/card holders, and finally to a large heat sink such as the case.

Equipment operated in an office environment usually does not require sophisticated heat management; thus, plastic (not a good heat transfer material) components and cooling fans are used. When a device designed to operate in a stable office environment is employed on the roadside, it is exposed to a much wider temperature environment which it was not designed to manage. In a NEMA cabinet on a freeway in Arizona with a 49 C ambient environment and solar loading of the cabinet, an internal temperature of 74 C is highly probable during peak solar loading periods. Solid state component junction temperature will increase perhaps by 60 C, causing an MTBF decrease from 10^7 hours to 10^4 hours (10 million to 10 thousand hours). Figure A.5.5.4-1 illustrates typical failure rates of silicon and GaAs integrated circuits. When heat is properly managed, reliability of 10^7 hours is achievable at the component level.

"Black box" reliability is obtained through analyzing the interconnection of components in serial and parallel structures. Reliability of a serially interconnected group of semiconductor components is calculated by the same basic formula used to calculate resistance of parallel resistors:

\[ \text{Serial Failure Rate} = FR_1 + FR_2 + \cdots + FR_n \]

Where:
FR = Failure rate of component in service
n = Nth component

Since MTBF is the reciprocal of failure rate, this is equivalent to \( 1/MTBF_{serial} = 1/MTBF_1 + 1/MTBF_2 + \cdots + 1/MTBF_n \), which has the same form as the parallel-resistor formula.
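A brief sketch of the serial calculation follows, assuming constant failure rates so that MTBF is simply the reciprocal of failure rate. The function names are illustrative, and the treatment of a serial group as the sum of its component failure rates is the standard series model rather than a transcription of the formula as originally printed.

```python
from typing import Iterable


def series_failure_rate(failure_rates: Iterable[float]) -> float:
    """Failure rate of serially interconnected components: the sum of the rates."""
    rates = list(failure_rates)
    if not rates or any(r < 0 for r in rates):
        raise ValueError("at least one non-negative failure rate is required")
    return sum(rates)


def series_mtbf(mtbf_hours: Iterable[float]) -> float:
    """MTBF of a serial group; same form as resistance of parallel resistors."""
    values = list(mtbf_hours)
    if not values or any(m <= 0 for m in values):
        raise ValueError("at least one positive MTBF is required")
    return 1.0 / series_failure_rate(1.0 / m for m in values)
```

One consequence visible in this model is that a single overheated component dominates: a serial group containing one 10^4 hour part cannot exceed 10^4 hours overall, however reliable the remaining parts are.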

[Figure A.5.5.4-1: Typical failure rates of silicon and GaAs integrated circuits]

As can be seen, the failure rate of integrated components can be even more significantly impacted by degradation of each individual component's failure rate because of the serial interconnects. Paying attention to environmental design specifications of communications equipment applies not only to old components but also to new technology. Where air conditioning is used to meet environmental requirements, failure of the air conditioning unit becomes a critical point of failure for the system, even if fault tolerance is used in site electronics modules. The reason is that both the back-up and primary electronic modules will be subject to rapid decrease in time between failures. Thus, use of redundant air conditioning is necessary, or use of rising temperature alarms which support air conditioner repair prior to the critical failure temperature of the electronics.

A.5.5.5 Software Reliability

The Institute of Software Engineering (ISE) is recognized by the U.S. Government and industry as the certifying agency for software quality. Companies are certified based on the quality of their software development organization, management, development tools, test approach, documentation, and quality assurance. The Institute of Electrical and Electronic Engineers (IEEE) has published standards and guidelines for producing reliable, quality software. IEEE Standard 982 defines "Measures to Produce Reliable Software." IEEE Communications Magazine (Volume 32, No. 10, October 1994, pp. 58-63) presents an excellent article entitled "Life after ISO 9001: British Telecom's Approach to Software Quality," discussing the impact of software quality assurance on software reliability.

For software to be reliable it must be well designed and well tested, including stress testing. Stress testing is associated with worst case loading of the processor (within its specified operational envelope) to assure that the integrated hardware and software can perform.
Hardware/software integrated timing is thus tested, as is the ability of the hardware/software to handle buffering requirements under worst case loading. Further tested are errors, to validate that the software does in fact detect error conditions and prevent error propagation that could cause a software "crash."
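The error-condition behavior described above, together with the earlier recommendation that entries be validated and abnormal changes tagged, might look like the following in code. This is a minimal sketch: the function names, the range limits passed by the caller, and the choice to return None rather than raise are assumptions made here for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reading:
    value: float
    suspect: bool = False  # tagged for review rather than silently discarded


def validate_entry(raw: str, low: float, high: float) -> Optional[float]:
    """Parse an operator entry; return None for bad input instead of crashing."""
    try:
        value = float(raw)
    except (TypeError, ValueError):
        return None
    # NaN and out-of-range values both fail this comparison.
    return value if low <= value <= high else None


def tag_abnormal_change(previous: float, current: float, max_step: float) -> Reading:
    """Tag a change larger than max_step as suspect so it can be reviewed."""
    return Reading(current, suspect=abs(current - previous) > max_step)
```

An error test of the kind the text calls for would feed such routines malformed, out-of-range, and implausibly changing inputs and confirm that each is detected and contained rather than propagated.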

Structured software also supports reliability by preventing multiple entry and exit points from software modules (100 to 200 lines of code to perform a subtask). While there are design techniques for enhancing software reliability, these are seldom used because of cost. One instance is a two-out-of-three voting technique, where three software routines written by three different software engineers have outputs compared, with the acceptance factor being that "two must match" for valid output.

In general, well tested software will provide reliable communications operations. As software matures, "bugs" are identified and repaired in new release versions. In general, higher software versions should be more reliable; however, major performance additions may increase probability of software failure. In general, when communications equipment uses software, procurement attention should take into consideration:

- Was it developed under a formal software quality and reliability program?
- Is it structured?
- Was it thoroughly tested, including software stress testing?
- Were formal test procedures used and test results documented?
- Is it designed to accommodate expected error conditions?
- Is adaptation to a specific application by table driven entries?

Software quality and reliability is a formal process and should be treated by jurisdictions as an important requirement of a procurement.

Advanced communications systems employ more software and firmware than conventional communications systems. The result is that hardware components are minimized, since these systems use digital processors which support multifunctional operations. As digital signal processors have been introduced into communications equipment, higher processing performance has had an additional impact on minimizing hardware components. As components are eliminated and hardware reliability is transposed to digital signal processor (DSP) chips, advanced communications hardware reliability is increased. Since software and/or firmware provides the communications functionality, reliability of software must be considered. Contrary to the classic statement "Software doesn't break," poorly designed software does in fact result in malfunctions. For this reason it is important for jurisdictions to procure software from companies which have formal software reliability and quality programs.

A.5.5.6 Specifications and Practices Related to Reliability of Communications Systems

Tables A.5.5.6-1a and b summarize specifications and practices related to U.S. telecommunications and related data communications industries. Within the Bellcore family of specifications (a), there are numerous specifications related to communications devices, subsystems, networks, and services which relate to system reliability. These specifications define standards, procedures, and tests to be used in construction, integration, and testing of communications hardware and software for implementation within public and private telecommunications systems. Quality Assurance directly relates to reliability. Thus Bellcore has included Quality and Reliability in common specifications. Poor quality assurance usually results in poor reliability when products are integrated into operational systems. Bellcore TR-NWT-000332, entitled "Reliability Prediction Procedures for Electronic Equipment," is an effort to standardize how reliability figures are placed on equipment. TR-NWT-000332 is Bellcore's equivalent of MIL-HDBK-217F, which is the Department of Defense's methodology for developing reliability predictions. Table A.5.5.6-2 summarizes hardware and system-design issues related to communications reliability.

Table A.5.5.~1a Industry Recognized Specifications and Practices Related to Reliability Belicore Specifications Specification Number Title . FR-NWT-000796 Reliability and Quality: Generic Requirements . . _ SR-NWT-001756 Automatic Protection Switching for SONET SR-NWT-001907 Transport Reliability Analysis, Generic Guidelines . .. SR-NWT-002159 Quality Systems and Software Requirements SR-STS-02579 Algorithms for Redundancy Management . SR-TSY-000385 Reliability Manual for Telecommunications SR-TSY-001130 Reliability and System Architecture Testing SR-TSY-001369 ~ Reliability of Laser Diodes and Modules: Generic Requirements TA-NWT-357 Assuring Reliability of Components Used In Telecommunications Equipment: Generic Requirements _ . TA-NWT-000942 Hardware Reliability Assurance Program: Generic Requirements TA-NWT-001089 ~ Electromagnetic Compatit lity and Electrical Safety: Generic Criteria for Network Telecommunications Equipment TA-NWT-001202 Supplier Quality Process Requirements TA-NWT-001221 | Generic Requirements for Passive Fiber Optic Component Reliability Assurance Practices TR-NWI-000063 Network Equipment- Building Systems: Generic Requirements (Environment and Electrical Safely) TR-NWT-000284 Reliability and Quality, Switching Systems: Generic Requirements TR-NWT-000332 ~ Reliability Prediction Proc ! d ures for Electronic Equipment TR-NWT-000418 T Re li abi I ity Assu rance R eq Hi reme nts to r Fibe r O ptic Transpo rt Systems TR-NWT-000468 Reliability Assurance Practice for Optoelectronic Devices in Central Office Applications TR-NWT-000870 ~ Electrostatic Discharge C' control in the Manufacture of Telecommunications Equipment: Generic Requirements TR-NWr-000974 Generic Requirements for Telecommunications Line Protectors TR - NWT-001011 | G ene ric Req u i remeets fc S u rge P rotective Devices on AC :\NCHRmPba~t NCHRP 3-51 Phase 2 Fmal Report AS-33

OCR for page 518
Specification Number rule _ TR-NWT-001042 SONET Path Protection Switched Ring . . __ TR-NWT-001075 Generic Requirements for Outside Plant Bonding and Grounding of System Hardware TR-NWT-001230 SONET Bidirectional Line Switched Ring Equipment Generic Criteria .__ TR-NW1-001274 Reliability Qualification Testing of Printed Wiring Assemblies Exposed to Airborne Hygroscopic Dust TR-NWT-001305 Generic Requirements for Surge Protected Terminal Blocks . . . TR-NW[-001349 Reliability and Quality Measurements for Telecommunications Systems; Supplier Support Measures . TR-NWT-001400 SON ET Unidirectional, Dual Feed' Self-Healing Ring Generic Criteria TR-TSY-000512 Reliability: Generic Requirements ~ . TR-TSY-000757 | Generic Requirements to Uninterruptable Power System (UPS) . ._ . _ TR-TSY-000929 Reliability and Quality Measurements for Telecommunications s~m TR-TSY-000983 Reliability Assurance Practices for Optoelectronic Devices in Loop Applications - Table A.5.5.6-1b Technical Requirements and Specifications Related to Maintainability Institute of Electrical and Electronic Engineers . . . Specification Number Title 62.41-1991 | Recommended Practice n Surge Voltages in Low Voltage AC Power Circuits _ . _ 62.47-1992 | Guide on Electrostatic Di. charge 63.12-1987 Recommended Practice for Electromagnetic Compatibility Limits 1 41 -1 986 | Recom m ended P ractice f ~ r E lecincal Powe r D istri button 142-1991 Recommended Practice for Grounding of Industrial and Commercial Power Systems _ . 241-1990 | Recommended Practice rElectricalPowerSystemsin | Commercial Buildings 242-1986 Recommended Practice for Protection and Coordination of Do_ ~ let NCHRP3-51 ~ ~ase2FmalReport AS-34

Specification Number   Title
446-1987               Recommended Practice for Emergency and Standby Power Systems
493-1990               Recommended Practice for the Design of Reliable Industrial and Commercial Power Systems
518-1982               Guide for the Installation of Electrical Equipment to Minimize Noise to Controllers from External Sources
730-1989               IEEE Standard for Software Quality Assurance Plans
828-1990               IEEE Standard for Software Configuration Management Plan
829-1983               IEEE Standard for Test Documentation
982.1/2-1988           Measures to Produce Reliable Software
1008-1987              Standard for Software Unit Testing
1012-1986              Standard for Software Verification and Validation Plans
1042-1987              Guide to Software Configuration Management
1061-1992              Standard for Software Quality Metrics Methodology
1074-1991              Standard for Developing Software Life Cycle Process
1110-1992              Recommended Practice for Powering and Grounding Sensitive Electronic Equipment
SH 15537               IEEE Standards for Electromagnetic Compatibility

Society of Automotive Engineers

SAE J1113              Electromagnetic Susceptibility Procedures for Verifying Vehicle Components

Department of Defense

DOD 5000.3             Test and Evaluation Master Plan - Guidelines
MIL-HDBK-217           [title illegible]
MIL-HDBK-781           [title illegible]
MIL-STD-210C           [title illegible]
                       (legible fragments of these three titles: "Reliab...", "... and Production")
MIL-STD-461C           EMI and Susceptibility Requirements
MIL-STD-462            Test Procedures for EMI/EMC
MIL-STD-810E           Environmental Test Methods and Engineering Guidelines
MIL-STD-883D           Test Methods and Procedures for Microelectronics
MIL-STD-2168           Software Quality Assurance

Federal Communications Commission

Specification Number   Title
Part 15                Unintentional Radiators, Class A and B Devices, Radiated Emissions and Susceptibility

Table A.5.5.6-2 Summary Overview of Communication Reliability Considerations

Equipment Related Issues
    Formally determined, guaranteed mean time between failure
    System designed to reliability objectives using subsystem MTBF
    Fault tolerant hardware and network architecture
    Equipment designed for compatibility with environmental variations
    Equipment meeting radio frequency interference and electromagnetic compatibility standards
    Equipment designed for compatibility with power variation
    Equipment designed using worst case component parametric variations
    Equipment with dynamic test, fault detection and fault propagation
    Equipment using large scale integration
    Equipment manufactured under formal quality assurance program (using specification-compliant components, materials, processes, and test procedures including burn-in)
    Use of recognized standards and protocol testing

[Second column; heading illegible]
    Proper network design with link margin and high probability of maintaining signal/noise
    Compatible specifications at all OSI levels
    Fault tolerance network architecture
    Installation environment designed to equipment compatibility and to reliability objectives (such as redundant air conditioning)
    Prime power interconnect designed to reliability objectives with use of battery backed-up, uninterruptible power system
    Data element secured to prevent unskilled, unknowledgeable tampering
    Installation design considering electromagnetic compatibility and grounding standards
    Lightning protection on power and metallic signal lines
    Post installation testing using stress techniques (full loading)

A.5.5.7 Summary of Reliability Issues

New communications equipment is more reliable than older equipment for the following reasons:

    Use of large scale integration;
    Use of software/firmware to reduce hardware components;
    Perfection of fault tolerant technology and application to advanced communications equipment;
    Advances in adaptive signal processing and modulation technology;
    Evaluation of open standards; and
    Advances in network routing protocol and network management protocol.

Table A.5.5.7-1 summarizes technology advances affecting communications reliability.

Table A.5.5.7-1 Summary of Technology Advancements Impacting Communications Reliability

Technology Advancements
    Large scale integrated circuits, programmable logic arrays (PLA), application specific integrated circuits (ASICs) and digital signal processors reducing component count
    Evolution of fault tolerant technology from NASA and communications equipment
    Development of OSI interface standards to 7 levels
    Emphasis on network interoperability with associated standards for bridges, routers, and switches
    Emphasis on network protocol standards with congestion control and dynamic routing
    Improved transmission medium adaptation through use of echo cancelers and adaptive modem technology
    Evolution of optics communications technology with low-loss, low-noise transmissions
    Improved circuit design and printed circuit board design through modern computer-aided design technology
    Network modeling technology to prove-before-build

Benefits
    Improved power management and filtering technology
    Advanced modulation techniques with improved immunity to [illegible]
    Emphasis on product design for reliability and quality assurance [remainder illegible]
    Improved IC packaging and heat management on printed circuit boards
    Development of internet protocol and dynamic routing
    Improvements in hardware and software test methodology
    Advancements in error detection and error correction
    Development of hybrid and microwave integrated circuit technology
    Lower cost performance evaluation and tradeoff analysis

Several articles amplifying this information may be of interest to the reader:

    Proceedings of the IEEE, Volume 82, No. 7, July 1994, pp. 992-1004; "Predicting the Reliability of Electronic Equipment," by M. Pecht and F. Nash.

    IEEE Communications Magazine, October 1994, pp. 64-68; "Network Reliability Design Techniques to Improve Customer Satisfaction," by M. Taka and T. Abe.

    IEEE Communications Magazine, June 1993, pp. 40-43; "Incorporating Reliability Specifications into the Design of Telecommunications Networks," by S. Nojo and H. Watanabe.

    Electronic Design News, June 1994, pp. 109-116; "Keep Metastability from Killing Your Digital Design," by G. Grosse. Note: This article deals with timing stability versus MTBF of equipment.

Old communications equipment typically had MTBFs in the 2000 to 3000 hour range. Advanced communications system technology can be obtained with MTBFs in the 50,000 to 100,000 hour range. The bottom line is whether a failure is critical. With modern fault tolerance, network availability in the range of 99.998% is achievable.
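To put the MTBF and availability figures above in perspective, the following sketch relates them using the standard steady-state formula, availability = MTBF / (MTBF + MTTR). The report does not state a repair time, so the 4-hour MTTR is purely an assumption, as is the simple model of two independent redundant units standing in for "fault tolerance".

```python
HOURS_PER_YEAR = 8760


def availability(mtbf_hours, mttr_hours):
    """Steady-state availability of a single repairable unit."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def parallel_availability(unit_availability, n_units):
    """Availability of n independent redundant units (any one suffices)."""
    return 1.0 - (1.0 - unit_availability) ** n_units


def annual_downtime_minutes(avail):
    """Expected downtime per year, in minutes, for a given availability."""
    return (1.0 - avail) * HOURS_PER_YEAR * 60


ASSUMED_MTTR_HOURS = 4.0  # hypothetical repair time, not from the report

for label, mtbf in [("older equipment", 2_000), ("advanced equipment", 50_000)]:
    single = availability(mtbf, ASSUMED_MTTR_HOURS)
    duplex = parallel_availability(single, 2)
    print(f"{label}: single unit {single:.5%}, redundant pair {duplex:.7%}")

# Downtime implied by the 99.998% availability figure quoted in the text.
print(f"99.998% availability: {annual_downtime_minutes(0.99998):.1f} min/year")
```

An availability of 99.998% corresponds to roughly 10.5 minutes of downtime per year. Under the assumed repair time, a single unit at the lower end of either MTBF range quoted above falls short of that figure on its own, which is consistent with the text attributing it to fault tolerance.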