Since the early 2000s, the U.S. Department of Defense (DoD) has carried out several examinations of the defense acquisition process and has begun to develop some new approaches. These developments were in response to what was widely viewed as deterioration in the process over the preceding two or so decades.
During the 1990s, some observers characterized the prevailing view of defense acquisition as a belief that exercising less oversight over contractors’ development of weapons systems would result in higher quality defense systems, delivered more quickly. Whether or not this view was widely held, the 1990s were also the period in which a large fraction of the reliability engineering expertise in both the Office of the Secretary of Defense (OSD) and the services was lost (see Adolph et al., 2008), and some of the formal documents providing details on the oversight of system suitability were either cancelled or not updated. For instance, DoD’s Guide for Achieving Reliability, Availability, and Maintainability (known as the RAM Primer) failed to include many newer methods, the Military Handbook: Reliability Prediction of Electronic Equipment (MIL-HDBK-217) became increasingly outdated, and in 1998 DoD cancelled Reliability Program for Systems and Equipment Development and Production (MIL-STD-785B) (although industry has continued to follow its suggested procedures).1
1 A criticism of this standard was that it took too reactive an approach to achieving system reliability goals. In particular, this standard presumed that approximately 30 percent of system reliability would come from design choices, while the remaining 70 percent would be achieved through reliability growth during testing.
During the early to mid-2000s, it became increasingly clear that something, possibly this strategy of relaxed oversight, was not working, at least with respect to the reliability of fielded defense systems. Summaries of the evaluations of defense systems in development in the annual reports of the Director of Operational Test and Evaluation (DOT&E) between 2006 and 2011 reported that a large percentage of defense systems (often as high as 50 percent) failed to achieve their required reliability during operational test. Because of the 10- to 15-year development time for defense systems, relatively recent data still reflect the procedures in place during the 1990s.
The rest of this appendix highlights sections from reports over the past eight years that have addressed this issue.
DOT&E 2007 ANNUAL REPORT
In the 2007 DOT&E Annual Report, Director Charles McQueary provided a useful outline of what, in general, needed to be changed to improve the reliability of defense systems, and in particular why attention to reliability early in development was important (U.S. Department of Defense, 2007a, p. i):
Contributors to reliability problems include: poor definition of reliability requirements, a lack of understanding by the developer on how the user will operate and maintain the system when fielded, lack of reliability incentives in contracting, and poor tracking of reliability growth during system development.
He also wrote that addressing such concerns was demonstrably cost-beneficial and that best practices should be identified (pp. i-ii):
[O]ur analysis revealed reliability returns-on-investment between a low of 2 to 1 and a high of 128 to 1. The average expected return is 15 to 1, implying a $15 savings in life cycle costs for each dollar invested in reliability…. Since the programs we examined were mature, I believe that earlier reliability investment (ideally, early in the design process), could yield even larger returns with benefits to both warfighters and taxpayers…. I also believe an effort to define best practices for reliability programs is vital and that these should play a larger role in both the guidance for, and the evaluation of, program proposals. Once agreed upon and codified, reliability program standards could logically appear in both Requests for Proposals (RFPs) and, as appropriate, in contracts. Industry’s role is key in this area.
2008 REPORT OF THE DEFENSE SCIENCE BOARD
In May 2008, the DoD’s Defense Science Board’s Task Force on Developmental Test and Evaluation issued its report (U.S. Department of Defense, 2008a), which included the following assertions about defense system development (p. 6):
The single most important step necessary to correct high suitability failure rates is to ensure programs are formulated to execute a viable systems engineering strategy from the beginning, including a robust reliability, availability, and maintainability (RAM) program, as an integral part of design and development. No amount of testing will compensate for deficiencies in RAM program formulation.
The report found that the use of reliability growth in development had been discontinued by DoD more than 15 years previously. It made several recommendations to the department regarding the defense acquisition process (p. 6):
[DoD should] identify and define RAM requirements within the Joint Capabilities Integration Development System (JCIDS), and incorporate them in the Request for Proposal (RFP) as a mandatory contractual requirement … during source selection, evaluate the bidder’s approaches to satisfying RAM requirements. Ensure flow-down of RAM requirements to subcontractors, and require development of leading indicators to ensure RAM requirements are met.
In addition, the task force recommended that DoD require (p. 6)
[the inclusion of] a robust reliability growth program, a mandatory contractual requirement and document progress as part of every major program review … [and] ensure that a credible reliability assessment is conducted during the various stages of the technical review process and that reliability criteria are achievable in an operational environment.
This report also argued that there was a need for a standard of best practices that defense contractors could use to prepare proposals and contracts for the development of new systems. One result of this suggestion was the formation of a committee that included representatives from industry, DoD, academia, and the services, under the auspices of the Government Electronics and Information Technology Association (GEIA). The resulting standard, ANSI/GEIA-STD-0009, “Reliability Program Standard for Systems Design, Development, and Manufacturing,”2 was certified by the
2 This standard replaced MIL-STD-785, Reliability Program for Systems and Equipment.
American National Standards Institute in 2008 and designated as a DoD standard to make it easy for program managers to incorporate best reliability practices in requests for proposals (RFPs) and in contracts.
ARMY ACQUISITION EXECUTIVE MEMORANDUM
About the same time, the Army modified its acquisition policy, described in an Army acquisition executive memorandum on reliability (U.S. Department of Defense, 2007b). The memo stated (p. 1): “Emerging data shows that a significant number of U.S. Army systems are failing to demonstrate established reliability requirements during operational testing and many of these are falling well short of their established requirement.”
To address this problem, the Army instituted a system development and demonstration reliability test threshold process. The process mandated that an initial reliability threshold be established early enough to be incorporated into the system development and demonstration contract. It also said that the threshold should be attained by the end of the first full-up, integrated, system-level developmental test event. The default value for the threshold was 70 percent of the reliability requirement specified in the capabilities development document. Furthermore, the Test and Evaluation Master Plan (TEMP)3 was to include test and evaluation planning for evaluation of the threshold and growth of reliability throughout system development.4 Also about this time, the Joint Chiefs of Staff published an updated instruction about system requirements (CJCSI 3170.01F)5 that declared materiel availability, a component of suitability that is a function of reliability, a “key performance parameter.”
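The threshold arithmetic in this process is straightforward. The sketch below computes the default developmental-test entrance threshold; the 70 percent default comes from the policy described above, but the requirement and demonstrated values are purely hypothetical.

```python
# Sketch of the Army's developmental-test reliability threshold process:
# the default entrance threshold is 70 percent of the reliability
# requirement in the capabilities development document. The specific
# MTBF numbers below are hypothetical, for illustration only.
DEFAULT_THRESHOLD_FRACTION = 0.70

def dt_entrance_threshold(required_mtbf_hours, fraction=DEFAULT_THRESHOLD_FRACTION):
    """Initial reliability threshold to be attained by the end of the
    first full-up, integrated, system-level developmental test event."""
    return fraction * required_mtbf_hours

required = 100.0     # hypothetical requirement: 100-hour MTBF
demonstrated = 65.0  # hypothetical point estimate from the test event
threshold = dt_entrance_threshold(required)
print(f"Threshold: {threshold:.1f} h; met: {demonstrated >= threshold}")
```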
RELIABILITY IMPROVEMENT WORKING GROUP
Also in 2008, DOT&E established a Reliability Improvement Working Group, with three goals: ensuring that each DoD acquisition program incorporates a viable systems engineering strategy, including a RAM growth program; promoting the reconstitution of cadres of experienced test and evaluation and RAM personnel across government organizations; and implementing mandated integrated developmental and operational testing, including the sharing of and access to all appropriate contractor and
3 The TEMP is the high-level “basic planning document for all life cycle Test and Evaluation (T&E) that are related to a particular system acquisition and is used by all decision bodies in planning, reviewing, and approving T&E activity” (U.S. Department of the Army, Pamphlet 73-2, 1996, p. 1).
4 Many of these initiatives are described in the succession of DOT&E Annual Reports.
5 Available: http://www.dtic.mil/cjcs_directives/cdata/unlimit/3170_01.pdf [August 2014].
government data and the use of operationally representative environments in early testing.
The subsequent report of this working group (U.S. Department of Defense, 2008b) argued for six requirements: (1) a mandatory reliability policy, (2) program guidance for early reliability planning, (3) language for RFPs and contracts, (4) a scorecard to evaluate bidders’ proposals, (5) standard evaluation criteria for credible assessments of program progress, and (6) the hiring of a cadre of experts in each service. The report also endorsed use of specific contractual language that was based on that in ANSI/GEIA-STD-0009 (see above).
With respect to RFPs, the working group report contained the following advice for mandating reliability activities in acquisition contracts (U.S. Department of Defense, 2008b, p. II-6):
The contractor shall develop a reliability model for the system. At minimum, the system reliability model shall be used to (1) generate and update the reliability allocations from the system level down to lower indenture levels, (2) aggregate system-level reliability based on reliability estimates from lower indenture levels, (3) identify single points of failure, and (4) identify reliability-critical items and areas where additional design or testing activities are required in order to achieve the reliability requirements. The system reliability model shall be updated whenever new failure modes are identified, failure definitions are updated, operational and environmental load estimates are revised, or design and manufacturing changes occur throughout the life cycle. Detailed component stress and damage models shall be incorporated as appropriate.
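The quoted language calls for a model that rolls lower-indenture reliability estimates up to the system level and pushes allocations back down. A minimal sketch of such a model, assuming independent components in series with exponential failure times; all component names, MTBF values, and allocation weights here are hypothetical, not drawn from any program.

```python
# Minimal sketch of a system reliability model of the kind the contract
# language describes: lower-indenture MTBF estimates aggregate upward,
# and a system requirement can be allocated back down. Assumes a series
# structure with independent, exponentially distributed failure times.
import math

# Hypothetical lower-indenture reliability estimates (MTBF in hours).
component_mtbf = {
    "engine": 2_000.0,
    "transmission": 5_000.0,
    "electronics": 1_500.0,
    "hull": 20_000.0,
}

def system_mtbf(mtbfs):
    """Series system: component failure rates add, so system MTBF is the
    reciprocal of the sum of the component failure rates."""
    total_rate = sum(1.0 / m for m in mtbfs.values())
    return 1.0 / total_rate

def mission_reliability(mtbf, mission_hours):
    """Probability of completing a mission of the given length without
    failure, under the exponential assumption."""
    return math.exp(-mission_hours / mtbf)

def allocate(requirement_mtbf, weights):
    """Allocate a system-level MTBF requirement down to components, in
    proportion to weights (a larger weight means a larger share of the
    permitted failure rate)."""
    total = sum(weights.values())
    req_rate = 1.0 / requirement_mtbf
    return {name: 1.0 / (req_rate * w / total) for name, w in weights.items()}

mtbf = system_mtbf(component_mtbf)
print(f"System MTBF: {mtbf:.0f} hours")
print(f"24-hour mission reliability: {mission_reliability(mtbf, 24.0):.3f}")
```

A model like this also supports the quoted requirement to identify reliability-critical items: the components contributing the largest share of the summed failure rate are the natural candidates for redesign or additional testing.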
The report continued with detailed requirements for contractors (pp. II-6, 7):
The contractor shall implement a sound systems-engineering process to translate customer/user needs and requirements into suitable systems/products while balancing performance, risk, cost, and schedule…. The contractor shall estimate and periodically update the operational and environmental loads (e.g., mechanical shock, vibration, and temperature cycling) that the system is expected to encounter in actual usage throughout the life cycle. These loads shall be estimated for the entire life cycle which will typically include operation, storage, shipping, handling, and maintenance. The estimates shall be verified to be operationally realistic with measurements using the production-representative system in time to be used for Reliability Verification…. The contractor shall estimate the lifecycle loads that subordinate assemblies, subassemblies, components, commercial-off-the-shelf, non-developmental items, and government-furnished equipment will experience as a result of the product-level operational and environmental loads estimated above. These estimates and updates shall be provided to teams developing assemblies, subassemblies,
and components for this system…. The identification of failure modes and mechanisms shall start immediately after contract award. The estimates of lifecycle loads on assemblies, subassemblies, and components obtained above shall be used as inputs to engineering- and physics-based models in order to identify potential failure mechanisms and the resulting failure modes. The teams developing assemblies, subassemblies, and components for this system shall identify and confirm through analysis, test or accelerated test the failure modes and distributions that will result when lifecycle loads estimated above are imposed on these assemblies, subassemblies and components…. All failures that occur in either test or in field shall be analyzed until the root cause failure mechanism has been identified. Identification of the failure mechanism provides the insight essential to the identification of corrective actions, including reliability improvements. Predicted failure modes/mechanisms shall be compared with those from test and the field…. The contractor shall have an integrated team, including suppliers of assemblies, subassemblies, components, commercial-off-the-shelf, non-developmental items, and government-furnished equipment, as applicable, analyze all failure modes arising from modeling, analysis, test, or the field throughout the life cycle in order to formulate corrective actions…. The contractor shall deploy a mechanism (e.g., a Failure Reporting, Analysis, and Corrective Action System or a Data Collection, Analysis, and Corrective Action System) for monitoring and communicating throughout the organization (1) description of test and field failures, (2) analyses of failure mode and root-cause failure mechanism, (3) the status of design and/or process corrective actions and risk-mitigation decisions, (4) the effectiveness of corrective actions, and (5) lessons learned…. 
The model developed in System Reliability Model shall be used, in conjunction with expert judgment, in order to assess if the design (including commercial-off-the-shelf, non-developmental items, and government-furnished equipment) is capable of meeting reliability requirements in the user environment. If the assessment is that the customer’s requirements are infeasible, the contractor shall communicate this to the customer. The contractor shall allocate the reliability requirements down to lower indenture levels and flow them and needed inputs down to its subcontractors/suppliers. The contractor shall assess the reliability of the system periodically throughout the life cycle using the System Reliability Model, the lifecycle operational and environmental load estimates generated herein, and the failure definition and scoring criteria…. The contractor shall understand the failure definition and scoring criteria and shall develop the system to meet reliability requirements when these failure definitions are used and the system is operated and maintained by the user…. The contractor shall conduct technical interchanges with the customer/user in order to compare the status and outcomes of Reliability Activities, especially the identification, analysis, classification, and mitigation of failure modes.
REQUIREMENTS FOR TEMPS
Also beginning in 2008, DOT&E initiated the requirement that TEMPs contain a process for the collection and reporting of reliability data and that they present specific plans for reliability growth during system development. The 2011 DOT&E Annual Report (U.S. Department of Defense, 2011a) reported on the effect of this requirement: in a 2010 survey of 151 programs in development with DOT&E-approved TEMPs, 90 percent of the programs with TEMPs approved since 2008 planned to collect and report reliability data. (More recent reviews carried out by DOT&E have had similar results; see, in particular, U.S. Department of Defense, 2013.) In addition, these TEMPs were more likely to: (1) have an approved system engineering plan, (2) incorporate reliability as an element of test strategy, (3) document their reliability growth strategy in the TEMP, (4) include reliability growth curves in the TEMP, (5) establish reliability-based milestone or operational testing entrance criteria, and (6) collect and report reliability data. Unfortunately, possibly because of the long development time for defense systems, or possibly because of a disconnect between reporting and practice, there has as yet been no significant improvement in the percentage of such systems that meet their reliability thresholds. Also, there is no evidence that programs are using reliability metrics to ensure that the growth in reliability will result in systems meeting their required levels. As a result, systems continue to enter operational testing without demonstrating their required reliability.
LIFE-CYCLE COSTS AND RAM REQUIREMENTS
The Defense Science Board (U.S. Department of Defense, 2008a) study on developmental test and evaluation also helped to initiate four activities: (1) establishment of the Systems Engineering Forum, (2) institution of reliability growth training, (3) establishment of a reliability senior steering group, and (4) establishment of the position of Deputy Assistant Secretary of Defense (System Engineering).
The Defense Science Board’s study also led to a memorandum from the Under Secretary of Defense for Acquisition, Technology, and Logistics (USD AT&L), “Implementing a Life Cycle Management Framework” (U.S. Department of Defense, 2008c). This memorandum directed the service secretaries to establish policies in the following four areas.
First, all major defense acquisition programs were to establish target goals for the metrics of materiel reliability and ownership costs. This was to be done through effective collaboration between the requirements and acquisition communities that balanced funding and schedule while ensuring
system suitability in the anticipated operating environment. Also, resources were to be aligned to achieve readiness levels. Second, reliability performance was to be tracked throughout the program life cycle. Third, the services were to ensure that development contracts and acquisition plans evaluated RAM during system design. Fourth, the services were to evaluate the appropriate use of contract incentives to achieve RAM objectives.
The 2008 DOT&E Annual Report (U.S. Department of Defense, 2008d, p. iii) stressed the new approach:
… a fundamental precept of the new T&E [test and evaluation] policies is that expertise must be brought to bear at the beginning of the system life cycle to provide earlier learning. Operational perspective and operational stresses can help find failure modes early in development when correction is easiest. A key to accomplish this is to make progress toward Integrated T&E, where the operational perspective is incorporated into all activity as early as possible. This is now policy, but one of the challenges remaining is to convert that policy into meaningful practical application.
In December 2008, the USD AT&L issued an “Instruction” on defense system acquisition (U.S. Department of Defense, 2008e). It modified DODI 5000.026 by requiring that program managers formulate a “viable RAM strategy that includes a reliability growth program as an integral part of design and development” (p. 19). It stated that RAM was to be integrated within the systems engineering processes, as documented in the program’s Systems Engineering Plan (SEP) and Life-Cycle Sustainment Plan (LCSP), and that progress was to be assessed during technical reviews, test and evaluation, and program support reviews. It stated that (p. vi)
For this policy guidance to be effective, the Services must incorporate formal requirements for early RAM planning into their regulations, and assure development programs for individual systems include reliability growth and reliability testing; ultimately, the systems have to prove themselves in operational testing. Incorporation of RAM planning into Service regulation has been uneven.
In 2009, the Weapons System Acquisition Reform Act (WSARA, P.L. 111-23) required that acquisition programs develop a reliability growth program.7 It prescribed that the duties of the directors of systems engineering were to develop policies and guidance for “the use of systems
6 See discussion in Chapter 9. Instruction 5000.02, Operation of the Defense Acquisition System, is available at http://www.dtic.mil/whs/directives/corres/pdf/500002_interim.pdf [December 2013].
7 P.L. 111-23 is available at http://www.acq.osd.mil/se/docs/PUBLIC-LAW-111-23-22MAY2009.pdf [January 2014].
engineering approaches to enhance reliability, availability, and maintainability on major defense acquisition programs, [and] the inclusion of provisions relating to systems engineering and reliability growth in requests for proposals” (section 102).
WSARA (U.S. Department of Defense, 2009a) also stated that adequate resources needed to be provided and should (section 102):
… include a robust program for improving reliability, availability, maintainability, and sustainability as an integral part of design and development, … [and] identify systems engineering requirements, including reliability, availability, maintainability, and lifecycle management and sustainability requirements, during the Joint Capabilities Integration Development System (JCIDS) process, and incorporate such systems engineering requirements into contract requirements for each major defense acquisition program.
Shortly after WSARA was adopted, a RAM cost (RAM-C) manual was produced (U.S. Department of Defense, 2009b) to guide the development of realistic reliability, availability, and maintainability requirements for the established suitability/sustainability key performance parameters and key system attributes. The manual and the DoD initiatives accompanying it include
- RAM planning and evaluation tools, first to assess the adequacy of the RAM program proposed and then to monitor the progress in achieving program objectives. In addition, DoD has sponsored the development of tools to estimate the investment in reliability that is needed and the return on investment possible in terms of the reduction of total life-cycle costs. These tools include algorithms to estimate how much to spend on reliability.
- Workforce and expertise initiatives to bring back personnel with the expertise that was lost during the years that the importance of government oversight of RAM was discounted.
In 2010, DOT&E published sample RFP and contract language to help ensure that reliability growth was incorporated in system design and development contracts. DOT&E also sponsored the development of the reliability investment model (see Forbes et al., 2008) and began drafting the Reliability Program Handbook, HB-0009, meant to assist with the implementation of ANSI/GEIA-STD-0009. The resulting TechAmerica Engineering Bulletin, Reliability Program Handbook, TA-HB-0009, was published in May 2013.
On June 30, 2010, DOT&E issued a memorandum, “State of Reliability,” which strongly argued that sustainment costs are often much more
than 50 percent of total system costs and that unreliable systems have much higher sustainment costs because of the need for spare systems, increased maintenance, increased number of repair parts, more repair facilities, and more staff (U.S. Department of Defense, 2010). Also, poor reliability hinders warfighter effectiveness (pp. 1-2):
For example, the Early-Infantry Brigade Combat Team (E-IBCT) unmanned aerial system (UAS) demonstrated a mean time between system aborts of 1.5 hours, which was less than 1/10th the requirement. It would require 129 spare UAS to provide sufficient number to support the brigade’s operations, which is clearly infeasible. When such a failing is discovered in post-design testing—as is typical with current policy—the program must shift to a new schedule and budget to enable redesign and new development. For example, it cost $700M to bring the F-22 reliability up to acceptable levels.
However, this memo also points out that increases in system reliability can come at a cost. A more reliable system can weigh more and be more expensive, and sometimes the added reliability does not increase battlefield effectiveness and is therefore wasteful.
The memo also discussed the role of contractors (p. 2):
[Industry] will not bid to deliver reliable products unless they are assured that the government expects and requires all bidders to take the actions and make the investments up-front needed to develop reliable systems. To obtain reliable products, we must assure vendors’ bids to produce reliable products outcompete the cheaper bids that do not.
The memo also stressed that reliability constraints must be “pushed as far to the left as possible,” meaning that the earlier design-related reliability problems are discovered, the less expensive they are to correct and the less impact they have on the completion of the system. Finally, the memo stated that all DoD acquisition contracts will require, at a minimum, the system engineering practices of ANSI/GEIA-STD-0009 (Information Technology Association of America, 2008).
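The memo does not spell out the spares model behind its E-IBCT figures, but the direction of that arithmetic can be illustrated with a deliberately simplified calculation: expected aborts over an operating period scale inversely with mean time between system aborts (MTBSA). All of the fleet and requirement numbers below are hypothetical, not the memo’s.

```python
# Illustration (not the memo's actual spares model) of why poor
# reliability drives sustainment burden: a simple count of expected
# system aborts over an operating period, assuming a constant abort
# rate of 1/MTBSA. All inputs are hypothetical.
def expected_aborts(operating_hours, mtbsa_hours):
    """Expected number of system aborts accumulated over the given
    operating hours, for a system with the given MTBSA."""
    return operating_hours / mtbsa_hours

# Hypothetical brigade operation: 10 aircraft flying 20 hours each.
fleet_hours = 10 * 20.0
demonstrated_mtbsa = 1.5   # hours, as reported for the E-IBCT UAS
required_mtbsa = 15.0      # hypothetical: 10x the demonstrated value

print(f"Aborts at demonstrated MTBSA: {expected_aborts(fleet_hours, demonstrated_mtbsa):.0f}")
print(f"Aborts at required MTBSA:     {expected_aborts(fleet_hours, required_mtbsa):.0f}")
```

Under these assumptions a tenfold shortfall in MTBSA produces a tenfold increase in aborts, which is the mechanism behind the memo’s point about spares, maintenance, and repair infrastructure.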
TWO MAJOR INITIATIVES TO PROMOTE RELIABILITY GROWTH: ANSI/GEIA-STD-0009 AND DTM 11-003
ANSI/GEIA-STD-0009: A Standard to Address Reliability Deficiencies
ANSI/GEIA-STD-0009 (Information Technology Association of America, 2008) is a recent consensus document that the parties to an acquisition can agree to use as a standard for DoD purposes. It begins with a statement that the user’s needs are represented by four reliability objectives (p. 1):
- The developer shall solicit, investigate, analyze, understand, and agree to the user’s requirements and product needs.
- The developer shall use well-defined reliability- and systems-engineering processes to develop, design, and verify that the system/product meets the user’s documented reliability requirements and needs.
- The multifunctional team shall verify during production that the developer has met the user’s reliability requirements and needs prior to fielding.
- The multifunctional team shall monitor and assess the reliability of the system/product in the field.
ANSI/GEIA-STD-0009 provides detailed advice on what information should be mandated for inclusion in several documents provided by the contractor, including a reliability program plan (RPP), in order to satisfy the above four objectives. To satisfy Objective 1 (understanding customer/user requirements and constraints), the RPP shall (Information Technology Association of America, 2008, p. 15):
- Define all resources (e.g., personnel, funding, tools, and facilities) required to fully implement the reliability program.
- Include a coordinated schedule for conducting all reliability activities throughout the system/product life cycle.
- Include detailed descriptions of all reliability activities, functions, documentation, processes, and strategies required to ensure system/product reliability maturation and management throughout the system/product life cycle.
- Document the procedures for verifying that planned activities are implemented and for both reviewing and comparing their status and outcomes.
- Manage potential reliability risks due, for example, to new technologies or testing approaches.
- Ensure that reliability allocations, monitoring provisions, and inputs that impact reliability (e.g., user and environmental loads) flow down to subcontractors and suppliers.
- Include contingency-planning criteria and decision making for altering plans and intensifying reliability improvement efforts.
- Include, at minimum, the normative activities identified throughout this standard.
- Include, when applicable, additional customer-specified normative activities.
Furthermore, the standard says that the RPP “shall address the implementation of all the normative activities identified in Objectives 1-4.” The standard requires that the RFP call for acquisition proposals to describe the system or product reliability model and requirements: the methods and tools to be used, the extent to which detailed stress and damage models will be employed, how and when the model and requirements will be updated in response to design evolution and the discovery of failure modes, and how the model and requirements will be used to prioritize design elements. It also requires that proposals describe the engineering process: how reliability improvements will be incorporated in the design, how adherence to design rules that affect reliability will be ensured, how reliability-critical items will be identified, managed, and controlled, and how the reliability impact of design changes will be monitored and evaluated. In addition, the standard calls for proposals to address the assessment of life-cycle loads and their impact on subsystems and components, the identification of failure modes and mechanisms, a closed-loop failure-mode mitigation process, how and when reliability assessments will be performed, plans for design, production, and field reliability verification, failure definitions and scoring criteria, technical reviews, and outputs and documentation.
With reference to the last point, we note that life-cycle loads may be difficult to predict. For example, a truck that is designed to be reliable in cross-country maneuvers may be less reliable on sustained highway travel. In general, a system’s actual life cycle may include new missions for the system to carry out. For some systems, it might be appropriate to conclude from testing that it is reliable for some scenarios and not others. Such a statement would be similar to statements that the system is effective in certain operational situations but not others. It may be that system reliability for all possible missions is too expensive; perhaps there should be different reliability requirements for different missions.
This standard provides greater detail on the satisfaction of Objective 2: design and redesign for reliability. The goal is to ensure the use of well-defined reliability engineering processes to develop, design, manufacture, and sustain the system/product so that it meets the user’s reliability requirements and needs. This includes the initial conceptual reliability model of the system; quantitative reliability requirements for the system; an initial reliability assessment; user and environmental life-cycle loads; failure definitions and scoring criteria; the reliability program plan; and the reliability requirements verification strategy. It also includes updates to the RPP; refinements to the reliability model, including reliability allocations to subsystems and components; refined user and environmental loads; initial estimates of loads for subsystems and components; engineering analysis and test data identifying the system failure modes that will result from life-cycle loads; data verifying the mitigation of these failure modes; updates of the reliability requirements verification strategy; and updates to the reliability assessment.
The standard also says that the developer should develop a model that relates component-level reliabilities to system-level reliabilities. In addition, the identification of failure modes and mechanisms shall start as soon as the development begins. Failures that occur in either test or the field are to be analyzed until the root-cause failure mechanism has been identified. In addition, the developer must make use of a closed-loop failure mitigation process. The developer (p. 26)
… shall employ a mechanism for monitoring and communicating throughout the organization (1) descriptions of test and field failures, (2) analyses of failure mode and root-cause failure mechanism, and (3) the status of design and/or process corrective actions and risk-mitigation decisions. This mechanism shall be accessible by the customer…. [The developer] shall assess the reliability of the system/product periodically throughout the life cycle. Reliability estimates from analysis, modeling and simulation, and test shall be tracked as a function of time and compared against customer reliability requirements. The implementation of corrective actions shall be verified and their effectiveness tracked. Formal reliability growth methodology shall be used where applicable … in order to plan, track, and project reliability improvement…. [The developer] shall plan and conduct activities to ensure that the design reliability requirements are met…. For complex systems/products, this strategy shall include reliability values to be achieved at various points during development. The verification shall be based on analysis, modeling and simulation, testing, or a mixture…. Testing shall be operationally realistic.
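The “formal reliability growth methodology” referenced in the quotation is commonly implemented with a power-law model such as Crow-AMSAA. The sketch below fits that model to failure times from a growth test in order to track demonstrated MTBF over time; the failure data and test length are synthetic, not drawn from any program.

```python
# Minimal sketch of reliability growth tracking with the Crow-AMSAA
# (power-law NHPP) model, one common choice of formal growth
# methodology. Expected cumulative failures are N(t) = lam * t**beta;
# beta < 1 indicates reliability growth. Data below are synthetic.
import math

def crow_amsaa_fit(failure_times, total_time):
    """Maximum-likelihood estimates for a time-terminated growth test
    with observed failure times and total test time."""
    n = len(failure_times)
    beta = n / sum(math.log(total_time / t) for t in failure_times)
    lam = n / total_time ** beta
    return lam, beta

def instantaneous_mtbf(lam, beta, t):
    """Current (demonstrated) MTBF at time t: the reciprocal of the
    failure intensity lam * beta * t**(beta - 1)."""
    return 1.0 / (lam * beta * t ** (beta - 1.0))

# Synthetic failure times (hours) from a 1,000-hour growth test; the
# widening gaps between failures are what growth looks like in the data.
failures = [12.0, 40.0, 95.0, 180.0, 320.0, 510.0, 780.0]
lam, beta = crow_amsaa_fit(failures, 1_000.0)
print(f"beta = {beta:.2f} (beta < 1 means reliability is growing)")
print(f"Demonstrated MTBF at 1,000 h: {instantaneous_mtbf(lam, beta, 1_000.0):.0f} h")
```

Tracking this demonstrated MTBF against a planned growth curve is one way to implement the standard’s requirement to plan, track, and project reliability improvement.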
For Objective 4, monitoring and assessing user reliability, ANSI/GEIA-STD-0009 directs that RFPs require proposals to include methods by which field performance data can be fed back to improve system reliability.
DTM 11-003: Improving Reliability Analysis, Planning, Tracking, and Reporting
As mentioned above, the deficiency in the reliability of fielded systems may be at least partially due to proposals that gave insufficient attention to plans for achieving reliability requirements, both initially and through testing. This issue is also addressed in DTM 11-003 (U.S. Department of Defense, 2011b, pp. 1-2), which “amplifies procedures in Reference (b) [DoD Instruction 5000.02] and is designed to improve reliability analysis,
planning, tracking, and reporting.” It “institutionalizes reliability planning methods and reporting requirements timed to key acquisition activities to monitor reliability growth.”
DTM 11-003 stipulates that six procedures take place (pp. 3-4):
- [Program managers (PMs) shall] formulate a comprehensive reliability and maintainability (R&M) program using an appropriate reliability growth strategy to improve R&M performance until R&M requirements are satisfied. The program will consist of engineering activities including: R&M allocations, block diagrams and predictions; failure definitions and scoring criteria; failure mode, effects and criticality analysis; maintainability and built-in test demonstrations; reliability growth testing at the system and subsystem level; and a failure reporting and corrective action system maintained through design, development, production, and sustainment. The R&M program is an integral part of the systems engineering process.
- The lead DoD Component and the PM, or equivalent, shall prepare a preliminary Reliability, Availability, Maintainability, and Cost Rationale Report in accordance with Reference (c) [DOD Reliability, Availability, Maintainability, and Cost Rationale Report Manual, 2009] in support of the Milestone (MS) A decision. This report provides a quantitative basis for reliability requirements and improves cost estimates and program planning.
- The Technology Development Strategy preceding MS A and the Acquisition Strategy preceding MS B and C shall specify how the sustainment characteristics of the materiel solution resulting from the analysis of alternatives and the Capability Development Document sustainment key performance parameter thresholds have been translated into R&D design requirements and contract specifications. The strategies shall also include the tasks and processes to be stated in the request for proposal that the contractor will be required to employ to demonstrate the achievement of reliability design requirements. The Test and Evaluation Strategy and the Test and Evaluation Master Plan (TEMP) shall specify how reliability will be tested and evaluated during the associated acquisition phase.
- Reliability Growth Curves (RGC) shall reflect the reliability growth strategy and be employed to plan, illustrate and report reliability growth. A RGC shall be included in the SEP [systems engineering plan] at MS A, and updated in the TEMP beginning at MS B. RGC will be stated in a series of intermediate goals and tracked through fully integrated, system-level test and evaluation events until the reliability threshold is achieved. If a single curve is not adequate to describe overall system reliability, curves will be provided for critical subsystems with rationale for their selection.
- PMs and operational test agencies shall assess the reliability growth required for the system to achieve its reliability threshold during initial operational test and evaluation and report the results of that assessment to the Milestone Decision Authority at MS C.
- Reliability growth shall be monitored and reported throughout the acquisition process. PMs shall report the status of reliability objectives and/or thresholds as part of the formal design review process, during Program Support Reviews, and during systems engineering technical reviews. RGC shall be employed to report reliability growth status at Defense Acquisition Executive System reviews.
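The reliability growth curves that DTM 11-003 requires are often constructed with a power-law (Duane/Crow-AMSAA) planning model, in which cumulative MTBF grows as a power of cumulative test time. DTM 11-003 does not prescribe a particular model, and the initial MTBF, growth rate, and requirement below are hypothetical planning inputs chosen only for illustration:

```python
# Sketch of a planned reliability growth curve using the power-law
# (Duane/Crow-AMSAA) relationship: cumulative MTBF grows as t ** alpha.
# All parameter values are hypothetical planning inputs.

def planned_mtbf(t, t0, mtbf0, alpha):
    """Planned cumulative MTBF at test time t, given MTBF mtbf0 observed
    at reference time t0 and growth rate alpha (often in the 0.3-0.5
    range for an aggressive test-analyze-and-fix program)."""
    return mtbf0 * (t / t0) ** alpha

# Hypothetical plan: 100 hours of initial testing yields a 50-hour MTBF;
# the requirement is a 100-hour MTBF. How much cumulative test time
# does the planning curve imply?
t0, mtbf0, alpha, requirement = 100.0, 50.0, 0.35, 100.0

# Invert the power law: t = t0 * (requirement / mtbf0) ** (1 / alpha)
t_required = t0 * (requirement / mtbf0) ** (1.0 / alpha)
print(round(t_required))  # about 725 hours of cumulative test time

# Intermediate goals at planned test-time milestones, of the kind
# DTM 11-003 directs be stated and tracked:
for t in (200, 400, 600, 725):
    print(t, round(planned_mtbf(t, t0, mtbf0, alpha), 1))
```

A curve of this form supports both uses DTM 11-003 names: planning (choosing test time and intermediate goals before testing begins) and tracking (comparing demonstrated MTBF at each milestone against the planned value to decide whether corrective actions are keeping the program on its growth path).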
The 2011 DOT&E Annual Report (U.S. Department of Defense, 2011a, p. iv) points out that some changes in system reliability are becoming evident:
Sixty-five percent of FY10 TEMPs documented a reliability strategy (35 percent of those included a [reliability] growth curve), while only 20 percent of FY09 TEMPs had a documented reliability strategy. Further, three TEMPs were disapproved, citing the need for additional reliability documentation, and four other TEMPs were approved with a caveat that the next revision must include more information on the program’s reliability growth strategy.
Both ANSI/GEIA-STD-0009 and DTM 11-003 serve an important purpose: helping to produce defense systems that (1) have more reasonable reliability requirements and (2) are more likely to meet those requirements in design and development. However, given their intended purpose, these are relatively general documents that do not provide specifics as to how some of the demands are to be met. For instance, ANSI/GEIA-STD-0009 (Information Technology Association of America, 2008, p. 2) “does not specify the details concerning how to engineer a system / product for high reliability. Nor does it mandate the methods or tools a developer would use to implement the process requirements.”
The tailoring to be done will be dependent upon a “customer’s funding profile, developer’s internal policies and procedures and negotiations between the customer and developer” (p. 2). Proposals are to include a reliability program plan, a conceptual reliability model, an initial reliability
flow-down of requirements, an initial system reliability assessment, candidate reliability trade studies, and a reliability requirements verification strategy. But there is no indication of how these activities should be carried out. How should one produce the initial reliability assessment for a system that only exists in diagrams? What does an effective design for reliability plan include? How should someone track reliability over time in development when few developmental and operationally relevant test events have taken place? How can one determine whether a test plan is adequate to take a system with a given initial reliability and improve that system’s reliability through test-analyze-and-fix to the required level? How does one know when a prototype for a system is ready for operational testing?
Although the TechAmerica Engineering Bulletin Reliability Program Handbook (TA-HB-0009) was produced at least in part to answer these questions, a primary goal of this report is to provide additional specificity on how some of these steps should be carried out.
Adolph, P., DiPetto, C.S., and Seglie, E.T. (2008). Defense Science Board task force developmental test and evaluation study results. ITEA Journal, 29, 215-221.
Forbes, J.A., Long, A., Lee, D.A., Essmann, W.J., and Cross, L.C. (2008). Developing a Reliability Investment Model: Phase II—Basic, Intermediate, and Production and Support Cost Models. LMI Government Consulting. LMI Report # HPT80T1. Available: http://www.dote.osd.mil/pub/reports/HPT80T1_Dev_a_Reliability_Investment_Model.pdf [August 2014].
Information Technology Association of America. (2008). ANSI/GEIA-STD-0009. Available: http://www.techstreet.com/products/1574525 [October 2014].
U.S. Department of the Army. (1996). Pamphlet 73-2, Test and Evaluation Master Plan Procedures and Guidelines. Available: http://acqnotes.com/Attachments/Army%20TEMP%20Procedures%20and%20Guidlines.pdf [October 2014].
U.S. Department of Defense. (2007a). FY 2007 Annual Report. Office of the Director of Operational Test and Evaluation. Available: http://www.dote.osd.mil/pub/reports/FY2007/pdf/other/2007DOTEAnnualReport.pdf [January 2014].
U.S. Department of Defense. (2007b). Memorandum, Reliability of U.S. Army Materiel Systems. Acquisition Logistics and Technology, Assistant Secretary of the Army, Department of the Army. Available: https://dap.dau.mil/policy/Documents/Policy/Signed%20Reliability%20Memo.pdf [January 2014].
U.S. Department of Defense. (2008a). Report of the Defense Science Board Task Force on Developmental Test and Evaluation. Office of the Under Secretary of Defense for Acquisitions, Technology, and Logistics. Available: https://acc.dau.mil/CommunityBrowser.aspx?id=217840 [January 2014].
U.S. Department of Defense. (2008b). Report of the Reliability Improvement Working Group. Office of the Under Secretary of Defense for Acquisition, Technology, and Logistics. Available: http://www.acq.osd.mil/se/docs/RIWG-Report-VOL-I.pdf [January 2014].
U.S. Department of Defense. (2008c). Memorandum, Implementing a Life Cycle Management Framework. Office of the Undersecretary for Acquisition, Technology, and Logistics. Available: http://www.acq.osd.mil/log/mr/library/USD-ATL_LCM_framework_memo_31Jul08.pdf [January 2014].
U.S. Department of Defense. (2008d). FY 2008 Annual Report. Office of the Director of Operational Test and Evaluation. Available: http://www.dote.osd.mil/pub/reports/FY2008/pdf/other/2008DOTEAnnualReport.pdf [January 2014].
U.S. Department of Defense. (2008e). Instruction, Operation of the Defense Acquisition System. Office of the Undersecretary for Acquisition, Technology, and Logistics. Available: http://www.acq.osd.mil/dpap/pdi/uid/attachments/DoDI5000-02-20081202.pdf [January 2014].
U.S. Department of Defense. (2009a). Implementation of Weapon Systems Acquisition Reform Act (WSARA) of 2009 (Public Law 111-23, May 22, 2009) October 22, 2009; Mona Lush, Special Assistant, Acquisition Initiatives, Acquisition Resources & Analysis Office of the Under Secretary of Defense for Acquisition, Technology, and Logistics.
U.S. Department of Defense. (2009b). DoD Reliability, Availability, Maintainability-Cost (RAM-C) Report Manual. Available: http://www.acq.osd.mil/se/docs/DoD-RAM-CManual.pdf [August 2014].
U.S. Department of Defense. (2010). Memorandum, State of Reliability. Office of the Secretary of Defense. Available: http://web.amsaa.army.mil/Documents/OSD%20Memo%20-%20State%20of%20Reliability%20-%206-30-10.pdf [January 2014].
U.S. Department of Defense. (2011a). DOT&E FY 2011 Annual Report. Office of the Director of Operational Test and Evaluation. Available: http://www.dote.osd.mil/pub/reports/FY2011/ [January 2014].
U.S. Department of Defense. (2011b). Memorandum, Directive-Type Memorandum (DTM) 11-003—Reliability Analysis, Planning, Tracking, and Reporting. The Under Secretary of Defense, Acquisition, Technology, and Logistics. Available: http://bbp.dau.mil/doc/USD-ATL%20Memo%2021Mar11%20DTM%2011-003%20-%20Reliability.pdf [January 2014].
U.S. Department of Defense. (2013). DOT&E FY 2013 Annual Report. Office of the Director of Operational Test and Evaluation. Available: http://www.dote.osd.mil/pub/reports/FY2013/ [January 2014].