
Downloaded from Reliabilityweb.com on the web at http://www.reliabilityweb.com

Effective Maintenance Program Development/Optimization

Sammy Seifeddine
HSB Reliability Technologies
Senior Project Manager
800 Rockmead Drive, Three Kingwood Place, Suite 180
Kingwood, TX 77339
(281) 358-1477 ext. 276
(281) 358-1871 fax
sseifeddine@hsbrt.com

12th International Process Plant Reliability Conference, October 22-23, 2003, Houston, Texas



Abstract

This paper describes a proven process for developing, optimizing, and managing effective maintenance programs for new and in-service assets based on risk and cost-benefit principles. The process draws on operational and maintenance experience, provided that the experience is documented for the proper class of assets in the form of standard tasks. In the absence of standard tasks, a more comprehensive analysis is performed using Reliability-Centered Maintenance (RCM2) or Failure Modes and Effects Analysis (FMEA) to develop an optimum program. Asset performance data is used to continually adjust the maintenance program to meet user objectives.

1.0 Introduction

A maintenance program is effective when it targets critical production equipment and emphasizes minimizing risk, which leads to improved reliability, availability, and resource utilization. This paper focuses on a process for developing effective asset maintenance programs or optimizing existing ones. The process is a component of overall asset Life Cycle Management (LCM).

2.0 Maintenance Program Development/Optimization

This process consists of the following steps (refer to Figure 1):

1. Identifying business objectives.
2. Development of plant/asset technical model.
3. Condition assessment of installed assets.
4. Criticality and risk assessment.
5. Maintenance program development/review.
6. Loading of maintenance tasks to the CMMS system.
7. Maintenance spares strategy (not covered in this document).

These steps are considered in more detail in the following sections.

3.0 Business Objective

Business objectives are set at the corporate and plant levels. They reflect market conditions, shareholders' expectations, and regulatory compliance. Objectives at this level include production levels, product qualities, safe operation policies and requirements, environmental integrity requirements, and operating cost targets.

Objectives are then translated into performance expectations specific to major assets. Measures at this level might include availability, asset utilization, efficiency, specific product qualities, Overall Equipment Effectiveness (OEE), cost per unit produced, etc. Target values are set by plant operating departments and approved by plant and corporate management.

Major asset or system performance expectations are further refined to the individual equipment level. Here target values for measures such as Mean Time Between Failures (MTBF), Mean Time To Repair (MTTR), availability, etc., are set and approved.

This process is repeated periodically, and the objectives are changed to reflect the company's position regarding the main business drivers. Figure 2 identifies the steps involved in developing asset performance expectations.

Business objectives and performance expectations set the stage for defining equipment performance standards for high risk equipment, for which RCM2 is the method used to develop/optimize the maintenance programs.

4.0 Plant Technical Model

The plant technical model (also known as the asset hierarchy) is composed of a hierarchy of systems and sub-systems that describe the asset in progressively greater detail. The model reflects how systems and sub-systems fit together, interrelate, and operate to provide the intended business function. As such, the hierarchy reflects both the structural and the process flow characteristics of the plant/asset.

The model starts with the process flow diagram representing the overall operation of a plant. This level consists of the major plant production units, utility systems (such as electricity, water, steam, air, fuel, etc.), feed and raw material preparation facilities, final product storage, plant control systems and local area network(s), infrastructure, etc.

The next level breaks down each unit into systems and sub-systems as depicted on the unit process flow diagram and P&IDs. Examples at this level include systems such as feed filtration, feed pressurization, feed heating, atmospheric fractionation, etc.

At progressively lower levels of the model, the breakdown of the plant becomes more detailed. In the end, the plant is reduced to a set of systems and sub-systems and the equipment items that support each of them.

Control and protective systems are incorporated in the hierarchy at the appropriate levels. Where a control or protective system is dedicated to one system or sub-system, it should be set up as a sub-element of that system. Where a control/protective system controls or protects multiple systems, it should be set up as an element at the same level in the hierarchy as those systems.

Every hierarchy element, whether it is a system, sub-system, or equipment item, has a clearly defined boundary. Boundary definitions are standardized for classes of systems/equipment items.

The steps involved in developing a plant technical model are as follows (see Figure 3):

1. Collect technical information and drawings (PFDs, P&IDs, line diagrams, datasheets, O&M manuals, etc.).
2. Establish a standard for defining system boundaries. See references 4 and 6 for details.
3. Develop the plant technical hierarchy.
4. Define system functions (optional).
5. Load the hierarchy into the plant maintenance information system (CMMS).

5.0 Criticality and Risk Assessment

Criticality and risk assessment is a qualitative analysis of asset failure events and the ranking of those events according to their impact on the business goals of the company. The process consists of the following main activities (see Figure 4):

1. Establish criticality assessment criteria.
2. Define, for each assessment criterion, the failure consequences and their scores.
3. Collect equipment condition assessment records or generic failure frequencies.
4. Determine failure frequencies and their ratings.
5. Define criticality ranking scores.
6. Define criticality ranking rules.
7. Select systems and/or equipment for assessment.
8. Perform the analysis.
9. Rank systems/equipment by criticality.
10. Rank systems/equipment by risk.

These steps are considered in more detail in the following sections.

5.1 Assessment Criteria

The first step in the analysis is to use the organizational business objectives to define the criticality assessment criteria. The following are some suggested criticality assessment criteria:

- Health and Safety.
- Environmental Integrity.
- Throughput.
- Customer Service.
- Operating Cost.

Each criterion is given a maximum score to reflect the consequences and relative importance. In Table 1, the safety criterion is given a maximum score of twenty (20) while the operating cost criterion is given a maximum score of ten (10).

5.2 Failure Consequences

Failure consequences within each criterion are defined and given an evaluation score. Table 2 provides examples of safety, throughput/downtime, product quality, and maintenance and operating cost criteria with their associated consequences of failure and scores.

5.3 Failure Frequencies

Failure frequencies are defined based on system and equipment performance. When defining failure frequencies, consideration is given to aspects such as:

- Operational failure history (where available).
- Generic reliability data.
- Equipment redundancy.
- Mode of equipment operation.
- Equipment stress variations, etc.

The frequency of failure score is used in the calculation of relative risk to indicate how likely it is that failure of the assessed system or equipment item will impact an organization's business. Table 3 shows a sample of frequency scores.

5.4 Criticality Ranks and Rules

The criticality rank number of a system or equipment item is a function of its impact on the business when it fails, regardless of how often the failure occurs. For example, a set of criticality ranking numbers might range from 1 to 10, where 10 represents the highest rank and 1 the lowest.

Criticality ranking rules are defined to assist in assigning criticality ranks to systems or equipment during the analysis. The rules are established by considering the combined consequence scores for all assessment criteria. For example, a rule can be defined as "Assign a criticality of 10 to a system/equipment if any of the safety or environmental consequence scores is greater than 18, or any of the throughput, product quality, or maintenance and operating cost consequence scores is equal to 10," and so forth.
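
The rule-based assignment described in this section can be sketched in a few lines of Python. This is a minimal illustration, not the author's tool: the rank 10 rule follows the example quoted above, while the rank 5 rule, the default rank of 1, and the dictionary key names are hypothetical and included only to show how the highest matching rank is selected.

```python
from typing import Callable, Dict, List, Tuple

Scores = Dict[str, int]

# Each rule pairs a criticality rank with a predicate over consequence scores.
# Rank 10 mirrors the example rule in the text; rank 5 is a hypothetical rule.
RULES: List[Tuple[int, Callable[[Scores], bool]]] = [
    (10, lambda s: max(s["safety"], s["environmental"]) > 18
         or 10 in (s["throughput"], s["quality"], s["cost"])),
    (5, lambda s: max(s["safety"], s["environmental"]) >= 14),  # hypothetical
]


def criticality_rank(scores: Scores, default: int = 1) -> int:
    """Return the highest rank whose rule matches, or the default rank."""
    matched = [rank for rank, rule in RULES if rule(scores)]
    return max(matched, default=default)


pump = {"safety": 6, "environmental": 0, "throughput": 10, "quality": 0, "cost": 4}
print(criticality_rank(pump))
```

Keeping the rules as data rather than nested conditionals makes it straightforward to define them before the analysis and review them independently of the scoring.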

The equipment criticality rank numbers, the number range, and the rules for assigning the numbers to systems or equipment under assessment are defined before conducting the analysis.

Criticality rank numbers are assigned to systems and/or equipment based on the rules developed. This is accomplished by comparing the equipment's criteria consequence scores to the rules for each criticality rank number. If the equipment matches the rules, it is assigned that criticality rank number. The equipment is always assigned the highest criticality rank number it matches.

5.5 Criticality and Risk Assessment

The assessment starts by analyzing the failure consequences of the selected system and/or equipment. The most serious failure consequence in each defined consequence criterion is identified and its score recorded.

System and equipment failure consequences are analyzed in terms of the resultant effects on the asset as a whole, considering the impact of the failure on the safety of personnel and on the asset's commercial performance. The latter requires consideration of both direct and indirect failure costs.

The analysis is conducted by answering a series of questions about each system or equipment item. These questions assess both the consequence of system or equipment failure and the frequency/probability of failure with respect to the assessment criteria. The criticality number and relative risk are calculated during the assessment from the responses to the questions. Questions are formulated in the following form:

"If the system/equipment fails, could it result in a safety consequence? If yes, how serious should the potential consequence be rated?"

5.6 Results of Criticality and Risk Assessment

5.6.1 Outcome of the Assessment

Criticality and risk assessment produces the following results:

1. Systems/equipment criticality ranks.
2. Relative risk.
3. Total consequence scores.
4. Individual system/equipment scores.

5.6.2 Relative Risk

The probability of failure is used in combination with the total failure consequence of a system/equipment to determine its relative risk (RR) value. The criticality and risk assessment (CARA) uses the concept of relative risk to identify the systems/equipment that have the greatest potential impact on the business goals of the company. The RR of a system or equipment item is the product of its Total Consequence Score (TC) and the Frequency/Probability (F/P) Number. It is called relative risk because it only has meaning relative to the other equipment evaluated by the same method.

The Total Consequence (TC) is the sum of the scores assigned to each of the criteria, including Safety (S), Environmental (E), Quality (Q), Throughput (T), Customer Service (CS), and Operating Cost (OC):

TC = S + E + Q + T + CS + OC

RR = TC × F/P

Here F/P denotes the single frequency/probability score, not a quotient.

6.0 Maintenance Tasks Development/Optimization (MTD/O)

The MTD/O process described in this paper establishes a structured framework for developing or assessing maintenance programs for in-service or newly commissioned assets. The process emphasizes the use of operation and maintenance experience documented in the form of standard maintenance tasks (SMTs).

6.1 Maintenance Tasks Development/Optimization (MTD/O) Overview

The flowchart in Figure 5 describes the steps involved in carrying out the MTD/O process. The steps involved in the development/optimization of maintenance tasks are as follows:

1. A system is identified for review by selecting an element from the plant technical hierarchy. As described earlier, the selected system boundary should be clearly defined. The selected system includes all lower-level elements.
2. A risk analysis is performed per section 5 of this paper. If an analysis was conducted in the past, the failure frequencies are reviewed in light of the current condition of the system/equipment items, and the frequency scores are changed as necessary. The selected system/equipment items are then ranked by risk.
3. When the system under review belongs to an equipment class group that has a documented Standard Maintenance Task (SMT), low risk systems/equipment require only verification that any specific company, standards, and regulatory requirements are applicable and that simple service activities are adequate and cost efficient. For high and medium risk systems/equipment, verification of all SMT elements is required.
4. When an applicable SMT is not available, a more detailed analysis is required for high and medium risk systems/equipment. For high risk items, a complete RCM2 analysis is recommended, while for medium risk items, RCM2 (FMEA) is sufficient to develop/optimize the maintenance program. The outcome of RCM2 or RCM2 (FMEA) is a set of proposed tasks, their frequencies, and the crafts and skill levels of the individuals performing the work, or recommended actions in case suitable routine tasks cannot be found.
5. For low risk items not governed by any company, standard, or governmental requirements, a run-to-failure strategy is adopted. When requirements exist, routine tasks are developed and incorporated into work packages.
6. From the output of RCM2 or RCM2 (FMEA), detailed routine task descriptions are developed and then incorporated into work packages.
7. SMTs are developed to reduce task development time and effort and to ensure consistency when dealing with equipment from the same equipment group. Developed SMTs are kept in a library for future reference. Routine updates are made to SMTs to reflect the current condition of equipment, gained maintenance and operating experience, and any changes/modifications to systems and equipment.
8. The final step in the analysis is to upload the developed work packages into Plant Reliability Information Management Systems (PRIMS). PRIMS include maintenance systems such as MAXIMO, SAP Plant Maintenance, Document Management Systems, Inspection Systems, etc.
9. Monitoring developed/optimized maintenance programs is essential to ensure their effectiveness in meeting the objectives set by the organization.
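
The selection logic in steps 3 to 5 can be sketched as a small decision function. This is one possible reading of the steps, written in Python for illustration: the function name, the enum, the returned strings, and the order in which the SMT check and the risk check are applied are assumptions, not part of the documented process.

```python
from enum import Enum


class Risk(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


def select_approach(risk: Risk, smt_available: bool, has_requirements: bool) -> str:
    """Pick the analysis approach for one system/equipment item.

    has_requirements: the item is governed by company, standard, or
    governmental requirements (only consulted for low risk items).
    """
    if smt_available:
        # Step 3: an applicable SMT exists, so only verification is needed.
        if risk is Risk.LOW:
            return "verify requirements and simple service activities"
        return "verify all SMT elements"
    # Step 4: no SMT, so high and medium risk items need detailed analysis.
    if risk is Risk.HIGH:
        return "complete RCM2 analysis"
    if risk is Risk.MEDIUM:
        return "RCM2 (FMEA)"
    # Step 5: low risk items without an SMT.
    if has_requirements:
        return "develop routine tasks"
    return "run-to-failure"


print(select_approach(Risk.MEDIUM, smt_available=False, has_requirements=False))
```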
An established method for recording failure modes, failure effects, and failure causes, as well as the corrective actions taken to eliminate/reduce the failure effects, is critical to the successful implementation of any maintenance program.

6.2 Standard Maintenance Task (SMT)

An SMT is a set of maintenance activities that demonstrates a technically feasible and cost-effective maintenance strategy for a defined equipment group. An equipment group is a set of equipment of the same class that functions in an identical operating context. An equipment group has similar design, failure modes, and failure frequencies.

Establishing a library of SMTs ensures consistent documentation of maintenance strategies, reduces the effort required to develop maintenance programs for new systems, ensures the application of uniform, consistent, and cost-effective maintenance activities, and facilitates analysis of equipment groups.

It is recommended to include the following information when documenting a standard maintenance task:

1. Applicable company requirements.
2. Applicable governing standards.
3. Governmental requirements/regulations.
4. Completed RCM2 analysis.
5. Description of equipment boundary and proper reference to drawings/isometrics.
6. Description of operating context (operational and environmental).
7. Assumptions/requirements for/from risk assessment.
8. Dominating failure modes with approximate probability.
9. The maintenance activities selected to reduce the probability that the identified failure mechanisms cause failure, along with the proper intervals (time-based or performance/condition-based).
10. All monitored equipment parameters (RCM2) with their sensitivity to faults/failures.
11. Established performance indicators.
12. Experience from using a known maintenance strategy along with periodic monitoring of established performance indicators.
13. For non-evident failure modes, the tests/inspections required to determine expected equipment availability.
14. Required experience and competency of maintenance personnel.
15. Estimated person-hours for maintenance activities.
16. Estimated repair time.
17. Essential spare parts, tools, equipment, and lead times.

The extent of documentation depends on the complexity of, and the risk assigned to, the assets under review. For low risk assets, it is only required to document items 1 to 3 above and an assessment of whether simple service activities are adequate and cost effective. For high and medium risk assets, it is recommended that the SMT document all of the listed items.

6.3 Condition Monitoring

The MTD/O review may determine that the best maintenance strategy is on-condition maintenance. Equipment condition is determined by monitoring operational and non-operational parameters that are sensitive to failure modes. Since not all parameters are effective in detecting failure modes, a formal analysis is needed to select the right corroborative set of parameters. The analysis must identify the failure-sensitive parameters and the practicality of monitoring them.

After establishing the technical feasibility of condition monitoring, its economic viability must be considered. The costs associated with the operation and ongoing support of the condition-monitoring program must be weighed against the potential cost savings and the cost of alternative maintenance strategies.

6.4 Monitoring Maintenance Program Effectiveness

Monitoring the effectiveness of the developed maintenance programs is accomplished by tracking and trending a set of key performance indicators. The indicators are established during the asset condition assessment phase. Progress reports are produced periodically, and modifications to maintenance tasks are made when necessary.

7.0 Application

This process was introduced and implemented at several plants in North America. Asset condition assessment studies were conducted and baselines established for each facility. The studies helped in developing the frequency score tables and provided points of reference for future analysis to assess the effectiveness of the devised maintenance programs. Areas of assessment included the following:

- Mean time between failures.
- Downtime due to unscheduled maintenance.
- Downtime for scheduled maintenance.
- Asset downtime due to failures of utilities, upstream, and downstream production assets.
- Slowdowns due to equipment failures.
- Slowdowns due to utilities, upstream, and downstream failures.
- Quality problems due to equipment failures.
- Maintenance cost.
- Increased operating cost due to equipment failures.
- Safety incidents due to equipment failures.
- Environmental releases and damages due to equipment failures.
- Spares consumption.
- Survey of existing PM and PdM tasks.

Operational downtime and slowdown data were collected but not used for this analysis. The impact of adopting this process on asset performance and maintenance organizations is summarized in Table 4.

[Flowchart boxes: Start; Develop Plant/Asset Technical Model; Perform Criticality & Risk Assessment; New/Existing Plant/Asset? (branches: New, Existing); Assess Plant/Asset Condition; Develop/Optimize MP; Develop/Optimize Spares Strategy; Modify/Load MP To PRIMS; Monitor MP Effectiveness; End.]

MP: Maintenance Program. PRIMS: Plant Reliability Information Management Systems.

Figure 1: Maintenance Program Development Process.

[Flowchart boxes: Start; Corporate Objectives; Plant Objectives; Major Assets/Systems Performance Expectations; Equipment Item Performance Expectations; End.]

Figure 2: Setting Performance Expectations.

[Flowchart boxes: Start; Collect Plant/Asset Technical Data; Establish Boundary Definition Standards; Develop Plant/Asset Technical Model; Describe Systems Functions; Load Plant/Asset Model & Equipment To PRIMS; End.]

PRIMS: Plant Reliability Information Management Systems.

Figure 3: Plant Technical Model Development.
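
As a rough illustration of the plant technical model that Figure 3 produces, the hierarchy can be held as a simple tree in which selecting an element also selects every lower-level element. The Python sketch below is an assumption about one convenient representation, not the structure of any particular CMMS; the class name, the element kinds, and the plant, unit, and pump names are hypothetical (only "feed pressurization" is taken from the examples in section 4.0).

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Element:
    """One node of the asset hierarchy (plant, unit, system, or equipment)."""
    name: str
    kind: str  # e.g. "plant", "unit", "system", "sub-system", "equipment"
    children: List["Element"] = field(default_factory=list)

    def add(self, child: "Element") -> "Element":
        """Attach a lower-level element and return it for chaining."""
        self.children.append(child)
        return child

    def walk(self) -> Iterator["Element"]:
        """Yield this element and all lower-level elements, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()


# Hypothetical example hierarchy.
plant = Element("Refinery", "plant")
unit = plant.add(Element("Crude Unit", "unit"))
feed = unit.add(Element("Feed Pressurization", "system"))
feed.add(Element("Feed Pump P-101A", "equipment"))
feed.add(Element("Feed Pump P-101B", "equipment"))

# Selecting a system for review brings along its equipment items.
equipment = [e.name for e in feed.walk() if e.kind == "equipment"]
print(equipment)
```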

[Flowchart boxes: Start; Business Objectives; Establish Assessment Criteria; Develop Plant/Asset Technical Model; Generic Failure Data; Assess Plant/Asset Condition; Define Failure Consequences & Their Ratings; Determine Failure Frequencies & Their Ratings; Define Criticality Ranking Table; Define Criticality Assignment Rules; Select System (cycle through systems/equipment list); Perform the Analysis; Assign Criticality & Risk Ranks To System(s)/Equipment; More Systems/Equipment? (Yes/No); End.]

Figure 4: Criticality and Risk Assessment.

[Flowchart boxes: Start; Select A System/Equipment; Perform Risk Assessment; Risk? (Low/Medium/High); Regulatory Requirements? (Yes/No); Planned Corrective Repair (Run-to-Failure); Perform RCM2 (FMEA); Perform RCM2 Analysis; Standard Maintenance Task Exist? (Yes/No); Select Proper SMT From Library; Relevant as SMT? (Yes/No); Establish SMT; Add SMT to Library; Routine Activities, Frequencies, Required Resources; Determine Work Packages; Write Detailed Work Instructions; Verify Work Packages; Load Work Packages To PRIMS; More Systems/Equipment? (Yes/No); End.]

Figure 5: Maintenance Tasks Development/Optimization.

Criterion                  Score
Health and Safety          20
Environmental Integrity    20
Production Throughput      10
Operating Cost             10

Table 1: Assessment Criteria Scores.

Score  Consequence

Safety
20     Fatalities.
18     Disabling injury.
14     Serious injury.
6      Minor or first-aid injury.
0      No injury.

Throughput/Downtime
10     Production downtime equal to or greater than 7 days.
9      Production downtime from 3 to 7 days.
8      Production downtime from 1 to 3 days.
7      One day of production downtime.
6      Production throughput at 25% of capacity.
4      Production throughput at 50% of capacity.
2      Production throughput at 75% of capacity.
0      No impact on throughput.

Product Quality
10     Unacceptable quality resulting in TOTAL product loss.
5      Unacceptable quality resulting in TOTAL product rework.
0      No effect on product quality.

Maintenance and Operating Cost
10     Incurred cost >$400K.
8      Incurred cost >$100K and <$400K.
6      Incurred cost >$50K and <$100K.
4      Incurred cost >$10K and <$50K.
2      Incurred cost >$1K and <$10K.
1      Incurred cost <$1K.

Table 2: Safety Criterion Consequence Table.
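
As an illustration of how a consequence table can be applied during scoring, the maintenance and operating cost bands of Table 2 can be expressed as a lookup. The Python sketch below makes two assumptions that the table does not state: the top band is read as costs above $400K, and a cost exactly on a band boundary falls into the lower band. The function name is hypothetical.

```python
def cost_consequence_score(incurred_cost_usd: float) -> int:
    """Map an incurred cost in US dollars to its Table 2 consequence score."""
    # (exclusive lower bound in dollars, score), highest band first.
    bands = [
        (400_000, 10),
        (100_000, 8),
        (50_000, 6),
        (10_000, 4),
        (1_000, 2),
    ]
    for lower_bound, score in bands:
        if incurred_cost_usd > lower_bound:
            return score
    return 1  # lowest band: below $1K (and, by assumption, exactly $1K)


print(cost_consequence_score(75_000))
```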

Failure Frequency                                         Score
Failures occur daily                                      10
Failures occur weekly                                     9
Failures occur monthly                                    8
Failures occur between one month and one year intervals   7
Failures occur yearly                                     6
Failures occur between 1 and 5 years                      5
Failures occur between 5 and 10 years                     4
Failures occur less frequently than once in 10 years      1

Table 3: Failure Frequency Scores.
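
Combining the Table 3 frequency scores with the total consequence score from section 5.6.2 gives the relative risk, RR = TC × F/P, which is then used to rank items. The Python sketch below illustrates the arithmetic only; the frequency labels used as dictionary keys, the function names, and the two example items with their scores are hypothetical.

```python
from typing import Dict

# Frequency scores from Table 3, keyed by short hypothetical labels.
FREQUENCY_SCORES: Dict[str, int] = {
    "daily": 10,
    "weekly": 9,
    "monthly": 8,
    "month_to_year": 7,
    "yearly": 6,
    "1_to_5_years": 5,
    "5_to_10_years": 4,
    "over_10_years": 1,
}

CRITERIA = ("S", "E", "Q", "T", "CS", "OC")


def total_consequence(scores: Dict[str, int]) -> int:
    """TC = S + E + Q + T + CS + OC."""
    return sum(scores[c] for c in CRITERIA)


def relative_risk(scores: Dict[str, int], frequency: str) -> int:
    """RR = TC multiplied by the frequency/probability score."""
    return total_consequence(scores) * FREQUENCY_SCORES[frequency]


# Hypothetical items: (consequence scores, failure frequency label).
items = {
    "feed pump": ({"S": 14, "E": 0, "Q": 5, "T": 8, "CS": 0, "OC": 4}, "yearly"),
    "sump pump": ({"S": 6, "E": 0, "Q": 0, "T": 2, "CS": 0, "OC": 2}, "monthly"),
}
ranking = sorted(items, key=lambda n: relative_risk(*items[n]), reverse=True)
print(ranking)
```

For the feed pump, TC = 14 + 0 + 5 + 8 + 0 + 4 = 31 and RR = 31 × 6 = 186; for the sump pump, TC = 10 and RR = 10 × 8 = 80, so the feed pump ranks first even though it fails less often.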

                  Availability   Downtime   RAV    Product Quality
                  (%) [1]        (%) [2]    [3]    Rejects (%) [4]
Before  Plant 1   88             8          4.1    6
        Plant 2   89             7          3.5    6
        Plant 3   92             5          3.1    4
        Plant 4   93             4          2.5    2
After   Plant 1   92             4          3.25   4.5
        Plant 2   91.5           4.5        2.85   4.2
        Plant 3   94.5           2.5        2.4    2.4
        Plant 4   94.5           2.5        2.1    1.3

[1] Availability ([operating time - all downtimes including slowdowns]*100/operating time).
[2] Planned and unplanned downtime for maintenance (excluding TA).
[3] Percent of maintenance cost to asset replacement value.
[4] Percent reject due to equipment failure (includes startup and shutdown off-spec products).

Table 4: Implementation Results.
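
The availability definition in the first footnote to Table 4 can be written as a one-line calculation. The Python sketch below is illustrative only: the function name and the hours in the example are hypothetical, and slowdowns are assumed to have already been converted to equivalent downtime hours.

```python
def availability_percent(operating_time_h: float, downtime_h: float) -> float:
    """Availability = (operating time - all downtimes) * 100 / operating time.

    downtime_h is assumed to include slowdowns expressed as equivalent
    downtime hours.
    """
    if operating_time_h <= 0:
        raise ValueError("operating time must be positive")
    return (operating_time_h - downtime_h) * 100.0 / operating_time_h


# Hypothetical example: 8,000 operating hours with 640 hours of downtime.
print(availability_percent(8000, 640))
```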

Appendix A: Definitions

Asset: May refer to a plant, system, or a piece of equipment.

Failure Mechanism: Physical, chemical, or other processes which lead or have led to failure.

Maintenance Program: A comprehensive set of maintenance activities, their intervals, and required resources, along with the documentation of the maintenance analysis performed.

Maintenance Strategy: The means by which equipment is maintained. The maintenance strategy can be of four main types: run-to-failure, preventive, predictive (on-condition maintenance), or redesign (of the equipment).

Standard Maintenance Task (SMT): A set of cost-effective maintenance actions for an equipment class group.

Equipment Group: A set of equipment of the same class that functions in an identical operating context.

Appendix B: References

1. AIChE/CCPS, Guidelines for Process Equipment Reliability Data, Center for Chemical Process Safety, American Institute of Chemical Engineers, New York, 1989.
2. Blanchard, Benjamin S., Logistics Engineering and Management, Prentice Hall, Inc., 1998.
3. EXP Training Documentation, IVARA Corporation, 2002.
4. Moubray, John, Reliability-Centered Maintenance (RCM II), 2nd Edition, Industrial Press, 1997.
5. ISO 14224, Petroleum and Natural Gas Industries - Collection and Exchange of Reliability and Maintenance Data for Equipment, International Organization for Standardization, First Edition, 1999.
6. Norsok Standard, Criticality Analysis for Maintenance Purposes, Z-008, Rev. 2, November 2001.
7. OREDA-97, Offshore Reliability Data, Det Norske Veritas, P.O. Box 300, N-1322 Hovik, Norway, 3rd Edition, 1997.
8. Seifeddine, Sammy, Criticality and Risk Assessment, HSB Reliability Technologies, Project Document, 2000.
