
Probabilistic fracture mechanics - towards a practical approach

TWI Bulletin, January 1986

by Steve Williams and Tarsam Jutla

Steve Williams, BA, PhD, was in the Institute's Engineering Department and is now Project Engineer - Pipelines with Shell UK Exploration and Production. Tarsam Jutla, BSc, PhD, is a Senior Research Engineer in the Fracture Department. The work reported here was carried out while Dr Williams was with The Welding Institute, and does not necessarily reflect the views of his present employer.

This is the first in a series of articles on practical aspects of probabilistic fracture mechanics. The series is intended to deal with problems of scatter in test data, with confidence in making a safe assessment of known cracks and with confidence in making safe assessments of structures containing unknown cracks. This article considers aspects of scatter in fracture toughness data and provides guidance on: i) how much testing is necessary; and ii) which toughness value to use in a fracture mechanics assessment. In a further article the methods described will be used to estimate safety in assessments by examining distributions of safety factors between allowable crack sizes and critical crack sizes. Subsequent articles will consider critical design equations and simulation techniques for accurate reliability assessments.


Structures and components can fail in many different ways and the onus is on the designer to anticipate the possible modes of failure and to take precautions to avoid their occurrence. One design philosophy is to use safety factors. A design equation is used to predict a limit state condition for a particular failure mode, e.g. yielding, and the maximum design condition is then set at some factor of the limit state. For example, for pressure vessels the maximum design stress may be 0.72 of yield strength. This implies a factor of safety of approximately 1.4 between the most severe anticipated applied load condition and the elastic limit of the material. Such factors of safety are considered to result in a 'safe' design, i.e. one for which the risk of failure is acceptably small. The risk is not quantified but is frequently believed to be almost zero. This would be true if the limit state and maximum applied conditions were single values. However, there is always uncertainty in the strength of a material and in the severity of the worst service or overload conditions which could occur, i.e. these values are distributed about the expected value (Fig.1). As the uncertainty (represented by the width of the distribution) increases, the risk of the limit state being exceeded also increases. If the interaction (overlap) between the distributions of applied load and limit condition is significant, the chosen safety factor may not actually imply adequate safety. Experience gained in designing structures has led to suitable safety factors being found for particular service conditions, resulting in acceptably low risks of failure in similar structures.

As a consequence many present design procedures and codes of practice embody safety factors which are based purely on this subjective judgement.

For some failure modes, such as fracture, uncertainty in the material properties controlling the fracture process, e.g. the fracture toughness and maximum crack driving force (a combination of stress, crack size and geometry) may be very large. The range of possible values may cover one or two orders of magnitude[1] and even very large safety factors may still be associated with significant risks of failure (Fig.1b). In such circumstances alternative methods are necessary to ensure adequate safe design without excessive conservatism.

Fig.1. Use of a safety factor for properties with: a) Little scatter; b) Significant scatter
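
The interaction shown in Fig.1 can be quantified if a distribution shape is assumed. As a minimal sketch, suppose (purely for illustration; the article itself makes no such assumption) that load and strength are independent and normally distributed; the numerical values below are invented and serve only to show how risk grows with scatter at a fixed safety factor:

    import math

    def failure_probability(mu_load, sd_load, mu_strength, sd_strength):
        """P(load > strength) for independent normal load and strength.

        The margin M = strength - load is normal with mean
        mu_strength - mu_load and standard deviation
        sqrt(sd_strength**2 + sd_load**2); failure is the event M < 0.
        """
        mean_margin = mu_strength - mu_load
        sd_margin = math.sqrt(sd_strength**2 + sd_load**2)
        # Standard normal CDF evaluated at -mean_margin/sd_margin
        return 0.5 * math.erfc(mean_margin / (sd_margin * math.sqrt(2)))

    # Same nominal safety factor (140/100 = 1.4), different scatter:
    print(failure_probability(100, 5, 140, 5))    # little scatter: ~8e-9
    print(failure_probability(100, 20, 140, 20))  # large scatter:  ~0.08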

This article examines the problems associated with scatter in fracture toughness data and outlines a non-parametric statistical method of selecting a value from an enlarged data set such that when the selected value is used in a crack assessment it results in a required level of safety. The method is also discussed in terms of quality assurance via 'consumer's risk' (risk of accepting bad material) and 'producer's risk' (risk of rejecting good material). It is a non-parametric method because it does not assume any particular probability distribution for the population of the material property.

Safety in fracture assessment: the CTOD design curve

Most conventional design philosophies are based on empirical evidence and subjective judgements. A design criterion is selected which is believed to be acceptably safe. When this is used, failures are rare and can be attributed to some freak condition, e.g. excessive overload or poor material properties which pushed the applied load or limit state into the tails of the distributions shown in Fig.1.

A similar approach is used to design against brittle fracture without actually invoking a particular safety factor. A design equation, e.g. the CTOD design curve,[2] which itself is believed to be conservative, is used with the minimum toughness measured and the highest design stress to predict allowable crack sizes, i.e. crack sizes which can safely be expected not to cause fracture. The safety factor between the allowable crack sizes calculated and the critical crack sizes to cause failure is variable and is not explicitly calculated. It is distributed, with the precise value being dependent on the errors between actual and assumed values for input parameters. However, considerable experience has been gained in using the CTOD design curve and it is believed that the procedure is sufficiently conservative when the lowest of three toughness values is used. The conservatism of the approach has been evaluated in terms of the confidence with which the behaviour of large wide plates containing known cracks can be predicted from results of small scale tests. This work indicates that if the assessment is made using the lowest test result available (normally from three tests, occasionally fewer and sometimes up to six) there is a 97% chance that the assessment will be conservative. The average factor of safety between predicted allowable crack size and critical crack size has been estimated to be three.[3]

The above approach has been used satisfactorily for many years but problems are now occurring for two reasons:

  1. It is becoming more common for more than three CTOD tests to be carried out. The more tests that are performed the more confident one can be in the material properties. However, presently there are no agreed guidelines on which value to use from a data set containing more than three test results, but an approach which is widely practised is to use the lowest value. A problem with this is that as more testing is carried out the range of values obtained increases and, in particular, it is probable that the lowest value will decrease. This can lead to increasing conservatism which may not be required.
  2. There is no facility for performing retests. If one test result fails to meet the required specification, there is no mechanism to permit the welding procedure or material to be accepted on the basis of additional testing (since this will not remove the lowest value).

Generally in fracture toughness testing, three tests are carried out with the notch tip sampling a particular microstructure. A simple method has been developed to interpret data sets containing more than the minimum three test results and hence to benefit from the increased knowledge of material fracture toughness properties which is gained from the increased testing. This method is based on the assumption that if the level of safety which results from using the minimum of three test results is adequate, the same level of safety can be achieved by using the value from a larger data set which is equivalent to the minimum of three tests. However, the data should include only those values which correspond with the particular microstructure of interest (see discussion). In its simple form the model does not attempt to quantify the level of safety, merely to achieve at least the same level of confidence in structural safety as the more conventional approach. In more advanced forms the model can be used to achieve different levels of safety for different applications and can give assistance in estimating the absolute level of safety.

Significance of limited test data

Material properties such as toughness are variable and a consumer/client may wish to satisfy himself that the properties of a particular batch of material or components are adequate. This is normally achieved by testing a limited number of samples and assessing the results against a preset criterion to decide whether the material is acceptable.

The criterion that is set may be such that all the sample test results must be above a certain value. The significance of this criterion varies according to how many tests are carried out. The more tests that are performed, the more likely that a low result will be obtained and the more likely that the material will be rejected. An alternative criterion which avoids this problem is to define a characteristic of the batch of material itself rather than the sample, e.g. a certain percentage of the batch must have properties greater than a required value. This criterion can be interpreted to account for different amounts of test data and is discussed later in this article.

A difficulty with limited testing is that the tests cannot determine with certainty whether the material is acceptable. This is because the results from each sample represent values drawn at random from the range of possible values. This is represented in Fig.2 with ten sets of possible test results. If the acceptance criterion is such that all the test results must be above a value X, some of the sets of results presented in Fig.2 show that the material is acceptable whilst others cause it to be rejected despite the fact that the quality of the material has not changed. (The sets of results also illustrate that with the chosen criterion the material is more likely to be rejected, the more tests that are carried out, since a low value is more likely to be obtained).

Fig.2. Accepting or rejecting a material based on limited data

The real problem which arises from trying to interpret limited test data is in fact slightly different from that described above. The distribution of the range of possible test results is not known. Instead there is a known set of test results which has come from an unknown distribution. This is illustrated in Fig.3. The problem is to interpret the limited data to decide whether or not the expected distribution of material properties is acceptable. Just as a range of possible test results may originate from a single distribution, some of which may be acceptable and others rejectable, so a particular set of test results may originate from a range of distributions, some of which are acceptable and some of which are rejectable. The following paragraphs describe a method of interpreting the test data to decide whether the distribution is acceptable and how to evaluate the consumer's or client's risk of unknowingly accepting bad material.

Fig.3. Accepting or rejecting a distribution based on a known set of results. Acceptance criterion is that (1 - α)% of population must be greater than threshold value X.

Instead of using the lowest available test result, it may be decided that the most appropriate value to use for accepting or rejecting a material, or for use in a crack assessment, is that value above which (1 - α)% of the possible results lie, see Fig.3. Each test result may be regarded as a value picked at random from the distribution of possible values. The results which are below the (1 - α)% value may then be regarded as failures, with those above it being considered as successes. The probability that any single test will fail is α and the probability of success is (1 - α) (Fig.3). If three tests are performed, the probability that all three tests will be successes, i.e. all above the threshold value, is

Pr(3 tests ≥ threshold) = (1 - α)³    [1]

The criterion for accepting a material may be that the (1 - α)% value must be greater than some threshold value X (Fig.3). If the true distribution is just acceptable, i.e. its (1 - α)% value equals X, the probability of all three test results being greater than X is given by [1]. If the material is actually just unacceptable, i.e. its (1 - α)% value is just below X, [1] may be regarded as the maximum probability that all three values will be above X. If the actual (1 - α)% value is well above X, then [1] represents the minimum probability that all three results will be above X.

The above discussion has used the concept of quality control, i.e. determining whether test results meet a preset requirement. However, the question of selecting a suitable toughness value for use in a crack assessment is virtually identical and the same methods of interpretation may be used. The problem is to determine a characteristic value from the distribution of toughness values which may be used in the crack assessment. The only difference is that for QA purposes a particular threshold value is set to determine whether or not the material is acceptable whilst in a crack assessment the threshold value must be found.

For example, when three tests are carried out the problem is that the distribution from which they come is not known. However, if it is assumed that the (1 - α)% threshold value is just below the lowest of the test results, [1] can be interpreted as the risk that the true (1 - α)% threshold is lower than the minimum of the three test results. If, say, the value required in the overall population is that above which 50% of the possible values lie, the probability that any particular result will be below this value, i.e. fail, is 0.5. If three tests are carried out the probability that none will fail, i.e. all will be above the 50% value, is 0.5³ = 0.125. In other words, the risk that the required value is actually lower than all three test results is 12.5%.

The highest risk of a hypothesis being wrong which is normally considered in statistical assessments is 10%, i.e. 90% confidence that the hypothesis is correct. When this risk is equated to [1], (1 - α) is calculated to be 0.464. This implies that the minimum of three results may be interpreted as the value above which at least 46.4% of the possible results lie. However, for convenience it has been decided to interpret the minimum of three as the value above which at least 50% of possible values lie. The risk that the true value is less than the minimum of three results is 0.125. This interpretation of the significance of the minimum of three results is somewhat arbitrary and other interpretations could be chosen. However, it is believed to be the most meaningful statement that could be made about the parent distribution if only three test results are available.
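
These figures are easy to reproduce. A minimal check of equation [1] (any language would serve; Python is used here and throughout this series of sketches):

    # Eq [1]: risk that all three results lie above the median (alpha = 0.5)
    alpha = 0.5
    print((1 - alpha) ** 3)   # 0.125, i.e. the 12.5% risk quoted above

    # Fixing the risk at 10% instead and solving (1 - alpha)^3 = 0.1
    print(0.1 ** (1 / 3))     # 0.4641..., the 46.4% figure quoted above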

Large data sets

The advantage in defining the acceptance criterion as a characteristic of the population rather than of the sample is that the criterion can be adapted to suit the particular number of test results available.

One method of interpreting the criterion for different numbers of tests is to use the binomial distribution (see Appendix).

If the criterion is that (1 - α)% of the distribution must lie above a value X, i.e. Pr(x > X) = 1 - α, values above X are regarded as successes and values below X as failures. If the actual distribution just meets the criterion, the probability that n test results will also lie above X is

Pr(n tests ≥ X) = (1 - α)^n    [2]

The variation of this probability with respect to the threshold level and the number of tests is given by Fig.4.

Fig.4. Probability of n tests being at or above threshold level X
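
The content of Fig.4 can be tabulated directly from [2]; the values of α below are illustrative choices, not taken from the figure:

    # Eq [2]: probability that all n results exceed the (1 - alpha)% threshold
    for alpha in (0.3, 0.5, 0.7):
        probs = [(1 - alpha) ** n for n in (1, 3, 6, 10)]
        print(f"alpha = {alpha}: " + ", ".join(f"{p:.3f}" for p in probs))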

The probability of exactly r failures out of n tests is

Pr(n - r tests ≥ X) = C(n, r) α^r (1 - α)^(n-r)    [3]

where C(n, r) = n!/(r!(n - r)!) is the number of combinations of r failures among n tests.

And the probability of r or fewer failures out of n tests is

Pr(r or fewer < X) = Σ[j=0 to r] C(n, j) α^j (1 - α)^(n-j)    [4]

This equation may be used to determine which value from a large data set should be used to satisfy a particular criterion.
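
Equations [3] and [4] are simply the binomial probability mass function and its cumulative form, and can be coded directly. A minimal sketch using only the Python standard library; the function names are ours, for illustration:

    from math import comb

    def pr_exactly(n, r, alpha):
        """Eq [3]: probability that exactly r of n results fall below X,
        when each result independently falls below X with probability alpha."""
        return comb(n, r) * alpha**r * (1 - alpha)**(n - r)

    def pr_at_most(n, r, alpha):
        """Eq [4]: probability that r or fewer of n results fall below X."""
        return sum(pr_exactly(n, j, alpha) for j in range(r + 1))

    # Eq [1] is the special case of three tests and no failures:
    print(pr_at_most(3, 0, 0.5))   # 0.125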

For example, if the criterion is that there must be at least 87.5% confidence that at least 50% of the distribution, i.e. the median value, is greater than X, then α = 0.5 and Pr must be ≤ 0.125,

i.e. Pr(r or fewer fails) = Σ[j=0 to r] C(n, j) (0.5)^n ≤ 0.125    [5]

Equation [5] gives the probability that a distribution which just meets the criterion will be accepted. Equation [4] may be interpreted as defining the greatest consumer's or client's risk that material which fails to meet the specified criterion will be accepted. (As the true (1 - α)% value drops below X, the probability of achieving the required number of successes drops and hence the risk of falsely accepting the material decreases).

In terms of selecting a suitable toughness value for use in a crack assessment, [4] can be used to interpret data sets which contain more than three results. When the selected value is used in a crack assessment it should result in at least the same level of confidence as using the minimum of three test results. Equation [5] has been solved for up to 30 results and the resulting stepped curve is given by (1 - α) = 0.5 in Fig.5. Other curves in Fig.5 represent different threshold levels above and below the median.

Fig.5. Selecting a value from n tests which is at least equivalent to the minimum of three test results
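
The stepped curve can be reproduced by solving [5] numerically: for each n, find the largest number of failures r for which the cumulative probability remains at or below 0.125, and then use the (r + 1)th lowest test result. A self-contained sketch (with α = 0.5 every term in [4] reduces to a multiple of 0.5^n):

    from math import comb

    def risk(n, r):
        """Eq [5]: Pr(r or fewer of n results fall below the median)."""
        return sum(comb(n, j) for j in range(r + 1)) * 0.5**n

    # Largest number of failures r keeping the risk at or below 0.125;
    # the (r + 1)th lowest of the n results is then the value to use
    for n in range(3, 31):
        r = 0
        while risk(n, r + 1) <= 0.125:
            r += 1
        print(f"n = {n:2d}: use result {r + 1} from the lowest "
              f"(risk = {risk(n, r):.4f})")

For n = 10, for example, two failures are permitted and the third lowest value is used, in agreement with the example in the next paragraph.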

So far the risk associated with using a value which is equivalent to the minimum of three test results has been considered. However, [4] allows the selection of a value out of n tests which is equivalent to the minimum of an arbitrary number of tests. If for instance it has been found reasonable to use the minimum value of six test results in a crack assessment, then a value can be selected from n test results which gives at least the same level of confidence. Using [2], the minimum of six test results corresponds to a risk of 1.6% that the median lies below the lowest value. The increased number of tests from three to six is likely to result in a lower minimum value. Thus the probability of the median being above the lowest value of the larger set is increased. Correspondingly, the number of failures which can be allowed in a data set containing n values decreases. For example, in a data set of ten test results the second lowest is equivalent to the minimum of six, as opposed to the third lowest value being equivalent to the minimum of three results.
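
Both figures in the paragraph above can be checked directly; a minimal sketch:

    from math import comb

    # Eq [2]: risk that the median lies below the minimum of six results
    print(0.5 ** 6)                                    # 0.015625, i.e. ~1.6%

    # Eq [4]: at most one of ten results below the median, i.e. the
    # second lowest of ten is used as the characteristic value
    print(sum(comb(10, j) for j in (0, 1)) * 0.5**10)  # 0.0107 <= 0.0156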

The risk of 1.6% associated with the minimum of six results corresponds to a confidence level of 98.4%. More useful and less conservative risk values would be those associated with either 95% or 90% confidence levels, i.e. Pr = 0.05 and 0.1 respectively. In statistical terms these are more generally used confidence levels.

Consumer's and producer's risks

In addition to minimising the consumer's risk of accepting material which does not meet the criterion, it is also desirable to limit the producer's risk of rejecting material which does meet the criterion. (In terms of crack assessments, this corresponds to the risk of selecting a value which is lower than necessary, and hence an assessment which is excessively conservative.)

It is not possible to minimise both types of risk if limited testing is carried out. For fracture toughness testing it is normal to limit the consumer's risk, i.e. limit the risk of using too high a value. If the interpretation presented earlier of the significance of three test results is accepted, then it is normal to limit the consumer's risk to approximately 10% when the minimum of three tests is used either in a crack assessment or as a basis for accepting material. If more testing is carried out and the minimum value is still used as the criterion, the consumer's risk decreases (see [4]). This may seem to be a reasonable state of affairs, but the producer's risk (risk of rejecting acceptable material) increases and this may eventually be very undesirable for both producer and consumer. The extra conservatism may cause significant delays and costs whilst either alternative welding procedures are developed or cracks are repaired.

Use of the binomial distribution to select a suitable value from a large data set allows a more appropriate balance between consumer's and producer's risks. It allows the consumer's risk to be kept at or below the level which has been found acceptable without causing large increases in the producer's risk. In fact, if large amounts of data are available the producer's risk may be reduced significantly whilst maintaining the consumer's risk.
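
The balance between the two risks can be made concrete by computing the probability of acceptance for a given plan (n tests, accept if at most r results fall below X). A sketch under invented assumptions: a 'bad' material is taken as one in which 70% of results fall below X and a 'good' one as 30%; these figures are illustrative only, not from the article:

    from math import comb

    def pr_accept(n, r, p_fail):
        """Probability of acceptance: at most r of n results below X, where
        each result independently falls below X with probability p_fail."""
        return sum(comb(n, j) * p_fail**j * (1 - p_fail)**(n - j)
                   for j in range(r + 1))

    n = 10
    # Plan A: minimum of ten must pass (r = 0); plan B: binomial plan (r = 2)
    for r in (0, 2):
        consumers_risk = pr_accept(n, r, 0.7)      # accepting bad material
        producers_risk = 1 - pr_accept(n, r, 0.3)  # rejecting good material
        print(f"r = {r}: consumer's risk = {consumers_risk:.4f}, "
              f"producer's risk = {producers_risk:.4f}")

Moving from r = 0 to r = 2 leaves the consumer's risk very small while cutting the producer's risk substantially, which is the point made above.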

Discussion

Once a suitable criterion has been selected for choosing a toughness value for use in a crack assessment or for accepting/rejecting a material, the binomial distribution may be used to interpret different size data sets. When large amounts of data exist it allows the consumer's risk to be maintained or reduced slightly whilst also reducing the producer's risk as much as possible. An example of the use of [5] is that the lowest value of three tests may be regarded as approximately equivalent to the second lowest of six test results. The practical effect of this is that if three tests are carried out and one fails to meet the requirement, it is possible to carry out three additional tests. If all three additional tests meet the requirement then the material may be accepted on the basis of the six results (however, see warning below). This procedure is not possible under the conventional way of interpreting toughness test results as the additional test data do not remove the fail value. However, it may be seen that this procedure is analogous to that conventionally used for Charpy testing.
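
The retest rule can be verified from [4]; one failure in six tests against the median criterion gives a risk still below the 0.125 associated with the minimum of three:

    from math import comb

    # Eq [4] with n = 6, r = 1, alpha = 0.5
    print(sum(comb(6, j) for j in (0, 1)) * 0.5**6)   # 7/64 = 0.109 <= 0.125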

Use of the binomial distribution to interpret test results and select suitable values for use in crack assessments or for accepting/rejecting material is very valuable and allows the most appropriate use to be made of the available data. However, great caution and responsibility are required if it is to be used safely. The model is based on the assumption that the test results come from a single population and there is a constant probability of any individual result passing or failing the required value. Care must be taken to ensure that these assumptions are satisfied. When there is doubt about their validity it is safer to keep to the conventional method and use the lowest data point. Caution is particularly necessary when assessing data for a heat affected zone of a weld or for a data set made up of tests from different welds. Data from unsatisfactory welds should not be mixed with data from good welds since they may mask the presence of a low toughness weld. Also, specimens which have not sampled the intended microstructure should not be included in the data set of 'good' results. If such data are combined in the group, one of the assumptions made for the binomial distribution, that the probability of success or failure should be constant (see Appendix), will be violated.

This article has concentrated on the way in which the binomial distribution may be used to interpret large data sets to maintain the same level of safety as achieved by using the minimum of three tests. However, use of the binomial distribution may be extended to allow different levels of confidence in the prediction for different applications. This is very useful since additional safety normally demands additional testing, and hence additional material, time and expense, and may only be necessary for particularly important or sensitive applications. The binomial distribution allows the flexibility to develop testing and interpretation schemes which balance consumer's and producer's risks, both for simple applications demanding moderate confidence and for major projects with serious safety consequences which demand much higher levels of confidence. One possible approach is to use the correlations between large sets of data from CTOD and wide plate tests [1-3] to set alternative criteria to that postulated earlier in this article. For example, to achieve a level of confidence higher than 97% in the safety of predicting wide plate test results, it may be necessary to use the value above which 70% of the data lie rather than 50%. The new criterion may be used directly in [4] to determine how much testing should be carried out and which value should be used in the crack assessment.
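
Such an alternative criterion drops straight into [4]. A sketch with α = 0.3, i.e. requiring the value above which 70% of the population lies; the association of this criterion with a particular confidence level is the article's, and the code merely tabulates the risk:

    from math import comb

    def pr_at_most(n, r, alpha):
        """Eq [4]: probability that r or fewer of n results fall below X."""
        return sum(comb(n, j) * alpha**j * (1 - alpha)**(n - j)
                   for j in range(r + 1))

    # Largest number of failures permitted at risk <= 0.125 for alpha = 0.3
    for n in (3, 6, 10, 20, 30):
        allowed = [r for r in range(n + 1) if pr_at_most(n, r, 0.3) <= 0.125]
        if allowed:
            print(f"n = {n:2d}: up to {max(allowed)} failures permitted")
        else:
            print(f"n = {n:2d}: criterion cannot be demonstrated; more tests needed")

Note that three tests cannot demonstrate this criterion at any number of failures, illustrating why higher confidence levels demand more testing.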

The confidence in the safety of a wide plate prediction is not the same as the confidence in safety of a crack assessment for a structure. This is because in the wide plate a known crack is introduced into a known microstructure and the safety of the wide plate predictions reflects only the conservatism of the crack assessment procedure itself. In a structure, there is only a finite probability that there is a crack of a particular size, in a particular microstructure and subjected to a particular stress level. These probabilities must be incorporated (amongst many others) when carrying out a full risk assessment of a structure. Their effect is to reduce the risk of fracture to considerably below the level of conservatism derived from simple correlations between wide plate test results and predictions from CTOD values.

Summary

Some aspects of scatter in fracture toughness data have been discussed. A statistical method is introduced which, in its simplest form, can be used to provide guidance on how much testing is necessary, or on how to interpret data sets containing more than three test results. In a more advanced form it can be used to evaluate risk levels for quality control. The method does not require prior knowledge of the type of probability distribution that describes the data. However, it does require that only those data are used which correspond to a particular microstructure.

Practical use of this approach will be considered in more detail in a future article.


References

1. Towers O L, Williams S and Harrison J D: 'ECSC collaborative elastic-plastic fracture toughness testing and assessment methods'. Welding Institute Contract Report 3571/10M/84, 1984. (Scheduled to be published as a Members Report.)
2. Dawes M G: 'The COD design curve'. In 'Advances in elasto-plastic fracture mechanics', ed. L H Larsson, Applied Science Publishers, London, 1979, 279-300.
3. Kamath M S: 'The COD design curve: an assessment of validity using wide plate tests'. Int J of Pressure Vessels and Piping, 1981, 9, 79-105.

Appendix

The binomial distribution

An important concept in probability is that of a probability distribution. There are two types of distribution: discrete (or discontinuous) and continuous. In a discrete distribution the random variable can take only a discrete set of values; in a continuous distribution it can take any value within a specified range. An example of a continuous distribution is the normal distribution, which is widely used in statistics. The binomial distribution is a discrete distribution and may be used whenever a series of trials is made that satisfies the following conditions:

  1. The individual trials are independent;
  2. Each trial has two outcomes, success (s) or failure (f), which are mutually exclusive;
  3. The probability of success or failure in each trial is constant.

The binomial distribution is well documented in most standard statistics text books, but it is developed here for readers unfamiliar with it.

The above three conditions are satisfied when, say, a coin is tossed. The number of heads, i, in n tosses is a random variable which can have any integer value between 0 and n. To find how the probabilities of obtaining i are distributed, consider first the case of getting three heads in five tosses. Let H denote a head (success), and T denote a tail (failure). For five tosses there are 32 possible outcomes, and these are shown in Fig.A1. Each outcome comprises five letters, H and/or T. The outcomes comprising three Hs and two Ts represent all possible combinations of getting three heads in five tosses. The number of these combinations is given by:

C(5, 3) = 5!/(3! 2!) = 10

If α denotes the probability of getting a head in a single toss (= ½ for an unbiased coin), then the probability of a tail is (1 - α). Then the sequence HHHTT in Fig.A1 has the probability

α·α·α·(1 - α)·(1 - α) = α³(1 - α)²

Fig.A1. Computing the binomial probability of three heads in five tosses of a coin

Fig.A2. Examples of binomial distributions: a) Positively skewed; b) Symmetrical; c) Negatively skewed

Another outcome which comprises three heads and two tails that could equally have occurred is HHTHT. The probability of this sequence is

α·α·(1 - α)·α·(1 - α) = α³(1 - α)²

Thus for any outcome with three Hs and two Ts the probability is always α³(1 - α)². However, this is the probability of just one outcome comprising three heads and two tails. There are ten possible ways in which this outcome may occur and therefore the probability is

10 α³(1 - α)²

Substituting for α = ½, the probability of three heads in five tosses is

Pr(3) = 10 × (½)³ × (1 - ½)² = 0.3125
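
The arithmetic of the five-toss example can be checked mechanically; a one-line verification:

    from math import comb

    # Probability of exactly 3 heads in 5 tosses of a fair coin
    print(comb(5, 3) * 0.5**3 * (1 - 0.5)**2)   # 10 * (1/2)^5 = 0.3125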

More generally, the above example can be used as a basis to compute the probability of getting i heads in n tosses. One possible sequence that could occur is

HHH...H TT...T    (i heads followed by n - i tails)

The probability of getting this sequence is

α·α·α·...·α·(1 - α)·(1 - α)·...·(1 - α) = α^i (1 - α)^(n-i)

As shown above, this also represents the probability of any other sequence comprising i heads and (n-i) tails. The number of combinations of i heads in n tosses is

C(n, i) = n!/(i!(n - i)!)    [A1]

Thus the probability of obtaining i heads in n tosses is

Pr(i) = C(n, i) α^i (1 - α)^(n-i)    [A2]

This equation defines a discrete probability distribution of i which is generally known as the binomial distribution. It is so called because the individual probabilities are terms in a binomial expansion which add up to 1, i.e.

Σ[i=0 to n] C(n, i) α^i (1 - α)^(n-i) = [α + (1 - α)]^n = 1    [A3]
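
Equation [A3] is easily verified numerically; the values of n and α below are arbitrary:

    from math import comb

    n, alpha = 15, 0.3   # arbitrary illustrative values
    total = sum(comb(n, i) * alpha**i * (1 - alpha)**(n - i)
                for i in range(n + 1))
    print(total)   # 1.0 (to within floating point rounding)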

The individual probabilities in the binomial expansion can be represented as bar charts for different values of n and α. Some examples are given in Fig.A2 and these show that each probability forms part of a discontinuous probability distribution. If Pr(i) is the probability of obtaining i heads, Fig.A2b could be taken to represent an unbiased coin which is tossed 15 times. Figures A2a and A2c would then represent a coin which is biased towards a tail and a head respectively. In both cases the distribution is skewed.