On the Robustness of PERT Fittings in Agricultural Yield Insurance

In agricultural insurance practice, risk and indemnity payments usually arise from the yield of the individual farmer. However, high administration costs and data scarcity are both common, and together they make it hard for insurers to rate insurance products adequately. Under these circumstances, methods for estimating farmers' yields, and in particular their distributions, are urgently needed. Among these methods, the so-called PERT fitting technique often prevails because of its simplicity: it requires very little knowledge of the yield history, and it is frequently used by both academics and practitioners. However, the very limited information it uses can sometimes cause severe bias; in other words, the reliability of this method has yet to be examined. In this paper, I use Monte Carlo experiments to test the robustness of PERT fittings under the VaR and CTE risk measures in different scenarios. The results indicate that the PERT method is robust and trustworthy in the scenarios considered.


INTRODUCTION
The main purpose of this research is to identify some properties of distribution fitting for assumed populations with conventional PERT techniques, to test the behavior of these fittings under different scenarios generated by controlling skewness and variance, and then to evaluate their robustness through properly designed experiments. I focus mainly on distributions of non-negative random variables and their fittings in an actuarial context, which leads me to concentrate on particular risk measures, namely the left-tail Value at Risk and the corresponding Conditional Tail Expectation. Monte Carlo simulation is the basic tool for random sampling from given distributions, owing to its applicability and flexibility.

PERT DISTRIBUTION
PERT originally refers to "Program & Project Evaluation and Review Technique", a management tool developed for the Program Evaluation Branch of the Special Projects Office of the Navy (U.S. Dept. of the Navy (1958)). The conventional PERT process includes a distribution fitting procedure that uses a special three-parameter general Beta distribution (Malcolm, D.G. et al. [1]).
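In general, a Beta distribution on (0,1) with parameters $\alpha > 0$ and $\beta > 0$ has a density function defined as
$$f(x) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha,\beta)}, \quad 0 < x < 1,$$
where $B(\alpha,\beta)$ is the Beta function. A general Beta distribution rescales this density to the interval $(\mathit{min}, \mathit{max})$, and its mean, mode (for $\alpha, \beta > 1$) and variance are
$$\mu = \mathit{min} + (\mathit{max} - \mathit{min})\,\frac{\alpha}{\alpha+\beta}, \tag{1.1}$$
$$\mathit{mode} = \mathit{min} + (\mathit{max} - \mathit{min})\,\frac{\alpha-1}{\alpha+\beta-2}, \tag{1.2}$$
$$\sigma^2 = (\mathit{max} - \mathit{min})^2\,\frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}. \tag{1.3}$$
When the general Beta is used to fit distributions, the usual methods are of two main kinds: maximum likelihood estimation (MLE) and moment/quantile matching. The former requires every observation in the sample, whereas the latter requires only some pre-chosen sample statistics. When complete datasets (samples) are hard to acquire, moment/quantile matching is more feasible because of its less stringent requirement for sample information. Moreover, among all moments and quantiles, $\mu$, $\mathit{mode}$ and $\sigma^2$ are the most easily captured. In this case, any two of the above equations can be used to solve for $\alpha$ and $\beta$ by substituting $\hat{\mu}$, $\widehat{\mathit{mode}}$ and $\hat{\sigma}^2$ for $\mu$, $\mathit{mode}$ and $\sigma^2$ respectively.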
In the context of its original use, PERT tackles a triangular estimation problem, in which all data are categorised into three clusters, $\mathit{min}$, $\mathit{max}$ and $\mathit{mode}$. The motivation is to transform an uncertain problem into a relatively certain one by categorising or ranking, instead of processing continuous stochastic information. In most cases where PERT is implemented, decision makers want to obtain distributional information about the variables of concern from very simple point estimates rather than from continuous observations. For instance, if an insurer needs to know what an individual farmer's yield distribution looks like in the absence of the farmer's historical yield data, he could ask the farmer for $\widehat{\mathit{min}}$, $\widehat{\mathit{mode}}$ and $\widehat{\mathit{max}}$ and then conduct PERT fittings on them. In this case, the inputs for the PERT fitting are quite subjective, and the procedure implies that a continuous data set is, without loss of generality, regarded as a combination of three clusters and yields a sample group composed of only three observed values. Indeed, in practice, given $\widehat{\mathit{mode}}$, $\widehat{\mathit{min}}$ and $\widehat{\mathit{max}}$, and under the assumption that $\mathit{mode}$ occurs 4 times as often as either $\mathit{max}$ or $\mathit{min}$, it is often estimated that
$$\hat{\mu} = \frac{\widehat{\mathit{min}} + 4\,\widehat{\mathit{mode}} + \widehat{\mathit{max}}}{6}, \tag{1.4}$$
$$\hat{\sigma}^2 = \frac{(\widehat{\mathit{max}} - \widehat{\mathit{min}})^2}{36}. \tag{1.5}$$
Obviously, the above estimates depend closely on the assumption of "4 times", and they are exact if the population coincides with General Beta(4,4). The assumption can be explained more simply with two independently distributed events $X_{optimistic}$ and $X_{pessimistic}$ under the symmetric assumptions "mode happens 2 times as often as min" and "mode happens 2 times as often as max". Indeed, the constant 4 could be generalized as $k$ if verification is needed. In this case, the choice of $k$ becomes quite sensitive.
Moreover, if a general Beta distribution has identical $\mathit{mean}$ and $\mathit{mode}$, which indicates zero skewness and yields $\alpha = \beta$, the estimate (1.4) is unbiased regardless of $k$, whereas (1.5) is biased in many cases even if the distribution is unskewed. In addition, $\alpha \neq \beta$ can be used to conduct skewed fittings, for instance with $\alpha + \beta = k$ as suggested by Sasieni.
With the use of (1.2) and (1.4), or (1.2) and (1.5), the Beta parameters $\alpha$ and $\beta$ of the PERT distribution can be obtained. One should note that the original PERT is a very sophisticated process combining multiple phases, in which, in most cases, information about the variable of concern (elapsed time, for instance) is simply approximated by (1.4) and (1.5) (U.S. Dept. of the Navy (1958)) instead of by the fitted PERT distribution. In this research, I concentrate only on the PERT distribution and on nothing else related to the whole program evaluation and review procedure.
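As a rough illustration of how three point estimates turn into a fitted PERT distribution, the sketch below computes the classical PERT mean and variance estimates and the Beta shape parameters implied by a given mode. It assumes that the sum of the two shape parameters is fixed in advance (6 in the classical case); the function names and the `shape_sum` argument are illustrative choices, not part of any PERT standard.

```python
def pert_moments(lo, mode, hi):
    """Classical PERT point estimates: mean (lo + 4*mode + hi)/6 and variance (hi - lo)^2/36."""
    mean = (lo + 4.0 * mode + hi) / 6.0
    variance = (hi - lo) ** 2 / 36.0
    return mean, variance


def pert_shapes(lo, mode, hi, shape_sum=6.0):
    """Beta shape parameters (alpha, beta) on (lo, hi) with the given mode.

    Assumption: alpha + beta is fixed at shape_sum (hypothetical argument name).
    The mode of a Beta(alpha, beta) is (alpha - 1) / (alpha + beta - 2), which
    gives the two expressions below once the mode is rescaled to (0, 1).
    """
    if not lo < hi:
        raise ValueError("lo must be smaller than hi")
    if not lo <= mode <= hi:
        raise ValueError("mode must lie between lo and hi")
    t = (mode - lo) / (hi - lo)              # mode rescaled to (0, 1)
    alpha = 1.0 + (shape_sum - 2.0) * t
    beta = 1.0 + (shape_sum - 2.0) * (1.0 - t)
    return alpha, beta


def general_beta_mean(lo, hi, alpha, beta):
    """Mean of a general Beta distribution on (lo, hi)."""
    return lo + (hi - lo) * alpha / (alpha + beta)
```

With `shape_sum=6.0`, the mean of the fitted general Beta coincides with the classical PERT mean, which is one way to see why the two estimates are usually presented together.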

POPULATION ASSUMPTIONS
In actuarial and risk assessment practice, the most commonly treated data are generated from the class of non-negative random variables, for example life durations, farming yields and all sorts of claim payments. In this research, I use random samples drawn from predetermined parametric populations as representations of non-negative datasets in real-life applications. The reason is quite natural: one cannot rule out that a dataset comes from a parametric population, even without knowing exactly which distribution it is. The choice of population distribution therefore becomes essential in this context. I believe that a good choice must have at least the following properties:
1. It must be adequately representative of reality; its range, shape and other properties have to suit the research purposes.
2. It must be flexible enough to be adjusted to the multiple cases of concern.
3. It must be easy to operate; the relations between its parameters and its moments and quantiles should be as simple and clear as possible.
With these in mind, I find that the modified Gamma distribution might be the best choice, as it has all the desired properties. Moreover, although the Gamma distribution is always right skewed, a mirror image operation can be used to solve this problem, that is, $\min(x) + \max(x) - x$, where $x$ is a vector of samples from a right-skewed Gamma distribution. The converted samples have the same variance as the original ones but the opposite skewness. The general Beta distribution seems to be an alternative, but it is hard to operate because the relations between its moments and mode and its parameters are comparatively more complicated; besides, the general Beta distribution requires a predetermined fixed range, which may lack variability to some extent.
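The mirror image operation can be sketched in a few lines. The example below reflects a sample about the midpoint of its own range, so that the minimum and maximum are preserved, and uses a simple moment-based skewness to show the sign flip; the helper names are illustrative.

```python
import random
import statistics


def mirror_image(samples):
    """Reflect samples about the midpoint of their range: min + max - x."""
    lo, hi = min(samples), max(samples)
    return [lo + hi - x for x in samples]


def sample_skewness(samples):
    """Moment-based sample skewness m3 / m2^(3/2)."""
    n = len(samples)
    mean = statistics.fmean(samples)
    m2 = sum((x - mean) ** 2 for x in samples) / n
    m3 = sum((x - mean) ** 3 for x in samples) / n
    return m3 / m2 ** 1.5


if __name__ == "__main__":
    rng = random.Random(42)
    # random.gammavariate(shape, scale); shifted by 303 as in Gamma(2,25) + 303
    right_skewed = [303.0 + rng.gammavariate(2.0, 25.0) for _ in range(5000)]
    left_skewed = mirror_image(right_skewed)
    print(sample_skewness(right_skewed), sample_skewness(left_skewed))
```

Because the reflection is an affine map with slope −1, the variance is unchanged and every odd central moment changes sign.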
In general, a modified Gamma distribution $m + X$, where $m$ is a non-negative constant and $X$ is a Gamma distributed random variable with shape parameter $\gamma > 0$ and scale parameter $\theta > 0$, possesses the density function
$$f(x) = \frac{(x-m)^{\gamma-1}\, e^{-(x-m)/\theta}}{\Gamma(\gamma)\,\theta^{\gamma}}, \quad x > m.$$
In this research, I assume that the population follows a modified Gamma distribution, and I am concerned with the different scenarios generated from it by controlling the population mean, variance and skewness. Kurtosis is not considered because it would make the research much more cumbersome; other information about the population is ignored as well, as it is relatively less important than the moments of the first three orders.
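Controlling mean, variance and skewness pins down the three parameters of a shifted Gamma distribution, since its mean is shape × scale + shift, its variance is shape × scale², and its skewness is 2/√shape. The sketch below inverts these relations for a right-skewed target; the function name is illustrative.

```python
import math


def shifted_gamma_parameters(mean, variance, skewness):
    """Return (shape, scale, shift) of a shifted Gamma with the given moments.

    Uses skewness = 2 / sqrt(shape), variance = shape * scale^2 and
    mean = shape * scale + shift. Only positive skewness is handled here.
    """
    if variance <= 0.0:
        raise ValueError("variance must be positive")
    if skewness <= 0.0:
        raise ValueError("skewness must be positive for a Gamma population")
    shape = 4.0 / skewness ** 2
    scale = math.sqrt(variance / shape)
    shift = mean - shape * scale
    return shape, scale, shift
```

For example, a mean of 353, a variance of 1250 and a skewness of √2 give approximately shape 2, scale 25 and shift 303. A negative returned shift would signal that the requested moments are incompatible with a non-negative shift.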
VaR AND CTE
In an actuarial context, censored data, such as claim payments, are usually of primary concern to practitioners, because in many cases tail information is of greater importance than information about the center. In this research, I tackle both censored and uncensored data, which means that the data could be either realized losses or the risk sets from which losses arise. In the first case, the PERT method is simply applied to the loss data, whereas in the second case, data derived from risk sets containing potential losses are fitted. In the second case, some additional methods must be undertaken in order to separate actual losses from the populations. Hence, I consider two widely used risk measures, Value at Risk (VaR) and Conditional Tail Expectation (CTE), for this purpose.
Unlike the traditional definition on the right tail, I use a left-tail definition of VaR at significance level $a \in (0,1)$, together with the corresponding CTE, which is
$$\mathrm{VaR}_a(X) = \inf\{x : F_X(x) \ge a\}, \qquad \mathrm{CTE}_a(X) = E\left[X \mid X \le \mathrm{VaR}_a(X)\right].$$
With these definitions, if the population is regarded as a risk set with loss triggered in the percentiles of its left tail, VaR, as a threshold, can be used as an indication of the severity of this risk, because claims occur only at values less than it. Consequently, CTE represents the expected loss of the given population.
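For a finite sample, the left-tail measures can be approximated directly from order statistics. The sketch below assumes that the left-tail VaR is the empirical a-quantile and that CTE is the mean of the observations at or below it; this is only one of several possible empirical quantile conventions, and the function names are illustrative.

```python
import math


def left_tail_var(samples, a):
    """Empirical left-tail VaR: the smallest observation whose empirical CDF is at least a."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0.0 < a < 1.0:
        raise ValueError("a must lie in (0, 1)")
    ordered = sorted(samples)
    # Number of observations in the left tail; the small offset guards
    # against floating-point noise in a * n.
    k = max(1, math.ceil(a * len(ordered) - 1e-12))
    return ordered[k - 1]


def left_tail_cte(samples, a):
    """Empirical left-tail CTE: mean of the observations at or below the VaR."""
    threshold = left_tail_var(samples, a)
    tail = [x for x in samples if x <= threshold]
    return sum(tail) / len(tail)
```

With the yields 1, 2, ..., 100 and a = 0.05, the threshold is the fifth smallest value and the CTE is the average of the five smallest values.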

PERT FITTINGS
Population Distribution Fitting
Here I implement PERT and general Beta (GB) fittings on populations with modified Gamma distributions under different combinations of skewness and variance. The experiments use them to fit datasets composed of 10000 random samples drawn from a given modified Gamma population. As the value $\widehat{\mathit{mode}}$ has to be estimated from the samples (each value having an equal empirical frequency of 1/10000), it must be determined where the density reaches its peak. I use kernel density estimation (KDE) (Rosenblatt, M. [2], Parzen, E. [3]) to find it, with the help of some packages for R (a statistical software).
In KDE, the choice of kernel function and the choice of bandwidth both influence the result. According to Wand, M.P. and Jones, M.C. [4], the loss of efficiency in estimation is small across different kernel functions but large across different bandwidth selections. In the following sections, I therefore consistently use the Gaussian kernel for its convenience and usability. Meanwhile, there are many commonly used methods of optimal bandwidth selection, and their efficiencies are shown for a small-sample case with 60 samples drawn from a Gamma(1,50) population.
As shown in the figure, the method "ucv", which refers to unbiased cross-validation (Bowman, A.W. (1984), Hall, P. et al. (1992)), seems to be the best at capturing changes in the empirical density of a small sample group, and it is suitable for a wide variety of sample groups. Therefore, I use it to extract $\widehat{\mathit{mode}}$ in what follows. The empirical density estimated by KDE is a smoothed curve, in most cases without a simple closed-form expression, but it can be evaluated at any value within the range. I take 512 equally spaced points from (sample(min), sample(max)), obtain 512 estimates of the empirical density and get $\widehat{\mathit{mode}}$ by picking the point with the largest estimate. For PERT fitting, I use $\hat{\mu}$ derived from (1.4); combining it with (1.1) and (1.2), I have
$$\hat{\alpha} = 1 + 4\,\frac{\widehat{\mathit{mode}} - \widehat{\mathit{min}}}{\widehat{\mathit{max}} - \widehat{\mathit{min}}}, \qquad \hat{\beta} = 1 + 4\,\frac{\widehat{\mathit{max}} - \widehat{\mathit{mode}}}{\widehat{\mathit{max}} - \widehat{\mathit{min}}},$$
so that $\hat{\alpha} + \hat{\beta} = 6$. This is exactly the case where $k = 6$ as in Sasieni (1986), and it coincides with the method that many statistical software packages implement (@Risk etc.). The reason for not using (1.5) is, intuitively, that it causes double bias, as it is derived on the basis of the estimate (1.4).
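The grid-based mode extraction can be sketched as follows. This is a minimal pure-Python version with a Gaussian kernel and a 512-point grid between the sample minimum and maximum. As a simplification it uses Silverman's rule-of-thumb bandwidth rather than unbiased cross-validation, so it is a stand-in for the procedure described above and not a reproduction of it; the function names are illustrative.

```python
import math
import random
import statistics


def silverman_bandwidth(samples):
    """Rule-of-thumb bandwidth 0.9 * min(sd, IQR / 1.34) * n^(-1/5) (a stand-in for UCV)."""
    n = len(samples)
    sd = statistics.stdev(samples)
    q1, _, q3 = statistics.quantiles(samples, n=4)
    spread = min(sd, (q3 - q1) / 1.34)
    return 0.9 * spread * n ** (-0.2)


def kde_mode(samples, bandwidth=None, grid_size=512):
    """Grid point, among grid_size equally spaced points, with the largest Gaussian KDE value."""
    h = silverman_bandwidth(samples) if bandwidth is None else bandwidth
    if h <= 0.0:
        raise ValueError("bandwidth must be positive")
    lo, hi = min(samples), max(samples)
    step = (hi - lo) / (grid_size - 1)
    norm = 1.0 / (len(samples) * h * math.sqrt(2.0 * math.pi))
    best_x, best_density = lo, -1.0
    for i in range(grid_size):
        g = lo + i * step
        density = norm * sum(math.exp(-0.5 * ((g - x) / h) ** 2) for x in samples)
        if density > best_density:
            best_x, best_density = g, density
    return best_x
```

For a sample from Gamma(2,25) + 303, whose theoretical mode is 328, the estimate typically lands near that value, with some upward bias from smoothing an asymmetric density.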
For comparison, I also implement general Beta fitting and use (1.1) and (1.2) to solve for the parameters. In doing so, I assume that the population mean is known, or perfectly estimated with $\hat{\mu} = \mu$. The reason is quite natural: no matter how I modify the parameters of the PERT distribution, it is still a special form of the general Beta distribution. As a result, this sort of general Beta fitting may be regarded as the "best" case of PERT fittings, which to some extent allows robustness to be compared among different PERT methods. In addition, I assume that $\widehat{\mathit{min}}$ and $\widehat{\mathit{max}}$ equal $\mathit{min}$ and $\mathit{max}$ in all cases.
By controlling the shape $\gamma$, the scale $\theta$ and the shift $m$, I generate four populations, Gamma(2,25) + 303, Gamma(100,3.53), Gamma(2,10) + 333 and Gamma(100,1.41) + 212, which all have the same mean (353) but distinct skewness and variance, as representatives of four typical scenarios. Populations with negative skewness are omitted because the fittings are symmetric (the fit to a left-skewed population is the mirror image of the fit to the corresponding right-skewed one), as all the fittings are of Beta form.
The figure shows that in the right-skewed cases, GB fittings give much more accurate results than PERT, as they precisely capture the peaks and tails of the empirical distributions, while PERT fittings exhibit longer and heavier tails. Analogous results can be expected for left-skewed cases. In the unskewed cases, where the theoretical population skewness [...]
Theoretically, in GB distributions with $\alpha, \beta > 1$, I have
$$\mathit{min} < \mathit{mode} < \mu < \frac{\mathit{min} + \mathit{max}}{2} \ \text{(right skewed)}, \qquad \frac{\mathit{min} + \mathit{max}}{2} < \mu < \mathit{mode} < \mathit{max} \ \text{(left skewed)}.$$
These relations guarantee the positiveness (regularization properties) of the parameters $\alpha$ and $\beta$ in GB distributions. But when GB is used to conduct fittings, very often the $\widehat{\mathit{mode}}$ and $\hat{\mu}$ derived from datasets fail to follow those relations, leading to negative $\alpha$ and $\beta$ and hence to an invalid fit. The same analysis applies to modified Gamma fitting, which is widely used in actuarial practice. In modified Gamma fittings, the relations $\hat{\gamma}\hat{\theta} + \hat{m} = \hat{\mu}$ and $(\hat{\gamma} - 1)\hat{\theta} + \hat{m} = \widehat{\mathit{mode}}$ are used, which yield
$$\hat{\gamma} = \frac{\hat{\mu} - \hat{m}}{\hat{\mu} - \widehat{\mathit{mode}}}, \qquad \hat{\theta} = \hat{\mu} - \widehat{\mathit{mode}}$$
when $\hat{\gamma} \ge 1$. This guarantees the positiveness of these two parameters, but the fit may fail if the dataset does not satisfy the requirement that $\hat{\mu} > \widehat{\mathit{mode}}$. Besides, more information is needed to estimate $\hat{m}$.
It should be emphasized that these failures are unrelated to the size of the datasets and relate instead to the underlying parametric assumptions, so they cannot be eliminated as long as I still use the [...] The results show that, even with large and homogeneous datasets, any tiny change in $|\hat{\mu} - \widehat{\mathit{mode}}|$ is amplified in the estimates of the parameters $\alpha$ and $\beta$ of GB distributions, producing entirely different shapes, some of which may have a devastating impact in an actuarial context because they significantly underestimate the empirical tails. However, in highly skewed datasets, as $|\hat{\mu} - \widehat{\mathit{mode}}|$ increases, the sensitivity of $\alpha$ and $\beta$ to it declines, and GB fittings are therefore more robust in this circumstance. So far, all the results I have obtained indicate that PERT seems to prevail in population distribution fitting under limited information on $\hat{\mu}$ and $\widehat{\mathit{mode}}$, owing to its robustness and universality.
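The failure and amplification effects can be reproduced with a short sketch. It solves the general Beta mean and mode relations for the two shape parameters on a known range, and does the same for a shifted Gamma with a known shift. The closed forms below follow from the standard mean and mode formulas of the Beta and Gamma distributions; the function names are illustrative.

```python
def general_beta_from_mean_mode(lo, hi, mean, mode):
    """Solve the general Beta mean and mode relations for (alpha, beta).

    With a = (mean - lo)/(hi - lo) and t = (mode - lo)/(hi - lo), the relations
    a = alpha/(alpha + beta) and t = (alpha - 1)/(alpha + beta - 2) give
    alpha + beta = (1 - 2t)/(a - t).
    """
    a = (mean - lo) / (hi - lo)
    t = (mode - lo) / (hi - lo)
    if a == t:
        raise ValueError("mean equals mode: the shape sum is undetermined")
    shape_sum = (1.0 - 2.0 * t) / (a - t)
    alpha, beta = a * shape_sum, (1.0 - a) * shape_sum
    if alpha <= 0.0 or beta <= 0.0:
        raise ValueError("mean and mode imply non-positive shape parameters")
    return alpha, beta


def shifted_gamma_from_mean_mode(mean, mode, shift):
    """Solve mean = shape*scale + shift and mode = (shape - 1)*scale + shift."""
    if mean <= mode:
        raise ValueError("a right-skewed Gamma fit requires mean > mode")
    scale = mean - mode
    shape = (mean - shift) / scale
    return shape, scale
```

On the unit interval, a mean of 0.49 with a mode of 0.480 gives a shape sum of about 4, while moving the mode to 0.488 gives a shape sum of about 12: a change of 0.008 in the mode triples the concentration of the fitted distribution.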
Additionally, the impact of variance on the fittings is quite clear: a smaller variance shrinks the range of all empirical and fitted distributions without changing the pattern of their shapes or their relative positions. Consequently, in what follows, I also treat population variance as a controlled variable.

VaR and CTE Distribution Fittings.
In actuarial practice, if the task is to use PERT to fit a certain kind of population, for instance losses or claim payments, the result, though robust, may be relatively unacceptable, as seen above. But in other cases, where the population to be fitted is some kind of risk set, which means that I do not care about all of its information but only about the tail-related part, the situation is totally different. For now, I assume that loss is triggered at certain predetermined percentiles, which is indeed the case in many actuarial uses; then, without loss of generality, I can use the left-tail VaR and CTE defined previously to examine the behavior of the "severity of loss" and the "expected loss" in PERT fittings under different scenarios. The idea is first to construct some situations close to reality. For this purpose, I build a collection of 2000 independent risk sets (e.g., a group of 2000 individuals facing potential loss in the left tail) that have identical underlying population distributions (modified Gamma distributions), which implies that all risk sets in this collection are highly homogeneous. Then I use an unbiased Monte Carlo method (resetting the pseudo-random number generator each time samples are drawn, to avoid cycling in a long sequence) to draw 60 random samples from every risk set, which simulates the historical behavior of each individual. At this stage, the sample size of 60 is carefully selected because I want a sample group with significant variability, and at the same time I do not want the assumed underlying distribution to exert too much influence on the samples drawn from it. Ideally, the main purpose is to control the skewness and variance of the whole collection in a general manner, rather than to make each sample set display obvious features of the Gamma distribution.
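The sampling design can be sketched as below: a collection of independent risk sets, each drawn with its own freshly seeded generator from the same shifted Gamma population. Seeding each risk set with `base_seed + i` is an illustrative way to reset the generator between draws and is an assumption of this sketch, not necessarily the scheme used in the experiments.

```python
import random


def simulate_collection(shape, scale, shift, n_sets=2000, n_draws=60, base_seed=0):
    """Return n_sets independent risk sets of n_draws shifted-Gamma samples each."""
    collection = []
    for i in range(n_sets):
        rng = random.Random(base_seed + i)   # fresh generator state per risk set
        risk_set = [shift + rng.gammavariate(shape, scale) for _ in range(n_draws)]
        collection.append(risk_set)
    return collection
```

Calling `simulate_collection(2.0, 25.0, 303.0)` gives a 2000 × 60 collection for the Gamma(2,25) + 303 scenario, and the same arguments always reproduce the same collection.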
Then I apply PERT fittings to the 60 samples of every risk set; from these, together with some calculations, I obtain two sets of 2-tuples [...] (1996). Meanwhile, under the assumption that every sample set possesses a continuous density, CTE is equivalent to Tail Value at Risk (TVaR), which is
$$\mathrm{TVaR}_a(X) = \frac{1}{a}\int_0^a \mathrm{VaR}_u(X)\,du. \tag{2.5}$$
In order to reduce estimation error, I calculate CTE (TVaR) with a numerical integration technique based on a quadratic approximation procedure (Piessens, R. et al. [5]), combined with the quantile function estimation method above.
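The TVaR calculation amounts to averaging the quantile function over the left tail. The sketch below does this with composite Simpson's rule, which is a simple stand-in for the adaptive quadrature of Piessens et al. and not the same routine; it assumes that the quantile function is finite at 0, and the function names are illustrative.

```python
import math


def simpson(f, lo, hi, n_intervals=200):
    """Composite Simpson's rule for the integral of f over [lo, hi]."""
    if n_intervals % 2:
        n_intervals += 1                      # Simpson's rule needs an even count
    h = (hi - lo) / n_intervals
    total = f(lo) + f(hi)
    for i in range(1, n_intervals):
        total += (4.0 if i % 2 else 2.0) * f(lo + i * h)
    return total * h / 3.0


def left_tail_tvar(quantile, a, n_intervals=200):
    """Left-tail TVaR: (1/a) times the integral of the quantile function over (0, a)."""
    if not 0.0 < a < 1.0:
        raise ValueError("a must lie in (0, 1)")
    return simpson(quantile, 0.0, a, n_intervals) / a
```

For a unit exponential population, whose quantile function is −ln(1 − u), the integral has the closed form ((1 − a) ln(1 − a) + a)/a, which gives a convenient check of the numerical result.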
As in the previous section, I adjust the shape $\gamma$, the scale $\theta$ and the shift $m$ of the underlying modified Gamma populations to obtain scenarios with fixed variance and different skewness. I can calculate VaR and CTE in the left-skewed cases either with samples generated by the mirror image operation from right-skewed populations or simply by calculating them on the right tail of right-skewed populations. I also add some comparisons between $a = 0.05$ and $a = 0.15$.
The results show that, under all circumstances considered, PERT fittings of VaR and CTE tend to overestimate the empirical values, and compared with the population fittings, the fitted VaR and CTE distributions seem to be relatively closer to the empirical ones. This is due to the narrowing discrepancies, as VaR and CTE take information only from the left part of the original population distributions. In every scenario, PERT fittings of both VaR and CTE display a shape analogous to the empirical ones, with similar range and peak. Indeed, by (2.5), CTE is a direct result of VaR; consequently, its behavior is determined by the corresponding VaR fittings over $0 < u < a$.
The results also reveal that both the empirical and the PERT-fitted distributions of VaR and CTE tend to become more concentrated as skewness increases. Intuitively, as skewness decreases, the left-tail information obtained from every risk set within the collection, specifically under a small significance level $a$, is of greater variability, because samples gather less in the left tail when that tail is long and heavy. As a result, this causes a larger range and variance, with a lower peak and kurtosis, in both the empirical and the PERT-fitted distributions under the VaR and CTE risk measures.
However, in the cases where $\mathit{skewness} = 0.2$ and $\mathit{skewness} = -1.41$, PERT fittings of VaR and CTE tend to converge to the empirical ones as $a$ increases from 0.05 to 0.15, while they diverge in the case where $\mathit{skewness} = 1.41$. For detailed comparison, I illustrate the theoretical CDFs of modified Gamma populations together with those of the corresponding PERT distributions in the right-skewed, unskewed and left-skewed cases, generated from Gamma(2,25) + 303, Gamma(100,3.53) and the left-skewed version of Gamma(2,25) + 303 respectively. Note that the theoretical PERT distributions are established from a group of 100,000 samples drawn from the modified Gamma populations in order to guarantee a finite range. For this research, $a = 0.4$ is sufficient, so the result is shown in Figure 7 for $0 < a < 0.4$.
[...] Figure 8. I find that, on average, indeed $\Delta\mathit{max} \geqslant \Delta\mathit{min}$, and $\Delta\mathit{max}$, $\Delta\mathit{min}$ and $\Delta\widehat{\mathit{mode}}$ all approach 0 as $n$ increases. Besides, sample size has a slight systematic impact on $\widehat{\mathit{mode}}$ for populations with $\mathit{skewness} \neq 0$: the absolute sample skewness $|\widehat{\mathit{skewness}}|$ tends to be smaller than the absolute population skewness $|\mathit{skewness}|$ and gradually converges to it. Meanwhile, I need to note that $\widehat{\mathit{mode}}$ is extracted with the KDE method rather than taken from the empirical dataset, which is discrete. Moreover, I simulate a density comparison among four sample sets, which contain 60, 200, 400 and 600 samples respectively from a Gamma(2,25) + 303 population, and apply the corresponding PERT fittings to them; the results are shown in Figure 9 (note that the PERT densities are extracted from 2000 random samples from the corresponding fitted PERT distributions). The results coincide with the findings displayed in Figure 8, with an obvious pattern that samples from a small dataset gather less in both the right and left tails, while the peak ($\widehat{\mathit{mode}}$) deviates from the theoretical value and is larger than it.
This is quite understandable: with a small sample size, samples cluster insufficiently in the tails (the sample shows shorter tails and enlarged empirical densities in the tails), where the theoretical density $f(x)$ is very low (approaching 0), whereas they over-cluster on the right side of the theoretical $\mathit{mode}$, where the theoretical probability $\int_{\mathit{mode}}^{\mathit{max}} f(x)\,dx$ is larger than $\int_{\mathit{min}}^{\mathit{mode}} f(x)\,dx$.
Meanwhile, PERT fittings with small samples tend to be steeper, which will be discussed later. A careful look at the trends in Figure 9 suggests that the relative tail position, with PERT lying below the empirical distribution in both tails, might be the primary reason why PERT overestimates the empirical VaR and CTE in small sample sets.
(A similar trend can be found in low-skewed and left-skewed populations.) [...] (2.12)
Combined with (2.10) and (2.11), (2.12) implies that, in the cases $x_1 \in ($ [...]
Now I consider the impact of $\Delta n$ on $x_1$ and $x_2$. In fact, there are no explicit mathematical expressions relating the sample size $n$ to $x_1$ and $x_2$, so I cannot simply take partial derivatives, because the relations are not continuous. In general, I assume that $\mathit{min}$ and $\widehat{\mathit{mode}}$ are non-increasing functions of $n$, while $\mathit{max}$ is a non-decreasing function of $n$; hence $x_1$ and $x_2$ are functions of $n$. Moreover, for simplicity, I assume that $\widehat{\mathit{skewness}} \uparrow \mathit{skewness}$ as $n \to \infty$, which leads to $\widehat{\mathit{mode}} \downarrow \mathit{mode}$ as $n \to \infty$. Denote [...]
In order to determine whether this difference is positive or negative, I use a simple mathematical relation, which is, for $a, b, c, d > 0$, [...]
The truth is, I cannot rigorously prove which side is larger in these pairs, owing to the absence of explicit mathematical expressions. But from a practical point of view, I may set some safe and reasonable assumptions: if $0 < \Delta\mathit{min} < \Delta\mathit{max}$, there are the relations [...] (2.22)
In reality, (2.22) is a trivial case, while for a $\Delta n$ that makes both $\Delta\mathit{min} > 0$ and $\Delta\mathit{max} > 0$, which is indeed the case under small sample sizes, it is hard to compare the terms in the pairs of [...]. Simulations designed in the same way as above are conducted to examine these deductions, and the results are exhibited in Figure 10.
The results in Figure 10 do support my deductions: the trend described in the previous discussion is very clear under large population skewness, while some fluctuations occur in the low-skewness case, which is intuitively caused by the use of $\Delta n = 50$, which might be insufficient to extract the pattern when the population is nearly unskewed. In particular, I use a total differential form to express the impact of $\Delta n$ on the Beta parameter $\alpha$, which is $d\alpha = $ [...]; the general impact of $\Delta n$ on $\alpha$ would be vague.
This is because in this case the above discrepancies would all enlarge, and I cannot be certain which one is more significant. But I should also note that, for populations with long right tails, $x_1$ eventually reaches a very low level, thus widening the interval [...]; together with the fact that $x_1$ declines faster than $x_2$, $x_2$ gradually falls into this interval, and the case becomes (2.24).
So far, from the previous deductions, I believe that in general the Beta parameter $\alpha$ in PERT fittings tends to decrease as more samples are drawn. Moreover, this pattern still holds for other parametric underlying populations with finite range, and analogous analysis can be applied to unskewed or left-skewed populations. Again, I need to emphasize that the above mathematical deductions are not rigorous; I use them only as tools for explaining the phenomena of interest. Though rough, they are still useful in illustrating general patterns.
Some simulations are conducted to examine the deductions about the impact of $\Delta n$ on the Beta parameter $\alpha$. The experimental design of increasing the sample size of each risk set from 60 to 510 is used again, with 10 distinct skewness values varying from 1.41 to 0.14. The results shown in Figure 11 provide solid evidence for my analysis: $\alpha$ decreases with $\Delta n$ at a decelerating speed.
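A reduced version of this experiment can be sketched as follows. To isolate the effect of the widening sample range, the sketch plugs in the theoretical mode of the population instead of a KDE estimate, which is a simplification relative to the design described above; it then averages the first PERT shape parameter, 1 + 4(mode − min)/(max − min), over many risk sets for several sample sizes. The function names and default arguments are illustrative.

```python
import random
import statistics


def pert_alpha(lo, mode, hi):
    """First PERT shape parameter when the two shape parameters sum to 6."""
    return 1.0 + 4.0 * (mode - lo) / (hi - lo)


def mean_pert_alpha(n, shape=2.0, scale=25.0, shift=303.0, n_sets=200, seed=0):
    """Average fitted alpha over n_sets risk sets of size n from a shifted Gamma.

    Simplification: the theoretical mode is used instead of an estimated one,
    so only the sample minimum and maximum vary with n.
    """
    mode = shift + (shape - 1.0) * scale
    rng = random.Random(seed)
    alphas = []
    for _ in range(n_sets):
        xs = [shift + rng.gammavariate(shape, scale) for _ in range(n)]
        alphas.append(pert_alpha(min(xs), mode, max(xs)))
    return statistics.fmean(alphas)
```

Comparing `mean_pert_alpha(60)`, `mean_pert_alpha(210)` and `mean_pert_alpha(510)` shows the average value falling as the sample size grows, since the sample maximum of a right-skewed population keeps moving outward while the mode stays put.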
For PERT distributions with $\alpha + \beta = 6$, excess kurtosis is a decreasing function of $\alpha$ for $\alpha \leqslant 3$ (right-skewed cases) and an increasing function of $\alpha$ for $\alpha > 3$ (left-skewed cases), with its minimum at $\alpha = \beta = 3$. Therefore, in right-skewed cases, a lower $\alpha$ gives the PERT distribution heavier tails (technically, more outliers). In this case, the fitted VaR and CTE increase as more samples are drawn and $\alpha$ decreases simultaneously. On the other hand, the empirical distribution converges to the theoretical one at the same time. In the previous result (Figure 9), I find that the empirical densities drop in the tails and become more concentrated around $\widehat{\mathit{mode}}$, hence leading the empirical VaR and CTE to increase as well. Therefore, there is no absolute and certain trend in the dynamics of PERT fittings relative to the empirical VaR and CTE as the sample size grows. In fact, as the sample size approaches $\infty$, the relative position of an underlying population and the corresponding PERT fitting is determined by their theoretical levels (e.g., Figure 7).
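The dependence of excess kurtosis on the first shape parameter can be checked numerically with the standard closed form for the Beta distribution. The sketch below tabulates it along the line where the two shape parameters sum to 6; the function names are illustrative.

```python
def beta_excess_kurtosis(alpha, beta):
    """Excess kurtosis of a Beta(alpha, beta) distribution (standard closed form)."""
    s = alpha + beta
    numerator = 6.0 * ((alpha - beta) ** 2 * (s + 1.0) - alpha * beta * (s + 2.0))
    denominator = alpha * beta * (s + 2.0) * (s + 3.0)
    return numerator / denominator


def pert_kurtosis_table(alphas, shape_sum=6.0):
    """Pairs (alpha, excess kurtosis) for Beta(alpha, shape_sum - alpha)."""
    return [(a, beta_excess_kurtosis(a, shape_sum - a)) for a in alphas]
```

For instance, `pert_kurtosis_table([1.5, 2.0, 2.5, 3.0])` starts slightly above zero at 1.5 and falls to −2/3 at the symmetric case, and the values for 3.5, 4.0 and 4.5 mirror those for 2.5, 2.0 and 1.5 because the formula is symmetric in the two parameters.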
However, the most important property is that PERT fitting of the assumed modified Gamma population under the VaR and CTE risk measures is very robust with small sample sizes. I run some simulations (Figure 12) to illustrate this robustness on a 60 × 2000 data collection under the population assumptions Gamma(2,25) + 303, Gamma(100,3.53) and a mirror image version of Gamma(2,25) + 303, which have skewness 1.41, 0.2 and −1.41 respectively. Moreover, I establish two sample statistics, "Integrated Discrepancy" and "Overestimated Ratio" [...]
The results in Figure 12 reveal that, at all population skewness levels, PERT fittings of VaR and CTE overestimate the empirical values for significance levels varying from 0.01 to 0.4. The general dynamic behavior of the discrepancies between empirical and fitted values accords with the theoretical properties (Figure 7). Moreover, the discrepancies grow as population skewness decreases, which is natural because the left-tail difference between the empirical and PERT distributions is significantly enlarged as population skewness descends. In fact, these results provide very valuable knowledge, particularly in risk assessment and actuarial contexts, because PERT overestimates the risk rather than underestimating it. More importantly, PERT fittings on small samples can be better than those on large samples in the sense of robustness: in this case, PERT always overestimates, however the population skewness varies. This pattern, entirely different from the theoretical cases (Figure 7), could be regarded as a "skewness-free property".
Furthermore, all of the experiments are conducted on several assumed modified Gamma distributions, so I want to know whether the properties I find hold for a wider range of population distributions. In fact, for any given right-skewed population distribution, if the sample size is relatively small, then, as described previously, samples cluster insufficiently in the left tail, which makes the empirical left-tail density larger than that of PERT, with PERT showing a steep shape in this case (the Beta parameter $\alpha$ is relatively large). Therefore, for any underlying population with arbitrary skewness, as long as the sample size is relatively small, there must be an interval of significance levels $(u_0, u_1)$ such that, if $a$ lies in it, the fitted $\mathrm{VaR}_a$ and $\mathrm{CTE}_a$ are in general larger than the empirical $\mathrm{VaR}_a$ and $\mathrm{CTE}_a$, where the interval $(u_0, u_1)$ depends on the underlying population distribution and the sample size.
In addition, the assumption in PERT that $\alpha + \beta = k$ affects the general shape of PERT distributions, with a larger $k$ generating denser sample clusters around $\mathit{mode}$ and thus enlarging the kurtosis. However, the choice of $k$ is quite subjective and often relies on the user's attitude towards the extent of concentration of the variable of concern. So far, there is no standardized method for adjusting this assumption. In practice, the common way is to ask the individual risk takers for their views and use them to moderate the assumption.
CONCLUDING REMARKS
1. The PERT method is very useful in distribution fitting, with very loose requirements for sample information. The underlying assumption of it guarantees its