Several formulas can be used to calculate limits of agreement. The simple formula given in the previous paragraph works well for sample sizes greater than 60.[14] The formulas for estimating Krippendorff's alpha are shown in Additional file 1; for more details, we refer to the work of Krippendorff [27]. Gwet [1] indicates that Krippendorff's alpha is similar to Fleiss' K, especially if no values are missing. The difference between the two measures lies in how the expected agreement is defined: for Fleiss' K the sample size is treated as infinite, whereas for Krippendorff's alpha the actual sample size is used.

Krippendorff's alpha[16][17] is a versatile statistic that assesses the agreement among observers who categorize, evaluate, or measure a given set of objects in terms of the values of a variable. It generalizes several specialized agreement coefficients: it accepts any number of observers, is applicable to nominal, ordinal, interval, and ratio levels of measurement, can handle missing data, and is corrected for small sample sizes.

Another way to conduct reliability testing is to use the intraclass correlation coefficient (ICC).
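Returning to the comparison of Fleiss' K and Krippendorff's alpha above, the following is a minimal sketch of how the two definitions of expected agreement differ. It assumes nominal categories, no missing values, and the same number of raters for every item. The function names and the toy rating table are illustrative assumptions, not taken from the cited works or from Additional file 1.

```python
import numpy as np


def observed_agreement(counts):
    """Mean proportion of agreeing rater pairs per item.

    counts[i, j] = number of raters who assigned item i to category j.
    Assumes every item was rated by the same number of raters (no missing data).
    """
    counts = np.asarray(counts, dtype=float)
    m = counts.sum(axis=1)[0]  # raters per item (assumed constant)
    per_item = (np.sum(counts ** 2, axis=1) - m) / (m * (m - 1))
    return per_item.mean()


def fleiss_kappa(counts):
    """Fleiss' K: expected agreement from category proportions,
    i.e. as if pairs were drawn from an infinite sample."""
    counts = np.asarray(counts, dtype=float)
    p_o = observed_agreement(counts)
    p = counts.sum(axis=0) / counts.sum()
    p_e = np.sum(p ** 2)
    return (p_o - p_e) / (1 - p_e)


def krippendorff_alpha_nominal(counts):
    """Krippendorff's alpha (nominal, complete data): expected agreement
    from pairs drawn without replacement from the actual ratings."""
    counts = np.asarray(counts, dtype=float)
    p_o = observed_agreement(counts)
    n_c = counts.sum(axis=0)  # total ratings per category
    n = counts.sum()          # total number of ratings
    p_e = np.sum(n_c * (n_c - 1)) / (n * (n - 1))
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy data: 4 items, 3 raters, 2 categories.
ratings = np.array([[3, 0], [0, 3], [2, 1], [3, 0]])
print(round(fleiss_kappa(ratings), 4))
print(round(krippendorff_alpha_nominal(ratings), 4))
```

For this small table the two coefficients are 0.625 and about 0.656. If the same table is repeated many times, so that the total number of ratings grows, the two values converge, which is consistent with the infinite-sample versus actual-sample distinction.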

There are several types of ICC,[12] and one is defined as “the proportion of variance of an observation due to between-subject variability in the true scores.”[13] The ICC can range between 0.0 and 1.0 (an early definition of the ICC could range between −1 and +1). The ICC will be high when there is little variation between the scores given to each item by the raters, e.g. if all raters give identical or similar scores to each of the items. The ICC is an improvement over Pearson's r and Spearman's ρ, as it takes into account the differences in ratings for individual segments as well as the correlation between raters. For rxx, we used two different measures of reliability: (1) the rICC obtained in our study population and (2) the test-retest reliability (Bockmann and Kiese-Himmel, 2006), a value that comes from a larger and representative population and reflects the characteristics of the ELAN rather than those of our sample.
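To illustrate why the ICC is sensitive to differences between raters in a way that Pearson's r is not, the following is a minimal sketch using a one-way random-effects ICC computed from ANOVA mean squares. This is only one of the several ICC types mentioned above and is chosen here as an assumption; it is not necessarily the variant used in the study. The function name and the example scores are hypothetical.

```python
import numpy as np


def icc_oneway(scores):
    """One-way random-effects ICC for single ratings.

    scores[i, j] = score given to item i by rater j (no missing values).
    ICC = (MSB - MSW) / (MSB + (k - 1) * MSW), where MSB is the
    between-item mean square and MSW the within-item mean square.
    """
    x = np.asarray(scores, dtype=float)
    n, k = x.shape
    item_means = x.mean(axis=1)
    grand_mean = x.mean()
    msb = k * np.sum((item_means - grand_mean) ** 2) / (n - 1)
    msw = np.sum((x - item_means[:, None]) ** 2) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


# Hypothetical data: rater B always scores 2 points higher than rater A.
rater_a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
rater_b = rater_a + 2.0

identical = np.column_stack([rater_a, rater_a])
shifted = np.column_stack([rater_a, rater_b])

pearson_shifted = np.corrcoef(rater_a, rater_b)[0, 1]
print(round(pearson_shifted, 4))        # correlation ignores the constant offset
print(round(icc_oneway(identical), 4))  # identical scores
print(round(icc_oneway(shifted), 4))    # offset lowers the ICC
```

In this toy example the shifted ratings are perfectly correlated, yet the ICC drops to 3/7 (about 0.43) because the raters disagree systematically on every item. An estimate computed this way can also fall below zero in a sample when the within-item mean square exceeds the between-item mean square.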