Tasters’ performance in a coffee quality contest in Brazil



INTRODUCTION
The commercial value of coffee is influenced by its quality. Beverage quality is evaluated by professional tasters, who perform sensory analysis to score coffee quality attributes according to a chosen protocol (D'Alessandro, 2015; Gutierrez; Barrera, 2015; Pereira et al., 2017; Pereira et al., 2018).
Protocols such as those established by the Specialty Coffee Association - SCA (Specialty Coffee Association - SCA, 2015) and the Cup of Excellence - COE (Alliance for Coffee Excellence - ACE, 2020) are used for commercial, research, and contest purposes. Protocols and related guidelines and standards also cover coffee sample preparation (e.g. roasting and grinding grade, sample amount, and water temperature and volume), tasting room preparation (e.g. free of interfering smell, wind, and noise), and taster capacity building.
Nevertheless, taster judgement may be biased by personal preferences, individual perception of quality, temporarily low sensory perception, lack of calibration, prior information about the sample, undue interaction with other panel members, and other factors (Di Donfrancesco; Guzman; Chambers, 2014; Pereira et al., 2017).
The reliability of sensory analysis is associated with the homogeneity of scores among tasters, which seems to be more important than the number of tasters (Ferreira et al., 2018). Coffee quality contests are usually judged by a group of tasters, but the ideal number of tasters to evaluate coffee sensory quality is unknown. A small panel may be inaccurate, while a large one may be more expensive without a corresponding increase in accuracy. The number may therefore vary depending on the objectives of the contest and its feasibility.
For instance, the objectives of the annual Minas Gerais Coffee Quality Contest are to distinguish the farmers who produce the best coffees in the state, promote added value, and publicize Minas Gerais coffees, but also to train tasters in the sensory analysis of specialty coffees (Empresa de Assistência Técnica e Extensão Rural do Estado de Minas Gerais - EMATER, 2021). The contest employs a large number of tasters, which allows the composition of its panel to be studied statistically. The repeatability coefficient (Cruz; Regazzi; Carneiro, 2012) can be used to study the number of tasters, and Tocher cluster analysis (Cruz; Ferreira; Pessoni, 2011) can be used to study the reliability of the coffee sensory panel.
Thus, the objective of this work was to evaluate the performance of coffee tasters in five annual editions of the Minas Gerais Coffee Quality Contest.
of Sul de Minas (IF Sul de Minas), by the Federal University of Lavras (UFLA) and by the Support, Teaching, Research and Extension Foundation (FAEPE), in the years 2013, 2014, 2015, 2016 and 2018. Data for the year 2017 were not available.
The contest received samples of Arabica coffee from growers in the municipalities of the state of Minas Gerais, produced in the current year, in two categories: natural and peeled cherry. Natural coffee (dry method processing) is the form of preparation in which the freshly harvested coffee, after the washing/separation process, is taken to the terrace to dry in the sun and/or to the dryer, without removing the skin of the fruit.
The peeled cherry coffee category (wet method processing), now called CD, includes peeled cherry coffee, for which the fruits are washed and passed through a peeler that separates the green fruits from the ripe ones before drying. This category also includes pulped and/or demucilaged cherry coffee, for which the fruits are washed and passed through a peeler that separates the green and ripe fruits, then taken to a fermentation tank or passed through a demucilager, and finally dried.
The sensory evaluation forms from the final phase of the Minas Gerais Coffee Quality Contest for the years 2013, 2014, 2015, 2016 and 2018 were used; this phase brought together the best coffees. The previous phases consisted of a physical classification step and a sensory classification step. The first phase, physical classification, was eliminatory: only type 2 coffees according to the Official Brazilian Coffee Classification Table were accepted, sieve 16 and above, with maximum leakage of 5% and humidity between 10% and 12%. In the next phase, the coffees were submitted to sensory analysis, and samples scoring below 80 points were disqualified. The finalist samples used in this study were the coffees selected in that sensory analysis. The number of samples and the number of tasters varied among the years studied and are given in Table 1, together with the sensory protocol used, which is described later.
The characteristics related to the organoleptic patterns of the beverage were evaluated in each sample through the "cupping test". Sensory analysis was performed by tasters skilled in specialty coffees (EMATER, 2021). All samples were coded, so that the professionals had no information on the samples evaluated. Each taster performed one determination per sample, each sample being made up of five cups.
In 2013, 2014 and 2015 the Cup of Excellence (COE) evaluation protocol (ACE, 2020) was used, in which the following attributes are evaluated: clean cup, sweetness, acidity, body, flavor, aftertaste, balance and overall. In this methodology, each sample has a starting score of 36 points, to which the scores for each attribute (0 to 8 points) are added to make up the final score. In 2016 and 2018, the Specialty Coffee Association (SCA) protocol was used, in which scores from 0 to 10 points are given for the attributes fragrance/aroma, uniformity, clean cup, sweetness, flavor, acidity, aftertaste, body, balance and overall; the sum of the scores of all attributes constitutes the final score.
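As an illustration of the two scoring rules just described, the following minimal sketch computes a final score under each protocol. The function names and the dictionary layout are hypothetical and chosen only for this example; the attribute lists, the 36-point COE starting score and the score ranges come from the text.

```python
# Attribute lists as described in the text; the names below are illustrative only.
COE_ATTRIBUTES = ("clean cup", "sweetness", "acidity", "body",
                  "flavor", "aftertaste", "balance", "overall")
SCA_ATTRIBUTES = ("fragrance/aroma", "uniformity", "clean cup", "sweetness",
                  "flavor", "acidity", "aftertaste", "body", "balance", "overall")


def _check(scores, attributes, upper):
    """Make sure every attribute is present and within 0..upper."""
    for name in attributes:
        if name not in scores:
            raise ValueError(f"missing attribute: {name}")
        if not 0 <= scores[name] <= upper:
            raise ValueError(f"{name} must be between 0 and {upper}")


def coe_final_score(scores):
    """COE: starting score of 36 plus eight attributes scored 0 to 8."""
    _check(scores, COE_ATTRIBUTES, 8)
    return 36 + sum(scores[name] for name in COE_ATTRIBUTES)


def sca_final_score(scores):
    """SCA: sum of ten attributes scored 0 to 10."""
    _check(scores, SCA_ATTRIBUTES, 10)
    return sum(scores[name] for name in SCA_ATTRIBUTES)


# Hypothetical score sheets, not data from the contest.
coe_example = coe_final_score({name: 6 for name in COE_ATTRIBUTES})
sca_example = sca_final_score({name: 8.5 for name in SCA_ATTRIBUTES})
```

Under both rules the maximum final score is 100 points (36 + 8 × 8 for COE, 10 × 10 for SCA), which is why an 80-point threshold can be applied to either protocol.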
The experimental design was a completely randomized design (DIC). For the repeatability analysis, the treatments (the finalist coffees in the contest) were evaluated with the tasters as repetitions, in each of the five years of the contest. Classified data were used, that is, data obtained from a prior classification of the original scores, and these were processed to calculate the repeatability coefficient. The repeatability coefficients (r) were estimated by three methods: analysis of variance (ANOVA), in which the temporary effect of the environment is removed from the error; principal components (CP), based on the correlation matrices; and structural analysis (AE), based on the intraclass correlation matrices. The number of measurements, that is, of tasters, needed to predict the real value of individuals for pre-established determination coefficients (R²) of 0.80, 0.85, 0.90 and 0.95 was obtained according to the methodology described by Cruz, Regazzi and Carneiro (2012).
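The ANOVA-based estimate can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure implemented in the Genes software: it assumes a complete samples × tasters table of scores, treats the taster effect as the "temporary environment" effect that is removed from the error, and estimates r as the ratio of the between-sample variance to the sum of the between-sample and residual variances. The function name and data layout are hypothetical.

```python
def anova_repeatability(scores):
    """Estimate r from a two-way layout without replication.

    scores[i][j] is the score of sample i given by taster j
    (a hypothetical layout chosen for this sketch).
    """
    p = len(scores)       # number of samples (treatments)
    n = len(scores[0])    # number of tasters (repeated measurements)
    grand = sum(map(sum, scores)) / (p * n)
    sample_means = [sum(row) / n for row in scores]
    taster_means = [sum(row[j] for row in scores) / p for j in range(n)]

    # Mean square between samples.
    ms_samples = n * sum((m - grand) ** 2 for m in sample_means) / (p - 1)

    # Residual mean square, after removing sample and taster effects.
    ss_resid = sum(
        (scores[i][j] - sample_means[i] - taster_means[j] + grand) ** 2
        for i in range(p)
        for j in range(n)
    )
    ms_resid = ss_resid / ((p - 1) * (n - 1))

    # Variance between samples and the resulting repeatability coefficient.
    var_samples = (ms_samples - ms_resid) / n
    return var_samples / (var_samples + ms_resid)


# Two tasters who rank three samples identically, differing only by a constant.
r_agree = anova_repeatability([[80, 81], [85, 86], [90, 91]])
# Two tasters whose disagreement is as large as the differences between samples.
r_noise = anova_repeatability([[1, 2], [3, 2]])
```

In the first toy table the tasters differ only by a constant offset, so the residual is zero and r equals 1; in the second the between-sample mean square equals the residual mean square and r equals 0. Real score sheets fall between these extremes.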
Euclidean distance matrices between the tasters were determined for each of the five years of the contest. For the years in which the COE protocol was used, all attributes of the beverage were included, in addition to the final score. For the last two years studied, 2016 and 2018, under the SCA protocol, three attributes (clean cup, sweetness and uniformity) were excluded because of their low variation between tasters. These distance matrices were used as the measure of dissimilarity for the cluster analysis of the tasters.
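A distance matrix of this kind can be sketched as below. The sketch assumes that each taster is represented by a flat list of scores (for example, every retained attribute of every sample, in the same order for all tasters) and that plain, unstandardized Euclidean distance is used; whether the original analysis standardized the scores or averaged the distance is not stated in the text. The function name is hypothetical.

```python
import math


def taster_distance_matrix(profiles):
    """Pairwise Euclidean distances between tasters.

    profiles[t] is the list of scores given by taster t, in the same
    order for every taster (hypothetical layout for this sketch).
    """
    n = len(profiles)
    return [
        [math.dist(profiles[a], profiles[b]) for b in range(n)]
        for a in range(n)
    ]


# Toy profiles with two scores per taster, not contest data.
toy_distances = taster_distance_matrix([[0, 0], [3, 4], [0, 0]])
```

Tasters with identical score sheets are at distance zero, and the matrix is symmetric, so only the upper triangle needs to be inspected.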
The Tocher optimization method was used to study the similarity in the evaluation of the tasters, by forming groups of tasters who were more similar to each other in their evaluation of the coffees. In this method, the set of tasters was divided, for each year of the contest, into non-empty and mutually exclusive groups, adopting the criterion that the average of the dissimilarity measures within each group must be smaller than the average distances between any groups (Cruz; Ferreira; Pessoni, 2011).
To visualize graphically the differences in the sensory analysis of coffees between the groups of tasters obtained by the Tocher optimization method, sensory profiles of the groups were constructed for each year studied. The scores of the sensory attributes were plotted on radar diagrams with a single graphic scale, using the average of the scores of the sensory attributes for each group of tasters. In the years with the COE protocol, all eight attributes were used; in the years with the SCA protocol, the attributes clean cup, sweetness and uniformity were disregarded in the construction of the profiles. The sensory profiles were drawn using Microsoft® Office Excel© software.
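The Tocher grouping described above can be illustrated with a simplified sketch. It follows a common textbook formulation and may differ in detail from the Genes implementation: the limit θ is assumed to be the largest of each taster's nearest-neighbour distances, a group is seeded with the closest remaining pair, and a taster joins a group while its average distance to the group members does not exceed θ. Treating all remaining tasters as single-member groups when their closest pair exceeds θ is also an assumption of this sketch.

```python
def tocher_groups(dist):
    """Simplified Tocher-style grouping on a symmetric distance matrix."""
    n = len(dist)
    if n < 2:
        return [list(range(n))]

    # Assumed limit: the largest of the nearest-neighbour distances.
    theta = max(
        min(dist[i][j] for j in range(n) if j != i) for i in range(n)
    )

    remaining = set(range(n))
    groups = []
    while remaining:
        if len(remaining) == 1:
            groups.append([remaining.pop()])
            break

        # Seed a new group with the closest pair still ungrouped.
        d, i, j = min(
            (dist[a][b], a, b)
            for a in remaining for b in remaining if a < b
        )
        if d > theta:
            # No remaining pair is close enough: each forms its own group.
            groups.extend([k] for k in sorted(remaining))
            break

        group = [i, j]
        remaining -= {i, j}
        while remaining:
            # Candidate with the smallest average distance to the group.
            avg, k = min(
                (sum(dist[k][m] for m in group) / len(group), k)
                for k in remaining
            )
            if avg > theta:
                break
            group.append(k)
            remaining.remove(k)
        groups.append(sorted(group))
    return groups


# Toy example: three similar "tasters" and one outlier on a line.
points = [0, 1, 2, 10]
toy = [[abs(a - b) for b in points] for a in points]
toy_groups = tocher_groups(toy)
```

In the toy example the three close items end up in one group and the outlier forms a group of its own, which mirrors the pattern of one large group plus single-taster groups reported for the contest.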
Statistical analyses were performed in the Genes software (Cruz, 2013).

RESULTS
Table 2 shows the repeatability coefficients for the first three years of study, under the COE protocol, which ranged from 0.2268 to 0.5981. In 2013, the coefficients ranged from 0.2268, for the final score, to 0.5207, for the clean cup attribute; the former was the lowest coefficient observed among the years evaluated with the COE protocol. In 2014, the coefficients varied from 0.4032, for body, to 0.5981, for balance. In 2015, the coefficients ranged from 0.2781, for the sweetness attribute, to 0.5546, for the balance attribute.

Table 3 shows the repeatability coefficients for the years 2016 and 2018, under the SCA protocol. In 2016, the coefficients ranged from 0.1987, for the body attribute, to 0.9459, for the sweetness and uniformity attributes; in 2018 they ranged from 0.2825, for the final score, to 0.9371, for the sweetness attribute. The repeatability coefficient for the body attribute in 2016, obtained with the analysis of variance estimation method, was the lowest among all the years studied. Table 4 shows the number of tasters needed for different determination coefficients, under the COE protocol.
In 2013, for 80% reliability, between four and fourteen tasters were needed, depending on the attribute and the estimation method used (Table 4). In 2014, for the same level of reliability, between three and six tasters were needed across the sensory attributes and estimation methods. In 2015, this range was between four and eleven tasters.
To estimate the final score in 2013, between thirteen and fourteen tasters were required. This was the only year studied in which the number of tasters needed to achieve 80% reliability exceeded the number of tasters used. This result indicates that the tasters were less calibrated and disagreed in their scoring of the finalist coffees.
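The link between a repeatability coefficient and the number of tasters can be checked with a short calculation. The sketch below uses the usual relation from repeatability analysis, n0 = R²(1 − r) / [(1 − R²) r], which is assumed here to be the one applied in the methodology cited in the text; the function name is hypothetical. The value r = 0.2268 is the lowest coefficient reported for the 2013 final score.

```python
import math


def tasters_needed(r, r2):
    """Number of measurements needed for a target determination coefficient.

    r  : repeatability coefficient (0 < r < 1)
    r2 : desired determination coefficient (0 < r2 < 1)
    """
    return r2 * (1 - r) / ((1 - r2) * r)


# Lowest coefficient reported for the 2013 final score, at 80% reliability.
n0_2013 = tasters_needed(0.2268, 0.80)
print(round(n0_2013, 2), math.ceil(n0_2013))  # 13.64 14
```

Rounded up, this gives fourteen tasters, consistent with the upper end of the thirteen to fourteen reported above and above the twelve tasters used in 2013. Because n0 grows quickly as r falls, small differences in agreement between tasters translate into large differences in the panel size required.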
In 2014, for 80% reliability, four tasters were needed to determine the final score. In 2015, under the same requirement, between eight and nine tasters were required.
Among the attributes of the beverage, those that required the fewest tasters in 2013 were clean cup, overall and body. In 2015, clean cup and overall were again among the attributes requiring fewer tasters, as was balance. In 2014, as previously mentioned, the tasters were more uniform in their evaluation, and between three and six tasters were required for all attributes.

Table 3: Estimates of repeatability coefficients and respective determination coefficients (in percentages, in parentheses) of tasters for the sensory attributes of coffees evaluated in the contest in 2016 and 2018, using the SCA protocol.

Table 5 shows the number of tasters required, for different levels of reliability, in 2016 and 2018, under the SCA protocol. In 2016, depending on the sensory attribute and the estimation method, one to seventeen tasters were required for 80% reliability. Under the same condition, in 2018, the range was narrower, between one and eleven tasters.
In 2016, six tasters were needed for the final score at 80% reliability. In 2018, for the same level of reliability, between ten and eleven tasters were needed to determine the final score.
For the other attributes of the beverage, a single taster was sufficient for uniformity, clean cup and sweetness in both years under the SCA protocol. Among the remaining attributes, the largest numbers of tasters were required for body and overall in 2016, and for acidity and body in 2018.
Euclidean distances were used as the measure of dissimilarity and the Tocher optimization method for cluster analysis, which formed groups among the tasters in each of the five years of study, as shown in Table 6. In the first three years, under the COE protocol, most tasters evaluated the coffees in a similar way: four groups were formed in 2013 and in 2015, and three groups in 2014. In each of these three years, most tasters were gathered in a single group, and each of the other groups was formed by only one taster. Under the SCA protocol, two groups were formed in 2016 and four groups in 2018 (Table 6). In 2016, eighteen of the nineteen tasters evaluated the coffees in a similar way and stayed in the same group; only taster 5 differed in the evaluation of the coffees. In 2018, the tasters also showed similarity in their assessment: eleven tasters were gathered in the same group, and the three remaining tasters each formed a separate group.
The sensory profiles of the groups of tasters formed by the Tocher grouping in the first three years of study, under the COE protocol, are shown in Figure 1. These profiles make it possible to visualize the main differences in evaluation between the groups of tasters, for all the attributes of the beverage.
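The values plotted in such profiles are group averages of the attribute scores. A minimal sketch of that averaging step is given below; the function name, the taster identifiers and the scores are hypothetical and serve only to show the calculation, and the plotting itself is left out.

```python
def group_profiles(taster_means, groups):
    """Average attribute scores over the tasters of each group.

    taster_means[t][attribute] is the mean score given by taster t;
    groups is a list of lists of taster identifiers.
    """
    profiles = []
    for members in groups:
        attributes = taster_means[members[0]].keys()
        profiles.append({
            a: sum(taster_means[t][a] for t in members) / len(members)
            for a in attributes
        })
    return profiles


# Hypothetical mean scores for three tasters and two attributes.
example_means = {
    1: {"body": 6.0, "flavor": 7.0},
    2: {"body": 7.0, "flavor": 6.0},
    3: {"body": 5.0, "flavor": 5.0},
}
example_profiles = group_profiles(example_means, [[1, 2], [3]])
```

Each resulting dictionary corresponds to one polygon on a radar diagram; a group formed by a single taster simply reproduces that taster's own mean scores.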
In 2013, group 1 presented a more balanced sensory profile among the scores of the evaluated attributes, resulting from the average evaluation of nine of the twelve tasters studied (Figure 1A). Group 2, composed of taster 3, presented a less balanced sensory profile, with larger differences in the evaluation of the body attribute, for which it gave, on average, the lowest scores among all groups. In group 3, composed of taster 4, the sensory profile stood out for the highest average score for the overall attribute. Group 4, which represents taster 11, presented a profile with higher average scores for the attributes body, acidity and sweetness.
In 2014, group 1, formed by the ten of twelve tasters who were more convergent in their evaluation, and group 3, composed of taster 8, presented sensory profiles balanced between the attributes, with the largest differences between them in clean cup and flavor, for which the taster in group 3 assigned lower scores (Figure 1B). The sensory profile of group 2, composed of taster 1, was less balanced than the others, with the lowest scores for the attributes clean cup, sweetness, acidity, body and balance, and the highest score for aftertaste.
In 2015, group 1, formed by the eight tasters who converged most in the sensory evaluation of coffee, also presented a balanced profile, although with higher scores for sweetness and acidity and the highest score for flavor in relation to the other groups (Figure 1C). Groups 2 (taster 1), 3 (taster 10) and 4 (taster 11) had less balanced profiles. Group 2 gave the highest score for clean cup and the lowest score for body, in comparison with the others. Group 3 gave the lowest average scores for clean cup, sweetness and acidity, but the highest score for overall. The profile of group 4 stood out for the highest average scores for balance, aftertaste and body.
Figure 2 shows the sensory profiles of the assessments made by the groups of tasters in 2016 and 2018, formed by the Tocher optimization method, under the SCA protocol.
In 2016, group 1 presented a balanced sensory profile, as did group 2; however, taster 5, who represents group 2, assigned higher scores for the balance attribute than for the other attributes (Figure 2A). Group 2 assigned higher average scores for all attributes compared to group 1.
In 2018, group 1, formed by the eleven tasters who were more convergent in the sensory evaluation of coffee, and group 2, constituted by taster 1, presented the most balanced profiles between the scores of the attributes. The main differences between the two were the higher scores in group 1 for aroma, acidity, body and flavor and its lower scores for balance and overall. Group 3, formed only by taster 2, presented the highest scores for all the attributes of the beverage, especially for aroma. Group 4, composed of taster 13, presented a sensory profile with some similarities to group 2 for the aftertaste, flavor, aroma and body scores; however, its higher scores for the overall and balance attributes distinguished the two groups.

DISCUSSION
The highest repeatability coefficients in this study were obtained under the SCA protocol, for the attributes sweetness, uniformity and clean cup (Table 3). This is due to the way these attributes are evaluated. The other seven attributes (fragrance/aroma, flavor, acidity, body, balance, aftertaste and overall) are evaluated subjectively, according to their quality, whereas for these three the taster makes an objective assessment, granting 2 points for each cup that is normal for the characteristic. Thus, there is less variation between tasters in the scores of these attributes.
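The effect of this counting rule on the spread of scores can be illustrated with a small sketch. It assumes the five cups per sample mentioned in the methods and the 2 points per normal cup stated above; the function name and the example panel are hypothetical.

```python
from statistics import pvariance

CUPS_PER_SAMPLE = 5        # each sample was made up of five cups
POINTS_PER_NORMAL_CUP = 2  # points granted per cup that is normal


def objective_attribute_score(normal_cups):
    """Score for uniformity, clean cup or sweetness under the counting rule."""
    if not 0 <= normal_cups <= CUPS_PER_SAMPLE:
        raise ValueError("normal_cups must be between 0 and 5")
    return POINTS_PER_NORMAL_CUP * normal_cups


# Hypothetical panel of nineteen tasters who all find five normal cups.
panel_scores = [objective_attribute_score(5) for _ in range(19)]
panel_variance = pvariance(panel_scores)
```

When every taster counts the same number of normal cups, as is likely for finalist coffees, all scores coincide and the variance between tasters is zero. With only six possible values (0, 2, 4, 6, 8 or 10), there is little room for tasters to disagree, in line with the low variation and the single taster required for these attributes.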
The lowest coefficients with the use of the SCA protocol were obtained for the body attribute, in the year 2016 and for the final score, in the year 2018 (Table 3). The lowest coefficients of repeatability with the COE protocol were obtained for the final score in 2013 and for sweetness in 2015 (Table 2).
In the year 2014, with the use of twelve tasters, all the repeatability coefficients reached values above 0.4, which may indicate that the tasters were more trained or calibrated in the evaluation of coffee, since the same number of tasters was employed in the previous year (2013), with different results.
Regarding the estimation method, the highest repeatability coefficients, for all attributes and the final score, were obtained with the principal component method. Other authors obtained similar results when studying the repeatability of characteristics in other species (Bergo et al., 2013; Negreiros et al., 2014). The analysis of variance method provided the lowest repeatability coefficients in all the years studied (Tables 2 and 3), corroborating results obtained by Negreiros et al. (2014). Bergo et al. (2013) considered values above 0.4 for the repeatability coefficient to be reliable; in this study, however, coefficients of lower magnitude were observed in four of the five years analyzed for some attributes of the coffee and even for its final score (Tables 2 and 3). Despite the repeatability coefficients considered low, the determination coefficients obtained were higher than 80% for all coffee attributes in all years studied, except for the final score in 2013, which ranged from 77.87 to 79.95%, depending on the estimation method used.
In general, the determination coefficient ranged from 77.87 to 94.7% in the years studied. These results indicate that the number of tasters used gives good reliability in expressing the real sensory quality of coffee in the contest, under both the COE and the SCA protocols. In addition, for Negreiros et al. (2014), the definition of the ideal determination coefficient should take into account, in addition to the minimum reliability expected in the data, the availability of resources and labor for evaluations. As there are no other reference studies on the minimum accuracy required for the number of tasters, 80% was considered a good level in this study.
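The way a low repeatability coefficient can still yield a determination coefficient near 80% follows from the number of tasters averaged. The sketch below uses the usual relation R² = ηr / [1 + (η − 1)r], where η is the number of tasters; it is assumed, not confirmed, that this is the expression used in the original analysis, and the function name is hypothetical.

```python
def determination_coefficient(r, n_tasters):
    """Reliability of the mean of n_tasters scores, given repeatability r."""
    return n_tasters * r / (1 + (n_tasters - 1) * r)


# 2013 final score: lowest reported r and the twelve tasters used that year.
r2_2013 = determination_coefficient(0.2268, 12)

# The same r with a single taster returns r itself.
r2_single = determination_coefficient(0.2268, 1)
```

With r = 0.2268 and twelve tasters the result is approximately 0.779, close to the 77.87% reported as the lower bound for the 2013 final score. The relation also shows diminishing returns: each additional taster raises R² by less than the previous one, which is why larger panels cost more without a proportional gain in reliability.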
The tasters were less uniform, and consequently more tasters were needed, for the attributes flavor and balance in 2013 and sweetness and flavor in 2015. Flavor is a complex attribute, defined as a mixed experience of olfactory, taste and tactile sensations perceived during tasting (Carvalho et al., 2016; Teixeira, 2009). The complexity of this definition may be the reason for the greater variation in the scores given by the tasters.
Regarding the number of tasters, there is in general a large variation in the number needed among the sensory attributes of coffee, regardless of the estimation method used (Tables 4 and 5). This variation may be due to the greater or lesser ability of the tasters in evaluating the different attributes, in addition to their level of experience. Ferreira et al. (2018) observed variability in the scores given by three different tasters for different attributes of the beverage. They recommend that, before the attributes are analyzed for any technical purpose, a descriptive analysis of the data be performed to verify whether the scores of at least one taster in at least one attribute need to be eliminated. According to these authors, the same tasters need not always be considered for the different attributes of the beverage.
There was also a wide variation in the number of tasters needed between the years studied (Tables 4 and 5). In 2014, 2015, 2016 and 2018, it would be possible to reduce the number of tasters in the evaluation of the contest's coffees, without loss of reliability, considering the level of 80%. In 2013, however, for the same level of reliability, between 13 and 14 tasters were needed in the final grade evaluation, which is higher than the original number, 12.
The maximum number of tasters required, that is, the highest number required among all attributes in a given year for 80% reliability, tended to increase with the number of tasters used. In 2016, when the largest panel was studied (nineteen tasters), seventeen tasters were needed for 80% reliability. In 2013 and 2014, with twelve tasters studied, a maximum of fourteen and six tasters were required, respectively. In 2015, with eleven tasters evaluated, the maximum number required was eleven. In the last year evaluated, 2018, with fourteen tasters, eleven were needed. In each case this maximum is also the highest among the estimation methods used, which varied among themselves.
Other studies have pointed to numbers of tasters different from those found in this study. Pereira et al. (2018) indicated that six tasters are sufficient to perform the sensory analysis; in their study, ten tasters evaluated 20 samples of Arabica coffee with a minimum score of 80 points, following the SCA and BSCA protocols for Arabica coffees. According to these authors, the modeling applied in their study makes it possible to recommend a minimum number of tasters for those conditions; however, they stated that the approach is limited to their study and therefore suggested that six or more tasters be used in scientific studies and in routine tests for marketing purposes.
Without defining an ideal number of tasters for the sensory analysis of coffee, other studies used smaller numbers. Silveira et al. (2015) used three tasters to assess the sensory quality of coffee at different altitudes, faces of sun exposure and fruit colors. Ribeiro et al. (2016) used four trained and qualified professional tasters, certified as judges of specialty coffees, to study the association between chemical descriptors of the bean and the sensory quality of the coffee beverage, as an expression of the interactions between genotype, environment and coffee processing. Tolessa et al. (2016) used three professional tasters to analyze the interactive effects of altitude, shading levels and harvest periods on the quality of Ethiopian coffee.
In the sensory analysis of coffee, according to Di Donfrancesco, Guzman and Chambers (2014), the use of expert tasters presents some problems, such as the influence of external factors and the change in an individual's perceptual skills through illness and other factors. For Pereira et al. (2018), the taster tends to prefer a sensory profile over others or even according to commercial and industrial standards, in order to meet the demand of certain customers. These factors may affect the evaluation of the tasters.
In addition, other aspects that were not evaluated in this study may interfere with the performance of tasters in sensory analyses, such as the interaction between them during the analyses. Pereira et al. (2017) observed distortions in the performance of coffee tasters when there was interaction between them. They concluded that conversations, comments and noise interfered with the sensory analysis, decreasing the quality of the tasters' judgment of the levels of the attributes of the coffee beverage, probably through loss of concentration.
Considering that there are factors that can affect the performance of tasters in the sensory analysis, it is important to assess the consistency or similarity in the assessment of the tasters, not just the number of these professionals, for a more reliable assessment. Chambers, Bowers and Dayton (1981) emphasized that, in addition to the minimum number of tasters, it is necessary to study the consistency of who is carrying out the analysis. For these authors, a panel of three members, trained and experienced, presented a lower residual mean square than a semitrained panel, composed of eight members, for a study of sensory analysis of birds, which demonstrates, according to them, that the consistency between the tasters must be respected and observed. Ferreira et al. (2018) evaluated whether three trained tasters could constitute the minimum number of tasters to ensure the credibility of the sensory analysis of coffee. These authors concluded that, regardless of the number of tasters used, the reliability of the scores is related to the variability, and the lower the variability of the scores in the same situation studied for coffees, the greater the reliability on them. According to the authors, the reliability is directly related to the training and the technical capacity of the taster, less related to the number of tasters and more to the homogeneity of the scores attributed by each taster for the same evaluation conditions.
In this study, the convergence or similarity in the assessment observed for most of the tasters, in all years of study (Table 6), can be explained by the fact that only skilled tasters were used. For Pereira et al. (2018), the use of professionals such as Q-graders and R-graders (professionals in the tasting and classification of coffees who receive a worldwide certification linked to the Coffee Quality Institute - CQI, for arabica and robusta coffee, respectively) is necessary, since these professionals are previously trained to carry out such activities. The authors also affirm that it is necessary to demand more veracity from the sensory analysis of coffee.
In the cluster analysis of the tasters, the majority of the tasters evaluated in the five years of study were grouped together, which indicates that they evaluated the coffees in a similar way. However, our results also indicate that the number of tasters in the contest cannot be drastically and randomly reduced, since the estimated minimum number varied over the years. Improving the calibration between tasters would probably improve their consistency over the years, making it possible to establish a small fixed number of tasters for the contest. Calibration is usually achieved through independent training courses for specific cupping protocols or through the adoption of a calibration step for the tasters prior to the contest.

CONCLUSIONS
The tasters' performance in five annual editions of Minas Gerais Coffee Quality Contest is reliable using COE or SCA sensory analysis protocols.
Although not fully calibrated, most tasters are grouped with similar cupping results.
Unless efficient calibration prior to the contest is adopted, the number of tasters to be used in the next contest editions cannot be drastically and randomly reduced, since the estimated minimum number varied over the years.
Calibration activities are suggested to improve two main aspects of the Minas Gerais Coffee Quality Contest: distinguishing the best coffees and training tasters.