What lies in between: Levallois, discoid and intermediate methods

: Lithic artefacts are usually associated with the different knapping methods used in their production. Flakes exhibit metric and technological features representative of the flaking method used to detach them. However, lithic production is a dynamic process in which discrete methods can be blurred


Introduction and background
The Middle Palaeolithic in western Europe is characterised by the increase in and diversification of prepared core knapping methods, resulting in flake-dominated assemblages possible to alternate the roles of the percussion and exploitation surfaces; (3) the peripheral convexity of the debitage surface is managed to control lateral and distal extractions, thus allowing for a certain degree of predetermination; (4) striking surfaces are oriented so that the core edge is perpendicular to the predetermined products; (5) the planes of extraction of the products are secant; (6) the technique employed is direct percussion with a hard hammer. Technological analyses of Middle Palaeolithic assemblages have gradually led to the identification of a variability of modalities within discoid core knapping (Bourguignon & Turq 2003;Locht 2003;Terradas 2003;Peresani 2003). This has resulted in sensu stricto and a sensu lato conceptualisations of the discoid knapping system (Faivre et al. 2017;Mourre 2003;Thiébaut 2013). The sensu stricto method closely corresponds to the definition established by Boëda (1993) described above, for which core edge flakes and pseudo-Levallois points are the most common products. The sensu lato discoid encompasses a greater range of products (although centripetal flakes are more common) as a result of higher variability in the organisation of percussion and exploitation surfaces (Terradas 2003).
One of the variants of the sensu lato discoid conceptualisation resembles the Levallois knapping strategy (Figure 1). Several common characteristics have been defined for this method: (1) the core volume is conceived as two hierarchical asymmetric surfaces: the percussion surface and the exploitation surface (this feature is shared with the Levallois). (2) Preparation of the percussion surface is absent or partial, without encompassing the complete periphery of the core. This may be because the characteristics of the raw material present an adequate morphology or because it requires minimal preparation. (3) Despite the hierarchical nature of the two surfaces, flakes detached from the debitage surface present a secant relationship towards the plane of intersection. Soriano and Villa (2017) point out that Levallois products usually present an external platform angle (EPA) of between 80º and 85º, while products from non-Levallois hierarchical methods present an EPA relationship of between 70º and 85º. However, this relationship can change over the course of the core's reduction with the final flakes being sub-parallel to the plane of fracture (Slimak 2003). (4) Products from the hierarchical discoid method tend to be symmetrical towards the knapping direction and thin, and the ventral and dorsal surfaces present a subparallel relationship. Again, these are common traits in Levallois products as well. Strategies evidenced at several sites fit into the discoid knapping variation described above, and its resemblance to the Levallois has been previously noted in several Middle and early Middle Palaeolithic assemblages (Casanova i Martí et al. 2009;2014;Jaubert 1993;Peresani 1998;Slimak 2003;Soriano & Villa 2017). However, it is important to consider that a variety of different terms have been employed due to the broad geographical and chronological span of the method (Figure 2). For Middle Palaeolithic sites, the identification of this method usually focuses on its shared features with the Levallois and discoid methods and thus, its intermediate nature. Jaubert (1993), at Mauran, noted the hierarchical nature of the production system and its resemblance to exhausted recurrent centripetal Levallois cores. However, he pointed out that the secant planes of detachment were not as parallel as in the Levallois. Slimak (1998;, at Champ Grand, also noted the similarities of residual cores with recurrent centripetal Levallois debitage. At Estret de Tragó, Casanova i Martí et al. (2009;2014) noted the presence of products and knapping methods which shared features of the Levallois and discoid methods, and proposed including the hierarchical discoid and Levallois recurrent centripetal strategies within a single hierarchical bifacial centripetal class. Peresani (1998), at Fumane cave, documented the presence of debitage products with features (reduced thickness, debitage angle, and centripetal organisation of scars also subparallel to the ventral surface) which would correspond to parallel-plane debitage. Baena et al. (2005) indicated the presence of the hierarchical discoid method throughout the Esquilleu cave sequence as a secondary and primary knapping method. As mentioned earlier, for Middle Palaeolithic sites, the identity of this knapping method is focused on its intermediate nature, as something between the discoid method and the recurrent centripetal Levallois. However, at early Middle Palaeolithic sites, the identification of this method usually focuses on its shared features with the Levallois, as a method that preceded the gradual emergence of the Levallois. For example, Carmignani et al. (2017) used the term "centripetal with parallel planes", noting the resemblance of some features to those of the Levallois at the sites of Payre and Bau de l'Aubesier. Soriano & Villa (2017), at Guado San Nicola, used the term "non-Levallois debitage with hierarchised surfaces", noting its similarity to the Levallois, but also its distinctiveness, due to the absence of preparation and management of convexities and the lack of platform preparation. For Cuesta de la Bajada, Santonja et al. (2014) differentiate between Levallois cores and cores of centripetal character with preferential surfaces (which are simply referred to as centripetal). De Lombera-Hermida et al. (2020) record the presence of "predetermined hierarchical centripetal" cores in different raw materials in TD10.1 at Atapuerca, and note the differences to and coexistence with classical discoid and Levallois knapping methods. The use of terms alluding to the hierarchical and centripetal nature of this method is not limited to European sites; the term "hierarchical bifacial centripetal" was used to refer to the Oldowan industries of Peninj (de la Torre et al. 2003).
The technological criteria for identifying production systems are clearly visible on cores. However, a high degree of uncertainty remains when examining flakes, as the technological criteria are less evident or absent (Hovers 2009: 20-22). Another underlying cause is the fluid and continuous nature of lithic knapping and reduction which can blur differences between apparently clear-cut methods. Additionally, variations can result from adaptations to raw materials (different knapping methods applied to different raw materials in the same archaeological level), the initial morphology of the raw material, the knappers' expertise and stage in the learning process or circumstantial events and restrictions. This adds up to an absence of discrete categories that are blurred by the underlying existence of a wide variety of lithic expressions. The attribution of flakes to a knapping method tends to be based on observed features compared with experimental collections, the context given from the knapping methods observed in the cores of the assemblage (although cores in an assemblage do not necessarily represent all knapping methods), and, importantly, the experience of the analyst. Bisson (2000), Hovers (2009: 20-22), and Perpère (1986; have illustrated that using a dual classification category of Levallois or non-Levallois results in 30% disagreement among researchers. However, it is important to note that Perpère's work (1989;1986) anticipates Boëda's volumetric and technological criteria (Boëda 1994: 254-258;Boëda 1995;Boëda et al. 1990), which have contributed to decreasing the degree of disagreement in the identification of the Levallois. Another example of classification discrepancy comes from the type counts conducted by Dibble and Tuffreau at the Biache Saint-Bass site, in which Levallois flakes represented 46.08% of the assemblage in one list and 27.29% in the other (Dibble 1995).
Thus, a strong argument can be made for the difficulty of attributing knapping methods to flakes. This difficulty increases due to the existence of intermediate methods such as the hierarchical discoid, whose technological characteristics fall between the sensu stricto discoid and the Levallois. This work uses products experimentally knapped using the Levallois, discoid, and hierarchical discoid methods and a machine learning approach to address this problem. Our results provide insight into the classification accuracy for each knapping method, variable importance, and the directions of confusion between methods.

Methods
An attribute analysis was performed on an experimental assemblage of flakes detached by means of the Levallois, discoid and hierarchical discoid reduction sequences. Supervised machine learning models were trained on the resulting dataset to identify the knapping method by which each flake was detached.

Experimental assemblage
The experimental assemblage consists of 222 experimentally knapped products resulting from Levallois, discoid and hierarchical discoid reduction sequences ( Figure 3 and Figure 4). It was knapped by two of the authors (JB and GBP), one of the whom (JB) is considered a highlevel expert, while the second (GBP) is considered an intermediate-high knapper. The Levallois (Boëda 1993;1994: 254-258;1995b;Boëda et al. 1990) products belonged to the preferential and recurrent centripetal modalities. The hierarchical discoid cores were knapped in accordance with previous technological descriptions, with no platform preparation, with centripetal scar direction (although some opportunistic preferential flakes were obtained), and without deliberate management of convexities. The discoid (Boëda 1993;Mourre 2003;Terradas 2003;Thiébaut 2013) products belonged to the sensu stricto conceptualisation in which both surfaces are bifacially knapped. With the Levallois and hierarchical discoid methods, only products from the debitage surface were included (flakes detached from the percussion surface were excluded from analysis). Because discoid knapping alternates the role of the percussion and debitage surfaces, all products were included. In all three cases, products from all stages of reduction were included (decortication, management of convexities, full production). To be sure of the surface to which each flake belonged and the completeness of each sequence, products were separated accordingly during the knapping sequence and, when that was not possible, cores and flakes were refitted. Only complete products were included, and all of the knapping sequences were carried out on different types of flint.
As previously noted (Eren et al. 2008), the discoid system was the most productive, generating the highest number of flakes with the lowest number of cores. The flake counts for the Levallois and hierarchical discoid methods were similar per core and lower than those from the discoid cores (Table 1). This was expected since only flakes from the production surface were included. It is important to note that this resulted in a very well-balanced dataset in which the sum of the products from each of the three classes was close to 33.33%. The distribution of the amount of cortex was similar across the knapping methods studied ( Figure 5). In all cases, the sum of non-cortical flakes and flakes with residual cortex were similarly distributed among the three knapping strategies and made up the majority of flakes (near 60% in all three categories). A visual exploratory analysis ( Figure 6) of the assemblage using a scatter plot with Bagolini's categories (Bagolini 1968) showed that most of the flakes fall into the "normal" and "big" size categories, with hierarchical discoid flakes being slightly larger than flakes produced by means of discoid or Levallois reduction sequences. Additionally, most of the flakes from all three methods fall into the categories of "elongated flakes", "normal flakes" and "wide flakes" in terms of their length to width ratios. (2023)

Attribute analysis
Attribute analyses were performed for each flake. The variables and indexes listed below were recorded and calculated. Measurements were recorded in mm. Some of the measurements recorded were used in the calculation of additional variables. For example, angle height and maximum flake length were used to calculate flake curvature; width measurements (at 25% and 75% length) were used to calculate the angle of the lateral margins; and thickness measurements were used to calculate the average and standard deviation of thickness. Other variables, such as technological length and width (or the above-mentioned width measurements) were also used to calculate ratios (the elongation and carination index), but their use would introduce a size dependence factor in the training of the models. Therefore, the variables employed in the machine learning model training are underlined. The variables considered were: Degree of fracture (Hiscock 2002): complete, proximal fragment, mesial fragment, distal fragment, marginal fragment and longitudinal fragment. Only complete flakes were further analysed.
Technological length: measured along the axis perpendicular to the striking platform (Andrefsky 2005: 99).
Technological width: measured along the axis perpendicular to the technological width (Andrefsky 2005: 99).
Width measured at 25% of the length of the flake (  Platform surface: calculated using previous geometric morphologies and measurements of x, y and z as in Muller & Clarkson (2016). Platform depth: which can correspond to the measurement of y or z depending on the geometric morphology of the platform (Figure 8).
Platform width: which can correspond to the measurement of x or y depending on the geometric morphology of the platform (Figure 8). (2023)   Profile of the platform: absence of platform, concave, straight, convex, biangular and chapeau de gendarme.

Journal of Lithic Studies
Preparation of the platform: no platform, cortical, plain, dihedral, a pans and prepared (facetted).
External platform angle (EPA) measured in degrees with a manual goniometer.
Internal platform angle (IPA) measured in degrees with a manual goniometer.
Relative amount of cortex present on the dorsal face: recorded according to its extension on the dorsal surface of the flake (Andrefsky 2005: 104): cortical (100% cortex), more than 50% of cortex (excluding cortical flakes), less than 50% of cortex (excluding the following categories), residual presence (<10% and excluding non-cortical flakes), and no cortex.
Cortex location: slightly modified from Marwick (2008) with categories being primary (dorsal surface completely covered by cortex), crescent (cortex located in one of the laterals usually with a crescent shaped form) distal (located at the opposite end of the percussion platform), central (inner part of the flake), and tertiary (no cortex).
Number of dorsal scars longer than 5 mm. Number of longitudinal ridges: individual ridges starting at the platform and reaching the distal part of the flake.
Termination type of the flake: feather, hinge, step or plunging in accordance with Cotterell & Kamminga (1987).
Simplified technological category of flakes: flake, backed flake (including core edge flake), and pseudo-Levallois point.
Weight in grams (precision of 0.01). Elongation index: the quotient of technological length divided by technological width (Laplace 1968).
Carination index: the quotient of length or width (the one with the lowest value) divided by the maximum thicknesses (Laplace 1968).
Angle of the lateral margins: describes the relationship between the two lateral edges of a flake. Flakes with expanding edges (wider in the distal part) will have negative values, contracting or pointed (wider in the proximal part than in the distal part) artefacts will have positive values, and rectangular artefacts (similar proximal and distal width) will have values near 0 (Clarkson 2007: 37). The angle of the lateral margins is calculated by multiplying by two the value of the reverse tangent of proximal width minus distal with, and dividing the result by two times the maximum length (Clarkson 2007: 38).
Flake curvature: measured in degrees and calculated as described in Andrefsky (2005: 109), as shown in Figure 7. Flakes presenting a 180º curvature are perfectly straight flakes, while decreasing values indicate an increase in curvature. (2023)  Surface area of the flake: the product of flake technological width multiplied by its length (Dibble 1989).

Journal of Lithic Studies
Ratio of surface area against thickness: surface area of the flake divided by average thickness.
Ratio of surface area against platform surface: the quotient of the surface area divided by platform area.

Machine learning models
The following ten machine learning models were tested for their ability to differentiate between flakes extracted from the three knapping methods: Linear discriminant analysis (LDA): reduces dimensionality for the purpose of maximising the separation between classes while decision boundaries divide the predictor range into regions (Fisher 1936;James et al. 2013: 142).
K-nearest neighbour (KNN): classifies cases by assigning the class of similar known cases. The k in KNN refers to the number of cases (neighbours) to consider when assigning a class, which must be found by testing different values. Given that KNN uses distance metrics to compute nearest neighbours and that each variable is in different scales, the data must be scaled and centred prior to fitting the model (Cover & Hart 1967;Lantz 2019: 67).
Random forest: made of decision trees. Each tree is grown from a random sample of the data and variables, allowing for each tree to grow differently and to better reflect the complexity of the data (Breiman 2001).
Generalized boosted model (Greenwell et al. 2019;Ridgeway 2007): implements the gradient boosted machine (Friedman 2001; making it possible to detect learning deficiencies and increase model accuracy for a set of random forests. Supported vector machines (SVM): fit hyperplanes into a multidimensional space with the objective of creating homogeneous partitions (Cortes & Vapnik 1995;Frey & Slate 1991). The present study tests SVM with linear, radial and polynomial kernels.
Artificial neural network (ANN): with multi-layer perception, it uses a series of hidden layers and error backpropagation for model training (Rumelhart et al. 1986).

Machine learning evaluation
All models were evaluated by means of a 10x50 k-fold cross validation (10 folds and 50 cycles). Accuracy and area under the curve (AUC) were used as measurements of overall performance for the selection of the best model. Accuracy measures the model's correct predictions against the overall cases provided to the model and can range from 0 to 1. The accuracy of a random classifier is 1/k (k being the number of classes). For the present study, the random classifier had an accuracy of 0.33.
The AUC is a measure of performance derived from the receiver operating characteristic (ROC) curve. The ROC curve is used to evaluate the ratio of detected true positives while avoiding false positives (Bradley 1997;Spackman 1989). The ROC curve makes it possible to visually analyse model performance and calculate the AUC, which ranges from 1 (perfect classifier) to 0.5 (random classifier). The ROC and AUC are commonly applied in two-class problems and their extension to multiclass problems is usually done through pairwise analysis.
In the case of the AUC, two groups of values are obtained: first, each class obtains an AUC value using a one vs all approach; second, a general AUC value of model performance is obtained from the average of each class AUC (Hand & Till 2001;Robin et al. 2011). In the case of the ROC curve, individual curves of each class are plotted using the previously mentioned one vs all approach. The present study tested 10 different models with a three-class classification problem which would involve a total of 30 different ROC curves (10 per class in each model). In this paper, we have provided only the three ROC curves of the best model. The AUC ranges of values are usually interpreted as follows: 1 to 0.9: outstanding; 0.9 to 0.8: excellent or good; 0.8 to 0.7: acceptable or fair; 0.7 to 0.6: poor; and 0.6 to 0.5: no discrimination (Lantz 2019: 213). When analysing lithic materials, the use of thresholds to guarantee true positives and avoid false positives is of special interest. The use of decision thresholds and derived measures of accuracy (ROC curve and AUC) can be especially useful in lithic analysis since it is expected that products from initial reduction stages are morphologically similar, independent of the knapping method. These products are expected to show a greater mixture between methods and have lower probability values. The use of thresholds better indicates the accuracy of a model considering these probability values.
This study was conducted using an R version 4.  (Venables & Ripley 2002). The random forest was trained using the ranger v.0.14.1 package (Wright & Ziegler 2017). The SVM was trained using the e1071 v.1.7.12 package (Karatzoglou et al. 2006;2004). The RSNNS v.0.4.14 (Stuttgart Neural Network Simulator) package (Bergmeir & Benítez 2012) was used to train the multi-layer ANN with backpropagation. The k-fold cross validation of all models, precision metrics, and confusion matrix were obtained using the caret v.6.0.93 package (Kuhn 2008). Machine learning models also provide insights into variable importance for classification. The caret package was used to extract variable importance after each k-fold cross validation (see supplementary information). ROC curves and AUC values are obtained using the package pROC v.1.18.0 (Robin et al., 2011).
All data, results, code, AI models and the complete workflow are available as supplementary materials and freely accessible as a Zenodo repository (https://doi.org/10.5281/zenodo.7486776)

Data results
All of the models presented a general accuracy of above 0.6 and far above the random classifier (0.33). The overall model performance ( Table 2) indicated that SVM with polynomial kernel provided the best classification metrics for both accuracy and AUC. SVM with polynomial kernel had the best overall accuracy value (0.667) and the best AUC (0.824), an excellent or good value. The SVM with polynomial kernel accuracy value was closely followed by the random forest (0.666) and logistic regression (0.662). The gradient boosting machine model and ANN presented the lowest accuracy values (0.613 and 0.614, respectively). The random forest also presented the second highest AUC value (0.816) followed by SVM with radial kernel (0.815), both values were excellent or good. Again, ANN and the generalised boosted model presented the lowest AUC values (0.772 and 0.795 respectively), although both values can be considered acceptable or fair. Classification metrics per class (Table 3), confusion matrix and ROC along with AUCs of SVM with polynomial kernel exhibited marked differences between knapping methods (Figure 9). In all cases, classification metric values were higher than the no-information ratio (prevalence) of each class (0.329 for Levallois, 0.347 for discoid and 0.324 for hierarchical discoid). In general, SVM with polynomial kernel best identified products obtained from Levallois preferential and recurrent centripetal reduction sequences. This resulted in a notably high value of sensitivity (0.783) along with a high value of general performance (F1 value of 0.797), and an outstanding AUC value (0.91). While it was slightly more common to erroneously identify Levallois products as hierarchical discoid products (12.44%) than to erroneously identify them as discoid products (9.29%), both values were notably low.   Products from discoid bifacial reduction sequences were the second best identified by the SVM with polynomial kernel. While a notable confusion occurred between discoid and hierarchical discoid products, the F1 general performance value (0.647) was still quite high, and the AUC value was found to be excellent or good (0.82). The confusion matrix (Figure 7) Journal of Lithic Studies (2023)  shows that the erroneous identification of products from discoid reduction sequences is due to confusion with products resulting from the hierarchical discoid method (29.9%), while their misclassification as Levallois is residual (4.49%). Hierarchical discoid products yielded the lowest values of classification metrics. The general performance metric F1 had a value of 0.56 and relatively low values of sensitivity (0.562) and precision (0.557). Despite these relatively low values, the AUC for the hierarchical discoid products was still acceptable or fair (0.73), which indicates that the application of decision thresholds can support and improve the identification of products from hierarchical discoid reduction sequences. These low classification metrics were clearly related to the SVM with polynomial kernel mistaking hierarchical discoid products as discoid products, along with a residual although notable confusion with Levallois products (13.5%).
The importance of the discriminating variables differed for each knapping method. Figure  10 presents the discriminating variables whose importance values exceeded 50 for each knapping method. For Levallois products, the most important discriminating variables related to ratios of flake size to thickness (the carination index was the most important, and the flake surface to thickness ratio was also considered highly important), thickness (mean and maximum thickness), and platform angle (with IPA having a slightly higher value than EPA). Qualitative attributes such as type of platform were also considered important for the discrimination of Levallois products. Plain and prepared platforms had importance values of 72.19 and 71.19, respectively.
The selection of variables for the identification of discoid products was similar to those for Levallois, although with differences in the order of importance. As with the Levallois, the carination index was considered the most important variable, while both measures of platform angle (IPA and EPA) were considered the second and third most important variables. Similar to the Levallois products, measures of thickness and platform type were also considered important in the identification of discoid products.
Discriminating hierarchical discoid products proved to be more complex, with none of the variables reaching the maximum value. The SVM with polynomial kernel model considered thickness measurements (maximum and mean thickness) along with platform angle measurements (EPA and IPA) as the most important variables for the discrimination of hierarchical discoid products. The carination index was considered a much less important measurement (value of 77.18) compared to its importance for the identification of Levallois and discoid products.
An additional variable which ranked slightly above the importance value threshold is the standard deviation of thickness. Standard deviation of thickness presented importance values of 57.25 for Levallois and discoid products, and 55.01 for hierarchical discoid products. Thus, although it is not a key variable for discrimination, differences can be observed in the thickness variations between products of the different knapping methods. For both Levallois and discoid products, the IPA was a slightly better discriminatory variable than EPA. For hierarchical discoid products, the EPA had a slightly higher (81.31) discriminatory value than IPA (81.17). This indicates that IPA tends to be a better discriminating variable for the identification of knapping methods. The analysis of class probabilities based on the reduction sequence and amount of cortex revealed a series of tendencies. Products detached by means of Levallois reduction sequences had the highest mean probability (0.657) of belonging to their true class, independent of cortex amount (Figure 11). This high average can be clearly related to the F1 and AUC measurements. Products detached using Levallois reduction sequences showed a clear tendency: as the cortex amount decreased, the probability of correctly being identified as Levallois increased. This resulted in increased average values of correctly being identified as Levallois: products with 100% cortical flakes had an average of 0.558, > 50% cortex an average of 0.572, < 50% cortex an average of 0.717 and flakes with residual cortex an average of 0.726. A decrease in the average value (0.686) was observed in the case of flakes with no cortex. This decrease in the mean value can be attributed to the fact that this was the most abundant category and included a long tail in the distribution of values as a result of flakes with low probability values of belonging to Levallois reduction sequences. Nevertheless, most of the Levallois reduction sequence flakes were concentrated above a 0.75 probability value ( Figure 11). Products detached from discoid reduction sequences had the second highest average of belonging to their class without considering cortex amount (0.508). As with Levallois products, as the cortex amount decreased, the probability of correctly being attributed to discoid reduction sequences increased. However, this increase was not as marked or clear as for Levallois products. The 100% cortical flakes had an average of 0.456, > 50% cortical flakes an average of 0.45, < 50% cortical an average of 0.512, residual cortical flakes an average of 0.486, and non-cortical flakes an average of 0.538. Again, the distribution of non-cortical flakes presented a long tail of low values, although most of the flakes were concentrated above the 0.5 value (Figure 11).
No clear pattern in the probability distribution of values was observed for the hierarchical discoid flakes. Flakes produced by means of the hierarchical discoid method presented the lowest average probability value of correctly being assigned to their true class (0.456) as expected from the corresponding F1 and AUC values. Flakes with less than 50% cortex and residual cortical flakes had the highest probability averages of correctly being identified as hierarchical discoid flakes (0.554 with less than 50% cortex, and 0.524 with only a residual presence of cortex). However, their low number should be taken into consideration. Noncortical flakes exhibited the lowest average of probability value (0.432) with a wide range of distribution of probability values, which indicates the difficulty of differentiating even advanced reduction products from hierarchical discoid knapping sequences.

Discussion
An analysis of machine learning model performance metrics (accuracy and AUC) found that SVM with polynomial kernel is the best model for differentiating between the three knapping strategies studied. SVM with polynomial kernel provided a general excellent or good AUC value (0.824) and notable accuracy 0.667. However, depending on the knapping strategy, notable differences were found in per-class classification metrics and AUCs. Our results also provide insight into variable importance, showing that measures of flake surface in relation to thickness (carination index and ratio of flake surface to thickness), angle of detachment (IPA and EPA), and thickness (mean and maximum thickness) are key considerations in the discrimination of knapping methods. In general, we found that corticality affects the ability of the model to determine the knapping method by which the flakes were detached. We detected a general trend: as the amount of cortex diminishes, it becomes easier for the model to identify the knapping method by which the flake was detached. However, again, this trend exhibited marked differences between knapping methods. It was very clear in the case of products detached from Levallois reduction sequences but not so evident for products detached by means of hierarchical discoid reduction sequences.
Flakes detached from Levallois reduction sequences yielded the best identification values, with an individual outstanding AUC (0.91) and high general performance metrics, such as the F1 score (0.797). These results underscore the singularity of products obtained from Levallois preferential and recurrent centripetal reduction sequences and the unlikelihood of confusing them with flakes detached by means of either discoid or hierarchical discoid reduction sequences.
Interpreting the relationship between the discoid and hierarchical discoid methods is more complex. The individual AUC for the identification of discoid flakes offered an excellent or good value (0.82) along with a relatively good F1 score (0.647). These values were not the result of over-predicting the discoid class (35.65% of the assemblage was predicted to be discoid against an actual 34.7%), but rather due to very limited confusion with Levallois products (only 4.49% of cases). As previously mentioned, although the identification of flakes detached by means of hierarchical discoid reduction sequences presented a relatively low F1 score (0.56), it did have an acceptable or fair AUC (0.73). These results indicate the ability of the model to differentiate between the products of the two strategies, reinforcing the notion that they are two different individual technical systems. However, the evaluation of directions of confusion showed that the SVM with polynomial kernel commonly misidentified products from both knapping methods. This can be interpreted as evidence of a very close relationship between the two technical systems.
The relationships between the Levallois, discoid and hierarchical discoid knapping methods have been previously noted by different authors. Lenoir and Turq (1995) proposed the use of the term "recurrent centripetal" to include both discoid debitage and recurrent centripetal Levallois, and that Levallois (sensu stricto) should be limited to preferential and recurrent lineal modalities. Similarly, Casanova i Martí et al. (2014) proposed including Levallois recurrent centripetal and hierarchical discoid modalities within the "bifacial centripetal hierarchical" category. In the present study, the combined use of attribute analysis and machine learning models effectively separated flakes obtained from Levallois recurrent centripetal and preferential sequences from those obtained from hierarchical discoid sequences. A limited percentage of hierarchical discoid products were identified as coming from Levallois recurrent and preferential reduction sequences (13.5%) and the confusion in the opposite direction was less common (12.44). Thus, these results do not support grouping Levallois recurrent centripetal and hierarchical discoid into the same category, as the products from these two approaches seem to be easily differentiated. Mourre (2003) situated the hierarchical discoid knapping method at the margins of the discoid group, given the specific characteristics of the two techniques. At the same time, Levallois recurrent centripetal modality (Mourre 2003) was placed within the Levallois group, but also overlapping with the discoid and hierarchical discoid knapping methods. The results of this study offer a point of agreement and a point of disagreement with this interpretation of the relationships between the three knapping methods. As point of disagreement, the very limited misidentification of discoid and Levallois products does not support the notion of an overlap between these two methods. This high level of separation between Levallois recurrent centripetal and discoid products is also supported by previous studies that used attribute analysis and machine learning models (González-Molina et al. 2020). As point of agreement, the notable confusion in the identification of discoid and hierarchical discoid products supports the proposal that the two methods belong to centripetal debitage systems, although our results indicate that a notable level of separation between products from the two systems can be achieved. Archer et al. (2021) used 3D geometric morphometrics for the identification of three distinct knapping methods (laminar, discoid and Levallois) and indicated a general success rate of between 73% and 77%, with Levallois flakes exhibiting the highest identification rate (87%) and discoid flakes the lowest (40%). The overall higher precision reached by Archer et al. (2021) was also expected in this study, as we included two classes which are commonly considered to belong to the same family and difficult to differentiate (discoid and hierarchical discoid) instead of three discrete knapping methods. However, their results for the identification of discoid products contrast with the results of the present study, which yielded notably higher values of sensitivity (0.656) and F1 (0.647) for discoid products. Again, it is important to consider that these moderate values resulted from the inclusion of a closely related strategy (hierarchical discoid) and that confusion with Levallois flakes was minimal. González-Molina et al. (2020) presented a balanced dataset of discoid and Levallois recurrent centripetal flakes and tested a series of machine learning algorithms for classification, and found that random forest provided the best results. González-Molina et al. (2020)  researchers have called attention to the subjective nature of using the classification "Levallois flake" or the inclusion of the category of "Atypical Levallois flake" (Debénath & Dibble 1994: 46-49;Hovers 2009: 61-64). González-Molina et al. (2020) additionally provided a random forest model without the technological classification feature which had a precision rate of 80%. In this model, the first five variables were related to flake width. However, aspects of flake width and length can be influenced by initial core and nodule size or by the process of retouch (Shott & Seeman, 2017), making them a less desirable variable for the identification of knapping methods. SVM with polynomial kernel selected variables for knapping method classification already noted in previous studies (Bourguignon 1997: 103-116;Carmignani et al. 2017;González-Molina et al. 2020;Mourre 2003;Slimak 2003). The carination index (Laplace 1968) was considered the most important variable for the discrimination of discoid and Levallois products, and also considered notably important for the identification of flakes detached by means of hierarchical discoid reduction sequences. Although thickness measurements are metric variables, they are less subject to variations due to initial nodule and core size or retouch. Previous researchers have highlighted the importance of blank thickness for the differentiation of knapping methods and have shown that Levallois blanks tend to be thinner and more regular than flakes from discoid production systems (Boëda 1994: 254-258;1995a;Bourguignon 1997: 103-116;González-Molina et al. 2020;Lenoir & Turq 1995;Mourre 2003;Peresani 2003;Slimak 2003;Terradas 2003).
Internal platform angle (IPA) is considered a slightly more important feature than the external platform angle (EPA) for the determination of knapping methods. Previous studies have called attention to the importance of angular relations between the platform and dorsal or ventral surfaces for the discrimination of knapping methods (Boëda 1993;1995;Kelly 1954;Lenoir & Turq, 1995;Slimak 2003;Soriano & Villa 2017;Terradas 2003). In the experimental sample used in this study, Levallois products had IPAs with lower values than those of discoid or hierarchical discoid products, despite the inclusion of cortical elements. Problems with the manual recording of IPA and EPA related to the general morphology of flakes have been acknowledged by other authors (Davis & Shea 1998;Dibble & Pelcin 1995); however, our research emphasises its importance for the determination of knapping methods.
This study highlights the type of platform (prepared and plain platforms) as a key variable for the differentiation of knapping methods. Previous studies have also pinpointed the importance of platform preparation for the differentiation of knapping strategies (Lenoir & Turq 1995). Although this seems to be a significant feature, it is important to consider that platforms vary widely in the archaeological record. For example, Levallois products commonly exhibit plain platforms (Bordes 1961a;Bordes 1961b: 26). Additionally, platform preparation can be related to the knapping style and preferences of the individual knapper, with different knappers producing very similar products with different platform preparation (Driscoll & García-Rojas 2014). Mean thickness of the blank was considered as a key feature by the random forest algorithm, as did the other algorithms tested, along with maximum thickness for the differentiation of knapping methods.
The present study did not include other flaking strategies common in western Europe during the Middle and early Middle Palaeolithic, such as Quina and SSDA (Bourguignon 1996;Forestier 1993). Although in western Europe the coexistence of Levallois and discoid knapping methods with other knapping methods in the same archaeological levels is a subject of debate (Faivre et al. 2017;Grimaldi & Santaniello 2014;Marciani et al. 2020;Ríos-Garaizar 2017), the present research can be considered for assemblages where Levallois and discoid and hierarchical discoid knapping strategies coexist. Thus, the examination and evaluation of the chaîne opératoire along with the assemblage context and integrity are fundamental for the study of lithic technology (Soressi & Geneste, 2011 should be considered prior to the application of machine learning models for the identification of knapping methods.

Conclusions
The present study used attribute analysis to identify three knapping methods (Levallois, discoid, and hierarchical discoid) in flakes produced with those methods. A secondary objective was to evaluate the degree of confusion between Levallois recurrent centripetal and hierarchical discoid flakes. This secondary objective is relevant since several sites report similar reduction strategies under an umbrella of terms (recurrent centripetal debitage, non-Levallois debitage with hierarchised surfaces, hierarchical bifacial centripetal, centripetal character with preferential surfaces, hierarchical centripetal), which are here grouped under the term hierarchical discoid.
Results from the best machine learning model (SVM with polynomial kernel) indicate that flakes from Levallois recurrent centripetal reduction sequences are highly distinguishable from those of hierarchical discoid reduction sequences. This underscores the singularity of the Levallois reduction strategy, even in its recurrent centripetal modality. Despite some confusion in the identification of flakes detached from discoid and hierarchical discoid reduction sequences, a notable degree of separation can be achieved. This can be considered an indication that the two strategies are separate reduction strategies with their own singularities, while also falling under the umbrella of centripetal strategies.
This study also employed ROC curves and AUCs to identify knapping strategies among flakes. The use of AUCs as a metric of model performance should be taken into consideration in lithic studies that employ machine learning models. This is because they make use of decision thresholds which will sort between ambiguous flakes (such as the ones belonging to initial reduction sequences) and diagnostic flakes (such as the ones from full production). Finally, it is important to highlight that these results illustrate the applicability of machine learning models (along with the present dataset) to contexts of the archaeological record in which these reduction strategies are present. (2023)

List of supplementary files
Supplementary file 1 "Bustos Pérez et al Data.zip" Zip file containing: a .csv file of all data employed in the analysis; an .RData file containing all wrangled data employed for the analysis; and an .RData file containing all models and results from validation generated during the development of the present work.
Supplementary file 2 "Bustos-Pérez et al -What lies in between.pdf" A markdown pdf file containing all code used in the analysis in the present study. Along with the abovementioned files, it allows the complete workflow of this research to be reproduced.