Supervised recommendations of gas metal arc welding parameters

In gas metal arc welding, weld quality and performance depend on many parameters. Selecting the right ones can be complex, even for an expert. One generally proceeds through trial and error to find a good set of parameters. Therefore, the current experts' method is not optimized and can require a lot of time and materials. We propose using supervised learning techniques to help experts in their decision-making. To that end, a two-part recommendation system is proposed. The first step is dedicated to identifying, through classification, the number of weld passes. The second one suggests the seven remaining parameter values for each pass: layer, amperage, voltage, wire feed rate, frequency offset, trimming and welding speed. After extracting data from historical Welding Procedure Specification forms, we tested 11 different supervised learning algorithms. The recommendation system provides good results for all the settings mentioned above, even though the data is noisy due to the heuristic nature of the experts' process. The best classification model is CatBoost with an 82.22% average F1-Weighted-Score, and the best regression models are Extra Trees or a boosting algorithm, with a reduced mean absolute percentage error compared to our baseline.


Introduction
Nowadays, gas metal arc welding (GMAW) is one of the most widely used industrial welding processes. This is mainly due to its simplicity and versatility: it allows junior welders to be efficient and also lends itself to robotic automation, as long as the welding procedure has been certified. Welding procedure certification is performed by welding experts given the materials and physical configuration. For new materials and joints, this is a lengthy and expensive process performed mostly by trial and error.
There exists a known relationship between welding settings and the weld bead shape or quality which is now widely used in industry [1]. Based on this relationship, the welding parameters can be identified with regression models [2] and neural networks [3] or optimized for an expected weld bead shape [4]. The real problem, however, is slightly different since, even if the joint type is known, nothing precisely indicates the real final bead shape. Furthermore, GMAW is generally performed through multiple passes, which complicates the recommendation: we need to recommend both the number of passes and the parameters for each pass. That is, this relationship cannot be used to define all the parameters necessary to generate the Welding Procedure Specification (WPS).
Other works, like the quality management system developed by Zhou et al. [5], directly incorporate WPS generation. However, their method is limited to a rule-based system. Although a nearest-neighbor search is performed among existing WPS, a set of predefined rules is used if no WPS meets the required proximity. These rules are either used to correct the closest WPS or to build one from scratch.
We propose a new method using supervised learning to recommend (1) the number of passes to complete the weld and (2) the values for the seven parameters (layer, amperage, voltage, wire feed rate, welding speed, frequency offset and trimming) for each pass. We created a dataset by extracting data from 630 WPS filled by experts, then we compared 11 classification algorithms to train models to identify the number of passes, and 11 regression algorithms to train models to recommend the parameters for each pass. We provide a comprehensive comparison of the performances of the models for each parameter and show that our approach provides recommendations of sufficient quality to help experts create the WPS for new welding cases.
The remainder of the paper is structured as follows. We first introduce concepts related to GMAW and to supervised machine learning. Then, we present our solution along with the experimental process. Finally, we discuss the results and we conclude.

Preliminary concepts
Understanding the welding process in industry is the first step towards building a recommendation system to help welders. We describe, in this section, the usual welding process and the related terms. Then, we describe the supervised learning algorithms we evaluated for our recommendation system.

Background on gas metal arc welding and welding procedure certification
Gas metal arc welding (GMAW) is a process where an electrical arc is generated between the electrode and the metals to create fusion and welds. A shielding gas is injected to prevent any contamination from the atmosphere [6,7]. Although it is a simple procedure, there are many parameters to consider. Among them, the number of passes and layers needed to complete the weld is crucial. Indeed, some weld joints require a large amount of filler metal and must be divided into several steps called passes. When a new pass is made above another one, it is considered to be part of another layer. Each new pass depends on the previous ones, and parameters must be found for each of them separately because they might differ. In this case, the settings to be found are the layer, the amperage, the voltage, the wire feed rate, the welding speed, the frequency offset and the trimming.
In practice, the only way for an expert to find all parameters is through good judgment and experimentation. Experts write down each of their trials in a procedure qualification record (PQR). Once a trial is satisfactory and meets the quality criteria, the PQR can be used to generate a welding procedure specification (WPS), which summarizes how to replicate the expert's weld [1]. Information such as material references, physical treatments, temperature bounds or the width of the weld is therefore written on it.
This process of trial and error depends so much on the expert's experience and preferences that two experts would not arrive at the same parameters for a given weld. Yet, they generally start by fixing the parameters limited by the machines at their disposal (e.g., the amperage, the voltage or the wire feed rate) and end up adjusting the welding speed. Furthermore, as they want welders to be efficient, they also try to keep the same machine parameters from one pass to the next. That is why a pass might not be done with the optimal parameters as long as the quality is up to standards.

Supervised learning
Supervised learning is a type of machine learning where a model is trained on labeled training data to predict the labels of new untagged examples. Therefore, training examples can be seen as pairs of input and output (label); the learned model can be seen as a function from the input space to the output space. In the following subsections, we describe the supervised learning algorithms we considered to tackle this problem. We regroup the learning algorithms in four categories, namely, parametric models, instance-based algorithms, tree-based algorithms, and neural networks.

Instance-based algorithm: k-Nearest-Neighbors
The k-Nearest-Neighbors (KNN) [8] algorithm is based on the principle that similar data should be close to each other. The predicted output label depends either on the majority class of the k closest points (for classification) or their mean value (for regression).
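As a concrete illustration, the KNN principle can be sketched in a few lines of NumPy (a toy example, not the implementation used in our experiments; the data and labels below are made up):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3, classify=True):
    """Predict the label of x from its k nearest training points:
    majority vote for classification, mean value for regression."""
    dists = np.linalg.norm(X_train - x, axis=1)
    idx = np.argsort(dists)[:k]  # indices of the k closest points
    if classify:
        return Counter(y_train[idx].tolist()).most_common(1)[0][0]
    return float(np.mean(y_train[idx]))

# Toy data: 2-D features, labels standing in for a number of passes.
X = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 1.0]])
y = np.array([1, 1, 3, 3])
pred = knn_predict(X, y, np.array([0.05, 0.05]), k=3)  # majority of the 3 closest
```

Here `pred` is 1, since two of the three nearest neighbors belong to class 1.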

Parametric models
Instead of making predictions based on the similarity of new input examples to the stored training ones, parametric algorithms learn a model with a fixed set of parameters fitted to the training data.
Linear regression is a prediction model that fits a line (or a hyperplane) minimizing the error between the model's predictions and the actual values.
Similar to linear regression, logistic regression uses a sigmoid function to predict the probability that a data point belongs to each class. By applying a threshold to these probabilities, it is possible to build a classification model.
The Support Vector Machine (SVM) and Support Vector Classifier (SVC) [9] rely on support vectors, which are the closest data points of different classes in a multi-dimensional space, and seek the separating hyperplane that maximizes the margin between them.

Tree-based algorithms
Some algorithms use decision trees [10] to make predictions. They have a flowchart-like structure made of nodes and branches, with each node being a test applied to the data. Depending on the result, branches lead to other nodes until a leaf or a maximal depth is reached, at which point a final result is decided.
Random Forests [11] and Extra Trees [12] are ensemble algorithms based on decision trees. They build many different trees and select the most common result among them. Their main differences are that Random Forest uses a subsample of features to build each tree and computes the optimal split threshold per feature, whereas Extra Trees uses the whole sample and chooses random thresholds. Some ensemble learning algorithms are able to build strong learners from weak ones and are called boosting algorithms. Weak learners are models that do not perform well overall but still capture some useful signal; each one contributes something to the ensemble, which together forms a strong learner. XGBoost [13], LightBoost [14] and CatBoost [15] are popular boosting algorithms that differ in how they grow and combine their trees.

Neural Networks: multi-layer perceptron
Originally inspired by the human brain, neural networks are composed of neurons, divided into layers, communicating with one another. Except for the first and the last layers, which respectively depend on the inputs and outputs, the number of neurons must be optimized, and some literature is dedicated to this task [16]. Each node applies an activation function to a weighted sum of its inputs plus a bias, with the weights and biases modified during the training phase through gradient descent.

Welding parameters recommendation system
As described in the previous sections, many different parameters are required to weld effectively in multi-pass welding. In the following subsections, we begin by describing the dataset created and the modifications done on the features. Then, we explore our two-part solution to provide welders with the most accurate parameters.

Dataset
The data used in this project was extracted, using optical character recognition (OCR) techniques [17], from WPS written by experts and used in industry. The key features concern the materials, the welding techniques and the references of the machines. Regarding the materials, we mostly know their references, the allowed temperatures, the treatments they can handle, the thickness of the weld and the geometrical category to which they belong. Regarding the welding techniques, we have indications on the welding positions (and sometimes on the angle), the gas used, the contact type and the type of support (if needed).
After being extracted, the data has been post-processed to correct potential OCR errors and to standardize categorical values. Indeed, as the WPS were done by different experts from 2011 to 2021, their shape and notations vary. In the end, we created a database of around 630 WPS for a total of 3000 unique welds (some WPS contain more than one weld and/or different metals options).
On top of that, significant work was done on the features to optimize the results. We represented the materials, the wires and the gas by their chemical composition to allow the models to recognize which ones were similar. Some features providing fuzzy information, like the backing and the penetration, were also simplified into binary ones, and most of the other categorical features, like the geometry and the positions, were one-hot encoded. Another important addition to the dataset was to modify the quantitative features according to the total number of passes. For instance, the width of the weld and the pass number are respectively transformed into the width of a pass and a percentage of the total number of passes. Therefore, the models have more precise information on each observation. Also note that units were unified to the metric system, as some data used imperial units or a mix of both.
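The per-pass transformation and encoding described above can be sketched as follows (a hypothetical helper: the function name, the position categories and the exact feature layout are illustrative, not the actual preprocessing code):

```python
import numpy as np

def engineer_pass_features(weld_width, pass_number, total_passes, position,
                           positions=("flat", "horizontal", "vertical")):
    """Hypothetical per-pass feature construction: the weld width becomes a
    per-pass width, the pass number a fraction of the total number of passes,
    and the categorical welding position is one-hot encoded."""
    width_per_pass = weld_width / total_passes
    pass_fraction = pass_number / total_passes
    one_hot = [1.0 if position == p else 0.0 for p in positions]
    return np.array([width_per_pass, pass_fraction] + one_hot)

# Second pass of a 4-pass, 12 mm wide weld, done in the flat position.
features = engineer_pass_features(12.0, 2, 4, "flat")
```

The resulting vector is `[3.0, 0.5, 1.0, 0.0, 0.0]`: a 3 mm per-pass width, a 50% pass fraction, and the one-hot position.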
Finally, an exploratory data analysis has been performed on the datasets and outliers were removed. The distribution of the number of passes, shown in Figure 1, indicates that the data is heavily imbalanced. Indeed, 94% of the welds are done with fewer than 5 passes and 57% with only one.

Predicting welding parameters
We divided the prediction problem in two, each part having its own dataset. A first part is dedicated to identify the number of passes (classification) while the second one is for the remaining parameters (regression) for each pass of the weld. Figure 2 presents a diagram of the information flow from raw data to the recommendation system. The features we use for learning are summarized in Table 1.
The first step towards building a recommendation system is to focus on the number of passes. Both classification and regression were considered for this sub-problem, but classification showed more potential during our preliminary experiments. Therefore, a first classification model is dedicated to advising the exact number of passes knowing only the characteristics of the materials and equipment. Only the welds with fewer than five passes were kept for this dataset due to the unbalanced distribution: welds of more than five passes are infrequent. The number of passes is crucial as it strongly influences the determination of the other parameters. Indeed, whether a weld is done with one pass or two, the total amount of material deposited remains roughly the same. Moreover, the system allows experts to change the pass recommendation according to their preferences. In fact, the perfect weld parameters are a balance between quality and simplicity. Complex ones place higher skill requirements on welders, which results in inconsistent quality. Therefore, the simplest weld parameters are always favored for faster and easier reproduction. This flexibility makes it easy for experts to adapt the recommendation to any case.
As for the second step, regression models are used to find the layer, the amperage, the voltage, the wire feed rate, the welding speed, the offset and the trimming for each given pass. We trained one model per parameter.
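The overall two-step flow can be sketched as follows, with placeholder callables standing in for the trained classifier and the seven per-parameter regressors (all names here are illustrative, not the actual system's API):

```python
PARAMETERS = ["layer", "amperage", "voltage", "wire_feed_rate",
              "welding_speed", "frequency_offset", "trimming"]

def recommend(weld_features, pass_classifier, regressors):
    """Two-step recommendation sketch: `pass_classifier` and the per-parameter
    `regressors` are stand-ins for the trained models (e.g. CatBoost for the
    number of passes, Extra Trees or a boosting model per parameter)."""
    n_passes = pass_classifier(weld_features)
    plan = []
    for i in range(1, n_passes + 1):
        # Each pass is described by the weld features plus its pass fraction.
        pass_input = dict(weld_features, pass_fraction=i / n_passes)
        plan.append({p: regressors[p](pass_input) for p in PARAMETERS})
    return n_passes, plan

# Toy stand-ins for trained models.
clf = lambda f: 2
regs = {p: (lambda f, p=p: 0.0) for p in PARAMETERS}
n, plan = recommend({"thickness": 6.0}, clf, regs)
```

The output is one recommended pass count and one dictionary of seven parameter values per pass, which the expert can then validate or adjust.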

Experiments
To study the impact and relevance of such a recommendation tool, we designed an experiment to compare the results obtained with various learning algorithms and assess the quality of the predictions. In this section, we first describe all the aspects taken into account to train the models and then we define the metrics used to evaluate them.

Models training
For both the classification and the regression sub-problems, we trained 11 models on 80% of randomly chosen WPS and tested on the remaining 20%. This process was repeated 15 times for each model to get an average. Although a WPS can contain more than one weld and each weld corresponds to a training example, we group the welds belonging to a given WPS when partitioning the data so that they always land on the same side of the split. This way, we avoid any data leakage from the training set to the test set. Moreover, the models' hyper-parameters, including the architecture of our neural networks, were automatically tuned, using cross-validation on the training data for each fold, with a Bayesian optimization algorithm named Tree-structured Parzen Estimator (TPE) [18]. Bayesian optimization uses a probabilistic model to guide the search for hyper-parameter values, which requires fewer trials than grid search or random search and is particularly efficient in complex search spaces. TPE follows the same principle, except that it models the densities of good and bad hyper-parameter values with tree-structured Parzen (kernel density) estimators, which allows it to scale to large, conditional search spaces. In addition, feature selection was performed before optimizing the hyper-parameters. Features were selected by analyzing their SHAP (SHapley Additive exPlanation) values [19] and by proceeding to a recursive feature elimination.
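The WPS-grouped partitioning can be illustrated with a small pure-Python sketch (the function and data layout are hypothetical; the actual experiments also add stratification and 15 repetitions):

```python
import random

def split_by_wps(examples, test_frac=0.2, seed=0):
    """Split weld examples 80/20 by WPS id so that all welds belonging to a
    given WPS land on the same side of the split (no train/test leakage).
    `examples` is a list of (wps_id, features) pairs."""
    wps_ids = sorted({w for w, _ in examples})
    rng = random.Random(seed)
    rng.shuffle(wps_ids)
    n_test = max(1, round(test_frac * len(wps_ids)))
    test_ids = set(wps_ids[:n_test])
    train = [e for e in examples if e[0] not in test_ids]
    test = [e for e in examples if e[0] in test_ids]
    return train, test

# 10 hypothetical WPS with 3 welds each: the split separates whole WPS.
data = [(wps, {"weld": k}) for wps in range(10) for k in range(3)]
train, test = split_by_wps(data)
```

Because splitting happens at the WPS level, near-duplicate welds from the same form can never appear on both sides of the split.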
Then, as the classification dataset is imbalanced, a stratified function was used to split the WPS and class weights were balanced when possible. To further address class imbalance, we evaluated data augmentation methods [20]. We tested SMOTE [21] and ADASYN [22] to improve the results. Both methods increase the number of samples of the minority classes by generating synthetic points on the segments joining a minority sample and one of its k nearest minority neighbors; ADASYN additionally generates more synthetic points for the minority samples that are harder to learn, in order to better approach the real data distribution.
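The interpolation at the core of both methods can be sketched as follows (a minimal SMOTE-style sampler for illustration, not the library implementations we used):

```python
import numpy as np

def smote_sample(X_min, k=2, n_new=4, seed=0):
    """Minimal SMOTE-style oversampling: each synthetic point lies on the
    segment between a minority sample and one of its k nearest minority
    neighbours. (ADASYN would additionally decide how many points each
    sample generates, biased toward the harder examples.)"""
    rng = np.random.default_rng(seed)
    new_points = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        neighbours = np.argsort(d)[1:k + 1]  # skip the point itself
        j = rng.choice(neighbours)
        lam = rng.random()  # interpolation factor in [0, 1)
        new_points.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(new_points)

# Three minority-class points; four synthetic points on their edges.
X_minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
synth = smote_sample(X_minority)
```

Since synthetic points interpolate between existing minority points, heavily overlapping classes (as we suspect here) limit how much such augmentation can help.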
Finally, simple models were defined as baselines to compare our results. The classification and the regression baselines respectively always assign the most common class and the mean of the output parameter seen in the training set.
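These baselines amount to two one-line models, sketched here for clarity:

```python
from collections import Counter

def fit_baselines(y_class_train, y_reg_train):
    """The two baselines described above: the classification baseline always
    predicts the most common training class; the regression baseline always
    predicts the training mean of the target parameter."""
    most_common = Counter(y_class_train).most_common(1)[0][0]
    mean_value = sum(y_reg_train) / len(y_reg_train)
    return (lambda x: most_common), (lambda x: mean_value)

# Toy training labels and targets.
clf_base, reg_base = fit_baselines([1, 1, 1, 2, 3], [100.0, 120.0, 140.0])
```

Any useful model must beat these constant predictors; on an imbalanced dataset the most-common-class baseline is already a non-trivial bar.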

Performance measures
To evaluate our recommendation system, the classification and regression sub-problems can be examined independently, as a welder has to validate the number of passes advised by the first model before getting the recommendation of the second model. We evaluate the prediction of each parameter individually to better understand the strengths and weaknesses of the system.
First of all, for the multi-class and imbalanced problem (predicting the number of passes), we use the F1-Weighted-Score (Equation (4.1)) to evaluate the performances [23]. Precision and recall are computed for each class and then a weighted average (W) is taken. Precision is the fraction of predictions assigned to a class that are correct, while recall is the fraction of the actual members of a class that are correctly predicted.
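For reference, the weighted F1 computation of Equation (4.1) can be written out directly (a plain-Python sketch equivalent to standard library implementations):

```python
def f1_weighted(y_true, y_pred):
    """Per-class F1 scores combined with weights proportional to each
    class's support (its share of the true labels)."""
    classes = sorted(set(y_true))
    n = len(y_true)
    score = 0.0
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        score += (sum(1 for t in y_true if t == c) / n) * f1
    return score
```

Weighting by support keeps the rare high-pass-count classes from being drowned out by the dominant one-pass class, while still reflecting the overall distribution.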
Then, for the regression part (parameter definition), we chose to evaluate the mean absolute error (MAE) for all the parameters as it is an easily interpreted metric that keeps the scale and the unit of the target. As represented by Equation (4.2), the MAE is calculated as the arithmetic mean of the absolute differences between each of the true values yᵢ and its corresponding prediction ŷᵢ, taken over n samples (with 1 ≤ i ≤ n). The lower the MAE, the better.
Furthermore, some parameters have a percentage of error allowed, as specified in the WPS. This percentage has been set by experts in collaboration with the quality department and can be used to better evaluate the results. In fact, the amperage can have a 10% fluctuation, the voltage 7% and the welding speed 25%. That is why we also computed the mean absolute percentage error (MAPE) to look at the percentage error between predicted values and true ones (Equation (4.3)).
Finally, some parameters need to be identified exactly. This is the case for the number of layers but also for the offset and the trimming, which have fewer possible values than the others and can be negative. Therefore, we added the coefficient of determination (R²) to our metrics to estimate how much of the variance of the target is explained by the predictions. R² is computed as follows: R² = 1 − Σᵢ(yᵢ − ŷᵢ)² / Σᵢ(yᵢ − ȳ)², (4.4) where n represents the size of the dataset, yᵢ and ŷᵢ are respectively the real value and the predicted one, and ȳ is the mean of the real values.
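The three regression metrics (Equations (4.2)–(4.4)) can be written out in a few lines of NumPy:

```python
import numpy as np

def mae(y, y_hat):
    """Equation (4.2): mean absolute error, in the target's own unit."""
    y, y_hat = np.array(y, float), np.array(y_hat, float)
    return float(np.mean(np.abs(y - y_hat)))

def mape(y, y_hat):
    """Equation (4.3): mean absolute percentage error, in percent."""
    y, y_hat = np.array(y, float), np.array(y_hat, float)
    return float(np.mean(np.abs((y - y_hat) / y)) * 100)

def r2(y, y_hat):
    """Equation (4.4): coefficient of determination."""
    y, y_hat = np.array(y, float), np.array(y_hat, float)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return float(1 - ss_res / ss_tot)
```

For example, predicting 110 A and 190 A for true values of 100 A and 200 A gives an MAE of 10 A and a MAPE of 7.5%, which would fall within the 10% tolerance allowed for the amperage.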

Results
Results of both sub-problems are discussed according to overall performances and experts' expectations. In the first part, we focus on the results for the number of passes predictions. In the second part, we discuss the results of the seven other parameters.

Results of number of passes recommendations
For this section, results can be found in Table 2 and Figure 3. All models performed better than the baseline according to the F1-Weighted-Score. The worst model compared to the baseline classifier is Decision Tree, with an F1-Weighted-Score of 71.90% without data augmentation, 69.48% with SMOTE and 71.95% with ADASYN. Moreover, Extra Trees and KNN have similar performances, as their confidence intervals strongly overlap with Decision Tree's. These algorithms seem to suffer more than the others from the lack of real-world information about welds with more than one pass. On the contrary, the best model is CatBoost, which achieves 82.22% without any augmentation technique. Random Forest, XGBoost and LightBoost also have comparable results. Without data augmentation, the worst model still shows a 67% increase in F1-Weighted-Score compared to the baseline, and the best one 92%. Despite the lack of training data for the higher classes, all the models managed, to a certain point, to identify the least represented classes in the testing phase. Nevertheless, we notice that neither data augmentation method provided any significant improvement. Although ADASYN slightly improved the score of Decision Tree and LightBoost, we observe the opposite with the multi-layer perceptron and the other models. As data augmentation techniques create new data based on the position of the data points in the training set, the classes are likely overlapping too much.
Apart from a lack of data, our assumption to explain such mixed results is that the dataset remains noisy due to the nature of the data, even though it has been carefully prepared. This is mainly due to the decision process of the experts, since they rely on their own experience, intuition and habits. In fact, they choose the number of passes just by looking at the shapes and then focus on the other parameters. If they have trouble working with this total number of passes, they start over and pick a new one. They do not always make an optimized decision, or the same decision for similar cases, which brings noise to the data, a noise we cannot get rid of since it is inherent to the process. With that in mind, it is worth taking another look at our results. On the one hand, as we evaluate the performances of the models against what an expert did, we cannot be sure that misclassified observations would not produce functional or even optimal welds. On the other hand, the number of passes is only one of the eight settings needed to weld, and the final result will depend on the coherence of all the predicted parameters.

Results of parameters recommendations per pass
For this sub-problem, results can be found in Tables 3-4 and in Figure 4. Overall, every model outperforms the baseline for each parameter. Most of the time, the model with the best MAPE or MAE is one of the boosting algorithms or Extra Trees, while the worst is the linear regression. For instance, for the welding speed, the baseline has an MAE of 12.6 (cm/min) while the linear regression reaches 11.1 and XGBoost only 8.2. These results demonstrate that it is possible to predict parameters based on WPS inputs and a limited amount of data. However, the quality of the results is not homogeneous between the welding parameters. Even though some are predicted with a negligible MAE, others seem to have a complex relationship with the inputs. That is why, for the remainder of this section, we look closer at the results for each parameter.
First, the amperage, the voltage and the wire feed rate are key welding parameters, as they must always be precisely determined for the machines and are highly interdependent. Compared to the baseline, Extra Trees achieved a 60% reduction of the amperage (A) MAE and a 69% reduction for the voltage (V), which respectively correspond to MAPEs of 9.2% and 4.3%. According to the percentage of error allowed indicated on the WPS (10% for the amperage and 7% for the voltage), these values are low enough to produce functional welds. However, for the wire feed rate (cm/min), the welding forms contain no information on the allowed error, and the MAPE is higher, reaching 11.2% at best with the XGBoost model. Even if the error is expected to be low enough for the weld, the welder will likely have to slightly adjust this value to improve the welding quality.
Second, some parameters, like the frequency offset, the trimming and the layer, are less important as they are very situational. Even though observations with null frequency offset and trimming are common, Extra Trees managed to decrease their MAE by respectively 59% and 83% compared to the baseline. Yet, as the values of these parameters can be below zero, it is also interesting to focus on the R² score. Once again, an improvement over the mean regressor can be noticed, but the results remain around 50% at best. In addition to their situational use, these values must be considered carefully by the welders. Moreover, the layer is only an indication of how to divide the given number of passes. Indeed, even if LightBoost achieved an R² score of around 86%, a welder can find the recommended solution difficult to perform. Changing the way the passes are done should not impact the remaining welding settings, as they do not depend on their order. Finally, the welding speed is the most difficult parameter for the models to recommend. Yet, the best average MAPE is only 18.2% with XGBoost, which is less than the 25% error permitted in a WPS. This high percentage can be explained by the fact that experts end their parameter search with this feature, which amplifies the noise for this parameter. On the one hand, they have to adapt to the parameters they chose previously. For instance, one expert might have chosen unusual values for the wire feed rate compared to another and will have to reduce the welding speed. In other cases, experts might have deliberately kept the same settings for a weld so that welders do not have to stop between passes; the welding speed will then also be highly impacted.
In the end, each individually evaluated parameter results in values good enough to achieve a functional weld.

Conclusion
We presented a new method to recommend GMAW parameters by identifying the number of passes and then the seven welding parameters for each pass. We evaluated the performances of 11 classification model types for the passes and 11 regression model types for the other parameters, including multi-layer perceptrons. We demonstrated that it is possible to generate WPS parameters based on a given weld's information. The best classification model is CatBoost, and the best regression models are Extra Trees or a boosting model, with a reduced MAE and MAPE depending on the parameter. Given the tolerance thresholds for the parameters in a WPS, we obtained, for each evaluated parameter, results of sufficient quality to achieve functional welds. We also observed, and shall stress, that experts have different methodologies, which implies that many solutions are possible for a specific weld. We conclude that our recommendation system, based on the best models for recommending both the number of passes and the welding parameters, has the potential to help expert welders in the time-consuming task of welding procedure certification. Further work includes studying physical models to add information on each pass's shape and welding penetration.