Analysis of effective area and mass transfer in a structure packing column using machine learning and response surface methodology
Scientific Reports volume 14, Article number: 19711 (2024) Cite this article
The study examined mass transfer coefficients in a structured CO2 absorption column using machine learning (ML) and response surface methodology (RSM). Three correlations for the fractional effective area (af), gas phase mass transfer coefficient (kG), and liquid phase mass transfer coefficient (kL) were derived with coefficient of determination (R2) values of 0.9717, 0.9907, and 0.9323, respectively. To develop these correlations, four characteristics of structured packings were used: packing surface area (ap), packing corrugation angle (θ), packing channel base (B), and packing crimp height (h). Five ML models were employed: random forest (RF), radial basis function neural network (RBF), multilayer perceptron (MLP), XGB Regressor, and Extra Trees Regressor (ETR). The best models were RBF for af (R2 = 0.9813, MSE = 0.00088), RBF for kG (R2 = 0.9933, MSE = 0.00056), and MLP for kL (R2 = 0.9871, MSE = 0.00089). The channel base had the most impact on af and kL, while crimp height affected kG the most. Although the RSM method produced adequate equations for each output variable with good predictability, the ML method provides superior modeling capabilities.
Developing efficient control strategies and economically viable technologies for reducing CO2 emissions is imperative to mitigate enduring damage to the climate system. Among the array of approaches for CO2 removal, chemical absorption using amine solvents stands out as a well-established, reliable, and highly promising technique known for its selectivity1. The conventional amines used in these processes are monoethanolamine (MEA), diethanolamine (DEA), methyl-diethanolamine (MDEA), and 2-amino-2-methyl-1-propanol (AMP). To mitigate issues such as corrosion, amine loss, and the high energy demand for solvent regeneration, and to improve CO2 absorption, it is advisable to assess blended solvents with enhanced capabilities. According to researchers, one optimal combination is that of amines with piperazine2. In addition to the mentioned amines, other solvents such as potassium hydroxide (KOH) solution, liquid ammonia, lithium silicate (Li4SiO4), pyrrolidine, etc., are also used to absorb and separate CO23. The use of packing material is commonly observed in this process4.
Mass transfer has been an essential concept in studying packed bed columns. Absorption occurs within a column where gas streams and liquid solvents come into direct contact. The efficiency of the absorption process can be evaluated based on the extent of interaction between the gas and liquid phases within the column. For efficient removal, it is essential to have column interiors that promote a high level of gas–liquid interaction5. Packing materials include stainless steel, plastic (PP, PVC, etc.), and ceramics. For CO2 absorption processes, stainless steel packing is a common choice due to its ability to withstand high temperatures, resist corrosion, and remain cost-effective3. Thus, in this work, studies are focused on stainless steel packing. Results from industrial distillation and absorption columns that use structured packing indicate that separation efficiency often decreases as column diameter increases, negatively impacting the final product's quality and performance6. Two main factors, transverse maldistribution and longitudinal mixing, affect the vapor and liquid flow inside the column, leading to decreased efficiency. These factors result in deviations from the ideal conditions of uniform distribution, lowering the productivity and purity of the finished products6. The choice of packing type holds significant relevance in numerous industrial processes, as the attributes of the packing substantially influence the mass transfer coefficient. The effectiveness of substance transfer between phases is quantified by this coefficient, with the packing serving as a crucial determinant of its value4,5,6. Structured and random packings, the most widely used types, are extensively employed in absorption processes. In recent decades, scientists have focused on advancing high-efficiency structured packings that maximize mass transfer efficiency. Structured packings are, however, often costly and more difficult to install and maintain than random packings.
Structured packings have several advantages, such as lower pressure drop, reduced liquid hold-up, an increased surface-to-volume ratio, and increased production capacity. Due to these benefits, structured packings are preferred over other types of packing in various applications, including absorption and distillation processes, as they enhance mass transfer performance7. Structured packings consist of corrugated sheets arranged in an organized, modular pattern. They provide more efficient mass transfer thanks to this corrugated-sheet design, which can help increase the substance transfer rate and reduce the amount of absorbent required. These benefits are particularly valuable in low-pressure operations, such as vacuum distillation in refinery crude units. On the other hand, random packings are less expensive and have greater mechanical strength6. To evaluate the effectiveness of packings, one can measure their performance based on the mass-transfer coefficients and the effective mass transfer area. Structured packings are desirable for capturing CO2 using aqueous amine absorption due to their advantageous mass transfer and hydraulic properties8. Over the past three decades, there has been a notable acceleration in the progress of modeling and correlations related to mass transfer in packed beds. Correlations for the effective interfacial area (ae) and gas and liquid side mass transfer coefficients (kG and kL) in structured packings are widely used in chemical engineering and process design to estimate the performance of gas–liquid contacting equipment such as distillation columns, absorption columns, and other packed columns5.
The current body of literature regarding packing is comprehensive, with a notable emphasis on structured packings. In 1985, Bravo et al.9 introduced one of the pioneering models for predicting ae, kG, and kL in structured packings. The model, initially developed for Sulzer BX structured packing, a type of gauze packing, assumed complete wetting of the packing surface and utilized the two-film theory to determine kG and kL10. Bravo9 introduced the new idea of incorporating effective velocities for the gas and liquid phases, a departure from previous correlation methods. This addition allows for a more accurate consideration of the interaction between gas and liquid in the system. Billet and Schultes11 explored the mass transfer phenomenon in packed columns for gas absorption and distillation, examining 31 liquid–gas systems and 67 distinct packing variations. The experiments were conducted in various columns with different heights and diameters, operating in a counter-current flow configuration and utilizing both structured and random packings. The authors utilized the penetration theory to analyze gas and liquid mass transfer in their study. They additionally reported the mean relative deviations between the computed mass transfer coefficients and the experimentally observed values in absorption and desorption processes, revealing an average difference of 8.3% on the liquid side and 12.4% on the gas side11. Brunazzi and Paglianti12 created a model for mass transfer through experimentation, focusing on the release of CO2 from water into the air and the absorption of chlorinated compounds using two commercially available high-boiling liquids (Genosorb 300 and Genosorb 1843). The experiments involved absorption of chlorinated compounds in a column operating in counter-current mode using structured packings, such as Mellapak 250Y and BX, and desorption of CO2 in co-current mode.
They demonstrated that the values of kxae and kyae, when considering margins of error of ±15% and ±19%, respectively, applied to the process of CO2 desorption from water into air. The Delft model, initially conceptualized by Olujic13 in 1997 and subsequently enhanced from 1999 to 2004, was designed for corrugated structured packings. This model considers the gas flow, which moves in a zig-zag pattern through the triangular channels of the packing. The zig-zag pattern occurs because the packing layers are composed of individual elements that are rotated by 90 degrees relative to their adjacent elements, creating the zig-zag gas flow. The model demonstrated that the calculated height equivalent to a theoretical plate (HETP) was consistent with experimental results, with an error margin of approximately ±12%. Hanley and Chen14 reviewed the performance of commonly used mass-transfer correlations for predicting the HETP in binary separations using Flexipac and Mellapak-type packing. They focused on mass-transfer and interfacial area correlations commonly available in commercial simulation software, such as Aspen Technology's rate-based distillation component, to develop reliable and dimensionally consistent correlations for mass transfer quantities such as kL, kG, and ae. They employed a new data-fitting procedure for distillation and acid gas absorption with amine operations. These correlations were developed for metal Pall rings, metal IMTP, sheet-metal structured packing of the Mellapak type, and metal gauze structured packings in the X configuration. Linek et al.15 measured the mass transfer properties of Mellapak structured packings, including 250Y, 350Y, 452Y, and 500Y. The measurement of kL was done through oxygen desorption from water, while ae was measured through the absorption of CO2 into a NaOH solution and oxygen absorption into a sulfite solution.
Chao Wang16 created mass-transfer models that are dimensionless and can estimate the mass-transfer coefficients and effective area, incorporating mixing and Reynolds numbers in the correlations. Table 1 provides a summary of past correlations related to mass transfer1. These models consider the packing geometry and can predict mass transfer properties in aqueous systems. The models can account for the impacts of operating conditions and packing geometries on mass transfer. Instead of relying on vendor-specific geometric data such as packing channel base (B) and crimp height (h), these models utilize generalized packing information such as surface area (ap) and corrugation angle (θ). In the experimental setups, absorption and desorption processes use aqueous solvents with liquid physical properties similar to pure water17. Some new methods for reducing carbon dioxide in the air using amine solutions have been studied recently. The study by Tan et al.18 offers a promising strategy for leveraging efficient, cost-effective, and environmentally sustainable solid acid catalysts to advance CO2 capture technology. The findings underscore the significant potential of SnO2/ATP catalysts in promoting low-energy and green amine-based CO2 capture processes, contributing to the ongoing efforts toward sustainable energy utilization and environmental conservation. Tan et al.19 confirmed the catalytic effect of attapulgite using FT-IR and Raman techniques and proposed a potential catalytic desorption mechanism. Additionally, the stability of activated attapulgite was verified through 15 CO2 absorption–desorption cycles. These results highlight the significant contribution of attapulgite to advancing amine-based CO2 capture technology and its potential for large-scale deployment. Zhang et al.20 devised an effective method for substantially boosting CO2 absorption rates in tertiary amine solutions.
Introducing a manganese-based oxide (MnOx) with four distinct oxides—Mn3O4, Mn2O3, MnOOH, and MnO2—via a one-step synthesis process marks a novel approach. This MnOx catalyst is then employed to catalytically accelerate CO2 absorption within a typical tertiary amine solution, MDEA. Results indicate a significant enhancement in both the rate (up to 360%) and quantity (132%) of CO2 absorption facilitated by the MnOx catalyst. Notably, MnOx surpasses individual manganese-based oxides, their physical mixture, and the majority of reported catalysts in enhancing CO2 absorption within the MDEA solution. Chemical absorption with tertiary amines for CO2 capture is efficient but slow. Zhang et al.'s21 study on MgAl-layered double hydroxide (LDH) catalysts showed absorption rates enhanced by 92.7%, validated through diverse techniques. This indicates the potential for faster CO2 capture processes. LDH stability was affirmed in 10 cyclic experiments, paving the way for scalable applications.
One of the most significant challenges for existing correlations is high error. The primary source of this error is the method of fitting the correlations to experimental data. High errors in the derived correlations can have significant implications for their reliability and applicability in practical situations. Incorrect or unreliable correlations can lead to erroneous conclusions, unreliable predictions, and poor decision-making, with adverse consequences in various fields of study or application22. Many research papers have discussed the application of machine learning (ML) and response surface methodology (RSM) in modeling and simulating absorption processes. Hassan Pashaei et al.23 used RSM and an artificial neural network (ANN) to model and optimize mass transfer flux in the Pz–KOH–CO2 system. In 100 epochs, the mean square error (MSE) values for the MLP and RBF models regarding mass transfer flux were 0.00019 and 0.00048, respectively, and they also obtained suitable metric values for the mass transfer model. In a study conducted by Nuchitprasittichai and Cremaschi24, a typical amine-based CO2 capture process was optimized to reduce capture expenses by utilizing two simulation methodologies: RSM and ANN. Based on the outcomes derived from numerous simulation runs, encompassing the evaluation of the minimum CO2 capture cost and the associated percentage error, the researchers concluded that the RSM algorithm demonstrated comparable performance to the ANN approach, yielding solutions closely aligned with those obtained by the ANN method. Zafari and Ghaemi2 optimized and simulated CO2 mass transfer flux. They achieved this by employing a mixture of piperazine and amine absorbents together with RSM and ANN methodologies. Their objective was to attain the highest possible mass transfer flux. The ANN and RSM models demonstrated proficient predictive capabilities for experimental data, with maximum R2 values of 0.9974 and 0.9723, respectively.
Considering the low MSE of 5.2 × 10−4, it is advisable to prioritize utilizing ANN to develop absorption simulation models. ML algorithms have enabled more accurate predictions of CO2 solubility. In the research conducted by Khoshraftar and Ghaemi25, two ML methods, ANNs and support vector machines (SVM), along with RSM, were assessed for estimating the equilibrium of CO2 in water-based solutions containing piperazine and diethanolamine. The results revealed that the MLP network, after seven epochs, exhibited an MSE of 0.000128 and a coefficient of determination (R2) of 0.9995. All three models demonstrated an R2 exceeding 0.99, signifying their exceptional predictive capabilities. Valera et al.26 investigated the use of ANNs to predict the removal efficiency and volumetric mass transfer coefficient (kga) in spray towers for SO2 removal. Their network presented an average error of 8.44% for the outlet SO2 concentration and 4.53% for the kga. This work showed that the use of neural networks is promising for predicting important variables in the process of removing air pollutants in spray towers. Valera et al.27 presented an experimental evaluation and neural network modeling of removal efficiency and volumetric mass transfer coefficient for gas desulfurization in a spray tower. The experimental results indicate significant variations in removal efficiency and mass transfer coefficient depending on parameters such as gas flow rate, liquid-to-gas ratio, and spray nozzle design. Neural network modeling was employed to predict these parameters, resulting in accurate predictions aligned with experimental findings. Specifically, the numerical data obtained from the experimental evaluation and neural network modeling provide valuable insights into optimizing gas desulfurization processes in spray towers for enhanced removal efficiency and mass transfer performance.
Di Caprio et al.28 presented compelling numerical evidence showcasing the effectiveness of a hybrid machine learning approach in precisely predicting mass transfer coefficients across a spectrum of operating conditions. The findings specifically highlight the hybrid model's ability to offer accurate estimations of these coefficients, thus providing invaluable insights for fine-tuning CO2 capture processes within spray columns. Previous works in the field have predominantly focused on predicting various mass transfer parameters within absorption columns; however, a critical observation reveals a substantial gap in addressing the prediction of the effective area and the mass transfer coefficients for both gas and liquid phases in absorption columns with structured packing. This nuanced consideration is pivotal for a comprehensive understanding of the column's performance and is conspicuously absent in the existing literature. Our study aims to fill this gap by specifically concentrating on modeling the effective area and mass transfer coefficients for both gas and liquid phases. This study explored the structured packing column's mass-transfer modeling using RSM and ML. To evaluate the efficiency of mass transfer, one must examine three parameters, each with distinct inputs and outputs in the model. The variables include packing surface area (ap), packing corrugation angle (θ), packing channel base (B), packing crimp height (h), packed bed height (H), mass transfer coefficients of the gas (kG) and liquid (kL) phases, the fractional effective area (af), and others. The goal of utilizing the RSM modeling technique is to obtain the most favorable conditions and develop a semi-empirical model that achieves the best possible fit by considering the impact of the input parameters on the mass transfer coefficients.
Furthermore, the primary objective of utilizing the ML approach is to determine the optimal model configuration to examine the connection between fractional effective area and mass transfer coefficients with input parameters. This objective entails the identification of hyperparameters for the models.
Wang obtained the dataset used to create the ML and RSM models from the experimental study conducted in their dissertation16. This data is available in the supplementary material. They investigated the mass transfer coefficient characteristics of a structured packing column for CO2 absorption from air into a NaOH solution, with an inlet concentration of 400 ppm CO2. Their research employed a packed column with an entrance diameter of 0.428 m. The structured packings used were all made of stainless steel and produced by GTC Technology, Sulzer Chemtech, and Raschig. Table 2 includes the different types of packing and their physical dimensions used in this study29. The numbers of experimental data points for the af, kL, and kG parameters are 1976, 927, and 472, respectively. Figure 1 shows some of the common packing types and packing parameters.
Packed column and some common packing types and parameters.
Some packing characteristics used in this study for modeling purposes include ap, h, B, and θ. The packing's surface area per unit volume (ap) is a parameter typically provided in the manufacturer's packing data sheets. The θ in structured packing is the angle between adjacent layers or ridges of the packing material. Crimp height and packing corrugation angle, denoted as h and θ, respectively, are illustrated in Fig. 1. Equation (1) is utilized to calculate the parameter B29.
In this equation, S represents the channel side, as depicted in Fig. 1. The remaining parameters used for this modeling include liquid load (L), the number of transfer units (NTU), H, uG, the height of a transfer unit (HTU), and cout. The liquid load is a measure of the volumetric flow rate of liquid passing through a unit cross-sectional area of a packed tower, typically expressed in m3 m−2 h−1. L is calculated based on Eq. (2)30,31.
In these equations, \({\dot{v}}_{L}\) (m3 h−1) is the volumetric liquid flow rate, and s (m2) is the cross-sectional area of the column. H (m) is the packed bed height, and uG (m s−1) is the inlet gas velocity. cout (ppm) is the concentration of the transfer component in the gas phase at the column outlet. The NTU needed serves as an indicator of the separation's complexity. Each transfer unit brings about a composition change in one of the phases equivalent to the average driving force responsible for that change. In a trayed column, the NTU can be likened to the number of theoretical trays required. Consequently, achieving a very high-purity product demands a greater number of transfer units. The HTU serves as an indicator of how well a specific packing material performs in a particular separation process, taking into account the mass transfer coefficient discussed earlier. When mass transfer is highly efficient, the HTU value is smaller. Estimating HTU values using empirical correlations or pilot plant tests has limited scope. The equations for the parameters HTU, NTU, and H in both liquid and gas phases appear in Eqs. (3–7)30,31,32.
In these equations, KL and KG represent the overall mass transfer coefficients in the liquid and gas phases, respectively. In represents the input to the tower, while out signifies the output from the tower. Y and X are the respective component mole fractions in the gas and liquid phases, and the star above indicates equilibrium.
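The working relations above can be sketched numerically. This is a minimal illustration under common simplifying assumptions: the Eq. (2) definition L = v̇L/s, and, for a dilute gas absorbed into NaOH with negligible equilibrium backpressure (Y* ≈ 0), the reduced forms NTU ≈ ln(cin/cout) and H = HTU × NTU. The numeric values are illustrative only, not taken from the paper's dataset.

```python
import math

def liquid_load(v_dot_liquid, section_area):
    # Eq. (2): L = volumetric liquid flow rate / column cross-section
    return v_dot_liquid / section_area

def ntu_gas(c_in, c_out):
    # Dilute-gas NTU with negligible equilibrium backpressure (Y* ~ 0),
    # a common assumption for CO2 absorption into NaOH solution
    return math.log(c_in / c_out)

def htu_gas(packed_height, ntu):
    # Rearranged from H = HTU * NTU
    return packed_height / ntu

L = liquid_load(10.0, 0.144)   # 10 m3/h over a 0.144 m2 section -> ~69 m3 m-2 h-1
ntu = ntu_gas(400.0, 100.0)    # 400 ppm in, 100 ppm out
htu = htu_gas(3.0, ntu)        # 3 m packed bed
```

A smaller HTU for the same packed height and concentration change would indicate a more efficient packing, consistent with the discussion above.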
Response surface methodology (RSM) involves a collection of mathematical and statistical methods aimed at establishing a robust correlation or connection between a specific output variable of interest (y) and a set of associated input variables (x1, x2, …, xk). This approach has diverse applications, encompassing areas such as process design and development, innovating product formulations, and enhancing existing product designs. Its key advantage lies in its ability to overcome the limitations of the conventional empirical approach, namely, reducing computational time and minimizing costs33. In this research, RSM was utilized to analyze af, kL, and kG. The quadratic model of Eq. (8) was employed33,34,35.
In this equation, y is the response function, β0 represents the constant term, βi and βii indicate the linear and quadratic coefficients, and xi and xj represent the coded values of variables i and j, respectively. Furthermore, the interaction effects are represented by the coefficients βij, while Ø is an experimentally calculated parameter that represents unexpected or unaccounted-for factors36. To assess the accuracy of the generated model, various metrics are analyzed, including the p-value and the predicted and adjusted R2 of the model. Table 3 provides a summary of the input and output parameters, including their minimum, maximum, and units.
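As an illustration of fitting the Eq. (8) form, the sketch below recovers the coefficients of a full second-order model in two coded variables by ordinary least squares. The data and factor names (x1, x2) are synthetic placeholders, not the paper's actual factors or dataset.

```python
import numpy as np

# Columns of the quadratic design matrix for two coded variables:
# [1, x1, x2, x1^2, x2^2, x1*x2] -> coefficients [b0, b1, b2, b11, b22, b12]
def design_matrix(x1, x2):
    return np.column_stack([np.ones_like(x1), x1, x2, x1**2, x2**2, x1 * x2])

rng = np.random.default_rng(0)
x1 = rng.uniform(-1.0, 1.0, 50)  # coded factor 1 (placeholder)
x2 = rng.uniform(-1.0, 1.0, 50)  # coded factor 2 (placeholder)
y = 0.5 + 0.3 * x1 - 0.2 * x2 + 0.1 * x1**2 + 0.05 * x1 * x2  # noise-free response

# Ordinary least squares fit of the quadratic surface
beta, *_ = np.linalg.lstsq(design_matrix(x1, x2), y, rcond=None)
```

Because the synthetic response is noise-free, the fit recovers the generating coefficients [0.5, 0.3, −0.2, 0.1, 0.0, 0.05] to machine precision; with experimental data the residuals feed directly into the p-values and predicted/adjusted R2 mentioned above.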
Perturbation plots serve the purpose of illustrating how the response variable changes when a single factor is varied while keeping the others constant. In RSM, perturbation plots visually represent the impact of individual factors on the response, aiding researchers in understanding the sensitivity of the system to each factor. The significance of perturbation plots lies in their ability to provide insights into the nature and magnitude of effects. By showcasing how the response changes with the variation of each factor, researchers can identify influential factors and optimize processes efficiently37. We used ML methods separately for each of the parameters af, kG, and kL, with inputs corresponding to each of these parameters as described in Table 4. Similarly, for the RSM method, these parameters were also modeled separately with the inputs shown in Table 4.
ML, a subfield of artificial intelligence, has emerged as a transformative technology with profound implications across various academic disciplines and industries. Its fundamental premise lies in the development of computational algorithms and models that enable machines to learn patterns and make data-driven predictions or decisions autonomously. The ANN is an ML approach that derives its principles from the structural and functional attributes of biological neurons as observed in the human brain. Typically, neural networks consist of three layers comprising neurons: the input layer, one or more hidden layers, and the output layer. ML finds applications in diverse fields like computer vision, natural language processing, autonomous vehicles, manufacturing and quality control, and predictive modeling. One of the primary advantages of ML models is their ability to recognize complex patterns and connections in data without requiring explicit coding of the underlying rules or data features. The flexibility and power of machine learning make it a valuable tool for data analysis and decision-making. In this work, we use ML to predict complex and nonlinear relations between the input and output parameters. This modeling approach can offer cost-effective and time-efficient estimations compared to conventional methods that require extensive computational time38. At first, the input signals to a neuron are gathered. In ANN modeling, each input (xi) is multiplied by its corresponding weight (wi), and the resulting products are summed up along with a bias value (b) to obtain the overall output, as represented mathematically by Eq. (9)39.
The resulting sum is input into a transfer function denoted as \(f\), and Eq. (10) is used to generate the output values y40.
We adjusted all data, both input and output, to a range between −1 and +1 using Eq. (11). This adjustment helps the ML model learn better and also prevents overtraining41,42. Normalization scales the input features, preventing certain features from dominating the training process. This ensures a more balanced convergence, allowing the model to learn from all features equally. Normalization also helps avoid imbalanced weight updates during training. Without normalization, features with larger scales might receive disproportionate weight updates, leading to biased models. Normalization ensures a fair contribution from all features43. Finally, normalization contributes to the stability of the optimization process, aiding in generalizing patterns learned from the training set to unseen data. This reduces the risk of overfitting by promoting a more generalized model44.
We use xnorm for normalized data and x for the variable, with xmax representing the maximum value and xmin representing the minimum value of the x. It is worth noting that each dataset is normalized using its own maximum and minimum values. For example, when normalizing af, we consider the highest and lowest values within the af dataset.
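Since the body of Eq. (11) is not reproduced above, the sketch below assumes the standard min–max scaling to [−1, +1]; the inverse transform recovers model outputs in their original units.

```python
def normalize(x, x_min, x_max):
    # Min-max scaling to [-1, +1] (assumed form of Eq. 11):
    # x_norm = 2 * (x - x_min) / (x_max - x_min) - 1
    return 2.0 * (x - x_min) / (x_max - x_min) - 1.0

def denormalize(x_norm, x_min, x_max):
    # Inverse transform, mapping normalized values back to original units
    return (x_norm + 1.0) * (x_max - x_min) / 2.0 + x_min
```

Per the text, each variable is scaled with its own dataset minimum and maximum, so, for example, af predictions are denormalized with the af bounds rather than a shared global range.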
Common criteria for measuring the performance of ML models are the root mean squared error (RMSE), coefficient of determination (R2), mean absolute error (MAE), and mean squared error (MSE). The equations associated with the computation of these errors are outlined as follows42.
The MSE range is [0, +∞); the prediction model's accuracy increases with decreasing MSE value. The prediction model is perfect when the MSE is zero. RMSE quantifies the average scale of the discrepancy between the predicted and actual values. In essence, RMSE represents the mean vertical distance from the actual value to the corresponding predicted value on the fitted line. Put succinctly, it is the square root of MSE. As with MSE, the range of RMSE is [0, +∞); the smaller the RMSE value, the higher the accuracy of the prediction model. In contrast with MSE, the units of RMSE are the same as the original units, making the RMSE more interpretable than MSE45. Furthermore, for the purpose of comparing the results obtained from the ML and RSM models, we employed an additional metric, the average absolute relative deviation (AARD)42. Our findings underscore the importance of incorporating AARD alongside MSE, RMSE, MAE, and R2, revealing a more comprehensive perspective on model performance. Equation (16) was used to calculate AARD.
In Eqs. (12–16), ypredicted stands for the estimated y value derived from ML, yactual represents the real y value, n represents the number of data points or observations in the dataset, and ymean corresponds to the mean value of y.
The average magnitude of the absolute errors between the predicted and actual values is measured using the MAE metric. The MAE range is [0, +∞); the prediction model's accuracy increases with decreasing MAE value. The benefit of MAE is that it is simple to compute and comprehend, and its unit is the same as that of the original data. The MAE is a common symmetrical loss function. The R2 is the proportion of the variance in the response that a regression model explains; in other words, it is the ratio of the explained variance to the total variance. R2 is the square of the correlation between the actual and predicted variables. Thus, although R2 typically ranges from 0 to 1, it can theoretically take on negative values when the model performs worse than a simple horizontal line. This occurs when the residual sum of squares (SSres) exceeds the total sum of squares (SStot); consequently, the ratio SSres/SStot exceeds 1, resulting in R2 values less than 0. A value of 0 indicates that the regression model explains none of the variance in the response, meaning there is no correlation between the two variables. A value of 1 indicates that the regression model explains all of the variance, meaning the correlation between the two variables is perfect45. Table 5 shows the range of metric values and their classification used to evaluate the performance of ML and RSM in our study.
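The five metrics can be written out directly. The sketch below assumes the standard textbook definitions for Eqs. (12–16), with AARD expressed as a fraction (multiply by 100 for a percentage).

```python
import math

def metrics(y_actual, y_predicted):
    # Standard regression metrics (assumed forms of Eqs. 12-16)
    n = len(y_actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(y_actual, y_predicted))
    mse = ss_res / n
    mae = sum(abs(a - p) for a, p in zip(y_actual, y_predicted)) / n
    y_mean = sum(y_actual) / n
    ss_tot = sum((a - y_mean) ** 2 for a in y_actual)
    r2 = 1.0 - ss_res / ss_tot  # goes negative when ss_res > ss_tot
    aard = sum(abs((a - p) / a) for a, p in zip(y_actual, y_predicted)) / n
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae, "R2": r2, "AARD": aard}
```

This also makes the negative-R2 case above concrete: for actual values [1.0, 2.0] and predictions [1.0, 3.0], SSres = 1.0 exceeds SStot = 0.5, giving R2 = −1.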
In the realm of ML, the random forest (RF) model stands as a formidable innovation, drawing inspiration from the principles of multiple decision trees. This model, a product of the integration of bagging-based learning, represents a significant departure from the enigmatic nature often associated with black-box ML approaches like neural networks46. By quantifying the impact of every input variable on the adjusted outcomes, the model enables us to assess the influence held by the provided inputs47. As illustrated in Fig. 2, the RF algorithm harnesses the power of randomness during the training phase, constructing a forest that comprises multiple decision trees. Notably, these decision trees operate independently of each other. When presented with input training samples, the algorithm leverages randomization, leading to the creation of a diverse set of decision trees. Each of these trees individually generates predictions. However, the ultimate RF output emerges from the collective wisdom of these trees, and the predictions are averaged, resulting in the final output. Hyperparameters play a pivotal role in further enhancing the prowess of the RF model. Two such hyperparameters, namely max_depth and n_estimators, significantly impact the model's performance. The max_depth parameter determines the maximum depth that each decision tree within the forest can reach. This value controls the complexity of the individual trees and helps prevent overfitting. Conversely, the n_estimators parameter dictates the number of decision trees that constitute the forest. A higher number of trees can potentially lead to better generalization, albeit at the cost of increased computational intensity. Finding the right balance for these hyperparameters is crucial to optimizing the RF model's performance for a given task. The hyperparameters considered for the RF are shown in Table 6.
Schematic diagram of the RF model.
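The role of the two hyperparameters discussed above can be illustrated with a minimal scikit-learn sketch; the data below is synthetic and the parameter values are illustrative (the study's actual hyperparameter grid is in Table 6):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 4))  # stand-ins for ap, theta, B, h (normalized)
y = 2.0 * X[:, 2] + np.sin(3 * X[:, 0]) + rng.normal(scale=0.05, size=200)

# max_depth limits each tree's complexity (overfitting control);
# n_estimators sets how many trees are averaged in the forest.
rf = RandomForestRegressor(n_estimators=200, max_depth=6, random_state=0)
rf.fit(X, y)
preds = rf.predict(X[:3])  # each prediction is the average over all trees
```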
Rosenblatt introduced the perceptron algorithm during the late 1950s, and it has gained significant recognition as a widely used and popular model for supervised ML48. The multi-layered perceptron (MLP) is a well-known neural network that is utilized for constructing nonlinear functions. The basic building block of an MLP is a perceptron, a mathematical model inspired by the neurons of the human brain. Each perceptron takes multiple inputs, applies weights to those inputs, sums them up, and passes the sum through an activation function to generate an output. Additional layers of perceptrons can further process the output to make complex predictions or decisions. An MLP typically has three types of layers: an input layer, one or more hidden layers, and an output layer. The hidden layers, as the name suggests, sit between the input and output layers and contain multiple perceptrons that perform intermediate computations. During training, an MLP learns the appropriate weights and biases for its perceptrons to reduce the disparity between its predicted output and the desired output. Nevertheless, the outcomes of the MLP model may vary with the initial weights assigned to the input parameters, which can be considered a limitation. To address this issue, the model is executed multiple times, and the most precise model is chosen as the outcome49. The best model was selected from almost 5000 different combinations, and the final model was retrained 10 times to confirm the robustness and accuracy of the final selection. In simple terms, the data were split into five subsets under a five-fold cross-validation policy; this guards against overfitting and gives a more reliable measure of model performance. The model was selected based on the mean R-squared (R2) score metric. The MLP function method is based on Eq. (17).
In this relation, ġ is the output vector, and Ѳ, w, and xk indicate the threshold limit, the weighted vector of coefficients, and the input vector, respectively. The MLP takes in, processes, and conveys information through an input layer, one or more hidden layers, and an output layer23.
Figure 3 shows the MLP neural network's structure, which includes the input layer, hidden multilayers, and the output layer for af, kG, and kL.
Diagram illustrating an MLP network with two hidden layers.
The output of the MLP neural network can be generated as Eq. (18).
Here, γjk represents the output of neuron j in layer k, while bjk denotes the bias weight for neuron j in layer k. The wijk signifies the initial randomly chosen link weights during the network training process, and Fk stands for the nonlinear activation transfer functions. These functions can assume various forms, such as identity, bipolar sigmoid, binary step function, binary sigmoid, linear, and Gaussian functions50. The hyperparameters considered for the MLP are shown in Table 7.
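Equation (18) corresponds to a layer-by-layer computation; a small numpy sketch of one forward pass through two hidden layers (the weights here are random placeholders, and ReLU is chosen for the hidden layers only for illustration, matching one of the possible transfer functions listed above):

```python
import numpy as np

def forward(x, layers):
    """Propagate x through (W, b, F) triples, per Eq. (18):
    gamma_k = F_k(W_k @ gamma_{k-1} + b_k)."""
    a = x
    for W, b, F in layers:
        a = F(W @ a + b)
    return a

relu = lambda z: np.maximum(z, 0.0)  # hidden-layer activation (illustrative)
identity = lambda z: z               # linear output layer

rng = np.random.default_rng(0)
layers = [
    (rng.normal(size=(8, 7)), np.zeros(8), relu),      # 7 inputs -> hidden 1
    (rng.normal(size=(8, 8)), np.zeros(8), relu),      # hidden 2
    (rng.normal(size=(1, 8)), np.zeros(1), identity),  # one output (e.g., af)
]
y = forward(rng.normal(size=7), layers)
```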
The radial basis function neural network (RBF), conceptualized by Haykin et al.51, presents a captivating departure from conventional architecture. In 1988, Broomhead and Lowe52 introduced the RBF, a feed-forward network with a single hidden layer. RBFs are frequently used in tasks such as regression, classification, pattern recognition, and time series forecasting. Compared to other neural network types, RBFs have a less complex architecture and a quicker learning algorithm. In addition to their strong ability to approximate global patterns, RBFs offer advantages such as a condensed structure, the capability to approximate any continuous function, and resilience to noise53. These networks consist of three layers: an input layer, a hidden layer with RBF units, and an output layer. The information collected from the input layer is consolidated in the single hidden layer, where it passes through a Gaussian transfer function that transforms the data into a nonlinear form. In an RBF, the connections between the input layer and the single hidden layer utilize nonlinear transfer functions, while the connections between the single hidden layer and the output layer employ linear transfer functions. While there are many activation functions for neurons, the Gaussian function is the most prominent one54. This function is mathematically represented as Eq. (19).
In this equation, x stands for the input, φi stands for the output, and cei and ψ represent the center and spread of the Gaussian function, respectively. Furthermore, b represents the bias term. The network's output, y, is calculated by combining the activation function with the weight vector w of the output layer in a linear manner, as demonstrated in Eq. (20)55.
Here, the output of the ith basis function is weighted by the corresponding wi, and these weighted values are summed to form the network output. In this work, an RBF with a single hidden layer was employed, as shown in Fig. 4.
Schematic diagram of RBF network.
To enhance the effectiveness of the RBF, it is imperative to incorporate various hyperparameters thoughtfully. A crucial aspect to take into account involves the selection of parameters like the learning rate in a hyperparameter context and the number of epochs. These parameters serve as the foundational building blocks and exert significant influence over the network's capacity to discern intricate data patterns. Moreover, the choice of optimizer settings assumes a central role in shaping the network's training process. The decision to employ the Adam optimizer facilitates dynamic optimization of the network's internal parameters, ensuring rapid convergence and robust performance. The Adam optimizer is renowned for its efficiency in handling sparse gradients and adaptability concerning learning rates. These characteristics make Adam particularly suitable for complex models, as it can adjust learning rates individually for each parameter, thus facilitating faster and more stable convergence compared to traditional optimization methods like stochastic gradient descent56. As the ML domain advances, the RBF's innate adaptability and distinctive composition continue to position it as a formidable tool, enabling the revelation of hidden patterns embedded within intricate datasets. The hyperparameters considered for the RBF are shown in Table 8.
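As an architectural sketch, the numpy fragment below builds the Gaussian hidden layer of Eq. (19) and the linear output layer of Eq. (20). Unlike the study, which trained the network with Adam, the output weights here are obtained in closed form by least squares for brevity; the centers, spread, and data are illustrative assumptions:

```python
import numpy as np

def rbf_design(X, centers, spread):
    """Gaussian hidden layer, Eq. (19): phi_i(x) = exp(-||x - c_i||^2 / (2*spread^2))."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * spread ** 2))

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(150, 4))               # normalized inputs
y = np.sin(2 * X[:, 0]) + X[:, 3] ** 2              # synthetic target

centers = X[rng.choice(len(X), 40, replace=False)]  # centers taken from the data
Phi = rbf_design(X, centers, spread=0.8)
# Linear output layer (Eq. 20); least squares replaces Adam training here.
H = np.c_[Phi, np.ones(len(X))]                     # append bias column
w, *_ = np.linalg.lstsq(H, y, rcond=None)
y_hat = H @ w
```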
XGBoost enhances the speed of the gradient-boosting machine, giving the XGB Regressor a significant advantage over alternative decision tree algorithms. The algorithm combines regularization parameters with engineering features, including parallel execution, tree pruning methods, hardware optimizations, and built-in cross-validation, to address compatibility issues, decrease computational time, and enhance resource efficiency57,58. In Fig. 5, we can observe the repetitive computations involved in the XGB Regressor algorithm. In every step of XGB regression, the errors, also referred to as residuals, are employed to refine the previous estimator. This process is mathematically represented by Eq. (21).
Schematic of the XGB Regressor algorithm.
In this equation, j(t) signifies the t-th iteration of the objective function, m represents the number of predictions, \({\text{L}}({\text{y}}_{{\text{i}}} ,\hat{y}_{{\text{i}}} )\) denotes the training error of the i-th sample, and Q represents the regularization function. The model's output \(\hat{y}_{i}\) is determined by a function F that consists of m trees, as described in Eq. (22).
The regularization term of decision trees, which forms the foundation of boosting algorithms, is expressed \({\text{Q(f}}_{{\text{k}}} {)}\) in Eq. (23).
In this equation, γ represents the complexity of each leaf, T denotes the number of leaves in the decision tree, \(\partial \) is the penalty parameter, and w is the vector of scores associated with the leaves. The hyperparameters considered for the XGB Regressor are shown in Table 9.
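The residual-refinement loop of Eq. (21) can be illustrated with a bare-bones boosting sketch on decision stumps (squared error, no Ω(fk) regularization term; xgboost itself adds that penalty plus the systems optimizations described above, so this is a conceptual sketch, not the library's algorithm):

```python
def fit_stump(x, r):
    """Best single-split stump minimizing squared error on residuals r."""
    best = None
    for t in sorted(set(x)):
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((ri - lm) ** 2 for ri in left) + sum((ri - rm) ** 2 for ri in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda xi: lm if xi <= t else rm

def boost(x, y, rounds=50, lr=0.3):
    """Each round fits a stump to the current residuals, refining the estimator."""
    pred, stumps = [0.0] * len(x), []
    for _ in range(rounds):
        resid = [yi - pi for yi, pi in zip(y, pred)]  # errors drive the next tree
        s = fit_stump(x, resid)
        stumps.append(s)
        pred = [pi + lr * s(xi) for pi, xi in zip(pred, x)]
    return lambda xi: sum(lr * s(xi) for s in stumps)

x = [0.1, 0.3, 0.5, 0.7, 0.9]
y = [1.0, 1.2, 2.9, 3.1, 3.0]
model = boost(x, y)
```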
The Extra Trees Regressor (ETR) is an extension of the traditional decision tree ensemble method59. It capitalizes on the strength of multiple decision trees to excel in regression tasks. For regression, the ETR operates by generating decision trees from the training dataset and then averaging their predictions; this averaging reduces the variance contributed by each individual tree. One notable feature of the ETR is its ability to resist overfitting. Unlike the RF, which uses bootstrap "tree bagging" to draw a different training subset for each tree, the ETR trains every tree in the ensemble on the complete training dataset. During node splitting, the ETR introduces randomness by selecting both the candidate feature and its cut-point value at random, whereas the RF searches for the best cut-point among a random subset of features at each split. Constructing an ETR algorithm involves three key steps. Step one fixes three parameters: the splitting criterion used to assess split quality, the number of features considered at each split, and the total number of decision trees in the forest. Step two grows the individual decision trees from the training data, selecting the features for splitting according to the chosen criterion. Step three repeats step two until the full set of trees is generated. The ETR is particularly powerful because of how it combines decision trees with its randomized approach to selecting features and cut-points, which together make it strong and effective for regression modeling. The hyperparameters considered for the ETR model are shown in Table 10.
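The sampling contrast with the RF can be seen directly in scikit-learn, where both ensembles share an interface (the data and settings below are synthetic illustrations; the study's actual settings are in Table 10):

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(size=(300, 4))
y = 2.0 * X[:, 2] - X[:, 3] + rng.normal(scale=0.05, size=300)

# ETR: whole dataset per tree, fully random split thresholds.
etr = ExtraTreesRegressor(n_estimators=100, random_state=0).fit(X, y)
# RF: bootstrap sample per tree, optimized split thresholds.
rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
```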
Figure 6 and Fig. S7 show the flowchart that outlines the process for designing an ML model. In the first step, we collect data. Next, we select the input and output data; in this study, there are three data sets, each consisting of input and output data. In the third step, we filter and normalize the data, eliminating noise and high-error points present in the experimental data. We employed diagnostic plots within the response surface methodology (RSM) framework, such as Normal % Probability vs. Externally Studentized Residuals or Residuals vs. Predicted. These diagnostic tools enable us to visually identify outliers, influential data points, or patterns of deviation from expected behavior. To help the models learn the underlying patterns, we normalize the data between − 1 and + 1. Afterward, we separated 10%, 20%, 30%, and 40% of the data for testing and compared the results. The dataset used for this evaluation was not derived from the data used to train the network; instead, we utilized a separate portion of the data that was already designated for testing purposes. This approach preserved the integrity of the evaluation by keeping the test dataset independent of the training data. We also employed k-fold cross-validation in our assessment, which enhances the reliability of the performance metrics by reducing the potential bias due to the random partitioning of data. Following these steps, five models are generated for each output variable, comprising RF, MLP, RBF, XGB Regressor, and ETR. Data is subsequently loaded into the model, and hyperparameters are configured. The model is then evaluated using test data, and metrics such as MSE, RMSE, MAE, and R2 are examined. If these metrics meet the acceptable criteria, the hyperparameters are documented. Hyperparameter tuning and metric optimization were performed using the Adam optimizer.
The values of \({\beta }_{1}\) and \({\beta }_{2}\) used in the Adam optimizer are typically set to the defaults: 0.9 for \({\beta }_{1}\) and 0.999 for \({\beta }_{2}\).
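Under those defaults, a single Adam update takes the following form; a minimal numpy sketch (the learning rate and the quadratic test function are illustrative, not the study's settings):

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update; b1 and b2 are the defaults cited above."""
    m = b1 * m + (1 - b1) * grad           # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * grad ** 2      # second-moment (variance) estimate
    m_hat = m / (1 - b1 ** t)              # bias correction
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Minimize f(theta) = theta^2, starting from theta = 1.
theta, m, v = np.array(1.0), np.array(0.0), np.array(0.0)
for t in range(1, 3001):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t, lr=0.01)
```

The per-parameter scaling by the square root of the second-moment estimate is what gives Adam the adaptive learning rates mentioned above.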
Flow chart of ML model.
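The normalization and k-fold steps of the flowchart can be sketched in a few lines of Python (the function names are ours):

```python
def normalize(col):
    """Scale a list of values to [-1, +1], as done before training."""
    lo, hi = min(col), max(col)
    return [2 * (v - lo) / (hi - lo) - 1 for v in col]

def k_folds(n, k=5):
    """Index sets for k-fold cross-validation: (train, test) pairs."""
    folds = [list(range(i, n, k)) for i in range(k)]
    return [(sorted(set(range(n)) - set(f)), f) for f in folds]

x = normalize([100, 150, 200, 250, 300])
splits = k_folds(10, k=5)
```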
The outcomes of the analysis of the experimental data through ANOVA are presented in Tables S1 and S2. The model's p-value is the most suitable criterion for assessing the significance of the resulting model in the ANOVA; a p-value below 0.05 indicates the significance of the model or its parameters34. According to Tables S1 and S2, the model's p-value (< 0.0001) indicates that the resulting model is significant and credible. According to Table 11, the predicted R2 of 0.9637 and the adjusted R2 of 0.9685 for af show reasonable agreement, with a difference of less than 0.2. Adequate precision, which measures the signal-to-noise ratio, is desirable when it is greater than 4; in this case, the ratio of 67.909 indicates a satisfactory signal level. Therefore, this model can effectively guide the design process. The model F-value of 303.82 suggests that the model is significant; there is only a 0.01% chance that such a large F-value could occur due to random fluctuations. P-values below 0.05 indicate that the model terms are statistically significant. In this case study, the following model terms are significant: B, C, D, E, F, AB, AC, AD, AE, AG, BC, BG, CG, DG, EF, FG, F2, and G2. A is the surface area, B is the corrugation angle, C is the channel base, D is the crimp height, E is the packed bed height, and F is the liquid load; finally, the gas velocity is represented by G. On the other hand, p-values greater than 0.1 indicate that the model terms are not significant; if there are many insignificant model terms, reducing the model may improve its performance. According to Table 11 for kL, the predicted R2 and adjusted R2 values are 0.8843 and 0.9134, respectively, which show reasonable agreement. The model F-value of 49.43 implies that the model is significant; there is only a 0.01% chance that an F-value this large could occur due to noise. In this case, B, C, E, F, G, AD, BH, CE, DG, FH, GH, A2, E2, and H2 are significant model terms.
A is the surface area, B is the corrugation angle, C is the channel base, D is the crimp height, E is the packed bed height, F is the liquid load, and G is the outlet molar concentration; finally, the number of transfer units is represented by H. According to Table 11 for kG, the predicted R2 of 0.8906 is in reasonable agreement with the adjusted R2 of 0.9831. The model F-value of 130.72 implies that the model is significant; there is only a 0.01% chance that an F-value this large could occur due to noise. In this case, D, E, F, G, AE, BE, EF, and E2 are significant model terms. A is the surface area, B is the corrugation angle, D is the crimp height, E is the gas velocity, and F is the number of transfer units; finally, the height of a transfer unit is represented by G (Tables S1 and S2).
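The adjusted R2 cited above penalizes R2 for the number of model terms; a one-line helper (n = number of runs, p = number of model terms, a standard formula rather than one quoted from this paper):

```python
def adjusted_r2(r2, n, p):
    """R2 penalized for p model terms fitted over n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)
```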
The experimental outcomes were modeled using a quadratic equation as follows: Eq. (24) for af, Eq. (25) for kG, and Eq. (26) for kL, which describes the extent of influence and the interplay between variables. The equation, when expressed with actual variables, allows for making predictions about the output at specific levels of each factor. It is important to note that these levels should be defined in the original units for each factor. However, this equation is not suitable for assessing the relative influence of each factor since the coefficients are adjusted to match the units of each factor, and the intercept is not positioned at the center of the design.
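A quadratic response-surface model of the kind in Eqs. (24)-(26) is an ordinary least-squares fit over linear, interaction, and squared terms; a two-factor numpy sketch with synthetic data (the coefficients below are illustrative, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)
A, B = rng.uniform(-1, 1, 50), rng.uniform(-1, 1, 50)
# Synthetic "response" with known quadratic structure plus noise.
y = 1.0 + 0.5 * A - 0.8 * B + 0.3 * A * B + 0.2 * B**2 + rng.normal(scale=0.01, size=50)

# Design matrix: intercept, A, B, AB, A^2, B^2.
X = np.column_stack([np.ones(50), A, B, A * B, A**2, B**2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
```

With more factors the design matrix simply gains more interaction and squared columns, which is how terms such as AB or G2 in the ANOVA tables arise.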
Diagnostic plots are used to evaluate and diagnose the accuracy of a model. The accuracy of the quadratic models in forecasting the af, kG, and kL outputs was assessed by contrasting the values they generated with the actual data points, as demonstrated in Fig. 7a-c. The results showed that the quadratic model was well fitted to the experimental data, as indicated by the linear distribution of data in Fig. 7 and the R2 values in Table 11. Figure 8 illustrates the effect of each parameter on the outputs; parameters A, B, and the others were introduced in Tables S1 and S2. From these graphs, one can infer which parameters have the greatest and least impact on the mass transfer coefficients. The perturbation plot makes it possible to compare the impacts of all process parameters on af, kL, and kG at the central point. As can be seen from Fig. 8, among the input parameters, A, B, and C have the highest impact on af, while G and E have the highest impact on kG.
Plots comparing predicted values to actual values for three different outputs of (a) kG, (b) af, and (c) kL.
Graphs depicting the predicted versus actual outcomes for three output variables of (a) af, (b) kG, and (c) kL.
The structured packing parameters comprise ap, θ, B, and h. To examine the effect of ap and θ, their three-dimensional plot is shown in Fig. S4; to draw this diagram, the other variables were held constant. The plot illustrates that as ap rises and θ declines, af increases. Also, as ap increases, the impact of the packing θ becomes more evident. The 3D plot of af against B and h, with the other variables held constant, is likewise shown in Fig. S4. B has a greater impact on af and is a more effective parameter than h. B substantially impacts af through its role in promoting effective liquid spreading, increasing turbulence, minimizing channeling, and controlling flooding, all of which are crucial for enhanced mass transfer. h contributes less to these factors and mainly affects gas flow and pressure drop, thus having a more limited effect on mass transfer efficiency. Therefore, the B configuration is generally a more effective design parameter for optimizing the effective area for mass transfer60,32. Both of these parameters have a linear effect on the output.
The effect of ap and θ on kG is shown in Fig. S5. As seen in this figure, the changes in these parameters are not linear with the output. With an increase in ap and a decrease in θ, kG increases. However, it should be noted that for ap the rate of these changes gradually increases, whereas for θ the rate of change gradually decreases. Furthermore, as ap increases, variations in θ have a more pronounced effect on the output; conversely, when θ decreases, the impact of variations in ap becomes more pronounced. An increase in ap combined with variations in the corrugation angle can sharply impact kG because θ determines how effectively the increased surface can be utilized by the flowing phases. A large ap combined with an optimal θ can lead to enhanced turbulence and better mixing, thus increasing kG. Conversely, reducing θ tends to lengthen the path that the gas travels over the surface, which can enhance the gas-liquid contact time and potentially increase the mass transfer rate; this makes the system more sensitive to variations in ap. However, if θ is decreased too much, it can also reduce mixing and lead to laminar flow conditions, which are less conducive to mass transfer61,62. The effect of B and h on kG is shown in the 3D plot in Fig. S5. The relationship between these parameters and the output is not linear either: increasing h increases kG, whereas increasing the channel base decreases kG. The beneficial effect of h can be attributed to three primary factors: the enhancement of the specific surface area available for mass transfer, the induction of turbulence that disrupts laminar flow and promotes better mixing, and the extension of gas residence times within the packing structure, which allows for a more complete mass transfer process. In contrast, increasing B presents an inverse relationship with kG.
This effect can be rationalized by considering the ramifications of a broader B: it tends to homogenize the flow, thus reducing the localized zones of intense mass transfer; it reduces the degree of turbulence and, consequently, the efficiency of the gas-liquid contact; it increases the propensity for channeling, where preferential pathways undermine the contact between the phases; and it impedes the surface renewal rate critical for maintaining the concentration gradients that drive mass transfer22. The rate of change in kG also gradually decreases as B increases at constant h, whereas it increases as h increases at constant B.
The effects of the structured packing parameters on kL are depicted in Fig. S6. As evident from the graphs, except for θ, the influence of the parameters is highly nonlinear. Analysis of this figure shows that θ has a more significant impact on kL than ap. Both parameters exhibit a direct relationship with the output, meaning that an increase in their values leads to an increase in kL; the maximum kL corresponds to the maximum values of these two parameters. Fig. S6 also makes evident that B has a greater and more nonlinear impact on kL than h, and both parameters again exhibit a direct relationship with the output. When B is held constant, the rate of change of h shows a decreasing impact on kL; this can be attributed to factors such as a diminishing specific surface area for mass transfer and reduced turbulence as h increases, leading to a saturation effect. Conversely, when h remains constant, the rate of change of B exhibits an increasing impact on kL, owing to an expanding specific surface area and increased turbulence, which create more favorable conditions for enhanced liquid-gas interaction32,63.
In this section, the results of applying ML to the provided data are described. For each of af, kG, and kL, five models were tested separately, and the best ones were selected. These models comprise the RF, MLP, RBF, XGB Regressor, and ETR. For each model, the hyperparameter values that achieved the best predictions were extracted; these hyperparameters differ across models. The evaluation criteria employed to discern the optimal model were the MSE, RMSE, MAE, and R2 metrics. After selecting the best model, three-dimensional graphs were plotted with it, and the corresponding analyses are described. We evaluated the performance of our models (ETR, MLP, XGB Regressor, RF, and RBF) across four different train-test ratios: 90-10, 80-20, 70-30, and 60-40. As seen in Tables 13, 15, and 17, the ETR model shows reduced effectiveness with smaller training sets; potential overfitting is evident in the larger training splits, and its test performance, although strong, worsens slightly as the training data decreases. The MLP maintains stable performance across splits, with slight overfitting observed only in the 60-40 split, and shows less overfitting than the ETR and RF, as evident from its closer training and test scores. The XGB Regressor demonstrates consistent and robust performance across splits with a strong balance between training and testing, making it a strong candidate for generalization; its slightly lower training scores compared to the ETR and RF may be beneficial in preventing overfitting. The RF exhibits high training scores indicative of potential overfitting, but its test scores are relatively stable, suggesting decent generalizability; test performance is generally good but decreases with less training data.
RBF stands out for its consistent and well-balanced performance across all splits, showing the least discrepancy between training and testing metrics. The study utilized an RBF architecture comprising a single hidden layer. Models for all four Train-Test Ratios (90–10, 80–20, 70–30, 60–40) were trained for 300 epochs. To address potential overfitting issues, the training process incorporated techniques such as “ReduceLROnPlateau” and early stopping. The MLP architecture utilizes the Rectified Linear Unit (ReLU) activation function for all the hidden layers.
For af, the inputs and outputs were fed into the model, and the model was executed based on the algorithm explained in the theory section. Among the five models that were tested for af, the RBF and XGB Regressor models showed the best performance. The optimal hyperparameter values of these two models are shown in Table 12.
One of the important criteria for selecting the best model for modeling af is the R2 value. In Fig. S1, R2 is examined for 90% train and 10% test data. As is evident, most of these models have been able to make good predictions. Based on this figure, the best models are, in order, the RBF, XGB Regressor, MLP, ETR, and RF models, and the performance comparison in Fig. 9 confirms this. The variations in metric values per epoch in Fig. 9 are also indicative of the convergence and good performance of the RBF. The results of each of the five models on the experimental data are shown in Table 13. As evident from these results, the RBF is the best model based on all four criteria; therefore, this model was used to create the three-dimensional plots shown in Fig. 10. As shown in Fig. 10a,c, an increase in ap and a decrease in θ lead to an increase in the output value. Additionally, as evident from the three-dimensional plots, as ap increases at the minimum value of θ, the rate of change in af gradually decreases until it reaches a constant value. This trend can be attributed to the diminishing impact of additional surface area on mass transfer efficiency, reaching a saturation point. At the maximum value of θ, however, this rate increases; this behavior suggests that higher corrugation angles enhance the sensitivity of the fractional effective area to changes in surface area, indicating a more pronounced effect on mass transfer efficiency60,63. Therefore, it can be concluded that at smaller θ, an increase in ap has a more significant impact on af; similarly, when ap is larger, a reduction in the packing θ becomes more effective. The influence of the two other packing parameters is shown in Fig. 10b,d. As can be observed, B has an extremum when h is held constant at its maximum value.
This extremum implies that when the value of h is 0.03, the maximum output occurs within a range between 0 and 0.06. Additionally, the rate of change in the output with variations in the h is greater, indicating a greater impact of this parameter on the af.
(a) Metrics per epoch for RBF model, and (b) radar plot with R2 as a metric for af.
The 3D plots and contour with RBF showing simultaneous effects of (a) ap and θ on af, and (b) B and h on af and contour plot for effects of (c) ap and θ on af, and (d) B and h on af.
For kG, five models were executed, and the hyperparameters for each model are described in this section. Among the five models that were tested for kG, the RBF and MLP models showed the best performance. The optimal hyperparameter values of these two models are shown in Table 14.
Figure S2 shows the actual train and test data against the predicted values for the five models. As evident from the figures, the RF and ETR models did not make accurate predictions of the data, whereas the RBF and MLP models produced predictions very close to the test data. Furthermore, from an examination of Table 15, which displays the results for each of the five models against the four criteria, it can be concluded that the RBF and MLP are, in order, the best models. The MSE and R2 values of 0.000563 and 0.99 indicate very good predictions by the RBF for kG.
Metrics per epoch for the RBF model and performance comparison for kG are shown in Fig. 11. Three-dimensional plots with the RBF model, as shown in Fig. 12a–d, were plotted. Based on these graphs, it can be observed that the kG has a direct relationship with the ap and an inverse relationship with the θ. Furthermore, this model has exhibited two relative extrema at the minimum value of the θ, from which it can be inferred that in some regions, an increase in the ap may lead to a reduction in the kG. For the B, its relationship with the kG is inverse, but for the h, it generally has a direct relationship.
(a) Metrics per epoch for RBF model, and (b) radar plot with R2 as a metric for kG.
The 3D plots and contour with RBF showing simultaneous effects of (a) ap and θ on kG, and (b) B and h on kG and contour plot for effects of (c) ap and θ on kG, and (d) B and h on kG.
This section is dedicated to evaluating the performance of models for experimental data related to kL. Among the five models tested for kL, the RBF and MLP models showed the best performance. The optimal hyperparameter values of these two models are shown in Table 16.
In Fig. S3, the predicted values generated by the model output are displayed alongside their corresponding experimental values. Based on this figure and Table 17, we can infer that the MLP and RBF models provided the most accurate predictions. The MSE value of 0.00089 for the MLP model signifies its excellent predictive performance compared to the other models.
Figure 13 shows metrics per epoch for the MLP model and a performance comparison for kL. From Fig. 14, it can be observed that kL exhibits a direct correlation with ap and the packing θ. Another noteworthy aspect of these plots is that the effect of changes in ap on kL is more pronounced when the packing θ is high, and likewise, the impact of variations in θ on kL is more pronounced when ap is high. In summary, the maximum kL is achieved at the highest possible ap together with the maximum possible packing angle. Furthermore, the two other parameters, h and B, directly influence kL. When B is at its maximum value, changes in h exhibit a linear relationship with kL, but when B is at its minimum value, these variations become nonlinear. Conversely, when h is at its minimum value, changes in B follow a linear pattern with respect to kL, while they become nonlinear when h is at its maximum value.
(a) Metrics per epoch for MLP model, and (b) radar plot with R2 as a metric for kL.
The 3D plots and contour with RBF showing simultaneous effects of (a) ap and θ on kL, and (b) B and h on kL and contour plot for effects of (c) ap and θ on kL, and (d) B and h on kL.
To assess the performance of the RSM model against the trained ML models, a randomly selected test dataset was employed for evaluation. This dataset was input into both the RSM model and the two best ML models for each of af, kG, and kL. The next phase entails validating the models by comparing their estimated af, kG, and kL values with the experimental data and computing the average absolute relative deviation (AARD). Tables 18, 19, and 20 present the results of the comparison between the RSM and ML models. The ML models exhibited greater predictability, as evidenced by their lower AARD values relative to the RSM model.
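The AARD used for this comparison can be computed as follows (a sketch from the standard definition; the percent convention is an assumption, as the paper does not state it here):

```python
def aard(y_exp, y_pred):
    """Average absolute relative deviation between experimental
    and predicted values, in percent."""
    n = len(y_exp)
    return 100.0 / n * sum(abs(p - e) / abs(e) for e, p in zip(y_exp, y_pred))
```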
The growing need for analysis and process optimization, along with the expanding accessibility of statistical software and enhanced computing capabilities, has resulted in the extensive utilization of RSM and ML modeling techniques. In this study, af, kG, and kL were modeled using RSM and ML for commonly used industrial structured packings. RSM modeling demonstrated the influence of each input parameter on the output and generated quadratic-model equations for each output, allowing suitably structured packings to be selected for optimal performance in packed towers. Among the packing parameters, the channel base had the greatest effect on af and kL, while the crimp height had the greatest effect on kG. The R2 values for af, kG, and kL were 0.9717, 0.9907, and 0.9323, respectively. The research utilized numerous ML models, including the RF, MLP, RBF, XGB Regressor, and ETR, all of which were analyzed and evaluated. After conducting the examinations, we concluded that the RBF, with R2 values of 0.981 and 0.992, provided the best predictions for af and kG, respectively, and the MLP, with an R2 value of 0.987, for kL. The RF model exhibited the weakest performance among the ML models. Although the RSM method produced adequate equations for each output variable with good predictability, the ML method provides superior modeling capabilities. Therefore, moving towards more modern, data-driven methods like ML for these calculations could consistently yield better, more accurate results.
The datasets used and analyzed during the current study are available from the corresponding author upon reasonable request.
Column cross-sectional area (m2)
Effective interfacial area (m−1)
Fractional effective area
Packing surface area (m−1)
Packing channel base (m)
Bias
Bias value associated with the ith neuron
Molar concentration (mol/m3)
Center of the Gaussian function
Coefficient of determination
Equivalent diameter (m)
Hydraulic diameter of packed bed (m)
Hydraulic diameter for the gas phase (m)
Liquid-phase diffusivity (m2/s)
Gas-phase diffusivity (m2/s)
The nonlinear activation transfer function
Gravity constant, 9.8 (m/s2)
MLP output vector
Packed bed height (m)
Crimp height (m)
Fractional liquid hold-up (m3/m3)
Objective function of the XGB Regressor at the t-th iteration
Overall mass transfer coefficient in liquid phase
Overall mass transfer coefficient in gas phase
Liquid-phase mass transfer coefficient (m/s)
Gas phase mass transfer coefficient (m/s)
Liquid load (m/h)
Packing equivalent length (1/aP), (m)
Length of the gas flow channel in a packing element, (m)
Packing type and size-dependent constants in the Olujic model
Number of predictions
Packing type and size-dependent constants in the Olujic model
Number of neurons
Regularization function
Relative velocity Reynolds number
Channel side (m)
Section of the column (m2)
Number of leaves in XGB Regressor
Gas-phase superficial velocity (m/s)
Effective liquid-phase diffusivity (m2/s)
Liquid-phase superficial velocity (m/s)
Effective gas-phase diffusivity (m2/s)
Vapor
Volumetric gas flow rate (m3 h−1)
Weight
Mole fractions in the liquid phases
Input variable
Equilibrium mole fractions in the liquid phases
Mole fractions in the gas phases
Response function
Equilibrium mole fractions in the gas phases
XGB Regressor model's output
Packed height (m)
Capillary number (μu/σ)
Liquid-phase Froude number (uL2/gδL)
Mixing number (M/aP3)
Reynolds number (ρu/aPμ)
Schmidt number (μ/ρd)
Sherwood number (k/aPD)
Liquid-phase Weber number (ρLuL2δL/σ)
Constant term of quadratic equation
Linear coefficients of quadratic equation
Quadratic coefficients of quadratic equation
Interaction coefficients of quadratic equation
Packing porosity
Unaccounted factors in quadratic equation
Packing surface void fraction
Packing corrugation angle
Gas dynamic viscosity (cP)
Liquid dynamic viscosity (cP)
Pi number (3.1415)
Gas phase density, (kg/m3)
Liquid phase density, (kg/m3)
Surface tension (N/m)
Threshold limit
Complexity of each leaf in XGB Regressor
The output of neuron j in layer k
Penalty parameter
Experimentally calculated parameter of quadratic equation
Gas–liquid friction factor
Spread of the Gaussian function
Output of the Gaussian function
Average absolute relative deviation
Artificial Neural Networks
Extra trees regressor
Height equivalent to theoretical plate
Height of a transfer unit (m)
Laminar flow
Layered double hydroxide
Mean average error
Machine learning
Multilayer perceptron
Mean Squared Error
Mellapak
Number of transfer units
Polypropylene
Polyvinyl chloride
Piperazine
Radial basis function neural network
Response Surface Methodology
Raschig Superpack
Random forest
Root mean squared error
Turbulent flow
Neurons are the basic processing units of a neural network
The learning rate is a small positive scalar that determines the step size at which the model's weights are updated during each iteration of training. It regulates how quickly or slowly the model converges to a solution
During training, the inputs are fed forward at each step to produce outputs, which are compared with the targets to calculate an error; the weights and biases are then updated in each epoch to reduce this error
The activation function is a mathematical function applied between a neuron's input and the output it passes to the next layer
Bias is a constant that helps the model best fit the given data
Represents the importance and strength of a feature/input to a neuron
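The terms defined above (weight, bias, activation function, learning rate) can be illustrated with a single-neuron gradient-descent step. This is a generic sketch with arbitrary values, not the networks trained in this study:

```python
import math

def sigmoid(z):
    """Nonlinear activation transfer function."""
    return 1.0 / (1.0 + math.exp(-z))

# One neuron: output = f(w * x + b)
w, b = 0.5, 0.1          # weight and bias
x, target = 2.0, 0.9     # input and desired output
lr = 0.5                 # learning rate: step size of each update

for _ in range(500):
    y = sigmoid(w * x + b)
    # squared-error loss; gradients via the chain rule
    dL_dy = 2.0 * (y - target)
    dy_dz = y * (1.0 - y)
    w -= lr * dL_dy * dy_dz * x   # weight update
    b -= lr * dL_dy * dy_dz      # bias update

print(sigmoid(w * x + b))  # approaches the target 0.9
```

Each pass through the loop is one training iteration: the forward pass computes the output, the error against the target is differentiated, and the weight and bias move by a step proportional to the learning rate.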
Karami, B. & Ghaemi, A. Cost-effective nanoporous hypercross-linked polymers could drastically promote the CO2 absorption rate in amine-based solvents, improving energy-efficient CO2 capture. Ind. Eng. Chem. Res. 60(7), 3105–3114 (2021).
Zafari, P. & Ghaemi, A. Modeling and optimization of CO2 capture into mixed MEA-PZ amine solutions using machine learning based on ANN and RSM models. Results Eng. 19, 101279 (2023).
Rosli, A. et al. Advances in liquid absorbents for CO2 capture: A review. J. Phys. Sci. 28, 121–144 (2017).
Wang, C. et al. Packing characterization: Mass transfer properties. Energy Procedia 23, 23–32 (2012).
Mirzaei, S., Shamiri, A. & Aroua, M. K. A review of different solvents, mass transfer, and hydrodynamics for postcombustion CO2 capture. Rev. Chem. Eng. 31(6), 521–561 (2015).
Pavlenko, A. et al. Investigation of flow parameters and efficiency of mixture separation on a structured packing. AIChE J. 60(2), 690–705 (2014).
Xu, B. et al. Mass transfer performance of CO2 absorption into aqueous DEEA in packed columns. Int. J. Greenhouse Gas Control 51, 11–17 (2016).
Tsai, R. E. et al. A dimensionless model for predicting the mass-transfer area of structured packing. AIChE J. 57(5), 1173–1184 (2011).
Bravo, J. L. Mass transfer in gauze packings. Hydrocarb. Process. 64(1), 91–95 (1985).
Whitman, W. G. The two film theory of gas absorption. Int. J. Heat Mass Transf. 5(5), 429–433 (1962).
Billet, R. & Schultes, M. Predicting mass transfer in packed columns. Chem. Eng. Technol. 16(1), 1–9 (1993).
Brunazzi, E. & Paglianti, A. Liquid-film mass-transfer coefficient in a column equipped with structured packings. Ind. Eng. Chem. Res. 36(9), 3792–3799 (1997).
Olujic, Z. Development of a complete simulation model for predicting the hydraulic and separation performance of distillation columns equipped with structured packings. Chem. Biochem. Eng. Q. 11(1), 31–46 (1997).
Hanley, B. & Chen, C. C. New mass-transfer correlations for packed towers. AIChE J. 58(1), 132–152 (2012).
Valenz, L. et al. Absorption mass-transfer characteristics of Mellapak packings series. Ind. Eng. Chem. Res. 50(21), 12134–12142 (2011).
Wang, C. Mass Transfer Coefficients and Effective Area of Packing (2015).
Naderi, K., Foroughi, A. & Ghaemi, A. Analysis of hydraulic performance in a structured packing column for air/water system: RSM and ANN modeling. Chem. Eng. Process. 193, 109521 (2023).
Tan, Z. et al. SnO2/ATP catalyst enabling energy-efficient and green amine-based CO2 capture. Chem. Eng. J. 453, 139801 (2023).
Tan, Z. et al. Attapulgite as a cost-effective catalyst for low-energy consumption amine-based CO2 capture. Sep. Purif. Technol. 298, 121577 (2022).
Zhang, X. et al. One-step synthesis of efficient manganese-based oxide catalyst for ultra-rapid CO2 absorption in MDEA solutions. Chem. Eng. J. 465, 142878 (2023).
Zhang, X. et al. Solid base LDH-catalyzed ultrafast and efficient CO2 absorption into a tertiary amine solution. Chem. Eng. Sci. 278, 118889 (2023).
Flagiello, D. et al. A review on gas-liquid mass transfer coefficients in packed-bed columns. ChemEngineering 5(3), 43 (2021).
Pashaei, H., Mashhadimoslem, H. & Ghaemi, A. Modeling and optimization of CO2 mass transfer flux into Pz-KOH-CO2 system using RSM and ANN. Sci. Rep. 13(1), 4011 (2023).
Nuchitprasittichai, A. & Cremaschi, S. Optimization of CO2 capture process with aqueous amines: A comparison of two simulation-optimization approaches. Ind. Eng. Chem. Res. 52(30), 10236–10243 (2013).
Khoshraftar, Z. & Ghaemi, A. Modeling of CO2 solubility in piperazine (PZ) and diethanolamine (DEA) solution via machine learning approach and response surface methodology. Case Stud. Chem. Environ. Eng. 8, 100457 (2023).
Valera, V. Y., Codolo, M. C. & Martins, T. D. Artificial neural network for prediction of SO2 removal and volumetric mass transfer coefficient in spray tower. Chem. Eng. Res. Des. 170, 1–12 (2021).
Valera, V. Y., Martins, T. D. & Codolo, M. C. Experimental evaluation and neural networks modeling of removal efficiency and volumetric mass transfer coefficient for gas desulfurization in spray tower. Chem. Eng. Sci. 285, 119568 (2024).
Di Caprio, U. et al. Predicting overall mass transfer coefficients of CO2 capture into monoethanolamine in spray columns with hybrid machine learning. J. CO2 Util. 70, 102452 (2023).
Wang, C. et al. Dimensionless models for predicting the effective area, liquid-film, and gas-film mass-transfer coefficients of packing. Ind. Eng. Chem. Res. 55(18), 5373–5384 (2016).
Flagiello, D. et al. Characterization of mass transfer coefficients and pressure drops for packed towers with Mellapak 250.X. Chem. Eng. Res. Des. 161, 340–356 (2020).
Lhuissier, M. et al. Volatile organic compounds absorption in a structured packing fed with waste oils: Experimental and modeling assessments. Chem. Eng. Sci. 238, 116598 (2021).
Macfarlan, L. H., Phan, M. T. & Eldridge, R. B. Methodologies for predicting the mass transfer performance of structured packings with computational fluid dynamics: A review. Chem. Eng. Process. 172, 108798 (2022).
Khuri, A. I. & Mukhopadhyay, S. Response surface methodology. Wiley Interdiscip. Rev. 2(2), 128–149 (2010).
Moradi, M. R., Ramezanipour Penchah, H. & Ghaemi, A. CO2 capture by benzene-based hypercrosslinked polymer adsorbent: Artificial neural network and response surface methodology. Can. J. Chem. Eng. 101, 5621–5642 (2023).
Qadir, R. et al. Enzyme-assisted extraction of phenolics from Capparis spinosa fruit: Modeling and optimization of the process by RSM and ANN. ACS Omega 7(37), 33031–33038 (2022).
Hemmati, A., Ghaemi, A. & Asadollahzadeh, M. RSM and ANN modeling of hold up, slip, and characteristic velocities in standard systems using pulsed disc-and-doughnut contactor column. Sep. Sci. Technol. 56(16), 2734–2749 (2021).
Wan Omar, W. N. N. Response Surface Methodology (RSM): Learn and Apply (2020).
Ghaemi, A., Dehnavi, M. K. & Khoshraftar, Z. Exploring artificial neural network approach and RSM modeling in the prediction of CO2 capture using carbon molecular sieves. Case Stud. Chem. Environ. Eng. 7, 100310 (2023).
Ghaemi, A. et al. Hydrodynamic behavior of standard liquid-liquid systems in Oldshue-Rushton extraction column; RSM and ANN modeling. Chem. Eng. Process. 168, 108559 (2021).
Khoshraftar, Z. & Ghaemi, A. Evaluation of pistachio shells as solid wastes to produce activated carbon for CO2 capture: Isotherm, response surface methodology (RSM) and artificial neural network (ANN) modeling. Curr. Res. Green Sustain. Chem. 5, 100342 (2022).
Kolbadinejad, S. et al. Deep learning analysis of Ar, Xe, Kr, and O2 adsorption on activated carbon and zeolites using ANN approach. Chem. Eng. Process. 170, 108662 (2022).
Mashhadimoslem, H. et al. Development of predictive models for activated carbon synthesis from different biomass for CO2 adsorption using artificial neural networks. Ind. Eng. Chem. Res. 60(38), 13950–13966 (2021).
Shen, K. Effect of Batch Size on Training Dynamics. https://medium.com/mini-distill/effect-of-batch-size-on-training-dynamics-21c14f7a716e (2018).
Brownlee, J. How to avoid overfitting in deep learning neural networks. Mach. Learn. Mastery 17, 12 (2018).
Jierula, A. et al. Study on accuracy metrics for evaluating the predictions of damage locations in deep piles using artificial neural networks with acoustic emission data. Appl. Sci. 11(5), 2314 (2021).
Dorigo, W. et al. Mapping invasive Fallopia japonica by combined spectral, spatial, and temporal analysis of digital orthophotos. Int. J. Appl. Earth Obs. Geoinf. 19, 185–195 (2012).
Ling, Z. et al. A nonintrusive load monitoring method for office buildings based on random forest. Buildings 11(10), 449 (2021).
Murtagh, F. Multilayer perceptrons for classification and regression. Neurocomputing 2(5–6), 183–197 (1991).
Govindarajan, M. & Chandrasekaran, R. Intrusion detection using neural based hybrid classification methods. Comput. Netw. 55(8), 1662–1671 (2011).
Fausett, L. V. Fundamentals of Neural Networks: Architectures, Algorithms and Applications (Pearson Education India, 2006).
Haykin, S. Neural Networks: A Comprehensive Foundation (Prentice Hall, 1998).
Kobayashi, K. & Salam, M. U. Comparing simulated and measured values using mean squared deviation and its components. Agron. J. 92(2), 345–352 (2000).
Faris, H., Aljarah, I. & Mirjalili, S. Evolving radial basis function networks using moth–flame optimizer. In Handbook of Neural Computation 537–550 (Elsevier, 2017).
Zhao, Z. et al. Prediction of interfacial interactions related with membrane fouling in a membrane bioreactor based on radial basis function artificial neural network (ANN). Bioresour. Technol. 282, 262–268 (2019).
Khoshraftar, Z. & Ghaemi, A. Modeling and prediction of CO2 partial pressure in methanol solution using artificial neural networks. Curr. Res. Green Sustain. Chem. 6, 100364 (2023).
Kingma, D. P. & Ba, J. Adam: A Method for Stochastic Optimization. arXiv arXiv:1412.6980 (2014).
Dhaliwal, S. S., Nahid, A.-A. & Abbas, R. Effective intrusion detection system using XGBoost. Information 9(7), 149 (2018).
Zhang, D. et al. A data-driven design for fault detection of wind turbines using random forests and XGboost. Ieee Access 6, 21020–21031 (2018).
Geurts, P., Ernst, D. & Wehenkel, L. Extremely randomized trees. Mach. Learn. 63, 3–42 (2006).
Wang, C. et al. Packing characterization for post combustion CO2 capture: Mass transfer model development. Energy Procedia 63, 1727–1744 (2014).
Olujić, Ž., Seibert, F. & Fair, J. R. Influence of corrugation geometry on the performance of structured packings: An experimental study. Chem. Eng. Process. 39, 335–342 (2000).
Gu, C. et al. Numerical analysis of the influence of packing corrugation angle on the flow and mass transfer characteristics of cryogenic distillation. Appl. Therm. Eng. 214, 118847 (2022).
Macfarlan, L. H., Phan, M. T. & Eldridge, R. B. Structured packing geometry study for liquid-phase mass transfer and hydrodynamic performance using CFD. Chem. Eng. Sci. 249, 117353 (2022).
School of Chemical, Petroleum and Gas Engineering, Iran University of Science and Technology, Narmak, Tehran, 16846, Iran
Amirsoheil Foroughi, Kamyar Naderi & Ahad Ghaemi
Department of Electrical Engineering, Iran University of Science and Technology, Narmak, Tehran, 16846-13114, Iran
Mohammad Sadegh Kalami Yazdi & Mohammad Reza Mosavi
A.F.: Conceptualization, Methodology, Software, Conceived and designed the experiments, Validation, Formal analysis, Investigation, Resources, Data curation, Writing—original draft, Writing—review & editing, Supervision, Visualization, Project administration. K.N.: Conceptualization, Methodology, Software, Conceived and designed the experiments, Validation, Formal analysis, Investigation, Resources, Data curation, Writing—original draft, Writing—review & editing, Supervision, Visualization, Project administration. M.S.Y.: Software, Writing—review & editing, Validation, Formal analysis, Data curation. A.G.: Corresponding author, Supervision, Funding acquisition, Software, Validation, Formal analysis, Investigation, Resources, Visualization. M.R.M.: Supervision, Funding acquisition, Software, Validation, Formal analysis, Investigation, Resources, Visualization.
Correspondence to Ahad Ghaemi.
The authors declare no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
Foroughi, A., Naderi, K., Ghaemi, A. et al. Analysis of effective area and mass transfer in a structure packing column using machine learning and response surface methodology. Sci Rep 14, 19711 (2024). https://doi.org/10.1038/s41598-024-70339-0
Received: 01 April 2024
Accepted: 14 August 2024
Published: 24 August 2024