Dimensionality Reduction Surrogate Methods
Surrogate modeling has been popularized as an alternative to full-scale models in complex engineering processes such as manufacturing and computer-assisted engineering. The modeling demand increases exponentially with the complexity and number of system parameters, which consequently requires higher-dimensional engineering solving techniques. This is known as the curse of dimensionality. Surrogate models are commonly used to replace costly computational simulations and modeling of complex geometries. However, an ongoing challenge is to reduce the execution time and memory consumption of high-complexity processes, which often exhibit nonlinear phenomena.
Dimensionality reduction algorithms have been employed for feature extraction, selection, and elimination to simplify surrogate models of high-dimensional problems. By applying dimensionality reduction to surrogate models, less computation is required to generate the surrogate model while retaining sufficient representation accuracy of the full process. This paper aims to review the current literature on dimensionality reduction integrated with surrogate modeling methods. A review of the current state-of-the-art dimensionality reduction and surrogate modeling methods is introduced with a discussion of their mathematical implications, applications, and limitations. Finally, current studies that combine the two topics are discussed and avenues of further research are presented.
Data mining has become a rapidly growing field in recent years. At the same time, data generation has seen a surge in volume, leading to a growth in size, complexity, and data dimensionality. High-dimensional data exists where the number of data features is on the order of the number of samples or observations [1]. These datasets can be computationally expensive to learn from, and generating mapping functions between inputs and outputs can be a cumbersome task. Thus, reducing the number of features, or the problem dimensionality, can greatly simplify the learning and training of regression and classification models for extracting patterns in the data. Dimensionality reduction (DR) techniques seek to reduce the data dimensionality and identify intrinsic data structure while sacrificing minimal accuracy and information.
DR can be achieved through feature elimination, feature selection, or feature extraction. Feature elimination reduces the input dimension space by eliminating features of the dataset that are deemed unimportant. Although this simplifies subsequent computations, no information is gained from the dropped features. Feature selection uses statistics to determine and rank features based on their information contribution to the overall dataset. These methods can be categorized as filter and wrapper methods and have been explored in detail in [2]. It is important to note that there is no universal method for ranking data features, as different tests will yield different contribution scores.
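As a minimal sketch of a filter-style feature selection method, the snippet below applies a univariate filter to a synthetic regression dataset; the dataset, score function, and number of retained features are illustrative assumptions, not choices drawn from the studies cited above.

```python
# Illustrative sketch: filter-based feature selection on a synthetic
# regression dataset. SelectKBest ranks features by a univariate score
# (here the F-statistic) and keeps only the top-scoring ones.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, f_regression

# Synthetic dataset: 20 features, of which only 5 carry information.
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=0.1, random_state=0)

selector = SelectKBest(score_func=f_regression, k=5)
X_reduced = selector.fit_transform(X, y)

print("Original shape:", X.shape)           # (200, 20)
print("Reduced shape:", X_reduced.shape)    # (200, 5)
print("Selected feature indices:", np.flatnonzero(selector.get_support()))
```

Swapping the score function (for example, mutual information instead of the F-statistic) generally changes the ranking, which reflects the point above that different tests yield different contribution scores.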
Global sensitivity analysis methods [3], which identify the ‘most important’ inputs in unstructured datasets and ignore the others, have emerged as a novel feature selection method for machine learning (ML) prediction models [4]. Finally, feature extraction methods, like principal component analysis (PCA), create new independent features that are combinations of the original dataset features. In this paper, dimensionality reduction methods are classified as linear and non-linear methods. Linear DR methods transform data to a lower-dimensional feature space through linear combinations of the original variables. The linear techniques presented in this paper predominantly perform dimensionality reduction through linear algebra. Non-linear DR methods are applied when the initial data space contains nonlinear relationships and structure.
These include kernel PCA (kPCA), manifold learning methods, and autoencoders. Typically, non-linear DR techniques generate a lower-dimensional representation of the data while preserving distances between data points. Furthermore, these methods can be implemented as supervised or unsupervised schemes. Unsupervised DR methods, such as PCA, only consider the input feature matrix for pattern identification, while supervised methods such as partial least squares (PLS) and linear discriminant analysis (LDA) consider both the features and the output responses or class labels. The overall goal of DR methods is to enhance the accuracy and efficiency of data mining by reducing the dataset and increasing data quality. This section provides implementations of concepts related to dimensionality reduction methods.
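A minimal sketch of the unsupervised/supervised distinction, using PCA and PLS on a synthetic dataset; the data-generating process and the component counts below are illustrative assumptions.

```python
# Illustrative sketch: unsupervised vs supervised feature extraction.
# PCA finds directions of maximum variance in X alone; PLS finds latent
# directions in X that are most predictive of the response y.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
latent = rng.normal(size=(300, 3))                     # hidden 3-D structure
mixing = rng.normal(size=(3, 10))
X = latent @ mixing + 0.05 * rng.normal(size=(300, 10))  # 10 correlated features
y = 2.0 * latent[:, 0] - latent[:, 2] + 0.1 * rng.normal(size=300)

# Unsupervised: PCA uses only the feature matrix X.
pca = PCA(n_components=2)
Z_pca = pca.fit_transform(X)
print("PCA explained variance ratio:", pca.explained_variance_ratio_)

# Supervised: PLS uses both X and y when building the latent space.
pls = PLSRegression(n_components=2).fit(X, y)
Z_pls = pls.transform(X)
print("PCA scores shape:", Z_pca.shape, "PLS scores shape:", Z_pls.shape)
```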
The code in this section demonstrates the application of dimensionality reduction methods to datasets that illustrate the performance and nature of the different methods. It also demonstrates sequential sampling algorithms that use dimensionality reduction on the 10-dimensional Styblinski-Tang function, which illustrates the difficulty of performing sequential sampling on high-dimensional functions and the use of dimensionality reduction to improve its performance. These concepts are covered in the following sections, including Sequential Sampling using Dimensionality Reduction.
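The full sequential sampling code is not reproduced here; as a minimal sketch of the combination it describes, the snippet below draws a one-shot Latin hypercube design on the 10-dimensional Styblinski-Tang function, reduces the inputs with PLS, and fits a Gaussian process surrogate in the reduced space. The sample sizes, number of components, and the choice of PLS plus a GP are illustrative assumptions, not the sequential scheme referenced above.

```python
# Illustrative sketch: dimensionality reduction combined with a surrogate
# model for the 10-D Styblinski-Tang function. A one-shot Latin hypercube
# design is used here instead of a full sequential sampling loop.
import numpy as np
from scipy.stats import qmc
from sklearn.cross_decomposition import PLSRegression
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

def styblinski_tang(X):
    """Styblinski-Tang function, defined on [-5, 5]^d."""
    return 0.5 * np.sum(X**4 - 16.0 * X**2 + 5.0 * X, axis=1)

dim = 10
sampler = qmc.LatinHypercube(d=dim, seed=0)
X_train = qmc.scale(sampler.random(n=200), -5.0, 5.0)   # 200 LHS samples
y_train = styblinski_tang(X_train)

# Supervised reduction: project the 10-D inputs onto a few PLS components.
pls = PLSRegression(n_components=3).fit(X_train, y_train)
Z_train = pls.transform(X_train)

# Fit a Gaussian process surrogate in the reduced 3-D space.
gp = GaussianProcessRegressor(kernel=ConstantKernel() * RBF(), normalize_y=True)
gp.fit(Z_train, y_train)

# Validate on an independent Latin hypercube test set.
X_test = qmc.scale(qmc.LatinHypercube(d=dim, seed=1).random(n=100), -5.0, 5.0)
y_test = styblinski_tang(X_test)
y_pred = gp.predict(pls.transform(X_test))
rmse = np.sqrt(np.mean((y_pred - y_test) ** 2))
print(f"RMSE of reduced-space GP surrogate: {rmse:.3f}")
```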
Reference: Hou, C. K. J., & Behdinan, K. (2022). Dimensionality Reduction in Surrogate Modeling: A Review of Combined Methods. Data Science and Engineering. Springer. https://doi.org/10.1007/s41019-022-00193-5
Nonlinear dimensionality reduction, also known as manifold learning, is any of various related techniques that aim to project high-dimensional data, potentially existing across non-linear manifolds which cannot be adequately captured by linear decomposition methods, onto lower-dimensional representations. High-dimensional data can be hard for machines to work with, requiring significant time and space for analysis. It also presents a challenge for humans, since it's hard to visualize or understand data in more than three dimensions. Reducing the dimensionality of a data set, while keeping its essential features relatively intact, can make algorithms more efficient and allow analysts to visualize trends and patterns. The reduced-dimensional representations of data are often referred to as "intrinsic variables". This description implies that these are the values from which the data was produced.
For example, consider a dataset that contains images of a letter 'A', which has been scaled and rotated by varying amounts. Each image has 32×32 pixels. Each image can be represented as a vector of 1024 pixel values. Each row is a sample on a two-dimensional manifold in 1024-dimensional space (a Hamming space). The intrinsic dimensionality is two, because two variables (rotation and scale) were varied in order to produce the data. Information about the shape or look of a letter 'A' is not part of the intrinsic variables because it is the same in every instance.
Nonlinear dimensionality reduction will discard the correlated information (the letter 'A') and recover only the varying information (rotation and scale). Applying an NLDR algorithm to this dataset yields a well-organized two-dimensional embedding of the images. By comparison, if principal component analysis, which is a linear dimensionality reduction algorithm, is used to reduce this same dataset to two dimensions, the resulting values are not so well organized. This demonstrates that the high-dimensional vectors (each representing a letter 'A') that sample this manifold vary in a non-linear manner. It should be apparent, therefore, that NLDR has several applications in the field of computer vision. For example, consider a robot that uses a camera to navigate in a closed static environment.
The images obtained by that camera can be considered to be samples on a manifold in high-dimensional space, and the intrinsic variables of that manifold will represent the robot's position and orientation.
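The letter 'A' images are not reproduced here; as a minimal sketch of the same linear-versus-nonlinear contrast, the snippet below embeds scikit-learn's synthetic S-curve (a curved 2-D sheet living in 3-D) with both Isomap and PCA. The dataset, neighbor count, and correlation-based comparison are illustrative assumptions.

```python
# Illustrative sketch: nonlinear (Isomap) vs linear (PCA) dimensionality
# reduction on a synthetic 2-D manifold embedded in 3-D (the S-curve).
import numpy as np
from sklearn.datasets import make_s_curve
from sklearn.decomposition import PCA
from sklearn.manifold import Isomap

# 3-D points lying on a curved 2-D sheet; `t` parameterizes position
# along the curve and serves as a known intrinsic variable.
X, t = make_s_curve(n_samples=1000, noise=0.05, random_state=0)

Z_isomap = Isomap(n_neighbors=10, n_components=2).fit_transform(X)
Z_pca = PCA(n_components=2).fit_transform(X)

# Correlation between the manifold coordinate t and the first embedding
# axis: Isomap typically "unrolls" the curve, so its first coordinate
# tracks t more closely than the first principal component does.
print("corr(t, Isomap axis 0):", abs(np.corrcoef(t, Z_isomap[:, 0])[0, 1]))
print("corr(t, PCA axis 0):   ", abs(np.corrcoef(t, Z_pca[:, 0])[0, 1]))
```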
Surrogate modeling problems often include variable fidelity data. Most approaches consider the case of two available fidelity levels, while engineers may have data sampled at more than two levels of fidelity.
We consider a Gaussian process regression framework that can construct surrogate models with an arbitrary number of fidelity levels. While a straightforward implementation suffers from numerical instability and numerical problems, our approach adopts a Bayesian paradigm and provides direct control of the numerical properties of the surrogate model construction problem. The benchmark of the presented approach consists of various artificial and real data problems, with a focus on surrogate modeling of an airfoil and a C-shape press. Surrogate modeling aims at replacing an expensive-to-evaluate computational code with a fast surrogate model constructed from data [1]. As engineers can often vary the fidelity level of a computer code, the resulting data have variable fidelity [2], [3], [4]. For example, we can estimate the aerodynamic quality of an airfoil using a wind tunnel experiment or by varying the mesh quality of a computational code.
Thus, surrogate modeling algorithms have to take variable fidelity data into account. Cokriging [5], [6], [7], or variable fidelity Gaussian process regression, provides a natural way to handle such data. Common implementations of cokriging handle data with two levels of fidelity [2], while in many cases an engineer can provide data with more levels of fidelity [8]. Our approach allows constructing surrogate models using an arbitrary number of fidelity levels. We paid special attention to avoiding numerical problems connected with surrogate modeling in the Gaussian process regression framework. In simulation-based realization of complex systems, we are forced to address the issue of computational complexity.
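The cited cokriging framework itself is not reproduced here; as a much simpler sketch of the variable fidelity idea, the snippet below fits one Gaussian process to plentiful low-fidelity data and a second Gaussian process to the discrepancy observed at a handful of high-fidelity points, using the widely used Forrester one-dimensional test pair. The additive correction scheme, kernels, and sample locations are illustrative assumptions, not the approach described above.

```python
# Illustrative sketch: a simple two-level variable fidelity surrogate.
# A GP is fit to plentiful low-fidelity data, and a second GP models the
# discrepancy between high- and low-fidelity responses at a few expensive
# high-fidelity points (an additive correction scheme).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

def f_high(x):
    """Expensive high-fidelity model (Forrester test function)."""
    return (6 * x - 2) ** 2 * np.sin(12 * x - 4)

def f_low(x):
    """Cheap low-fidelity approximation of f_high."""
    return 0.5 * f_high(x) + 10 * (x - 0.5) - 5

x_low = np.linspace(0, 1, 21).reshape(-1, 1)     # many cheap evaluations
x_high = np.array([[0.0], [0.4], [0.6], [1.0]])  # few expensive evaluations

kernel = ConstantKernel() * RBF(length_scale=0.2)
gp_low = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
gp_low.fit(x_low, f_low(x_low).ravel())

# Discrepancy model: high-fidelity value minus low-fidelity prediction.
delta = f_high(x_high).ravel() - gp_low.predict(x_high)
gp_delta = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
gp_delta.fit(x_high, delta)

def predict_high(x):
    """Variable fidelity prediction: low-fidelity GP plus correction GP."""
    return gp_low.predict(x) + gp_delta.predict(x)

x_test = np.linspace(0, 1, 201).reshape(-1, 1)
err = np.max(np.abs(predict_high(x_test) - f_high(x_test).ravel()))
print(f"Max error of two-fidelity surrogate on [0, 1]: {err:.3f}")
```

The additive correction is the simplest way to fuse two fidelity levels; cokriging generalizes it by jointly modeling the fidelity levels with correlated Gaussian processes and extends to more than two levels.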