A Dual Enhanced Stochastic Gradient Descent Method With Dynamic Momentum and Step Size Adaptation for Improved Optimization
Scientific Reports volume 15, Article number: 40389 (2025). In modern machine learning, optimization algorithms are crucial: they steer the training process by navigating complex, high-dimensional loss landscapes. Among these, stochastic gradient descent with momentum (SGDM) is widely adopted for its ability to accelerate convergence in shallow regions. However, SGDM struggles in challenging optimization landscapes, where narrow, curved valleys can lead to oscillations and slow progress. This paper introduces dual enhanced SGD (DESGD), which addresses these limitations by dynamically adapting both the momentum and the step size within the same update rule as SGDM. On two optimization test functions, the Rosenbrock and Sum Square functions, the proposed optimizer typically outperforms SGDM and Adam.
For example, it achieves comparable errors with up to 81–95% fewer iterations and 66–91% less CPU time than SGDM, and 67–78% fewer iterations with 62–70% faster runtimes than Adam. On the MNIST dataset, the proposed optimizer achieved the highest accuracies and lowest test losses across the majority of batch sizes. Compared to SGDM, it consistently improved accuracy by about 1–2%, while performing on par with or slightly better than Adam in accuracy and error. Although SGDM remained the fastest per-step optimizer, the proposed method's computational cost is in line with that of other adaptive optimizers such as Adam. This marginal increase in per-iteration overhead is justified by the substantial gains in model accuracy and reduction in training loss, demonstrating a favorable cost-to-performance ratio. The results show that DESGD is a promising, practical optimizer for scenarios demanding stability in challenging landscapes.
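For reference, the SGDM baseline discussed above couples a fixed momentum coefficient with a fixed learning rate, which is exactly what DESGD makes adaptive. A minimal sketch of the classical SGDM update (variable names are illustrative, not from the paper):

```python
import numpy as np

def sgdm_step(theta, velocity, grad, lr=0.01, momentum=0.9):
    """One SGDM update: the velocity accumulates past gradients,
    then the parameters move along the velocity direction."""
    velocity = momentum * velocity + grad
    theta = theta - lr * velocity
    return theta, velocity

# Minimize f(x) = x^2 (gradient 2x) starting from x = 5.0.
theta, v = np.array([5.0]), np.zeros(1)
for _ in range(300):
    theta, v = sgdm_step(theta, v, 2.0 * theta)
# theta is now close to the minimum at 0.
```

Note that both `lr` and `momentum` stay constant throughout training here; in curved valleys such as the Rosenbrock function this fixed pairing is what produces the oscillations described above.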
Machine learning (ML) optimization is critical to the development of models that exhibit efficiency, scalability, and superior performance. With the continuous advancement of modern ML approaches, optimization has become ever more essential to the training of complex models such as deep neural networks. Recent developments, such as gradient-based approaches, adaptive learning-rate strategies, and stochastic optimization techniques, have greatly enhanced model performance in applications like disease diagnosis [1,2,3], photovoltaic power forecasting [4,5], and large language model training [6]. In addition, efficient optimization reduces training time and resource utilization, increasing the accessibility and deployability of ML solutions in real-world settings. Thus, ML optimization remains a promising research area, offering ideas that advance artificial intelligence across diverse sectors. One of the most widely used methods in ML optimization is gradient descent (GD), which minimizes the loss function \(J(\theta)\), where \(\theta\) denotes the model's parameters, by updating them in the negative direction of the gradient.
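The GD update just described is \(\theta \leftarrow \theta - \alpha \nabla J(\theta)\). A minimal sketch in Python (illustrative names, not the authors' code), applied to the sum-of-squares function used as a benchmark in this paper:

```python
import numpy as np

def gd(grad_fn, theta0, alpha=0.1, iters=100):
    """Plain gradient descent: move theta against the gradient,
    scaled by the learning rate alpha."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(iters):
        theta = theta - alpha * grad_fn(theta)
    return theta

# Minimize J(theta) = sum(theta^2); its gradient is 2 * theta.
theta = gd(lambda t: 2.0 * t, [3.0, -4.0])
# theta is now close to the minimum at [0, 0].
```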
The step size toward a local minimum is controlled by the learning rate \(\alpha\). The GD method has several variants, distinguished by the number of data samples fed into each update. Stochastic gradient descent (SGD) updates the parameters using one data sample at a time instead of the full dataset, completing one epoch, i.e. one full pass, once all data samples have been processed.

The proposed method, presented in the paper "A Dual Enhanced Stochastic Gradient Descent Method with Dynamic Momentum and Step Size Adaptation for Improved Optimization" (Scientific Reports, 2025), combines two components:

1) Adaptive momentum using a conjugate-gradient-inspired update, with stability ensured through truncation schemes.

2) Adaptive step size using gradient alignment (cosine similarity), which is line-search-free and computationally lightweight.

The proposed optimizer shows faster and more stable convergence than traditional methods, requiring fewer training iterations while maintaining a competitive computational cost. It delivers strong, consistent performance across both classical optimization benchmarks and a neural network experiment, highlighting its potential as a practical and efficient alternative in challenging optimization settings. This work opens the door for more robust and efficient training, especially in high-curvature optimization landscapes. The paper represents a proof of concept, and the authors plan to extend the idea to more advanced models, larger datasets, and real-world applications in future work.

Co-authors: Dr. Mohamed Fathy, Dr. Yasser Dahab, and Dr. Emad Abdallah. Open-access full paper: https://lnkd.in/dDa5CFDy
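A hypothetical sketch of how two such components could be combined on top of the SGDM update. The paper's exact formulas are not reproduced here: the Polak-Ribière-style momentum coefficient with truncation and the cosine-similarity step scaling below are assumptions based only on the description above.

```python
import numpy as np

def desgd_like_step(theta, velocity, grad, prev_grad, lr=0.01):
    """One update with (1) a conjugate-gradient-inspired momentum
    coefficient, truncated for stability, and (2) a step-size scale
    derived from the cosine similarity of consecutive gradients."""
    eps = 1e-12
    # (1) Polak-Ribiere-style coefficient, truncated to [0, 1].
    beta = float(grad @ (grad - prev_grad)) / (float(prev_grad @ prev_grad) + eps)
    beta = min(max(beta, 0.0), 1.0)
    # (2) Aligned consecutive gradients (cos ~ 1) allow a larger step;
    # opposing ones (cos ~ -1) shrink it to damp oscillation.
    cos = float(grad @ prev_grad) / (np.linalg.norm(grad) * np.linalg.norm(prev_grad) + eps)
    step = lr * (1.0 + cos) / 2.0
    velocity = beta * velocity + grad
    return theta - step * velocity, velocity

# Illustrative run on the sum-of-squares function (gradient 2 * theta).
theta, v = np.array([3.0, -4.0]), np.zeros(2)
prev_g = 2.0 * theta
for _ in range(500):
    g = 2.0 * theta
    theta, v = desgd_like_step(theta, v, g, prev_g, lr=0.1)
    prev_g = g
```

On this well-conditioned function the cosine term stays near 1 and the truncation keeps the momentum coefficient non-negative, so the sketch behaves like plain GD; the adaptive terms only become active in curved or noisy landscapes.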
Keywords: Adaptive momentum; Adaptive step size; Gradient-based optimization; Machine learning optimization; Stochastic gradient descent.

Declarations.
Competing interests: The authors declare no competing interests.

Figure: Components of dual enhanced SGD (DESGD) algorithm.
Figure: Dual enhanced SGD (DESGD) algorithm procedure.