Optimizers in Deep Learning: A Comparative Study and Analysis
Ijraset Journal For Research in Applied Science and Engineering Technology
Authors: Sujay Bashetty, Kalyan Raja, Sahiti Adepu, Ajeet Jain
DOI Link: https://doi.org/10.22214/ijraset.2022.48050
Machine learning has contributed enormously to optimization, introducing new optimization algorithms. These approaches have wide applications in deep learning, with a resurgence of novel methods ranging from Stochastic Gradient Descent to convex and non-convex ones. Selecting an optimizer is a vital choice in deep learning, as it determines the training speed and the final performance of the DL model.
The complexity further increases as networks grow deeper, hyper-parameters must be tuned, and the data sets become larger. In this work, we empirically analyze the most popular and widely used optimizer algorithms. Their behavior is tested on the MNIST and autoencoder data sets. We compare them, pointing out their similarities, differences, and their likely suitability for a given application. Recent variants of these optimizers are highlighted. The article focuses on their critical role and pinpoints which one would be a better option when making a trade-off.
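The kind of empirical comparison described in the abstract can be reproduced in spirit with a few lines of Keras. The sketch below is not the paper's code; the model, hyper-parameter values, and training budget are illustrative choices, and it assumes TensorFlow is installed. It trains the same small classifier on MNIST under SGD, RMSprop, and Adam and reports validation accuracy for each.

```python
# Illustrative sketch (not the authors' code): compare optimizers on MNIST.
import tensorflow as tf

def build_model():
    # Small fully connected classifier for 28x28 MNIST digits.
    return tf.keras.Sequential([
        tf.keras.Input(shape=(28, 28)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# Candidate optimizers; learning rates are common defaults, not tuned values.
optimizers = {
    "sgd": tf.keras.optimizers.SGD(learning_rate=0.01),
    "rmsprop": tf.keras.optimizers.RMSprop(learning_rate=0.001),
    "adam": tf.keras.optimizers.Adam(learning_rate=0.001),
}

for name, opt in optimizers.items():
    model = build_model()
    model.compile(optimizer=opt,
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    history = model.fit(x_train, y_train, epochs=3, batch_size=128,
                        validation_data=(x_test, y_test), verbose=0)
    print(name, history.history["val_accuracy"][-1])
```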
Deep learning (DL) algorithms are essential in statistical computations because of their efficiency as data sets grow in size. Interestingly, one of the pillars of DL is the mathematics of the optimization process, which makes decisions based on previously unseen data. This is achieved through carefully chosen parameters for a given learning problem (an intuitive, near-optimal solution). The hyper-parameters are the parameters of a learning algorithm, not of a given model. Evidently, the motivation is to seek an optimization algorithm that works well and predicts accurately [1, 2, 3, 4]. Many researchers have worked on text classification in ML because it embodies the fundamental problem of learning from examples.
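To make the hyper-parameter versus parameter distinction concrete, the toy gradient-descent sketch below (an illustration of mine, not code from the paper) fixes the learning rate and iteration count before training, while the weight vector is the model's parameters and is learned from data.

```python
import numpy as np

# Hyper-parameters: properties of the learning algorithm, fixed before training.
learning_rate = 0.1
n_iterations = 100

# Toy data for a least-squares regression problem.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)

# Parameters: properties of the model, learned from the data.
w = np.zeros(3)
for _ in range(n_iterations):
    grad = 2 * X.T @ (X @ w - y) / len(y)   # gradient of the mean squared error
    w -= learning_rate * grad               # gradient-descent update
```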
Similarly, speech and image recognition have been handled with great success and accuracy, yet they still offer room for new improvements. In pursuit of higher goals, optimization techniques based on convexity principles are now widely cited [5, 6, 7], often together with logistic and other regression techniques. Moreover, Stochastic Gradient Descent (SGD) has been very popular for many years, but it suffers from ill-conditioning and longer computation times on larger data sets. In some cases, it also requires hyper-parameter tuning and different learning rates; the update-rule sketch below contrasts this with an adaptive method. This repository contains implementations and analysis of multiple deep learning models, including Deep Neural Networks (DNN), Convolutional Neural Networks (CNN), Gated Recurrent Units (GRU), and Long Short-Term Memory (LSTM) networks. It also includes a comparative study of optimizer performance in training these models.
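The learning-rate sensitivity mentioned above is easiest to see in the update rules themselves. The NumPy sketch below (illustrative only, not taken from the repository) contrasts a plain SGD step, which applies one global learning rate to every parameter, with an Adam step, which rescales each parameter's step using running estimates of the gradient's first and second moments.

```python
import numpy as np

def sgd_step(w, grad, lr=0.01):
    # Plain SGD: one global learning rate scales every coordinate,
    # so ill-conditioned problems force careful tuning of lr.
    return w - lr * grad

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # Adam: per-parameter step sizes adapted by moment estimates,
    # which reduces (but does not remove) learning-rate sensitivity.
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2     # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
```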
Deep learning has transformed the landscape of machine learning by enabling powerful predictive models. This project explores these models and the optimizers used to train them. Contributors: Sanjana Deshpande (model implementation and performance evaluation); Aditya Nandan Reddy Sanivarapu and Dinesh Kumar Nayak (architecture design, coding, and result analysis). This project is licensed under the MIT License; see the LICENSE file for details.
Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 1615)
Included in the following conference series:
Optimization algorithms are essential to the training effectiveness and performance of deep learning models, but their comparative efficiency across architectures and data sets remains insufficiently quantified. In this study, the performance of three widely used optimizers (Adam, RMSProp, and LAMB) is evaluated on convolutional (ResNet-50, VGG-19), recurrent (LSTM, GRU), and transformer-based architectures across image classification tasks (MNIST, CIFAR-10, OCT-2017)... Across more than 1200 controlled experiments with extensive hyperparameter tuning, Adam achieves the best low-data accuracy (92.3% on MNIST) and the best results in transformer applications (4.2% higher BLEU scores), while RMSProp excels in convolutional networks, particularly on medical imaging data, relative to Adam. LAMB proves better suited to distributed training, enabling large-batch training at 98% efficiency and reducing ImageNet epoch time by 53%.
The introduction of a normalized convergence-efficiency metric shows that adaptive methods converge 22–37% faster than SGD but are sensitive to learning-rate schedules. Critical trade-offs are identified, including Adam's vulnerability to gradient sparsity in NLP tasks, RMSProp's robustness to class imbalance, and the balance between LAMB's scalability and its cost. These findings provide actionable guidance, emphasizing that optimizer efficiency is context-dependent and must be aligned with model architecture, data distribution, and computing resources. H. Anandaram, K. S. Shreenidhi, L. Madaan, A. Dhole and N. S. Talwandi contributed equally.
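LAMB's suitability for large-batch training comes from applying a layer-wise trust ratio to an Adam-style update. The NumPy sketch below is a simplified single-step illustration (function names and hyper-parameter values are mine, not from the study); real implementations also handle trust-ratio clipping, bias terms, and distributed gradient aggregation.

```python
import numpy as np

def lamb_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
              eps=1e-6, weight_decay=0.01):
    """One simplified LAMB update for a single layer's weight matrix."""
    # Adam-style moment estimates with bias correction.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)

    # Adam direction plus decoupled weight decay.
    update = m_hat / (np.sqrt(v_hat) + eps) + weight_decay * w

    # Layer-wise trust ratio ||w|| / ||update|| keeps the step proportional
    # to the layer's weight norm, which stabilizes very large batch sizes.
    w_norm = np.linalg.norm(w)
    u_norm = np.linalg.norm(update)
    trust_ratio = w_norm / u_norm if w_norm > 0 and u_norm > 0 else 1.0

    w = w - lr * trust_ratio * update
    return w, m, v

# Toy usage: one layer, one optimization step.
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4))
m, v = np.zeros_like(w), np.zeros_like(w)
w, m, v = lamb_step(w, rng.normal(size=(4, 4)), m, v, t=1)
```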
Indian Journal of Science and Technology
Year: 2025, Volume: 18, Issue: 10, Pages: 803-810
1Research Scholar, Department of Computer Science, Bharathidasan University, Tiruchirappalli, 620 024, Tamil Nadu, India; 2Associate Professor, Department of Computer Science, Bharathidasan University, Tiruchirappalli, 620 024, Tamil Nadu, India
*Corresponding Author Email: [email protected]
Received Date: 17 January 2025, Accepted Date: 19 March 2025, Published Date: 30 March 2025