Stochastic Gradient Method with Barzilai–Borwein Step (Springer)
The use of stochastic gradient algorithms for nonlinear optimization is of considerable interest, especially in high-dimensional problems, where the choice of the step size is of key importance for the convergence rate. In this paper, we propose two new stochastic gradient algorithms that use an improved Barzilai–Borwein step size formula. Convergence analysis shows that these algorithms attain linear convergence in probability for strongly convex objective functions. Our computational experiments confirm that the proposed algorithms perform better than two-point gradient algorithms and well-known stochastic gradient methods.
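The Barzilai–Borwein (BB) idea is to choose the step size from the two most recent iterates and gradients so that it approximates the inverse curvature along the last step; in the stochastic setting the gradients are noisy, so the BB quantities are usually formed from averaged or mini-batch gradients. The sketch below is a minimal illustration of a stochastic gradient loop with a BB1-type step computed from consecutive iterates and averaged mini-batch gradients. It is not the improved formula proposed in the paper; the averaging scheme and the safeguard constants are assumptions for illustration.

```python
import numpy as np

def sgd_bb(grad_batch, x0, n_samples, n_epochs=20, batch_size=32,
           alpha0=0.1, alpha_min=1e-6, alpha_max=1.0, seed=0):
    """Stochastic gradient descent with a BB1-type step size.

    grad_batch(x, idx) must return the mini-batch gradient at x over
    the sample indices idx.  Generic sketch, not the paper's method.
    """
    rng = np.random.default_rng(seed)
    x = x0.copy()
    alpha = alpha0
    x_prev = g_prev = None

    for epoch in range(n_epochs):
        # Average the mini-batch gradients seen in this epoch to
        # stabilise the BB quantities against stochastic noise.
        g_sum = np.zeros_like(x)
        n_batches = max(1, n_samples // batch_size)
        for _ in range(n_batches):
            idx = rng.choice(n_samples, size=batch_size, replace=False)
            g = grad_batch(x, idx)
            g_sum += g
            x = x - alpha * g
        g_avg = g_sum / n_batches

        if x_prev is not None:
            s = x - x_prev            # iterate difference
            y = g_avg - g_prev        # averaged-gradient difference
            if abs(s @ y) > 1e-12:
                # BB1 step s's / s'y, clipped to a safeguard interval.
                alpha = np.clip((s @ s) / (s @ y), alpha_min, alpha_max)
        x_prev, g_prev = x.copy(), g_avg.copy()

    return x
```

A per-epoch update of the step, as above, is one common way to keep the BB ratio meaningful despite mini-batch noise; per-iteration variants typically need additional smoothing.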
Part of the book series: Lecture Notes in Computer Science (LNISA, volume 13164). A crucial aspect in designing a learning algorithm is the selection of the hyperparameters (parameters that are not trained during the learning process). In particular, the effectiveness of stochastic gradient methods strongly depends on the steplength selection. In recent papers [9, 10], Franchini et al. propose to adopt an adaptive selection rule borrowed from the full-gradient scheme known as the Limited Memory Steepest Descent method [8] and appropriately tailored to the stochastic framework.
This strategy is based on the computation of the eigenvalues (Ritz-like values) of a suitable matrix obtained from the gradients of the most recent iterations, and it provides an estimate of the... The possible increase of the size of the sub-sample used to compute the stochastic gradient is driven by an augmented inner product test approach [3]. The whole procedure makes the tuning of the parameters less expensive than the selection of a fixed steplength, although it remains dependent on the choice of threshold values bounding the variability of the steplength... The contribution of this paper is to exploit a stochastic version of the Barzilai-Borwein formulas [1] to adaptively select the endpoints of the range for the Ritz-like values. Numerical experimentation on some convex loss functions highlights that the proposed procedure remains stable while the tuning of the hyperparameters becomes less expensive. This work has been partially supported by the INdAM research group GNCS and by POR-FSE 2014–2020 funds of the Emilia-Romagna region.
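The Ritz-like values mentioned above come from the limited-memory steepest-descent idea: collect the most recent (stochastic) gradients, project onto the subspace they span, and take the eigenvalues of the resulting small matrix as local curvature estimates, whose reciprocals serve as candidate steplengths. The sketch below shows one standard way to compute such values from a window of recent gradients and steplengths on a quadratic model; the function name, the positivity safeguard, and the window handling are assumptions, not the exact procedure of [8, 9, 10].

```python
import numpy as np

def ritz_like_values(grads, steps):
    """Ritz-like curvature estimates from the m most recent gradients.

    grads : list of m+1 gradient vectors g_0, ..., g_m (oldest first),
            where g_{j+1} was produced by a step of length steps[j]
            along -g_j.
    steps : list of the m corresponding steplengths.

    Returns positive eigenvalues of the small projected matrix; their
    reciprocals can be used as candidate steplengths.  Illustrative
    sketch for a quadratic model only.
    """
    m = len(steps)
    G = np.column_stack(grads[:m])       # d x m matrix of past gradients
    g_new = grads[m]                     # most recent gradient

    # (m+1) x m bidiagonal matrix encoding g_{j+1} = g_j - alpha_j A g_j
    J = np.zeros((m + 1, m))
    for j, a in enumerate(steps):
        J[j, j] = 1.0 / a
        J[j + 1, j] = -1.0 / a

    Q, R = np.linalg.qr(G)               # thin QR of the gradient matrix
    M = np.hstack([R, Q.T @ g_new[:, None]]) @ J   # m x m
    T = np.linalg.solve(R.T, M.T).T      # T = M @ inv(R), projected matrix
    theta = np.linalg.eigvals(T).real

    # Keep only positive curvature estimates (a safeguard assumed here).
    return np.sort(theta[theta > 0])
```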
In unconstrained optimization, the gradient descent method is the most basic algorithm, and its performance is directly related to the step size. In this paper, we develop a family of gradient step sizes based on the Barzilai-Borwein method, named regularized Barzilai-Borwein (RBB) step sizes. We show that the reciprocal of the RBB step size is the closed-form solution to an \( \ell_2^2 \)-regularized least squares problem. We propose an adaptive regularization parameter scheme based on the principle of the alternate Barzilai-Borwein (ABB) method and the local mean curvature of the objective function. We introduce a new alternate step size criterion into the ABB method, forming a three-term alternate step size and thereby establishing an enhanced RBB method for solving quadratic and general unconstrained optimization problems efficiently.
We apply the proposed algorithms to typical quadratic and non-quadratic optimization problems, and further employ them to address spherical t-design, a nonlinear nonconvex optimization problem on an oblique manifold. All data that support the findings of this study are included within the article. All experiments were implemented in MATLAB R2024a and carried out on a PC with a 12th Gen Intel(R) Core(TM) i7-12700H CPU at 2.30 GHz and 32 GB of RAM.
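To make the regularized least squares connection concrete, here is one plausible scalar formulation, written with the standard BB quantities \( s_k = x_k - x_{k-1} \), \( y_k = \nabla f(x_k) - \nabla f(x_{k-1}) \) and a regularization parameter \( \lambda_k \); the exact RBB formula in the paper may differ.

\[
\beta_k \;=\; \arg\min_{\beta}\; \|\beta\, s_k - y_k\|_2^2 + \lambda_k \beta^2
\;=\; \frac{s_k^{\top} y_k}{s_k^{\top} s_k + \lambda_k},
\qquad
\alpha_k^{\mathrm{RBB}} \;=\; \frac{1}{\beta_k} \;=\; \frac{s_k^{\top} s_k + \lambda_k}{s_k^{\top} y_k},
\]

which reduces to the classical BB1 step \( \alpha_k^{\mathrm{BB1}} = s_k^{\top} s_k / s_k^{\top} y_k \) when \( \lambda_k = 0 \), with \( \lambda_k > 0 \) damping the step when the secant information is unreliable.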
Part of the book series: Communications in Computer and Information Science (CCIS, volume 1824).
Minimization problems involving a finite sum as the objective function often arise in machine learning applications. The number of components of the finite-sum term is typically very large, making the computation of its full gradient infeasible. For this reason, stochastic gradient methods are commonly considered. The performance of these approaches strongly relies on the selection of both the learning rate and the mini-batch size employed to compute the stochastic direction. In this paper, we combine a recent idea to select the learning rate as a diagonal matrix based on stochastic Barzilai-Borwein rules with an adaptive subsampling technique to fix the mini-batch size (a sketch of a diagonal BB update is given after this abstract).
Convergence results for the resulting stochastic gradient algorithm are shown for both convex and non-convex objective functions. Several numerical experiments on binary classification problems are carried out to compare the proposed method with other state-of-the-art schemes. G. Franchini, F. Porta, V. Ruggiero, I. Trombini, and L. Zanni contributed equally to this work.
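The sketch below illustrates what a diagonal (elementwise) Barzilai-Borwein learning rate can look like inside a mini-batch gradient loop: each coordinate gets its own BB1-type ratio built from the componentwise differences of iterates and stochastic gradients, clipped to a safeguard interval. The update rule, the safeguards, and the fixed mini-batch size are assumptions for illustration; they are not the specific diagonal rules or the adaptive subsampling test of the paper.

```python
import numpy as np

def diagonal_bb_sgd(grad_batch, x0, n_samples, n_iters=1000, batch_size=64,
                    d_min=1e-6, d_max=1.0, d0=0.05, eps=1e-12, seed=0):
    """Mini-batch gradient method with a diagonal BB-like learning rate.

    grad_batch(x, idx) returns the stochastic gradient at x on the
    samples indexed by idx.  Illustrative sketch only.
    """
    rng = np.random.default_rng(seed)
    x = x0.copy()
    D = np.full_like(x, d0)          # diagonal learning-rate matrix, stored as a vector
    x_prev = g_prev = None

    for _ in range(n_iters):
        idx = rng.choice(n_samples, size=batch_size, replace=False)
        g = grad_batch(x, idx)

        if x_prev is not None:
            s = x - x_prev           # componentwise iterate differences
            y = g - g_prev           # componentwise stochastic-gradient differences
            # Elementwise BB1-type ratio s_i^2 / (s_i * y_i), safeguarded:
            # non-positive or tiny denominators fall back to d_min via clipping.
            denom = np.where(np.abs(s * y) > eps, s * y, np.inf)
            D = np.clip((s * s) / denom, d_min, d_max)

        x_prev, g_prev = x.copy(), g.copy()
        x = x - D * g                # diagonal scaling of the stochastic gradient

    return x
```

A per-coordinate step of this kind acts as a cheap diagonal preconditioner; the clipping interval [d_min, d_max] plays the same stabilising role as the threshold values on the steplength mentioned in the abstracts above.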