Incorporating The Barzilai-Borwein Adaptive Step Size Into Subgradient
A not-for-profit organization, IEEE is the world's largest technical professional organization dedicated to advancing technology for the benefit of humanity.© Copyright 2025 IEEE - All rights reserved. Use of this web site signifies your agreement to the terms and conditions.
We propose several modifications of the Barzilai–Borwein (BB) step size in the variance reduction (VR) methods for finite-sum optimization problems. Our first approach relies on a scalar function, which we call the TaiL Function (TLF). The TLF maps the computed BB step size to a positive real number, which is then used as the step size instead. The computational overhead is almost negligible, and the functional forms of the TLFs in this work do not involve any problem-dependent parameters.
In the strongly convex setting, due to the undesirable appearance of the condition number \(\kappa \) in the linear convergence rate, the IFO complexity of VR methods with BB step size has the form... With the utilization of the TLF, the aforementioned complexity is improved to \(\mathcal {O}((n+\kappa ^{\tilde{a}})\log (1/\epsilon ))\), \(\tilde{a}\in \mathbb {R}_{+}, \tilde{a}<a\). In the non-convex setting, we improve the \(\mathcal {O}(n+n\epsilon ^{-1})\) complexity of SVRG-SBB to \(\mathcal {O}(n+n^{\beta }\epsilon ^{-1})\), where \(\beta \in \mathbb {R}_{+}\) can take any value in \((2/3, 1)\). Specifically, the constant step size regime is recovered by taking the TLF to be a constant function whose value relies on problem-dependent parameters. As a counterpart of the constant step size regime, we also propose a BB-based vibration technique to set step sizes for VR methods, leading to methods with novel one-parameter step sizes. These methods have the same complexities as their constant step size versions.
Meanwhile, they are more robust w.r.t. the sole step size parameter empirically. Moreover, a novel analysis is proposed for SARAH-I-type methods in the strongly convex setting. Numerical tests corroborate the proposed methods.
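The TLF idea described in the abstract can be illustrated with a minimal sketch. The abstract does not state the BB formula or the functional forms of the TLFs, so everything below is an assumption made for illustration: `bb_step_size` uses the standard BB formula \(\Vert s\Vert ^2/(s^{\top }y)\) scaled by the inner-loop length `m` (as is common in SVRG-BB-style methods), and `sqrt_tlf` is a hypothetical TLF, not one taken from the paper. The helper names `apply_tlf` and `constant_tlf` are likewise hypothetical.

```python
import numpy as np


def bb_step_size(x_curr, x_prev, g_curr, g_prev, m=1):
    """Standard BB step size ||s||^2 / (s^T y), divided by m (assumed form)."""
    s = x_curr - x_prev          # difference of consecutive iterates
    y = g_curr - g_prev          # difference of the corresponding gradients
    sy = float(s @ y)
    if sy <= 0.0:
        # s^T y > 0 holds for strongly convex objectives; otherwise BB is undefined here.
        raise ValueError("s^T y must be positive to form a BB step size")
    return float(s @ s) / (m * sy)


def apply_tlf(eta_bb, tlf):
    """Map the computed BB step size through a TLF and use the result instead."""
    eta = float(tlf(eta_bb))
    if eta <= 0.0:
        raise ValueError("a TLF must return a positive step size")
    return eta


def sqrt_tlf(eta):
    """Hypothetical TLF for illustration only: h(eta) = sqrt(eta)."""
    return np.sqrt(eta)


def constant_tlf(value):
    """Constant TLF: ignores the BB step size, recovering a constant step size."""
    return lambda eta: value


if __name__ == "__main__":
    # Toy quadratic f(x) = 0.5 * x^T A x, whose gradient is A x.
    A = np.diag([1.0, 4.0])
    x_prev = np.array([1.0, 1.0])
    x_curr = np.array([0.5, 0.5])
    eta_bb = bb_step_size(x_curr, x_prev, A @ x_curr, A @ x_prev, m=1)
    print("BB step size:", eta_bb)
    print("after sqrt TLF:", apply_tlf(eta_bb, sqrt_tlf))
    print("after constant TLF:", apply_tlf(eta_bb, constant_tlf(0.1)))
```

The constant TLF mirrors the remark that the constant step size regime is a special case; in that case the value passed to `constant_tlf` would have to be chosen from problem-dependent parameters, which this sketch does not attempt.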
Note that the need to increase \(\eta _{\textrm{low}}\) and \(\eta _{\textrm{up}}\) excludes certain choices of TLFs. For example, we cannot choose decreasing functions. Nor can we choose an increasing function that grows too slowly in a neighbourhood of 0 (e.g., \(h(\eta )=\eta ^2\)), since in such cases \(\eta _{\textrm{low}}\) and \(\eta _{\textrm{up}}\) may still fail to increase even though m is decreased. We explore the inequality in more detail. Assume that the TLF satisfies \(h(0)=0\); then the inequality essentially requires: