On The Improvement Of The Barzilai Borwein Step Size In Variance

Leo Migdal

We propose several modifications of the Barzilai–Borwein (BB) step size for variance reduction (VR) methods applied to finite-sum optimization problems. Our first approach relies on a scalar function, which we call the TaiL Function (TLF). The TLF maps the computed BB step size to a positive real number, which is then used as the step size instead. The computational overhead is almost negligible, and the functional forms of the TLFs in this work do not involve any problem-dependent parameters. In the strongly convex setting, due to the undesirable appearance of the condition number \(\kappa \) in the linear convergence rate, the IFO complexity of VR methods with the BB step size has the form \(\mathcal {O}((n+\kappa ^{a})\log (1/\epsilon ))\) for some \(a\in \mathbb {R}_{+}\). With the TLF, this complexity is improved to \(\mathcal {O}((n+\kappa ^{\tilde{a}})\log (1/\epsilon ))\) with \(\tilde{a}\in \mathbb {R}_{+}\), \(\tilde{a}<a\).
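The overall pipeline described above, computing a BB step size from successive iterates and gradients and then passing it through a scalar TLF, can be sketched as follows. The function names and the particular TLF \(h(\eta)=\eta^{1/2}\) are illustrative assumptions, not the paper's actual choices:

```python
import numpy as np

def bb_step_size(x_curr, x_prev, g_curr, g_prev):
    """Classical BB step size from consecutive iterates and gradients."""
    s = x_curr - x_prev          # iterate difference
    y = g_curr - g_prev          # gradient difference
    return float(s @ s) / float(s @ y)

def tail_function(eta, power=0.5):
    """Hypothetical TLF: maps the computed BB step size to a positive
    real number used as the actual step size. h(eta) = eta**power is
    increasing with h(0) = 0 (illustrative choice, not the paper's)."""
    return eta ** power

# usage: two iterates and their gradients
x1, x0 = np.array([1.0, 2.0]), np.array([0.5, 1.5])
g1, g0 = np.array([0.6, 0.9]), np.array([0.2, 0.4])
eta_bb = bb_step_size(x1, x0, g1, g0)
eta = tail_function(eta_bb)   # step size actually used by the method
```

The TLF adds only a scalar function evaluation per step-size computation, which is why the overhead is negligible relative to the gradient work.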

In the non-convex setting, we improve the \(\mathcal {O}(n+n\epsilon ^{-1})\) complexity of SVRG-SBB to \(\mathcal {O}(n+n^{\beta }\epsilon ^{-1})\), where \(\beta \in \mathbb {R}_{+}\) can take any value in \((2/3, 1)\). Specifically, the constant step size regime is recovered by taking the TLF to be a constant function whose value relies on problem-dependent parameters. As a counterpart of the constant step size regime, we also propose a BB-based vibration technique for setting step sizes in VR methods, leading to methods with novel one-parameter step sizes. These methods have the same complexities as their constant step size versions, while being empirically more robust with respect to the sole step size parameter.

Moreover, a novel analysis is proposed for SARAH-I-type methods in the strongly convex setting. Numerical tests corroborate the proposed methods. Note that the need to increase \(\eta _{\textrm{low}}\) and \(\eta _{\textrm{up}}\) excludes certain choices of TLFs. For example, we cannot choose decreasing functions.

Nor can we choose an increasing function that increases too slowly in a neighbourhood of 0 (e.g., \(h(\eta )=\eta ^2\)): in such cases, even though \(m\) is decreased, \(\eta _{\textrm{low}}\) and \(\eta _{\textrm{up}}\) still may not increase. We explore the inequality in more detail. Assuming the TLF satisfies \(h(0)=0\), the inequality essentially requires the following.

The Barzilai–Borwein (BB) method is an effective gradient method for solving unconstrained optimization problems. Based on the observation of two classical BB step sizes, we construct a variational least squares model and propose a new class of BB step sizes, each of which still has the quasi-Newton property.
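To see concretely why a TLF that grows too slowly near 0 is problematic, compare \(h(\eta)=\eta^2\) with a faster-growing alternative such as \(h(\eta)=\eta^{1/2}\) (an illustrative pair, not choices taken from the text):

```python
# Near 0, h(eta) = eta**2 lies far below eta, so mapping a small BB
# step size through it shrinks the step even further; by contrast,
# h(eta) = eta**0.5 lies above eta on (0, 1) and pushes small steps up.
# (Illustrative functions only; the paper's admissible TLFs may differ.)
for eta in (0.1, 0.01, 0.001):
    slow, fast = eta ** 2, eta ** 0.5
    print(f"eta={eta:<6} eta^2={slow:<10.6f} eta^0.5={fast:.4f}")
```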

The original BB step sizes are two special cases of the new step sizes. Numerical experiments verify the effectiveness of the new step sizes.

Keywords: Barzilai–Borwein, variational, least squares, step size, convergence. Mathematics Subject Classification: 90C20, 90C25, 90C30.

In this paper, we consider the unconstrained optimization problem
\[ \min_{x\in \mathbb{R}^{n}} f(x), \tag{1} \]
where \(f:\mathbb{R}^{n}\longrightarrow \mathbb{R}\) is continuously differentiable.
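The two classical BB step sizes referred to above are computed from the secant pair \(s_k = x_k - x_{k-1}\), \(y_k = g_k - g_{k-1}\): \(\alpha^{BB1}_k = s_k^{\top}s_k / s_k^{\top}y_k\) and \(\alpha^{BB2}_k = s_k^{\top}y_k / y_k^{\top}y_k\). A minimal sketch (the quadratic test function is an assumption for illustration):

```python
import numpy as np

def bb_step_sizes(x_k, x_prev, g_k, g_prev):
    """Return the two classical Barzilai-Borwein step sizes.

    BB1 solves min_a ||(1/a) s - y|| and BB2 solves min_a ||s - a y||,
    where s = x_k - x_prev and y = g_k - g_prev (the secant pair)."""
    s = x_k - x_prev
    y = g_k - g_prev
    bb1 = (s @ s) / (s @ y)   # long BB step
    bb2 = (s @ y) / (y @ y)   # short BB step
    return bb1, bb2

# On a quadratic f(x) = 0.5 x^T A x with gradient g(x) = A x, both BB
# steps are reciprocals of Rayleigh quotients of A, so they lie in
# [1/lambda_max, 1/lambda_min] and satisfy bb2 <= bb1.
A = np.diag([1.0, 4.0])
x_prev, x_k = np.array([1.0, 1.0]), np.array([0.5, 0.2])
bb1, bb2 = bb_step_sizes(x_k, x_prev, A @ x_k, A @ x_prev)
```

For this example, \(1/\lambda_{\max} = 0.25 \le \alpha^{BB2} \le \alpha^{BB1} \le 1/\lambda_{\min} = 1\), matching the spectral interpretation of the BB steps.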

A minimizer is denoted by \(x_{*}\). The gradient method for solving (1) is an iterative method of the form \(x_{k+1} = x_{k} - \alpha_{k} g_{k}\), where \(g_{k}\) is the gradient at \(x_{k}\) and \(\alpha_{k} > 0\) is the step size.

Journal of Industrial and Management Optimization. Department of Applied Mathematics and Physics, Graduate School of Informatics, Kyoto University, Japan. This work was supported by the Grant-in-Aid for Scientific Research (C) 21K11769 from the Japan Society for the Promotion of Science.

In this paper, we consider improving the stochastic variance reduced gradient (SVRG) method by incorporating curvature information of the objective function.

We propose to reduce the variance of stochastic gradients using the computationally efficient Barzilai–Borwein (BB) method by incorporating it into SVRG. We also incorporate a BB step size as a variant. We show linear convergence not only for the proposed method but also for the other existing SVRG variants that use second-order information. We conduct numerical experiments on benchmark datasets and demonstrate that the proposed method with a constant step size outperforms existing variance reduced methods on some test problems.

Figure 1: Comparison of M1, M2, and M3 on adult, covtype, gisette, and mnist38.
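The combination of SVRG with a per-epoch BB step size can be sketched as follows, in the spirit of SVRG-BB: the step size for epoch \(s\) is \(\eta_s = \lVert \tilde{x}_s - \tilde{x}_{s-1}\rVert^2 / \big(m\,(\tilde{x}_s - \tilde{x}_{s-1})^{\top}(\tilde{g}_s - \tilde{g}_{s-1})\big)\), computed from successive snapshots and full gradients. This is an illustrative implementation under those assumptions, not the exact algorithm of any of the papers quoted here:

```python
import numpy as np

def svrg_bb(grad_i, x0, n, m, eta0, epochs, rng=None):
    """Sketch of SVRG with a Barzilai-Borwein epoch step size.

    grad_i(x, i) returns the gradient of the i-th component function.
    The first epoch uses eta0; later epochs use a BB step size built
    from differences of successive snapshots and full gradients."""
    rng = rng if rng is not None else np.random.default_rng(0)
    full = lambda x: np.mean([grad_i(x, i) for i in range(n)], axis=0)
    x_tilde, eta = x0.copy(), eta0
    g_tilde = full(x_tilde)
    prev_x, prev_g = None, None
    for _ in range(epochs):
        if prev_x is not None:
            s, y = x_tilde - prev_x, g_tilde - prev_g
            eta = (s @ s) / (m * abs(s @ y) + 1e-12)  # BB step per epoch
        x = x_tilde.copy()
        for _ in range(m):
            i = rng.integers(n)
            v = grad_i(x, i) - grad_i(x_tilde, i) + g_tilde  # VR gradient
            x -= eta * v
        prev_x, prev_g = x_tilde, g_tilde
        x_tilde, g_tilde = x, full(x)
    return x_tilde

# toy least-squares finite sum: f_i(x) = 0.5 * (a_i^T x - b_i)^2
rng = np.random.default_rng(1)
A, b = rng.normal(size=(50, 5)), rng.normal(size=50)
gi = lambda x, i: (A[i] @ x - b[i]) * A[i]
x_hat = svrg_bb(gi, np.zeros(5), n=50, m=100, eta0=0.01, epochs=30)
```

The `abs(...)` and the small constant in the denominator are defensive guards for the sketch; the analyses in the quoted works instead restrict the setting so that the curvature term is well behaved.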

