Questions tagged [optimization]
Use this tag for any use of optimization within statistics.
2,833 questions
6
votes
1
answer
149
views
Why do “good” loss functions in ML need both Lipschitz continuity and smoothness?
I’m trying to understand the common assumptions in machine-learning optimization theory, where a “well-behaved” loss function is often required to be both L-Lipschitz and β-smooth (i.e., have β-...
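For context, the standard definitions being assumed here (paraphrased, not part of the question): a loss $f$ is $L$-Lipschitz if
$$|f(x) - f(y)| \le L\,\lVert x - y\rVert \quad \text{for all } x, y,$$
and $\beta$-smooth if its gradient is $\beta$-Lipschitz,
$$\lVert \nabla f(x) - \nabla f(y)\rVert \le \beta\,\lVert x - y\rVert \quad \text{for all } x, y.$$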
1
vote
0
answers
30
views
Is the strong duality of the hard-margin SVM really trivially satisfied all the time?
It is widely known that if you were to calculate the maximizer of the dual SVM program (denoted $\alpha^*$), then the primal minimizer of the hard-margin SVM program,
\begin{aligned}&{\underset {...
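For context, the standard hard-margin primal being referred to (stated here for reference, since the excerpt above is cut off) is
$$\min_{w,b}\ \tfrac{1}{2}\lVert w\rVert^2 \quad \text{s.t.}\quad y_i\,(w^\top x_i + b) \ge 1,\quad i=1,\dots,m,$$
with one dual variable $\alpha_i \ge 0$ per margin constraint.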
0
votes
0
answers
9
views
Are there any other powerful optimization tools available besides the ABC and PSO algorithms? [duplicate]
What other optimization tools are powerful enough to improve the accuracy of a neural network model? Please give me recent tools that are powerful
3
votes
1
answer
68
views
Efficient minimization of minimax objective function involving piecewise linear functions
Given an empirical cdf $\hat{F}$ with support on $[0,1]$, I am interested in finding the histogram with $B$ (unequal) bins with cdf $F_B$ that minimizes the maximum absolute deviation between the cdfs....
2
votes
0
answers
42
views
What's the distribution of the Lagrange multipliers found by quadratic programming? [closed]
I am trying to figure out how to infer $C$ in a support vector machine.
$C$ is the upper bound on the magnitude of the Lagrange multipliers. These multipliers are not independent; they are probably mutually ...
1
vote
0
answers
60
views
Impact of Full Probability Distribution in GP Regression on Optimisation
In the context of an engineering design project that requires determining optimal design configurations (e.g., finding optimal design configurations of a nozzle that maximise thrust ratio and discharge ...
10
votes
3
answers
753
views
Maximizing profit given a PMF
A grocery store has $n$ watermelons to sell and makes $\$ 1.00$ on each sale. Say the number of consumers of these watermelons is a random variable with a distribution that can be approximated by
$$f(...
4
votes
1
answer
147
views
How do you maintain orthonormality during optimization?
I am trying to iteratively optimize a set of vectors $\{w_1, w_2, ..., w_n\}$ such that the following holds:
$$
w_r =
\begin{cases}
\underset{w}{\arg\min} \; \sum_x \left\lVert (x^\top w) w - x \...
2
votes
1
answer
124
views
A question about minimizing $l_2$ norm with regularization
PREMISES: this question likely arises from my very basic knowledge of the field. Please be very detailed in the answer, even if some facts may seem trivial. Also, sorry for my poor English.
...
14
votes
2
answers
441
views
How can I estimate a function from its level sets?
I am developing an app. Let $f:X\subseteq \mathbb{R}^n \rightarrow \mathbb{R}$ be a function satisfying some regularity conditions (e.g. continuity and smoothness), and let $2 \leq n \leq 100$.
$f$ ...
2
votes
2
answers
134
views
How can I get scipy.curve_fit to converge on data involving a wrapping phase angle?
I am trying to fit the phase angle of complex data with a very simple function phi(f) = mf, where m is the gradient and ...
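One common workaround for wrapped phases is to fit in the complex domain, where $e^{i\phi}$ has no $2\pi$ jumps. Below is a minimal sketch of that idea (not the asker's code; the model $\phi(f)=mf$ and all names are illustrative):

```python
# Minimal sketch: fit a wrapped phase phi(f) = m*f by matching exp(i*m*f)
# to exp(i*phi_data), which removes the artificial jumps at +/- pi.
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(0)
f = np.linspace(0.0, 10.0, 200)
m_true = 2.3
phase = np.angle(np.exp(1j * (m_true * f + 0.05 * rng.standard_normal(f.size))))  # wrapped data

def model(f, m):
    # Stack real and imaginary parts of exp(i*m*f) so curve_fit sees a smooth target.
    z = np.exp(1j * m * f)
    return np.concatenate([z.real, z.imag])

z = np.exp(1j * phase)
y = np.concatenate([z.real, z.imag])
# The complex-domain objective is oscillatory in m, so a sensible initial guess still matters.
m_hat, _ = curve_fit(model, f, y, p0=[2.0])
print(m_hat)  # close to m_true
```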
7
votes
1
answer
339
views
What are "conditional modes"?
The start parameter of the glmmTMB function takes a list.
Some possible components of that list are:
...
1
vote
0
answers
61
views
Best approaches for multiple root finding when functions are not differentiable
I have a problem similar to one I posted about recently but sufficiently different to warrant its own discussion I think.
I have k functions, each of the same k-dimensional vector x, and I want to ...
0
votes
0
answers
69
views
Proving Convergence of Mean and Variance in a Recursive Gaussian Update Process
I'm researching the statistical convergence properties of a recursive system that arises during the training of a custom neural network structure.
My specific question is: How can I prove convergence of ...
0
votes
0
answers
34
views
Uniqueness of solution for bias vector in policy evaluation
I have a two-dimensional state-space MDP with state space $(s,i)$, where $s$ takes values in the natural numbers and $i \in \{1,2\}$. I have written the policy evaluation equation for a policy
$$r-g+(P-...
0
votes
0
answers
45
views
Prediction of optimum variables through XGBoost
I have a large dataset of soil moisture data (satellite) and water table depths (measurements).
I would like to derive the optimum soil moisture levels to predict the water table depths most ...
1
vote
1
answer
111
views
What Constrained Optimization method to use when my objective isn't strictly differentiable
I'm trying to find the vector of parameters x which gets me the optimal reward, subject to a couple of constraints like $f(x)=k$ and $g(x) \geq C $.
I have lower and upper bounds for each component of ...
0
votes
0
answers
51
views
Sampling region about a minimum loss reached after optimisation
I am optimising an objective function w.r.t. some box-like constraints using Adam (Adaptive Moment Estimation). Are there any techniques to help me sample the region around the solution that gives me ...
8
votes
1
answer
174
views
MLE for unrestricted $\theta$ of $N(\theta,\theta^2)$
Given $X_1, ..., X_n \sim N(\theta, \theta^2)$, I'm trying to find the MLE for $\theta$. This is similar to previous posts like this one: MLE of $\theta$ in $N(\theta, \theta^2)$
However, suppose we ...
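For context, the unrestricted case reduces to a quadratic score equation; a standard sketch of the derivation (under the usual i.i.d. assumptions, not taken from the linked posts):
$$\ell(\theta) = -n\log|\theta| - \frac{1}{2\theta^2}\sum_i (x_i-\theta)^2 + \text{const},\qquad \frac{\partial\ell}{\partial\theta} = -\frac{n}{\theta} + \frac{\sum_i x_i^2}{\theta^3} - \frac{\sum_i x_i}{\theta^2} = 0,$$
which gives $n\theta^2 + \left(\sum_i x_i\right)\theta - \sum_i x_i^2 = 0$, i.e.
$$\hat\theta = \frac{-\bar{x} \pm \sqrt{\bar{x}^2 + 4\,\overline{x^2}}}{2}, \qquad \overline{x^2} = \frac{1}{n}\sum_i x_i^2,$$
and the root with the larger likelihood is the MLE.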
2
votes
0
answers
36
views
Finding a subset with target mean and covariance
I have a large set of data, and I'm looking for a subset with certain properties. The whole set is made up of $N$ vectors in $\mathbb R^n$, and I have a target mean vector $\overrightarrow \mu$ and ...
0
votes
0
answers
58
views
Convexity of loss function in model fitting without known data
I am a bit confused about the concept of convexity analysis when doing model fitting. Say I have developed some model with two parameters, $f(x;\theta_1,\theta_2)$, that I plan to fit to some data I ...
1
vote
1
answer
46
views
FISTA Optimizer Implementation for Neural Networks with Sparse Regularization
I'm implementing a FISTA (Fast Iterative Shrinkage-Thresholding Algorithm) optimizer in PyTorch for training neural networks with sparse regularization. My implementation doesn't seem to be working as ...
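Stripped of the PyTorch machinery, the algorithm itself is short; below is a generic FISTA sketch for an $\ell_1$-regularized least-squares problem (illustrative only, not the asker's optimizer; all names are made up):

```python
# FISTA for min_w 0.5*||A w - b||^2 + lam*||w||_1.
# Key pieces: the proximal (soft-threshold) step and the momentum sequence t_k.
import numpy as np

def soft_threshold(x, thresh):
    return np.sign(x) * np.maximum(np.abs(x) - thresh, 0.0)

def fista(A, b, lam, n_iter=500):
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the smooth part
    step = 1.0 / L
    w = np.zeros(A.shape[1])
    y, t = w.copy(), 1.0
    for _ in range(n_iter):
        grad = A.T @ (A @ y - b)           # gradient of the smooth term at the look-ahead point
        w_new = soft_threshold(y - step * grad, lam * step)
        t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        y = w_new + ((t - 1.0) / t_new) * (w_new - w)
        w, t = w_new, t_new
    return w

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 20))
b = A @ (rng.standard_normal(20) * (rng.random(20) < 0.3)) + 0.01 * rng.standard_normal(100)
w_hat = fista(A, b, lam=0.1)
```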
1
vote
0
answers
48
views
Portfolio optimisation for 2 shares - What are some recommended metrics to use?
I want to maximize the total number of shares of either A or B, by reallocating shares daily. For simplicity, the trades occur at each day’s closing prices. I'm basically determining the "optimal ...
0
votes
0
answers
50
views
Questions about calculating uncertainty and correlation matrix of model parameters from optimization
I am running a nonlinear earth system model to optimize 42 parameters p with 7 different kinds of observations $O_j$ where ...
2
votes
1
answer
160
views
How many folds should an unnested CV have compared to a nested CV
I read in the mlr3 book about nested resampling that:
Nested resampling is a method to compare models and to estimate the generalization
performance of a tuned model, however, this is the performance ...
4
votes
0
answers
115
views
Difference between weight decay and L2 regularization
I'm reading Ilya Loshchilov's work on decoupled weight decay and regularization. The big takeaway seems to be that weight decay and $L^2$ norm regularization are the same for SGD but they are ...
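The equivalence in question is easiest to see from the update rules (a standard summary, not a quote from the paper): with plain SGD, learning rate $\eta$, and penalty $\frac{\lambda}{2}\lVert w\rVert^2$ added to the loss,
$$w_{t+1} = w_t - \eta\left(\nabla L(w_t) + \lambda w_t\right) = (1-\eta\lambda)\,w_t - \eta\,\nabla L(w_t),$$
which is exactly multiplicative weight decay. Under Adam the penalty gradient $\lambda w_t$ gets rescaled by the adaptive denominator, whereas decoupled weight decay (AdamW) applies $-\eta\lambda w_t$ outside that rescaling, so the two are no longer equivalent.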
2
votes
0
answers
79
views
Constrained Ridge Regression with Prior Estimates and Multicollinearity
I'm working with a regression problem where I want to explain a dependent variable $Y$ using features $x_1, \ldots, x_n$. The main constraint is that the weights (coefficients) must sum to one:
$$
\...
2
votes
0
answers
59
views
Best model to combine predictors [closed]
I have a few curves that predict the same outcomes; all curves are extremely similar but vary a little in terms of noise and predictions (guessing they have lots of similar variables and some ...
13
votes
2
answers
627
views
Understanding the Saddle Point Intuition in GANs
I was watching a talk by Tom Goldstein about his work on stabilizing GANs with predictions. He used an interesting visualization comparing SGD to adversarial nets. Intuitively, one is looking for the ...
0
votes
0
answers
40
views
Decomposing a Weighted Average of Multiple (96x3) Data Points, Including an Unknown Contribution
I'm working with a dataset where each data point has a shape of (96, 3), with each element being a value between 0 and 1. I have a set of approximately 75 reference data points.
My goal is to take a ...
0
votes
0
answers
36
views
Score Matching Algorithm
I've been reading about score matching and I have a very basic question about how one would (naively) implement the algorithm via gradient descent.
Say I have some sort of neural network that ...
2
votes
1
answer
88
views
In NLLS, how do you produce accurate estimates of RMSE(true_params) given RMSE(global_minimum_params)?
I have an exponential decay
$f(t) = \sum_n \left( A_n e^{-\frac{t}{\tau_n}} \right) + c + \epsilon(t)$,
where n represents the different exponential decay components, $A_n$ represents each decay ...
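A standard bridge between the residuals at the global minimum and the parameter error (stated here as general background, not specific to this data) is the Gauss–Newton approximation
$$\operatorname{Cov}(\hat\theta) \approx \hat\sigma^2\,(J^\top J)^{-1},$$
where $J$ is the Jacobian of the model at $\hat\theta$ and $\hat\sigma^2$ is the residual variance; square roots of its diagonal give per-parameter RMSE-type estimates, though they can be optimistic for ill-conditioned multi-exponential fits.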
1
vote
0
answers
152
views
Runtime complexity of Wasserstein distance
Background. Given two samples of size $n$, where each data point has feature dimension $d$, the goal is to compute the Wasserstein-1 distance between the two samples.
Question. What is the runtime ...
0
votes
0
answers
27
views
How to Achieve Consistent Parameter Fitting Across Different Objects in a Nonlinear Regression Model?
Problem Description:
I am using a black-box nonlinear regression model to fit parameters based on measurement data. These measurements are taken from various physically distinct objects, but I expect ...
3
votes
0
answers
145
views
Why do skip connections cause drastically smoother loss landscapes in neural networks?
I'm reading the paper "Visualizing the Loss Landscape of Neural Nets" by Hao Li et al. In this paper, the authors visualize loss landscapes of neural networks using filter-wise normalized ...
0
votes
0
answers
29
views
Smoothing parameters
I am doing a case study and I need to forecast sales. I am using multiple linear regression as well as Winters' method and the decomposition approach with Holt's method. I am using these methods as ...
0
votes
0
answers
53
views
Nesterov Accelerated Gradient Descent Stalling with High Regularization in Extreme Learning Machine
I'm implementing Nesterov Accelerated Gradient Descent (NAG) on an Extreme Learning Machine (ELM) with one hidden layer. My loss function is the Mean Squared Error (MSE) with $L^2$ regularization.
The ...
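For concreteness, here is a minimal NAG loop on a ridge-regularized least-squares objective, which is what the linear output layer of an ELM amounts to (an illustrative sketch with made-up data, not the asker's code):

```python
# Nesterov accelerated gradient on (1/n)*||H w - y||^2 + lam*||w||^2.
import numpy as np

rng = np.random.default_rng(1)
H = rng.standard_normal((200, 50))   # hidden-layer activations (fixed in an ELM)
y = rng.standard_normal(200)
lam = 10.0                           # strong L2 regularization

def grad(w):
    # Gradient of the regularized MSE objective.
    n = H.shape[0]
    return (2.0 / n) * H.T @ (H @ w - y) + 2.0 * lam * w

# With a large lam the curvature (Lipschitz constant of the gradient) grows,
# so the step size must shrink accordingly; a step that worked for small lam
# can make the iterates stall or oscillate.
L = 2.0 * (np.linalg.eigvalsh(H.T @ H / H.shape[0]).max() + lam)
eta, mu = 1.0 / L, 0.9
w, v = np.zeros(50), np.zeros(50)
for _ in range(500):
    g = grad(w + mu * v)             # Nesterov look-ahead gradient
    v = mu * v - eta * g
    w = w + v
```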
0
votes
0
answers
43
views
Second Moment (Uncentered Variance) Estimate of Gradient
I am reading Kingma and Lei Ba's paper introducing the Adam optimizer. I was looking over some derivations for the second moment estimate:
I noticed that they find the sum of a finite geometric ...
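The step being referred to is the standard bias-correction argument (paraphrased; see the paper for the exact statement): unrolling $v_t = \beta_2 v_{t-1} + (1-\beta_2)g_t^2$ with $v_0 = 0$ gives
$$v_t = (1-\beta_2)\sum_{i=1}^{t} \beta_2^{\,t-i} g_i^2,$$
and if $\mathbb{E}[g_i^2]$ is approximately stationary, the finite geometric sum yields
$$\mathbb{E}[v_t] \approx \mathbb{E}[g_t^2]\,(1-\beta_2)\sum_{i=1}^{t}\beta_2^{\,t-i} = \mathbb{E}[g_t^2]\left(1-\beta_2^{\,t}\right),$$
which is why the raw estimate is divided by $1-\beta_2^{\,t}$.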
3
votes
1
answer
80
views
Do deep learning frameworks "look ahead" when calculating gradient in Nesterov optimization?
The whole point behind Nesterov optimization is to calculate the gradient not at the current parameter values $\theta_t$, but at $\theta_t + \beta m$, where $\beta$ is the momentum coefficient and $m$ ...
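For reference, the classical look-ahead form and the equivalent reformulation most frameworks implement (a standard change of variables, sketched here) are
$$v_{t+1} = \mu v_t - \eta\,\nabla L(\theta_t + \mu v_t), \qquad \theta_{t+1} = \theta_t + v_{t+1},$$
and, writing $\tilde\theta_t = \theta_t + \mu v_t$,
$$v_{t+1} = \mu v_t - \eta\,\nabla L(\tilde\theta_t), \qquad \tilde\theta_{t+1} = \tilde\theta_t + \mu v_{t+1} - \eta\,\nabla L(\tilde\theta_t),$$
so the gradient is only ever evaluated at the stored parameters $\tilde\theta_t$ and no explicit look-ahead step is needed.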
3
votes
2
answers
105
views
Solve Least Square Problem of a Sum of $N$ Quadratic Forms with a Positive Vector
Suppose we are given a list of $N$ positive definite quadratic forms $X^TQ_k X$ (where $k\in[1,N]$ and
$Q_k\in\mathbb{R}^{p\times p}$ $\forall k$), and a positive vector $V$ of the same length $N$, i.e. $V=...
0
votes
1
answer
95
views
How to get a smaller optimal number of clusters $K$ in K-means clustering
I want to obtain a small optimal value of $k$ (with $k ≤ 5$) for k-means clustering on a dataset of size $5000$. I have used the BIC and the Gap statistic to determine the optimal number of clusters, ...
0
votes
0
answers
28
views
Is it possible to combine sub-optimization problems into one optimization problem?
Considering a vector of decision variables $w\in\Re^{n\times 1}$, such that $w = \begin{bmatrix}w_1 & w_2 & \cdots & w_n\end{bmatrix}^\top$, which can be determined recursively by solving $...
1
vote
0
answers
37
views
Genetic Algorithm Multi Objective Clustering using distance and variance [closed]
I am trying to cluster using PyGAD by minimizing the Euclidean distance (vanilla kmeans) for 2d points with the added objective that inter-cluster variance of a third feature, a weight, should be ...
0
votes
0
answers
55
views
If the main benefit of BatchNorm is loss landscape smoothing, why do we use z-score normalisation instead of min-max?
According to recent papers, the main reason why BatchNorm works is because it smooths the loss landscape. So if the main benefit is loss landscape smoothing, why do we need mean subtraction at all? ...
1
vote
0
answers
69
views
Fix the random effects estimates in nlme/lme4
I'd like to write a function that incorporates the idea of this paper in R. It's about calculating local effect sizes for mixed models. More specifically: calculate $f^2$ for each of the model's ...
3
votes
2
answers
224
views
Is it possible to aggregate AIC/BIC values for participant-level model comparisons?
I have a dataset consisting of emotional time series data derived from hundreds of participants, who each took part in an ecological momentary assessment (EMA) study. Since each participant has ...
0
votes
0
answers
50
views
Optimal Importance Sampling
Suppose we want to estimate
$$r = \mathbb{E}_{x\sim p(x)} [f(x)]$$ via importance sampling, i.e.
$$r = \mathbb{E}_{x\sim q(x)} \left[\frac{f(x)p(x)}{q(x)}\right]$$
Now Wikipedia says that ...
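For context, the standard result usually quoted here is that the variance-minimizing proposal is
$$q^*(x) = \frac{|f(x)|\,p(x)}{\int |f(x')|\,p(x')\,\mathrm{d}x'},$$
which attains zero variance when $f$ does not change sign, since every weighted sample then equals $r$ exactly.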
0
votes
0
answers
50
views
Propagation of errors not being invariant
I noticed that for a simple polynomial, a straight line or a parabola for example, the propagation of errors when finding the roots is not translation-invariant.
For example, with a line:
$$f(x) = a ...
0
votes
0
answers
68
views
Optimizing Function With Measurement Error in R (Simulated Method of Moments)
A common problem in statistics is to assume a population, simulate many samples, and find parameters that most closely match (in an MSE sense) a desired set of statistics. For illustration, here is a ...
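A minimal Python sketch of that recipe (illustrative only; the question's own R example is cut off above): keep the random draws fixed (common random numbers) and minimize the squared distance between simulated and target moments.

```python
# Simulated method of moments, minimal sketch: recover the mean and sd of a
# normal population by matching simulated moments to target moments.
# Everything here (targets, seed, sample sizes) is illustrative.
import numpy as np
from scipy.optimize import minimize

target = np.array([1.5, 2.0])                                  # target (mean, sd)
draws = np.random.default_rng(0).standard_normal(10_000)       # fixed draws = common random numbers

def objective(params):
    mu, sigma = params
    sim = mu + sigma * draws        # reuse the same draws so the objective is smooth in the parameters
    sim_moments = np.array([sim.mean(), sim.std()])
    return np.sum((sim_moments - target) ** 2)

res = minimize(objective, x0=[0.0, 1.0], method="Nelder-Mead")
print(res.x)  # approximately [1.5, 2.0]
```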