Questions tagged [optimization]
Use this tag for any use of optimization within statistics.
2,833 questions
6
votes
1
answer
149
views
Why do “good” loss functions in ML need both Lipschitz continuity and smoothness?
I’m trying to understand the common assumptions in machine-learning optimization theory, where a “well-behaved” loss function is often required to be both L-Lipschitz and β-smooth (i.e., have β-...
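For context, the standard definitions being assumed here (paraphrased, not part of the question): a loss $f$ is $L$-Lipschitz if
$$|f(x) - f(y)| \le L\,\lVert x - y\rVert \quad \text{for all } x, y,$$
and $\beta$-smooth if its gradient is $\beta$-Lipschitz,
$$\lVert \nabla f(x) - \nabla f(y)\rVert \le \beta\,\lVert x - y\rVert \quad \text{for all } x, y.$$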
1
vote
0
answers
30
views
Is the strong duality of the hard-margin SVM really trivially satisfied all the time?
It is widely known that if you were to calculate the maximizer of the dual SVM program (denoted $\alpha^*$), then the primal minimizer of the hard-margin SVM program,
\begin{aligned}&{\underset {...
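For context, the standard hard-margin primal being referred to (stated here for reference, since the excerpt above is cut off) is
$$\min_{w,b}\ \tfrac{1}{2}\lVert w\rVert^2 \quad \text{s.t.}\quad y_i\,(w^\top x_i + b) \ge 1,\quad i=1,\dots,m,$$
with one dual variable $\alpha_i \ge 0$ per margin constraint.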
0
votes
0
answers
9
views
Are there any other powerful optimization tools available besides the ABC and PSO algorithms? [duplicate]
What other optimization tools are powerful enough to improve the accuracy of a neural network model? Please give me recent tools that are powerful
3
votes
1
answer
68
views
Efficient minimization of minimax objective function involving piecewise linear functions
Given an empirical cdf $\hat{F}$ with support on $[0,1]$, I am interested in finding the histogram with $B$ (unequal) bins with cdf $F_B$ that minimizes the maximum absolute deviation between the cdfs....
2
votes
0
answers
42
views
What's the distribution of the Lagrange multipliers found by quadratic programming? [closed]
I am trying to figure out how to infer $C$ in a support vector machine.
$C$ is the upper bound on the magnitude of the Lagrange multipliers. These multipliers are not independent; they are probably mutually ...
1
vote
0
answers
60
views
Impact of Full Probability Distribution in GP Regression on Optimisation
In the context of an engineering design project that requires determining optimal design configurations (e.g., finding optimal design configurations of a nozzle that maximise thrust ratio and discharge ...
10
votes
3
answers
753
views
Maximizing profit given a PMF
A grocery store has $n$ watermelons to sell and makes $\$ 1.00$ on each sale. Say the number of consumers of these watermelons is a random variable with a distribution that can be approximated by
$$f(...
4
votes
1
answer
147
views
How do you maintain orthonormality during optimization?
I am trying to iteratively optimize a set of vectors $\{w_1, w_2, ..., w_n\}$ such that the following holds:
$$
w_r =
\begin{cases}
\underset{w}{\arg\min} \; \sum_x \left\lVert (x^\top w) w - x \...
2
votes
1
answer
124
views
A question about minimizing $l_2$ norm with regularization
PREMISES: this question likely arises from my very basic knowledge of the field. Please be very detailed in the answer, even if some facts may seem trivial. Also, sorry for my poor English.
...
14
votes
2
answers
441
views
How can I estimate a function from its level sets?
I am developing an app. Let $f:X\subseteq \mathbb{R}^n \rightarrow \mathbb{R}$ be a function satisfying some regularity conditions (e.g. continuity and smoothness), and let $2 \leq n \leq 100$.
$f$ ...
2
votes
2
answers
134
views
How can I get scipy.curve_fit to converge on data involving a wrapping phase angle?
I am trying to fit the phase angle of complex data with a very simple function phi(f) = mf, where m is the gradient and ...
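One common workaround for wrapped phases is to fit in the complex domain, where $e^{i\phi}$ has no $2\pi$ jumps. Below is a minimal sketch of that idea (not the asker's code; the model $\phi(f)=mf$ and all names are illustrative):

```python
# Minimal sketch: fit a wrapped phase phi(f) = m*f by matching exp(i*m*f)
# to exp(i*phi_data), which removes the artificial jumps at +/- pi.
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(0)
f = np.linspace(0.0, 10.0, 200)
m_true = 2.3
phase = np.angle(np.exp(1j * (m_true * f + 0.05 * rng.standard_normal(f.size))))  # wrapped data

def model(f, m):
    # Stack real and imaginary parts of exp(i*m*f) so curve_fit sees a smooth target.
    z = np.exp(1j * m * f)
    return np.concatenate([z.real, z.imag])

z = np.exp(1j * phase)
y = np.concatenate([z.real, z.imag])
# The complex-domain objective is oscillatory in m, so a sensible initial guess still matters.
m_hat, _ = curve_fit(model, f, y, p0=[2.0])
print(m_hat)  # close to m_true
```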
7
votes
1
answer
339
views
What are "conditional modes"?
The start parameter of the glmmTMB function takes a list.
Some possible components of that list are:
...
1
vote
0
answers
61
views
Best approaches for multiple root finding when functions are not differentiable
I have a problem similar to one I posted about recently but sufficiently different to warrant its own discussion I think.
I have k functions, each of the same k-dimensional vector x, and I want to ...
0
votes
0
answers
69
views
Proving Convergence of Mean and Variance in a Recursive Gaussian Update Process
I'm researching the statistical convergence properties of a recursive system that arises during the training of a custom neural network structure.
My specific question is: How can I prove convergence of ...
0
votes
0
answers
34
views
Uniqueness of solution for bias vector in policy evaluation
I have a two-dimensional state-space MDP with state space $(s,i)$, where $s$ takes values in the natural numbers and $i \in \{1,2\}$. I have written the policy evaluation equation for a policy
$$r-g+(P-...
0
votes
0
answers
45
views
Prediction of optimum variables through XGBoost
I have a large dataset of soil moisture data (satellite) and water table depths (measurements).
I would like to derive the optimum soil moisture levels to predict the water table depths most ...
1
vote
1
answer
111
views
What Constrained Optimization method to use when my objective isn't strictly differentiable
I'm trying to find the vector of parameters x which gets me the optimal reward, subject to a couple of constraints like $f(x)=k$ and $g(x) \geq C $.
I have lower and upper bounds for each component of ...
0
votes
0
answers
51
views
Sampling region about a minimum loss reached after optimisation
I am optimising an objective function w.r.t. some box-like constraints using Adam (Adaptive Moment Estimation). Are there any techniques to help me sample the region around the solution that gives me ...
8
votes
1
answer
174
views
MLE for unrestricted $\theta$ of $N(\theta,\theta^2)$
Given $X_1, ..., X_n \sim N(\theta, \theta^2)$, I'm trying to find the MLE for $\theta$. This is similar to previous posts like this one: MLE of $\theta$ in $N(\theta, \theta^2)$
However, suppose we ...
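For context, the unrestricted case reduces to a quadratic score equation; a standard sketch of the derivation (under the usual i.i.d. assumptions, not taken from the linked posts):
$$\ell(\theta) = -n\log|\theta| - \frac{1}{2\theta^2}\sum_i (x_i-\theta)^2 + \text{const},\qquad \frac{\partial\ell}{\partial\theta} = -\frac{n}{\theta} + \frac{\sum_i x_i^2}{\theta^3} - \frac{\sum_i x_i}{\theta^2} = 0,$$
which gives $n\theta^2 + \left(\sum_i x_i\right)\theta - \sum_i x_i^2 = 0$, i.e.
$$\hat\theta = \frac{-\bar{x} \pm \sqrt{\bar{x}^2 + 4\,\overline{x^2}}}{2}, \qquad \overline{x^2} = \frac{1}{n}\sum_i x_i^2,$$
and the root with the larger likelihood is the MLE.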
2
votes
0
answers
36
views
Finding a subset with target mean and covariance
I have a large set of data, and I'm looking for a subset with certain properties. The whole set is made up of $N$ vectors in $\mathbb R^n$, and I have a target mean vector $\overrightarrow \mu$ and ...
0
votes
0
answers
58
views
Convexity of loss function in model fitting without known data
I am a bit confused about the concept of convexity analysis when doing model fitting. Say I have developed some model with two parameters, $f(x;\theta_1,\theta_2)$, that I plan to fit to some data I ...
1
vote
1
answer
46
views
FISTA Optimizer Implementation for Neural Networks with Sparse Regularization
I'm implementing a FISTA (Fast Iterative Shrinkage-Thresholding Algorithm) optimizer in PyTorch for training neural networks with sparse regularization. My implementation doesn't seem to be working as ...
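Stripped of the PyTorch machinery, the algorithm itself is short; below is a generic FISTA sketch for an $\ell_1$-regularized least-squares problem (illustrative only, not the asker's optimizer; all names are made up):

```python
# FISTA for min_w 0.5*||A w - b||^2 + lam*||w||_1.
# Key pieces: the proximal (soft-threshold) step and the momentum sequence t_k.
import numpy as np

def soft_threshold(x, thresh):
    return np.sign(x) * np.maximum(np.abs(x) - thresh, 0.0)

def fista(A, b, lam, n_iter=500):
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the smooth part
    step = 1.0 / L
    w = np.zeros(A.shape[1])
    y, t = w.copy(), 1.0
    for _ in range(n_iter):
        grad = A.T @ (A @ y - b)           # gradient of the smooth term at the look-ahead point
        w_new = soft_threshold(y - step * grad, lam * step)
        t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        y = w_new + ((t - 1.0) / t_new) * (w_new - w)
        w, t = w_new, t_new
    return w

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 20))
b = A @ (rng.standard_normal(20) * (rng.random(20) < 0.3)) + 0.01 * rng.standard_normal(100)
w_hat = fista(A, b, lam=0.1)
```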
1
vote
0
answers
48
views
Portfolio optimisation for 2 shares - What are some recommended metrics to use?
I want to maximize the total number of shares of either A or B, by reallocating shares daily. For simplicity, the trades occur at each day’s closing prices. I'm basically determining the "optimal ...
0
votes
0
answers
50
views
Questions about calculating uncertainty and correlation matrix of model parameters from optimization
I am running a nonlinear earth system model to optimize 42 parameters p with 7 different kinds of observations $O_j$ where ...
2
votes
1
answer
160
views
How many folds should an unnested CV have compared to a nested CV
I read in the mlr3 book about nested resampling that:
Nested resampling is a method to compare models and to estimate the generalization
performance of a tuned model, however, this is the performance ...
4
votes
0
answers
115
views
Difference between weight decay and L2 regularization
I'm reading Ilya Loshchilov's work on decoupled weight decay and regularization. The big takeaway seems to be that weight decay and $L^2$ norm regularization are the same for SGD but they are ...
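The equivalence in question is easiest to see from the update rules (a standard summary, not a quote from the paper): with plain SGD, learning rate $\eta$, and penalty $\frac{\lambda}{2}\lVert w\rVert^2$ added to the loss,
$$w_{t+1} = w_t - \eta\left(\nabla L(w_t) + \lambda w_t\right) = (1-\eta\lambda)\,w_t - \eta\,\nabla L(w_t),$$
which is exactly multiplicative weight decay. Under Adam the penalty gradient $\lambda w_t$ gets rescaled by the adaptive denominator, whereas decoupled weight decay (AdamW) applies $-\eta\lambda w_t$ outside that rescaling, so the two are no longer equivalent.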
2
votes
0
answers
79
views
Constrained Ridge Regression with Prior Estimates and Multicollinearity
I'm working with a regression problem where I want to explain a dependent variable $Y$ using features $x_1, \ldots, x_n$. The main constraint is that the weights (coefficients) must sum to one:
$$
\...
2
votes
0
answers
59
views
Best model to combine predictors [closed]
I have a few curves that predict the same outcomes; all curves are extremely similar but vary a little in terms of noise and predictions (guessing they have lots of similar variables and some ...
13
votes
2
answers
627
views
Understanding the Saddle Point Intuition in GANs
I was watching a talk by Tom Goldstein about his work on stabilizing GANs with predictions. He used an interesting visualization comparing SGD to adversarial nets. Intuitively, one is looking for the ...
0
votes
0
answers
40
views
Decomposing a Weighted Average of Multiple (96x3) Data Points, Including an Unknown Contribution
I'm working with a dataset where each data point has a shape of (96, 3), with each element being a value between 0 and 1. I have a set of approximately 75 reference data points.
My goal is to take a ...
0
votes
0
answers
36
views
Score Matching Algorithm
I've been reading about score matching and I have a very basic question about how one would (naively) implement the algorithm via gradient descent.
Say I have some sort of neural network that ...
2
votes
1
answer
88
views
In NLLS, how do you produce accurate estimates of RMSE(true_params) given RMSE(global_minimum_params)?
I have an exponential decay
$f(t) = \sum_n \left( A_n e^{-\frac{t}{\tau_n}} \right) + c + \epsilon(t)$,
where n represents the different exponential decay components, $A_n$ represents each decay ...
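A standard bridge between the residuals at the global minimum and the parameter error (stated here as general background, not specific to this data) is the Gauss–Newton approximation
$$\operatorname{Cov}(\hat\theta) \approx \hat\sigma^2\,(J^\top J)^{-1},$$
where $J$ is the Jacobian of the model at $\hat\theta$ and $\hat\sigma^2$ is the residual variance; square roots of its diagonal give per-parameter RMSE-type estimates, though they can be optimistic for ill-conditioned multi-exponential fits.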
1
vote
0
answers
152
views
Runtime complexity of Wasserstein distance
Background. Given two samples of size $n$, where each data point has feature dimension $d$, the goal is to compute the Wasserstein-1 distance between the two samples.
Question. What is the runtime ...
0
votes
0
answers
27
views
How to Achieve Consistent Parameter Fitting Across Different Objects in a Nonlinear Regression Model?
Problem Description:
I am using a black-box nonlinear regression model to fit parameters based on measurement data. These measurements are taken from various physically distinct objects, but I expect ...
3
votes
0
answers
145
views
Why do skip connections cause drastically smoother loss landscapes in neural networks?
I'm reading the paper "Visualizing the Loss Landscape of Neural Nets" by Hao Li et al. In this paper, the authors visualize loss landscapes of neural networks using filter-wise normalized ...
0
votes
0
answers
29
views
Smoothing parameters
I am doing a case study and I need to forecast sales. I am using multiple linear regression as well as Winters' method and the decomposition approach with Holt's method. I am using these methods as ...
0
votes
0
answers
53
views
Nesterov Accelerated Gradient Descent Stalling with High Regularization in Extreme Learning Machine
I'm implementing Nesterov Accelerated Gradient Descent (NAG) on an Extreme Learning Machine (ELM) with one hidden layer. My loss function is the Mean Squared Error (MSE) with $L^2$ regularization.
The ...
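For concreteness, here is a minimal NAG loop on a ridge-regularized least-squares objective, which is what the linear output layer of an ELM amounts to (an illustrative sketch with made-up data, not the asker's code):

```python
# Nesterov accelerated gradient on (1/n)*||H w - y||^2 + lam*||w||^2.
import numpy as np

rng = np.random.default_rng(1)
H = rng.standard_normal((200, 50))   # hidden-layer activations (fixed in an ELM)
y = rng.standard_normal(200)
lam = 10.0                           # strong L2 regularization

def grad(w):
    # Gradient of the regularized MSE objective.
    n = H.shape[0]
    return (2.0 / n) * H.T @ (H @ w - y) + 2.0 * lam * w

# With a large lam the curvature (Lipschitz constant of the gradient) grows,
# so the step size must shrink accordingly; a step that worked for small lam
# can make the iterates stall or oscillate.
L = 2.0 * (np.linalg.eigvalsh(H.T @ H / H.shape[0]).max() + lam)
eta, mu = 1.0 / L, 0.9
w, v = np.zeros(50), np.zeros(50)
for _ in range(500):
    g = grad(w + mu * v)             # Nesterov look-ahead gradient
    v = mu * v - eta * g
    w = w + v
```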
0
votes
0
answers
43
views
Second Moment (Uncentered Variance) Estimate of Gradient
I am reading Kingma and Lei Ba's paper introducing the Adam optimizer. I was looking over some derivations for the second moment estimate:
I noticed that they find the sum of a finite geometric ...
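The step being referred to is the standard bias-correction argument (paraphrased; see the paper for the exact statement): unrolling $v_t = \beta_2 v_{t-1} + (1-\beta_2)g_t^2$ with $v_0 = 0$ gives
$$v_t = (1-\beta_2)\sum_{i=1}^{t} \beta_2^{\,t-i} g_i^2,$$
and if $\mathbb{E}[g_i^2]$ is approximately stationary, the finite geometric sum yields
$$\mathbb{E}[v_t] \approx \mathbb{E}[g_t^2]\,(1-\beta_2)\sum_{i=1}^{t}\beta_2^{\,t-i} = \mathbb{E}[g_t^2]\left(1-\beta_2^{\,t}\right),$$
which is why the raw estimate is divided by $1-\beta_2^{\,t}$.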
3
votes
1
answer
80
views
Do deep learning frameworks "look ahead" when calculating gradient in Nesterov optimization?
The whole point behind Nesterov optimization is to calculate the gradient not at the current parameter values $\theta_t$, but at $\theta_t + \beta m$, where $\beta$ is the momentum coefficient and $m$ ...
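For reference, the classical look-ahead form and the equivalent reformulation most frameworks implement (a standard change of variables, sketched here) are
$$v_{t+1} = \mu v_t - \eta\,\nabla L(\theta_t + \mu v_t), \qquad \theta_{t+1} = \theta_t + v_{t+1},$$
and, writing $\tilde\theta_t = \theta_t + \mu v_t$,
$$v_{t+1} = \mu v_t - \eta\,\nabla L(\tilde\theta_t), \qquad \tilde\theta_{t+1} = \tilde\theta_t + \mu v_{t+1} - \eta\,\nabla L(\tilde\theta_t),$$
so the gradient is only ever evaluated at the stored parameters $\tilde\theta_t$ and no explicit look-ahead step is needed.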
3
votes
2
answers
105
views
Solve Least Square Problem of a Sum of $N$ Quadratic Forms with a Positive Vector
Suppose we are given a list of $N$ positive definite quadratic forms $X^TQ_k X$ (where $k\in[1,N]$ and
$Q_k\in\mathbb{R}^{p\times p}$ $\forall k$), and a positive vector $V$ of the same length $N$, i.e. $V=...
0
votes
1
answer
95
views
How to get a smaller optimal number of clusters $K$ in K-means clustering
I want to obtain a small optimal value of $k$ (with $k ≤ 5$) for k-means clustering on a dataset of size $5000$. I have used the BIC and the Gap statistic to determine the optimal number of clusters, ...
0
votes
0
answers
28
views
Is it possible to combine sub-optimization problems into one optimization problem?
Considering a vector of decision variables $w\in\Re^{n\times 1}$, such that $w = \begin{bmatrix}w_1 & w_2 & \cdots & w_n\end{bmatrix}^\top$, which can be determined recursively by solving $...
1
vote
0
answers
37
views
Genetic Algorithm Multi Objective Clustering using distance and variance [closed]
I am trying to cluster using PyGAD by minimizing the Euclidean distance (vanilla kmeans) for 2d points with the added objective that inter-cluster variance of a third feature, a weight, should be ...
0
votes
0
answers
55
views
If the main benefit of BatchNorm is loss landscape smoothing, why do we use z-score normalisation instead of min-max?
According to recent papers, the main reason why BatchNorm works is because it smooths the loss landscape. So if the main benefit is loss landscape smoothing, why do we need mean subtraction at all? ...
1
vote
0
answers
69
views
Fix the random effects estimates in nlme/lme4
I'd like to write a function that incorporates the idea of this paper in R. It's about calculating local effect sizes for mixed models. More specifically: calculate $f^2$ for each of the model's ...
3
votes
2
answers
224
views
Is it possible to aggregate AIC/BIC values for participant-level model comparisons?
I have a dataset consisting of emotional time series data derived from hundreds of participants, who each took part in an ecological momentary assessment (EMA) study. Since each participant has ...
0
votes
0
answers
50
views
Optimal Importance Sampling
Suppose we want to estimate
$$r = \mathbb{E}_{x\sim p(x)} [f(x)]$$ via importance sampling, i.e.
$$r = \mathbb{E}_{x\sim q(x)} \left[\frac{f(x)p(x)}{q(x)}\right]$$
Now Wikipedia says that ...
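For context, the standard result usually quoted here is that the variance-minimizing proposal is
$$q^*(x) = \frac{|f(x)|\,p(x)}{\int |f(x')|\,p(x')\,\mathrm{d}x'},$$
which attains zero variance when $f$ does not change sign, since every weighted sample then equals $r$ exactly.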
0
votes
0
answers
50
views
Propagation of errors not being invariant
I noticed that for a simple polynomial, a straight line or a parabola for example, the propagation of errors when finding the roots is not translation-invariant.
For example, with a line:
$$f(x) = a ...
0
votes
0
answers
68
views
Optimizing Function With Measurement Error in R (Simulated Method of Moments)
A common problem in statistics is to assume a population, simulate many samples, and find parameters that most closely match (in an MSE sense) a desired set of statistics. For illustration, here is a ...
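A minimal Python sketch of that recipe (illustrative only; the question's own R example is cut off above): keep the random draws fixed (common random numbers) and minimize the squared distance between simulated and target moments.

```python
# Simulated method of moments, minimal sketch: recover the mean and sd of a
# normal population by matching simulated moments to target moments.
# Everything here (targets, seed, sample sizes) is illustrative.
import numpy as np
from scipy.optimize import minimize

target = np.array([1.5, 2.0])                                  # target (mean, sd)
draws = np.random.default_rng(0).standard_normal(10_000)       # fixed draws = common random numbers

def objective(params):
    mu, sigma = params
    sim = mu + sigma * draws        # reuse the same draws so the objective is smooth in the parameters
    sim_moments = np.array([sim.mean(), sim.std()])
    return np.sum((sim_moments - target) ** 2)

res = minimize(objective, x0=[0.0, 1.0], method="Nelder-Mead")
print(res.x)  # approximately [1.5, 2.0]
```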