
Questions tagged [optimization]

Use this tag for any use of optimization within statistics.

6 votes
1 answer
149 views

I’m trying to understand the common assumptions in machine-learning optimization theory, where a “well-behaved” loss function is often required to be both L-Lipschitz and β-smooth (i.e., have β-...
Antonios Sarikas
1 vote
0 answers
30 views

It is widely known that if you were to calculate the maximizer of the dual SVM program (denoted $\alpha^*$), then the primal minimizer of the hard-margin SVM program, \begin{aligned}&{\underset {...
Your neighbor Todorovich
0 votes
0 answers
9 views

What other optimization tools are powerful enough to improve the accuracy of a neural network model? Please suggest recent, powerful tools.
bbadyalina
3 votes
1 answer
68 views

Given an empirical cdf $\hat{F}$ with support on $[0,1]$, I am interested in finding the histogram with $B$ (unequal) bins with cdf $F_B$ that minimizes the maximum absolute deviation between the cdfs....
Leland Stirner
2 votes
0 answers
42 views

I am trying to figure out how to infer $C$ in a support vector machine. $C$ is the upper bound on the magnitude of the Lagrange multipliers. These multipliers are not independent. They are probably mutually ...
Coo (reputation 121)
1 vote
0 answers
60 views

In the context of an engineering design project that requires determining optimal design configurations (e.g., finding optimal design configurations of a nozzle that maximise thrust ratio and discharge ...
xminx (reputation 11)
10 votes
3 answers
753 views

A grocery store has $n$ watermelons to sell and makes $\$ 1.00$ on each sale. Say the number of consumers of these watermelons is a random variable with a distribution that can be approximated by $$f(...
Epsilon_45
4 votes
1 answer
147 views

I am trying to iteratively optimize a set of vectors $\{w_1, w_2, ..., w_n\}$ such that the following holds: $$ w_r = \begin{cases} \underset{w}{\arg\min} \; \sum_x \left\lVert (x^\top w) w - x \...
Aniruddha (reputation 143)
2 votes
1 answer
124 views

Premise: this question likely stems from my very basic knowledge of the field. Please be very detailed in the answer, even if some facts seem trivial. Also, sorry for my poor English. ...
2by2is2mod2
14 votes
2 answers
441 views

I am developing an app. Let $f:X\subseteq \mathbb{R}^n \rightarrow \mathbb{R}$ be a function satisfying some regularity conditions (e.g. continuity and smoothness), and let $2 \leq n \leq 100$. $f$ ...
Escherichia
2 votes
2 answers
134 views

I am trying to fit the phase angle of complex data with a very simple function $\phi(f) = mf$, where $m$ is the gradient and ...
Danica Scott
7 votes
1 answer
339 views

The `start` parameter of the `glmmTMB` function takes a list. Some possible components of that list are: ...
robertspierre
1 vote
0 answers
61 views

I have a problem similar to one I posted about recently, but sufficiently different to warrant its own discussion, I think. I have $k$ functions, each of the same $k$-dimensional vector $x$, and I want to ...
gazza89 (reputation 2,532)
0 votes
0 answers
69 views

I'm researching the statistical convergence properties of a recursive system that arises during the training of a custom neural network structure. My specific question is: How can I prove convergence of ...
Guillaume
0 votes
0 answers
34 views

I have an MDP with a two-dimensional state space $(s,i)$, where $s$ takes values in the natural numbers and $i \in \{1,2\}$. I have written the policy evaluation equation for a policy $$r-g+(P-...
0 votes
0 answers
45 views

I have a large dataset of soil moisture data (satellite) and water table depths (measurements). I would like to derive the optimum soil moisture levels to predict the water table depths most ...
Thomas (reputation 538)
1 vote
1 answer
111 views

I'm trying to find the vector of parameters x which gets me the optimal reward, subject to a couple of constraints like $f(x)=k$ and $g(x) \geq C $. I have lower and upper bounds for each component of ...
gazza89 (reputation 2,532)
0 votes
0 answers
51 views

I am optimising an objective function subject to some box-like constraints using Adaptive Moment Estimation (Adam). Are there any techniques to help me sample the region around the solution that gives me ...
jercai (reputation 101)
8 votes
1 answer
174 views

Given $X_1, ..., X_n \sim N(\theta, \theta^2)$, I'm trying to find the MLE for $\theta$. This is similar to previous posts like this one: MLE of $\theta$ in $N(\theta, \theta^2)$ However, suppose we ...
djtech (reputation 135)
2 votes
0 answers
36 views

I have a large set of data, and I'm looking for a subset with certain properties. The whole set is made up of $N$ vectors in $\mathbb R^n$, and I have a target mean vector $\overrightarrow \mu$ and ...
0 votes
0 answers
58 views

I am a bit confused about the concept of convexity analysis when doing model fitting. Say I have developed some model with two parameters, $f(x;\theta_1,\theta_2)$, that I plan to fit to some data I ...
Heisenbugs
1 vote
1 answer
46 views

I'm implementing a FISTA (Fast Iterative Shrinkage-Thresholding Algorithm) optimizer in PyTorch for training neural networks with sparse regularization. My implementation doesn't seem to be working as ...
Maxou (reputation 21)
1 vote
0 answers
48 views

I want to maximize the total number of shares of either A or B, by reallocating shares daily. For simplicity, the trades occur at each day’s closing prices. I'm basically determining the "optimal ...
Frankie139
0 votes
0 answers
50 views

I am running a nonlinear earth system model to optimize 42 parameters p with 7 different kinds of observations $O_j$ where ...
Xu Shan (reputation 213)
2 votes
1 answer
160 views

I read in the mlr3 book about nested resampling that: Nested resampling is a method to compare models and to estimate the generalization performance of a tuned model, however, this is the performance ...
ChickenTartR
4 votes
0 answers
115 views

I'm reading Ilya Loshchilov's work on decoupled weight decay and regularization. The big takeaway seems to be that weight decay and $L^2$ norm regularization are the same for SGD but they are ...
Danny Wen (reputation 323)
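For plain (stochastic) gradient descent, the equivalence claimed in the excerpt can be checked numerically. One step with an $L^2$ penalty $\tfrac{\lambda}{2}\lVert w\rVert^2$ gives $w \leftarrow (1-\eta\lambda)w - \eta\nabla f(w)$, which is decoupled weight decay with decay factor $\eta\lambda$. The following is a minimal sketch under stated assumptions: the least-squares loss, the matrix `A`, the vector `b` and all hyperparameter values are hypothetical, and a full-batch gradient stands in for a stochastic one. The per-step identity holds for either kind of gradient.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical least-squares problem used only for illustration.
A = rng.normal(size=(20, 5))
b = rng.normal(size=20)

def grad_f(w):
    """Gradient of the illustrative loss f(w) = 0.5 * ||A w - b||^2."""
    return A.T @ (A @ w - b)

def sgd_l2(w, lr, lam_l2, steps):
    """Gradient descent on f(w) + (lam_l2 / 2) * ||w||^2 (penalty added to the loss)."""
    for _ in range(steps):
        w = w - lr * (grad_f(w) + lam_l2 * w)
    return w

def sgd_decoupled(w, lr, decay, steps):
    """Gradient descent on f(w) with decoupled weight decay: w <- (1 - decay) * w - lr * grad."""
    for _ in range(steps):
        w = (1.0 - decay) * w - lr * grad_f(w)
    return w

w0 = rng.normal(size=5)
lr, lam = 0.01, 0.1
w_a = sgd_l2(w0.copy(), lr, lam, steps=100)
w_b = sgd_decoupled(w0.copy(), lr, decay=lr * lam, steps=100)
print(np.allclose(w_a, w_b))  # True: identical trajectories when decay = lr * lam
```

The rescaling `decay = lr * lam` is what ties the two together. Once the learning rate is adapted per coordinate, as in Adam, that single scalar correspondence no longer exists.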
2 votes
0 answers
79 views

I'm working with a regression problem where I want to explain a dependent variable $Y$ using features $x_1, \ldots, x_n$. The main constraint is that the weights (coefficients) must sum to one: $$ \...
ridge_master
2 votes
0 answers
59 views

I have a few curves that predict the same outcomes. All curves are extremely similar but vary a little in terms of noise and predictions (I'm guessing they have lots of similar variables and some ...
jesal (reputation 21)
13 votes
2 answers
627 views

I was watching a talk by Tom Goldstein about his work on stabilizing GANs with predictions. He used an interesting visualization comparing SGD to adversarial nets. Intuitively, one is looking for the ...
Danny Wen (reputation 323)
0 votes
0 answers
40 views

I'm working with a dataset where each data point has a shape of (96, 3), with each element being a value between 0 and 1. I have a set of approximately 75 reference data points. My goal is to take a ...
molecularPost
0 votes
0 answers
36 views

I've been reading about score matching and I have a very basic question about how one would (naively) implement the algorithm via gradient descent. Say I have some sort of neural network that ...
Vasting (reputation 155)
2 votes
1 answer
88 views

I have an exponential decay $f(t) = \sum_n \left( A_n e^{-\frac{t}{\tau_n}} \right) + c + \epsilon(t)$, where n represents the different exponential decay components, $A_n$ represents each decay ...
Oliver (reputation 21)
1 vote
0 answers
152 views

Background. Given two samples of size $n$, where each data point has feature dimension $d$, the goal is to compute the Wasserstein-1 distance between the two samples. Question. What is the runtime ...
Resu (reputation 355)
0 votes
0 answers
27 views

Problem Description: I am using a black-box nonlinear regression model to fit parameters based on measurement data. These measurements are taken from various physically distinct objects, but I expect ...
dvd8719 (reputation 55)
3 votes
0 answers
145 views

I'm reading the paper "Visualizing the Loss Landscape of Neural Nets" by Hao Li et al. In this paper, the authors visualize loss landscapes of neural networks using filter-wise normalized ...
Danny Wen (reputation 323)
0 votes
0 answers
29 views

I am doing a case study and I need to forecast sales. I am using multiple linear regression, Winters' method, and the decomposition approach with Holt's method. I am using these methods as ...
Forecast
0 votes
0 answers
53 views

I'm implementing Nesterov Accelerated Gradient Descent (NAG) on an Extreme Learning Machine (ELM) with one hidden layer. My loss function is the Mean Squared Error (MSE) with $L^2$ regularization. The ...
Paolo Pedinotti
0 votes
0 answers
43 views

I am reading Kingma and Lei Ba's paper introducing the Adam optimizer. I was looking over some derivations for the second moment estimate: I noticed that they find the sum of a finite geometric ...
maticos
3 votes
1 answer
80 views

The whole point behind Nesterov optimization is to calculate the gradient not at the current parameter values $\theta_t$, but at $\theta_t + \beta m$, where $\beta$ is the momentum coefficient and $m$ ...
Antonios Sarikas
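The look-ahead idea in the excerpt can be written as a short loop. The gradient is evaluated at $\theta_t + \beta m$, and the momentum and parameters are then updated from it. This is a minimal sketch, not the asker's code. It assumes the common convention $m \leftarrow \beta m - \eta\nabla J(\theta + \beta m)$, $\theta \leftarrow \theta + m$, in which $m$ already points downhill. The quadratic objective `H` and the step sizes are hypothetical choices for illustration.

```python
import numpy as np

def nesterov_minimize(grad, theta0, lr=0.1, beta=0.9, steps=200):
    """Nesterov momentum in look-ahead form:
    m <- beta * m - lr * grad(theta + beta * m);  theta <- theta + m
    """
    theta = np.asarray(theta0, dtype=float)
    m = np.zeros_like(theta)
    for _ in range(steps):
        lookahead = theta + beta * m        # gradient is taken at the look-ahead point
        m = beta * m - lr * grad(lookahead)
        theta = theta + m
    return theta

# Hypothetical quadratic J(theta) = 0.5 * theta^T H theta, minimized at the origin.
H = np.diag([1.0, 10.0])
theta_star = nesterov_minimize(lambda th: H @ th, theta0=[1.0, 1.0], lr=0.05)
print(theta_star)  # close to [0, 0]
```

Replacing `lookahead` with `theta` turns this into classical (heavy-ball) momentum. That one line is the whole difference the excerpt describes.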
3 votes
2 answers
105 views

Suppose we are given a list of $N$ positive definite quadratic forms $X^TQ_k X$ (where $k\in[1,N]$ and $Q_k\in\mathbb{R}^{p\times p}$ $\forall k$), and a positive vector $V$ of same length $N$ i.e. $V=...
Ernest F
0 votes
1 answer
95 views

I want to obtain a small optimal value of $k$ (with $k ≤ 5$) for k-means clustering on a dataset of size $5000$. I have used the BIC and the Gap statistic to determine the optimal number of clusters, ...
Aria (reputation 35)
0 votes
0 answers
28 views

Considering a vector of decision variables $w\in\Re^{n\times 1}$, such that $w = \begin{bmatrix}w_1 & w_2 & \cdots & w_n\end{bmatrix}^\top$, which can be determined recursively by solving $...
Stephen Ge
1 vote
0 answers
37 views

I am trying to cluster using PyGAD by minimizing the Euclidean distance (vanilla kmeans) for 2d points with the added objective that inter-cluster variance of a third feature, a weight, should be ...
Peter Hogan
0 votes
0 answers
55 views

According to recent papers, the main reason why BatchNorm works is because it smooths the loss landscape. So if the main benefit is loss landscape smoothing, why do we need mean subtraction at all? ...
FadiBenz
1 vote
0 answers
69 views

I'd like to write a function that incorporates the idea of this paper in R. It's about calculating local effect sizes for mixed models. More specifically: calculate $f^2$ for each of the model's ...
Mathijs (reputation 11)
3 votes
2 answers
224 views

I have a dataset consisting of emotional time series data derived from hundreds of participants, who each took part in an ecological momentary assessment (EMA) study. Since each participant has ...
Piethon (reputation 124)
0 votes
0 answers
50 views

Suppose we want to estimate $$r = \mathbb{E}_{x\sim p(x)} [f(x)]$$ via importance sampling, i.e. $$r = \mathbb{E}_{x\sim q(x)} \left[\frac{f(x)p(x)}{q(x)}\right]$$ Now Wikipedia says that ...
Lazy Guy
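The identity in the excerpt translates directly into a Monte Carlo estimator. Draw $x$ from $q$, weight each draw by $p(x)/q(x)$, and average $f(x)$ times the weight. The following is a minimal sketch with hypothetical choices that are not from the question: $p = N(0, 1)$, $f(x) = x^2$ (so the true value is $r = 1$), and a wider proposal $q = N(0, 2^2)$.

```python
import numpy as np

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) evaluated at x."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def importance_sampling_estimate(f, p_pdf, q_pdf, q_sampler, n, rng):
    """Estimate E_p[f(x)] as the sample mean of f(x) * p(x) / q(x) with x ~ q."""
    x = q_sampler(rng, n)
    weights = p_pdf(x) / q_pdf(x)
    return np.mean(f(x) * weights)

rng = np.random.default_rng(0)
r_hat = importance_sampling_estimate(
    f=lambda x: x ** 2,                                      # hypothetical integrand
    p_pdf=normal_pdf,                                        # target p = N(0, 1)
    q_pdf=lambda x: normal_pdf(x, sigma=2.0),                # proposal q = N(0, 4)
    q_sampler=lambda rng, n: rng.normal(0.0, 2.0, size=n),   # draws from q
    n=100_000,
    rng=rng,
)
print(r_hat)  # approximately 1.0, since E[x^2] = 1 under N(0, 1)
```

The proposal here has heavier tails than $p$, so the weights $p/q$ stay bounded. A proposal narrower than $p$ can make the estimator's variance very large or even infinite.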
0 votes
0 answers
50 views

I noticed that for a simple polynomial, such as a straight line or a parabola, the propagation of errors when finding the roots is not translation-invariant. For example, with a line: $$f(x) = a ...
Euler (reputation 123)
0 votes
0 answers
68 views

A common problem in statistics is to assume a population, simulate many samples, and find parameters that most closely match (in an MSE sense) a desired set of statistics. For illustration, here is a ...
ivo Welch (reputation 161)
