Questions tagged [linear-regression]
Techniques for analyzing the relationship between one (or more) "dependent" variables and "independent" variables.
772 questions
1
vote
0
answers
46
views
MMM model vs Monte Carlo
I was given a project where, using only Net Media Value and possibly audience, I have to try to estimate sales and the unit return on media investment. I was asked to try to apply a Monte ...
7
votes
2
answers
178
views
Why are kernel methods often considered hard to interpret, despite accessible θ coefficients?
I'm trying to understand why kernel methods are frequently regarded as difficult to interpret. To me, in principle, the model’s parameters are accessible.
We are trying to learn $h_{\theta}$ (let's ...
5
votes
1
answer
84
views
Changes over time is significant
I am not sure if this is the right place to ask, but I have two fecundity datasets per year. One for males, the other for females:
To give an excerpt of the data:
Gender  year  number born
M       1990  1
M       ...
1
vote
0
answers
44
views
Regression analysis for histograms
I am working in the field of LIDAR/RADAR and could use your help in exploring certain ideas. I have a certain scenario where I want to map histograms to a certain numerical value (distance of object in ...
5
votes
1
answer
81
views
"Singular values of x" in LinearRegression
LinearRegression has an attribute singular_ which returns "singular values of x". According to a definition I found: "singularity is ... when a ...
2
votes
1
answer
42
views
Categorical variable coefficient one hot encoded
After Linear Regression, one of my categorical variables (gender) got OHE and as a result I have 2 coefficients, for gender_0 and gender_1. How do I stop Orange from OHE that variable so that I only have ...
3
votes
1
answer
112
views
Constant feature ignored by Spark LinearRegression?
I am running a linear regression model using PySpark, and came across the following weird behavior:
When I include a constant feature (representing an intercept term), it is ignored completely by Spark. I....
1
vote
0
answers
52
views
Predicting PGA Tour results with Linear Regression
I have curated a dataset from various online sources that contains information about each PGA player's weekly performance/trends. I'm attempting to predict their finishing positions at the next ...
-1
votes
1
answer
185
views
Why is linear regression not doing so well under walk-forward validation?
I followed from this question1, question2.
I have the following task to do: I have time series data. Train on 3 consecutive days to predict each 4th day. Each day's data represents one CSV ...
3
votes
1
answer
128
views
How to incorporate weights (probability measurements) of data into a mean squared error loss function
I am training a CNN to regress on 4 targets related to a given image. Within the image is a point of interest whose position can be defined by phi, and theta (corresponding to x and y of a normal ...
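One common way to fold per-sample weights into an MSE loss is a weighted average of the squared errors. A minimal sketch in plain Python, assuming each sample carries a non-negative weight (for instance a probability); the function name `weighted_mse` and the example numbers are hypothetical and not taken from the question:

```python
def weighted_mse(y_true, y_pred, weights):
    """Weighted mean squared error: sum(w * err^2) / sum(w)."""
    if not (len(y_true) == len(y_pred) == len(weights)):
        raise ValueError("all inputs must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sq_err = sum(
        w * (t - p) ** 2 for t, p, w in zip(y_true, y_pred, weights)
    )
    return weighted_sq_err / total_weight


# Hypothetical targets and predictions; only the third prediction is off.
y_true = [1.0, 2.0, 3.0]
y_pred = [1.0, 2.0, 5.0]

# Equal weights reduce to the ordinary MSE.
uniform_loss = weighted_mse(y_true, y_pred, [1.0, 1.0, 1.0])

# A zero weight removes the third sample from the loss entirely.
downweighted_loss = weighted_mse(y_true, y_pred, [1.0, 1.0, 0.0])
```

Dividing by the sum of the weights keeps the loss on the same scale as an unweighted MSE; whether that normalization is wanted depends on the training setup.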
0
votes
1
answer
157
views
Finding importance of features on a target variable
I have a dataset containing features and a target variable, all of which are numeric values. I wanted to see which variables influence the target variable in what way, if at all, and thought a ...
1
vote
1
answer
50
views
Can I use percentages to determine the influence of one variable on a dependent variable?
I have four independent variables and want to analyze their influence on one dependent variable. One of the independent variables is coded in percentage. How can I determine its influence on the dependent ...
0
votes
0
answers
62
views
Linear Regression with coefficients coming from LightGBM
I was wondering if anyone has tried to use a LightGBM to estimate the alpha and beta of a linear regression model. I am looking into this because I am seeking an interpretable model. A direct lgbm ...
4
votes
1
answer
656
views
Why is linear regression doing well on time series data?
I followed from this question.
I have the following task to do: I have time series data. Train on 3 consecutive days to predict each 4th day. Each day's data represents one CSV file which ...
3
votes
1
answer
80
views
When I use linear regression in machine learning, is variable selection the same as choosing tuning parameters?
I am a newbie in machine learning. After days of studying the ideas of machine learning, I have made some conclusions, which are below (I only consider supervised learning).
Step 1: Data splitting
...
3
votes
1
answer
290
views
Regression model outperforms every other model
I followed from this question.
Case1:
I have the following task: Train for consecutive 3 days to
predict each fourth day. Each day's data represents one CSV file,
which has dimensions 24x25. Each ...
7
votes
1
answer
336
views
When do regression models outperform the naive method?
I followed from this question.
Case1:
I have the following task to do: Train on 3 consecutive days to predict each 4th day. Each day's data represents one CSV file which has dimension 24x25. ...
2
votes
1
answer
224
views
Is the dataset fit for Linear and Logistic Regression
I am trying to check the correlation in a red wine quality dataset via a scatter plot, but it just doesn't seem to be linear.
I have applied the preprocessing steps below:
Standard Scaler ...
1
vote
1
answer
414
views
What exactly does "overfitting" mean in linear regression?
I was trying to understand the overfitting concept. So I know that when the training R^2 is greater than 95% it means the model is overfitted and after doing some ...
1
vote
1
answer
62
views
Multivariate linear regression via scikit and statsmodels
I want to preface this with terminology: multivariate regression deals with the case where there is more than one dependent variable, while multiple regression deals with the case where there is ...
2
votes
2
answers
59
views
Understanding the Role of Dummy Variables in Categorical Regression Models
In a categorical regression model with $k$ categories, we use $k-1$ dummy variables. I understand that the $k$-th dummy variable is redundant because the information from the first $k-1$ dummies is ...
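The redundancy of the $k$-th dummy can be seen directly on a toy encoding. A minimal sketch in plain Python, assuming three made-up category levels (the names `categories` and `samples` are illustrative only):

```python
categories = ["red", "green", "blue"]            # k = 3 hypothetical levels
samples = ["red", "blue", "green", "blue", "red"]

# Full one-hot encoding: one column per category (k columns).
full_one_hot = [[1 if s == c else 0 for c in categories] for s in samples]

# Every row sums to 1, so the k dummies add up to the intercept column.
# Together with an intercept, the k dummies are perfectly collinear.
row_sums = [sum(row) for row in full_one_hot]

# Dropping the last column loses nothing: the dropped dummy equals
# 1 minus the sum of the remaining k-1 dummies.
reduced = [row[:-1] for row in full_one_hot]
recovered_last = [1 - sum(row) for row in reduced]
```

With the reduced encoding, the dropped level acts as the baseline, and each remaining coefficient is read as a difference from that baseline.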
1
vote
1
answer
64
views
Interpreting the variance of parameter estimates in linear regression
I am reading through ESL and came across equation (3.6), where the variance of the parameter estimates is given as $$\mathrm{Var}(\hat{\beta}) = (X^TX)^{-1}\sigma^2$$
I can understand the ...
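For a single predictor plus an intercept, the slope entry of $(X^TX)^{-1}\sigma^2$ reduces to the familiar $\sigma^2 / \sum_i (x_i - \bar{x})^2$. A small numerical check in plain Python, assuming made-up predictor values and an assumed noise variance (`x` and `sigma2` are hypothetical):

```python
x = [1.0, 2.0, 4.0, 7.0]   # hypothetical predictor values
sigma2 = 0.5               # assumed noise variance

n = len(x)
sum_x = sum(x)
sum_x2 = sum(v * v for v in x)

# X has columns [1, x], so X^T X = [[n, sum_x], [sum_x, sum_x2]].
# Invert the 2x2 matrix in closed form.
det = n * sum_x2 - sum_x ** 2
xtx_inv = [[sum_x2 / det, -sum_x / det],
           [-sum_x / det, n / det]]

# Covariance matrix of the (intercept, slope) estimates.
cov_beta = [[sigma2 * v for v in row] for row in xtx_inv]

# Shortcut for the slope variance: sigma^2 / sum((x - mean)^2).
mean_x = sum_x / n
sxx = sum((v - mean_x) ** 2 for v in x)
slope_var_shortcut = sigma2 / sxx
```

The diagonal of `cov_beta` holds the variances of the intercept and slope estimates; the off-diagonal entry is their covariance.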
3
votes
4
answers
350
views
My results from linear regression differ from my colleague's despite having the same data. Is this to be expected?
Long story short: The guy who did these calculations quit and did not leave any code behind. Now I am tasked with recreating the necessary calculations to perform this year's calculations - but my results ...
0
votes
1
answer
94
views
With ridge regression, weights can approach 0 for large values of lambda but will never equal 0 (unlike Lasso). Why?
I've been trying to figure out why Ridge regression has weights approach 0 for large values of lambda but they are never equal to 0, unlike Lasso and Simple Linear Regression.
According to this ...
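With one feature and no intercept, both penalties have closed-form solutions, which makes the contrast visible. A minimal sketch in plain Python; the objective scalings in the docstrings are an assumption (libraries scale the penalty differently), and the data and function names are made up:

```python
def ridge_weight(sxy, sxx, lam):
    """Minimizer of 0.5*sum((y - w*x)^2) + 0.5*lam*w^2 for one feature."""
    return sxy / (sxx + lam)


def lasso_weight(sxy, sxx, lam):
    """Minimizer of 0.5*sum((y - w*x)^2) + lam*|w| (soft thresholding)."""
    if sxy > lam:
        return (sxy - lam) / sxx
    if sxy < -lam:
        return (sxy + lam) / sxx
    return 0.0


x = [1.0, 2.0, 3.0]            # hypothetical data
y = [2.0, 4.1, 5.9]
sxx = sum(v * v for v in x)
sxy = sum(a * b for a, b in zip(x, y))

lambdas = [0.0, 1.0, 10.0, 100.0, 1000.0]
# Ridge divides by (sxx + lam): the weight shrinks but stays nonzero.
ridge_path = [ridge_weight(sxy, sxx, lam) for lam in lambdas]
# Lasso subtracts lam before dividing: the weight hits exactly 0.
lasso_path = [lasso_weight(sxy, sxx, lam) for lam in lambdas]
```

Ridge rescales the unpenalized solution by a positive factor, so it can only reach zero in the limit; lasso shifts it toward zero by a fixed amount and clips it there once the penalty exceeds the correlation term.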
0
votes
1
answer
60
views
Training data includes data not needing predictions - should these be included in training? (best practice question)
Best practice advice for linear regression - if training data contains entries that do not need predictions, is it commonplace to remove these entries? For example, if you are predicting a fare ...
0
votes
1
answer
45
views
Probabilistic Assumption on Linear Regression
I am reading Stanford CS229's lecture notes online, and on page 16 (page 17 in the PDF's page numbering) I am stuck on understanding a good portion of the page. For context, we assume that the ...
0
votes
0
answers
36
views
What's the difference between my OLS from scratch vs sklearn's OLS?
I'm coding linear regression via OLS from scratch. When I compare the results to scikit-learn's implementation, the coefficients in my version appear to be twice the magnitude of scikit-learn's.
I'm ...
1
vote
1
answer
24
views
Optimize coefficients for multi variable linear regression of scoring metric
I have an ecommerce site whose search results I am trying to optimize to give the most relevant ones to the user.
To give the most relevant results for searches I made a ...
1
vote
2
answers
880
views
Mean Absolute Error from Scratch in NumPy
I recently tried implementing MAE from scratch in NumPy. The loss value and the slope seem to be equivalent to what Scikit-learn outputs, but for some reason the intercept value seems to converge to ...
2
votes
0
answers
60
views
Are there any general theoretical results about the behavior of data in the neighborhood of a single data point?
I know from calculus that any relatively well-behaved function $y=f(x)$ can be approximated by a linear function $y=ax+b$ within a sufficiently small neighborhood around each point of an independent ...
0
votes
0
answers
28
views
Should you seasonally decompose TS data before linear regression?
I want to apply the U-MIDAS method which is basically Least Square regression to a cross sectioned time series. Do I need to seasonally decompose my X and Y and should I test for unit root? Some of ...
0
votes
1
answer
41
views
Using a very very small learning rate to not diverge?
I just started with machine learning, and today I tried implementing the gradient descent algorithm for linear regression. If I use a bigger value for alpha (the learning rate), the absolute value of w ...
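The divergence described here is easy to reproduce with an unscaled feature. A minimal sketch in plain Python, assuming batch gradient descent on a half-MSE loss and noiseless made-up data; the function name, step sizes, and step counts are illustrative choices:

```python
def gradient_descent(x, y, alpha, steps):
    """Batch gradient descent for y = w*x + b with loss mean((w*x+b-y)^2)/2."""
    w, b = 0.0, 0.0
    n = len(x)
    for _ in range(steps):
        errors = [w * xi + b - yi for xi, yi in zip(x, y)]
        grad_w = sum(e * xi for e, xi in zip(errors, x)) / n
        grad_b = sum(errors) / n
        w -= alpha * grad_w
        b -= alpha * grad_b
    return w, b


x = [float(i) for i in range(1, 11)]   # unscaled feature, values 1..10
y = [2.0 * xi + 1.0 for xi in x]       # hypothetical noiseless target

# A small step converges (slowly) to w = 2, b = 1.
w_small, b_small = gradient_descent(x, y, alpha=0.01, steps=20000)

# A larger step overshoots more on every update and blows up.
w_large, b_large = gradient_descent(x, y, alpha=0.1, steps=50)
```

For a quadratic loss like this one, the step size must stay below 2 divided by the largest eigenvalue of the loss Hessian; for this toy data that bound is about 0.05. Standardizing the feature lowers that eigenvalue and usually allows a much larger learning rate.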
2
votes
1
answer
204
views
Linear regression with confidence interval
I am running a multivariate linear regression on noisy data, where the amount of error for each measurement is known (or at least estimated). It works reasonably well with weighted linear regression ...
0
votes
1
answer
71
views
Data splitting for OLS regression
This is what I have done:
Divided my dataset into training and testing sets --> got significant features via feature selection using the sequential feature selector (MLxtend) on the training set --> ...
6
votes
2
answers
196
views
What type of technique can be used to solve this question?
Apologies for the ambiguous title; I do not know the term.
I have data on some products with a few variables: origin, weight, brand. For example:
Product A = "China, 100g, Brand X"
Product B ...
2
votes
0
answers
50
views
Correlation between predictions vs correlation between targets
In a multi-target model framework - where a separate model is estimated for each target - how can one account for correlations between targets during the training process? For example, say I ...
0
votes
0
answers
54
views
ML Methods For Modelling Latent Variables
I have some time series predictor variables, $\{\mathbf{X}_t\} = \{\mathbf{X}_0, \ldots, \mathbf{X}_n\}$, and some other time series data $\{\mathbf{Z}_t\} = \{\mathbf{Z}_0, \ldots, \mathbf{Z}_n\}$.
...
1
vote
1
answer
67
views
Minimize $\sum_i||Y_i-AX_i||^2$
I have N data vectors $X_i$ and N target vectors $Y_i$ where $i$ indexes the sample.
I would like to learn a linear map $A$ between the data and the target, i.e., find the matrix $A$ that minimizes
$$\...
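A sketch of the standard closed form, assuming the samples are stacked as columns, $X = [X_1, \ldots, X_N]$ and $Y = [Y_1, \ldots, Y_N]$ (notation introduced here, not in the question), and that $XX^T$ is invertible:

$$\sum_i \|Y_i - AX_i\|^2 = \|Y - AX\|_F^2, \qquad \nabla_A \|Y - AX\|_F^2 = -2\,(Y - AX)X^T = 0 \;\Longrightarrow\; A = YX^T\,(XX^T)^{-1}$$

Each row of $A$ is then the ordinary least squares solution for the corresponding target coordinate.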
0
votes
1
answer
50
views
Can Linear Models infer Product Sum operation of Features to predict Target?
In a dataset of 9 columns: $X_1-X_8, y$.
$y = X_1 * X_5 + X_2 * X_6 + X_3 * X_7 + X_4 * X_8$
Can any form of linear model (anything but SVM, NN, Random Forest, XGBoost, etc.) predict $y$?
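A model that is linear in the raw columns cannot represent the products, but the same target is exactly linear in hand-built product features. A minimal sketch in plain Python with synthetic data; the helper names `solve_linear_system`, `fit_ols`, and `sse`, and the random data, are illustrative assumptions:

```python
import random


def solve_linear_system(a, b):
    """Solve a @ w = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_ols(features, targets):
    """Least squares without intercept, via the normal equations."""
    k = len(features[0])
    xtx = [[sum(row[i] * row[j] for row in features) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * t for row, t in zip(features, targets))
           for i in range(k)]
    return solve_linear_system(xtx, xty)


def sse(features, targets, coefs):
    """Sum of squared residuals of a linear fit."""
    return sum((t - sum(c * f for c, f in zip(coefs, row))) ** 2
               for row, t in zip(features, targets))


rng = random.Random(0)
rows = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(200)]
y = [r[0] * r[4] + r[1] * r[5] + r[2] * r[6] + r[3] * r[7] for r in rows]

# Linear fit on the raw columns X_1..X_8.
raw_coefs = fit_ols(rows, y)
raw_sse = sse(rows, y, raw_coefs)

# Linear fit on the four engineered products X_j * X_{j+4}.
products = [[r[0] * r[4], r[1] * r[5], r[2] * r[6], r[3] * r[7]]
            for r in rows]
product_coefs = fit_ols(products, y)
product_sse = sse(products, y, product_coefs)

total_ss = sum(t * t for t in y)
```

The fit on product features recovers coefficients of 1 with essentially zero residual, while the raw-column fit leaves most of `total_ss` unexplained. The model stays linear in its parameters; only the feature set changes.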
1
vote
1
answer
327
views
pos_label=1 is not a valid label. Should be one of [2,4]
I am trying to retrieve my precision score but I am getting an error as follows:
pos_label=1 is not a valid label. It should be one of [2 ,4]
And here is the code ...
1
vote
0
answers
64
views
Why do we have multi-target linear regression model? Is it solely because of the overwhelming number of target variables?
As the title states: Why do we have multi-target linear regression model (a linear regression model that predicts several targets at once with a unique set of parameters)? Is it solely because of the ...
0
votes
1
answer
382
views
Effect on regression coefficients by multiplying a constant to a feature
I was solving one quiz question on Coursera and I found an interesting question.
If you double the value of a given feature (i.e. a specific column of
the feature matrix), what happens to the least-...
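For the single-feature case, the effect of doubling a column can be checked numerically. A minimal sketch in plain Python using the closed-form simple regression fit; the data and the function name `fit_simple_ols` are made up for illustration:

```python
def fit_simple_ols(x, y):
    """Closed-form least squares fit of y = slope*x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


x = [1.0, 2.0, 3.0, 4.0, 5.0]      # hypothetical feature values
y = [2.1, 3.9, 6.2, 7.8, 10.1]     # hypothetical targets

slope, intercept = fit_simple_ols(x, y)

# Refit after doubling every value of the feature.
slope_doubled, intercept_doubled = fit_simple_ols([2 * xi for xi in x], y)
```

In this unregularized setting the slope on the doubled feature is half the original and the intercept is unchanged, so the fitted values are identical. With a ridge or lasso penalty the rescaling would change the fit.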
2
votes
1
answer
187
views
Model performance impact on social discrimination?
I am currently working on a project where the data concerns people and the dataset contains personal data with sensitive attributes (typically: age, sex, handicap, race).
Now it seems there are mainly ...
0
votes
1
answer
136
views
Use prediction after using get_dummies in pandas?
I found a similar question on this topic, but no answer was helpful.
I had a data frame with a categorical column in it with 5 different values. I used get_dummies and used linear regression for ...
0
votes
1
answer
40
views
Are my regression metrics value correct?
So I'm using a dataset for Wine Prediction, where I'm using a Linear Regression model to predict the prices.
These are the steps I'm using:
...
0
votes
1
answer
117
views
Linear regression shows b_0 negative while it is a positive quantity
In linear regression, x is weight and y is price; none of the x and y can be negative. The linear regression line with b_0=-57.9 shows a negative y for x<=10 approximately. This signifies that more ...
0
votes
1
answer
303
views
Why is linear kernel regression equivalent to plain linear regression?
I am trying to understand, intuitively/geometrically and/or mathematically, why the following are equivalent:
Classic Ordinary Least Squares linear regression
Linear-kernelized Ordinary Least ...
1
vote
0
answers
37
views
With infinite observations, would the weights resulting from ridge regression be the same as simple linear regression?
As the number of observations approaches infinity, do the weights of a linear regression approach the weights of a linear regression with L2 penalty?
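For a fixed penalty that is not rescaled with the sample size, the one-feature case shows the two estimates drawing together as data accumulates. A minimal sketch in plain Python on deterministic synthetic data; the function name, the data pattern, and the value of `lam` are assumptions for illustration:

```python
def ols_and_ridge(n, lam):
    """One-feature, no-intercept fits on n deterministic synthetic points."""
    x = [1.0 + (i % 5) for i in range(n)]                   # values cycle 1..5
    y = [3.0 * xi + (0.5 if i % 2 else -0.5) for i, xi in enumerate(x)]
    sxx = sum(v * v for v in x)
    sxy = sum(a * b for a, b in zip(x, y))
    w_ols = sxy / sxx              # minimizes sum((y - w*x)^2)
    w_ridge = sxy / (sxx + lam)    # minimizes sum((y - w*x)^2) + lam*w^2
    return w_ols, w_ridge


lam = 10.0  # fixed penalty strength, not rescaled with n
gaps = []
for n in (10, 100, 1000, 10000):
    w_ols, w_ridge = ols_and_ridge(n, lam)
    gaps.append(abs(w_ols - w_ridge))
```

The data term grows with the number of observations while the penalty stays fixed, so `gaps` shrinks toward zero. If the penalty is instead scaled in proportion to the sample size, the gap need not vanish.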