Questions tagged [linear-regression]
Techniques for analyzing the relationship between one (or more) "dependent" variables and "independent" variables.
772 questions
0
votes
1
answer
330
views
How do I fine-tune model performance after the initial run? (Scikit-Learn)
I've just started learning regression using scikit-learn and stumbled upon a problem. For a given dataset, let's say that I've imputed the missing data and one-hot encoded all categorical features. ...
1
vote
1
answer
276
views
Is it possible to explain why Lasso models eliminated certain coefficients?
Is it possible to understand why Lasso models eliminated specific coefficients? During modelling, many of the highly correlated features in the data are being eliminated by Lasso regression. Is it ...
1
vote
1
answer
396
views
Understanding which variables impact your variable of interest the most (correlation, linear regression) and correctly interpreting results
How do you ascertain which variables lead to the greatest increase in another variable of interest?
Let's say you have a correlation matrix. You look at the row of the variable you are particularly ...
0
votes
0
answers
1k
views
NaN, inf or invalid value detected in endog, estimation infeasible error when training statsmodels GLM model
I am trying to build a GLM model (Poisson family) using the Python statsmodels package on training data.
The data I have contains categorical values as exogenous variables and numerical values for my target (...
0
votes
1
answer
101
views
A question on intercepts and coefficients in linear regression
So I was studying through some sites and saw a Linear regression problem where a company is attempting to find the correct amount to spend on marketing. The example had a small dataset with units sold ...
0
votes
1
answer
126
views
Can I include a quotient as dependent variable and independent variables with same denominator in a linear model? How do we interpret such models?
I want to create a model for a food processing plant where my dependent variable is electricity consumption (kWh) per kg. The plant produces different food items with varying electricity consumption. I'm ...
1
vote
1
answer
102
views
How to fill missing values in a discrete column in sales predictions for a drug supply chain company
I have been working on a dataset that has data from a famous drug supply chain company. The first few records of the dataset look like the following;
Another dataset accompanies this (primary) dataset. ...
0
votes
2
answers
245
views
Linear Regression model underfitting
Here is the source code of the model and the csv file. Using the csv file I have to apply the linear regression algorithm to it using "Sales" and "Profit". Train the model in such a ...
0
votes
0
answers
104
views
Why is CNN linear regression always predicting the same value?
I have a dataset with around 3 million samples that approximately follow a Gaussian distribution. The x-axis values are
normalized target values.
I am using a WRN model, and if I am solving binary or multi-class classification ...
4
votes
1
answer
719
views
How can I determine the accuracy of a hand-drawn line of best fit?
Here's the situation:
Users have manually drawn a straight line of best fit through a set
of data points. I have the equation (y = mx + c) for this line.
I have used least-squares regression to ...
4
votes
1
answer
1k
views
Are linear models better when dealing with too many features? If so, why?
I had to build a classification model to predict what a user's rating would be from his/her review. (I was dealing with this dataset: Trip Advisor Hotel Reviews)
After some ...
0
votes
1
answer
488
views
Accessing regression coefficients when using MultiOutputRegressor
I am working on a multioutput (nr. targets: 2) regression task. The original data has a huge dimensionality (p>>n, i.e. there are far more predictors than ...
0
votes
0
answers
155
views
What correlation is considered to be big for linear regression predictors?
It is well known that if two linear regression predictors highly correlate, it is bad for our model, but which correlation is considered to be big? Is it 0.5,0.6,0.8,0.9..? I have tried to find out ...
0
votes
1
answer
365
views
Binary classification with separate training and testing datasets [closed]
I have two datasets (train.csv) and (test.csv) revolving around predicting the death outcome for a disease. Both sets include 20 independent variables (age, weight, etc), but only the train.csv ...
1
vote
1
answer
107
views
Can anyone help me with the cost function in linear regression? In the plot below we have input values and predicted values, but there is no Y value
Can anyone help out please? I don't understand this
2
votes
1
answer
571
views
How to predict a discrete dependent variable on a continuous scale using regression
I am trying to find the 'optimal' amount of a certain medicinal cream to be applied to a patient in order to minimize the days the patient has a rash. However, the data for the cream doses are of the ...
1
vote
1
answer
225
views
Trouble understanding regression line learned by SGDRegressor
I am working on a demonstration notebook to better understand online (incremental) learning. I read in sklearn documentation that the number of regression models that support online learning via the <...
5
votes
1
answer
320
views
Visualizing effect of regularization for linear regression problem
I wanted to put together an example notebook to demonstrate how regularization makes an impact for such a simple model as a simple linear regression. When executing the below script though, I notice ...
1
vote
0
answers
72
views
statistical tests for null hypothesis - what if model is non linear?
I am reading "An Introduction to Statistical Learning" (Gareth James et al., Springer) as a primer to machine learning.
I am reading the part on linear regressors, and learnt there ...
0
votes
1
answer
333
views
Why are my ridge regression coefficients completely different from ordinary linear regression coefficients in MATLAB?
I am attempting to implement my own Ridge Regression algorithm and I am trying to achieve similar coefficients found in a MATLAB tutorial on regression.
Specifically, on the MATLAB tutorial page you ...
2
votes
1
answer
77
views
Return the gradient and y intercept (m, b) to create two lines to best fit the data
I have been working on this task for a few hours now and have been unsuccessful with getting the target result. I have tried using multiple methods of trying to split the dataset using different ...
0
votes
0
answers
56
views
Find $a, b, c$ minimizing MSE
Suppose you are given a "dummy" classifier. It looks like this:
$$
y(x) = \begin{cases} a & \text{if } x \geq c \\ b & \text{otherwise} \end{cases}
$$
Given some data set $\{(y_1, x_1), \dots (...
0
votes
1
answer
2k
views
Should one log transform discrete numerical variables?
I am working on a Linear Regression problem and one of the assumptions of a Linear Regression model is that the features should be Normally Distributed. Hence to convert my non linear features to ...
2
votes
0
answers
67
views
Why is the linear regression line the same as the Deming regression line?
This is not a coding question. My doubt is purely mathematical. Say I take three points: (1,2), (2,1) and (4,3).
A. I calculate the least-squares line for linear regression. Simple linear regression (which ...
1
vote
1
answer
673
views
Building a linear regression model for every combination vs only one Machine Learning model
So my question is more on the conceptual side.
Given a dataset, I want to predict a given continuous variable Y. Now, there are 3 features, 2 categorical and one numerical (integer only). I know that ...
2
votes
0
answers
651
views
Linear regression with PyTorch not converging
I am trying to perform a simple linear regression using PyTorch Lightning (a network with only one neuron). The network is supposed to learn a simple function: y=-4x...
1
vote
1
answer
650
views
Segmented function in R?
Could someone please explain what psi and npsi are?
segmented(obj, seg.Z, psi, npsi, fixed.psi=NULL, control = seg.control(),
model = TRUE, keep.class=FALSE, ...)
...
2
votes
2
answers
187
views
Extracting linear trends from a dataset
Consider a sensor measurement f that varies with both temperature T and the properties of the fluid being measured. The temperature changes through each day and the fluid properties can be assumed to ...
1
vote
1
answer
62
views
Fit non-linear customised model
I have a data.frame that has two columns, $x=mz$ and $y=res$. There are about 2 million rows in the data frame. When I plot the graph I get the below.
What I'd like to do is ...
1
vote
1
answer
114
views
Why are we not checking the significance of the coefficients in Lasso and elastic net models?
As far as I know, we don't check coefficient significance in Lasso and elastic net models. Is it because insignificant feature coefficients will be driven to zero in these models? Does that mean ...
0
votes
2
answers
131
views
How can we make forecasts from stationary data
I'm confused about the concept of stationarity. Most definitions require the mean and variance to be constant 'over any interval'. This statement confuses me: if any interval should have the same mean ...
4
votes
4
answers
1k
views
Multicollinearity vs Perfect multicollinearity for Linear regression
I have been trying to understand how multicollinearity among the independent variables affects a linear regression model. The Wikipedia page suggests that only when there is a "perfect" ...
3
votes
2
answers
193
views
Does PCA help to include all the variables even if there is high collinearity among variables?
I have a dataset that has high collinearity among variables. When I created the linear regression model, I could not include more than five variables (I eliminated a feature whenever VIF > 5). But ...
0
votes
0
answers
59
views
What is the best model for predicting delays?
Suppose we need to predict delays based on a previous dataset that contains the history of several, let's say, providers and their delivery delays. The goal is to minimize the loss due to those ...
1
vote
0
answers
33
views
Why is there a marked difference in metric scores using linear regression or MLP as readout for echo state network?
I am using a reservoir computing architecture comprising an echo state network as per the paper Reservoir Computing Approaches for Representation and Classification of Multivariate Time Series
...
1
vote
1
answer
488
views
What's the correct cost function for Linear Regression?
As we all know the cost function for linear regression is:
Whereas when we use ridge regression we simply add lambda*slope**2, but I always see the below as the cost function of linear regression ...
1
vote
0
answers
40
views
AI algorithm model that outputs a list of unknown length [closed]
I have a dataset with the following x columns:
date
time
is_weekend
is_holiday
start_intersection
end_intersection
The output is a list of intersections, that connect start_intersection with ...
1
vote
0
answers
55
views
Does the applicability of R-squared to non-linear models depend on how we calculate it?
Does the applicability of R-squared to non-linear models depend on how we calculate it? $R^2 = \frac{SS_{exp}}{SS_{tot}}$ is going to be an inadequate measure for non-linear models since an increase of $...
2
votes
1
answer
1k
views
Does gradient descent always find global minimum for specific regression type?
From my understanding, linear regression is used for predicting an output based on an input using a linear equation that is optimally fitted to some input data. We choose the best fitted linear ...
1
vote
2
answers
42
views
Multiple Linear Regression for House Price Prediction score is 0.28 [closed]
I am trying to make predictions using this dataset
What I have done so far:
Dropped the Administrative column
Encoded the categorical data using ...
1
vote
0
answers
141
views
Implementation of a perceptron
I want to implement a single perceptron for linear regression using the following formulas:
the input data for the first case is one column (x(392, 1); y(392, 1)) and for the second case is (x(392, 7)...
1
vote
1
answer
61
views
One predictor variable and 3 response variable (categorical and continuous) [closed]
If I have predictor variables which are a mixture of continuous and categorical, and a response variable that is continuous.
What approach should I apply? Linear regression, logistic regression or k ...
1
vote
1
answer
1k
views
The effect of the λ in the Ridge regression
Why does increasing the value of λ in the ridge estimator decrease the slope of the line? How exactly does λ affect y = kx + b?
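A minimal sketch of the effect being asked about, under assumptions that are not in the question: a single predictor, a penalty of λ·k² on the slope only, and an unpenalised intercept b. In that setting the minimiser has the closed form k = Sxy / (Sxx + λ), so a larger λ inflates the denominator and pulls k toward zero, while b adjusts so the line still passes through the mean point. The helper name `ridge_line` and the toy data are hypothetical.

```python
import numpy as np

def ridge_line(x, y, lam):
    """Fit y = k*x + b, penalising only the slope with lam * k**2 (assumed setup)."""
    x_mean, y_mean = x.mean(), y.mean()
    xc, yc = x - x_mean, y - y_mean
    # Closed-form minimiser of sum((yc - k*xc)**2) + lam * k**2
    k = (xc @ yc) / (xc @ xc + lam)
    # The unpenalised intercept keeps the line through (x_mean, y_mean)
    b = y_mean - k * x_mean
    return k, b

# Toy data lying exactly on y = 2x + 1 (illustrative only)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

lams = [0.0, 1.0, 10.0, 100.0]
fits = [ridge_line(x, y, lam) for lam in lams]
for lam, (k, b) in zip(lams, fits):
    print(f"lambda={lam:6.1f}  k={k:.4f}  b={b:.4f}")
```

With λ = 0 this reduces to ordinary least squares; as λ grows the slope shrinks toward 0 and the fitted line flattens toward the horizontal line y = mean(y).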
1
vote
0
answers
27
views
Error while calculating accuracy and matrix multiplication in TensorFlow code for regression [closed]
I was writing code for linear regression using TensorFlow but I was getting errors while calculating matrix multiplication using TensorFlow and while calculating accuracy.
...
1
vote
0
answers
75
views
Group points to reduce data set such that the linear regression stays the same
I have a very long dataset and I'm trying to reduce it by grouping the data in periods of 24 hours. In this way, there will be a single data point that represents that day, but they must yield the ...
1
vote
1
answer
41
views
Approximating weight of individual items from sum of their weight
Problem
I have a list of orders, an approximation of their total weight, and a list of the items they contain. I need to determine the approximate weight of individual items.
In other words, I have a few thousand ...
1
vote
0
answers
39
views
Linear regression of time series data with heteroskedasticity
I am trying to find out if stock market movements, on average and in extreme conditions, affect gold prices. I am following the regression model proposed by Baur and McDermott (2010) which is given as:...
3
votes
2
answers
5k
views
Constraining linear regressor parameters in scikit-learn?
I'm using sklearn.linear_model.Ridge to use ridge regression to extract the coefficients of a polynomial.
However, some of the coefficients have physical ...
1
vote
0
answers
100
views
Sudden jumps in accuracy with logistic regression and bag of words : "glm.fit: algorithm did not converge"
I am working on a bag-of-words model for the Toxic Comments Classification challenge. The challenge is closed but the dataset is very nice to learn from.
I use R, tf-idf, tm, and logistic regression.
I have a strange ...