Questions tagged [linear-regression]

Techniques for analyzing the relationship between one (or more) "dependent" variables and "independent" variables.

0 votes
1 answer
330 views

I've just started learning regression using scikit-learn and stumbled upon a problem. For a given dataset, let's say that I've imputed the missing data and one-hot encoded all categorical features. ...
Garreth Lee's user avatar
1 vote
1 answer
276 views

Is it possible to understand why Lasso models eliminated specific coefficients? During modelling, many of the highly correlated features in the data are being eliminated by Lasso regression. Is it ...
NAS_2339's user avatar
  • 273
1 vote
1 answer
396 views

How do you ascertain which variables lead to the greatest increase in another variable of interest? Let's say you have a correlation matrix. You look at the row of the variable you are particularly ...
Learning_and_xbox's user avatar
0 votes
0 answers
1k views

I am trying to build a GLM (Poisson family) using the Python statsmodels package on train data. The data I have contains categorical values as exogenous variables and numerical values for my target (...
Karima Touati's user avatar
0 votes
1 answer
101 views

So I was studying through some sites and saw a Linear regression problem where a company is attempting to find the correct amount to spend on marketing. The example had a small dataset with units sold ...
Benjamin Diaz's user avatar
0 votes
1 answer
126 views

I want to create a model for a food processing plant where my dependent variable is electricity (kWh) consumption per kg. The plant produces different food items with varying electricity consumption. I'm ...
NAS_2339's user avatar
  • 273
1 vote
1 answer
102 views

I have been working on a dataset that has data from a famous drug supply chain company. The first few records of the dataset look like the following: Another dataset accompanies this (primary) dataset. ...
Ritik P. Nayak's user avatar
0 votes
2 answers
245 views

Here is the source code of the model and the CSV file. Using the CSV file, I have to apply a linear regression algorithm to it using "Sales" and "Profit". Train the model in such a ...
Abrar Hussain's user avatar
0 votes
0 answers
104 views

I have a dataset with around 3 million samples which almost fit a Gaussian distribution. The x-axis shows normalized target values. I am using a WRN model, and if I am solving binary or multi-class classification ...
TGD's user avatar
  • 1
4 votes
1 answer
719 views

Here's the situation: Users have manually drawn a straight line of best fit through a set of data points. I have the equation (y = mx + c) for this line. I have used least-squares regression to ...
Rob's user avatar
  • 81
4 votes
1 answer
1k views

I had to build a classification model in order to predict what the user rating would be by using his/her review. (I was dealing with this dataset: Trip Advisor Hotel Reviews) After some ...
dsbr__0's user avatar
  • 191
0 votes
1 answer
488 views

I am working on a multioutput (nr. targets: 2) regression task. The original data has a huge dimensionality (p>>n, i.e. there are far more predictors than ...
lazarea's user avatar
  • 299
0 votes
0 answers
155 views

It is well known that if two linear regression predictors are highly correlated, it is bad for our model, but how high a correlation is considered large? Is it 0.5, 0.6, 0.8, 0.9...? I have tried to find out ...
No Name's user avatar
  • 21
0 votes
1 answer
365 views

I have two datasets (train.csv) and (test.csv) revolving around predicting the death outcome for a disease. Both sets include 20 independent variables (age, weight, etc), but only the train.csv ...
user113243's user avatar
2 votes
1 answer
571 views

I am trying to find the 'optimal' amount of a certain medicinal cream to be applied to a patient in order to minimize the days the patient has a rash. However, the data for the cream doses are of the ...
visionboy4's user avatar
1 vote
1 answer
225 views

I am working on a demonstration notebook to better understand online (incremental) learning. I read in sklearn documentation that the number of regression models that support online learning via the <...
lazarea's user avatar
  • 299
5 votes
1 answer
320 views

I wanted to put together an example notebook to demonstrate how regularization makes an impact for such a simple model as a simple linear regression. When executing the below script though, I notice ...
lazarea's user avatar
  • 299
1 vote
0 answers
72 views

I am reading "An Introduction to Statistical Learning" (Gareth James et al., Springer) as a primer on machine learning. I am reading the part on linear regressors, and learnt there ...
user305883's user avatar
0 votes
1 answer
333 views

I am attempting to implement my own Ridge Regression algorithm and I am trying to achieve similar coefficients found in a MATLAB tutorial on regression. Specifically, on the MATLAB tutorial page you ...
user1068636's user avatar
2 votes
1 answer
77 views

I have been working on this task for a few hours now and have been unsuccessful with getting the target result. I have tried using multiple methods of trying to split the dataset using different ...
Sultan's user avatar
  • 21
0 votes
0 answers
56 views

Suppose you are given a "dummy" classifier. It looks like this: $$ y(x) = \begin{cases} a \text{ if } x \geq c \\ b \text{ else } \end{cases} $$ Given some data set $\{(y_1, x_1), \dots (...
nutcracker's user avatar
0 votes
1 answer
2k views

I am working on a Linear Regression problem and one of the assumptions of a Linear Regression model is that the features should be Normally Distributed. Hence to convert my non linear features to ...
spectre's user avatar
  • 2,288
2 votes
0 answers
67 views

This is not a coding question. My doubt is purely mathematical. Say I take three points (1,2), (2,1) and (4,3). A. I calculate the least-squares line of best fit for linear regression. Simple linear regression (which ...
Ashish Gour's user avatar
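
For readers following along with the three points quoted above, here is a minimal sketch of the fit. It assumes the excerpt means the ordinary least-squares line (the question text is truncated, so this is an interpretation). The function name `least_squares_line` is made up for illustration.

```python
def least_squares_line(points):
    """Return (slope, intercept) of the ordinary least-squares line y = slope*x + intercept."""
    n = len(points)
    x_mean = sum(x for x, _ in points) / n
    y_mean = sum(y for _, y in points) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in points)
    s_xx = sum((x - x_mean) ** 2 for x, _ in points)
    slope = s_xy / s_xx
    intercept = y_mean - slope * x_mean
    return slope, intercept


# The three points from the question excerpt.
points = [(1, 2), (2, 1), (4, 3)]
slope, intercept = least_squares_line(points)
print(f"y = {slope:.4f} * x + {intercept:.4f}")
```

Worked by hand, the means are 7/3 and 2, the cross-deviation sum is 2 and the squared x-deviation sum is 14/3, so the slope is 3/7 and the intercept is 1.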
1 vote
1 answer
673 views

So my question is more on the conceptual side. Given a dataset, I want to predict a given continuous variable Y. Now, there are 3 features, 2 categorical and one numerical (integer only). I know that ...
DPM's user avatar
  • 113
2 votes
0 answers
651 views

I am trying to perform a simple linear regression using PyTorch Lightning (a network with only one neuron). The network is supposed to learn a simple function: y=-4x...
erap129's user avatar
  • 121
1 vote
1 answer
650 views

Could someone please explain what psi and npsi are? segmented(obj, seg.Z, psi, npsi, fixed.psi=NULL, control = seg.control(), model = TRUE, keep.class=FALSE, ...) ...
Ann Zee's user avatar
  • 11
2 votes
2 answers
187 views

Consider a sensor measurement f that varies with both temperature T and the properties of the fluid being measured. The temperature changes through each day and the fluid properties can be assumed to ...
Tunneller's user avatar
  • 141
1 vote
1 answer
62 views

I have a data.frame that has two columns, $x=mz$ and $y=res$. There are about 2 million rows in the data frame. When I plot the graph I get the below. What I'd like to do is ...
V. Lad's user avatar
  • 11
1 vote
1 answer
114 views

As far as I know, we don't check coefficient significance in Lasso and Elastic Net models. Is it because insignificant feature coefficients will be driven to zero in these models? Does that mean ...
NAS_2339's user avatar
  • 273
0 votes
2 answers
131 views

I'm confused about the concept of stationarity. Most definitions require the mean and variance to be constant 'over any interval'. This statement confuses me: if any interval should have the same mean ...
Aditya Prakash's user avatar
4 votes
4 answers
1k views

I have been trying to understand how multicollinearity among the independent variables affects a linear regression model. The Wikipedia page suggests that only when there is a "perfect" ...
ak1431's user avatar
  • 41
3 votes
2 answers
193 views

I have a dataset that has high collinearity among variables. When I created the linear regression model, I could not include more than five variables (I eliminated a feature whenever VIF > 5). But ...
NAS_2339's user avatar
  • 273
0 votes
0 answers
59 views

Suppose we need to predict delays based on a previous dataset that contains the history of several, let's say, providers and their delivery delays. The goal is to minimize the loss due to those ...
Alex Javarotti's user avatar
1 vote
0 answers
33 views

I am using a reservoir computing architecture comprising an echo state network as per the paper Reservoir Computing Approaches for Representation and Classification of Multivariate Time Series ...
Jag's user avatar
  • 111
1 vote
1 answer
488 views

As we all know, the cost function for linear regression is: Whereas when we use ridge regression we simply add lambda*slope**2, but I always see the below as the cost function of linear regression ...
Chris_007's user avatar
  • 203
1 vote
0 answers
40 views

I have a dataset with the following x columns: date time is_weekend is_holiday start_intersection end_intersection The output is a list of intersections, that connect start_intersection with ...
Sharhad Bashar's user avatar
1 vote
0 answers
55 views

Does the applicability of R-squared to non-linear models depend on how we calculate it? $R^2 = \frac{SS_{exp}}{SS_{tot}}$ is going to be an inadequate measure for non-linear models since an increase of $...
mathgeek's user avatar
  • 121
2 votes
1 answer
1k views

From my understanding, linear regression is used for predicting an output based on an input using a linear equation that is optimally fitted to some input data. We choose the best fitted linear ...
Rahul's user avatar
  • 123
1 vote
2 answers
42 views

I am trying to make predictions using this dataset. What I have done so far: dropped the Administrative column; encoded the categorical data using ...
Omair's user avatar
  • 21
1 vote
0 answers
141 views

I want to implement a single perceptron for linear regression using the following formulas: the input data for the first case is one column (x(392, 1); y(392, 1)) and for the second case is (x(392, 7)...
Rim Sleimi's user avatar
1 vote
1 answer
61 views

If I have predictor variables which are a mixture of continuous and categorical, and a response variable that is continuous, what approach should I apply? Linear regression, logistic regression or k ...
Kiribatiadelie's user avatar
1 vote
1 answer
1k views

Why does the slope of the line decrease as the value of λ in the Ridge estimator increases? How exactly does λ affect y = kx + b?
Dablup's user avatar
  • 11
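
A rough sketch of the effect asked about above, for the single-feature case y = kx + b. It assumes the usual setup where the ridge penalty λ·k² applies only to the slope and the intercept is left unpenalized. Under that assumption the minimizer has the closed form k = S_xy / (S_xx + λ) and b = ȳ − k·x̄, so a larger λ inflates the denominator and pulls k toward zero. The function name `ridge_line` and the data values are made up for illustration.

```python
def ridge_line(xs, ys, lam):
    """Closed-form ridge fit of y = k*x + b with penalty lam * k**2 (intercept unpenalized)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    k = s_xy / (s_xx + lam)   # lam = 0 recovers the ordinary least-squares slope
    b = y_mean - k * x_mean   # the fitted line still passes through (x_mean, y_mean)
    return k, b


# Illustrative data only (not taken from the question).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

lambdas = [0.0, 1.0, 10.0, 100.0]
fits = [ridge_line(xs, ys, lam) for lam in lambdas]
for lam, (k, b) in zip(lambdas, fits):
    print(f"lambda={lam:6.1f}  k={k:.4f}  b={b:.4f}")
```

As λ grows, the line flattens around the point (x̄, ȳ): the slope shrinks and the intercept drifts toward ȳ.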
1 vote
0 answers
27 views

I was writing code for linear regression using TensorFlow, but I was getting errors while calculating matrix multiplication using TensorFlow and while calculating accuracy. ...
User1086's user avatar
1 vote
0 answers
75 views

I have a very long dataset and I'm trying to reduce it by grouping the data in periods of 24 hours. In this way, there will be a single data point that represents that day, but they must yield the ...
Schroeder's user avatar
1 vote
1 answer
41 views

Problem: I have a list of orders, an approximation of their total weight, and a list of the items they contain. I need to determine the approximate weight of individual items. In other words, I have a few thousand ...
Draex_'s user avatar
  • 111
1 vote
0 answers
39 views

I am trying to find out if stock market movements, on average and in extreme conditions, affect gold prices. I am following the regression model proposed by Baur and McDermott (2010) which is given as:...
Hussain's user avatar
  • 11
3 votes
2 answers
5k views

I'm using sklearn.linear_model.Ridge to fit a ridge regression and extract the coefficients of a polynomial. However, some of the coefficients have physical ...
awho's user avatar
  • 31
1 vote
0 answers
100 views

I am working on a bag-of-words model for the Toxic Comment Classification challenge. The challenge is closed but the dataset is very good for learning. I use R, tf-idf, tm, and logistic regression. I have a strange ...
Xiiryo's user avatar
  • 111
