Questions tagged [linear-regression]
Techniques for analyzing the relationship between one (or more) "dependent" variables and "independent" variables.
772 questions
0 votes · 1 answer · 262 views
Workflow when making a machine learning model
I'm new to data science and kinda confused about the workflow and steps for making a model. Before learning the math and concepts behind algorithms like SVM, linear regression, etc., I would just ...
1 vote · 1 answer · 104 views
Linear Regression and Logistic Regression
I'm a beginner, and I'm wondering whether logistic regression, in a nutshell, is just a normalized linear regression. Correct me if I'm wrong, but I came to this conclusion because the predicted ...
1 vote · 2 answers · 941 views
Why is it difficult to use a linear regression model for classification problems?
Why is it difficult to use a linear regression model for classification problems?
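One concrete difficulty, shown as a small sketch on made-up data and assuming ordinary least squares is fitted directly to 0/1 labels: the fitted line is unbounded, so predictions away from the training data fall outside [0, 1] and cannot be read as probabilities. Logistic regression avoids this by passing the linear score through the sigmoid.

```python
import numpy as np

# Hypothetical 1-D data: class 0 for small x, class 1 for large x
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

# Ordinary least squares on the raw 0/1 labels (design matrix with an intercept column)
X = np.column_stack([np.ones_like(x), x])
w, *_ = np.linalg.lstsq(X, y, rcond=None)

x_new = np.array([-5.0, 3.5, 15.0])
pred = w[0] + w[1] * x_new
print(pred)  # the first value is below 0 and the last is above 1
```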
1 vote · 0 answers · 59 views
One-hot encoded variables dominate importance over other variables
I am currently training some machine learning models to predict the 28-day compressive strength of cement, a continuous real-valued variable. The available dataset comprises samples from three ...
0 votes · 1 answer · 83 views
What are some Models/Methods to reduce noise using environmental data?
I have a set of pressure datasets from a mechanical device that frequently moves around the country. I also have several sets of environmental data (Altitude, ambient temperature etc.) from those ...
2 votes · 2 answers · 728 views
Parameter estimation in linear regression
Another test Q I couldn't answer -
We have marks of students belonging to 3 sections - A,B,C and two genders - M & F. Which regression model will not be able to estimate all the parameters?
1 ) ...
0 votes · 1 answer · 48 views
What Model to Choose for a NN with a Very Wide Output Layer?
The input of my neural network consists of 20 features, whereas the output consists of 20,000 of them (predicting a "quantum classical shadow" based on a few parameters: the rotation angle ...
0 votes · 1 answer · 60 views
ValueError: operands could not be broadcast together with shapes (13159,3) (13159,)
I am trying to predict the target variable with polynomial regression and compute its difference from the actual values. However, the predicted variable is a 2-D array with shape (13159,3)...
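The message itself comes from NumPy's broadcasting rules: shapes are compared from the trailing dimension, and 3 versus 13159 neither match nor equal 1. A minimal reproduction with small hypothetical arrays (5 rows standing in for 13159). Subtracting column-wise only makes sense if each of the three columns really is a prediction of the same target; if the model was meant to output a single column, the fix belongs upstream, where the extra columns come from.

```python
import numpy as np

n = 5                                                  # stands in for the 13159 rows
y_pred = np.arange(n * 3, dtype=float).reshape(n, 3)   # shape (n, 3), like the predictions
y_true = np.arange(n, dtype=float)                     # shape (n,), like the target

broadcast_failed = False
try:
    y_pred - y_true                                    # trailing dimensions 3 vs 5: incompatible
except ValueError as err:
    broadcast_failed = True
    print("ValueError:", err)

# An explicit column axis gives y_true shape (n, 1), which broadcasts against (n, 3)
diff = y_pred - y_true[:, np.newaxis]
print(diff.shape)  # (5, 3)
```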
0 votes · 1 answer · 146 views
Linear Model With Highly Correlated Attributes Producing Inconsistent Weights
I know that having correlated attributes violates the linear model assumption of independent attributes, and I'm not interested in creating a more sophisticated model to tease apart the dependent ...
1 vote · 1 answer · 141 views
Why is the cost function differentiable?
I have a very basic question about cost functions. I'm studying gradient descent, where we take partial derivatives with respect to the parameters "Theta". But isn't the cost function an absolute ...
0 votes · 2 answers · 204 views
Does LinearRegression use gradient descent to find the slope and y-intercept of the best-fit line?
I know that gradient descent is an optimization algorithm used to minimize a loss function.
Does the LinearRegression model in the sklearn package use ...
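For reference, the scikit-learn documentation describes LinearRegression as plain ordinary least squares wrapped around scipy.linalg.lstsq, i.e. a direct least-squares solve rather than gradient descent; SGDRegressor is the estimator that fits a linear model by stochastic gradient descent. A quick check on synthetic data (made up here) that LinearRegression's coefficients match a direct least-squares solve:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: y = 1.5 + 2*x1 - 3*x2 + small noise
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 1.5 + X @ np.array([2.0, -3.0]) + rng.normal(scale=0.1, size=100)

model = LinearRegression().fit(X, y)

# Same problem solved directly: prepend a column of ones for the intercept
X1 = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)

print(model.intercept_, model.coef_)
print(coef)  # [intercept, slope_1, slope_2], matching the line above
```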
2 votes · 2 answers · 2k views
Why do residuals of linear regression model need to be normally distributed?
When evaluating the output from a linear/ridge regression model, I have taken the residuals between the predicted and test data. This gives me a normal distribution when I plot this data as a ...
0 votes · 0 answers · 67 views
How do you appropriately measure the real mean squared error of a Box-Cox transformed linear regression model?
My understanding is that it can make sense to transform the outcomes of a linear regression model to make them more normally distributed. That's because it could 1) help me find more linear ...
0 votes · 1 answer · 173 views
Is it OK to normalize the dependent variable using MinMaxScaler?
I'm trying to make a sales prediction using the columns X = item_amount and y = item_price_total. I'm unsure whether it's okay to normalize the dependent variable using MinMaxScaler.
With the ...
1 vote · 1 answer · 1k views
Why are the cost function and MSE called the same thing?
Why are the cost function and mean squared error called the same thing, when the cost function is scaled by 1/(2m) and the MSE by 1/n, with m = n?
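Assuming the usual textbook definitions (a linear hypothesis $h_\theta(x) = \theta^\top x$ and $m$ training examples), the two quantities differ only by the constant factor $\frac{1}{2}$, so they have the same minimizer; the $\frac{1}{2}$ is a convention that cancels the 2 produced by differentiating the square:

$$J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 = \frac{1}{2}\,\mathrm{MSE}(\theta), \qquad \mathrm{MSE}(\theta) = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$$

$$\frac{\partial J}{\partial \theta_j} = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}, \qquad \arg\min_\theta J(\theta) = \arg\min_\theta \mathrm{MSE}(\theta)$$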
0 votes · 1 answer · 368 views
Why do we need a solver in LogisticRegression?
Why do we need a solver like BFGS in LogisticRegression, unlike in LinearRegression? Isn't there a closed form, as for LinearRegression?
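For linear regression, setting the gradient of the squared error to zero gives the linear normal equations $X^\top X w = X^\top y$, which can be solved directly. For logistic regression the corresponding condition is $X^\top(\sigma(Xw) - y) = 0$, and because the sigmoid $\sigma$ is nonlinear in $w$ there is no closed-form solution, so an iterative method is needed (scikit-learn's default solver is lbfgs). As an illustration only, here is Newton's method (IRLS), a different iterative solver from the ones scikit-learn uses, on a small made-up dataset:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical, non-separable data (x = 0 is labelled 1 and x = 1 is labelled 0)
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
y = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0])
X = np.column_stack([np.ones_like(x), x])   # intercept column + feature

w = np.zeros(2)
for _ in range(25):
    p = sigmoid(X @ w)
    grad = X.T @ (p - y) / len(y)                  # gradient of the mean log-loss
    hess = (X.T * (p * (1 - p))) @ X / len(y)      # Hessian of the mean log-loss
    w = w - np.linalg.solve(hess, grad)            # Newton step

p = sigmoid(X @ w)
final_grad = X.T @ (p - y) / len(y)
print(w, np.linalg.norm(final_grad))  # the gradient norm is ~0 at the optimum
```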
2 votes · 1 answer · 96 views
Why would the result change so much for a linear regression with or without a constant?
I was running a linear regression on the Wooldridge dataset GPA2, which is available in the Python library wooldridge.
I tried two linear regressions. The first:
...
0 votes · 1 answer · 41 views
Help me identify the type of plot and the relationship between the dependent variables
Question: I am not sure how to describe the sample graph attached. Can you please help me identify the type of plot and how to statistically measure the relationship between the dependent variable (Y-...
1 vote · 1 answer · 98 views
Regularized LLS: computing the optimal weights by hand yields wrong results
Given the following dataset $S = \{(0,1),(1,1),(1,2)\}$ and the regularized problem
$$\sum_{i=1}^3 (y_i - w_1 x_i - w_0)^2 + \lambda w_1^2, \quad \lambda = 1 $$ I was tasked with finding the optimal $...
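One way to check a hand computation of this exact objective, assuming (as written) that only $w_1$ is penalized and the intercept $w_0$ is not: stack the inputs into a design matrix with a column of ones and solve the regularized normal equations $(X^\top X + \lambda P)\,w = X^\top y$ with $P = \mathrm{diag}(0, 1)$.

```python
import numpy as np

# Dataset S = {(0,1), (1,1), (1,2)} and lambda = 1, as in the question
x = np.array([0.0, 1.0, 1.0])
y = np.array([1.0, 1.0, 2.0])
lam = 1.0

X = np.column_stack([np.ones_like(x), x])   # columns [1, x] -> weights [w0, w1]
P = np.diag([0.0, 1.0])                     # penalize w1 only, matching lambda * w1^2

w = np.linalg.solve(X.T @ X + lam * P, X.T @ y)
print(w)  # [w0, w1], approximately [1.2, 0.2]
```

By hand, setting both partial derivatives to zero gives the same system: $3w_0 + 2w_1 = 4$ and $2w_0 + 3w_1 = 3$.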
1 vote · 2 answers · 532 views
Is it possible to overfit a simple single variable linear regression model?
I searched for this question, and the answer I got was about a general regression model rather than a single-variable linear regression model. If you increase the number of variables, you could fit a ...
0 votes · 1 answer · 66 views
What can I do to address a regression with systematic bias towards the middle?
I’ve created a linear regression but my predicted output is usually too low for true high values and too high for true low values. I’ve tried introducing a pipeline where I use polynomial features, ...
1 vote · 2 answers · 71 views
Gradient vector starts to increase at some point, gradient descent from scratch
I have a simple linear function y = w0 + w1 * x, where w0 and w1 are weights,
and I'm trying to implement gradient descent for it. I wrote the function and tested it on the data (a dataset of two ...
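The excerpt is cut off, so the cause in that case is unknown; one frequent reason for a gradient that shrinks and then starts to grow is a learning rate too large for the scale of x, which makes batch gradient descent overshoot and diverge. A minimal sketch with the same model y = w0 + w1 * x and a mean-squared-error loss, on made-up noiseless data (true weights w0 = 3, w1 = 2), run with a small and a large step size:

```python
import numpy as np

def gradient_descent(x, y, lr, n_iters):
    """Batch gradient descent on the MSE of y = w0 + w1 * x.

    Returns the final weights and the gradient norm at every step.
    """
    w0, w1 = 0.0, 0.0
    grad_norms = []
    for _ in range(n_iters):
        residual = w0 + w1 * x - y          # prediction error for every point
        g0 = 2.0 * residual.mean()          # dMSE/dw0
        g1 = 2.0 * (residual * x).mean()    # dMSE/dw1
        grad_norms.append(np.hypot(g0, g1))
        w0 -= lr * g0                       # both gradients were computed before updating
        w1 -= lr * g1
    return np.array([w0, w1]), grad_norms

x = np.linspace(0.0, 10.0, 50)
y = 3.0 + 2.0 * x  # hypothetical noiseless data

w_ok, norms_ok = gradient_descent(x, y, lr=0.02, n_iters=5000)
w_bad, norms_bad = gradient_descent(x, y, lr=0.05, n_iters=50)
print(w_ok)                          # close to [3. 2.]
print(norms_bad[0], norms_bad[-1])   # the last gradient norm is far larger than the first
```

For this particular x, the stable range is roughly lr < 0.029; standardizing x widens it considerably.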
0 votes · 1 answer · 82 views
Using forecast values from a univariate model as input to linear regression?
I have weekly time series data for the last 2 years with variables "week", "marketing_spend", "web_traffic", and "revenue" ...
0 votes · 1 answer · 515 views
Regression with time series data
I want to predict temperature when time (datetime type, hourly data for five months) and humidity are given. Before starting in Python, I created a regression model in Excel. But instead of predicting ...
0 votes · 1 answer · 163 views
Testing RANSAC regression model
I am going to build a model (e.g. multiple linear regression) to predict apartment cost in my city. First I have to find outliers in the training data. For this task, the RANSAC regression algorithm ...
0 votes · 2 answers · 451 views
Which intrinsically explainable model has the highest performance?
Explainable AI can be achieved through intrinsically explainable models, like logistic and linear regression, or post-hoc explanations, like SHAP.
I want to use an intrinsically explainable model on ...
0 votes · 2 answers · 98 views
The ideal function in R for fitting n LASSO regressions on n data sets
As part of a statistical learning research paper I am collaborating on, I am running/fitting two hundred sixty thousand different LASSO Regressions on the same number of different randomly generated ...
1 vote · 1 answer · 593 views
How to curve fit Z as a function of X and Y?
I'm trying to find the function for this visualization:
I would like to get feedback if I'm taking the right approach. My approach:
These data points are created by a person. They are two ...
0 votes · 1 answer · 741 views
Predict coordinates from input of coordinates
I'm a newbie at data science and I want to ask how can I predict a set of coordinates from a set of input coordinates? That is (x1, y1) -> (x2, y2).
To give a ...
0 votes · 1 answer · 66 views
Linear regression not converging
I'm trying to implement the simplest possible machine learning algorithm which is linear regression. But I'm having trouble because the loss function is not converging. Please can you look at my ...
0 votes · 1 answer · 45 views
How to run a BE or FS Stepwise Regression on each dataset in a file folder full of datasets using lapply or map (without a loop)
All of the code in this question can be found in my GitHub Repository for this research project on Estimated Exhaustive Regression. Specifically, in the "Both BE & FS script" and "...
0 votes · 1 answer · 116 views
Day number as a feature in Linear regression
Goal - To train a Linear regression model for climatic studies.
Planned features: - Temperatures, Latitude, Longitude, Day Number (1st February = 32)
Would it be correct to include day number like ...
0 votes · 1 answer · 34 views
How to isolate a clear relationship from a subset of data with lots of noise and outliers
I am doing an analysis of aircraft data and I want to see how much fuel is burnt on landing. There are two main factors: aircraft type and landing time (i.e. time elapsed).
However there is a cheeky third ...
0 votes · 1 answer · 55 views
What (in the world) is well-conditioned vs. low rank fat-tail singular profile?
Scikit-learn has a make_regression data generator. Can someone explain to me, like I'm 5, what is meant in the help docs by "The input set can either be well ...
0 votes · 1 answer · 164 views
Normalising data for simple linear regression
Consider a simple linear regression problem where:
X = [1,2,3,4,5,100,200]
Y = [2,4,6,8,10,200,400]
Clearly, the relationship is of the form $y=2x$; while trying ...
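Since the excerpt is cut off, this only illustrates the general point: an ordinary least-squares fit on this X and Y already recovers slope 2 and intercept 0, and if both variables are standardized first, the fitted slope (1 in standardized units) converts back to the original units by multiplying by std(Y)/std(X).

```python
import numpy as np

X = np.array([1, 2, 3, 4, 5, 100, 200], dtype=float)
Y = np.array([2, 4, 6, 8, 10, 200, 400], dtype=float)

# Fit on the raw data: np.polyfit returns [slope, intercept] for degree 1
slope_raw, intercept_raw = np.polyfit(X, Y, 1)

# Fit on standardized (z-scored) data
Xs = (X - X.mean()) / X.std()
Ys = (Y - Y.mean()) / Y.std()
slope_std, intercept_std = np.polyfit(Xs, Ys, 1)

# Convert the standardized slope back to the original units
slope_back = slope_std * Y.std() / X.std()

print(slope_raw, intercept_raw)   # about 2 and 0
print(slope_std, slope_back)      # about 1 and 2
```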
0 votes · 0 answers · 38 views
Is there a difference in result if we apply Polynomial / Kernel Regression on mean of target data, or all data?
Let's say we have some data:
input data X with shape (1, N=100), this will be duplicated 1000 times.
target data Y with shape (S=1000, N=100).
We have 1000 experimental data points, samples.
My ...
2 votes · 1 answer · 87 views
How to implement linear regression
I am having difficulty achieving the same result as in sklearn while implementing linear regression model from scratch.
After adjusting the learning rate, I obtained an AUC of 0.694 for this binary ...
0 votes · 1 answer · 35 views
Create ML model from dataframe with small number of rows
I have a dataframe with 50 rows (one row for each US state), and about 20 columns with different attributes with state related data. I'm looking to build a linear regression model to predict ...
0 votes · 1 answer · 300 views
How to choose neural network architecture for a relatively small dataset with less than 10 features for regression?
How to go about selecting an architecture for a dataset with 80 datapoints and 9 features for a regression model?
Working on the Desharnais dataset, with "Effort" as the target variable.
...
0 votes · 1 answer · 1k views
Linear Regression line not showing in plot
It's a silly problem, I know, but it's getting on my nerves. Everything seems fine, but I cannot get the line to show on the plot.
I've put it in a public Google notebook, for your convenience.
t ...
2 votes · 1 answer · 47 views
What is the best way to determine if there is variable interactivity between independent parameters in a prediction model
OK, the best way to describe this is with an example. (admittedly simplified)
I want to predict the speed of drivers on a motorway and I have two input variables
the nationality of the driver
how ...
0 votes · 0 answers · 65 views
linear regression - at future time points
I have a dataset of customer transactions containing revenue, customer id, region, product category, product id, support team, date of transaction etc. The data ranges from Jan 2017 to Nov 2nd 2022.
...
0 votes · 1 answer · 4k views
ValueError: Found unknown categories ['IR', 'HN', 'MT', 'PH', 'NZ', 'CZ', 'MD'] in column 3 during transform
I am trying to use Linear Regression, to predict salary in USD. I have the following data:
Data:
607 records
Numerical columns: year, salary, salary in USD
Categorical columns: experience, type, ...
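That error is raised by scikit-learn's OneHotEncoder when the data passed to transform contains categories not seen during fit (here they look like country codes present only in the test split) and the default handle_unknown='error' is in effect. A sketch with a hypothetical column: handle_unknown='ignore' encodes unseen categories as an all-zero row instead of raising, so for those rows a linear model falls back on the intercept and the other features.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical location column: the test split contains "NZ", which training never saw
train = np.array([["US"], ["GB"], ["US"], ["DE"]])
test = np.array([["GB"], ["NZ"]])

enc = OneHotEncoder(handle_unknown="ignore")
enc.fit(train)

encoded = enc.transform(test).toarray()   # transform returns a sparse matrix by default
print(enc.categories_)   # categories learned from the training data only
print(encoded)           # the "NZ" row is all zeros
```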
0 votes · 1 answer · 44 views
What does the order of the lm summary coefficients signify?
I have
fit.all <-lm(Sepal.Length ~ .,iris)
summary(fit.all)->fit.all.summary
print(fit.all.summary$coefficients)
What are the coefficients ordered by?
0 votes · 2 answers · 66 views
Tensorflow - do I need to learn computer vision before linear (timeseries) regression?
I'm a newbie to tensorflow / keras and I am currently working my way through Deep Learning with Python (2nd edition) by François Chollet.
I understand the basics of Computer vision and the MNIST ...
0 votes · 0 answers · 38 views
Why is this an incorrect update of the parameters in the gradient descent algorithm? (Bishop, Pattern Recognition and Machine Learning)
Let's say we are performing a linear regression, with general model $y(x,w) = w_0 + w_1x$. The error function is $E(w) = \frac{1}{2N}\sum_n (y(x_n,w)-t_n)^2$, for $N$ datapoints $\{(x_n,t_n)\}$ (...
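For reference, differentiating the error function as stated gives the two partial derivatives below; in a standard batch gradient descent step, both are evaluated at the current weights $w^{(\tau)}$ before either weight is changed (the step size $\eta$ is notation introduced here):

$$\frac{\partial E}{\partial w_0} = \frac{1}{N}\sum_{n}\left(y(x_n,w) - t_n\right), \qquad \frac{\partial E}{\partial w_1} = \frac{1}{N}\sum_{n}\left(y(x_n,w) - t_n\right)x_n$$

$$w_0^{(\tau+1)} = w_0^{(\tau)} - \eta\,\frac{\partial E}{\partial w_0}\bigg|_{w^{(\tau)}}, \qquad w_1^{(\tau+1)} = w_1^{(\tau)} - \eta\,\frac{\partial E}{\partial w_1}\bigg|_{w^{(\tau)}}$$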
0 votes · 1 answer · 788 views
Feature scaling in Linear Regression
I always use the LinearRegression() class in the sklearn library to create a linear regression model. According to my understanding, we need feature scaling in linear ...
3 votes · 1 answer · 1k views
Dummy Variable trap in Linear Regression
The dummy variable trap is a common problem in linear regression with categorical variables: one-hot encoding introduces redundancy, so if we have m categories in our categorical ...
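The redundancy shows up directly in the rank of the design matrix. A sketch using a hypothetical three-level column and pandas.get_dummies: with an intercept plus all m dummy columns, the dummies sum to the intercept column, so the matrix loses one rank; dropping one level (drop_first=True) removes the collinearity.

```python
import numpy as np
import pandas as pd

# Hypothetical categorical column with m = 3 levels
colour = pd.Series(["red", "green", "blue", "green", "red", "blue"])

full = pd.get_dummies(colour, dtype=float)                       # all m dummy columns
reduced = pd.get_dummies(colour, drop_first=True, dtype=float)   # m - 1 columns ("blue" dropped)

intercept = np.ones((len(colour), 1))
X_full = np.hstack([intercept, full.to_numpy()])
X_reduced = np.hstack([intercept, reduced.to_numpy()])

# The full dummies sum to the intercept column, so X_full has 4 columns but rank 3
print(X_full.shape[1], np.linalg.matrix_rank(X_full))        # 4 3
print(X_reduced.shape[1], np.linalg.matrix_rank(X_reduced))  # 3 3
```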
-1 votes · 1 answer · 61 views
How do the intercept and slope calculated in linear regression relate to the output of lm?
I have been looking at how to calculate coefficients by hand
and the example produces
$Y = 1,383.471380 + 10.62219546 * X$
However the output shown of lm does not show these values anywhere.
How do I ...
2 votes · 1 answer · 3k views
Predict actual result after model trained with MinMaxScaler LinearRegression
I was doing the modeling on the House Pricing dataset. My goal is to get the MSE and to make predictions from the input variable.
I have done the modeling, with the data scaled ...
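A sketch, with made-up data standing in for the house-pricing features, of getting predictions back in the original units when the target was scaled with MinMaxScaler: keep the fitted target scaler and call its inverse_transform on the model's output. The scaler expects 2-D input, hence the reshape(-1, 1). For plain linear regression, min-max scaling is only an affine change of units, so the result matches a model fitted on unscaled data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import MinMaxScaler

# Hypothetical features and prices
rng = np.random.default_rng(42)
X = rng.uniform(50, 300, size=(40, 2))
y = 1000 * X[:, 0] + 200 * X[:, 1] + rng.normal(0, 5000, size=40)

x_scaler = MinMaxScaler().fit(X)
y_scaler = MinMaxScaler().fit(y.reshape(-1, 1))   # scalers need 2-D input

y_scaled = y_scaler.transform(y.reshape(-1, 1)).ravel()
model = LinearRegression().fit(x_scaler.transform(X), y_scaled)

X_new = np.array([[120.0, 80.0]])                  # hypothetical new house
pred_scaled = model.predict(x_scaler.transform(X_new))
pred_price = y_scaler.inverse_transform(pred_scaled.reshape(-1, 1)).ravel()

# Cross-check against a model fitted on the unscaled data
raw_pred = LinearRegression().fit(X, y).predict(X_new)
print(pred_price, raw_pred)
```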