I am running a pooled OLS regression as a benchmark model on a panel data set of online forum member activity. The aim of the model is to understand the relationship between exposure to hate speech and its adoption.
The pooled model therefore includes exposure to hate speech in the previous month as the independent variable of interest, alongside controls for the number of months spent on the forum and the total number of forum posts. The results of the regression can be seen below:
Analysis of the residuals from the pooled model suggests that they are not normally distributed, as shown by a heavy-tailed Q-Q plot. There is also evidence of heteroskedasticity and autocorrelation.
To improve upon the pooled OLS model, I plan to implement a fixed effects model that controls for time and entity (forum member) fixed effects via de-meaning. This is done using PanelOLS from the linearmodels package in Python, with robust and clustered standard errors.
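
To make the de-meaning step concrete, here is a minimal pure-Python sketch of the two-way within transformation on a toy balanced panel. It is not my actual data or code: the member labels, month labels, effect sizes and exposure values are all made up for illustration, and the function names (two_way_demean, within_slope) are hypothetical. It also assumes a balanced panel with a single regressor; as far as I understand, PanelOLS with entity_effects=True and time_effects=True absorbs the same effects and additionally handles unbalanced panels and multiple regressors.

```python
from collections import defaultdict


def two_way_demean(panel, key):
    """Subtract entity and period means, then add back the grand mean.

    `panel` maps (entity, period) -> dict of variables.
    This simple formula is only exact for a balanced panel.
    """
    by_entity = defaultdict(list)
    by_period = defaultdict(list)
    for (entity, period), row in panel.items():
        by_entity[entity].append(row[key])
        by_period[period].append(row[key])

    entity_mean = {e: sum(v) / len(v) for e, v in by_entity.items()}
    period_mean = {t: sum(v) / len(v) for t, v in by_period.items()}
    grand_mean = sum(row[key] for row in panel.values()) / len(panel)

    return {
        (e, t): row[key] - entity_mean[e] - period_mean[t] + grand_mean
        for (e, t), row in panel.items()
    }


def within_slope(panel, y_key, x_key):
    """OLS slope of de-meaned y on de-meaned x (single regressor, no intercept)."""
    y = two_way_demean(panel, y_key)
    x = two_way_demean(panel, x_key)
    sxy = sum(x[k] * y[k] for k in panel)
    sxx = sum(x[k] ** 2 for k in panel)
    return sxy / sxx


# Toy data (all values invented): 3 members observed over 3 months.
TRUE_BETA = 0.5
member_effect = {"a": 1.0, "b": -2.0, "c": 4.0}   # time-invariant member effects
month_effect = {1: 0.0, 2: 0.3, 3: -0.7}          # member-invariant month effects
lagged_exposure = {
    ("a", 1): 0, ("a", 2): 3, ("a", 3): 1,
    ("b", 1): 2, ("b", 2): 2, ("b", 3): 5,
    ("c", 1): 4, ("c", 2): 0, ("c", 3): 1,
}

# Outcome built without noise so the within estimator should recover TRUE_BETA.
panel = {
    (m, t): {
        "exposure": x,
        "adoption": member_effect[m] + month_effect[t] + TRUE_BETA * x,
    }
    for (m, t), x in lagged_exposure.items()
}

beta_hat = within_slope(panel, "adoption", "exposure")
print(round(beta_hat, 6))
```

Because the toy outcome has no noise, the member and month effects drop out entirely and the slope on exposure is recovered. The sketch only covers the point estimate; it does nothing about heteroskedasticity or autocorrelation, which is where the clustered standard errors come in.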
