Questions tagged [categorical-data]
Categorical (also called nominal) data can take on a limited number of possible values called categories. Categorical values "label"; they do not "measure". Please use the [ordinal-data] tag for discrete but ordered data types.
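A minimal Python sketch of the "label, not measure" idea, assuming the common approach of dummy (indicator) coding for regression. Each non-reference category becomes its own 0/1 column rather than an arbitrary numeric code that would imply an order or spacing. The helper `dummy_encode` and the soil-type values are hypothetical, for illustration only.

```python
def dummy_encode(values, reference=None):
    """Hypothetical helper: turn nominal labels into 0/1 indicator columns.

    One column is created per level except the reference level, which is
    represented by all zeros (treatment/dummy coding).
    """
    levels = sorted(set(values))
    if reference is None:
        reference = levels[0]  # default: first level alphabetically
    kept = [lvl for lvl in levels if lvl != reference]
    rows = [[1 if v == lvl else 0 for lvl in kept] for v in values]
    return kept, rows


# Hypothetical nominal variable: the labels carry no order or distance.
soil = ["clay", "loam", "sand", "clay", "silt"]
columns, design = dummy_encode(soil)
print(columns)  # ['loam', 'sand', 'silt']
for label, row in zip(soil, design):
    print(label, row)  # "clay" (reference) prints as [0, 0, 0]
```

For ordered categories, an ordinal model or ordinal coding may be more appropriate; see the [ordinal-data] tag.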
3,625 questions
0 votes, 0 answers, 36 views
Regression model for a survey [closed]
In our questionnaire the answers are in categorical format, so we used dummy variables for the regression part. However, we are unsure which of the following two approaches to use:
(i) For models ...
2 votes, 1 answer, 115 views
How do I estimate the linear effect for a factor so that my estimate doesn't depend on the sample size?
I’m trying to use the R poly() function with degree 1 to force glm to interpret a factor linearly. I’m puzzled by the fact that the size of the sample seems to increase the coefficient of the ...
1 vote, 0 answers, 24 views
Is there a way to perform a correspondence analysis with ordered variables?
I am trying to perform a correspondence analysis on a dataset of anatomical measurements of ecologically relevant features. Most of these variables are ordered factor variables representing binning of ...
1 vote, 0 answers, 13 views
Log-linear models and multiple comparisons: exploring multiple categorical and binary variables
I'm trying to understand how three categorical variables affect several binary variables. I am roughly following these instructions. Here is what my data look like (not my real data):
Binary answers ...
2 votes, 1 answer, 314 views
Why are ordinal variable levels not kept in order in glm?
I've been following the method illustrated here: Polynomial contrasts for regression, to transform the .L, .Q, .C, etc. coefficients of a glm regression on an ordered factor into values for each of the levels ...
0 votes, 0 answers, 102 views
Linearity assumption with categorical mean-encoded variables
I'm struggling to understand the linearity assumption when running OLS with a continuous dependent variable and categorical independent variables that have been mean-encoded (simple group mean per category).
...
0 votes, 0 answers, 70 views
Confusion about strange results of tests of difference (classical chi-square test and Bayesian chi-square test)
I am new to conducting tests of difference (chi-square tests). When I build a contingency table for a chi-square test (classical and Bayesian), I observe some phenomena that would be ...
2 votes, 1 answer, 59 views
Analyzing Differences in Dependent Categorical Variable Given Two Different Subject Types
I am trying to analyze some survey data in R but I am a bit confused about how to run the right type of analysis. In the survey of college students, the participants were put in a hypothetical ...
0 votes, 0 answers, 65 views
Which technique should I use to test the independence of categorical variables over repeated samples?
I have individual level data with a performance measure (good/bad) and characteristic variables for the individual (e.g. gender). I usually analyse this using a chi-squared test to see if the ...
4 votes, 1 answer, 151 views
How to generate random categorical data when number of categories is very large?
Problem in brief
I would like to generate several samples of iid categorical data. The standard approach does not work because the potential number of categories is large, and I do not want to impose ...
0 votes, 0 answers, 68 views
Meaning of "residuals" in calculating correlations from Spearman (1904)
The free statistics package JASP has a data library that illustrates various tests and operations. One of them is Factor Analysis. They use the data from Spearman's 1904 "General ...
0 votes, 0 answers, 47 views
Level-wise effect sizes of a categorical variable in a GLM
I am running a GLM (Gaussian Family; Identity link) on some medical data. I intend to find out if the level of disease severity has any effect on task performance. A minimum reproducible example (...
6 votes, 2 answers, 163 views
Are there rules of thumb for the sample size required when using a categorical predictor in linear regression?
I’ve had a reviewer suggest that I use ethnicity as a covariate in a linear regression. Some ethnic groups in the sample are small enough that I am a little worried that I will overfit if I do this.
...
2 votes, 1 answer, 124 views
GLM with 2 variables with factors, where neither has a "baseline"
I am trying to do a GLM with a dataset. My dataset consists of days individuals go on a social outing, and whether the outing was "better than average" (subjective). I have recorded the ...
9 votes, 2 answers, 291 views
Treating two columns in R with shared factors with the same coefficients
I am attempting to analyze a dataset using a GLM. In this dataset I have two columns containing codes about individuals, and I am trying to infer whether an individual passes.
For example:
...
4 votes, 2 answers, 260 views
Should I dummy code my categorical variable in SEM model?
I am working on a path analysis using lavaan(). One of my endogenous variables is an ordered factor; however, the difference between each group is not ...
1 vote, 0 answers, 32 views
When dealing with correlated slopes and intercept, does it make sense to include only certain levels of the random slope variable (by subject)?
I am fitting a mixed-effects model in which some levels of the categorical variable are correlated with the intercept for the following formula, resulting in a singular fit:
...
0 votes, 1 answer, 135 views
Using bar chart vs histogram for dates [closed]
My general rule of thumb is that histograms should be used for continuous data, and bar charts for categorical data. (Obviously, this is not my own rule.)
What about dates? They are non-continuous (unlike, say, ...
1 vote, 0 answers, 62 views
Interpreting the intercept and coefficients of fixed factors in a mixed-effects logistic model with weighted effect coding
This is the first time I have used a mixed-effects logit model with effect coding, and I am very confused. I have been trying to understand this for a few weeks, and would be deeply grateful for your ...
2 votes, 1 answer, 173 views
Custom contrasts involving interaction terms using emmeans
I'm trying to set up a custom contrast using emmeans but am a bit unsure how to do so properly.
I have two factors, let's call them A and B, with three levels each.
I want to test the following ...
0 votes, 0 answers, 35 views
Panel data analysis question
I just have a quick question. I am trying to make a panel analysis, comparing different EU member-states over multiple years. My dependent variable is 'trust in EU institutions', and my independent ...
4 votes, 1 answer, 242 views
Using different coding schemes (dummy vs. effect coding) for different predictors in the same logistic regression model
In my study, students are divided into three groups and each group read one text. After that, all the three groups completed the same reading comprehension test. There are 15 test items, consisting of ...
3 votes, 3 answers, 369 views
Correct method for Chi-square testing for yes/no data
I have the following data:
I am trying to analyze it by applying a chi-square test in Excel with the CHITEST(Data B, Data E) function:
I also tried using only the ...
0 votes, 0 answers, 34 views
Can I use the Breslow-Day test in a cross-sectional study?
I'm doing comparative research on how the relation between $X$ (independent variables) and $Y$ (dependent variables) differs between urban and rural groups. The design is cross-sectional. Here are several ...
0 votes, 0 answers, 82 views
How to compare 5 groups on categorical DVs with covariates in SPSS?
I’m new to statistics and working in SPSS. I have a 5-level categorical independent variable and several categorical dependent variables, some binary (yes/no), some with more than two levels. I also ...
0 votes, 0 answers, 128 views
Post-hoc test in R for lm() with significant interaction between numeric and multiple categorical variables
I am interested in how the relationship between two traits (trait1 and trait2) varies between groups (A and B) and treatments (C and T). Specifically, I want to know whether the relationship between ...
4 votes, 1 answer, 199 views
Structural Equation Modeling with categorical variables (nominal/ordinal)
How can an SEM model be fitted when the dataset includes both continuous and categorical variables?
5 votes, 2 answers, 176 views
What is the random variable associated with complete random assignment of a finite sample to $K$ treatments?
I am struggling with how to define a random variable that represents complete random treatment assignment in an experiment with $K$ levels of treatment.
To define some terms:
Simple random ...
0 votes, 0 answers, 56 views
Comparing the percent cover distribution with categorical variables in R -- will a KS test work?
I am looking to determine if the percent cover distribution across canopy/growth form classes differs significantly between ForestType1 and ForestType2. In each forest type, I have about 10 canopy/...
1 vote, 1 answer, 321 views
Cluster analysis with Gower distance
I have a dataset that includes both numeric and categorical variables, and I want to perform cluster analysis, so I chose the Gower distance as the distance metric. Next, I perform agglomerative ...
0 votes, 0 answers, 95 views
ANCOVA - help with categorical variable
As part of my research I am running ANCOVAs, using the baseline score of the measure of interest as the covariate. 2 of my outcome measures are subscales made up of 2 items and after testing for ...
2 votes, 1 answer, 153 views
Whether to use GLM or GAM on Negative Binomial data with categorical and numeric predictor variables
This is my first question here, so I apologize if I miss any relevant information.
I'm working on some ecological count data of different vegetation classifications (Oaks, pines, grasses, forbs, etc...) ...
0 votes, 0 answers, 46 views
Pairwise comparisons when categorical variables have zeros in some of their levels
I have a species occurrence dataset (community matrix) where I analyze beetle preferences for tree species and treatment using a GLMM + Tukey HSD. My issue is that species absent in some tree species (...
0 votes, 0 answers, 73 views
Collinearity problem for categorical variables and ordinal regression model
I am struggling with collinearity. I have a dataset of 10,000 observations, and all the independent variables are factor variables, such as age group, household size group, ...
0 votes, 0 answers, 53 views
Cluster number and size in modelling a categorical variable by GEE
I had to learn about statistical models to approach a genetics project that I inherited: we obtained genotypes for hundreds of biallelic SNPs (possible values for each SNP: 0 = non-carrier, 1 = ...
6 votes, 0 answers, 320 views
Reconstructing count table when only pairwise features are visible
Assume we can only observe two-way tables counting the number of observations of each pair of categorical features $x_i,x_j$.
$$
\begin{array}{c|ccc}
& & x_j & \\
\hline
...
0 votes, 0 answers, 75 views
How to draw a random sample with uniform marginals?
I have a population with $k$ categorical variables. I know the distribution of these categories. I would like to randomly choose a sample from my population so that the marginals are uniform. I don'...
1 vote, 1 answer, 92 views
Why do the comparison letters not differ when the factor was found significant?
I ran a series of models in R with two factors using the "lmer" function, also testing for the interaction between the factors.
For example:
...
1 vote, 1 answer, 112 views
Comparing differences in preference with 3 values including neutral
Scenario: Analyzing preference data with 3 values (For example: Which do you prefer: Football, Baseball, or no preference (i.e., ...
1 vote, 0 answers, 31 views
Which statistical test should I use for a 10-level categorical variable, 2 participants, and 630 repeated measures?
I have data on 2 players playing a card game 21 times. Each game consists of 30 turns. Data has been collected at the end of each turn, specifically, how long the turn took and what type of action was ...
0 votes, 0 answers, 37 views
Assessing the performance of a model with nominal categorical outcomes by comparing to experimental data
I'm modeling a phenomenon which has 10 nominal categorical outcomes. The relative probabilities of these categories are affected by a handful of variables (that I know of and can record).
Question 1
...
5 votes, 1 answer, 124 views
Ordinal regression with different associations between predictor and outcome
I am new to working with regression analyses and I am planning to calculate an ordinal regression model.
I have an ordinal dependent variable, with 5 possible outcomes. I have several predictors, some ...
0 votes, 0 answers, 31 views
Cross validation: Multilevel model with the research question, what are the factors that shape practice?
I was advised on this platform to use multilevel modeling for my data analysis. The model suits the data structure and my study's theoretical framework, so I find it appropriate.
Here is the model ...
0 votes, 0 answers, 70 views
Quantifying categorical association between two binary variables in the presence of group structure
I have data on 4 roughly but not perfectly balanced groups, about 60 subjects per group. For each subject, I observe each of 10 binary variables with overall prevalences between 10% and 90%, no ...
0 votes, 2 answers, 73 views
Why is individual significance of interaction term dummies dependent on the base/omitted category, but joint significance is not?
I don't think I can show the data, but in a linear regression model I have (in addition to a couple of other variables) an interaction term between the continuous variable age and the categorical variable health. ...
3 votes, 2 answers, 223 views
Data preparation for binary logistic with households as a unit of analysis
I am trying to run a binary logistic regression on different factors to establish which factors shape the phenomenon under study. The data cover people above a specified age. This means the number of people in ...
2 votes, 2 answers, 165 views
Correlation between interval and nominal variables
I have a numeric variable with village sizes (in hectares) and a categorical variable with four soil types. I would like to investigate if soil type is associated with village size using R. I have ...
3 votes, 1 answer, 207 views
MANOVA with some dependent variables being ordinal and some being interval
A MANOVA test seems to be a good fit for the following study, except that the dependent variables are not interval-scaled as required. I want to examine how heritage and non-heritage learners differ on two ...
1 vote, 0 answers, 52 views
Difference between dichotomous and ordinal variables for imputation purposes
I'm reading the documentation of the Amelia R package.
The Ordinal section of the documentation states that ordinal variables include dichotomous variables, and one example is gender, ...
3 votes, 1 answer, 130 views
Data categorization
I have categorized my education dataset for the analysis below. However, one respondent attended a missionary school whose level I do not know, and I am unsure where to ...