
Questions tagged [dataset]

Requests for datasets are off-topic on this site. Use this tag for questions concerning creating, processing, or maintaining datasets.

2 votes
1 answer
75 views

My current issue lies within EMR extracted data for medications. There are multiple variables named: Medication_1, Medication_2, Medication_3, etc... This data may overlap and analyzing each column ...
Abdallah Al-Ani's user avatar
2 votes
0 answers
59 views

This question is specific to ordinal data collected on a Likert scale. What is the best metric for discarding annotators with low inter-annotator agreement (IAA) with others, e.g., Cohen’s Kappa, ...
user2160809's user avatar
0 votes
0 answers
44 views

I want to run a regression between two financial indexes, but their observation dates don't correspond perfectly (for example, one has observations for 17/6, 18/6, 19/6 but ...
ConfusedConsultant's user avatar
1 vote
1 answer
101 views

I was wondering how a Poisson regression would work given my dataset, which describes a series of zip codes stratified by age groups, gender and death counts. The regression would use death counts as ...
Seyong Chang's user avatar
0 votes
0 answers
66 views

I have a data file from a Monte Carlo simulation of fifteen protein chains. The file contains 10 million r_end_to_end 3D vectors as rows and 3 x 15 = 45 columns. My ...
user366312's user avatar
  • 2,077
1 vote
0 answers
87 views

I have my historical sales data and I want to check for the trend (increase, decrease or no change). When I plot my annual line graph, the slope of my linear equation is positive (indicating increase) ...
monique's user avatar
  • 31
1 vote
0 answers
30 views

I'm working on a bigger school project, trying to classify time series measurements with Minirocket/Rocket. My training data consists of a 1D matrix containing the measurements, and a separate 1D matrix ...
Michael's user avatar
  • 11
1 vote
0 answers
198 views

When we have cross-sectional data, we can easily detect and remove outliers. But how should one approach outliers when we are dealing with panel data? Since we have $i$ entities and $t$ time periods, ...
TFT's user avatar
  • 345
2 votes
0 answers
65 views

I am working with time series and want to test different forecasting methods, but first I need to test whether my time series (sales) data is stationary or not. So I have been learning about KPSS and Dickey-...
monique's user avatar
  • 31
0 votes
0 answers
11 views

I have an imbalanced dataset with multiple classes, where some have fewer than 100 and some more than 10k, and I want to apply a random forest (the dataset is confidential, so I can't share it). I used all ...
Deepak kumar's user avatar
1 vote
1 answer
77 views

My data looks similar to this (the picture below is not mine, but describes my situation perfectly), where the IDs are not unique but for each ID value I have a unique target value. The following ...
Moez Daly's user avatar
1 vote
0 answers
184 views

I am fitting two different GPs with derivative observations (one with 9-dimensional input and one with 12-dimensional input); however, for some reason I am getting much worse results for the 12-dimensional ...
m-julian's user avatar
1 vote
0 answers
82 views

The Problem Hello everyone. I'm working with a dataset that has 15300 samples with 49 features each, equally distributed amongst three classes. I used TSNE to reduce the dimensions of the feature ...
Amyr14's user avatar
  • 11
3 votes
1 answer
92 views

I would like to generate a synthetic dataset where there are multiple records per ID, and self-consistency is maintained among records of each ID. For example, imagine a dataset where the ID is a ...
user12138762's user avatar
0 votes
0 answers
46 views

I'm trying to find a correlation between the confirmed cases and death rates against HUMIDEX values. As you can see, the data is very scattered, so I understand that polynomial and ...
Carlos Leonel Guerrero Rodrigu's user avatar
0 votes
1 answer
93 views

I have to create a hypothetical study focusing on the relationship between sBCMA (soluble B-cell maturation antigen in blood) and the expression of BCMA on bone marrow cells in patients with multiple ...
youknow 321's user avatar
2 votes
0 answers
74 views

I'm studying the article "Estimating the number of clusters in a data set via the gap statistic" by R. Tibshirani, G. Walther and T. Hastie: https://academic.oup.com/jrsssb/article/63/2/411/...
user2702's user avatar
0 votes
0 answers
43 views

I have a data visualization, showing the sentiment of two lemmas "гей" (var a) and "трансгендер" (var b) in a news corpus throughout the year. Here is the dataframe sample of my ...
pindakazen's user avatar
1 vote
0 answers
34 views

In paired-trial validation, statistical (ML) models are trained on $n$ datasets separately and then applied to other datasets, as a way of estimating the generalization of the models obtained. ...
Roger V.'s user avatar
  • 5,091
4 votes
2 answers
202 views

I am attempting to perform an autocorrelation study using Python on a discontinuous time series dataset. To share a bit about what my data looks like, it is a single column of values, which spans over ...
Sam's user avatar
  • 83
0 votes
1 answer
87 views

I am building an audio classification system using a CNN. My dataset consists of different audio clips I have recorded and spliced to equal time lengths. As with any other common ML or DL task, I am to ...
Flash's user avatar
  • 1
0 votes
1 answer
129 views

I am trying to calculate the reliability of a difference score. Specifically, the data have, for each participant, scores for 10 items in Condition X (1s and 0s), as well as 10 different items in ...
Altair555's user avatar
0 votes
0 answers
80 views

I have a 2D data array indicating a chemical percentage content (PC) in a chemical droplet. I am trying to calculate the average PC in the droplet. The image of one of these arrays is shown below (the ...
user7077252's user avatar
3 votes
1 answer
495 views

I have a dataset that I want to perform a regression on. However, some of the columns are not in numerical form. For example, the extra classes column. What I ...
Charlotte's user avatar
1 vote
1 answer
177 views

I've noticed that in some scientific fields, authors of data analyses split an entire dataset into subsets based on a particular property. One classic example ...
Syuma's user avatar
  • 115
0 votes
1 answer
67 views

I would appreciate your help with a question I have. I'm creating a Difference-in-Difference study to examine how a conditional cash transfer to individuals 18 years of age to be spent in sport ...
Retir's user avatar
  • 1
1 vote
0 answers
140 views

There is a sizeable body of literature on the issue of multiple maximizers in maximum likelihood estimation, such as https://projecteuclid.org/journals/statistical-science/volume-15/issue-4/...
Tom Solberg's user avatar
3 votes
0 answers
223 views

Recreating data variance from the posterior distribution Take a set of data points $(x, y)$ with (Gaussian) uncertainties $\sigma_y$ on the $y$ coordinate; they are modeled as $y \sim f(x; \alpha) + \...
Jacopo Tissino's user avatar
0 votes
0 answers
53 views

[In case you feel inclined to close this question because I'm asking for a dataset - I'm looking for solutions in the spirit of point 2 (on-topic) in the accepted answer to this question about asking ...
Scriddie's user avatar
  • 2,673
0 votes
0 answers
42 views

Because I'm that guy, I wanted to run some statistical analysis on the results of a number of experiments; specifically, I'm wanting to track my progress on different runs of the turn-based strategy ...
John Doe's user avatar
1 vote
0 answers
122 views

I want to design a questionnaire and examine a new construct (variable) in my research with a five-point scale from 1 to 5. How can I test whether the questionnaire satisfies the requirements of ...
Dr. Subhash Chander's user avatar
1 vote
0 answers
37 views

I have the following problem to analyze: I divided an area into several sectors (i.e., S1, S2, S3, …, Sn) and there is an event that can happen in one or more sectors at the same time. I considered a ...
Rodrigo's user avatar
  • 111
0 votes
1 answer
78 views

I have a 2D matrix TD of training data that is a collection of N non-linear signals that are functions of time (hence the ...
Jonathan Frutschy's user avatar
1 vote
0 answers
145 views

I want to plot a graph where I have multiple data points per category of data. For some context, I have done some analysis on different samples and now have up to 3 data points for each sample ...
Charllotte's user avatar
2 votes
1 answer
114 views

I'm conducting research in which patients went through a surgery; for some the surgery was successful (outcome = 1) and for some it wasn't (outcome = 0). The risk factors were calculated using a Cox ...
AREEEL's user avatar
  • 21
1 vote
1 answer
121 views

I have data from a survey which asked people 1) how often they used a particular tool (daily, weekly, monthly, annually, etc.) and 2) how many hours they usually spent using it (0 - 4 hrs, 5 - 9, 10 - ...
Arctic's user avatar
  • 81
2 votes
1 answer
114 views

I am analyzing clinical data and complex microbiome data in a longitudinal study. I already compared different groups at baseline and between baseline and "events" using linear mixed models (...
BHO_1990's user avatar
6 votes
1 answer
429 views

I have questions about the geometric structure of data sets, esp. as it relates to the relationships between predictors. Is there a name for this field?
Chris Science's user avatar
2 votes
1 answer
107 views

I have a problem with some health data that I'm trying to analyze. The main issue originates from a census variable that is derived from self-reported times. The variable is sleep duration, which is ...
Ender_The_Xenocide's user avatar
0 votes
0 answers
65 views

I'm trying to do a multiple linear regression analysis in Excel using the Analysis ToolPak and I am not good at math, let alone stats. So please excuse my total ignorance. I'm using the following ...
MissyM's user avatar
  • 1
1 vote
0 answers
50 views

I am a beginner to data science. I found this dataset that covers natural disaster incidents in Afghanistan from 2016 - present. Here are the 13 columns: REGION (South West, North, etc), PROV_CODE (...
Mas's user avatar
  • 11
3 votes
1 answer
179 views

Let a dataset $\mathcal{D}$ be sampled according to $F_{\mathcal{D}}$. My question is, suppose I create bootstrapped samples from $\mathcal{D}$. That is, create $\mathcal{D}_1, \ldots, \mathcal{D}_M$ ...
Your neighbor Todorovich's user avatar
2 votes
1 answer
182 views

So, I have survey responses from users. Just to make it clear, if you select an issue like Poor UI then you are prompted with 4-5 specific issues about the UI to select from. Poor UI is the main ...
doodle2611's user avatar
1 vote
1 answer
221 views

I am a bit confused about time series dataset preparation. On the internet, all the examples I saw that used tree-based models had input features and target defined as: ...
kg__'s user avatar
  • 63
0 votes
1 answer
106 views

In my study, 40K people completed household surveys. However, when we suggested visiting a nearby health center to measure their physical parameters (height, weight, blood pressure, and blood glucose), only ...
Dr bappa's user avatar
2 votes
1 answer
95 views

I'm taking the free Caltech machine learning course. I'm having trouble understanding the notation on one of the problems: In this problem, you will create your own target function f and data set D ...
Ben G's user avatar
  • 153
0 votes
1 answer
291 views

I'm working on a machine learning project to find particular key points in images. To do this, I'm using a U-Net-like architecture and treating it as a regression problem to produce a heat-map of ...
ocharles's user avatar
  • 103
1 vote
0 answers
63 views

I was just asked to familiarize myself with some methods looking at comparing AUROC for a few predictive scores to predict outcomes. Issue is that I have a dataset of about 200 with <5% with the ...
Mike K's user avatar
  • 11
0 votes
1 answer
107 views

Firstly, I'm completely new to data science (first project) and to StackExchange, so sorry if I'm asking a stupid question or not providing adequate information in my question. Please tell me if I could ...
Mathias Therkelsen's user avatar
0 votes
0 answers
89 views

I have a highly imbalanced dataset and I'd like to train a simple ANN classifier on it. My model currently is a simple 2-layer feed-forward neural network with ReLU activation in between. After a few ...
Green 绿色's user avatar
