Search Results
Results tagged with dataset
Search options: answers only, not deleted, user 12359
Requests for datasets are off-topic on this site. Use this tag for questions concerning creating, processing, or maintaining datasets.
5 votes
Data Sets suitable for k-means
In complement to JEquihua's great answer, I would like to add 2 points.
Case 3 is a nice example of a case where it would be useful to have a clustering algorithm that doesn't give only the cluster a …
2 votes
Looking for redacted text corpus
For medical data, a few datasets can be found at: Physician notes with annotated PHI
1) i2b2 2006 Deidentification and Smoking Challenge's data set:
NLP Data Set #1B: 889 de-identified discharge …
8 votes
What is exactly meant by a "data set"?
In the open data discipline, a dataset is the unit used to measure the information released in a public open data repository. The European Open Data portal aggregates more than half a million datasets. …
5 votes
Plotting data from several files on one plot
One way to do it is to use points:
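    x <- seq(0, 2*pi, length.out = 51)  # 51 evenly spaced points on [0, 2*pi]
    y1 <- sin(x)
    y2 <- cos(x)
    plot(x, y1)                  # draw the first series
    points(x, y2, col = "red")   # overlay the second series on the same axes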
If your data files share a common axis, you can use matplot:
a <- ma …
3 votes
A suitable corpus for training skip-thought vectors
Common Crawl corpus: consists of 145 TB of data from 1.81 billion webpages as of August 2015
http://www.lrec-conf.org/proceedings/lrec2018/pdf/889.pdf: see Table 1 for several summarization corpora, …
3 votes, accepted
Why does the CIFAR-10 tutorial on TensorFlow crop the images to be 24x24?
As a side note, the CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. This means that 24x24 cropping keeps most of the image. …
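To make "most of the image" concrete, here is a rough worked calculation, assuming a single 24x24 crop taken from a 32x32 image (the fraction is the same wherever the crop is placed):

$$\frac{24}{32} = 0.75, \qquad \frac{24 \times 24}{32 \times 32} = \frac{576}{1024} = 0.5625$$

Under that assumption, the crop keeps 75% of each side and about 56% of the pixels.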
17 votes
Training data is imbalanced - but should my validation set also be?
The point of the validation set is to select the epoch/iteration where the neural network is most likely to perform the best on the test set. Consequently, it is preferable that the distribution of cl …