0 votes
by, in 02 - L’enfer des données, des modèles et des représentations...
It was mentioned that the dataset can be small. How small is small? Do you have any further literature pointers on this? Thank you

1 Answer

0 votes
by Vétéran du GPU 🐋 (46.6k points)
selected by
Best answer
There is no single answer to this question, since it depends on your data and your objective. For MNIST, for instance, a few tens of thousands of examples work just fine; for a large language model you need hundreds of billions. You have to look at your data, and sometimes only an expert can assess the complexity of the problem and therefore the amount of data required.
by
That is a bit frustrating. I thought you could help me with further literature pointers on this; I have zero experience with it. An expert is not an expert from the start either.
by (2.2k points)
this Kaggle notebook explains why this is a problem: https://www.kaggle.com/code/rafjaa/dealing-with-very-small-datasets
by (5.6k points)
Usually it is hard to determine whether a dataset is too small; intuitively, we compare it to another dataset. If that other dataset is of similar complexity and can be learned by an AI model, we can say it is of reasonable size, and that our comparable new dataset isn't too small either. Another way to judge the smallness of a dataset, more costly in time, is simply to try to solve the task with an AI model and see whether the result is good enough (see the sketch below).
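To illustrate that second approach, a learning curve trained on growing subsets of the data shows whether performance has plateaued. This is only a minimal sketch, assuming scikit-learn is available; its digits dataset and the logistic-regression estimator are stand-ins for your own data and model.

import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

# Placeholder data; substitute your own X, y here.
X, y = load_digits(return_X_y=True)

# Train on 10%, 32.5%, ..., 100% of the training split, 5-fold CV each time.
train_sizes, train_scores, val_scores = learning_curve(
    LogisticRegression(max_iter=1000),
    X, y,
    train_sizes=np.linspace(0.1, 1.0, 5),
    cv=5,
)

for n, score in zip(train_sizes, val_scores.mean(axis=1)):
    print(f"{n:5d} training examples -> validation accuracy {score:.3f}")

# If accuracy is still climbing at the largest size, more data would likely
# help; if it has flattened out, the dataset is probably "big enough" for
# this model and task.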
by
Thank you very much. May I suggest adding further literature pointers to the slides.
by
This is not necessarily about dataset size but about the available features in your dataset. Take SVMs: you can well have a working SVM model with a dataset of 100-200 samples and only 2-3 features; if you raise the number of features to, say, 20, that will probably not work (see the sketch below). In ML everything is relative, and there is no black-and-white answer. One of the most important aspects of ML is actually understanding the data that will be used, not necessarily its size.
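To make that concrete, here is a hedged sketch assuming scikit-learn: the same SVM on ~150 synthetic samples, first with 3 features, then with 20 of which only 3 carry signal. The dataset and numbers are purely illustrative.

from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

for n_features in (3, 20):
    # Synthetic binary classification problem; only 3 features are
    # informative in both settings, the rest are pure noise.
    X, y = make_classification(
        n_samples=150,
        n_features=n_features,
        n_informative=3,
        n_redundant=0,
        random_state=0,
    )
    score = cross_val_score(SVC(), X, y, cv=5).mean()
    print(f"{n_features:2d} features -> CV accuracy {score:.3f}")

# With only 150 samples, the 17 extra noise features dilute the signal and
# the cross-validated accuracy typically drops: more features demand more
# data for the same model.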
...