raging alert

Machine learner overfit

Machine learning texts are quite dense. They pack as much information as possible into as little text as possible. Quasi a min-max problem: minimizing the amount of text while trying to maximize information. Clearly spelled-out thoughts are reduced to a minimum. But sometimes our brains need exactly that: a passage of text that gently introduces you to an area, gives your mental voice time to speak up, and allows you to actually get into that area. Having such a voice “reading” the text for you can significantly reduce the cognitive load of perceiving text and simultaneously allows the brain to use the remaining capacity for thinking about the topic itself. Dense text, by contrast, forces you to put effort into understanding what a person wants to say with a particular sentence, leaving you in turn with less capacity for the topic itself.

However, I do not want to say that it is totally wrong. People in specific areas always develop their own subculture, including a specific way of communicating. Once you get used to it, it can become a neat and concise way of communicating. But getting into it can sometimes be quite hard.

The topic of machine learning has become enormously popular within the last decades. As a consequence, more and more people started working in this field on quite similar topics. Ergo, the number of publications keeps rising while the time between them shrinks, since the pressure grows with the higher amount of “competition”. This might be one reason, though not the only one, why this field developed such an extreme and narrow writing culture. Texts have to be dense.

The publishing process, shaped by this fast growth, in turn influences the structure of a paper. A mental model of what a paper should look like: it contains a certain amount of mathematical symbols, paragraphs that are not too long, and n sections (with 6 ≤ n ≤ 8).

Don’t get me wrong: a certain structure should be given in all disciplines. But as papers become increasingly dense, they converge towards an identical way of writing. Putting it into machine learning terms: we minimize the variety v of texts, modeled as an approximately normal distribution φ with its maximum density at the center c, where the standard deviation σ reflects the variety v of the texts. In other words, shrinking the radius of this enclosure gives us samples that converge towards the center c, leaving us with rather homogeneous texts. This, however, seems astonishingly similar to the hypersphere collapse problem in one-class classification.
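To make the analogy a bit more tangible, here is a minimal Python sketch under loose assumptions: each paper is treated as a hypothetical point in a two-dimensional “style space”, drawn from a normal distribution around a center c with standard deviation σ. All function names (sample_texts, mean_distance, svdd_loss, collapsed) are made up for illustration, and svdd_loss is only a simplified stand-in for an SVDD-style one-class objective (mean squared distance to c), not any particular library’s implementation.

```python
import math
import random
import statistics


def sample_texts(center, sigma, n, rng):
    """Draw n hypothetical 'texts' as 2-D points from N(center, sigma^2 * I)."""
    cx, cy = center
    return [(rng.gauss(cx, sigma), rng.gauss(cy, sigma)) for _ in range(n)]


def mean_distance(points, center):
    """Average Euclidean distance of the points to the center."""
    cx, cy = center
    return statistics.mean(math.hypot(x - cx, y - cy) for x, y in points)


def svdd_loss(points, center):
    """Simplified SVDD-style objective: mean squared distance to the center."""
    cx, cy = center
    return statistics.mean((x - cx) ** 2 + (y - cy) ** 2 for x, y in points)


def collapsed(points, center):
    """Trivial 'collapsed' mapping: every point is sent onto the center."""
    return [center for _ in points]


rng = random.Random(0)
c = (0.0, 0.0)

# Shrinking sigma (less variety) pulls the sampled texts towards c.
for sigma in (1.0, 0.5, 0.1, 0.01):
    texts = sample_texts(c, sigma, 1000, rng)
    print(f"sigma={sigma:<5} mean distance to c={mean_distance(texts, c):.4f}")

# Mapping everything onto c drives the objective to its minimum.
texts = sample_texts(c, 1.0, 1000, rng)
print("loss before collapse:", svdd_loss(texts, c))
print("loss after collapse: ", svdd_loss(collapsed(texts, c), c))  # prints 0.0
```

As σ shrinks, the mean distance to c shrinks proportionally, and the objective reaches its minimum of zero once every point sits on c. That trivial, uninformative solution is what the one-class literature calls hypersphere collapse.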

A good example is omitting the limitations section. This section normally contains freely formulated text and thoughts, and lays out where the presented work might have weaknesses and may not work perfectly. The need to publish in a fast-growing area consequently leads to hiding flaws. Combined with the decreasing variety of texts, this results in limitation sections shrinking across the field.


Jennifer Matthiesen



A Blog about Neural Networks and Their Outputs
