ML algorithms for life
When planning, a good heuristic is to start visualising the desired outcome and recursively work out necessary steps: ‘I want to arrive at 8, so I need to leave at 7:30, so I need to finish breakfast at 7:10 – shoot, I need to get up at 6:30’. Here are further examples: ‘I want to meditate consistently, so I need to find an accountability mechanism, so…’ or ‘I want to reduce X-risk, so I need to find a job related to X-risk reduction, so…’
The basic algorithm is ‘I want X; to do X, I need to do Y; to do Y, I need to do Z…’ Backpropagation, no?
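The recursion above can be sketched in a few lines of code. Everything here – the goal names and the dependency map – is made up for illustration:

```python
# A hypothetical map from each goal to its prerequisite.
prerequisites = {
    "arrive at 8": "leave at 7:30",
    "leave at 7:30": "finish breakfast at 7:10",
    "finish breakfast at 7:10": "get up at 6:30",
}

def plan(goal):
    """Recursively work backwards from the desired outcome."""
    steps = [goal]
    while steps[-1] in prerequisites:
        steps.append(prerequisites[steps[-1]])
    return steps[::-1]  # earliest step first

print(plan("arrive at 8"))
# ['get up at 6:30', 'finish breakfast at 7:10', 'leave at 7:30', 'arrive at 8']
```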
Computer science lingo is helpful for decision-making in real life, as highlighted in *Algorithms to Live By*. In this post, let’s extend their ‘computer science $\to$ life situations’ mapping to the domain of ML.
Kinds of learning #
Supervised, unsupervised and reinforcement learning have close analogues in real life. Supervised learning is like acing tests, where the test result is your labelled data; unsupervised learning is like understanding deeper, non-examinable patterns. Reinforcement learning, where an autonomous agent learns optimal decisions through trial and error, is a good metaphor for early adulthood.
Similarly, it’s useful to bear in mind the distinction between training, validation and test sets. Ultimately, you want to minimise something like the generalisation error: the expected loss over data points drawn from a given distribution. This is different from your estimate of the generalisation error, the average loss over the data points you happened to sample.
$$\mathbb E_{(X, Y) \sim \mathcal P}[\ell(f(X), Y)] \neq \frac{1}{N} \sum_{i=1}^N \ell(f(x_i), y_i).$$

Don’t be too disheartened if the right-hand side is high.
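A toy illustration of the gap between the two sides, with a made-up distribution and predictor: the true generalisation error is a fixed number, while the empirical estimate fluctuates with the sample.

```python
import random

random.seed(0)

# True distribution: Y is 1 with probability p; the model always predicts 0.7.
# Under squared loss, the true generalisation error has a closed form.
p, prediction = 0.5, 0.7
true_error = p * (1 - prediction) ** 2 + (1 - p) * prediction ** 2  # 0.29

# The empirical estimate from N sampled points merely fluctuates around it.
N = 100
sample = [1 if random.random() < p else 0 for _ in range(N)]
empirical_error = sum((prediction - y) ** 2 for y in sample) / N

print(true_error, empirical_error)  # generally not equal
```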
Optimisation algorithms #
Stochastic gradient descent, too, is a good metaphor for how we learn. At a given moment in time, you figure out the best next step, take a step in that direction, and iterate. The gradient is computed across minibatches – you can’t feasibly use all data – so technically speaking, you’re probably doing minibatch SGD. The learning rate gives you the magnitude of the step. Terminology to describe loss landscapes (convexity, smoothness, local/global minima) is also useful, though it’s beyond the scope of this post.
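Here is a minimal minibatch-SGD sketch on a toy one-parameter least-squares problem; all the numbers are illustrative.

```python
import random

random.seed(1)

# Fit y = w*x to noiseless data generated with w_true = 3.
data = [(x, 3.0 * x) for x in [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]]
w, lr, batch_size = 0.0, 0.05, 2  # the learning rate sets the step magnitude

for step in range(200):
    batch = random.sample(data, batch_size)  # can't feasibly use all data
    # Gradient of the mean squared loss over the minibatch.
    grad = sum(2 * (w * x - y) * x for x, y in batch) / batch_size
    w -= lr * grad  # take a step in the direction of steepest descent

print(round(w, 3))  # converges towards w_true = 3
```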
As mentioned earlier, backpropagation also puts learning into perspective. Starting from the end goal, you work out necessary modifications to your parameters.
Further ML concepts #
So far, we’ve only discussed terminology relating to the basic ML pipeline. But there’s much more ML terminology allowing us to formulate problems from everyday life more precisely.
The bias-variance tradeoff captures a fundamental tension: a more complex model is more expressive (lower bias), but produces more variable outputs (higher variance). Closely related is the risk of overfitting. The conventional-wisdom version: Occam’s razor.
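A toy sketch of overfitting, with made-up data: a ‘complex’ model that memorises the training set achieves zero training error, while a one-parameter ‘simple’ model does not – but the complex model’s advantage need not survive on fresh data.

```python
import random

random.seed(2)

# Training data: y = x plus noise. Test data: noiseless points in between.
train = [(x, x + random.gauss(0, 0.5)) for x in range(10)]
test = [(x + 0.5, x + 0.5) for x in range(10)]

memorised = dict(train)

def complex_model(x):
    # Nearest-neighbour lookup: zero training error, wiggly elsewhere.
    nearest = min(memorised, key=lambda t: abs(t - x))
    return memorised[nearest]

# Simple model: a least-squares slope through the origin (one parameter).
slope = sum(x * y for x, y in train) / sum(x * x for x, y in train)

def simple_model(x):
    return slope * x

def mse(model, data):
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

print(mse(complex_model, train), mse(simple_model, train))  # complex wins
print(mse(complex_model, test), mse(simple_model, test))    # often reversed
```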
We can also borrow jargon from classification. Just as a model can be well-calibrated – its stated confidence matching how often it is actually right – so can humans. If wisdom is about making the right decisions in light of available data, then the wise man is he who has high AUROCs across different classification problems. Another useful notion from ML classification is the max-margin solution: selecting the max-margin solution is like applying the precautionary principle.
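To make the AUROC metaphor concrete, here is the metric computed from scratch (assuming no tied scores): the probability that a randomly chosen positive example is ranked above a randomly chosen negative one.

```python
def auroc(scores, labels):
    """Probability that a random positive outranks a random negative."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    pairs = [(p, n) for p in positives for n in negatives]
    return sum(p > n for p, n in pairs) / len(pairs)

# A 'wise' classifier ranks every positive above every negative.
print(auroc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))  # 1.0
print(auroc([0.9, 0.1, 0.8, 0.3], [1, 1, 0, 0]))  # 0.5: imperfect ranking
```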
Finally, principal component analysis also lends itself to personal real-world applications. For example, a good prompt might be: ‘What are the principal components here?’ And in this setting, the term ‘singular value’ should perhaps be understood literally.
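A minimal PCA sketch via the singular value decomposition, on a made-up dataset stretched along one direction, so that a single principal component dominates:

```python
import numpy as np

rng = np.random.default_rng(0)
# 100 two-dimensional points, stretched far more along the first axis.
X = rng.normal(size=(100, 2)) @ np.array([[3.0, 0.0], [0.0, 0.3]])

X_centered = X - X.mean(axis=0)
U, singular_values, Vt = np.linalg.svd(X_centered, full_matrices=False)

# The singular values measure how much variation each component explains.
print(singular_values)  # the first value dominates
print(Vt[0])            # direction of the first principal component
```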
Beyond human-brain analogues #
The field of ML has always used real-world analogues to drive progress – indeed, neural networks are loosely modelled on biological brains, and physical intuitions help with algorithm design. Conversely, thinking in ML terms might make us more clear-sighted. Maybe ordinary people can benefit from the field of ML in unexpected ways.
This post was inspired by a PEAKS workshop I attended this fall, led by Marius Wenk.