Measuring the learning of skills
The upside of quarantine
At the beginning of the Covid-19 pandemic, when many countries were in lockdown, I kept seeing people post about new skills they had learned while quarantined and forced to find new activities to pursue. Some took up drawing or languages, while others took up sports. The sudden explosion of new skills, often really impressive ones, made me think about the process of skill acquisition.
- What makes some people learn skills faster?
- Are people differently skilled at different things, or are some people faster than others at learning all types of skills?
- Do people differ with respect to speed of skill acquisition, the maximal attainable level of skill, or both?
- How does skill acquisition taper off after an initial phase of rapid improvement?
These are profound questions that one cannot expect to answer with a single method or a single study. We can only hope to get closer to an answer one study at a time. By recording metadata surrounding practice (such as nutrition, hours of sleep, technique used) one may get a glimpse of the answer to the first question; by comparing individuals one can identify differences in how people learn skills and disentangle the processes underlying questions two and three; by aggregating data from many individuals one can identify how skill acquisition unfolds over time to answer question four.
As I’ve always been fond of learning new skills, I wanted to think of a way to quantify skill acquisition and try it out on myself. Since I played football in my youth, but have barely kicked a ball in the last 15 years, I reckoned I’d try to see how many times in a row I can kick a ball without it touching the ground.
Quantifying skill
In order to quantify a skill, it has to be easily measured and observed (operationalised). For the development of any skill, irrespective of domain, there has to be fast and objective feedback whenever there is a mistake; unsuccessfully kicking a ball gives immediate emotional pain, and therefore fulfils that criterion. Kicking a ball is the perfect candidate since kicks are discretely countable, and it’s clear whether a kick was successful or unsuccessful.
I decided to write down the number of kicks in a row that I managed to do, and keep track of how many attempts I had made. I also recorded the date and time in order to compare skill acquisition, not only across attempts, but also across time. To allow comparison across training sessions, I decided to keep the number of attempts per session fixed at 50 (for reasons that will become apparent later in the post). However, some days I did more than one session (for example 2 x 50 attempts).
Since I’m not a complete beginner, and to not make it too easy, I used a small ball (9 cm in diameter). Each kick had to begin from the ground (the lift-off wasn’t counted), and only wilful contacts were counted (not if I accidentally kicked the ball against my shin). Since I was in my apartment, with nearby objects and walls, the ball was allowed to bounce against vertical objects (wall, bookshelves), but not horizontal ones (table).
Looking at the data
In total I spent 6.5 hours training, and made a total of 26’252 kicks across 13 sessions (13 * 50 = 650 attempts) over 9 days. Below is a plot of the raw data, showing the number of successful kicks for each attempt.
There is a clear increase in the average number of kicks over time. However, most attempts are still pretty low, with some infrequent spikes. Looking at the raw data, it can appear as though there is no improvement, since most attempts end with a similar number of successful kicks. For example, between attempts 200 and 350 it seems as though there is no improvement at all, while there seems to be a large improvement between attempts 150 and 250. The variability in the data obscures the pattern because we tend to focus on easily visible and interpretable outcomes, such as the best results (peaks), a consequence of cognitive heuristics. If I instead plot the moving average across 100 attempts, as below, one can see a steady, roughly even improvement throughout the first 350 attempts. One can also see the effect of a day when I was rather unmotivated, having slept only 4 hours because of a night shift; because of the moving average, the resulting dip takes some time to recover.
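The smoothing step can be sketched as follows. This is a minimal illustration that assumes a trailing window (each point averages the attempts up to and including itself); the post does not say whether the window was trailing or centred, and the numbers below are made up for illustration.

```python
def moving_average(values, window=100):
    """Trailing moving average; early points average whatever is available."""
    averages = []
    running_sum = 0.0
    for i, value in enumerate(values):
        running_sum += value
        if i >= window:
            # Drop the value that just fell out of the window.
            running_sum -= values[i - window]
        averages.append(running_sum / min(i + 1, window))
    return averages


# Hypothetical streak lengths, not the recorded data.
kicks_per_attempt = [3, 5, 2, 40, 4, 6, 3, 55, 7, 8]
smoothed = moving_average(kicks_per_attempt, window=4)
print([round(x, 2) for x in smoothed])
```

With a trailing window, a single bad stretch keeps pulling the average down until it has fully left the window, which is the slow recovery described above.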
At this point, I got tired of continuing practice since the time needed to finish each session increased from 15 min to around 50 min. Since I was doing the practice only for this analysis, I didn’t want to stop doing 50 attempts per session. It may have made more sense to use a set time for each session instead. However, since there is noise in the data (a lot of variability in the number of kicks between attempts) one needs a certain number of attempts for the underlying trend to be visible. There are two obvious trends that one can keep track of to gauge progress: average and maximal number of kicks per session.
This plot shows all the individual attempts together with the average and max for each session. The average develops steadily from session to session, mostly because each session contains 50 attempts, which smooths out the variability from attempt to attempt. This requires a fairly high number of attempts per session, which may not be feasible for all activities or sports. The reason is that one is using rather high-level discrete data (the sum of kicks in one attempt) instead of low-level data such as individual kicks.
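A per-session summary like this can be computed by cutting the list of attempts into fixed-size chunks. The sketch below assumes the attempts are stored in order in one flat list; the function name and the sample values are hypothetical.

```python
from statistics import mean


def session_summaries(kicks_per_attempt, attempts_per_session=50):
    """Return the mean and max streak for each block of attempts."""
    summaries = []
    for start in range(0, len(kicks_per_attempt), attempts_per_session):
        session = kicks_per_attempt[start:start + attempts_per_session]
        summaries.append({
            "session": start // attempts_per_session + 1,
            "mean": mean(session),
            "max": max(session),
        })
    return summaries


# Hypothetical data with 3 attempts per session to keep the example short.
for row in session_summaries([1, 2, 3, 4, 5, 9], attempts_per_session=3):
    print(row)
```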
Since I have recorded the number of kicks for each attempt, and each attempt finishes with one unsuccessful kick, I can transform the data into a stream of individual kicks. For example, if I kicked the ball 3 and then 2 times, the stream is Kick-Kick-Kick-Miss-Kick-Kick-Miss. Each kick is treated as one trial with a given probability of success. Beginners may have a success rate of 75%, meaning they average 3 kicks per attempt (0.75 / 0.25 = 3 successful kicks for every miss). By practicing, one increases the success rate of each individual kick. The higher it is, the longer the streaks will be on average. And the better one gets, the rarer the failed kicks become.
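The transformation from streaks to individual kicks is simple enough to sketch directly. The encoding (1 for a successful kick, 0 for a miss) and the function names are choices made for this illustration.

```python
def streaks_to_kicks(streaks):
    """Expand streak lengths into a stream of 1 (kick) and 0 (miss)."""
    kicks = []
    for streak in streaks:
        kicks.extend([1] * streak)  # the successful kicks of this attempt
        kicks.append(0)             # the miss that ends the attempt
    return kicks


def overall_success_rate(kicks):
    """Share of successful kicks in the whole stream."""
    return sum(kicks) / len(kicks)


stream = streaks_to_kicks([3, 2])
print(stream)  # [1, 1, 1, 0, 1, 1, 0]
print(round(overall_success_rate(stream), 3))
```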
I took the stream of individual kicks and, for each kick, calculated the percentage of successful kicks among that kick and the next 500 (the mean from n to n+500). Since skill development tends to follow the law of diminishing returns (rapid early progress, then slower progress the better one gets), I fitted a logarithmic curve to the data. This gives an estimate of the expected success rate after a given number of kicks. The slope of the curve at any given point then defines the speed at which learning occurs. This speed will differ between people and depend on talent, prior skill in similar tasks, motivation, and quality of training and feedback.
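A rough sketch of this step is given below. It assumes the logarithmic curve has the form rate = a + b * ln(n) and is fitted by ordinary least squares; the post does not state the exact form or fitting method, so treat both as assumptions. The stream is synthetic, and the window is shortened to 200 kicks to suit the small example.

```python
import math


def rolling_success_rate(kicks, window=500):
    """Forward-looking mean of 0/1 outcomes over each run of `window` kicks."""
    rates = []
    running = sum(kicks[:window])
    for n in range(len(kicks) - window + 1):
        rates.append(running / window)
        if n + window < len(kicks):
            running += kicks[n + window] - kicks[n]
    return rates


def fit_log_curve(xs, ys):
    """Least-squares fit of y = a + b * ln(x); every x must be positive."""
    logs = [math.log(x) for x in xs]
    mean_log = sum(logs) / len(logs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((l - mean_log) ** 2 for l in logs)
    sxy = sum((l - mean_log) * (y - mean_y) for l, y in zip(logs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_log
    return a, b


def learning_speed(b, n):
    """Slope of a + b * ln(n) at kick n, i.e. change in rate per extra kick."""
    return b / n


# Synthetic stream with slowly lengthening streaks (not the recorded data).
stream = []
for streak in range(1, 60):
    stream.extend([1] * streak + [0])

rates = rolling_success_rate(stream, window=200)
a, b = fit_log_curve(range(1, len(rates) + 1), rates)
print(f"rate = {a:.3f} + {b:.3f} * ln(n)")
print(f"speed at kick 1000: {learning_speed(b, 1000):.6f} per kick")
```

A logarithmic curve keeps growing without bound, so far enough out it would predict a success rate above 100%; extrapolations are only meaningful reasonably close to the observed range.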
When I stopped practicing, I had kicked the ball 26’252 times. My expected success rate at that point was 99.13%, or a failure rate of slightly less than 1% per kick. That gives an expected average of 114 kicks in a row per attempt, which is close to my actual average for the last session (which was 99). By extending the curve, one can calculate the expected success rate at a higher number of kicks. For example, after 35’000 kicks, I would be expected to have a success rate of 99.55% per kick. A meager improvement in percentage terms (0.42 percentage points), but an almost two-fold improvement in the expected average, to 222 kicks per attempt.
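The link between the per-kick success rate and the average streak can be made explicit. The sketch below assumes every kick succeeds independently with the same probability p, in which case the expected number of successful kicks before the first miss is p / (1 - p). Because the rates quoted above are rounded, the results land near the figures in the text, not exactly on them.

```python
def expected_streak(success_rate):
    """Expected successful kicks before the first miss, for a fixed rate."""
    if not 0.0 <= success_rate < 1.0:
        raise ValueError("success_rate must be in [0, 1)")
    return success_rate / (1.0 - success_rate)


def success_rate_from_mean_streak(mean_streak):
    """Inverse relation: the per-kick rate implied by an average streak."""
    return mean_streak / (mean_streak + 1.0)


print(expected_streak(0.75))  # 3.0
for rate in (0.9913, 0.9955):
    print(f"{rate:.2%} per kick -> about {expected_streak(rate):.1f} kicks")
print(f"average of 99 kicks -> {success_rate_from_mean_streak(99):.2%} per kick")
```

The inverse function shows why small changes in the percentage matter so much at this level: going from an average of 99 to an average of 199 kicks only moves the per-kick rate from 99.0% to 99.5%.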
Since I have recorded the practice time, I also know how long it will take to perform a given number of kicks. With this, I was able to calculate that the expected time to reach 35’000 kicks would be an additional 124.6 minutes. Using this information, I can ask myself “is 222 kicks per attempt worth the additional 2 hours of practice?”. This makes little sense for a narrow task such as this, where increasing the number of kicks is your only goal. But, when you are an athlete, it becomes a question of training efficiency. As a dodgeball player practicing throwing accuracy and wanting to increase the number of points scored on court, you may be better off not investing that additional time, and instead focusing on defensive skills such as dodging. This trade-off can be objectively determined after one has calculated the player’s skill profile and identified which skills reap which benefit during game play. See this post for an example.
Since the maximal number of kicks is a rare event (by definition 1 in 50 per session), it varies a lot more from session to session. This makes it an unreliable value for gauging progress. One can instead make use of all the kicks in a session to calculate a probability distribution for each attempt (actually for each attempt and its neighbouring attempts), and then use that distribution to find the expected “rare events”. From this, one can find whether the actual maximal result within the session was unexpectedly high or low given the probability distribution for that session, or one can calculate the number of kicks one would expect to reach with a 5% chance. To do this, one uses a so-called kernel density estimate (KDE), shown below. It illustrates the probability of making X kicks, and I’ve presented the sum of probabilities for kicks in ranges of 25 kicks. The first plot shows my first session (where I averaged 8.4 and had a maximum of 42 kicks). The second plot shows one of my last sessions, where I had my best run of 50 attempts. This is reflected in the area under the curve (see below for explanation) being at an all-time high of 189.
These plots show the estimated distributions for the first and the last sessions (with the width equal to twice the standard deviation). The y-axis shows the probability that a given number of kicks was performed during an attempt. For example, in the first session there was only a 2% chance that I would kick more than 50 times in a row, which rose to about 85% in the last session. I had roughly the same probability of kicking more than 50 times in the beginning as I had of kicking more than 370 times in the end.
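For readers who want to try this, here is a minimal Gaussian kernel density estimate written with the standard library only. The bandwidth uses a common normal-reference rule of thumb (1.06 * sd * n^(-1/5)), which is an assumption of this sketch and not necessarily the setting behind the plots; the two sessions are invented numbers.

```python
import math
from statistics import stdev


def rule_of_thumb_bandwidth(samples):
    """Normal-reference bandwidth; an assumed default, not the post's setting."""
    return 1.06 * stdev(samples) * len(samples) ** (-1 / 5)


def kde_density(samples, x, bandwidth):
    """Gaussian KDE evaluated at x: the average of one bell curve per sample."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2 * math.pi))
    return norm * sum(
        math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
    )


def kde_tail_probability(samples, threshold, bandwidth):
    """Estimated probability of exceeding `threshold` under the KDE."""
    scale = bandwidth * math.sqrt(2)
    return sum(
        0.5 * math.erfc((threshold - s) / scale) for s in samples
    ) / len(samples)


# Hypothetical sessions, shortened to 10 attempts each.
first_session = [2, 5, 8, 3, 12, 42, 6, 9, 4, 7]
late_session = [60, 150, 95, 210, 40, 180, 120, 75, 300, 110]

for name, session in (("first", first_session), ("late", late_session)):
    h = rule_of_thumb_bandwidth(session)
    p = kde_tail_probability(session, 50, h)
    print(f"{name} session: P(more than 50 kicks) is about {p:.2f}")
```

A Gaussian kernel places a little probability below zero kicks, which is impossible in practice; for sessions with many short streaks a truncated or log-scale estimate may be a better fit.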
The clip below shows my progress across sessions, with probability estimates for 50 kicks at a time, iterated in steps of three kicks.
By multiplying each probability by the corresponding number of kicks and summing the results, one gets the area under the curve, which acts as a summary of the total skill. This gives a single metric that quantifies skill and makes tracking progress simple and intuitive. Below is a plot showing how my area under the curve changes with practice. This metric is better at reflecting skill progression and is less influenced by rare occurrences (such as the bad session when I hadn’t slept), since it takes all attempts into consideration, along with their relation to the rest of the attempts.
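One way to compute such a summary is to integrate kicks times density numerically. The sketch below is an interpretation, not a reconstruction of the original analysis: it assumes a Gaussian kernel density estimate, a fixed bandwidth passed in by the caller, and a trapezoid rule on a grid from 0 up to a chosen upper limit. The function names and sample values are hypothetical.

```python
import math


def kde_density(samples, x, bandwidth):
    """Gaussian kernel density estimate evaluated at x."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2 * math.pi))
    return norm * sum(
        math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
    )


def skill_area(samples, bandwidth, upper, step=1.0):
    """Trapezoid estimate of the integral of k * density(k) over [0, upper]."""
    total = 0.0
    previous = 0.0  # k * density(k) is 0 at k = 0
    for i in range(1, int(upper / step) + 1):
        k = i * step
        current = k * kde_density(samples, k, bandwidth)
        total += 0.5 * (previous + current) * step
        previous = current
    return total


# Hypothetical window of attempts, not the recorded data.
window = [60, 150, 95, 210, 40, 180, 120, 75, 300, 110]
print(round(skill_area(window, bandwidth=30.0, upper=600.0), 1))
```

When the bulk of the estimated density lies above zero, this value stays close to the plain mean of the same attempts, so it can be sanity-checked against it.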
Another way of presenting this data, which is more intuitive than the KDE, is to show the cumulative probability of making a certain number of kicks in a row. The plot below shows the probability of making a certain number of kicks or more. For example, in the first session, 16% of my attempts had 30 or more kicks. By the end, this had risen to 92%.
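The "this many kicks or more" curve can be computed straight from the recorded streaks, without any smoothing. The sketch below uses plain empirical frequencies, which is an assumption about how the plot was made; the session data is invented.

```python
def prob_at_least(streaks, k):
    """Share of attempts that reached k or more kicks in a row."""
    return sum(1 for s in streaks if s >= k) / len(streaks)


def survival_curve(streaks, thresholds):
    """Probability of reaching each threshold, as (threshold, share) pairs."""
    return [(k, prob_at_least(streaks, k)) for k in thresholds]


# Hypothetical session, shortened to 8 attempts.
session = [4, 12, 30, 7, 45, 3, 18, 9]
for k, share in survival_curve(session, [10, 20, 30, 40]):
    print(f"{k:>3} or more kicks: {share:.0%} of attempts")
```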
Here is a video showing the change over time for the probability estimates.
Finally, one can look at how performance changes across attempts within each session. For example, it takes a couple of attempts at the beginning of each session to warm up and improve performance. I’ve plotted the number of kicks by attempt number for all sessions. I’ve also plotted the average for each attempt number, and then smoothed the curve a couple of times to bring out the trend. Using this data, it is clear that, for me, the optimal number of attempts per session is a little less than 40, after which point my motivation starts to drop. This can be used to decide how long training sessions should be. For example, when teaching small children math, one can expect them to fatigue and get bored rather quickly, and any attempt to teach them after that point is futile. By keeping track of their progress (by recording correct responses) one can identify the optimal time to spend teaching them.
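The within-session view can be sketched by averaging across sessions at each attempt position and then smoothing the result. The smoothing method is not specified in the post; the version below assumes a three-point moving average applied repeatedly, with the end values repeated as padding. All names and numbers are illustrative.

```python
def mean_by_position(sessions):
    """Average streak at each attempt position, across equal-length sessions."""
    return [sum(column) / len(column) for column in zip(*sessions)]


def smooth(values, passes=2):
    """Repeated three-point moving average with repeated end values."""
    values = list(values)
    for _ in range(passes):
        padded = [values[0]] + values + [values[-1]]
        values = [
            (padded[i] + padded[i + 1] + padded[i + 2]) / 3
            for i in range(len(values))
        ]
    return values


# Hypothetical sessions with 6 attempts each (the post uses 50).
sessions = [
    [2, 6, 9, 12, 8, 4],
    [4, 8, 15, 10, 9, 5],
    [3, 10, 12, 14, 7, 6],
]
averages = mean_by_position(sessions)
trend = smooth(averages, passes=2)
best_position = trend.index(max(trend)) + 1
print(f"smoothed trend peaks around attempt {best_position}")
```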
If you want to record data in a similar way, you can download a sample spreadsheet by clicking here. If you want to get the above results for your own practice, you can email me your filled-out spreadsheet, and I’ll try to find the time to analyse it.