Beyond Moneyball - We All Dream of a Team of Arbeloas.

Posted by Acaustiq on May 15, 2012, 09:29:01 AM

Whilst under the influence of LSD, John Lennon would stare at a box of randomly blinking lights for hours; he eventually gave the maker of this technological marvel a job at Apple. Yanni Mardas (or ‘Magic Alex’) was little more than a delusional TV repairman, yet he managed to convince The Beatles that with the right financial backing he could build all manner of fantastical contraptions, ranging from an artificial sun to a force field. His most idiotic flourish, however, was almost certainly asking Lennon and Harrison for the V12 engines from their cars in order to build a flying saucer.

Damien Comolli is very much the Yanni Mardas of football: propelled by grandiose delusions about his capabilities, he never seems to let failure penetrate the bizarre little bubble he’s constructed around himself. Under Comolli, Tottenham were notable for buying big fish/small pond players (sometimes for daft money) who could almost never make the step up (Bentley and Bent being most notable). So when we started down a similar path last summer I was seriously worried.

Henderson had impressed me, as had Coates and Enrique, so I was happy with them. Nevertheless, the squad was still devoid of consistent attacking plurality. As I expected nothing from Downing and Adam, we were down a huge sum of money and just as reliant on Suarez as we had been five months beforehand. Despite our post-takeover excitement at the idea that we wouldn’t let Alves and Silva get away again, we had somehow conspired to completely balls it up, like a heroin addict who wins the lottery and buys a poppy field.

However, during the first few games we were a dodgy decision away from a rare opening-day win, had beaten Arsenal away and were top of the league for a brief period. Downing had looked quite good, Adam wasn't the Sidwell-level disaster I was anticipating and Carroll was far more threatening than he had been up until that point.

Was I wrong? Had he learnt from his mistakes at Tottenham? Had Comolli come up with some formula to account for the fact that a player who has the ball a disproportionate amount is going to look quite good both on a spreadsheet and to the easily deceived eye? Is this what had convinced the owners to hire a man with such a dire track record?

As we now know, there was no formula, no new insight or understanding. With Gerrard injured, the mid-table mentality the new signings brought with them reared its ugly head after a few games; draws became ‘acceptable’, which eventually compounded into a disastrous December and ultimately relegation form in the second half of the season.

Despite the pre-semi-final protestations that Downing was exclusively Dalglish’s idea (I’ve no doubt he was agreeable), Comolli had tried to sign Downing for Spurs. Once the owners had given him the proverbial V12 engine to pursue the same failed ideas that had got him sacked at Tottenham, he was unable to contain his excitement at having ‘usurped’ the market by massively overpaying for a player it had basically rejected for eight years, declaring him to be an ‘efficient footballer’, which suggests to me a degree of involvement far beyond negotiation. Piecing together Comolli's world view from various sources reveals a rather terrifying picture of a simplistic man who believes the game can be reduced to simple arithmetic: 35% of goals come from set pieces, so good free kick taker + height = goals. Oh, and of course the alleged ‘chances created’ rubbish; when I heard ‘you need to buy runs’ come out of Jonah Hill’s mouth, my blood ran cold.

So, in the aftermath of this mediocrity orgy, there are basically three schools of thought regarding ‘statistics’.

1. That coarse numbers pertaining to passing, shooting, tackling, crossing and of course 'chances created' are all relevant and can help you make an informed decision about a potential signing or how to deploy a player.

2. That coarse numbers pertaining to passing, shooting, tackling, crossing are derived from heterogeneous environments and are therefore irrelevant, and any attempt to homogenise them is a wild goose chase. However, with a bit of insight and effort there are metrics that can be derived to help you make an informed decision about a potential signing or how to deploy a player, for example;

3. That they’re not worth a 20p mix.

Whilst I've had a bit of success with the second approach, I'm of the third opinion, not because JOEY JONES, STEVEN FLETCHER LID, SPOT ON LAD, DEMBELE, KNOWS WHERE THE NET IS, STEVE BRUCES HEAD FULL OF POUND COINS, but because I don't think there's such a thing as an empirically 'correct' player; it's like trying to do a massive jigsaw puzzle without knowing what the picture is supposed to be of. Take the striker chart for instance: to an outside observer, winning the league might appear to be as easy as getting four strikers from the green box, sitting back and watching the goals roll in. Following that model we could have a strike force comprising Suarez, Falcao, Aguero and van Wolfswinkel for a combined fee of £45m-ish.

However, Crouch went off the boil when Torres came in, Lewandowski's rise mirrors Barrios' decline and Ba has scored once since they bought Cisse. There's a dynamic there, and trying to figure out the right balance puts you right back in the wild goose chase that you were trying to sidestep in the first place.

So, for me, a new approach is required.

Humans are, in theory, pattern matching machines: we know that, despite both having four legs and a tail, a dog and a cow are different things because they fit different patterns, and so humans view the world (and by implication, football players) through the filter of such patterns. On the other hand, they tend to be a bit shit at differential calculus.

Computers, though, are little more than adding machines; by design they don't know anything about patterns, but they can tell you the square root of 234234923874 on demand and integrate a differential form quick sharp.

So, if it were possible to combine the calculation power of a computer with the pattern matching of a human, there may exist an opportunity for a major advantage.

Enter Artificial Intelligence: the practice of enabling machines to recognise and classify patterns.

Let's start with a basic example. Shapes are some of the simplest patterns going: we can very quickly spot the circles, squares and triangles in the following image, but to a computer it's just a series of ones and zeroes; it could be the Sistine Chapel for all it knows.

However, if I 'teach' the computer the pattern of a circle, square and triangle and tell it to draw a different kind of white line around each one;

Does it actually understand the pattern though? Or is it just mirroring what I've told it? If I warp and distort the image to try and confuse the algorithm;

And run it again;

To an outside observer the algorithm appears to have 'learnt' and therefore 'understood' the notion of square, circle and triangle.
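
For anyone who wants to play along at home, here is a minimal sketch of the 'teach it, warp it, run it again' idea. It is not the algorithm behind the results described above (that isn't specified here); it is a deliberately crude stand-in that works on outlines rather than pixels and 'learns' a single number per shape, the circularity 4*pi*area/perimeter^2. All the function names, the warp settings and the use of a 64-sided polygon as a 'circle' are assumptions made for illustration.

```python
import math

def regular_polygon(n_sides, radius=1.0, rotation=0.0):
    """Outline of a regular polygon; with many sides it approximates a circle."""
    return [
        (radius * math.cos(rotation + 2 * math.pi * i / n_sides),
         radius * math.sin(rotation + 2 * math.pi * i / n_sides))
        for i in range(n_sides)
    ]

def circularity(points):
    """4*pi*area / perimeter**2: about 1.0 for a circle, lower for pointier shapes."""
    twice_area = 0.0
    perimeter = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        twice_area += x1 * y2 - x2 * y1            # shoelace formula
        perimeter += math.hypot(x2 - x1, y2 - y1)
    return 4 * math.pi * (abs(twice_area) / 2) / perimeter ** 2

def train(examples):
    """'Teach' the patterns: store the mean circularity seen for each label."""
    seen = {}
    for label, points in examples:
        seen.setdefault(label, []).append(circularity(points))
    return {label: sum(values) / len(values) for label, values in seen.items()}

def classify(model, points):
    """Pick the label whose learnt circularity is nearest to this outline's."""
    value = circularity(points)
    return min(model, key=lambda label: abs(model[label] - value))

def warp(points, angle=0.7, shear=0.1, stretch=1.15):
    """Rotate, shear and stretch an outline to try and confuse the classifier."""
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    warped = []
    for x, y in points:
        xr, yr = x * cos_a - y * sin_a, x * sin_a + y * cos_a
        warped.append((stretch * (xr + shear * yr), yr))
    return warped

# Hypothetical training set: clean shapes at a few sizes and rotations.
SIDES = {"triangle": 3, "square": 4, "circle": 64}
training = [(label, regular_polygon(n, radius=r, rotation=r))
            for label, n in SIDES.items() for r in (0.5, 1.0, 2.0)]
model = train(training)

# Run it again on warped versions it has never seen.
for label, n in SIDES.items():
    print(label, "->", classify(model, warp(regular_polygon(n))))
```

With this mild warp the three outlines still land on the right labels, because the distortion moves each circularity less than the gap between shapes. Crank the shear or stretch up far enough and it will start getting them wrong, which is a fair reminder that 'understood' is doing a lot of work in the sentence above.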

Teaching a computer that Suarez is a circle, Messi is a square, Downing is shit and anyone who mentions Junior Hoilett in connection with this football club should be stabbed in the neck repeatedly, is conceptually not massively different. The obvious problem is that 'explaining' categories of player in the same way you would a square (four lines intersecting at right angles) is nigh on impossible. Herein lies one half of the trickery: the algorithm in question can discover the categories dynamically based on the data I give it.

Let's use a high-feedback example, something that can be manually validated. Take the following data.

Now, we know that despite them all being cars there are two very distinct categories: supercars and not so much. Can a computer figure this out? If we run the algorithm against the data and plot the results on a graph we end up with;

There are two outliers (which make an element of sense if you look closely at the data) but there are also two very obvious groups. The algorithm is judging how similar one object is (or isn’t) to another, which is what we want. In an ideal world we'd go out and sign Falcao, Messi, Fabregas, etc., but in this universe we need to find the next best thing, the Mosler to Messi's McLaren: something that performs in a similar fashion at a fraction of the cost.
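
As a rough illustration of a computer discovering groups without being told what they are, here is a small k-means sketch. To be clear about the assumptions: k-means is swapped in as a common, simple clustering method and is not necessarily what produced the graph described above; the car labels and every figure in `CARS` are invented purely for the example and are not the data from this post; and the number of groups (`k=2`) is handed to it, which is a bit of a cheat.

```python
import math

# Invented, illustrative figures only:
# (label, horsepower, 0-60 mph in seconds, price in thousands of pounds)
CARS = [
    ("super_1", 620, 3.1, 190),
    ("super_2", 560, 3.4, 160),
    ("super_3", 650, 2.9, 230),
    ("hatch_1", 100, 11.5, 14),
    ("hatch_2", 120, 10.2, 17),
    ("saloon_1", 150, 9.0, 24),
]

def standardise(rows):
    """Rescale each column to mean 0 and standard deviation 1 so no unit dominates."""
    columns = list(zip(*rows))
    means = [sum(c) / len(c) for c in columns]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c))
            for c, m in zip(columns, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

def k_means(points, k=2, iterations=20):
    """Plain k-means; returns a group index for every point."""
    # Deterministic start: the first point, then whichever point is farthest
    # from the centres chosen so far.
    centres = [points[0]]
    while len(centres) < k:
        centres.append(max(points, key=lambda p: min(math.dist(p, c) for c in centres)))

    def nearest(p):
        return min(range(k), key=lambda i: math.dist(p, centres[i]))

    for _ in range(iterations):
        labels = [nearest(p) for p in points]
        for i in range(k):
            members = [p for p, group in zip(points, labels) if group == i]
            if members:  # move each centre to the mean of its members
                centres[i] = [sum(vals) / len(vals) for vals in zip(*members)]
    return [nearest(p) for p in points]

features = standardise([list(car[1:]) for car in CARS])  # the name is left out
groups = k_means(features, k=2)
for (name, *_), group in zip(CARS, groups):
    print(name, "-> group", group)
```

Note that the name of the car never goes in, only the numbers, and the columns are standardised first so that price in thousands doesn't drown out a 0-60 time in single digits.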

With this in mind, I can do the same thing with Opta data, although it takes much longer. Whereas the above example uses all the data in question (apart from the name of the car), the Opta numbers have to be reduced/distilled and organised in a very specific manner (the other half of the trickery). What I'm trying to do is extract the fundamental ‘essence’ of the player, which took months to get right. So an obvious example of the kind of thing we're looking for is Enrique, a very good player for very little money (I think his late season wobble is in synchronicity with most of the team rather than it being an individual issue). Using last season's data;

Enrique is closest to Cole and so is/was the best choice available based on the data.
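
The 'closest to' idea can be sketched as a nearest-neighbour lookup: put every player on the same scale, then rank the candidates by distance from the benchmark. This is only a guess at the general shape of the thing, not the distillation process described above. The player names are placeholders, the four numbers per player are made up, and plain Euclidean distance on z-scores is an assumption.

```python
import math

# Hypothetical per-game profiles; names and numbers are made up for illustration.
PROFILES = {
    "benchmark":   [55.0, 3.2, 2.1, 1.4],
    "candidate_a": [52.0, 3.0, 2.3, 1.2],
    "candidate_b": [38.0, 1.1, 3.9, 0.3],
    "candidate_c": [61.0, 4.5, 0.8, 2.6],
}

def z_scores(profiles):
    """Standardise each metric across all players (mean 0, standard deviation 1)."""
    names = list(profiles)
    columns = list(zip(*(profiles[n] for n in names)))
    means = [sum(c) / len(c) for c in columns]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c))
            for c, m in zip(columns, means)]
    return {n: [(v - m) / s for v, m, s in zip(profiles[n], means, stds)]
            for n in names}

def closest_to(target, profiles):
    """Rank every other player by distance from the target, nearest first."""
    scaled = z_scores(profiles)
    distances = [(name, math.dist(scaled[target], vector))
                 for name, vector in scaled.items() if name != target]
    return sorted(distances, key=lambda pair: pair[1])

for name, distance in closest_to("benchmark", PROFILES):
    print(f"{name}: {distance:.2f}")
```

The ranking is only as good as the numbers fed in, which is where the months of reducing and organising the raw data would come in; the lookup itself is the easy bit.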

Arbeloa is another example of the ideal signing: cheap and brilliant. This example is back to front because we already had him at the time (this is from the 08/09 season), but you can see Johnson (who spent that season at Portsmouth) is confirmed as a good replacement.

A few more examples from 08/09.

Centre backs;

Defensive midfielders;

Attacking midfielders;


You get the idea. Are there players out there who'd sit in between Alonso and Mascherano, Arbeloa and Johnson, Gerrard and Fabregas, van Persie and Ronaldo?

It's not unreasonable to assume they exist and wouldn't necessarily cost daft money.

