“I’ve never seen a bag of money score a goal”

Yahya Yavuz
4 min read · Sep 1, 2020

Introduction

Why couldn’t you beat a richer club? I’ve never seen a bag of money score a goal.

This is a famous quotation from Johan Cruyff, one of the legends of football history. He presumably meant that the best tactics, not the most expensive players, win games, and he is mostly right. However, we have lately witnessed suitcases of money making some clubs successful.

The growing football industry can lead clubs to spend more than they can possibly earn. UEFA introduced the Financial Fair Play Regulations to audit clubs’ expenses and prevent them from collapsing financially. To follow these rules, valuing a player correctly becomes crucial, whether selling or buying.

So what makes a footballer expensive? To answer that, I’ve decided to predict the value of football players using the FIFA 20 video game dataset, provided here.

Which position has the highest value?

To answer this question, I simply take the average of the values by the position each footballer plays.
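The averaging step can be sketched roughly as follows. This is a minimal illustration on made-up rows, using only the standard library; the field names `position` and `value_eur` and the helper `average_value_by_position` are assumptions for the example, not necessarily the names used in the dataset or in the original analysis.

```python
from collections import defaultdict
from statistics import mean

# Toy rows; the field names "position" and "value_eur" are illustrative assumptions.
players = [
    {"position": "CF", "value_eur": 9_000_000},
    {"position": "CF", "value_eur": 7_000_000},
    {"position": "GK", "value_eur": 1_000_000},
    {"position": "GK", "value_eur": 2_000_000},
    {"position": "CB", "value_eur": 3_000_000},
]

def average_value_by_position(rows):
    """Return {position: mean value}, ordered from highest to lowest mean."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["position"]].append(row["value_eur"])
    averages = {pos: mean(vals) for pos, vals in grouped.items()}
    # Sort so the most valuable position comes first, as in a ranked bar plot.
    return dict(sorted(averages.items(), key=lambda kv: kv[1], reverse=True))

averages = average_value_by_position(players)
print(averages)
```

With a data-frame library the same idea is a one-line group-and-mean; the explicit loop is used here only to keep the example dependency-free.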

Outliers Included

The bar plot shows that the average value of center forward (CF) players is almost double that of the closest position, at about €8 million. Let’s also look at how many players there are in the dataset in each position.

The top 3 positions in the dataset are Center Backs (CB, 17.3%), Strikers (ST, 14.1%) and Center Midfielders (CM, 12%).

Look at the percentage of CF players: just 0.6%. With so few players, a handful with very high values may be inflating the average.

I decided to drop the outliers and take the averages again to see whether anything changes.
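The article does not say which outlier rule was used, so the sketch below assumes the common 1.5 × IQR rule (Tukey’s fences). The function name `drop_outliers_iqr` and the toy values are hypothetical; the point is only to show how a single very expensive player can dominate the mean of a small group.

```python
from statistics import mean, quantiles

def drop_outliers_iqr(values, k=1.5):
    """Keep values inside [Q1 - k*IQR, Q3 + k*IQR].

    Assumption: the 1.5 * IQR rule; the original analysis may have used another.
    """
    q1, _, q3 = quantiles(values, n=4)  # quartiles of the sample
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]

# Toy values in millions of euros for one small position group.
values_mio = [1, 2, 2, 3, 3, 4, 100]
trimmed = drop_outliers_iqr(values_mio)

print(mean(values_mio))  # average including the extreme value
print(mean(trimmed))     # average after trimming
```

Whether the fences are computed over the whole dataset or separately per position is a design choice, and it can change which players are treated as outliers.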

The top rank changed from center forward to right winger. With the outliers removed, wing attackers seem to have higher values. Goalkeepers still have the lowest value in the dataset.

Does overall rating or potential of the player add value?

Which one is more closely related to a player’s value: how well they play now, or how well they could potentially play? The dataset provides both columns. Let’s look at how important they are.

Overall Rating vs Value / Potential vs Value

The graphs above don’t give a clear answer, as both look much the same. I decided to compute the correlations instead. The correlation of value with overall rating (0.64) is higher than with potential (0.58).

What happens in the real football world? Players actually have a higher value if they both play well and have potential. Is that also the case in the dataset? What will the correlation be if a new variable is calculated by multiplying potential and overall rating?

Scatter Matrix

Yes! The dataset also reflects this. Value is more strongly correlated with the derived variable (0.71).
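The comparison above can be sketched as follows, assuming the figures quoted are Pearson correlations (the article does not name the coefficient). The numbers in the lists are made up for illustration, so the printed values will not match 0.64, 0.58 or 0.71; `pearson` and `rating_product` are hypothetical names.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Toy ratings and values in millions of euros (not taken from the dataset).
overall = [60, 70, 80, 90]
potential = [75, 72, 85, 91]
value_mio = [0.5, 2.0, 15.0, 60.0]

# Derived variable: potential multiplied by overall rating.
rating_product = [o * p for o, p in zip(overall, potential)]

for name, column in [
    ("overall", overall),
    ("potential", potential),
    ("overall * potential", rating_product),
]:
    print(name, round(pearson(column, value_mio), 3))
```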

What will be the prediction?

Every man thinks his own geese swans. All players are valuable in their club’s eyes. Is that the case in reality? Finally, let’s try to develop an analytical model to predict players’ values. In this model, the market value given in the dataset is the target variable.

A LightGBM regressor is used to train the model. After parameter optimization the result was almost perfect (R² = 0.9998), which means the trained model explains almost all of the variance.
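For readers unfamiliar with R², here is a small sketch of the usual definition, one minus the ratio of the residual sum of squares to the total sum of squares. It is not the article’s evaluation code, and the numbers are toy values; `r_squared` is a hypothetical helper name.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # unexplained variation
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # total variation
    return 1 - ss_res / ss_tot

actual = [1.0, 2.0, 3.0, 4.0]

print(r_squared(actual, actual))     # perfect predictions -> 1.0
print(r_squared(actual, [2.5] * 4))  # always predicting the mean -> 0.0
```

A score of 0.9998 therefore sits very close to the perfect-prediction end of this scale. It is usually most informative when measured on data the model did not see during training.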

The graph below also shows the market value given in the dataset against the prediction. It is almost a straight line, which shows that the predicted values closely match the given ones.

My newly created variable is the most important variable in the model. Potential and overall rating together are more important than any other data. It’s also interesting to see that kicking is the most important ability variable for goalkeepers.

Now, it’s time to predict the value of your favorite player.

You can visit here for details.
