Making a simple xG model

Expected goals (xG) is making its way deeper and deeper into mainstream football, now appearing in numerous newspaper match reports, on Match of the Day, and even on Sky Sports’ Soccer Saturday. However, the specifics of xG remain hidden to many. This article will not explain the reasons behind creating an xG model; instead, it will take you through the steps to create a rudimentary one, comparing the likelihood of scoring from three positions on the pitch: outside the penalty area, inside the penalty area (but outside the six-yard box, and excluding penalties), and inside the six-yard box.

Following Paul Riley’s guide to creating an xG model, we take data from WhoScored on the number of shots and goals in each of the three situations, from the 2016/2017 season all the way back to when records began in the 2009/2010 season. From this, we calculate the cumulative number of shots taken and goals scored for each location, and divide goals by shots to get the conversion rate. This does not formally test whether the differences in conversion rates are statistically significant rather than down to chance, but with such a large sample (thousands of shots for each location) we can assume that they are.
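As a rough sketch of the calculation described above, the R snippet below divides cumulative goals by cumulative shots for each location, using the totals reported in Figure 3. The data frame name `shots` and the use of `prop.test` to check whether the open-play conversion rates differ by more than chance are assumptions made for illustration; they are not taken from Riley’s guide.

```r
# Cumulative totals for 2009/2010 to 2016/2017, copied from Figure 3.
# The data frame name `shots` is illustrative.
shots <- data.frame(
  Location = c("SixYardBox", "PenaltyArea", "Penalties", "OutOfBox"),
  Taken    = c(5781, 40444, 770, 35995),
  Scored   = c(1803, 4422, 598, 1205)
)

# Conversion rate (%) = goals scored / shots taken, rounded to two decimals.
shots$Conversion <- round(100 * shots$Scored / shots$Taken, 2)
print(shots)

# Test of equal proportions across the three open-play locations
# (penalties are left out, as they are in the article's comparison).
open_play <- shots[shots$Location != "Penalties", ]
significance <- prop.test(x = open_play$Scored, n = open_play$Taken)
print(significance$p.value)
```

A very small p-value here would support the assumption that the differences between locations are not down to chance.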


From our analysis (Figure 1), we can see clear differences in the likelihood of scoring from different places on the pitch. Perhaps most importantly, shots from inside the penalty area are over three times as likely to be converted as shots from outside it. This large difference could prove pivotal to how a side plays in the final third. Given how wasteful shooting from outside the penalty area seems to be, especially when compared to shooting from inside, it would not be surprising to see sides coach players to try for that final pass into the penalty area rather than shooting from distance. This style of play, aiming for greater efficiency on the ball, can be seen in teams like Arsenal, who have a reputation for trying to pass the ball into the net and never taking shots from range. Arsène Wenger has even referenced expected goals in press conferences.

Interestingly, the conversion rate for shots inside the 6-yard box is roughly three times that for shots elsewhere in the 18-yard box, a ratio almost as large as the one between shots inside and outside the penalty area. Again, this shows the value of careful shot selection and of working your way through the opposition’s defence. In fact, choosing when to shoot can be a large part of what separates top-class attackers from mediocre ones, as exemplified by Raheem Sterling’s shot map for the 2016/2017 season compared to Andros Townsend’s (Figure 2). Being able to get into positions that give you a higher chance of scoring is a valuable skill, and a large part of it is knowing when not to shoot and instead to keep working the ball into the opposition’s box.
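The two ratios discussed so far can be checked directly from the totals in Figure 3. The following is a minimal R sketch; the variable names are illustrative rather than taken from the original analysis.

```r
# Conversion rates as goals / shots, using the totals from Figure 3.
six_yard_box <- 1803 / 5781
penalty_area <- 4422 / 40444   # inside the area, outside the six-yard box, no penalties
out_of_box   <- 1205 / 35995

# How many times more likely a shot is to be scored from the closer zone.
inside_vs_outside   <- penalty_area / out_of_box
six_yard_vs_penalty <- six_yard_box / penalty_area

print(round(c(inside_vs_outside = inside_vs_outside,
              six_yard_vs_penalty = six_yard_vs_penalty), 2))
```

On these totals the ratios come out at about 3.27 and 2.85 respectively, in line with the "over three times" and "roughly three times" descriptions.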

Figure 2


Finally, getting the ball into the opposition box has another possible pay-off beyond the far greater chance of scoring from shots within the 18-yard box and the 6-yard box. Having players on the ball in the penalty area means a far greater chance of winning a penalty than trying your luck with a shot from outside. As the conversion rate for penalties (which are excluded from the numbers for shots and goals within the penalty area) is so high (78%), it perhaps makes even more sense for teams to try to work the ball into the box. For a comparison of the number of shots taken, goals scored, and percentage converted, see Figure 3.

Figure 3

Location      Taken   Scored   Conversion (%)
SixYardBox     5781     1803            31.19
PenaltyArea   40444     4422            10.93
Penalties       770      598            77.66
OutOfBox      35995     1205             3.35
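
To show how these conversion rates can act as a rudimentary xG model, the R sketch below treats each rate as the xG value of a shot from that location and sums the values over a set of shots. The shot counts for the match are hypothetical, made up purely for illustration, and the names `xg_values` and `match_shots` are not from the original analysis.

```r
# xG per shot by location: goals scored / shots taken, from Figure 3.
xg_values <- c(
  SixYardBox  = 1803 / 5781,
  PenaltyArea = 4422 / 40444,
  Penalties   = 598 / 770,
  OutOfBox    = 1205 / 35995
)

# Hypothetical shot counts for one team in one match (illustrative only).
match_shots <- c(SixYardBox = 1, PenaltyArea = 6, Penalties = 0, OutOfBox = 8)

# Total xG is the sum over locations of (number of shots * xG per shot).
total_xg <- sum(match_shots * xg_values[names(match_shots)])
print(round(total_xg, 2))
```

In this made-up example, eight shots from outside the box add less xG in total than a single shot from inside the six-yard box.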

Expected goals, then, can perhaps help teams play more intelligently, with those embracing the advance of statistics in the sport gaining a not-so-marginal edge over their competition.


Riley’s xG model:
The bar chart was created with ggplot2 in RStudio.
Ted Knutson’s tweets for the shot maps: &
Title image:


Code used:
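library(ggplot2)
# Assumes `data` is a data frame with Location and Conversion columns, as in Figure 3
ggplot(data) +
  # One bar per shot location, with height equal to the conversion rate
  geom_bar(aes(x = Location, y = Conversion, fill = Location), stat = 'identity', show.legend = FALSE) +
  labs(y = 'Conversion (%)', x = 'Shot Location', title = 'Figure 1', subtitle = 'Conversion Rate by Shot Type') +
  # Print the conversion rate just above each bar
  geom_text(aes(label = Conversion, x = Location, y = Conversion), vjust = -.2)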



