Exploring European Football Transfer Networks

Exploring a network of football teams and the transactions they made between 2018 and 2021. An Edgelist maybe?

Isha Akshita Mahajan, Ankit Kumar (UMass Amherst)
2022-05-08

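Load Required Packages

library(tidyverse)  # read_csv(), select(), filter(), %>%
library(knitr)      # kable()
library(igraph)     # graph construction and network measures
library(intergraph) # asNetwork(): igraph -> statnet network object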

Load the data

I begin by converting the data into an edgelist format. I select the club_from and club_to columns, whose clubs are the nodes of the network. The relationship I am interested in is which pairs of clubs account for most of the transfers, i.e. from where and to where players most often move. Alongside the from and to columns, I keep the player's name and the transfer fee (the season of each transfer is not used yet). I also remove loan transfers, free transfers and transfers without a non-zero numeric fee, as they don't seem relevant at the moment.

#load data from CSV
transfers <- read_csv("/Users/isha/Desktop/GitHub/transfernetworks.csv")
#select relevant data for edgelist format and drop free, loan and zero-fee transfers
data <- transfers %>% 
  select(club_from, club_to, name, fee) %>% 
  filter(fee != "free transfer") %>% 
  filter(fee != "loan transfer") %>% 
  filter(fee != 0)
#convert fee to numeric; values that cannot be parsed become NA
data$fee <- as.numeric(data$fee) 
#drop transfers without a numeric fee
data <- data %>% 
  filter(!is.na(fee))
any(is.na(data$fee))
[1] FALSE
kable(head(data))
|club_from         |club_to             |name           |      fee|
|:-----------------|:-------------------|:--------------|--------:|
|Aston Villa       |Manchester City     |Jack Grealish  | 1.17e+08|
|Inter Milan       |Chelsea FC          |Romelu Lukaku  | 1.13e+08|
|Borussia Dortmund |Manchester United   |Jadon Sancho   | 8.50e+07|
|ACF Fiorentina    |Juventus FC         |Dušan Vlahović | 8.10e+07|
|Real Madrid       |Manchester United   |Raphaël Varane | 4.00e+07|
|Inter Milan       |Paris Saint-Germain |Achraf Hakimi  | 6.60e+07|

Create Edgelist (Part 1)

The resulting network consists of 867 nodes (clubs) and 4386 ties (transfers). Each edge carries the name of the transferred player as an attribute, and its weight is the fee for which the player was transferred. The network is directed because players move from one club to another.
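Because the same two clubs can be linked by several transfers, the graph has parallel edges (the statnet print-out reports `multiple = TRUE`). The following minimal base-R sketch, using a hypothetical five-transfer edge list (clubs `A` to `C`, players `p1` to `p5` and all fees are made up), shows how nodes and edges are counted from an edgelist and how repeated club pairs could be collapsed into a fee-weighted adjacency matrix. It illustrates the idea only and is not the code used for the analysis.

```r
# Hypothetical edge list: one row per transfer (clubs, players and fees are made up)
edges <- data.frame(
  club_from = c("A", "A", "B", "C", "A"),
  club_to   = c("B", "B", "C", "A", "C"),
  name      = c("p1", "p2", "p3", "p4", "p5"),
  fee       = c(10, 5, 20, 7, 3),
  stringsAsFactors = FALSE
)

# Nodes are all clubs that appear as a seller or a buyer
nodes   <- sort(unique(c(edges$club_from, edges$club_to)))
n_nodes <- length(nodes)   # counterpart of vcount()
n_edges <- nrow(edges)     # counterpart of ecount(): one edge per transfer

# Collapse parallel transfers: entry [from, to] is the total fee of all
# players who moved from the row club to the column club
adj <- matrix(0, n_nodes, n_nodes, dimnames = list(nodes, nodes))
for (k in seq_len(n_edges)) {
  i <- edges$club_from[k]
  j <- edges$club_to[k]
  adj[i, j] <- adj[i, j] + edges$fee[k]
}
print(adj)
```

Summing fees is one possible choice; counting transfers per pair instead would weight each tie by frequency rather than money.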

#create a directed igraph object: the first two columns (club_from, club_to)
#become the nodes, and the remaining columns (name, fee) become edge attributes
ig <- graph_from_data_frame(data, directed = TRUE)
# add edge attribute weight i.e. transfer fee (NAs were already removed above)
ig <- set_edge_attr(ig, "weight", value = data$fee)
# add edge attribute season of transfer (requires a season column in data)
#ig <- set_edge_attr(ig, "season", value = data$season)
# add node attribute i.e. league the club belongs to
#ig <- set_vertex_attr(ig, "league", value = node_attr)
#delete the "fee" edge attribute, which duplicates weight
ig <- delete_edge_attr(ig, "fee")
#check summary of the igraph object
summary(ig)
IGRAPH 1b96174 DNW- 867 4386 -- 
+ attr: name (v/c), name (e/c), weight (e/n)
#convert the igraph object into a statnet network object with intergraph
network <- intergraph::asNetwork(ig) 
network
 Network attributes:
  vertices = 867 
  directed = TRUE 
  hyper = FALSE 
  loops = FALSE 
  multiple = TRUE 
  bipartite = FALSE 
  total edges= 4386 
    missing edges= 0 
    non-missing edges= 4386 

 Vertex attribute names: 
    vertex.names 

 Edge attribute names not shown 
#plot the statnet network object
plot(network)

Exploring Network Structures

#count the number of nodes
vcount(ig)
[1] 867
# count the number of edges
ecount(ig)
[1] 4386

The network consists of 867 nodes and 4386 edges. This means that there are 867 football clubs in our network, and we are going to explore the 4386 transfers of players that took place over four years (2018 to 2021), i.e. eight transfer windows.

# look at the dyad census
dyad_census(ig)
$mut
[1] 201

$asym
[1] 3468

$null
[1] 371742

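There are 201 mutual, 3468 asymmetric and 371742 null dyads. A mutual dyad is a pair of clubs that transferred players to each other, an asymmetric dyad is a pair with transfers in only one direction, and a null dyad is a pair with no transfers between them.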
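The three dyad counts always add up to the number of unordered club pairs, n(n - 1)/2; with 867 clubs that is 867 × 866 / 2 = 375411 = 201 + 3468 + 371742. The sketch below computes a dyad census in base R for a hypothetical five-club edge list (clubs `A` to `E` are made up), treating repeated transfers between the same ordered pair as a single tie. It is an illustration of the definition, not igraph's implementation.

```r
# Hypothetical directed transfers; A -> C appears twice (a repeated transfer)
from  <- c("A", "B", "A", "A", "C")
to    <- c("B", "A", "C", "C", "D")
clubs <- c("A", "B", "C", "D", "E")   # "E" made no transfers
n <- length(clubs)

# 0/1 adjacency: 1 if at least one transfer went from the row club to the column club
adj <- matrix(0L, n, n, dimnames = list(clubs, clubs))
adj[cbind(match(from, clubs), match(to, clubs))] <- 1L

# Visit each unordered pair {i, j} once (i < j) and look at both directions
pair <- upper.tri(adj)
ij <- adj[pair]      # transfers i -> j
ji <- t(adj)[pair]   # transfers j -> i

census <- c(
  mut  = sum(ij == 1 & ji == 1),   # transfers in both directions
  asym = sum(ij != ji),            # transfers in one direction only
  null = sum(ij == 0 & ji == 0)    # no transfers at all
)
census   # mut 1, asym 2, null 7
```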

# look at the triad census
triad_census(ig)
 [1] 105140687   2584373    450247     12854     17067     27123
 [7]      4389      3716      1486       339       380       182
[13]       194       306       134        28
triangles(ig)
+ 8007/867 vertices, named, from 1b96174:
   [1] SL Benfica                      
   [2] Borussia Dortmund               
   [3] Eintracht Frankfurt             
   [4] SL Benfica                      
   [5] Borussia Dortmund               
   [6] PSV Eindhoven                   
   [7] SL Benfica                      
   [8] Borussia Dortmund               
   [9] CA Boca Juniors                 
  [10] SL Benfica                      
+ ... omitted several vertices
is_directed(ig)
[1] TRUE
is_weighted(ig)
[1] TRUE
is_bipartite(ig)
[1] FALSE

The results show that the network is directed, i.e. players transfer from one club to another.

The network is weighted: the weight of each edge is the fee for which the transfer was made.

The network is not bipartite: the clubs cannot be split into two sets (for example, selling clubs and buying clubs) with transfers running only between the sets. This is consistent with the triangles found above, since a bipartite graph cannot contain a triangle.
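The triangles(ig) output above lists the vertices of each triangle as consecutive triples, so its 8007 vertices correspond to 8007 / 3 = 2669 triangles (edge direction is ignored). If the FALSE for bipartiteness comes from igraph's is_bipartite(), note that this function only reports whether the graph has a `type` vertex attribute; it does not test the structure. The base-R sketch below does both checks structurally on a hypothetical four-club graph: it counts triangles from the trace of A³ and attempts a breadth-first two-colouring, which succeeds exactly when the graph is bipartite. The clubs and links are made up.

```r
# Hypothetical undirected graph: directions dropped, repeated transfers collapsed
clubs <- c("A", "B", "C", "D")
links <- rbind(c("A", "B"), c("B", "C"), c("A", "C"), c("C", "D"))
idx <- cbind(match(links[, 1], clubs), match(links[, 2], clubs))
adj <- matrix(0, 4, 4, dimnames = list(clubs, clubs))
adj[idx] <- 1
adj[idx[, 2:1]] <- 1   # mirror the links so the matrix is symmetric

# Each triangle adds 6 closed walks of length 3 to the trace of A^3
# (3 starting vertices x 2 directions around the triangle)
a3 <- adj %*% adj %*% adj
n_triangles <- sum(diag(a3)) / 6

# Breadth-first 2-colouring: fails as soon as two neighbours share a colour
two_colourable <- function(adj) {
  n <- nrow(adj)
  colour <- rep(NA_integer_, n)
  for (start in seq_len(n)) {
    if (!is.na(colour[start])) next
    colour[start] <- 0L
    queue <- start
    while (length(queue) > 0) {
      v <- queue[1]
      queue <- queue[-1]
      for (u in which(adj[v, ] > 0)) {
        if (is.na(colour[u])) {
          colour[u] <- 1L - colour[v]
          queue <- c(queue, u)
        } else if (colour[u] == colour[v]) {
          return(FALSE)
        }
      }
    }
  }
  TRUE
}

n_triangles            # 1 (A-B-C)
two_colourable(adj)    # FALSE: the triangle is an odd cycle

# Removing the A-C link leaves the path A-B-C-D, which is bipartite
chain <- adj
chain["A", "C"] <- chain["C", "A"] <- 0
two_colourable(chain)  # TRUE
```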

Transitivity

#get global clustering coefficient: igraph
transitivity(ig, type="global")
[1] 0.1088854
#get average local clustering coefficient: igraph
transitivity(ig, type="average")
[1] 0.1058294

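The global transitivity of the graph is 0.1088854. It is the ratio of closed triplets to all connected triplets (equivalently, three times the number of triangles divided by the number of connected triplets), so about 11% of connected triplets of clubs are closed into triangles. igraph treats the directed graph as undirected for this measure.

The average transitivity, 0.1058294, is the mean of the local clustering coefficients of the clubs, where a club's local coefficient is the share of pairs of its transfer partners that have also transferred players between themselves.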
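Both coefficients can be computed from an undirected 0/1 adjacency matrix A: the diagonal of A³ counts closed walks of length 3, so each vertex's triangle count is diag(A³)/2, and a vertex of degree k has k(k - 1)/2 pairs of neighbours. The base-R sketch below applies this to a hypothetical four-club graph. Following what we understand to be igraph's default for the average (isolates = "NaN"), clubs with fewer than two partners are left out of the average; treat that detail as an assumption.

```r
# Hypothetical undirected graph: triangle A-B-C plus a pendant club D on C
clubs <- c("A", "B", "C", "D")
links <- rbind(c("A", "B"), c("B", "C"), c("A", "C"), c("C", "D"))
idx <- cbind(match(links[, 1], clubs), match(links[, 2], clubs))
adj <- matrix(0, 4, 4, dimnames = list(clubs, clubs))
adj[idx] <- 1
adj[idx[, 2:1]] <- 1   # symmetric: directions are ignored

deg         <- rowSums(adj)
closed_at   <- diag(adj %*% adj %*% adj) / 2   # triangles through each club
triplets_at <- deg * (deg - 1) / 2             # pairs of partners of each club

# Global: 3 x triangles / connected triplets = closed triplets / all triplets
global <- sum(closed_at) / sum(triplets_at)

# Local: per-club share of partner pairs that are linked (undefined if degree < 2)
local <- ifelse(triplets_at > 0, closed_at / triplets_at, NaN)
average_local <- mean(local[triplets_at > 0])

global          # 0.6
average_local   # 7/9, about 0.778
```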

Path Lengths

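The average path length in the weighted network is 5689620. Because the edge weights are transfer fees, this value is measured in fee units: it averages the lowest total fee along directed chains of transfers between connected clubs, rather than the number of steps between them.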

[1] 5689620

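The shortest_paths function enables us to look at the shortest paths between two nodes. Since the graph is weighted, these are the paths with the lowest total transfer fee, which is why some of them pass through many clubs. Let's explore some shortest paths between football clubs of various leagues.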

shortest_paths(ig,"Chelsea FC", "Liverpool FC")$vpath[[1]]
+ 6/867 vertices, named, from 1b96174:
[1] Chelsea FC        Spartak Moscow    SC Freiburg      
[4] 1.FC Union Berlin FC Schalke 04     Liverpool FC     
shortest_paths(ig, "FC Porto", "Juventus FC")$vpath[[1]]
+ 6/867 vertices, named, from 1b96174:
[1] FC Porto       AS Roma        ACF Fiorentina Hellas Verona 
[5] SS Lazio       Juventus FC   
shortest_paths(ig, "Bayern Munich", "Aston Villa")$vpath[[1]]
+ 9/867 vertices, named, from 1b96174:
[1] Bayern Munich       TSG 1899 Hoffenheim VfB Stuttgart      
[4] SC Braga            Olympiacos Piraeus  Red Star Belgrade  
[7] UD Las Palmas       LOSC Lille          Aston Villa        
distances(ig,"Chelsea FC", "Real Madrid")
           Real Madrid
Chelsea FC       8e+05
distances(ig, "Bayern Munich", "Chelsea FC")
              Chelsea FC
Bayern Munich    1050000

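With fees as weights, the distance from Chelsea FC to Real Madrid is 800,000 and the distance from Bayern Munich to Chelsea FC is 1,050,000. These are the total fees along the cheapest chain of transfers, not numbers of intermediate clubs.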
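Because the `weight` attribute holds transfer fees, igraph's shortest_paths() and distances() add up fees along a path, so the "shortest" route is the cheapest chain of transfers, which may pass through many more clubs than the route with the fewest steps (compare the nine-club Bayern Munich to Aston Villa path above). Passing `weights = NA` to these functions ignores the weights and counts steps instead. The base-R sketch below implements Dijkstra's algorithm on a hypothetical four-club fee matrix to show the effect; it assumes one weight per ordered pair, so parallel transfers would first need to be reduced (for example to their minimum fee).

```r
# Dijkstra's algorithm on a square weight matrix (Inf = no transfer)
dijkstra <- function(w, source) {
  clubs   <- rownames(w)
  d       <- setNames(rep(Inf, length(clubs)), clubs)
  prev    <- setNames(rep(NA_character_, length(clubs)), clubs)
  visited <- setNames(rep(FALSE, length(clubs)), clubs)
  d[source] <- 0
  while (!all(visited)) {
    cand <- d
    cand[visited] <- Inf
    v <- names(which.min(cand))
    if (is.infinite(cand[[v]])) break   # the rest is unreachable
    visited[v] <- TRUE
    for (u in names(which(is.finite(w[v, ])))) {
      alt <- d[[v]] + w[v, u]
      if (alt < d[[u]]) {
        d[u] <- alt
        prev[u] <- v
      }
    }
  }
  list(dist = d, prev = prev)
}

# Walk the predecessor links back from the target to the source
path_to <- function(prev, target) {
  path <- target
  while (!is.na(prev[[path[1]]])) path <- c(prev[[path[1]]], path)
  path
}

# Hypothetical fees: a direct A -> D transfer costs 100, a three-step chain 30
clubs <- c("A", "B", "C", "D")
w <- matrix(Inf, 4, 4, dimnames = list(clubs, clubs))
w["A", "D"] <- 100
w["A", "B"] <- 10
w["B", "C"] <- 10
w["C", "D"] <- 10

res <- dijkstra(w, "A")
res$dist[["D"]]          # 30
path_to(res$prev, "D")   # "A" "B" "C" "D"
```

The cheapest route from A to D uses three edges instead of one, which is the same effect that makes the weighted paths between real clubs long.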

Component Structure

names(igraph::components(ig))
[1] "membership" "csize"      "no"        
igraph::components(ig)$no
[1] 20
igraph::components(ig)$csize
 [1] 829   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2
[17]   2   2   2   2
#igraph::components(ig)$membership

There are 20 (weakly connected) components in this network. The largest component consists of 829 nodes, and each of the remaining nineteen components consists of two nodes.
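igraph::components() uses weak connectivity by default, so two clubs fall in the same component whenever a chain of transfers links them, regardless of direction; the sizes account for every club, since 829 + 19 × 2 = 867. The minimal union-find sketch below, in base R, mimics the `membership`, `csize` and `no` fields for a hypothetical seven-club edge list (clubs and links are made up).

```r
# Weak components via union-find: edge direction is ignored
weak_components <- function(from, to, nodes) {
  parent <- setNames(nodes, nodes)   # every club starts as its own root
  find <- function(x) {
    while (parent[[x]] != x) x <- parent[[x]]
    x
  }
  for (k in seq_along(from)) {
    ra <- find(from[k])
    rb <- find(to[k])
    if (ra != rb) parent[[ra]] <- rb   # merge the two components
  }
  roots <- vapply(nodes, find, character(1))
  membership <- match(roots, unique(roots))
  list(
    membership = setNames(membership, nodes),
    csize      = tabulate(membership),
    no         = max(membership)
  )
}

# Hypothetical transfers forming three separate groups of clubs
comp <- weak_components(
  from  = c("A", "B", "D", "F"),
  to    = c("B", "C", "E", "G"),
  nodes = c("A", "B", "C", "D", "E", "F", "G")
)
comp$no      # 3
comp$csize   # 3 2 2
```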