In this post I go over how you can extract some extra information from the play-by-play movement animations on stats.nba.com.
import requests
import pandas as pd
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
from IPython.display import IFrame
sns.set_color_codes()
sns.set_style("white")
The play we will be extracting information from is this one from game 5 of the Clippers-Rockets playoff series. In the play James Harden drives to the basket, collapsing the Clippers defense, and then passes to Trevor Ariza for an easy 3.
I've embedded the movement animation below.
IFrame('http://stats.nba.com/movement/#!/?GameID=0041400235&GameEventID=308',
       width=700, height=400)
url = "http://stats.nba.com/stats/locations_getmoments/?eventid=308&gameid=0041400235"
Let's extract our data using requests.
# Get the webpage
response = requests.get(url)
# Take a look at the keys from the dict
# representing the JSON data
response.json().keys()
The data we want is found within home (the home players' data), visitor (the visiting players' data), and moments (the data the animation above uses to plot the player movements).
# A dict containing home players data
home = response.json()["home"]
# A dict containing visiting players data
visitor = response.json()["visitor"]
# A list containing each moment
moments = response.json()["moments"]
Let's take a look at what information the home dict contains.
home
The visitor dict contains the same type of information, but for the Clippers.
visitor
Now let's take a look at the moments list.
# Check the length
len(moments)
The length tells us that there are 700 items/moments that compose the animation above. But what information do these moments contain? Let's take a look at the first one.
moments[0]
First off, we see that each moment (each item in moments) is a list containing a bunch of information. Let's go through the items in that list one by one.
- The 1st item in moments[0] is the period or quarter that this moment occurred in.
- I don't know what the 2nd item represents. Let me know if you are able to figure it out.
- The 3rd item is the time left on the game clock, in seconds.
- The 4th item is the time left on the shot clock.
- I don't know what the 5th item represents.
- The 6th item is a list of 11 lists, each containing the coordinates for a player on the court or the coordinates of the ball.
  - The first of these 11 lists contains information on the ball.
    - The first 2 items represent the teamid and playerid values that identify this list as the ball.
    - The next 2 items are the x and y values that represent the location of the ball on the court.
    - The 5th and final item represents the radius of the ball. This value changes throughout the animation depending on the elevation of the ball: the greater the radius, the higher up the ball is. So if a player shoots the ball, the ball increases in size, reaches its maximum size at the apex of the shooting arc, and then decreases in size as it falls.
  - The next 10 lists within this 6th item represent the 10 players on the court. The information within each of these lists is the same as it is for the ball.
    - The first 2 items are the teamid and playerid that identify this list as a specific player.
    - The next 2 items represent the x and y coordinates for the player's location on the court.
    - The last item is the radius of the player, which is irrelevant.
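To make the layout above concrete, here's a minimal sketch that unpacks one moment into named fields. The function name parse_moment and the sample moment are made up for illustration (real moments have 10 player lists, not 2). The 2nd and 5th items are kept as opaque values because their meaning is unknown.

```python
from typing import Any, Dict, List

def parse_moment(moment: List[Any]) -> Dict[str, Any]:
    """Hypothetical helper: label the pieces of one raw moment list.

    Assumes the field order described above; the 2nd and 5th items
    are returned untouched because their meaning is unknown.
    """
    period, unknown_2, game_clock, shot_clock, unknown_5, locations = moment[:6]
    # The first location list is the ball, the rest are players
    ball, players = locations[0], locations[1:]
    return {
        "period": period,
        "game_clock": game_clock,
        "shot_clock": shot_clock,
        "ball": {"x": ball[2], "y": ball[3], "radius": ball[4]},
        "players": [
            {"team_id": p[0], "player_id": p[1], "x": p[2], "y": p[3]}
            for p in players
        ],
        "unknown": (unknown_2, unknown_5),
    }

# Made-up moment with only 2 players to keep it short
fake_moment = [3, 0, 706.0, 10.1, None,
               [[-1, -1, 40.0, 25.0, 5.5],
                [1, 100, 41.0, 24.0, 0.0],
                [2, 200, 60.0, 30.0, 0.0]]]
parsed = parse_moment(fake_moment)
```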
Now that we have an idea of what the moments data represents, let's put it into a pandas DataFrame.
First we create the column labels for the DataFrame.
# Column labels
headers = ["team_id", "player_id", "x_loc", "y_loc",
           "radius", "moment", "game_clock", "shot_clock"]
Then we build a single list in which each entry holds one player's (or the ball's) data for one moment.
# Initialize our new list
player_moments = []
# enumerate gives each moment's index directly; moments.index(moment)
# rescans the whole list on every call and returns the first match
for moment_idx, moment in enumerate(moments):
    # For each player/ball in the list found within each moment
    for player in moment[5]:
        # Add additional information to each player/ball:
        # the index of the moment, plus its game clock and shot clock values.
        # Building a new list leaves the original moments data untouched
        player_moments.append(player + [moment_idx, moment[2], moment[3]])
# inspect our list
player_moments[0:11]
Pass our newly created list of moments, along with our column labels, into pd.DataFrame to create our DataFrame.
df = pd.DataFrame(player_moments, columns=headers)
df.head(11)
We are not done yet. We should add columns that contain player names and jersey numbers. First, let's get all the players into one list.
# Combine the home and visiting players into one list.
# Concatenating with + builds a new list, so home["players"] is left unchanged
players = home["players"] + visitor["players"]
Using the players list we can create a dictionary with the player ID as the key and a list containing the player name and jersey number as the value.
# initialize new dictionary
id_dict = {}
# Map each player ID to [full name, jersey number]
for player in players:
    id_dict[player['playerid']] = [player["firstname"] + " " + player["lastname"],
                                   player["jersey"]]
id_dict
Let's update id_dict to include an entry for the ball, which has a player ID of -1.
id_dict.update({-1: ['ball', np.nan]})
Then create a player_name column and a player_jersey column using the map method on the player_id column. We map an anonymous function, written with lambda, that returns the proper player name or jersey number based on the player_id value passed into it.
In other words, the code below iterates through the player IDs in the player_id column and passes each one into the anonymous function. The function returns the player name (or jersey number) associated with that player ID, and those values are added to our DataFrame.
df["player_name"] = df.player_id.map(lambda x: id_dict[x][0])
df["player_jersey"] = df.player_id.map(lambda x: id_dict[x][1])
df.head(11)
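Since map also accepts a dict, an alternative sketch is to build one flat lookup dict per attribute and pass it straight to map. The IDs and names below are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical IDs and names, purely for illustration
id_dict = {101: ["Player A", "1"], 202: ["Player B", "2"], -1: ["ball", np.nan]}
df = pd.DataFrame({"player_id": [-1, 101, 202, 101]})

# Build one flat dict per attribute, then let Series.map look values up
name_map = {pid: info[0] for pid, info in id_dict.items()}
jersey_map = {pid: info[1] for pid, info in id_dict.items()}
df["player_name"] = df.player_id.map(name_map)
df["player_jersey"] = df.player_id.map(jersey_map)
```

One design difference: with a dict, any player ID missing from the lookup becomes NaN instead of raising a KeyError the way id_dict[x] inside the lambda does.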
Plotting the Movements
Let's plot James Harden's movements throughout the animation. We can draw the court using the court from the animation on stats.nba.com. You can find the SVG here. I converted it into a PNG file to make it easier to plot using matplotlib. Also note that every 1 unit on the x or y-axis represents 1 foot on the basketball court.
# get Harden's movements
harden = df[df.player_name=="James Harden"]
# read in the court png file
court = plt.imread("fullcourt.png")
plt.figure(figsize=(15, 11.5))
# Plot the movements as a scatter plot
# using a colormap to show change in game clock
plt.scatter(harden.x_loc, harden.y_loc, c=harden.game_clock,
cmap=plt.cm.Blues, s=1000, zorder=1)
# Darker colors represent moments earlier on in the game
cbar = plt.colorbar(orientation="horizontal")
cbar.ax.invert_xaxis()
# This plots the court
# zorder=0 sets the court lines underneath Harden's movements
# extent sets the x and y axis values to plot the image within.
# The original animation plots in the SVG coordinate space
# which has x=0, and y=0 at the top left.
# So, we set the axis values the same way in this plot.
# In the list we pass to extent 0,94 representing the x-axis
# values and 50,0 representing the y-axis values
plt.imshow(court, zorder=0, extent=[0,94,50,0])
# extend the x-values beyond the court b/c Harden
# goes out of bounds
plt.xlim(0,101)
plt.show()
We can also recreate most of the court using just matplotlib Patches. Instead of using the SVG coordinate system, we will use the typical Cartesian coordinate system, so our y-values will be negative instead of positive.
from matplotlib.patches import Circle, Rectangle, Arc
# Function to draw the basketball court lines
def draw_court(ax=None, color="gray", lw=1, zorder=0):
    if ax is None:
        ax = plt.gca()
    # Creates the out of bounds lines around the court
    outer = Rectangle((0,-50), width=94, height=50, color=color,
                      zorder=zorder, fill=False, lw=lw)
    # The left and right basketball hoops
    l_hoop = Circle((5.35,-25), radius=.75, lw=lw, fill=False,
                    color=color, zorder=zorder)
    r_hoop = Circle((88.65,-25), radius=.75, lw=lw, fill=False,
                    color=color, zorder=zorder)
    # Left and right backboards
    l_backboard = Rectangle((4,-28), 0, 6, lw=lw, color=color,
                            zorder=zorder)
    r_backboard = Rectangle((90, -28), 0, 6, lw=lw, color=color,
                            zorder=zorder)
    # Left and right paint areas
    l_outer_box = Rectangle((0, -33), 19, 16, lw=lw, fill=False,
                            color=color, zorder=zorder)
    l_inner_box = Rectangle((0, -31), 19, 12, lw=lw, fill=False,
                            color=color, zorder=zorder)
    r_outer_box = Rectangle((75, -33), 19, 16, lw=lw, fill=False,
                            color=color, zorder=zorder)
    r_inner_box = Rectangle((75, -31), 19, 12, lw=lw, fill=False,
                            color=color, zorder=zorder)
    # Left and right free throw circles
    l_free_throw = Circle((19,-25), radius=6, lw=lw, fill=False,
                          color=color, zorder=zorder)
    r_free_throw = Circle((75, -25), radius=6, lw=lw, fill=False,
                          color=color, zorder=zorder)
    # Left and right corner 3-PT lines
    # a represents the top lines
    # b represents the bottom lines
    l_corner_a = Rectangle((0,-3), 14, 0, lw=lw, color=color,
                           zorder=zorder)
    l_corner_b = Rectangle((0,-47), 14, 0, lw=lw, color=color,
                           zorder=zorder)
    r_corner_a = Rectangle((80, -3), 14, 0, lw=lw, color=color,
                           zorder=zorder)
    r_corner_b = Rectangle((80, -47), 14, 0, lw=lw, color=color,
                           zorder=zorder)
    # Left and right 3-PT line arcs
    l_arc = Arc((5,-25), 47.5, 47.5, theta1=292, theta2=68, lw=lw,
                color=color, zorder=zorder)
    r_arc = Arc((89, -25), 47.5, 47.5, theta1=112, theta2=248, lw=lw,
                color=color, zorder=zorder)
    # half_court
    half_court = Rectangle((47,-50), 0, 50, lw=lw, color=color,
                           zorder=zorder)
    hc_big_circle = Circle((47, -25), radius=6, lw=lw, fill=False,
                           color=color, zorder=zorder)
    hc_sm_circle = Circle((47, -25), radius=2, lw=lw, fill=False,
                          color=color, zorder=zorder)
    court_elements = [l_hoop, l_backboard, l_outer_box, outer,
                      l_inner_box, l_free_throw, l_corner_a,
                      l_corner_b, l_arc, r_hoop, r_backboard,
                      r_outer_box, r_inner_box, r_free_throw,
                      r_corner_a, r_corner_b, r_arc, half_court,
                      hc_big_circle, hc_sm_circle]
    # Add the court elements onto the axes
    for element in court_elements:
        ax.add_patch(element)
    return ax
plt.figure(figsize=(15, 11.5))
# Plot the movements as a scatter plot
# using a colormap to show change in game clock
plt.scatter(harden.x_loc, -harden.y_loc, c=harden.game_clock,
cmap=plt.cm.Blues, s=1000, zorder=1)
# Darker colors represent moments earlier on in the game
cbar = plt.colorbar(orientation="horizontal")
# invert the colorbar to have higher numbers on the left
cbar.ax.invert_xaxis()
draw_court()
plt.xlim(0, 101)
plt.ylim(-50, 0)
plt.show()
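The scatter call above flips the y-values inline with -harden.y_loc. If you plot many players in Cartesian space, a small helper keeps that conversion in one place. to_cartesian is a hypothetical name, and the sample locations are made up.

```python
import pandas as pd

def to_cartesian(locs: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy with y flipped from SVG space
    (origin at the top left, y growing downward) to Cartesian (y growing upward)."""
    out = locs.copy()
    out["y_loc"] = -out["y_loc"]
    return out

# Made-up locations in SVG space: (0, 0) is the top-left corner of the court
svg_locs = pd.DataFrame({"x_loc": [0.0, 47.0, 94.0], "y_loc": [0.0, 25.0, 50.0]})
cart_locs = to_cartesian(svg_locs)
```

Returning a copy, rather than negating in place, means the original DataFrame stays in SVG coordinates for the PNG-based plot.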
Calculating distance traveled
We can calculate the distance traveled by a player by computing the Euclidean distance between each pair of consecutive points and then adding those distances up.
Here's a Stack Overflow link about getting the Euclidean distance between consecutive points.
def travel_dist(player_locations):
    # get the differences for each column
    diff = np.diff(player_locations, axis=0)
    # square the differences and add them,
    # then get the square root of that sum
    dist = np.sqrt((diff ** 2).sum(axis=1))
    # Then return the sum of all the distances
    return dist.sum()
# Harden's travel distance
dist = travel_dist(harden[["x_loc", "y_loc"]])
dist
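As a quick sanity check of the consecutive-distance approach, here's travel_dist (repeated so the block runs on its own) applied to a made-up three-point path. The steps are 5 feet (a 3-4-5 triangle) and then 4 feet, so the total should be 9 feet.

```python
import numpy as np

def travel_dist(player_locations):
    # Row-to-row differences: one (dx, dy) pair per step
    diff = np.diff(player_locations, axis=0)
    # Length of each step, then the total path length
    return np.sqrt((diff ** 2).sum(axis=1)).sum()

# Made-up path: (0, 0) -> (3, 4) -> (3, 0)
path = np.array([[0.0, 0.0], [3.0, 4.0], [3.0, 0.0]])
total = travel_dist(path)  # 5.0 + 4.0 = 9.0
```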
We can get the total distance traveled by each player using groupby and apply. We group by player, get each of their coordinate locations, and then apply the above distance function.
player_travel_dist = df.groupby('player_name')[['x_loc', 'y_loc']].apply(travel_dist)
player_travel_dist
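groupby plus apply calls travel_dist once per player. A vectorized sketch of the same calculation uses groupby(...).diff() to get every player's step-to-step changes in one pass. This assumes, as in our df, that each player's rows are already in moment order. The data below is made up.

```python
import numpy as np
import pandas as pd

# Made-up long-format data: one row per player per moment, ordered by moment
df = pd.DataFrame({
    "player_name": ["A", "B", "A", "B", "A", "B"],
    "moment":      [0,   0,   1,   1,   2,   2],
    "x_loc":       [0.0, 10.0, 3.0, 10.0, 3.0, 10.0],
    "y_loc":       [0.0, 10.0, 4.0, 12.0, 0.0, 12.0],
})

# Per-player step sizes: diff() runs within each group, so each player's
# first row is NaN, and sum() skips NaN by default
steps = df.groupby("player_name")[["x_loc", "y_loc"]].diff()
df["step"] = np.hypot(steps["x_loc"], steps["y_loc"])
player_travel_dist = df.groupby("player_name")["step"].sum()
```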
Calculating average speed
Calculating a player's average speed is pretty straightforward: we just divide the distance traveled by the elapsed time.
# get the number of seconds for the play
seconds = df.game_clock.max() - df.game_clock.min()
# feet per second
harden_fps = dist / seconds
# convert to miles per hour
harden_mph = 0.681818 * harden_fps
harden_mph
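The 0.681818 factor is just a unit conversion. An hour has 3600 seconds and a mile has 5280 feet, so 1 ft/s = 3600/5280 mph. Here's a tiny sketch with a hypothetical fps_to_mph helper and made-up numbers:

```python
FEET_PER_MILE = 5280
SECONDS_PER_HOUR = 3600

def fps_to_mph(feet_per_second: float) -> float:
    """Hypothetical helper: convert feet per second to miles per hour."""
    return feet_per_second * SECONDS_PER_HOUR / FEET_PER_MILE

# 3600 / 5280 = 0.681818..., the factor used in the post
factor = fps_to_mph(1.0)
# Made-up example: 44 ft covered in 4 s is 11 ft/s, i.e. 7.5 mph
example_mph = fps_to_mph(44 / 4)
```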
We can get the average speed for each player using the player_travel_dist Series we previously created.
player_speeds = (player_travel_dist/seconds) * 0.681818
player_speeds
Calculating the distance between players
Let's check out the distance between Harden and every other player (and the ball) throughout the play.
First get Harden's locations.
harden_loc = df[df.player_name=="James Harden"][["x_loc", "y_loc"]]
harden_loc.head()
Now let's group by player_name and get the locations for each player and the ball.
group = df[df.player_name!="James Harden"].groupby("player_name")[["x_loc", "y_loc"]]
We can apply a function that uses the euclidean function from the scipy library to group. It returns a list for each player containing the distance between James Harden and that player at each moment of the play.
from scipy.spatial.distance import euclidean
# Function to find the distance between players
# at each moment
def player_dist(player_a, player_b):
    return [euclidean(player_a.iloc[i], player_b.iloc[i])
            for i in range(len(player_a))]
Each player's locations are passed in as player_a in the player_dist function and Harden's locations are passed in as player_b.
harden_dist = group.apply(player_dist, player_b=(harden_loc))
harden_dist
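player_dist calls euclidean once per moment inside a list comprehension. A vectorized sketch (hypothetical name player_dist_vectorized) computes all the distances with a single NumPy call. It keeps the same assumption that both inputs have matching rows in the same moment order.

```python
import numpy as np
import pandas as pd

def player_dist_vectorized(player_a: pd.DataFrame, player_b: pd.DataFrame) -> np.ndarray:
    """Hypothetical helper: distance between two players at each moment.

    Like the loop-based player_dist, rows are paired by position, so both
    frames must have the same number of rows in the same moment order.
    """
    a = player_a[["x_loc", "y_loc"]].to_numpy()
    b = player_b[["x_loc", "y_loc"]].to_numpy()
    # Row-wise Euclidean norm of the coordinate differences
    return np.linalg.norm(a - b, axis=1)

# Made-up locations for two players over two moments
a = pd.DataFrame({"x_loc": [0.0, 1.0], "y_loc": [0.0, 1.0]})
b = pd.DataFrame({"x_loc": [3.0, 1.0], "y_loc": [4.0, 3.0]})
d = player_dist_vectorized(a, b)
```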
Just note that the ball only has 690 items in its list, versus 700 for the players.
len(harden_dist["ball"])
len(harden_dist["Blake Griffin"])
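Because player_dist pairs rows by position, the ball's 690 rows are matched against the first 690 of Harden's 700. If any of the missing ball moments fall before the end of the play, the pairs drift out of sync. One way around that is to merge the two on the moment column we added earlier, so that only moments where both appear are compared. The sketch below does this with a hypothetical aligned_dist helper and made-up data.

```python
import numpy as np
import pandas as pd

def aligned_dist(df: pd.DataFrame, name_a: str, name_b: str) -> pd.Series:
    """Hypothetical helper: distance between two entities, matched on moment.

    An inner merge keeps only moments where both appear, so a missing
    row for one of them cannot shift the pairing.
    """
    cols = ["moment", "x_loc", "y_loc"]
    a = df.loc[df.player_name == name_a, cols]
    b = df.loc[df.player_name == name_b, cols]
    both = a.merge(b, on="moment", suffixes=("_a", "_b"))
    dist = np.hypot(both.x_loc_a - both.x_loc_b, both.y_loc_a - both.y_loc_b)
    return pd.Series(dist.to_numpy(), index=both.moment.to_numpy(), name="dist")

# Made-up data: the ball is missing at moment 1
df = pd.DataFrame({
    "player_name": ["ball", "P", "P", "ball", "P"],
    "moment":      [0,      0,   1,   2,      2],
    "x_loc":       [0.0,    3.0, 9.0, 1.0,    1.0],
    "y_loc":       [0.0,    4.0, 9.0, 1.0,    3.0],
})
ball_to_p = aligned_dist(df, "ball", "P")  # moments 0 and 2 only
```

A positional pairing of the same data would compare the ball at moment 2 with P at moment 1; the merge avoids that.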
Now that we know how to get the distances between players, let's see how James Harden's drive to the basket affects the spacing on the floor.
Let's take another look at the moments animation and go over what happens during Harden's drive.
IFrame('http://stats.nba.com/movement/#!/?GameID=0041400235&GameEventID=308',
       width=700, height=400)
As Harden drives to the hoop, DeAndre Jordan moves off of Dwight Howard to defend the basket, and Matt Barnes switches over to cover Howard (but falls down), leaving Ariza open. Harden sees Ariza and passes him the ball, and Ariza takes the shot as Chris Paul rushes over to defend. All this occurs from about 11:46 to about 11:42 left in the 3rd quarter, while the shot clock runs from about 10.1 seconds when Harden begins his drive to about 6.2 when Ariza releases the ball. We can find more information about Ariza's shot attempt on his shot logs page.
# Boolean mask used to grab the data within the proper time period
# (the game clock is in seconds: 706 is 11:46 and 702 is 11:42)
time_mask = (df.game_clock <= 706) & (df.game_clock >= 702) & \
            (df.shot_clock <= 10.1) & (df.shot_clock >= 6.2)
time_df = df[time_mask]
From the animation it looks like Harden passes the ball with around 7.7 to 7.8 seconds left on the shot clock. We can check the distance between him and the ball to be sure.
ball = time_df[time_df.player_name=="ball"]
harden2 = time_df[time_df.player_name=="James Harden"]
harden_ball_dist = player_dist(ball[["x_loc", "y_loc"]],
harden2[["x_loc", "y_loc"]])
plt.figure(figsize=(12,9))
x = time_df.shot_clock.unique()
y = harden_ball_dist
plt.plot(x, y)
plt.xlim(8, 7)
plt.xlabel("Shot Clock")
plt.ylabel("Distance between Harden and the Ball (feet)")
plt.vlines(7.7, 0, 30, color='gray', lw=0.7)
plt.show()
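Rather than reading the pass off the plot, we could look for the first moment where the ball is more than a few feet from Harden. The sketch below does that with a hypothetical release_time function. The 3-foot cutoff is an assumption, not something the data defines, and the sample values are made up. With the post's data you would pass in the same x and y arrays used in the plot above.

```python
import numpy as np

def release_time(shot_clock, distances, threshold=3.0):
    """Hypothetical helper: first shot-clock value at which the ball is
    farther than `threshold` feet from the player, or None if it never is.

    The 3 ft default is an assumed cutoff for "the ball has left his hands".
    """
    shot_clock = np.asarray(shot_clock, dtype=float)
    distances = np.asarray(distances, dtype=float)
    # Indices of every sample beyond the cutoff, in time order
    away = np.flatnonzero(distances > threshold)
    return float(shot_clock[away[0]]) if away.size else None

# Made-up samples: shot clock counting down, ball leaving around 7.7
clock = [8.0, 7.9, 7.8, 7.7, 7.6]
sample_dist = [1.2, 1.5, 2.0, 3.5, 6.0]
t = release_time(clock, sample_dist)
```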
Let's plot the change in distances between some of the players during this time period: Harden and Jordan, Howard and Barnes, Ariza and Barnes, and Ariza and Paul.
# Boolean mask to get the players we want
player_mask = time_df.player_name.isin(["Trevor Ariza", "DeAndre Jordan",
                                        "Dwight Howard", "Matt Barnes",
                                        "Chris Paul", "James Harden"])
# Group by players and get their locations
group2 = time_df[player_mask].groupby('player_name')[["x_loc", "y_loc"]]
# Get the differences in distances that we want
harden_jordan = player_dist(group2.get_group("James Harden"),
group2.get_group("DeAndre Jordan"))
howard_barnes = player_dist(group2.get_group("Dwight Howard"),
group2.get_group("Matt Barnes"))
ariza_barnes = player_dist(group2.get_group("Trevor Ariza"),
group2.get_group("Matt Barnes"))
ariza_paul = player_dist(group2.get_group("Trevor Ariza"),
group2.get_group("Chris Paul"))
# Create some lists that will help create our plot
# Distance data
distances = [ariza_barnes, ariza_paul, harden_jordan, howard_barnes]
# Labels for each line that we will plot
labels = ["Ariza - Barnes", "Ariza - Paul", "Harden - Jordan", "Howard - Barnes"]
# Colors for each line
colors = sns.color_palette('colorblind', 4)
plt.figure(figsize=(12,9))
# Use enumerate to index the labels and colors and match
# them with the proper distance data
for i, pair_dist in enumerate(distances):
    plt.plot(time_df.shot_clock.unique(), pair_dist, color=colors[i])
    # Label each line at its final value, just right of the plotted range
    y_pos = pair_dist[-1]
    plt.text(6.15, y_pos, labels[i], fontsize=14, color=colors[i])
# Plot a line to indicate when Harden passes the ball
plt.vlines(7.7, 0, 30, color='gray', lw=0.7)
plt.annotate("Harden passes the ball", (7.7, 27),
xytext=(8.725, 26.8), fontsize=12,
arrowprops=dict(facecolor='lightgray', shrink=0.10))
# Create horizontal grid lines
plt.grid(axis='y',color='gray', linestyle='--', lw=0.5, alpha=0.5)
plt.xlim(10.1, 6.2)
plt.title("The Distance (in feet) Between Players \nFrom the Beginning"
" of Harden's Drive up until Ariza Releases his Shot", size=16)
plt.xlabel("Time Left on Shot Clock (seconds)", size=14)
# Get rid of unneeded chart lines
sns.despine(left=True, bottom=True)
plt.show()
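Each pair above needs its own player_dist call. A hypothetical helper, pair_distances, could compute all the pairs at once by pivoting the long DataFrame to wide form, with one x column and one y column per player, indexed by moment. It assumes each player appears at most once per moment, since pivot raises an error on duplicate entries. The sample data is made up.

```python
import numpy as np
import pandas as pd

def pair_distances(df: pd.DataFrame, pairs) -> pd.DataFrame:
    """Hypothetical helper: distance over time for each (name_a, name_b) pair.

    Pivoting puts every player's x and y in columns indexed by moment,
    so each pair becomes one vectorized subtraction. A moment where either
    player is missing yields NaN for that pair.
    """
    x = df.pivot(index="moment", columns="player_name", values="x_loc")
    y = df.pivot(index="moment", columns="player_name", values="y_loc")
    return pd.DataFrame({
        f"{a} - {b}": np.hypot(x[a] - x[b], y[a] - y[b]) for a, b in pairs
    })

# Made-up data: three players over two moments
df = pd.DataFrame({
    "player_name": ["A", "B", "C", "A", "B", "C"],
    "moment":      [0,   0,   0,   1,   1,   1],
    "x_loc":       [0.0, 3.0, 0.0, 0.0, 6.0, 0.0],
    "y_loc":       [0.0, 4.0, 1.0, 0.0, 8.0, 2.0],
})
result = pair_distances(df, [("A", "B"), ("A", "C")])
```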
I created a small Python module, which you can find here, containing some of the functions used in this post.
Comments, Suggestions, Questions, and Contact Info
Leave a comment below if you have any questions, see any issues, or have any suggestions on improving the code.
You can also email me at savvas.tjortjoglou@gmail.com or follow me on Twitter @savvas_tj.