Spatial Crime Analysis in Python

Oct 14, 2025 · Ruslan Klymentiev · 17 min read
Photo by Stefancu Iulian on Unsplash

Introduction

Understanding where crime happens is just as important as understanding why it happens. Spatial crime analysis allows us to explore the geography of criminal activity: uncovering spatial patterns, identifying hotspots, and informing data-driven strategies for prevention and policing.

This tutorial provides a practical introduction to spatial crime analysis using Python, guiding you through several essential techniques for understanding how crime patterns vary across space. You will learn how to:

  • import publicly available data using API requests
  • create density maps to identify spatial hot spots of crime incidents
  • generate summary statistics for neighborhood-level comparisons
  • apply global and local spatial autocorrelation methods (Moran’s I and Local Moran’s I) to detect clustering structures

Python is a powerful alternative to traditional GIS tools like QGIS because it allows you to fully automate workflows, integrate statistical analysis with spatial data, and reproduce results programmatically. This makes it ideal not only for exploration but also for scalable, repeatable crime analysis pipelines. Note that basic programming/Python knowledge is required for this tutorial.

Data and Setup

import pandas as pd
import geopandas as gpd
import seaborn as sns
import numpy as np
import contextily as cx
import requests
import ipywidgets as widgets
import matplotlib.pyplot as plt
from matplotlib.patches import Ellipse
from pointpats import centrography
from libpysal.weights import Queen
from esda.moran import Moran, Moran_Local

We will use publicly available data on homicides from the Toronto Police Service (TPS). According to the dataset documentation, it includes records of all homicide occurrences in Toronto since 2004. The TPS Open Data Portal provides several download options, including CSV, GeoJSON, and Excel formats, making it easy to work with in different environments. It also offers an API endpoint, which is especially useful when you want to automate data retrieval or integrate live updates directly into your analysis scripts. In this tutorial, we’ll take advantage of the API to demonstrate how to import data programmatically in Python. This approach not only saves time but also makes your workflow more reproducible.

To work with spatial data in Python, we’ll use GeoPandas, a powerful open-source library that extends the functionality of pandas to handle geographic data. It allows you to work with vector data (like points, lines, and polygons) in the same way you’d work with tabular data in pandas, while adding built-in support for geometry operations, map plotting, and coordinate transformations.

The core data structure in GeoPandas is the GeoDataFrame, which works just like a regular pandas DataFrame but includes a special geometry column that stores spatial features, for example, the geographic coordinates of each homicide location. This makes it easy to visualize data on a map, perform spatial joins, or calculate geometric properties such as distances and areas.
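
Before pulling the real data, here is a tiny, self-contained sketch (toy coordinates, not the TPS data) of what that looks like in practice: building a GeoDataFrame from points, reprojecting it, and computing distances in meters.

# a minimal illustration with two toy points (not the TPS data)
import geopandas as gpd
from shapely.geometry import Point

toy = gpd.GeoDataFrame(
    {"label": ["A", "B"]},
    geometry=[Point(-79.38, 43.65), Point(-79.40, 43.70)],
    crs="EPSG:4326",  # geographic coordinates (longitude/latitude)
)

# project to Web Mercator so coordinates are expressed in meters
# (note that Web Mercator distances are somewhat distorted away from the equator)
toy_m = toy.to_crs(epsg=3857)

# distance (in projected meters) from each feature to the first point
print(toy_m.distance(toy_m.geometry.iloc[0]))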

# make an API request to get the data

# define query parameters for the API request
fields_to_retrieve = '*'  # retrieve all available fields
file_format = 'geojson'  # request data in GeoJSON format (easy to load into GeoPandas)

# construct the API endpoint URL
url = f"https://services.arcgis.com/S9th0jAJ7bqgIRjw/arcgis/rest/services/Homicides_Open_Data_ASR_RC_TBL_002/FeatureServer/0/query?outFields={fields_to_retrieve}&where=1%3D1&f={file_format}"

# send the API request
response = requests.get(url)
# parse the response as JSON
data = response.json()
# convert the obtained data into a GeoDataFrame
homicides_df = gpd.GeoDataFrame.from_features(data['features'])

# preview the first few records
homicides_df.head()
| | geometry | OBJECTID | EVENT_UNIQUE_ID | OCC_DATE | OCC_YEAR | OCC_MONTH | OCC_DAY | OCC_DOW | OCC_DOY | DIVISION | HOMICIDE_TYPE | HOOD_158 | NEIGHBOURHOOD_158 | HOOD_140 | NEIGHBOURHOOD_140 | LONG_WGS84 | LAT_WGS84 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | POINT (-79.393 43.685) | 1 | GO-2004111878 | 1073106000000 | 2004 | January | 3 | Saturday | 3 | D53 | Other | 098 | Rosedale-Moore Park (98) | 098 | Rosedale-Moore Park (98) | -79.392828 | 43.685026 |
| 1 | POINT (-79.234 43.782) | 2 | GO-2004125755 | 1073538000000 | 2004 | January | 8 | Thursday | 8 | D42 | Shooting | 142 | Woburn North (142) | 137 | Woburn (137) | -79.233852 | 43.781782 |
| 2 | POINT (-79.206 43.811) | 3 | GO-2004136086 | 1073538000000 | 2004 | January | 8 | Thursday | 8 | D42 | Shooting | 146 | Malvern East (146) | 132 | Malvern (132) | -79.205574 | 43.810544 |
| 3 | POINT (-79.434 43.67) | 4 | GO-2004148623 | 1075006800000 | 2004 | January | 25 | Sunday | 25 | D13 | Shooting | 171 | Junction-Wallace Emerson (171) | 093 | Dovercourt-Wallace Emerson-Junction (93) | -79.434387 | 43.670467 |
| 4 | POINT (-79.205 43.823) | 5 | GO-2004148619 | 1075006800000 | 2004 | January | 25 | Sunday | 25 | D42 | Shooting | 144 | Morningside Heights (144) | 131 | Rouge (131) | -79.204958 | 43.822997 |

You can still perform all the usual data analysis operations you would with a regular pandas DataFrame, such as filtering, aggregation, and plotting. For example, we can aggregate homicides by quarter and visualize trends over time to see how the number of occurrences changes across the years.

The OCC_DATE column in the dataset is stored as a timestamp in milliseconds. Here, we convert it to pandas datetime objects so that we can perform date-based operations and groupings easily.

# convert date column from timestamp to datetime format
homicides_df['OCC_DATE_conv'] = homicides_df['OCC_DATE'].apply(lambda x: pd.Timestamp(x, unit='ms'))

# calculate the number of events by the quarter of occurrence (converted from the datetime column)
# and the type of homicide
temp_df = homicides_df\
    .groupby([homicides_df['OCC_DATE_conv'].dt.to_period("Q"), 'HOMICIDE_TYPE'])\
    .agg({"EVENT_UNIQUE_ID": "count"})\
    .reset_index(drop=False)\
    .rename(columns={"EVENT_UNIQUE_ID": "count"})

# convert date column back to timestamp for plotting
temp_df['OCC_DATE_conv'] = temp_df['OCC_DATE_conv'].dt.to_timestamp()

plt.figure(figsize=(10, 5))

sns.lineplot(
    data=temp_df,
    x='OCC_DATE_conv',
    y='count',
    hue='HOMICIDE_TYPE',  # separate lines for each homicide type
    marker="o",  # add markers to emphasize data points
    linewidth=1
)

plt.title("Quarterly Homicides in Toronto")
plt.xlabel("Date")
plt.ylabel("Number of Homicides")
plt.legend(title="Homicide Type")
plt.show()

Now to the spatial analysis. As the columns LONG_WGS84 and LAT_WGS84 suggest, the location coordinates are based on the WGS 84 geographic coordinate system (EPSG:4326). We will convert those geographic coordinates into the Web Mercator projection (EPSG:3857). This projected coordinate system represents the Earth on a flat, two-dimensional plane using meters as units, the same system used by most online mapping services (like OpenStreetMap, Google Maps, or CartoDB).

homicides_df = homicides_df \
    .set_crs(epsg=4326) \
    .to_crs(epsg=3857)

Pin Map and Centrography

Before moving on to more advanced analysis, it’s a good idea to visually inspect the data to ensure everything looks as expected. That’s where the .explore() function from GeoPandas comes in. It allows you to create interactive web maps directly from a GeoDataFrame with just a single line of code — making it easy to pan, zoom, and click on individual features to explore their attributes.

Note that crime locations in this dataset are offset to the nearest intersection to protect privacy. Coordinates are approximate and should not be interpreted as exact addresses.

homicides_df.explore()

The first example computes two fundamental centrographic measures commonly used in spatial crime analysis: the mean center and the standard deviational ellipse. The mean center shows the geographic “average” location of all homicide events, essentially the spatial gravity point of the dataset. The standard deviational ellipse summarizes the shape, orientation, and dispersion of the crime pattern by indicating the primary directional trend (e.g., northeast–southwest) and spread of incidents. Plotting these measures on top of incident points and a basemap makes it easy to understand where homicides concentrate overall and whether their pattern stretches along a particular corridor or axis. This is helpful for detecting city-wide directional trends, comparing different time periods, and supporting strategic policing or resource deployment.

# extract x and y coordinates from geometry
homicides_df['x'] = homicides_df['geometry'].apply(lambda x: x.xy[0][0])
homicides_df['y'] = homicides_df['geometry'].apply(lambda x: x.xy[1][0])

# compute the mean center of all homicide locations
# (i.e., the geographic "average" of all x/y coordinates)
mean_center = centrography.mean_center(homicides_df[["x", "y"]])

# compute the parameters of the Standard Deviational Ellipse
major, minor, rotation = centrography.ellipse(homicides_df[["x", "y"]])

# create a joint plot (scatter plot + marginal distributions of x/y coordinates)
ax = sns.jointplot(
    x="x", y="y", data=homicides_df, height=8, s=4,
)

# add a web basemap to the jointplot's main axis
cx.add_basemap(ax.ax_joint, source=cx.providers.CartoDB.Positron)
# add marginal rug plots to show density along each axis
ax.plot_marginals(sns.rugplot, color="black", height=-.1, clip_on=False)

# plot the mean center on the map
ax.ax_joint.scatter(
    *mean_center, color="red", marker="x", s=50, label="Mean Center"
)

# add vertical and horizontal lines showing the mean center in the marginal distributions
ax.ax_marg_x.axvline(mean_center[0], color="red")
ax.ax_marg_y.axhline(mean_center[1], color="red")

# create the standard deviational ellipse patch
ellipse = Ellipse(
    xy=mean_center,
    width=major * 2,
    height=minor * 2,
    angle=np.rad2deg(rotation),
    facecolor="none",
    edgecolor="red",
    linestyle="--"
)

# add the ellipse to the main jointplot axis
ax.ax_joint.add_patch(ellipse)

# remove axis labels for a cleaner appearance
ax.ax_joint.set_axis_off()

Density Plot (Kernel Density Estimation)

While the mean center and standard deviational ellipse provide a useful summary of the overall spatial distribution of homicides, they don’t show where crimes are most concentrated within the city. To reveal these high-intensity areas, a more informative approach is Kernel Density Estimation (KDE). KDE creates a continuous surface that highlights hot spots, locations where incidents cluster more tightly. To create a density plot, we first need to extract the numeric coordinates from each point’s geometry, because the `kdeplot` function in Seaborn requires regular x–y values rather than geometric objects. Once the coordinates are available, we can plot the incident points on a basemap and overlay a smooth KDE surface to highlight areas where homicides are more concentrated. This results in a clear visual representation of spatial crime intensity across the city.

# plot all homicide points as the base layer
ax = homicides_df\
    .plot(figsize=(10, 10), alpha=0.1)

# add a web basemap underneath (CartoDB Positron)
cx.add_basemap(ax, source=cx.providers.CartoDB.Positron)

# add Kernel Density Estimation (KDE) heatmap to show homicide hotspots
sns.kdeplot(homicides_df, x='x', y='y', fill=True, ax=ax, cmap='Reds', alpha=0.5)

plt.title("Homicides Hot Spots (All Years)")
plt.axis('off')
plt.show()

The plot above shows the density of homicides aggregated over all available years. Sometimes it is useful to aggregate the data for each year separately. If you are using Jupyter Notebooks, you can create an interactive plot using Jupyter Widgets. Unfortunately, the result of the code below cannot be demonstrated in a static post, but a short video demonstration is included.

def plot_homicides(year):
    # filter dataset to include only homicides from the selected year
    year = str(year)
    df_year = homicides_df.query(f"OCC_YEAR == '{year}'")

    # make a plot
    fig, ax = plt.subplots(figsize=(10, 10))
    df_year.plot(ax=ax, alpha=0.5)
    cx.add_basemap(ax, source=cx.providers.CartoDB.Positron)
    # add Kernel Density Estimation (KDE) surface on top of the basemap
    sns.kdeplot(df_year, x='x', y='y', fill=True, ax=ax, cmap='Reds', alpha=0.5)

    plt.title(f"Homicides Hot Spots ({year})", fontsize=14)
    plt.axis('off')
    plt.show()


# create a list of available years from the dataset
years = sorted(homicides_df["OCC_YEAR"].dropna().unique())
# build an interactive slider in Jupyter using ipywidgets
# the slider lets the user choose a year, which automatically triggers plot_homicides()
widgets.interact(plot_homicides, year=widgets.SelectionSlider(
    options=years,
    description='Year:',
    continuous_update=False,
    layout=widgets.Layout(width='600px')
))

Neighborhood-Level Aggregation and Choropleth Maps

In many analytical or policy-oriented contexts, we are also interested in how crime varies across administrative units, such as neighborhoods. Aggregating incidents to these regions allows us to create choropleth maps, which summarize crime counts (or rates) for each area. Of course, you could do this numerically by aggregating the data, for example:

homicides_df\
    .groupby('NEIGHBOURHOOD_158')\
    .agg({"EVENT_UNIQUE_ID": "count"})\
    .sort_values(by="EVENT_UNIQUE_ID", ascending=False)\
    .reset_index(drop=False)\
    .rename(columns={"EVENT_UNIQUE_ID": "count"})\
    .head(5)

| | NEIGHBOURHOOD_158 | count |
|---|---|---|
| 0 | Mount Olive-Silverstone-Jamestown (2) | 43 |
| 1 | Moss Park (73) | 42 |
| 2 | Glenfield-Jane Heights (25) | 39 |
| 3 | Black Creek (24) | 28 |
| 4 | Weston (113) | 27 |

This gives a quick look at the neighborhoods with the highest number of homicides. However, for this tutorial, we want to use the data for spatial demonstrations. One challenge is that our homicide dataset contains only the names of neighborhoods, not their geometries, which are needed for visualization. To solve this, we’ll load another dataset that includes the polygon boundaries of each neighborhood, allowing us to map and analyze homicide counts spatially. Looking through the available datasets on the TPS Data Portal, one can see that the Neighbourhood Crime Rates Open Data dataset has the relevant information. This dataset also includes population and neighborhood area, which can be useful for assessing population density and its relationship with crime rates across neighborhoods.

# we request only the relevant columns: the neighborhood name, ID, and geometric properties
fields_to_retrieve = 'AREA_NAME,HOOD_ID,POPULATION_2024,Shape__Area,Shape__Length'

# set the output format to GeoJSON
file_format = 'geojson'

# construct the API request URL
url = f"https://services.arcgis.com/S9th0jAJ7bqgIRjw/arcgis/rest/services/Neighbourhood_Crime_Rates_Open_Data/FeatureServer/0/query?where=1%3D1&outFields={fields_to_retrieve}&f={file_format}"

# make the request and convert to GeoDataFrame
response = requests.get(url)
data = response.json()
neighborhoods_df = gpd.GeoDataFrame.from_features(data['features'])

neighborhoods_df.sample(5)
| | geometry | AREA_NAME | HOOD_ID | Shape__Area | Shape__Length |
|---|---|---|---|---|---|
| 137 | POLYGON ((-79.526 43.601, -79.526 43.601, -79.... | Long Branch | 19 | 2.261914e+06 | 7808.451620 |
| 112 | POLYGON ((-79.358 43.765, -79.357 43.765, -79.... | St.Andrew-Windfields | 40 | 7.345556e+06 | 12863.420551 |
| 90 | POLYGON ((-79.317 43.667, -79.317 43.666, -79.... | Greenwood-Coxwell | 65 | 1.675166e+06 | 6843.006889 |
| 87 | POLYGON ((-79.356 43.665, -79.356 43.665, -79.... | North Riverdale | 68 | 1.786063e+06 | 5475.708476 |
| 60 | POLYGON ((-79.397 43.696, -79.397 43.696, -79.... | Yonge-St.Clair | 97 | 1.161371e+06 | 5873.159373 |

Once we load the neighborhood boundary data, we can merge it with the homicide dataset to attach the corresponding polygon geometry to each homicide event. Note that neighborhood IDs (HOOD_ID) in the boundary dataset are stored as integers (e.g., 1, 2, 3), while in the homicide dataset they are stored as zero-padded text strings (e.g., “001”, “002”, “003”). To ensure a successful merge, we first need to convert the integer IDs into 3-digit string codes.

Then we will aggregate the data to calculate the count of homicides within each neighborhood.

# convert integer to a zero-padded 3-digit string (e.g., 1 -> "001")
neighborhoods_df['HOOD_ID'] = neighborhoods_df['HOOD_ID']\
    .apply(lambda x: str(x).rjust(3, '0'))

# merge neighbourhood polygons with homicide data
# left-join homicide records to neighbourhoods using matching HOOD_ID fields
# count how many homicide events fall into each neighbourhood
# reset index and rename the count column
# merge counts back into the neighbourhoods_df to retain geometry
combined_df = neighborhoods_df\
    .merge(homicides_df, left_on='HOOD_ID', right_on='HOOD_158', how='left')\
    .groupby(['HOOD_ID'])\
    .agg({'EVENT_UNIQUE_ID': 'count'})\
    .reset_index()\
    .rename(columns={'EVENT_UNIQUE_ID': 'HOM_TOTAL_COUNT'})\
    .merge(neighborhoods_df)

# convert the merged DataFrame into a GeoDataFrame and assign the original
# geographic coordinate system (WGS84, EPSG:4326).
# This ensures that the geometry column is recognized as spatial data
# before transforming it to a projected CRS (e.g., Web Mercator EPSG:3857).
combined_df = gpd.GeoDataFrame(
    combined_df,
    geometry='geometry',
    crs='EPSG:4326'
)
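
Since we also requested POPULATION_2024 from the boundary dataset, the raw counts could optionally be normalized into rates. Here is a minimal sketch, assuming the POPULATION_2024 field is actually returned by the API and survives the merge:

# optional: homicides per 100,000 residents
# (assumes POPULATION_2024 is present in combined_df after the merge)
combined_df["HOM_RATE_100K"] = (
    combined_df["HOM_TOTAL_COUNT"] / combined_df["POPULATION_2024"] * 100_000
)
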
# reproject data to Web Mercator (EPSG:3857) for compatibility with web basemaps,
# then plot neighborhood polygons colored by total homicide count.
ax = combined_df\
    .to_crs(epsg=3857)\
    .plot(
        figsize=(10, 10),
        column="HOM_TOTAL_COUNT",  # variable used to determine color of each polygon
        scheme="FisherJenks",  # statistical classification method (natural breaks)
        cmap="Reds",
        alpha=0.5,
        legend=True,  # show legend
        legend_kwds={'loc': 'lower right', 'title': 'Total Homicide Count (2004-2025)'},
        edgecolor='black'  # edge color for better visibility
    )

# add a web basemap (CartoDB Positron) under the polygons
cx.add_basemap(ax, source=cx.providers.CartoDB.Positron)
# remove axes for a cleaner map visualization
ax.set_axis_off()
plt.show()

Similarly, there is a method to create an interactive map.

combined_df.explore(
    column="HOM_TOTAL_COUNT",
    cmap="Reds",
    style_kwds={'color': 'grey'},
)

Spatial Statistics: Moran’s I and LISA

In the next part of this tutorial, we’ll move from visualization to spatial statistics, focusing on two key measures: Moran’s I and Local Moran coefficients (LISA). These statistics help us understand whether homicide counts are spatially clustered, dispersed, or randomly distributed across Toronto’s neighborhoods. For that, we will use the package ESDA: Exploratory Spatial Data Analysis.

Moran’s I is a global measure of spatial autocorrelation. A positive value indicates that similar values (e.g., high homicide counts) tend to occur near each other, while a negative value suggests a checkerboard-like pattern where high values are surrounded by low values. Values near zero imply no meaningful spatial pattern. Local Moran statistics take this a step further by identifying specific neighborhoods that contribute most to the overall pattern. They highlight “hot spots” (high values surrounded by high values), “cold spots,” and potential spatial outliers. These tools allow us to quantify and map spatial patterns rather than relying on visual intuition alone.
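
For reference, in its standard textbook form the global statistic for a variable $y$ with spatial weights $w_{ij}$ and mean $\bar{y}$ is

$$
I = \frac{n}{\sum_i \sum_j w_{ij}} \,
    \frac{\sum_i \sum_j w_{ij} (y_i - \bar{y})(y_j - \bar{y})}{\sum_i (y_i - \bar{y})^2},
$$

and the Local Moran statistic for area $i$ is proportional to $(y_i - \bar{y}) \sum_j w_{ij} (y_j - \bar{y})$, so each area’s contribution to the global pattern can be mapped individually.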

To compute Moran’s I and Local Moran statistics, we need a way to formally describe which neighborhoods are considered neighbors. This is where a contiguity-based spatial weights matrix comes in. By defining areas as neighbors when they share a border or corner (Queen contiguity), we capture real-world spatial relationships: areas that touch are more likely to influence each other. These weights tell the statistical model how observations are connected in space, allowing Moran’s I to evaluate whether similar homicide values cluster among adjacent neighborhoods. Without a spatial weights matrix, the analysis would treat neighborhoods as independent, ignoring the geographic structure that is central to spatial autocorrelation.

# create a Queen contiguity spatial weights matrix
# neighborhoods are considered neighbors if they share a border OR a corner
w = Queen.from_dataframe(combined_df, geom_col='geometry', ids='AREA_NAME')

# row-standardize the weights so each row sums to 1
w.transform = 'R'

# select the variable for which we want to measure spatial autocorrelation
# in our case it is total homicide counts per neighborhood
y = combined_df["HOM_TOTAL_COUNT"]

# compute Moran's I statistic using the chosen variable and weights matrix
mi = Moran(y, w)

print(f"Moran's I: {mi.I:.3f}")
print(f"p-value: {mi.p_sim:.3f}")
Moran's I: 0.295
p-value: 0.001

For our dataset, the global Moran’s I value is 0.295 with a p-value of 0.001. This indicates a statistically significant positive spatial autocorrelation in homicide counts across Toronto’s neighborhoods. In practical terms, neighborhoods with high homicide counts tend to be located near other high-count neighborhoods, and the same is true for low-count areas. The low p-value confirms that this pattern is unlikely to be due to random chance and reflects a meaningful spatial structure in the data.

After computing the global Moran’s I, the next step is to explore local spatial autocorrelation using the Local Moran statistic (LISA). This helps identify where clusters occur rather than whether clustering exists overall.

# compute Local Moran's I for each neighborhood
li = Moran_Local(y, w)

# store results in the dataframe
combined_df["local_I"] = li.Is  # local Moran statistic for each polygon
combined_df["p_val"] = li.p_sim  # local p-values

# extract cluster quadrant codes:
# 1 = High-High, 2 = Low-High, 3 = Low-Low, 4 = High-Low
combined_df["quadrant"] = li.q
cluster_types = {1: 'HH', 2: 'LH', 3: 'LL', 4: 'HL'}
combined_df["quadrant_label"] = combined_df["quadrant"].apply(lambda x: cluster_types[x])

The code below visualizes only the statistically significant Local Moran clusters (p < 0.05). First, the dataset is filtered to include only neighborhoods where the Local Moran p-value indicates a meaningful spatial pattern.

ax = combined_df\
    .query("p_val < .05")\
    .to_crs(epsg=3857)\
    .plot(
        figsize=(10, 10),
        column="quadrant_label",  # variable used to determine color of each polygon
        categorical=True,
        cmap="Set1",
        alpha=0.5,
        legend=True,  # show legend
        legend_kwds={'loc': 'lower right'},
        edgecolor='black'  # edge color for better visibility
    )

# add a web basemap (CartoDB Positron) under the polygons
cx.add_basemap(ax, source=cx.providers.CartoDB.Positron)
# add a title and remove axes for a cleaner map visualization
ax.set_title("Local Moran Cluster Map (LISA)")
ax.set_axis_off()
plt.show()

The resulting Local Moran cluster map highlights neighborhoods that exhibit statistically significant spatial clustering of homicide counts. High–High (HH) areas represent neighborhoods with high homicide counts that are surrounded by similarly high-count neighbors, true “hot spot clusters.” In contrast, Low–Low (LL) areas indicate clusters of neighborhoods with consistently low homicide counts. The High–Low (HL) and Low–High (LH) categories identify spatial outliers: HL neighborhoods have high homicide counts while their neighbors have low values, whereas LH neighborhoods have low counts surrounded by higher-intensity areas.
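
If you prefer the same information in tabular form, a quick optional tabulation of the significant clusters (reusing the columns created above) could look like this:

# optional sanity check: how many significant neighborhoods fall into each cluster type
significant = combined_df.query("p_val < .05")
print(significant["quadrant_label"].value_counts())

# list the High-High (hot spot) neighborhoods by name
print(significant.loc[significant["quadrant_label"] == "HH", "AREA_NAME"].tolist())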

Conclusions

In this tutorial, we explored spatial crime analysis using Python with publicly available homicide data from the Toronto Police Service. We demonstrated several key steps in analyzing and visualizing crime patterns:

  1. Data acquisition and preparation – Importing data via API requests, converting coordinates to a projected system, and merging with neighborhood polygons for spatial aggregation.
  2. Exploratory spatial analysis – Mapping point locations, calculating mean centers and standard deviational ellipses, and creating KDE surfaces to identify areas with high concentrations of incidents.
  3. Neighborhood-level analysis – Aggregating crime counts to administrative units and creating choropleth maps to compare neighborhoods.
  4. Spatial statistics – Computing global Moran’s I to assess overall spatial autocorrelation, and Local Moran (LISA) to detect significant clusters and spatial outliers.

These analyses illustrate the power of Python for integrating GIS, statistics, and data visualization in a reproducible workflow. Unlike purely GUI-based tools, Python allows analysts to automate data processing, customize visualizations, and apply advanced spatial statistics, all within a single environment. Moving forward, this workflow can be extended to include space–time analysis, hotspot detection using HDBSCAN or Gi*, spatial regression, and predictive modeling, providing even deeper insights for researchers, policy makers, and law enforcement agencies.
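
As a small illustration of one such extension, a Getis-Ord Gi* hot spot analysis can reuse the y vector and weights matrix we already built; the sketch below is one possible starting point rather than a full treatment.

# sketch: Getis-Ord Gi* hot spot statistic, reusing y and w from the Moran's I section
from esda.getisord import G_Local

g_star = G_Local(y, w, star=True)         # Gi* variant includes each area itself
combined_df["gi_star_z"] = g_star.Zs      # z-scores: large positive values suggest hot spots
combined_df["gi_star_p"] = g_star.p_sim   # pseudo p-values from random permutations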