Abstract

This is a study of gerrymandering in Alabama. We will test three shape-based compactness scores and assess the representativeness of districts based on prior presidential elections and race. We will then extend prior studies by calculating the representativeness of the convex hull of district polygons.

Study Metadata

  • Key words: Alabama, gerrymandering, compactness, convex hull, political representation
  • Subject: Social and Behavioral Sciences: Geography: Geographic Information Sciences
  • Date created: 2025-02-17
  • Date modified: 2025-02-17
  • Spatial Coverage: Alabama OSM:161950
  • Spatial Resolution: Census Block Groups
  • Spatial Reference System: EPSG:4269 NAD 1983 Geographic Coordinate System
  • Temporal Coverage: 2020-2024 population and voting data
  • Temporal Resolution: Decennial census

Study design

This is an original study based on literature on gerrymandering metrics.

It is an exploratory study to evaluate the usefulness of a new gerrymandering metric, which compares representativeness inside the convex hull of a congressional district with representativeness inside the district itself.

Materials and procedure

Computational environment

I plan on using package … for …
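
A minimal setup sketch follows. The package list is an assumption, inferred from the functions called later in this report and from the References; the report's own setup chunk is not shown here, and no package versions are implied.

```r
# Assumed package list (not stated in the report itself):
# here (file paths), sf and lwgeom (vector geometry), tmap (maps),
# tidycensus (census queries), tidyverse (data wrangling and plots),
# knitr (kable tables), htmltools and markdown (includeMarkdown)
pkgs <- c("here", "sf", "lwgeom", "tmap", "tidycensus",
          "tidyverse", "knitr", "htmltools", "markdown")

# Report packages that are not installed instead of failing part-way through
not_installed <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
if (length(not_installed) > 0) {
  message("Install first: ", paste(not_installed, collapse = ", "))
}

# Attach whatever is available
invisible(lapply(setdiff(pkgs, not_installed), library, character.only = TRUE))
```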

Data and variables

We plan on using data sources …. , … ….

Several data layers are compiled and provided in the districts geopackage.

districts_file <- here("data", "raw", "public", "districts.gpkg")
st_layers(districts_file)
## Driver: GPKG 
## Available layers:
##    layer_name geometry_type features fields crs_name
## 1 districts21 Multi Polygon        7      4   WGS 84
## 2 districts23 Multi Polygon        7      4    NAD83
## 3 precincts20 Multi Polygon     1972      8    NAD83

Districts 2021

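The congressional districts enacted in 2021 were used in the 2022 mid-term elections. In Allen v. Milligan (2023), the Supreme Court affirmed a lower court's finding that the map likely violated Section 2 of the Voting Rights Act by diluting Black voting strength.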

Load the districts.

districts21 <- st_read(districts_file, layer="districts21")
## Reading layer `districts21' from data source 
##   `C:\GitHub\josephholler\OR-Gerrymander-Alabama\data\raw\public\districts.gpkg' 
##   using driver `GPKG'
## Simple feature collection with 7 features and 4 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -88.47323 ymin: 30.14443 xmax: -84.88825 ymax: 35.00803
## Geodetic CRS:  WGS 84

Map the districts.

tmap_mode(mode = "plot")
## ℹ tmap mode set to "plot".
districts21map <- districts21 |> 
  tm_shape() +
  tm_polygons(fill_alpha = 0,
              col = "red") +
  tm_labels(text = "DISTRICT",
          col="red",
          bgcol = "white",
          bgcol_alpha = 0.5,
          on_surface = TRUE,
          just = c("center", "center")
          )
## 
## ── tmap v3 code detected ───────────────────────────────────────────────────────
## [v3->v4] `tm_text()`: migrate the layer options 'just' to 'options =
## opt_tm_text(<HERE>)'[tm_text()] Argument `on_surface` unknown.
districts21map

Precincts 2020

includeMarkdown(here("data", "metadata", "precincts20.md"))
  • Title: Voting Precincts 2020
  • Abstract: Alabama voting data for 2020 elections by precinct.
  • Spatial Coverage: Alabama
  • Spatial Resolution: Voting precincts
  • Spatial Reference System: EPSG 4269 NAD 1983 geographic coordinate system
  • Temporal Coverage: precincts used for tabulating the 2020 census
  • Temporal Resolution: annual election
  • Lineage: Saved in geopackage format. Processing prior to download is explained in al_vest_20_validation_report.pdf and readme_al_vest_20.txt
  • Distribution: Data available at Redistricting Data Hub with free login
  • Constraints: Permitted for noncommercial and nonpartisan use only. Copyright and use constraints are explained in redistrictingdatahub_legal.txt
  • Data Quality: State any planned quality assessment
  • Variables: For each variable, enter the following information. If you have two or more variables per data source, you may want to present this information in table form (shown below)
    • Label: variable name as used in the data or code
    • Alias: intuitive natural language name
    • Definition: Short description or definition of the variable. Include measurement units in description.
    • Type: data type, e.g. character string, integer, real
    • Accuracy: e.g. uncertainty of measurements
    • Domain: Expected range of Maximum and Minimum of numerical data, or codes or categories of nominal data, or reference to a standard codebook
    • Missing Data Value(s): Values used to represent missing data and frequency of missing data observations
    • Missing Data Frequency: Frequency of missing data observations: not yet known for data to be collected
Label      Alias                          Definition  Type  Accuracy  Domain  Missing Data Value(s)  Missing Data Frequency
VTDST20    Voting district ID
GEOID20    Unique geographic ID
G20PRETRU  Total votes for Trump in 2020
G20PREBID  Total votes for Biden in 2020

Decennial Census

We acquire decennial census data in block groups using the tidycensus package. First, query metadata for the pl public law data series.

census_metadata_file <- here("data", "metadata", "census2020pl_vars.csv")

# reuse the cached variable list if it exists; otherwise query it and cache it
if (file.exists(census_metadata_file)) {
  census2020pl_vars <- read.csv(census_metadata_file)
} else {
  census2020pl_vars <- load_variables(2020, "pl")
  # row names are written too, and come back as column X when the file is re-read
  write.csv(census2020pl_vars, census_metadata_file)
}

The issue in the 2023 court cases on Alabama’s gerrymandering was a racial gerrymander discriminating against people identifying as Black or African American. Therefore, we will analyze people of voting age (18 or older) identifying as Black or African American, either as one race or in any combination with other races. These data are found in table P3.

Query the public law data series table P3 on “race for the population 18 years and over”.

blockgroup_file <- here("data", "raw", "public", "block_groups.gpkg")

# if the data is already downloaded, just load it
# otherwise, query from the census and save
if(file.exists(blockgroup_file)){
  blockgroups <- st_read(blockgroup_file)
} else {
  blockgroups <- get_decennial(geography = "block group",
                               sumfile = "pl",
                               table = "P3",
                               year = 2020,
                               state = "Alabama",
                               output = "wide",
                               geometry = TRUE,
                               keep_geo_vars = TRUE)
  st_write(blockgroups, blockgroup_file)
}
## Reading layer `block_groups' from data source 
##   `C:\GitHub\josephholler\OR-Gerrymander-Alabama\data\raw\public\block_groups.gpkg' 
##   using driver `GPKG'
## Simple feature collection with 3925 features and 83 fields (with 1 geometry empty)
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -88.47323 ymin: 30.22333 xmax: -84.88908 ymax: 35.00803
## Geodetic CRS:  NAD83

Prior observations

We have previously investigated the compactness scores of Alabama’s congressional districts as well as the percentage of Biden voters from the 2020 elections and the percentage of the population 18 years or older that is not Hispanic and is Black or African American.

We have never calculated the minimum bounding circle or convex hulls of Alabama’s congressional districts.

Bias and threats to validity

This study is explicitly an investigation into the modifiable areal unit problem. Aspects of the study are extremely sensitive to the combination of edge effects and scale, whereby complex borders formed by natural features, e.g. coastlines or rivers, vary greatly in perimeter depending on the scale of analysis. We hope that, in part, this study establishes a method that is more robust (less sensitive) to the threats to validity caused by scale and edge effects in studies of gerrymandering and district shapes.
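
The scale sensitivity described above can be illustrated with a toy shape. The sketch below is not part of the study's data or method: it builds a hypothetical unit square whose top edge is a sawtooth, and all function names are made up for illustration. Adding more teeth mimics digitizing a river or coastline at finer detail. It uses base R only.

```r
# Next vertex of each vertex in a closed ring (n x 2 matrix, first vertex not repeated)
next_vertex <- function(xy) rbind(xy[-1, , drop = FALSE], xy[1, , drop = FALSE])

# Shoelace area of the ring
ring_area <- function(xy) {
  nxt <- next_vertex(xy)
  abs(sum(xy[, 1] * nxt[, 2] - nxt[, 1] * xy[, 2])) / 2
}

# Perimeter of the ring: sum of segment lengths
ring_perimeter <- function(xy) {
  sum(sqrt(rowSums((next_vertex(xy) - xy)^2)))
}

# Hypothetical unit square whose top edge has n teeth of height h
sawtooth_square <- function(n, h = 0.1) {
  top_x <- seq(1, 0, length.out = 2 * n + 1)
  top_y <- 1 + h * rep(c(0, 1), length.out = 2 * n + 1)
  rbind(c(0, 0), c(1, 0), cbind(top_x, top_y))
}

# Area-perimeter score and area-to-convex-hull ratio for a given number of teeth
scores <- function(n) {
  xy <- sawtooth_square(n)
  hull <- xy[chull(xy), , drop = FALSE]  # chull() is in grDevices, attached by default
  c(teeth = n,
    polsby_popper = 4 * pi * ring_area(xy) / ring_perimeter(xy)^2,
    hull_ratio = ring_area(xy) / ring_area(hull))
}

round(t(sapply(c(1, 10, 100), scores)), 3)
```

In this toy case the area stays fixed while the perimeter grows with every added tooth, so the area-perimeter score falls sharply, whereas the convex hull ratio stays at about 0.95 or higher. Real district boundaries will behave less cleanly.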

Data transformations

Districts 2021

Transform the districts to the NAD 1983 geographic coordinate system, and calculate the percentage of the population identifying as Black.

districts21 <- districts21 |> st_transform(4269) |> 
  mutate(pctBlack = round(BLACK / POPULATION * 100, 1))

Block groups census data

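The block group layer already reported NAD83 as its CRS when it was loaded above (the 2021 districts layer was the one stored in WGS 84), so the transformation below is a safeguard to ensure that the block groups use EPSG:4269, the same system as the districts.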

blockgroups <- st_transform(blockgroups, 4269)

Find the total of people identifying as Black or African American as one race or any combination of multiple races. First, make a list of all the variables inclusive of people identifying as Black or African American.

black_vars <- census2020pl_vars |> 
  dplyr::filter(str_detect(name, "P3"),
                str_detect(label, "Black")) |> 
  select(-concept)

black_vars |> kable()
X name label
151 P3_004N !!Total:!!Population of one race:!!Black or African American alone
158 P3_011N !!Total:!!Population of two or more races:!!Population of two races:!!White; Black or African American
163 P3_016N !!Total:!!Population of two or more races:!!Population of two races:!!Black or African American; American Indian and Alaska Native
164 P3_017N !!Total:!!Population of two or more races:!!Population of two races:!!Black or African American; Asian
165 P3_018N !!Total:!!Population of two or more races:!!Population of two races:!!Black or African American; Native Hawaiian and Other Pacific Islander
166 P3_019N !!Total:!!Population of two or more races:!!Population of two races:!!Black or African American; Some Other Race
174 P3_027N !!Total:!!Population of two or more races:!!Population of three races:!!White; Black or African American; American Indian and Alaska Native
175 P3_028N !!Total:!!Population of two or more races:!!Population of three races:!!White; Black or African American; Asian
176 P3_029N !!Total:!!Population of two or more races:!!Population of three races:!!White; Black or African American; Native Hawaiian and Other Pacific Islander
177 P3_030N !!Total:!!Population of two or more races:!!Population of three races:!!White; Black or African American; Some Other Race
184 P3_037N !!Total:!!Population of two or more races:!!Population of three races:!!Black or African American; American Indian and Alaska Native; Asian
185 P3_038N !!Total:!!Population of two or more races:!!Population of three races:!!Black or African American; American Indian and Alaska Native; Native Hawaiian and Other Pacific Islander
186 P3_039N !!Total:!!Population of two or more races:!!Population of three races:!!Black or African American; American Indian and Alaska Native; Some Other Race
187 P3_040N !!Total:!!Population of two or more races:!!Population of three races:!!Black or African American; Asian; Native Hawaiian and Other Pacific Islander
188 P3_041N !!Total:!!Population of two or more races:!!Population of three races:!!Black or African American; Asian; Some Other Race
189 P3_042N !!Total:!!Population of two or more races:!!Population of three races:!!Black or African American; Native Hawaiian and Other Pacific Islander; Some Other Race
195 P3_048N !!Total:!!Population of two or more races:!!Population of four races:!!White; Black or African American; American Indian and Alaska Native; Asian
196 P3_049N !!Total:!!Population of two or more races:!!Population of four races:!!White; Black or African American; American Indian and Alaska Native; Native Hawaiian and Other Pacific Islander
197 P3_050N !!Total:!!Population of two or more races:!!Population of four races:!!White; Black or African American; American Indian and Alaska Native; Some Other Race
198 P3_051N !!Total:!!Population of two or more races:!!Population of four races:!!White; Black or African American; Asian; Native Hawaiian and Other Pacific Islander
199 P3_052N !!Total:!!Population of two or more races:!!Population of four races:!!White; Black or African American; Asian; Some Other Race
200 P3_053N !!Total:!!Population of two or more races:!!Population of four races:!!White; Black or African American; Native Hawaiian and Other Pacific Islander; Some Other Race
205 P3_058N !!Total:!!Population of two or more races:!!Population of four races:!!Black or African American; American Indian and Alaska Native; Asian; Native Hawaiian and Other Pacific Islander
206 P3_059N !!Total:!!Population of two or more races:!!Population of four races:!!Black or African American; American Indian and Alaska Native; Asian; Some Other Race
207 P3_060N !!Total:!!Population of two or more races:!!Population of four races:!!Black or African American; American Indian and Alaska Native; Native Hawaiian and Other Pacific Islander; Some Other Race
208 P3_061N !!Total:!!Population of two or more races:!!Population of four races:!!Black or African American; Asian; Native Hawaiian and Other Pacific Islander; Some Other Race
211 P3_064N !!Total:!!Population of two or more races:!!Population of five races:!!White; Black or African American; American Indian and Alaska Native; Asian; Native Hawaiian and Other Pacific Islander
212 P3_065N !!Total:!!Population of two or more races:!!Population of five races:!!White; Black or African American; American Indian and Alaska Native; Asian; Some Other Race
213 P3_066N !!Total:!!Population of two or more races:!!Population of five races:!!White; Black or African American; American Indian and Alaska Native; Native Hawaiian and Other Pacific Islander; Some Other Race
214 P3_067N !!Total:!!Population of two or more races:!!Population of five races:!!White; Black or African American; Asian; Native Hawaiian and Other Pacific Islander; Some Other Race
216 P3_069N !!Total:!!Population of two or more races:!!Population of five races:!!Black or African American; American Indian and Alaska Native; Asian; Native Hawaiian and Other Pacific Islander; Some Other Race
218 P3_071N !!Total:!!Population of two or more races:!!Population of six races:!!White; Black or African American; American Indian and Alaska Native; Asian; Native Hawaiian and Other Pacific Islander; Some Other Race

Next, calculate new columns:

  • Black is the sum of all 32 columns shown above, in which any of the racial categories by which someone identifies is Black or African American.
  • Total is a copy of the population 18 years or over, variable P3_001N.
  • PctBlack is calculated as Black / Total * 100.
  • CheckPct is the percentage of the population 18 years or older that is either white of one race only (P3_003N) or Black or African American as calculated above. In Alabama, we can expect that this will be close to 100% for most block groups, and it should never exceed 100%.

blockgroups_calc <- blockgroups |> 
  rowwise() |> 
  mutate(Black = sum(c_across(all_of(black_vars$name)))) |> 
  ungroup() |> 
  mutate(bgarea = st_area(geom),
         Total = P3_001N,
         PctBlack = Black / Total * 100,
         CheckPct = (Black + P3_003N) / Total * 100
         ) |> 
  select(GEOID, bgarea, Black, Total, PctBlack, CheckPct)

Save the results as blockgroups_calc.gpkg

st_write(blockgroups_calc, 
         here("data", "derived", "public", "blockgroups_calc.gpkg"),
         append=FALSE)

Map the percentage of the population 18 or over that is Black or African American.

tmap_mode(mode = "plot")
blkgrp_black_map <- tm_shape(blockgroups_calc) + 
  tm_polygons(
    fill = "PctBlack",
    col_alpha = 0.2,
    lwd = 0.1,
    col = "grey90"
  )

blkgrp_black_map

Make an interactive map of the 2021 districts over the Black population.

tmap_mode(mode = "view")
## ℹ tmap mode set to "view".
blkgrp_black_map +
  districts21map
## Registered S3 method overwritten by 'jsonify':
##   method     from    
##   print.json jsonlite
## Variable bgcol and bgcol_alpha not supported by view mode

Analysis

Estimate the total and Black voting age populations of each district using AWR with block groups, i.e. by weighting each block group's counts by the share of its area that falls inside the district. Why do this when POPULATION, BLACK, and WHITE variables are already in the table? First, those variables count the total population, but we should care more about the voting age population. Second, we may want to categorize and calculate BLACK differently from the state of Alabama.

It turns out that sf builds a spatial index on the first dataset in a spatial query or overlay, and not on the second. Therefore, pass the more complex dataset as the first argument of st_intersection; the difference in run time is remarkable.

Spatial indices in R: https://r-spatial.org/r/2017/06/22/spatial-index.html

districts21_estimates <- st_intersection(blockgroups_calc, districts21) |> 
  mutate(
    awTot = Total * as.numeric(st_area(geom) / bgarea),
    awBlack = Black * as.numeric(st_area(geom) / bgarea)
  ) |> 
  st_drop_geometry() |> 
  group_by(DISTRICT) |> 
  summarize(bgTotal = sum(awTot),
            bgBlack = sum(awBlack))

districts21_join_bg <- districts21 |> 
  left_join(districts21_estimates, by = "DISTRICT") |> 
  mutate(pctBlackbg = round(bgBlack / bgTotal * 100, 1))
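
The area-weighting step in the code above can also be written as a formula. The symbols are introduced here for illustration and do not appear in the original: T_b and B_b are the Total and Black counts of block group b, A(·) is area, and d is a district.

```latex
\widehat{T}_d = \sum_{b} T_b \, \frac{A(b \cap d)}{A(b)}, \qquad
\widehat{B}_d = \sum_{b} B_b \, \frac{A(b \cap d)}{A(b)}, \qquad
\mathrm{pctBlackbg}_d = 100 \times \frac{\widehat{B}_d}{\widehat{T}_d}
```

The weights A(b ∩ d) / A(b) correspond to st_area(geom) / bgarea in the code. This kind of estimate assumes that people are spread evenly within each block group, so it is least reliable where a district boundary cuts through a block group with uneven settlement.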

Report results. We find very similar percentages of Black or African American people.

districts21_join_bg |> st_drop_geometry() |> kable()
DISTRICT  POPULATION   WHITE   BLACK  pctBlack   bgTotal    bgBlack  pctBlackbg
1             717754  461324  186921      26.0  557342.4  142843.24        25.6
2             717755  433244  217392      30.3  558173.9  168697.51        30.2
3             717754  479432  176953      24.7  564208.0  141086.63        25.0
4             717754  582698   51929       7.2  556586.0   42949.26         7.7
5             717754  499707  124642      17.4  561381.6  101418.26        18.1
6             717754  498843  138019      19.2  551752.6  104977.72        19.0
7             717754  265204  400306      55.8  567451.7  312326.83        55.0
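
Repeat the area-weighted estimates using the convex hull of each district in place of the district polygon.

districts21_estimates <- st_intersection(blockgroups_calc, st_convex_hull(districts21)) |> 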
  mutate(
    awTot = Total * as.numeric(st_area(geom) / bgarea),
    awBlack = Black * as.numeric(st_area(geom) / bgarea)
  ) |> 
  st_drop_geometry() |> 
  group_by(DISTRICT) |> 
  summarize(chTotal = sum(awTot),
            chBlack = sum(awBlack))

Join convex hull estimates to Districts with blockgroup estimates.

districts21_join_ch <- districts21_join_bg |> 
  left_join(districts21_estimates, by = "DISTRICT") |> 
  mutate(pctBlackch = round(chBlack / chTotal * 100, 1),
         diffPct = pctBlackbg - pctBlackch,
         absdiffPct = abs(diffPct))

Calculate compactness scores based on:

  • the area and perimeter
  • the area and the area of the convex hull
  • the area and the area of the minimum bounding circle

This block takes some time to run due to the st_minimum_bounding_circle function.

Note: To knit, will we need to replace st_perimeter() with st_length(st_cast(geom, "MULTILINESTRING"))?

districts21_results <- districts21_join_ch |> 
  mutate(
    darea = st_area(geom),
    dperim = st_length(st_cast(geom, "MULTILINESTRING")),
#    dperim2 = st_perimeter(geom),
    compact_shp = round( as.numeric((4 * pi * darea) / dperim^2), 2),
    compact_hull = round( as.numeric(darea / st_area(st_convex_hull(geom))), 2),
    compact_circ = round( as.numeric(darea / st_area(st_minimum_bounding_circle(geom))), 2)
  )
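
For reference, the three scores computed above can be written as follows, where A is the district area (darea), P is its perimeter (dperim), and A_hull and A_circle are the areas of its convex hull and minimum bounding circle. The symbols are introduced here for illustration.

```latex
\mathrm{compact\_shp} = \frac{4 \pi A}{P^{2}}, \qquad
\mathrm{compact\_hull} = \frac{A}{A_{\mathrm{hull}}}, \qquad
\mathrm{compact\_circ} = \frac{A}{A_{\mathrm{circle}}}
```

In the redistricting literature these are usually called the Polsby-Popper score, the convex hull ratio, and the Reock score. Each ranges from 0 to 1, with 1 being the most compact: a circle scores 1 on the first and third, and any convex shape scores 1 on the second.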

Results

Correlation matrix and small plots for gerrymandering indicators

districts21_results_cor <- districts21_results |> 
  st_drop_geometry() |> 
  select(pctBlackbg,
         diffPct,
         absdiffPct,
         compact_shp,
         compact_hull,
         compact_circ)

districts21_results_cor |> cor() |> kable()
                pctBlackbg       diffPct    absdiffPct   compact_shp  compact_hull  compact_circ
pctBlackbg       1.0000000     0.8661204     0.3920151    -0.0932363     0.1063590     0.7166688
diffPct          0.8661204     1.0000000     0.1621541     0.1633225     0.2565140     0.3900650
absdiffPct       0.3920151     0.1621541     1.0000000    -0.6451778    -0.7348077     0.2726012
compact_shp     -0.0932363     0.1633225    -0.6451778     1.0000000     0.8326192    -0.2836411
compact_hull     0.1063590     0.2565140    -0.7348077     0.8326192     1.0000000    -0.0210585
compact_circ     0.7166688     0.3900650     0.2726012    -0.2836411    -0.0210585     1.0000000

districts21_results_cor |> pairs()

Plot representational difference against compactness

Scatterplot with (absolute) difference in representation on x axis and compactness on y axis. Plot the three different compactness scores simultaneously with different colors. Symbolize the districts with different shapes.

districts21_results_plot <- districts21_results |> 
  st_drop_geometry() |> 
  select(DISTRICT, pctBlack, absdiffPct, compact_shp, compact_hull, compact_circ) |> 
  pivot_longer(cols = starts_with("compact"))

districts21_results_plot |> ggplot() +
  aes(x = absdiffPct, y = value) +
  geom_smooth(method="lm", col = "grey30") +
  geom_label(aes(label = DISTRICT, fill = pctBlack)) +
  scale_fill_distiller(type = "div", palette = "PRGn") +
  facet_wrap(~name)
## `geom_smooth()` using formula = 'y ~ x'

There is a negative relationship between shape (area and perimeter) compactness and convex hull representational difference (r = -0.65), and a stronger negative relationship between convex hull compactness and convex hull representational difference (r = -0.73). Minimum bounding circle compactness shows a negative relationship with convex hull representational difference only if districts 5 and 7 are set aside; across all seven districts the correlation is weakly positive (r = 0.27).
District 7 really is gerrymandered (packed with African American voters), but the minimum bounding circle method does not find it so. District 5 is not really gerrymandered, even though the minimum bounding circle method does find it so.

Shape and convex hull exhibit a positive correlation.
Shape and minimum bounding circle exhibit a positive correlation, with the exception of District 5. Convex hull and minimum bounding circle exhibit a positive correlation, with the exception of District 5.

District 5 is a long, but otherwise compact shape.

tm_shape(districts21_results) +
  tm_polygons(fill = "pctBlackbg") +
  tm_text("DISTRICT")

Discussion

Describe how the results are to be interpreted vis a vis each hypothesis or research question.

Integrity Statement

Include an integrity statement - The authors of this preregistration state that they completed this preregistration to the best of their knowledge and that no other preregistration exists pertaining to the same hypotheses and research. If a prior registration does exist, explain the rationale for revising the registration here.

Acknowledgements

This report is based upon the template for Reproducible and Replicable Research in Human-Environment and Geographical Sciences, DOI: 10.17605/OSF.IO/W29MQ

References

Cheng, Joe, Carson Sievert, Barret Schloerke, Winston Chang, Yihui Xie, and Jeff Allen. 2024. Htmltools: Tools for HTML. https://github.com/rstudio/htmltools.
Müller, Kirill. 2020. Here: A Simpler Way to Find Your Files. https://here.r-lib.org/.
Pebesma, Edzer. 2018. “Simple Features for R: Standardized Support for Spatial Vector Data.” The R Journal 10 (1): 439–46. https://doi.org/10.32614/RJ-2018-009.
———. 2024a. Lwgeom: Bindings to Selected Liblwgeom Functions for Simple Features. https://r-spatial.github.io/lwgeom/.
———. 2024b. Sf: Simple Features for r. https://r-spatial.github.io/sf/.
Pebesma, Edzer, and Roger Bivand. 2023. Spatial Data Science: With applications in R. Chapman and Hall/CRC. https://doi.org/10.1201/9780429459016.
R Core Team. 2024. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Tennekes, Martijn. 2018. “tmap: Thematic Maps in R.” Journal of Statistical Software 84 (6): 1–39. https://doi.org/10.18637/jss.v084.i06.
———. 2025. Tmap: Thematic Maps. https://github.com/r-tmap/tmap.
Walker, Kyle, and Matt Herman. 2025. Tidycensus: Load US Census Boundary and Attribute Data as Tidyverse and Sf-Ready Data Frames. https://walker-data.com/tidycensus/.
Wickham, Hadley. 2023. Tidyverse: Easily Install and Load the Tidyverse. https://tidyverse.tidyverse.org.
Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.
Xie, Yihui. 2014. “Knitr: A Comprehensive Tool for Reproducible Research in R.” In Implementing Reproducible Computational Research, edited by Victoria Stodden, Friedrich Leisch, and Roger D. Peng. Chapman and Hall/CRC.
———. 2015. Dynamic Documents with R and Knitr. 2nd ed. Boca Raton, Florida: Chapman and Hall/CRC. https://yihui.org/knitr/.
———. 2024. Knitr: A General-Purpose Package for Dynamic Report Generation in R. https://yihui.org/knitr/.
Xie, Yihui, JJ Allaire, and Jeffrey Horner. 2024. Markdown: Render Markdown with Commonmark. https://github.com/rstudio/markdown.