Katrina Henderson and Shawnda Henderson, both incoming juniors at TU studying finance and real estate, are working as team leads in the 2025 Tulsa Undergraduate Research Challenge (TURC), directed by professors Meagan McCollum and Cayman Seagraves. Their work continues through summer 2025. This first post highlights research from the first half of their summer work.
Mark your calendars for the final presentation of the TURC Housing Policy Research team on August 8th from 11:30am-1pm. You are invited to join us for a lunch and learn event as the students showcase their research findings in Helmerich Hall, Room 219 (2900 E 5th Street).
Facilitating Redevelopment of VAD Properties in Tulsa:
An AI-Driven Peer City Analysis Tool
by Katrina E. Henderson & Shawnda M. Henderson
Every city must manage vacant, abandoned, and deteriorated (“VAD”) properties, often referred to as “blight.” (But because the latter can be a pejorative, misused term, we will try to stick with VAD.) Tulsa is no exception, meaning Tulsans’ quality of life could improve through better identification and then remediation of such conditions, including in its residential neighborhoods.
We hope to assist in those efforts by developing software tools useful in VAD identification, and we aim to do so in two steps. First, leveraging multiple generative AI systems, we have coded Python solutions to identify Tulsa’s peer cities. By researching the VAD management experience of the most relevant peers – what has worked, what has not – we can get a better sense of what Tulsa ought to try.
Second, by combining that peer data with Tulsa data and feeding both into novel machine learning algorithms, we can help Tulsa officials identify where they may wish to prioritize their efforts.
This post focuses on the initial step of that process, which we have completed: developing peer city identifiers.
One Goal – Two Models
While we have some background in coding and in peer city analysis, we knew the smart 2025 way to develop our desired peer city generator was to leverage AI. And since different systems might bring different strengths, we decided to use two: OpenAI’s ChatGPT and Anthropic’s Claude. While some readers will have extensive experience using them to code, this was – for each of us – a new experience. So, in case our experiences can be helpful to others, we will briefly explain not only our results but also our process.
First Peer City Generator – ChatGPT (model GPT-4o)
Our initial prompt framed the project goal: helping to reduce residential VAD within Tulsa. ChatGPT responded with an encouraging – meaning sensible – reply, and so we narrowed the discussion to the first task: peer city identification.
The initial step was to identify variables, or characteristics, on which to base peer city selections. ChatGPT not only provided a sensible list of candidates, but it also suggested potential data sources and statistical techniques for gathering and analyzing them. While we altered aspects of its strategy based on our outside research into peer city analysis, its framework – as described below – played a crucial role in our development of a peer city identification program coded in Python and then modified and run within Google’s Colaboratory environment. That framework consisted of four main steps: 1) obtaining a US Census Bureau application programming interface (API) key and then the relevant data, 2) standardizing and clustering that data, 3) visualizing the results, and 4) refining the program based on initial results.
We first obtained an API key for the US Census Bureau, allowing us (and our code) to access data from the American Community Survey (ACS); this also permits updating the peer city variables at any time, ensuring the program can adapt to future data. Utilizing that API in conjunction with Python code, we then compiled the relevant ACS data for all cities in the United States before culling it based on designated population cutoffs. Again following ChatGPT’s recommendation, our Python code standardized the data using z-scores and applied k-means clustering, an unsupervised technique that forms groups, or ‘clusters,’ of meaningfully similar data points. After some tinkering, we chose seven clusters.
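To make that pipeline concrete, here is a minimal sketch of the flow. It is illustrative only: the ACS variable codes, the (deliberately short) variable list, and the population cutoffs shown are stand-ins rather than our exact configuration.

```python
# Minimal sketch: fetch ACS data, standardize, and cluster.
# Variable codes, variable list, and cutoffs are illustrative.
import requests
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

API_KEY = "YOUR_CENSUS_API_KEY"  # free key from api.census.gov
VARS = {
    "B01003_001E": "population",     # total population
    "B19013_001E": "median_income",  # median household income
}

# Pull ACS 5-year estimates for every US place
url = "https://api.census.gov/data/2022/acs/acs5"
params = {"get": "NAME," + ",".join(VARS), "for": "place:*", "key": API_KEY}
rows = requests.get(url, params=params).json()

df = pd.DataFrame(rows[1:], columns=rows[0]).rename(columns=VARS)
cols = list(VARS.values())
df[cols] = df[cols].apply(pd.to_numeric, errors="coerce")
df = df.dropna(subset=cols)

# Cull to cities within designated population cutoffs (illustrative bounds)
df = df[df["population"].between(200_000, 1_000_000)].reset_index(drop=True)

# Standardize with z-scores, then group with k-means (we settled on k = 7)
X = StandardScaler().fit_transform(df[cols])
df["cluster"] = KMeans(n_clusters=7, n_init=10, random_state=0).fit_predict(X)
```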
ChatGPT also assisted – again by drafting initial Python code – in visualizing our results. We used principal component analysis (PCA), a dimension reduction technique, for that visualization, producing the figure below, which shows the seven clusters (numbered 0 to 6) of alike cities and their ‘similarity distances.’
[Figure: Two-dimensional PCA plot of the seven city clusters, numbered 0 to 6]
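For readers curious about the mechanics, a minimal continuation of the sketch above (reusing the standardized matrix X and clustered frame df) can produce a plot of this kind:

```python
# Continuation of the earlier sketch: project onto two principal components
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

coords = PCA(n_components=2).fit_transform(X)

plt.figure(figsize=(8, 6))
plt.scatter(coords[:, 0], coords[:, 1], c=df["cluster"], cmap="tab10", s=14)

# Highlight Tulsa (ACS place names look like "Tulsa city, Oklahoma")
tulsa = df["NAME"].str.startswith("Tulsa city").to_numpy()
plt.scatter(coords[tulsa, 0], coords[tulsa, 1], c="black", marker="*", s=200)

plt.xlabel("Principal component 1")
plt.ylabel("Principal component 2")
plt.title("US cities clustered on standardized ACS variables")
plt.show()
```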
The nearer two points (cities) are to each other, the more those cities have in common. And while Tulsa technically falls within cluster 3, it appears on that cluster’s edge, suggesting it also shares much in common with cities of neighboring cluster 2. We thus merged Tulsa’s cluster 3 with its second-nearest cluster 2 and, using that merged cluster, calculated Tulsa’s closest peers.
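In code, that merge-and-rank step can be as simple as the following sketch (the cluster labels 2 and 3 match our particular run; k-means numbering varies with initialization):

```python
# Continuation: rank cities within Tulsa's merged cluster by similarity
import numpy as np

merged = df["cluster"].isin([2, 3]).to_numpy()
X_merged = X[merged]
names = df.loc[merged, "NAME"].reset_index(drop=True)

tulsa_pos = names.str.startswith("Tulsa city").to_numpy().argmax()
dists = np.linalg.norm(X_merged - X_merged[tulsa_pos], axis=1)

# Skip position 0 of the sort order, which is Tulsa itself (distance 0)
for i in dists.argsort()[1:6]:
    print(f"{names[i]}: {dists[i]:.2f}")
```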
Before discussing those results, we will briefly describe our experience developing our second peer city identification tool, this one developed similarly but using Claude.
Second Peer City Generator – Claude (Sonnet 4)
One might wonder why we replicated our work in a different space, and that’s a fair question. We had no particular reason to doubt the model we developed with ChatGPT’s assistance. Still, replication of results – or lack thereof – would of course increase or decrease our confidence in our peer city modeling. Furthermore, we were simply curious how each generative AI would approach and step through the task; had we more time, we would have also attempted the same with Grok, Gemini, and others.
We began Claude with similar prompts, describing our overall research topic (VAD management in Tulsa) and our particular task (a peer city analysis based on relevant demographic variables). And while Claude included k-means clustering among its method options, its preferred approach was a straight Euclidean distance analysis. We reasoned that a slightly different calculation might be our best result-comparison tool, so that is the route we took: leveraging Claude as we had ChatGPT, we developed Python code, which we then compiled and ran in Google’s Colaboratory.
Unsurprisingly given the similar prompts, this second Python program begins much the same as the first. As before, the program first calls the US Census Bureau API to obtain the most recent ACS 5-year estimates of relevant demographic variables for all US cities. It then filters those cities based on population cutoffs, before standardizing the variables using z-scores. Those scaled values are then used to calculate the Euclidean distance between Tulsa and each city in the dataset, where lower distances reflect more similar peers. The program then ranks the cities by Euclidean distance, returning the list of the ‘closest’ peer candidates.
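In sketch form, and assuming the filtered dataframe df and z-scored matrix X from the first example, the distance-and-rank step might look like this:

```python
# Sketch of the Euclidean distance ranking (assumes df and X from the
# first example: filtered cities with a NAME column and z-scored features)
import numpy as np

tulsa_pos = df["NAME"].str.startswith("Tulsa city").to_numpy().argmax()

# Distance from Tulsa to every city in standardized space
df["distance"] = np.linalg.norm(X - X[tulsa_pos], axis=1)

# Rank ascending; Tulsa itself comes first with distance 0
peers = df.sort_values("distance").head(11)
print(peers[["NAME", "distance"]].to_string(index=False))
```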
And because this model, too, employs multi-variable analysis, its output is difficult to visualize in two-dimensional space. So we again turned to PCA, leveraging Claude to develop the necessary code. The resulting figure below plots Tulsa and its closest identified peers against the first two principal components.
[Figure: PCA plot of Tulsa and its closest identified peers on the first two principal components]
Naturally, this code too required several rounds of debugging and improvement. For example, one early iteration listed the peer cities using Census API reference codes rather than city names. Similarly, we modified the original PCA graph’s format to better match the one produced by our first peer city tool, allowing for more visual consistency. In short, we were impressed with the ability of both ChatGPT and Claude to develop Python code, but we would be surprised if many users found the very first draft of that code to be precisely what they wanted. At least for us, it was often easier to edit the Python directly than to nudge the generative AI.
Comparison & Analysis
Having developed two peer city generators, we turned to comparing – and, ultimately, combining – their methods and results.
We already mentioned the difference between clustering and straight Euclidean distance. Another divergence was in population cutoffs – not only did the two LLMs set different boundaries to filter the candidate cities, but those boundaries also shifted between code iterations, even when changing them was not the goal of the iteration. This is, of course, simply a result of the algorithms each runs, and such behavior is (frustratingly) familiar to anyone using these tools for writing or image generation. Thus, for example, when we prompted Claude to revise the code to produce more user-friendly output… Claude also adjusted its population cutoffs. Ultimately, we manually set the same population boundaries for the two models, thereby ensuring more consistent results; the point is merely that such behavior is something to watch when using contemporary generative AI to code.
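One lightweight guard against that drift, sketched below with illustrative values and a hypothetical module name, is to pin the cutoffs in a single shared file that both pipelines import, so an LLM rewrite of either script cannot silently move them:

```python
# shared_config.py -- hypothetical shared module; bounds are illustrative
POP_MIN = 200_000    # smallest candidate-city population
POP_MAX = 1_000_000  # largest candidate-city population

# In each pipeline script:
#   from shared_config import POP_MIN, POP_MAX
#   df = df[df["population"].between(POP_MIN, POP_MAX)]
```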
Despite the algorithmic differences, there were encouraging similarities in results. For example, both models identified Wichita, Kansas as Tulsa’s closest peer. Additionally, several other cities ranked highly on both lists, including Kansas City, Missouri and Tampa, Florida. All three peers are mapped on the figure below.
[Figure: Map of Tulsa and its top peer cities – Wichita, KS; Kansas City, MO; and Tampa, FL]
We thus examined relevant data from those cities to verify their status as potential peers, comparing a number of basic metrics – including total population, median household income, and poverty rate – as shown in the figures below.
[Figures: Comparisons of total population, median household income, and poverty rate for Tulsa and its peer cities]
Next Steps
While we continue to refine our two peer city identifiers, we – and other members of our team – are also beginning work on next steps, from investigating the best peers’ VAD identification and reduction efforts to gathering Tulsa data for use in our own VAD-identification algorithms. Stay tuned!