Jenna Tingum

Synopsis

Through the Data x Power fellowship, this project was developed to solve an issue within the progressive tech space - how can we easily group conversations for follow-up without adding extra steps for organizers? Throughout this project’s development though, a broader question arose around the potential for AI usage within progressive spaces and whether organizers would be willing to engage with AI tech. This final project, in addition to the model research, serves as a model for how organizers and technologists can develop tools together to ensure greater buy-in and longer-term use.

DxP

In 2021, the Ford Foundation partnered with re:power Fund to create a space to build collective strategy and innovation through a 10-month fellowship for movement-centered data experts. Each year, a new cohort of twelve fellows is selected to work on projects that address network-wide data and technology issues. Fellows undergo a series of skill-based trainings, receive a mentor, and have access to funding that enables exploration, experimentation and completion of their project.



About This Project


The Problem

When organizers talk to their communities, they often take notes about their interactions to document and facilitate follow-up outreach. But many tools don’t make it easy to search these notes and group folks by their key issue areas. For example, if a community member is talking to an organizer about the affordability of their prescriptions and healthcare generally during this current administration, how can we quickly tag this person as caring about “healthcare” for future follow-up in an upcoming program?

The Process

Many CRM tools already exist to house organizer notes, so the solution will not be developing a new tool. Instead, the solution here will be presenting the research and proven methods to implement this concept into existing tools. Knowing that this solution will require use of Artificial Intelligence and, through personal experience, that AI is a divisive topic among progressives, an additional point of research was added to understand how implementation of a tool like this would work. That is, would organizers be too skeptical of AI to establish enough support for this to work?

The Solution

Presented on this site is a comparative analysis of different methods of Topic Modeling using Natural Language Processing (NLP) that could create tags of organizer notes and a demo of all proposed methods. The recommendation is implementing this into existing movement CRMs using a layered approach of at least 2 of the 3 NLP methods.

Project Demo

Below is a demo of all 5 methods explored in the Project Details page. Input an example of a note that an organizer might make after a conversation and see if it accurately categorizes the constituent’s main issue or issues. Find a few sample field notes below, but you’re encouraged to test the models’ limitations with your own examples.



Sample Notes

  • Talked to Lou at the door. He supports our candidate for state house but it’s clear his top issue is affordability. Talked about grocery and gas prices rising plus childcare costs with a baby on the way. Gave some resources on local childcare and charities with lower-cost supplies.
  • Diabetic, insulin costs rising is taking a toll on her and her family. Interested in our international cuisines event so left some information with her and will follow-up.
  • Talked to him on Monday 3/16 morning, isn’t affected by the recent rent hikes in the area since he owns his condo. He says his main issue is the 2nd amendment and the worry that progressive leadership might take away his guns. Feels strongly about gun ownership, big NRA member
  • Has a lot of friends involved in Planned Parenthood work. Interested in getting involved in ballot initiative work for this year. Maybe can connect us with her network.
  • Has a 7 and 9 year old and thinking about moving schools because they’re worried about the quality of education at the public school. Lots of religion being pushed that makes them uncomfortable. Was asking about our candidate’s stance on school choice and resources about vouchers.
  • Talked to Christine on Saturday afternoon. Interested in volunteering with us. Cares a lot about sustainability and wants to learn about composting. Needs more voter education on voting by mail.


Evaluation Metrics

Ease of Implementation

This model must be easy to implement and edit as needs and issues change.


  • Set-Up
  • Cost
  • Run time
  • Buy-in

Adaptability

Organizers’ field notes can vary wildly - this tool needs to be able to account for variety.


  • Typos
  • Abbreviations
  • Slang
  • Multilingual Input

Smart Interpretation

This model needs to be able to understand context, and avoid extraneous information.


  • Implied Topics
  • Filter extraneous input

Project Details

Instead of developing an entirely new tool, research is compiled below so that existing field tools can implement a similar model into their systems. Each avenue explored has its own demo to try out the model’s performance followed by some comparative analysis.


Logistics

Implementing Topic Modeling using Natural Language Processing will ideally look something like:
  1. Start with a list of pre-determined candidates for tags, like "Economy", "Healthcare", or "Immigration".
  2. Model runs in the background and assigns issue tags to the note based on the content.
  3. Tags, one or two if relevant, are assigned to summarize the constituent's current concerns.


Issue Tags: Candidates

The tags chosen for this project are listed below, using this Gallup Poll’s list of most important issues influencing the 2024 election as a starting point. A predetermined list of keyword candidates was chosen, instead of allowing the model to assign its own topics, to ensure data cleanliness.
  • Economy
  • Democracy
  • Terrorism
  • Immigration
  • Education
  • Healthcare
  • Gun Policy
  • Abortion
  • Taxes
  • Crime
  • Foreign Affairs
  • Energy Policy
  • Race Relations
  • LGBTQ+ Rights
  • Housing

Approaches + Models

Keyword Extraction (KeyBERT)

The first and simplest approach to this problem is keyword extraction. It requires the least amount of processing power and coding effort, but isn’t very intelligent: the exact keyword needs to be present in the note text to match the list of keywords. Here are two short and relatively easy examples to demonstrate the model’s pros + cons.

Examples

“Talked to Lou at the door. He supports our candidate for state house but it’s clear his top issue is affordability. Talked about grocery and gas prices rising plus childcare costs with a baby on the way. Gave some resources on local childcare and charities with lower-cost supplies.”
It’s clear the main tag here should be “economy”, but the model turns up nothing. Now, try changing the word “affordability” to “the economy” and it quickly returns the correct value.
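
The behavior described in this example can be approximated with plain string matching. The sketch below is not KeyBERT itself; it is a minimal stand-in that reproduces the same failure mode, where a tag is only returned when its name appears word-for-word in the note. The four-tag subset and the helper name `exact_keyword_tags` are assumptions made for illustration.

```python
import re
from typing import List

# Hypothetical subset of the candidate tags, kept short for the example.
CANDIDATE_TAGS = ["Economy", "Healthcare", "Education", "Housing"]


def exact_keyword_tags(note: str, tags: List[str] = CANDIDATE_TAGS) -> List[str]:
    """Return every tag whose name appears verbatim, as a whole word, in the note."""
    text = note.lower()
    found = []
    for tag in tags:
        # \b anchors keep a tag from matching inside a longer word.
        if re.search(rf"\b{re.escape(tag.lower())}\b", text):
            found.append(tag)
    return found


original = "He supports our candidate for state house but his top issue is affordability."
edited = original.replace("affordability", "the economy")

print(exact_keyword_tags(original))  # []
print(exact_keyword_tags(edited))    # ['Economy']
```

The note's meaning is unchanged between the two versions, but only the second contains the literal tag name, which is why this style of matching struggles with nuanced organizer conversations.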

Pros & Cons

Pros

  • Simple Implementation
  • Free

Cons

  • Can’t interpret note as a whole thought, only searches for a word
  • Not a real option for nuanced organizer conversations

Pre-existing LLM (Claude)

On the opposite end of this model spectrum is a fully developed LLM that’s pre-existing and pre-packaged. This is a far more intelligent approach but requires a lot more computing power, cost, and energy. It’s also likely to have a steeper buy-in process with organizers.
My colleague Aaliyah Wood conducted a survey to gather some anecdotal feedback from organizers on using AI for field work. Here are some of their thoughts:
  • "It can definitely be useful and I do personally use it sometimes, but the litany of moral issues around it (impact on the workforce, slop, intellectual property issues, energy and environmental impacts, data privacy issues, etc.) make me feel generally pretty negative about it."
  • "From what I know about data centers and how harmful they are to the surrounding community, how much water they require, the data privacy issues around it etc, I don’t have the best opinion about it."
  • "Bad for the environment, can stunt human learning, needs extensive human oversight. Unsure of its net positives on the world."

Aside from organizer buy-in, AI companies generally have different, more corporate priorities, which often lead to ethical considerations that differ from those of the progressive movement. For example, ChatGPT, another leading LLM, recently struck a deal with the Trump administration’s Pentagon to provide its data and tools. Claude was chosen in this case because of its established use cases in the progressive community (for example here), its more neutral brand perception, and its refusal to agree to the same Trump administration asks that ChatGPT accepted.
Ultimately, though, corporate LLMs remain ethically ambiguous and ever-changing - privacy, governance, and corporate social responsibility must be considered.

Unsurprisingly, an LLM like Claude can produce correct output for simple organizer notes. Try this example to test it: “Has a 7 and 9 year old and thinking about moving schools because they’re worried about the quality of education at the public school. Lots of religion being pushed that makes them uncomfortable. Was asking about our candidate’s stance on school choice and resources about vouchers.”
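
One way an integration might keep an LLM's answers inside the predetermined tag list is to constrain the prompt and then validate the reply. The sketch below makes no API call and does not reflect how the demo on this site is built; the prompt wording, the function names `build_prompt` and `parse_tags`, and the four-tag subset are all assumptions for illustration.

```python
from typing import List

# Hypothetical subset of the candidate tags.
CANDIDATE_TAGS = ["Economy", "Education", "Healthcare", "Housing"]


def build_prompt(note: str, tags: List[str] = CANDIDATE_TAGS) -> str:
    """Ask for at most two tags, chosen only from the fixed list."""
    return (
        "Tag the organizer note below with one or two issue tags.\n"
        f"Choose only from: {', '.join(tags)}.\n"
        "Reply with the tag names separated by commas and nothing else.\n\n"
        f"Note: {note}"
    )


def parse_tags(reply: str, tags: List[str] = CANDIDATE_TAGS, max_tags: int = 2) -> List[str]:
    """Keep only reply items that match a candidate tag (case-insensitive)."""
    allowed = {t.lower(): t for t in tags}
    picked: List[str] = []
    for piece in reply.split(","):
        tag = allowed.get(piece.strip().lower())
        if tag and tag not in picked:
            picked.append(tag)
    return picked[:max_tags]


# A made-up reply, standing in for whatever text the model sends back.
print(parse_tags("education, School Choice"))  # ['Education']
```

Discarding anything outside the candidate list serves the same data-cleanliness goal as the predetermined tag list, even when the model improvises a label of its own.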

Pros & Cons

Pros

  • Simple to implement
  • Stronger understanding of nuance and overall themes
  • Can attempt to handle Spanglish or English misspellings

Cons

  • Costly, especially to ensure data privacy
  • Unknown Environmental Impact
  • Steeper organizer buy-in

TingTag: Unsupervised Machine Learning / NLP Options

It’s clear, then, that a happy medium is needed that incorporates the simplicity of keyword extraction and the intelligence of an LLM. We’ll explore three options below - spaCy, Zero-Shot Classification, and Sentence Transformers.

spaCy

spaCy is an open-source NLP library, developed by Explosion and released under the MIT license, that acts most similarly to keyword extraction of our three options. This approach uses three layers to understand the input and produce one or two keyword tags.
Each of the 3 layers attempts to match the note against a list of trigger terms corresponding to each keyword:
  1. Phrase Matcher - looks for exact matches of the keywords or phrases in the note
  2. Token + "Fuzzy" Matcher - looks for matches of the individual words in the keywords, allowing for some flexibility with typos and word forms (e.g., "affordable" vs. "affordability")
  3. Vector Similarity - looks at the overall meaning of the note and compares it to the meaning of the keywords using word vectors, allowing for more nuanced matches
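
The first two layers above can be sketched with the Python standard library alone. This is not spaCy's API: `difflib` stands in for the fuzzy matcher, the trigger lists are hypothetical, and the vector-similarity layer is left out because it needs a trained model. Treat it as an illustration of the layering idea under those assumptions.

```python
import difflib
import re
from typing import Dict, List

# Hypothetical trigger lists; a real deployment would maintain one per tag.
TRIGGERS: Dict[str, List[str]] = {
    "Economy": ["economy", "affordability", "gas prices"],
    "Healthcare": ["healthcare", "insulin", "prescriptions"],
}


def phrase_layer(note: str) -> List[str]:
    """Layer 1: exact word or phrase match against each tag's trigger list."""
    text = note.lower()
    return [
        tag
        for tag, phrases in TRIGGERS.items()
        if any(re.search(rf"\b{re.escape(p)}\b", text) for p in phrases)
    ]


def fuzzy_layer(note: str, cutoff: float = 0.85) -> List[str]:
    """Layer 2: per-word approximate match, tolerant of small typos."""
    words = re.findall(r"[a-z']+", note.lower())
    tags = []
    for tag, phrases in TRIGGERS.items():
        single_words = [p for p in phrases if " " not in p]
        if any(difflib.get_close_matches(w, single_words, n=1, cutoff=cutoff) for w in words):
            tags.append(tag)
    return tags


def tag_note(note: str) -> List[str]:
    """Try the exact layer first; fall back to the fuzzy layer if it finds nothing."""
    return phrase_layer(note) or fuzzy_layer(note)


print(tag_note("Insulin costs rising is taking a toll."))  # ['Healthcare']
print(tag_note("Worried about helthcare bills."))          # ['Healthcare']
```

Because every match traces back to a specific trigger term, it is easy to explain why a note received a given tag. The cost is that notes with no trigger term nearby come back untagged.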

Pros & Cons

Pros

  • Speedy
  • Most transparent/explainable
  • Handles typos well

Cons

  • The “trigger” lists for each keyword need to be maintained
  • Doesn’t handle implied topics or context-dependent topics as well

Zero-Shot Classification

Zero-Shot Classification is a type of NLP model that can classify text into categories it hasn’t seen before. It uses Natural Language Inference (NLI) to determine whether a note logically implies each tag in the keyword candidate list. It asks the question:
“Does this note have to do with [keyword tag]?”
Then it scores each result and outputs the highest score(s).
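
The question-then-score flow above can be sketched without committing to a particular model. In the sketch below, the NLI model is a pluggable function (`entail_score`) that returns how strongly a note supports a hypothesis. The hypothesis wording, the 0.5 threshold, the function names, and the `fake_entail_score` stand-in with made-up numbers are all assumptions so the example runs on its own.

```python
from typing import Callable, Dict, List

# Hypothetical subset of the candidate tags.
CANDIDATE_TAGS = ["Economy", "Education", "Healthcare"]


def build_hypotheses(tags: List[str]) -> Dict[str, str]:
    """Turn each tag into the statement an NLI model is asked to confirm."""
    return {tag: f"This note has to do with {tag}." for tag in tags}


def pick_tags(
    note: str,
    entail_score: Callable[[str, str], float],
    tags: List[str] = CANDIDATE_TAGS,
    threshold: float = 0.5,
    max_tags: int = 2,
) -> List[str]:
    """Score every hypothesis against the note and keep the top tags above the threshold."""
    hypotheses = build_hypotheses(tags)
    scores = {tag: entail_score(note, hyp) for tag, hyp in hypotheses.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [tag for tag in ranked if scores[tag] >= threshold][:max_tags]


def fake_entail_score(note: str, hypothesis: str) -> float:
    """Stand-in scorer with made-up numbers; a real system would call an NLI model here."""
    made_up = {"Education": 0.91, "Economy": 0.12, "Healthcare": 0.05}
    return next(score for tag, score in made_up.items() if tag in hypothesis)


note = "Worried about the quality of education at the public school."
print(pick_tags(note, fake_entail_score))  # ['Education']
```

The threshold and the cap of two tags mirror the "one or two if relevant" rule from the Logistics section. Both would need tuning against real notes.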


Pros & Cons

Pros

  • Understands implied topics
  • Simple set-up

Cons

  • Bulky model options
  • Slower run time
  • Doesn’t handle typos well

Sentence Transformers

Sentence Transformers is a type of NLP model that tests the similarity of the field note and each of the keyword candidates. It turns the note and each keyword into vector embeddings, measures the cosine similarity of the note vector and each keyword vector, and outputs the keyword(s) with the highest similarity score(s).
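
The cosine-similarity step can be shown with a few lines of standard-library Python. The three-dimensional vectors below are made up purely for illustration; real embeddings come from a trained model and have hundreds of dimensions. The function names are likewise assumptions.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_tags(
    note_vec: Sequence[float], tag_vecs: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Return (tag, similarity) pairs, most similar first."""
    scored = [(tag, cosine_similarity(note_vec, vec)) for tag, vec in tag_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors, invented for this example only.
TAG_VECTORS = {
    "Economy": [1.0, 0.0, 0.0],
    "Healthcare": [0.0, 1.0, 0.0],
    "Education": [0.0, 0.0, 1.0],
}
note_vector = [0.2, 0.9, 0.1]  # pretend embedding of a note about insulin costs

ranked = rank_tags(note_vector, TAG_VECTORS)
print(ranked[0][0])  # Healthcare
```

Since the comparison happens between meanings encoded as vectors instead of between literal words, a note can match "Healthcare" without ever containing that word.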


Pros & Cons

Pros

  • Understands implied topics
  • Small Model size
  • Can handle multilingual notes with a different model input

Cons

  • Doesn’t handle typos well

Recommendations

Developers looking to implement a model like this should consider layering these approaches to find a happy medium of accuracy, cost, and simplicity. For example, using spaCy as a first layer to catch the low-hanging fruit and then passing unclassified notes to the Sentence Transformer model for further analysis could be a good way to maximize accuracy while minimizing cost and complexity.
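
A layered setup like this can be expressed as an ordered list of taggers, where each note stops at the first layer that returns a tag. The sketch below is one possible shape for that logic. The two layers are hard-coded placeholders, not real spaCy or Sentence Transformers calls, and all names are assumptions.

```python
from typing import Callable, List, Sequence, Tuple

Tagger = Callable[[str], List[str]]


def tag_with_layers(note: str, layers: Sequence[Tuple[str, Tagger]]) -> Tuple[str, List[str]]:
    """Run each layer in order and stop at the first one that returns any tags."""
    for name, layer in layers:
        tags = layer(note)
        if tags:
            return name, tags
    return "unclassified", []


# Placeholder layers with hard-coded behavior; real ones would wrap actual models.
def fast_rules(note: str) -> List[str]:
    return ["Healthcare"] if "insulin" in note.lower() else []


def slower_semantic(note: str) -> List[str]:
    return ["Economy"] if "grocery" in note.lower() else []


LAYERS = [("rules", fast_rules), ("semantic", slower_semantic)]

print(tag_with_layers("Insulin costs rising.", LAYERS))         # ('rules', ['Healthcare'])
print(tag_with_layers("Grocery bills keep going up.", LAYERS))  # ('semantic', ['Economy'])
```

Returning the name of the layer that produced the tag keeps the result explainable, and notes that end up "unclassified" can be set aside for manual review.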
Other important takeaways from this project development: