Privacy Notice: Student names have been replaced with pseudonyms to protect privacy.

Students Discuss COMPAS

Guide September 26, 2025, 4:33 PM

👋 Welcome, everyone!

Our topic is COMPAS. We'll be discussing this statement:

The COMPAS recidivism algorithm is/was biased against Blacks.

Luna agrees with the statement.
Justin disagrees with the statement.

My name is Guide, and I'm here to support your discussion. If you need anything, start your message with "Hey Guide" 🤓

Luna September 26, 2025, 4:47 PM

Hi Justin, nice to meet you! I'm Luna. It seems that we've been matched for the Sway discussion about the COMPAS topic. To be honest, this is my first time using Sway, so I'm trying to adapt to it. Following the Guide's suggestion, it would be a good idea to first talk about how we'd like to approach this discussion. Do you have any ideas? I'm also okay if you want to discuss the topic directly, starting with your opinion or mine.

Justin September 26, 2025, 4:51 PM

Hey Luna! Nice to meet you as well :) yes i'm a bit confused about sway as well lol. Not really sure what we're being graded on. But yeah it seems like we are supposed to just start discussing the topic. If it works for you we could both lay out our opinions and then we can discuss each one

Luna September 26, 2025, 5:19 PM

Sure! In my opinion, the COMPAS recidivism algorithm is biased against Blacks. From what I got in yesterday's slide, this algorithm doesn't directly ask about race, but it uses proxies that correlate strongly with structural inequalities. Furthermore, ProPublica reported that COMPAS systematically overpredicted recidivism for Black people and underpredicted it for White people. Even though Northpointe said this is because of different base rates, a "fair" algorithm must consider not only prediction quality but also social context and justice in outcomes, since higher recorded recidivism rates for Black people are partly the result of systemic inequality, discrimination, etc. What's your opinion? :)

Justin September 26, 2025, 5:36 PM

Thanks for asking! I think you make some very strong points that i'm looking forward to discussing :). When I first read the prompt for this, my immediate reaction was to agree with the statement. To me, the most compelling supporting evidence for the COMPAS algorithm being biased was the drastically higher false positive rate for black defendants (predicted to reoffend but they actually don't) and the similarly disproportionately high false negative rate for white defendants (predicted not to reoffend but they actually do). However, I believe Brian Hedden, in his article "On Statistical Criteria of Algorithmic Fairness", makes a compelling case for why ensuring equal false-negative rates and equal false-positive rates across ethnic groups makes for an arbitrary criterion for a fair algorithm. Unfortunately, some groups are statistically more likely to reoffend compared to others. This is undeniably a direct result of a long history of systematic oppression and discrimination against these groups. These injustices have caused lasting impacts on factors that can unavoidably help predict recidivism rates (less access to education, weaker support systems, etc.). As unfortunate as this is, it does help to explain the lopsided false-negative and false-positive rates across groups. Hedden's 'perfectly fair algorithm' thought experiment puts this in a way that is easy to digest. When the outcome is binary (either an individual reoffends or they don't), variations in base probabilities between groups are bound to cause unequal false-negative and false-positive rates. It doesn't prove unfairness. In fact, Hedden found that the only criterion he could not disprove for devising a fair algorithm was calibration within groups ("for each possible risk score, the expected percentage of individuals assigned that risk score who are actually positive is the same for each relevant group and is equal to that risk score"). The COMPAS algorithm did uphold this criterion.
That is, people who were assigned a risk score of 7 had the same reoffending rate whether they were white or black. To me, this indicates that the algorithm was reasonably fair and unbiased towards ethnic groups.
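
Editor's illustration (not part of the chat): Justin's summary of Hedden's thought experiment can be sketched numerically. The minimal Python sketch below assumes two hypothetical groups, A and B, whose members receive perfectly calibrated risk scores of 0.2 or 0.8, and a hypothetical cutoff of 0.5 for "predicted to reoffend". All numbers are invented for illustration and are not COMPAS data.

```python
# Hypothetical numbers for illustration only; these are not COMPAS data.
# Each group maps a risk score (probability of reoffending) to a head count.
GROUPS = {
    "A": {0.2: 600, 0.8: 400},
    "B": {0.2: 400, 0.8: 600},
}
THRESHOLD = 0.5  # assumed cutoff: scores at or above it count as "predicted to reoffend"


def expected_rates(score_counts, threshold=THRESHOLD):
    """Return (base_rate, false_positive_rate, false_negative_rate).

    Assumes perfect calibration: among people with score r, a fraction r reoffend.
    """
    total = sum(score_counts.values())
    positives = sum(n * r for r, n in score_counts.items())  # expected reoffenders
    negatives = total - positives                             # expected non-reoffenders
    # Non-reoffenders who were nevertheless flagged as high risk.
    false_pos = sum(n * (1 - r) for r, n in score_counts.items() if r >= threshold)
    # Reoffenders who were nevertheless flagged as low risk.
    false_neg = sum(n * r for r, n in score_counts.items() if r < threshold)
    return positives / total, false_pos / negatives, false_neg / positives


for name, counts in GROUPS.items():
    base, fpr, fnr = expected_rates(counts)
    print(f"group {name}: base rate {base:.2f}, FPR {fpr:.2f}, FNR {fnr:.2f}")
```

Under these assumed numbers the scores are calibrated by construction in both groups, yet group B (base rate 0.56) ends up with a false-positive rate of about 0.27 against about 0.14 for group A (base rate 0.44), and the false-negative rates are reversed.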

Justin September 26, 2025, 5:46 PM

To reply to your stance, I had the same concern about COMPAS using proxies to discriminate by race. I feel uncomfortable about that and believe more should be done to obscure the race of defendants. I actually agree that how the algorithm was utilized was biased against black people. You're completely right that there is vital background information about why recidivism rates are higher for black people than for other groups. This information should not be ignored as it currently is. And I think it makes sense to be more forgiving towards individuals who suffer from the lasting impacts of discrimination. However, I believe the bias was in how the algorithm was used. I think the algorithm itself fulfilled its duty in fairly predicting recidivism regardless of ethnicity.

Luna September 26, 2025, 7:09 PM

Thank you for such a thoughtful reply! :) I can see your point of view on this topic, especially Hedden's point that equal false-positive and false-negative rates might not be the right measure of fairness. Sometimes differences in base rates can naturally cause imbalances. I also agree with you that COMPAS was calibrated within groups, so it was "fair" in the way Hedden defines it. However, there is a part of me that still thinks COMPAS was biased against Black people. Even if Black & White defendants with the same scores are equally likely to reoffend, the distribution of errors matters. Moreover, as I said before, factors like prior neighborhood strongly correlate with systemic discrimination, so the algorithm still indirectly encodes structural racism even though it is technically fair. Therefore, the bias is not just in how it is used, but in what it is built on. While I agree that COMPAS met calibration and did what it was supposed to do statistically, I do not think that makes it fair. A truly fair algorithm should not only reflect the base rate gap, but also actively avoid reinforcing the disadvantages already faced by Black defendants🤔

Justin September 26, 2025, 10:09 PM

I get where you're coming from about the higher prediction errors disproportionately affecting black people. I did some digging on that and it turns out that due to the differing base rates of recidivism it's mathematically impossible to have both calibration across groups and equal false-positive/false-negative rates. You quite literally have to choose one or the other. To your point about the algorithm considering factors like prior neighbourhood, part of me agrees with you that these questions might indirectly help identify race, but at the same time, these are convicted criminals. I think it's sensible to prioritize correctly identifying individuals who are likely to reoffend, and as such to give the algorithm as much information as possible. These factors at best can allow you to make an educated guess at one's race, but i don't see a compelling argument that the algorithm used this information to discriminate against certain ethnic groups. The algorithm was still calibrated across groups, despite having access to these potentially discriminatory factors. I do agree that the usage of the algorithm needs to be rethought in order to avoid reinforcing disadvantages, but i don't think the algorithm itself has anything to do with that.

Guide September 26, 2025, 10:09 PM

Luna, Justin is making a key mathematical point — that you literally cannot have both calibration AND equal error rates across groups when base rates differ.

But you're arguing the algorithm is "built on" structural racism through proxies like neighborhood. How do you respond to his claim that these factors help predict recidivism accurately, even if they correlate with race?

Is there a way to balance predictive accuracy with avoiding the reinforcement of systemic disadvantages?
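
Editor's illustration (not part of the chat): the constraint Justin and Guide describe can be seen from a standard confusion-matrix identity. The sketch below assumes a binary prediction (for example, a risk score cut at a threshold) and uses symbols that do not appear in the chat: $p$ for a group's base rate of reoffending, $N$ for group size, $\mathrm{PPV}$ for the share of predicted reoffenders who actually reoffend, and $\mathrm{FPR}$, $\mathrm{FNR}$ for the false-positive and false-negative rates. It uses equal PPV as a simplified stand-in for calibration within groups.

```latex
\begin{align*}
\mathrm{TP} &= (1 - \mathrm{FNR})\, p\, N, \qquad \mathrm{FP} = \mathrm{FPR}\,(1 - p)\, N, \\
\mathrm{PPV} &= \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}}
\;\Longrightarrow\; \mathrm{FP} = \frac{1 - \mathrm{PPV}}{\mathrm{PPV}}\,\mathrm{TP}, \\
\mathrm{FPR} &= \frac{p}{1 - p}\cdot\frac{1 - \mathrm{PPV}}{\mathrm{PPV}}\cdot(1 - \mathrm{FNR}).
\end{align*}
```

If two groups share the same PPV and the same FNR but have different base rates $p$, the right-hand side differs between them, so their FPRs cannot be equal, except in degenerate cases such as perfect prediction.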

Luna September 27, 2025, 3:32 PM

Hi Justin, sorry for the late reply! Your arguments really make sense, especially that the algorithm cannot have calibration and equal error rates at the same time. That is exactly why we cannot stop at calibration. We still need to rethink our decisions frequently (based on the Ethical Thinking definition). If sticking with calibration leads to more harmful errors for Black defendants, then the question arises whether calibration is the best solution for the fairness problem. I think the key issue is what kind of accuracy we are aiming for. If proxies like neighborhood data are strongly correlated with race, then they may make the algorithm better at reproducing biased outcomes than at measuring someone's true likelihood of reoffending. That is why I think we have to be careful about equating "more predictive features" with "more fairness". And as you said, the algorithm needs to be rethought, since the real challenge is balancing accuracy with fairness. In my opinion, one possible way is to reframe the question: instead of just asking "Who is most likely to reoffend?", we could ask, "Who needs support to avoid reoffending?" By taking this kind of approach, we are not throwing out accuracy, but we are making sure the model serves justice too:)

Justin September 27, 2025, 5:46 PM

No worries at all! Thanks for such a thoughtful reply :) your compassion really comes through in these messages. I am curious how predictive these proxy factors actually are, and how much worse the algorithm would be without them. I think that if they can be safely removed without much effect on the algorithm's accuracy then they should be. But if they are necessary they should be kept. Like I was saying, at the end of the day the safety of society should be prioritized over the anonymity of a convicted criminal. I think it's worth keeping in mind that the algorithm already obscures a defendant's ethnicity much more than having humans judge a defendant's likelihood of reoffending (obviously lol). And i do wonder whether, if you remove these factors, there will be a new bunch of factors to take issue with. I imagine it's probably easier than we both expect to take a pretty good guess at one's race no matter how much you strip down the questions asked. It only really makes sense to me to look at removing these questions if there is reason to believe the algorithm is biased, which i don't see any evidence for. I think your point about supporting these defendants is a great one and i agree with that fully. The current system definitely does not do enough to help reintegrate people into society.

Luna September 28, 2025, 4:34 PM

Thank you so much, Justin! I've really appreciated this discussion:) You've made me think more critically about calibration and how far we can simplify an algorithm before it loses its value. I agree with you that public safety is important, but we must also consider fairness in this context. Even if proxies cannot be fully removed, we have a responsibility to constantly revisit whether this system is inadvertently reinforcing inequities. At the end of the day, I think we're both on the same page that the algorithm alone is not enough. We need better social support to make reintegration possible😊

Guide September 28, 2025, 4:34 PM

Justin, Luna seems to be wrapping up, but there's still a key disagreement here.

You've argued the algorithm itself isn't biased if it meets calibration standards. She's arguing that using racially correlated proxies makes it biased regardless of calibration.

Do you think there's a meaningful distinction between "technically unbiased" and "practically biased" algorithms? Or is calibration sufficient to call something fair?

Justin September 28, 2025, 5:11 PM

That's so nice of you to say, i've really appreciated this discussion as well! I agree, it seems like we're on the same page that the algorithm can be unbiased but still unfair if used in the wrong manner. To answer the AI question, yes i think there's a meaningful distinction there. The proxy factors are a great example of something that can make the algorithm practically biased while being technically unbiased. There's certainly a case to be made that factors like previous neighborhoods introduce bias. Since you're being judged based on your neighbours' actions, your level of income, etc., i can see the case for that introducing practical bias to the algorithm.

Understanding Quiz

Justin

Which piece of evidence did Luna cite to support her claim that COMPAS is biased against Black defendants?

She referred to Brian Hedden's argument on calibration within groups.

She pointed to ProPublica's report about differential prediction errors.

She argued that hiding race in input features would remove bias.

She cited official Northpointe statements denying bias.

Why did Luna argue that calibration does not guarantee fairness?

Because calibration ignores social context and can lead to harmful errors for Black defendants.

Because calibration always produces equal false-positive rates.

Because calibration ensures equal outcomes across groups.

Because calibration leads to lower overall predictive accuracy.

How did Luna respond to Justin's claim about the impossibility of achieving both calibration and equal error rates?

She agreed calibration was best and no further action was needed.

She insisted on discarding calibration altogether in favor of equal error rates.

She said we must rethink which accuracy matters and apply ethical thinking.

She proposed adding explicit race variables to improve calibration.

What alternative framing did Luna propose to address fairness beyond predicting recidivism?

To ask who is most likely to reoffend one year later.

To use an algorithm that only considers age and criminal history.

To shift focus to which defendants need support to avoid reoffending.

To calibrate the algorithm exclusively on White defendants.

How did Guide help clarify the mathematical constraint discussed in the conversation?

By explaining that calibration and equal error rates are both achievable.

By summarizing Justin's argument that one must choose between calibration and equal error rates.

By providing data on ProPublica's findings.

By suggesting the discussion move to another topic.

Luna

When you mentioned COMPAS was "calibrated within groups", which evidence did Justin provide to support that calibration?

He showed that people with a risk score of 7 had the same reoffending rate regardless of race.

He reported that false positive rates were equal for Black and White defendants.

He cited Northpointe's explanation that differing base rates explained the disparities.

He referred to Hedden's claim that removing proxies would harm calibration.

When you raised concerns that proxies like neighborhood data correlate with race, how did Justin respond regarding those factors' role?

He asserted that neighborhood data do not actually predict recidivism and should be removed.

He argued that such proxies are essential predictive features and removing them could harm public safety.

He suggested using those features but then randomly adjusting scores to ensure fairness.

He recommended replacing proxies with direct racial information for better transparency.

At the start of the discussion, what was Justin's initial stance on whether the COMPAS algorithm was biased against Black defendants?

He immediately disagreed and defended the algorithm's fairness.

He had no opinion until reviewing ProPublica's report.

He initially agreed but later adopted Hedden's calibration argument.

He suggested focusing on providing support rather than assessing risk.

Towards the end, how did Justin distinguish between "technically unbiased" and "practically biased" algorithms?

He said any algorithm meeting calibration cannot be practically biased.

He argued practical bias arises only if users ignore calibration.

He claimed bias only matters when error rates differ across groups.

He noted that proxy factors can make the algorithm practically biased while it remains technically unbiased.

When Guide summarized the remaining disagreement, what mathematical constraint did Guide highlight?

You cannot have both calibration and equal error rates across groups when base rates differ.

Proxies like neighborhood must always be removed to avoid bias.

The algorithm should reframe its question to focus on support rather than risk.

False positive errors are more harmful than false negatives in this context.

Survey results

Opinion Changes

Students rated the following statement: The COMPAS recidivism algorithm is/was biased against Blacks.

[Chart: pre-chat and post-chat opinions for Justin and Luna on a seven-point scale (Strongly disagree, Moderately disagree, Somewhat disagree, No idea, Somewhat agree, Moderately agree, Strongly agree), with markers for movement toward agreement or disagreement.]

Partner Ratings

Statement	Strongly Disagree	Disagree	Neutral	Agree	Strongly Agree
Guide's contributions improved the discussion	—	—	—	Justin	Luna
Guide treated me and my partner with equal respect	—	—	—	—	Justin Luna
I felt comfortable sharing my honest opinions with my partner	—	—	—	—	Justin
I was not offended by my partner's perspective	—	—	—	—	Justin Luna
My partner was respectful	—	—	—	—	Justin
My partner had better reasons for their views than I expected	—	—	—	—	Luna
It was valuable to chat with a student who did NOT share my perspective	—	—	—	—	Justin Luna
Sway helped me articulate my thoughts/feelings better	—	—	—	—	Luna

Optional open feedback

"How did this Sway chat affect your confidence discussing complex issues with people who hold different views from you?"

Luna: "Sway chat is really helpful, especially to express my opinion with other people! I like some of the features, like the Guide who assists us during the discussion and the Sway notification that always reminds me about the discussion. Thank you!"

Word Counts

AI Guide: 319 words · 5 messages
Student Justin: 1113 words · 6 messages
Student Luna: 709 words · 6 messages