It was early spring 2019, and I was staying late in the office, waiting for the after-work rush to pass before I headed to the gym. My manager pulled me into a call with a customer. This was a parent. They were upset. Their kid had applied to 20 schools, with a list informed in part by our tool, and hadn't gotten into any of them.
You can always issue a refund. You can’t undo the outcome.
A few months earlier, I had led the development of a data-driven college “chancing engine”. You’d input a student’s profile, like GPA and SAT/ACT scores, along with demographic information, and the model would estimate a rough probability of admission to a given college. That estimate then informed a “balanced” school list of “reach”, “target”, and “safety” schools. The goal was simple: maximize the upside from “reach” applications while ensuring the student lands somewhere they can live with.
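For illustration, the bucketing step was roughly this shape. The thresholds and school names below are made up, not what we shipped:

```python
# A minimal sketch of the bucketing step, assuming made-up thresholds.
# The real engine's cutoffs, features, and school names were different.

def categorize_school(admit_probability: float) -> str:
    """Map an estimated admission probability to a list category."""
    if admit_probability < 0.25:
        return "reach"    # long shot; the upside we were chasing
    if admit_probability < 0.70:
        return "target"   # plausible admit
    return "safety"       # expected admit; the floor of the list

# Example: assemble a "balanced" list from per-school estimates.
estimates = {"School A": 0.12, "School B": 0.55, "School C": 0.85}
print({name: categorize_school(p) for name, p in estimates.items()})
# {'School A': 'reach', 'School B': 'target', 'School C': 'safety'}
```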
Under the hood, things were messy. We trained on thousands of surveys and existing student records, but we were always one admissions cycle behind hundreds of schools’ notoriously opaque selection processes. At best, the model was a linear approximation of a decidedly non-linear, spiky process.
Still, we did what we were supposed to do: built the model with the best tools available, selected informative features, tested it against known acceptance outcomes, and reported the results in terms of “absolute error”. Then we shipped it. It was a big push, and we replaced a heuristic process with a data-driven, defensible one.
What I didn’t ask was a much simpler question:
If we ship this model, and thousands of students use it, how many students could be harmed if the model is wrong?
Not “what’s the average error?”, not “does this beat the baseline?”, but how bad do failures look in the tail?
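The per-student absolute errors we already computed could have answered that question, summarized by their tail instead of their mean. A rough sketch of the difference, on synthetic numbers rather than our data:

```python
import numpy as np

# Synthetic stand-in data; not our production numbers, and simplified
# so the point about mean vs. tail error is easy to see.
rng = np.random.default_rng(0)
actual = rng.random(10_000)                                      # observed outcomes
predicted = np.clip(actual + rng.normal(0, 0.1, 10_000), 0, 1)   # model estimates

errors = np.abs(predicted - actual)

print(f"mean absolute error:   {errors.mean():.3f}")               # what we reported
print(f"95th percentile error: {np.percentile(errors, 95):.3f}")   # how bad the tail gets
print(f"worst-case error:      {errors.max():.3f}")                # the student who gets hurt
```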
The model didn’t just produce numbers. It informed decisions and shaped how our counselors helped students plan their applications. We had a human in the loop, but the model anchored the conversation. And yet, 20 applications later, one student was left with no options.
I don’t have a clean post-mortem here. My best guess is that the student had a profile the model and our dataset didn’t cleanly cover, like changing high schools or disciplinary issues. We also never adopted a “break glass” process to opt out students who didn’t conform to our modeling assumptions. We had no way to say, “this student is outside the model’s comfort zone.”
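If I were building it again, even a crude applicability check would have been better than nothing. A sketch of what that might look like, with invented field names, ranges, and flags rather than our real features:

```python
# A sketch of the "break glass" check we never built. The field names,
# ranges, and flags below are invented for illustration only.

GPA_RANGE_SEEN_IN_TRAINING = (2.5, 4.0)
MODELED_SITUATIONS = {"standard_transcript", "single_high_school"}

def outside_model_scope(profile: dict) -> bool:
    """Return True if this student should go straight to a counselor."""
    gpa_ok = GPA_RANGE_SEEN_IN_TRAINING[0] <= profile["gpa"] <= GPA_RANGE_SEEN_IN_TRAINING[1]
    situations_ok = set(profile.get("situations", [])).issubset(MODELED_SITUATIONS)
    return not (gpa_ok and situations_ok)

# Example: a student who changed high schools gets escalated, not scored.
student = {"gpa": 3.6, "situations": ["changed_high_school"]}
if outside_model_scope(student):
    print("Outside the model's comfort zone; route to a human, not the model.")
```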
We listened, we issued the refund, and we moved on, but the results didn’t go away.
When you build ML products, when you ship LLM-powered features, don’t forget to ask: “how bad could the outcomes be?” Sometimes a refund doesn’t make someone whole.