Issue 5
nonfiction
Sometimes mathematicians give their theorems cute names. The birthday paradox. The traveling salesman problem. The four-color theorem. The stable marriage problem.
That last one sounds like a social issue that a conservative think tank might fund research on, as does the explanatory situation used to illustrate it: given a set of men and a set of women and a list of their ranked preferences, how can a matchmaker arrange matches such that no unmatched pair prefer each other to their official match? It’s also known more generally as the stable matching problem. Two economists, David Gale and Lloyd Shapley, wrote a paper in 1962 describing an algorithm to match college applicants with colleges such that every pairing was stable. They started by examining a simpler “special case” where there was exactly one applicant per college: “highly unnatural” for college admissions, but “fit quite readily” into the hypothetical marriage market.
The simplest Gale-Shapley algorithm, using their original gendered framework, requires the “boys” to propose to their “favorite girls,” who would then reject all but one each, stringing those along as each subsequent round of rejected boys proposed to their second-favorite girls, and so on. Gale and Shapley pointed out that this always results in a stable match; no final pair would prefer each other to their assigned match, otherwise they would have matched. They extended their algorithm back to the college admittance problem, allowing each college to keep a number of applicants on a waiting list between rounds of applications. They further showed that the final match was not just stable but optimal for the applicants: any applicant ends up with as highly-ranked a college as they would in any other stable match.
Economists call these sorts of situations where parties on two sides must be paired “matching markets.” Here, we’re looking at two-sided matching markets, where preferences matter from both parties, and where matching is done by way of a centralized algorithm. Centralized matches have been created to handle markets from high school admittance to rabbinical placements to server allocation.
The most famous centralized match has a name that would fit perfectly in a young adult dystopian novel: the Match, more formally known as the National Residency Matching Program (NRMP). The number of residency positions handled by the NRMP has quadrupled from about 10,000 at its inception to around 40,000 today. In the United States, the Match functions as a near-inescapable gateway to medical practice. Even fully-trained, experienced doctors from other countries usually need to complete a residency, though some states are considering relaxing this requirement. This residency requirement is one of the primary ways the supply of licensed doctors in the US is restricted.
After medical school, new doctors must continue their training by way of medical residencies. These positions are essentially apprenticeships in their chosen specialty that culminates in board certification. Unlike earlier hands-on phases of medical training, residency is considered to be actual gainful (if also exploitative and underpaid) employment; residency spots are limited, and competition is fierce. The stakes couldn’t be higher — years of education and hundreds of thousands in student debt all hinge on this single matching process, which also shapes a medical career for decades to come.
The fierceness of the competition initially began to escalate in the 1920s, with medical schools complaining that hospitals’ residency offers had increasingly encroached backwards into students’ final, and eventually even penultimate, year of medical school. A kind of temporal arms race had developed, with each hospital trying to secure the best candidates by offering positions earlier and earlier.
In 1945, the trend reached a head and a consortium of medical schools agreed to collude to ensure a uniform timeline for offers. This had the opposite effect, however, and offer timelines became increasingly short, leading to “exploding offers” where students sometimes had to accept an offer before hanging up the phone, leaving no time at all for comparison. The Association of American Medical Colleges agreed to coordinate further, and a centralized match debuted for the class of 1952. Today, the week the final matches are revealed is one of the most stressful weeks of medical education, which culminates in a very public Match Day where prospective residents open their offers together, take photos, and celebrate (or grieve) together with their classmates and families.
There had been a last-minute change in the proposed algorithm, when a group of students argued that the initial design slightly penalized students for ranking a long-shot hospital as their top choice. The adapted algorithm, known within the medical community as the “Boston Pool” algorithm after its trial run within Boston-area hospitals, eliminated this risk. Thirty years later, economist Al Roth wrote a paper pointing out that the Boston Pool algorithm adopted by the Match was, in fact, essentially the same as the Gale-Shapley algorithm, with the last-minute change incorporating precisely the “deferred acceptance” feature described by Gale and Shapley. Roth and Shapley won the 2012 Nobel Prize in Economics for their work on matching markets (Gale had died in 2008, making him ineligible).
Matching by algorithm is even older than the Match, though. It dates back to at least 1928, where it appears in the Preferential Bidding System (PBS) used by the National Panhellenic Conference (NPC), the national umbrella organization of the historically white undergraduate sororities. Like with residency recruitment, this was a reaction to increasing encroachment by sorority recruitment back in time, sometimes pushing back into new members’ final years of high school or prep school (hence the colloquial term “rushing”). Like with residencies, an initial agreement was made to restrict the recruitment timeline — and this has mostly stuck for the men’s fraternities, who still do not use a centralized match. The women’s sororities, however, adopted a centralized recruitment and matching process as a way to allocate the market of each campus’s population of potential new sorority women. In the PBS, sororities and potential new members (PNMs) rank each other after a series of recruitment events (today, highly-choreographed “parties”). This all culminates in a centralized list of bids offered by each sorority to their assigned list of PNMs. These are revealed simultaneously in a highly dramatic moment — Bid Day or Night, depending on the campus — that differs from medical residency Match Day mostly by the amount of Greek cheers and running that the newly-matched members perform.
There is one crucial difference between the PBS and the original Gale-Shapley algorithm, though: the PBS is unstable, in the mathematical sense, largely because there is some ambiguity in the description of the algorithm. There can be cases where the algorithm will simply fail to match one or more people because they essentially get skipped; the NPC guidelines did not specify exactly what to do in these cases. When Roth and Mongell brought up this scenario to the recruitment administrators they spoke with, the administrators all expressed surprise that it was possible, and gave different answers for how they’d handle that particular corner case. The two economists also showed that on campuses where sororities are operating under capacity, the final match is likely to be unstable, where potential members and sororities are not matched together despite preferring each other. Each person indicated they’d take a slightly different approach to these corner cases. Some of that has likely been standardized in the ensuing three decades by the assignment of centralized NPC consultants to each campus and the wide adoption of specialized software to manage the recruitment logistics and matching. The irregularities of human judgment gradually cede ground to the consistency — the sometimes brutal consistency — of code.
Using a centralized rules-based match allows organizations to claim a mantle of mathematical objectivity, conveniently glossing over the possibilities of artisanally hand-matched corner cases or instabilities. The NRMP website prominently features the “mathematical algorithm” used to do the matching (and the attendant Nobel Prize). The Panhellenic sororities lean heavily on historical data analysis to structure recruitment, complete with technical jargon for different parameters, and doing these calculations during recruitment is the main function of the NPC consultants assigned to each campus. Hurt feelings and structural inequalities are still a continual backdrop of both matches, but the pain is inflicted at an algorithmic distance. Don’t take it personally, in other words, since only an algorithm rejected you. The algorithm becomes both shield and sword — deflecting criticism while enforcing and even naturalizing institutional preferences.
Despite this technical authority and mathematical aura of fairness, the human elements remain stubbornly unpredictable. The rules for both medical residency and sorority matching are clearly explained to participants: participate in structured recruitment activities, submit any required paperwork or applications, and then thoughtfully rank your top choices on a single list. The algorithm will handle the rest. And yet, without fail, people break those rules every single time.
Some of the rules are about behavior surrounding the process. Sororities have strict rules about how they are allowed to interact with potential new members before formal recruitment is over, and about what they’re allowed to say or ask during rush events themselves, but even the most by-the-book sorority woman might break those rules without intending to by saying “see you later” at the wrong moment. Hospitals aren’t supposed to ask about candidates’ preferences or personal lives. But talk to nearly any doctor about their residency application experience, and you’ll hear story after story of residency administrators pressuring candidates or asking inappropriate questions, often couched within otherwise harmless small talk. One survey found that a full third of applicants had been explicitly asked how they would rank potential programs during interviews. A 2009 article in JAMA described the whole process as having an “atmosphere of gamesmanship” and offered both anecdotes and data of interviewers pressuring candidates, leading to a situation of “misrepresentation and mutual distrust.”
The rules that underpin the matching algorithm itself are straightforward enough to explain in a minute or two; the assignments can be made by hand, using paper or notecards. For residency applications, medical students apply to desired programs, are selected for interviews, and then everyone simply ranks their choices. The structure of these matching algorithms is designed specifically to work best when participants rank what economists call their “true preferences,” a theoretical construct that assumes perfect self-knowledge: in an ideal world where you have complete power, what would you rank first, second, etc? Some systems acknowledge the limitations of strict ranking. Sometimes you’re allowed ties; the sorority match system is designed so that throughout the process, PNMs and sororities are allowed to list a specific number tied as their top choice, creating a small opening for preference ambiguity, then must rank each in order after that.
Beyond the gamesmanship of pressuring candidates, there’s another fundamental problem: the algorithm’s core assumption collapses under scrutiny. It turns out that just ranking your “true” preferences is a difficult problem. Research has repeatedly shown that us humans do not have sensible preferences; scientists like to refer to this as “irrationality,” but it seems pretty baked-in to being human. You might prefer Harvard over University of Michigan, and Michigan over University of Hawaii, but then somehow still prefer Hawaii over Harvard, creating a cycle of inconsistency that no linear ranking can capture. Or your preferences might change moment to moment, shifting with new information or emotional states. Or you might be genuinely stumped between two choices. Deciding between a top-ranked and top-paid residency in a state far from your family and a lower-ranked program in a big city with a good support network, for example, might feel like trying to rank apples and oranges. One of the Panhellenic strategies for handling this is to emphasize the role that individual personal values must play in decision-making during recruitment orientation.
The ranked list format fails to capture another crucial psychological dimension of two-sided matching markets: their mutual nature. Would you rather be the last-choice candidate at your favorite program, or the much-desired number-one pick somewhere else? There’s no way to express nuance like that on a simple ranked list.
Consider the original algorithm chosen for the Match, before the adoption of the Boston Pool deferred acceptance approach. In that scheme, mutual number-one picks would be assigned first (a “1-1 match”). Then any hospital with remaining residency spots would get students in their second-ranked list who had chosen the hospital first (a “2-1 match”). Only then would remaining spots go to students on the hospital’s first-rank list who had listed the hospital second (a “1-2 match”). This priority system, while seemingly fair, created a strategic trap for students with ambitious first choices. In a retrospective analysis, Roth pointed out that this seems like it gives a strategic advantage to the students, since their #1 preferences were considered before the hospitals’. But in certain cases, a hospital’s residency list could fill up with its 1-1 and 2-1 matches before a student who had ranked it #2 could be assigned there, and if that student had chosen a long-shot hospital as their (true) #1 preference, the student could get boxed out of both options. Under the Boston Pool deferred acceptance model, this particular student could be tentatively paired with their second choice before that school moved on to make offers to its less-preferred candidates. Their long-shot school could come through at the very last minute (hence the “deferred” part, since that initial acceptance wouldn’t be final until all spots were filled), but if they didn’t, that #2 offer still came through.
In the lower-stakes case of the sororities, it is common enough for women to attempt to game their rankings that the Panhellenic recruitment training manual <link: "Panhellenic recruitment training manual" ref="https://npcwomen.org/wp-content/uploads/2023/07/Training-Concept_-Recruitment-Logistics-1.pdf" />
explicitly warns against a common misconception: ranking neither more nor fewer sororities improves a woman's chances at a particular sorority. There is no way to put all your eggs in one basket, as it were; you can only throw out the other eggs along with the other baskets. There are other reasons to make this specific strategic choice, since a PNM is not allowed to accept a bid to any Panhellenic sorority for a year after they’ve turned one down, but gaming the matching algorithm is not one of them. Crucially, though, Roth and Mongell found that this particular strategy is important for solving the instability in the PBS algorithm described above for the sororities operating under capacity: potential members who truly have a strong preference can only rank one sorority, fail to match, and leave themselves open to be approached by that sorority immediately after formal recruitment, if the failed match was mutual. This particular practice can actually prevent the algorithm from making unstable matches on some campuses. But then on the other hand, intentionally ranking only a single sorority can prevent a PNM from being matched by hand, should they end up as a failed algorithmic corner case. The NPC rules allow them to simply be left unmatched entirely.
The residency Match has added a variation to help some residents with another strategic problem: couples matching to nearby positions, a human complication requiring algorithmic accommodation. Without this option, couples entering the Match the same year were forced to attempt to game their preferences; perhaps you thought you had a really great interview at your favorite program, but if your spouse felt his interview at the same hospital had bombed, you might end up ranking that program lower on your list and violate your supposed true preferences, to avoid getting matched somewhere your spouse had no chance. But this single addition of matching couples together geographically transforms the Match from a straightforward calculation into a computational problem that becomes theoretically intractable — essentially impossible for even powerful computers to solve efficiently — if enough couples tried to match in the same year. So far that hasn’t happened, but there’s no guarantee it won’t.
People also try to game the system in ways beyond strategic ranking: since the COVID-19 pandemic forced hospitals to accept virtual interviews, the number of applicants for each opening has surged, resulting in a similar increase in the number of interviews. This gives today’s competitive applicants a relative advantage over their predecessors, since they no longer need to make hard choices about how many interview trips they could afford to take. This also evens the playing field between those with more time and money and those with less, although the top tier of students still receive a disproportionate number of the increased interview invitations. Some doctors have called this increase a “vicious cycle,” characterizing it as a type of prisoner’s dilemma in which neither students nor hospitals feel they can reduce the number of interviews without being uniquely penalized. The problem of managing so many interviews — mostly for the hospitals, to be clear — has become severe enough that a group of physicians has even proposed starting a pre-Match centralized match in order to “favor interviews that are more likely to lead to a match in the final Match.” In the face of algorithmic complexity, the solution appears to be yet more algorithms. It’s turtles all the way down.
Pushing responsibility onto faceless algorithms has become a common tactic to explain weird or antisocial behavior on social media. It takes only a cursory look at TikTok to see how pandering to unknown algorithmic rules can spawn increasingly bizarre behavior. Content creators and platform administrators blame-shift so often that “the algorithm made me do it” has become the digital equivalent of “the devil made me do it.” Unlike matching algorithms, these recommendation systems often rely on mind-boggling quantities of data. Most of this data was not consciously created; instead, it’s just the digital exhaust produced by our every click, scroll, and pause, a by-product of near-constant surveillance. Yet there’s a surprising parallel. Even in these cases of algorithmic rule by centralized matching — where, theoretically, it is simple to model preferences and it is possible to complete by hand — the rules do not prevent people from behaving in unexpected and sometimes destructive ways.
These simple scenarios don’t always behave nicely, defying elegant mathematical expectations even after a half-century of game theoretic study on the structure and incentives created by the rules. The number of edge and corner cases — exemplified by administrators’ surprise that the sorority match algorithm could fail — is difficult to account for without oversimplifying the messiness of the real people involved. On one hand, these match algorithms seem to work out fine, almost all of the time. But what happens when they don’t? Then human ingenuity takes over, bypassing the elegant mathematics in favor of messy, creative workarounds: attention to local context, clever solutions, and quiet cooperation.
Like all models, the models of preferences that drive centralized matching are sometimes useful, and always incomplete. Humans seem to be inherently resistant to rules and quantification of all kinds, and these fairly simple cases are no exception. Computers and ultra-complex algorithms make easy targets, but the ways we interact with any set of rules is bound to be surprising. The theorems that attempt to explain these rules may have cute names, but the humans they try to simplify remain gloriously, frustratingly unpredictable.
David Gale and Lloyd Shapley. College admissions and the stability of marriage. American Mathematical Monthly. 1962; 69:9-15.
Alvin E Roth, “The Origins, History, and Design of the Resident Match,” JAMA 289, no. 7 (February 19, 2003).
Alvin E. Roth, “The Evolution of the Labor Market for Medical Interns and Residents: A Case Study in Game Theory,” Journal of Political Economy 92, no. 6 (December 1984): 991–1016, https://doi.org/10.1086/261272.
A.E. Roth and S. Mongell, “Sorority Rush as a Two-Sided Matching Mechanism,” American Economic Review 81, no. June 1991: 441–64.
Carl Erik Fisher, “Manipulation and the Match,” JAMA 302, no. 12 (September 23, 2009): 1266–67, https://doi.org/10.1001/jama.2009.1388.
Glimcher PW. Efficiently irrational: deciphering the riddle of human choice. Trends Cogn Sci. 2022 Aug;26(8):669-687. doi: 10.1016/j.tics.2022.04.007. Epub 2022 May 25.
Eric J. Warm et al., “The Residency Match: Escaping the Prisoner’s Dilemma,” Journal of Graduate Medical Education 13, no. 5 (October 1, 2021): 616–25, https://doi.org/10.4300/JGME-D-21-00477.1.
Irene Wapnir, Itai Ashlagi, Alvin E. Roth, Erling Skancke, Akhil Vohra, Irene Lo, Marc L. Melcher; Explaining a Potential Interview Match for Graduate Medical Education. J Grad Med Educ 1 December 2021; 13 (6): 764–767. doi: https://doi.org/10.4300/JGME-D-20-01422.1
Jillian Foley is a writer and historian of technology. She earned her PhD in Conceptual and Historical Studies of Science at the University of Chicago, where she also received a degree in Computer Science. She lives in Chicago with her family.
Kevin T. Baker is a writer, editor and historian based in Chicago. He is a member of the Reboot Editorial Board.