The ATS Conspiracy

Searching for a job has become even more difficult than usual lately, and it seems that much of the problem is due to applicant tracking software (ATS). This is software that is designed to filter through applicants’ résumés looking for exactly the skills and experience the role requires. This is intended to make things easier for hiring managers: rather than sort through thousands of résumés by hand, they can instead sort through the dozens that survive the filtering process. This means a lot of applications are cut before any human being even sees them, and no feedback is given. At best, you get an auto-generated email that says “you’re not exactly what we’re looking for at this time.” At worst, you get ghosted.

Naturally, job applicants are keen to write better résumés that can survive the ATS filters. Never fear! There are a plethora of ads on LinkedIn promising “AI solutions” designed to reformat your résumé to make it through the gauntlet. Isn’t technology wonderful?

The problem, of course, is this: Either this counter-ATS software doesn’t work, in which case it’s a scam; or it does work, in which case ATS is a scam.

Think about it: the whole reason companies pay for applicant tracking software is that it promises to filter out applications from unqualified candidates, leaving only applications from qualified ones. But if these “AI solutions” actually work, then ATS actually lets through applications from qualified candidates plus any candidate, qualified or not, who shelled out money for counter-ATS software.

If we want to be slightly more charitable, perhaps ATS filters out most unqualified candidates, but also some qualified candidates. Counter-ATS software thus prevents you from falling into the latter category by turning a qualified but ill-formatted résumé into one that can survive filtration; if you simply aren’t qualified, it can’t help you. This sounds slightly less scammy, but it still suggests serious flaws in ATS if it filters out enough qualified candidates that such countermeasures become necessary.

(dons tinfoil hat)

I’d like to offer a further possibility: both ATS and counter-ATS are scams.

And, just to elevate my vague suspicions to the level of conspiracy theory, what if the ATS and counter-ATS software are sold by the same company (perhaps through a network of shell corporations to disguise this fact), and were designed with each other in mind from the outset? Here’s how it would work:

  1. A company pitches software to employers that is designed to lessen the workload on hiring managers by reducing the number of applications they have to review. It does just that; what the employers don’t know is that it’s only looking at imperceptible differences in formatting, or maybe for a cryptographic hash hidden in the document’s metadata.
  2. At the same time, “another” company (really the same company doing business under another name) pitches software to job seekers promising to help them beat the ATS. In reality, it just subtly modifies the formatting of their résumés, or inserts the right cryptographic hash, to guarantee that they will pass through its own software. (A sketch of the trick follows the list.)
  3. Profit!
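Just to show how little machinery the scheme would need, here’s a deliberately silly Python sketch of the trick, in which the “screening” is nothing but a check for the vendor’s own secret tag. Every name and detail here is invented for the joke; I’m not describing any real product.

```python
import hashlib
import hmac

# Shared secret known only to the vendor playing both sides.
# Everything here is invented for illustration.
VENDOR_SECRET = b"definitely-not-a-shell-company"

def tag_resume(resume_text: str) -> str:
    """Counter-ATS side: compute the tag to hide in the résumé's metadata."""
    return hmac.new(VENDOR_SECRET, resume_text.encode(), hashlib.sha256).hexdigest()

def passes_filter(resume_text: str, metadata_tag: str | None) -> bool:
    """ATS side: 'screening' that just checks for the vendor's own tag."""
    if metadata_tag is None:
        return False  # didn't pay up; silently rejected
    return hmac.compare_digest(tag_resume(resume_text), metadata_tag)

resume = "Ten years of experience in exactly what you're hiring for."
print(passes_filter(resume, None))                # False, qualified or not
print(passes_filter(resume, tag_resume(resume)))  # True: paying customer
```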

I have no evidence that any of this is indeed the case, but it certainly feels true, which is just as good.

(doffs tinfoil hat)

OK, for real, employers have a genuine need to cull job applications to a manageable level. Back in the days when applying to a job involved printing out a résumé and mailing it to an employer at your own expense, there was a built-in incentive to apply to a small number of jobs, preferably only those you thought you had a decent shot at getting. But with the rise of Internet job sites like LinkedIn and Indeed, it became much cheaper and easier to spam employers. You could fire off applications to as many jobs as you wanted, even if you lacked the necessary qualifications. Pretty much every job I’ve applied to on LinkedIn already had at least a hundred applicants, and some jobs had many more. There needs to be some sort of triage, and it’s only reasonable that employers would be interested in automating the process.

The problem is that the process by which ATS filters résumés is opaque, and it’s unclear to me exactly how it’s optimized. In order to evaluate an applicant tracking system, you would need some independent way of evaluating whether an applicant was qualified or not, and use this metric to evaluate the software’s performance. For example, you could have a human hiring manager evaluate a thousand résumés, then have the software do the same, and see how often the software agrees with the human.
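As a rough sketch of what that comparison might look like, with made-up labels standing in for the human’s and the software’s verdicts:

```python
from collections import Counter

# Hypothetical verdicts: True = "advance to interview", False = "reject".
# In the experiment above, human_says would come from a hiring manager
# reviewing the same résumés the software scored.
def compare(human_says: list[bool], software_says: list[bool]) -> None:
    pairs = Counter(zip(human_says, software_says))
    agree = pairs[(True, True)] + pairs[(False, False)]
    print(f"agreement: {agree / len(human_says):.1%}")
    print(f"rejected by software, liked by human: {pairs[(True, False)]}")
    print(f"advanced by software, rejected by human: {pairs[(False, True)]}")

compare(
    human_says=[True, True, False, False, True, False],
    software_says=[True, False, False, False, True, True],
)
```

Notice that the two kinds of disagreement are reported separately; as we’ll see below, they are very much not equivalent.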

There are a couple of problems with this. The first is that it assumes your independent metric is accurate. No doubt many human hiring managers are very good at their jobs, but even they sometimes pass on good candidates and hire bad candidates. The only real way to be sure is to actually hire someone and see how they do on the job, but it’s difficult to A/B test this: what are you going to do, hire all the candidates, even those you suspect to be unqualified, and see how they actually perform? And since the human’s judgment is the very yardstick you’re measuring against, the software can’t meaningfully outperform the human. The best you can hope for is for it to perform as well as the human (including making the same occasional mistakes that the human makes).

The other problem is that, even if you have an independent metric with which to evaluate the model, there are multiple possible ways to measure quality. The naïve approach is to measure accuracy: what percent of applications were given the same score as the human, and what percent were given different scores? Accuracy works for many types of tasks, but it presupposes that all errors are alike. In practice, they’re not. There are two other metrics we often use: precision, or the percentage of applicants that made it through the filter that were qualified; and recall, or the percentage of qualified applicants that made it through the filter. Depending on the context, we might heavily prioritize one or the other. A bomb-detecting device, for example, should prioritize recall, since the consequences of misidentifying a benign piece of luggage as a bomb are much lower than the consequences of failing to identify a bomb. A spell checker, on the other hand, might prioritize precision: you want to trust that every word it flags really is misspelled, even if it occasionally misses a genuine typo.
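To make the three metrics concrete, here’s a small sketch, treating the human’s verdict as ground truth. The counts are invented: imagine a filter that passes 45 of 50 qualified applicants and 15 of 950 unqualified ones.

```python
def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Fraction of all decisions that matched the reference judgment."""
    return (tp + tn) / (tp + fp + fn + tn)

def precision(tp: int, fp: int) -> float:
    """Of the applicants the filter let through, how many were qualified?"""
    return tp / (tp + fp)

def recall(tp: int, fn: int) -> float:
    """Of the qualified applicants, how many made it through the filter?"""
    return tp / (tp + fn)

# tp=45 qualified passed, fn=5 qualified rejected,
# fp=15 unqualified passed, tn=935 unqualified rejected.
print(accuracy(tp=45, fp=15, fn=5, tn=935))  # 0.98 -- looks great
print(precision(tp=45, fp=15))               # 0.75
print(recall(tp=45, fn=5))                   # 0.90
```

The 98% accuracy is driven almost entirely by the easy rejections; precision and recall are what actually describe how the filter treats the people involved.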

I would argue that applicant tracking software should prioritize recall over precision. The reason is that if a piece of software occasionally lets bad résumés through, the human hiring manager can always double-check those that survive the filters. But if the software marks a good application as bad, chances are the humans will never even see it. After all, the whole point of using the software is to reduce the amount of manual review they need to do! Still, I can’t help but wonder whether precision or accuracy, rather than recall, is being prioritized here, given the number of complaints from ostensibly qualified candidates whose résumés are silently rejected.
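The trade-off ultimately comes down to where the score cutoff is set. Here’s a toy simulation (all numbers invented) showing that a looser threshold buys recall at the price of a longer reading list for the human, which is exactly the price I’m arguing employers should be willing to pay:

```python
import random

random.seed(0)

# Toy population: ~5% of 1,000 applicants are qualified, and an imagined
# ATS assigns each a noisy score that only loosely tracks qualification.
applicants = []
for _ in range(1000):
    qualified = random.random() < 0.05
    score = random.gauss(0.7 if qualified else 0.4, 0.15)
    applicants.append((qualified, score))

total_qualified = sum(q for q, _ in applicants)

for cutoff in (0.7, 0.5):
    passed = [(q, s) for q, s in applicants if s >= cutoff]
    tp = sum(q for q, _ in passed)
    print(f"cutoff={cutoff}: human reads {len(passed)} résumés, "
          f"recall={tp / total_qualified:.0%}")
```

With these made-up numbers, the strict cutoff hands the hiring manager a delightfully short stack while silently discarding roughly half the qualified people; the looser one roughly triples the reading but catches most of them.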

I don’t know what technology lies under the hood of most applicant tracking software, and it’s in the companies’ best interest not to reveal that information, just as the internals of Google’s search rankings are jealously guarded from SEO practitioners. But most of what I’ve read posits that it’s a simple document search that looks for key words and phrases indicating the particular mix of skills and experience being sought. If this is the case, it would explain why many qualified individuals are rejected: they might have the necessary skills, but not describe them using the exact words and phrases the software expects, or use an unexpected document layout that makes those phrases hard to extract.

I would argue that if you’re looking for specific skills, you shouldn’t be scanning through plain-text documents in the first place. Instead, you should have applicants fill out a web form, where they can select skills via drop-down menus and enter their experience via text boxes. It’s an example of making illegal states unrepresentable (there’s a small sketch of the idea below). Indeed, many online applications I’ve filled out do exactly this. Most even have a feature whereby you can upload your résumé, and the software will pre-populate the answers with its best guesses based on the plain text, but allow you to edit them manually in case something went wrong.

Yet I’ve still been ghosted by employers using these web forms, even when I seemed to have the exact set of skills the job required. It’s possible that my application was rejected by a human reviewer, and that’s fine, but in such a case it would be polite to send a brief message indicating why, or at least a vague form email. Most don’t even do that, which makes me suspect there’s still some software filtering going on.
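Here’s that sketch: a minimal illustration, with an invented skill taxonomy, of how a structured application makes skill matching exact. There’s no keyword guessing to go wrong, because a skill the system doesn’t recognize simply can’t be entered.

```python
from dataclasses import dataclass
from enum import Enum

# Hypothetical skill taxonomy; a real employer would define their own.
class Skill(Enum):
    PYTHON = "Python"
    SQL = "SQL"
    KUBERNETES = "Kubernetes"

@dataclass
class Application:
    name: str
    years_experience: int
    skills: set[Skill]  # only recognized skills are representable here

def meets_requirements(app: Application, required: set[Skill]) -> bool:
    # Either the structured field contains the skill or it doesn't;
    # synonyms, phrasing, and document layout can't break this.
    return required <= app.skills

app = Application("Ada", 7, {Skill.PYTHON, Skill.SQL})
print(meets_requirements(app, {Skill.PYTHON, Skill.SQL}))  # True
print(meets_requirements(app, {Skill.KUBERNETES}))         # False
```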

At least some applicant tracking software seems to be using “AI” (at least according to their advertisements), which is an unfortunately vague marketing term that encompasses everything from expert systems (glorified switch statements) to large language models (glorified autocomplete). Given today’s fad-driven software landscape, I can almost guarantee that most of them are making API calls to ChatGPT or one of its competitors. Those are notoriously error-prone and should not be trusted to make their own decisions – just ask Air Canada.

But there are probably also some more prosaic supervised machine learning models in use for assigning scores to applications. The problem with supervised learning is that it requires a training set, which means the model will tend to give higher scores to applications that are more similar to the high-scored examples from that training set. While the tech industry prizes innovation and seeks out those who “think different” (as Apple’s slogan used to say), these machine learning models virtually guarantee that all of a company’s hires will resemble its previous hires. (There’s a toy illustration of this effect below.)

And of course there’s the omnipresent issue of training bias. I would be completely unsurprised to find that ATS models are biased against already underrepresented groups, particularly those for whom the language used on their application is not their native language, or who speak less prestigious dialects of that language. The only way to avoid that is to pour a lot of effort into ensuring that the training set is diverse and correcting any bias that is detected. I’m not sure that effort is being made, though, because the companies who buy this software are more likely to care about precision (that the candidates who get through are good) than recall (that qualified candidates aren’t silently rejected). And if the details of the models are kept secret, and applicants aren’t even told why they’re being rejected, I would think it would be nearly impossible to mount a class-action suit charging the software companies or their corporate customers with employment discrimination.
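Here’s that toy illustration, using scikit-learn with entirely invented data: a bag-of-words model trained on past hiring decisions rewards résumés that reuse the vocabulary of past hires, and a candidate who describes the same skills in different words scores lower.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented stand-ins for past hires (1) and past rejections (0).
past_resumes = [
    "python machine learning pipelines aws",
    "python data engineering spark aws",
    "java spring microservices kubernetes",
    "marketing copywriting social media",
    "retail management customer service",
]
was_hired = [1, 1, 1, 0, 0]

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(past_resumes, was_hired)

# A strong candidate who uses unfamiliar words for the same skills
# looks less like the past hires, so the model scores them lower.
familiar = "python machine learning aws pipelines"
novel = "built statistical models and cloud data workflows in python"
print(model.predict_proba([familiar])[0, 1])  # higher: echoes past hires
print(model.predict_proba([novel])[0, 1])     # lower: different vocabulary
```

The point isn’t this particular model; it’s the shape of the failure. Whatever the training set rewards, the scorer reproduces.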

Much like the infuriating phone menus that are rapidly replacing human customer service representatives, ATS has succeeded because it’s faster, cheaper, and more scalable than skilled humans, not because it’s better. It could, in fact, be much, much worse than a skilled human and still be worth it from a cost/benefit standpoint. In other words, it’s just another example of enshittification. Things won’t improve until we collectively realize that this race to the bottom is something we chose to do – and, more importantly, that we can choose not to do.