Bath salts, big tech, and passing the buck

I’ve had a lot of thoughts about the irresponsible, “move fast and break things” way the tech industry has progressed for many years, but I haven’t found a good way to articulate them. Recently I saw an ad on LinkedIn that made it all come into focus.

This ad was for a tool to be used by recruiters and hiring managers to assist in the hiring process. The tool promises to let you “Effortlessly generate tailored interview questions, conduct video interviews with automatic evaluation, and receive detailed feedback instantly.” The accompanying video explains that it will automatically generate suggested questions based on a job posting, and summarize the candidate’s answers. But the big value add? “In the final step, as soon as the interview is finished, detailed feedback will be generated. It will explain the answers’ correctness and completeness, and provide summarized recommendations for that particular candidate.”

This isn’t just disturbing. It’s positively Orwellian, and I can confidently say I would not want to be hired by any company that would use such a tool. Here are a few reasons why:

The tool almost certainly makes use of large language models, the cool tech of the day. As I and many others have repeatedly emphasized, large language models do not know anything. They are not designed to know anything. They are designed to predict the next word given a series of other words, based on training data largely scraped from the Web. The extent to which they generate true answers to questions is limited by the extent to which the training set contains correct answers to those questions, and even then, nobody – not even the people who built them – knows exactly what’s going on in the many hidden layers of their neural networks. It’s possible they can generate false information that wasn’t even in their training set, as probably happened with the space bears incident. And generating disinformation isn’t an example of LLMs malfunctioning. It’s an example of their doing exactly what they were designed to do: generate plausible-sounding text.
The questions the model generates are likely the type of low-quality questions that send up red flags when I get them in interviews. When you interview someone, the knowledge transfer isn’t one-way; the questions you ask tell the candidate a lot about what you think makes for a good employee. If you ask a software developer a lot of questions that could be easily answered by a Google search, but fail to ask the sort of deeper questions that indicate an ability to think about how to break down problems and translate them into code, you’re effectively saying “Google can do your job. You’re replaceable.” Low-quality human-generated questions are a dime a dozen in job interviews, which means they’re probably well-represented in the training set.
The auto-generated feedback is the worst part. Since no large language model is a domain expert, there’s no way it can accurately judge the correctness of the candidates’ answers. It might be able to do so for the “programming language trivia” types of questions like “What’s the difference between struct and class in C++?”¹, but those are the least important questions to ask, and the easiest to answer with Google. More probing questions that tease out the processes candidates would use to solve complex, real-world problems are much more open-ended and less likely to have definitive right answers. So how can this tool generate a measure of “correctness” for such answers? The answer, of course, is: it can’t. The best it can do is compute the similarity of the candidates’ answers to the answers in the training set. Which leads me to the next problem:
Training bias is a real problem. If this model is just comparing candidates’ answers to the answers it’s been trained on, who’s to say the similarity metric is based on quality, rather than on something else, like style or diction? Maybe the training answers were mostly given by people who spoke “acceptable” dialects of English, so non-native speakers, speakers of African-American vernacular English, or simply people with less formal education might score lower on average. I don’t know if this is in fact the case, but my point is, we can’t know. Not unless the code and training data are made public so that they can be examined for bias. And of course, no company wants to do that.
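To make the similarity concern concrete, here is a toy sketch. It is not how the advertised tool works (its internals are unknown); it assumes a deliberately crude scorer that compares word counts with cosine similarity, and every name and example answer in it is hypothetical. It shows only that a score built on surface overlap rewards familiar wording rather than a good answer.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Count lowercase word tokens; a crude stand-in for a real text representation."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between the word-count vectors of two strings."""
    va, vb = bag_of_words(a), bag_of_words(b)
    dot = sum(va[word] * vb[word] for word in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


# Hypothetical "model answer" the scorer compares against.
reference = ("I would decompose the problem into smaller components "
             "and validate each component with unit tests.")

# Same idea as the reference, expressed in plainer, less formal words.
same_idea_plain_words = ("First I'd split it up into little pieces, "
                         "then check every piece works by itself.")

# Opposite meaning, but nearly the same vocabulary as the reference.
wrong_but_similar_wording = ("I would never decompose the problem into smaller components "
                             "or validate each component with unit tests.")

plain_score = cosine_similarity(reference, same_idea_plain_words)
wrong_score = cosine_similarity(reference, wrong_but_similar_wording)
print(f"plain wording, same idea:       {plain_score:.2f}")
print(f"similar wording, opposite idea: {wrong_score:.2f}")
```

The answer that contradicts the reference scores far higher than the plainly worded answer that agrees with it, because the scorer only sees shared vocabulary. A real system would be more sophisticated, but without access to its code and training data there is no way to check how much of its "correctness" score is style.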

As can be expected, many of the comments on the ad were negative. The company responded to most of them, and its responses were even more enlightening. One commenter pointed out that using this tool to screen candidates means you won’t “necessarily get the best person for the job, just the person who performs best at this type of screening.” The company replied that “it would be important to emphasize that a type of screening is entirely the responsibility of the process owner”. A response to another concern insisted the tool “neither takes decisions instead of humans nor solely makes the final evaluation. The decision is expected to be taken by humans anyways.” Still another response said that it “doesn’t take any decisions instead if [sic] humans or anything of that kind”.

What did all of these responses have in common? None of them denied the shortcomings that were pointed out. Instead, they all essentially said, “It’s not our fault. You’re responsible for the decisions you make when using our product.” And this cop-out summarizes much of what’s wrong with the tech industry. It reminds me of the designer drugs that are labeled “bath salts” and stamped “Not for human consumption” so the manufacturers can escape blame when someone snorts them and eats someone’s face off.

It reminds me, too, of another scourge that Silicon Valley “disruptors” wrought: dockless rental scooters. I am constantly finding these things discarded in handicapped parking spaces, or blocking ramps and sidewalks, or otherwise generally making life miserable for people with disabilities. Yet I’m sure the companies that make them insist that their hands are clean, because it’s the responsibility of the people who rent them to park them properly. This may be the case, but people weren’t clogging public infrastructure with e-waste before these things came along. The scooter companies made it possible for this form of misuse to occur in the first place, it was a misuse that could have easily been anticipated, and the companies took no steps at all to prevent it. Instead, as long as they are not legally held liable (or can dismiss any fines as a reasonable cost of doing business), they don’t care.

We see the same behavior from Facebook (possibly the originators of the phrase “move fast and break things”), who built a tool that allowed dangerous misinformation to spread unchecked, gave terrorist organizations and hate groups unprecedented reach for recruitment, and did little or nothing to stop this misuse.

We see it in the cryptocurrency industry, in which the most widely used protocols were deliberately designed to be wasteful. Mining requires expensive hardware which is pushed to its absolute limit, consuming frightening amounts of electricity and generating tons of e-waste, and it’s not a bug – it’s a feature. It’s designed to be wasteful, in order to make a 51% attack on the blockchain prohibitively expensive. As a result, the various proof-of-work blockchains consume more power than some countries and are helping to make our planet uninhabitable. But it’s not Satoshi’s fault, right?
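For readers who haven't seen proof of work up close, the following is a minimal sketch of the idea, not any real protocol's implementation. It assumes a simplified puzzle (find a number that makes a SHA-256 hash start with a given count of zero hex digits); Bitcoin itself hashes a block header twice and compares against a numeric target. The function name and the block string are hypothetical.

```python
import hashlib


def mine(block_data: str, difficulty: int):
    """Brute-force a nonce so the SHA-256 hex digest starts with `difficulty` zeros.

    There is no shortcut: the only way to find a valid nonce is to keep
    hashing until one happens to fit. That burned effort is the point.
    """
    prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest
        nonce += 1


if __name__ == "__main__":
    # Each extra zero digit multiplies the expected number of attempts by 16.
    for difficulty in range(1, 5):
        nonce, digest = mine("toy block", difficulty)
        print(f"difficulty {difficulty}: {nonce + 1} hashes tried, digest {digest[:12]}...")
```

Checking a solution takes one hash, while finding one takes many, and the difficulty is tuned so that rewriting history would mean redoing all of that work faster than everyone else combined. The electricity bill is the security model.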

We see it in the purveyors of LLM-powered chatbots, who created tools that could generate bullshit at an unprecedented rate, and then, when they’re used to do exactly that, sell us bullshit-detecting tools that don’t work and claim no responsibility when they’re used to exclude people who didn’t even use the bullshit generators.

Of course, it’s not just the modern-day tech industry that does this. Car and firearm manufacturers were doing it more than a century ago. It’s a tale at least as old as capitalism: create something that will almost certainly be used to do bad things, take no steps to prevent said misuse, take no responsibility for the inevitable results, make money. If you spread the negative externalities thin enough, they magically disappear. If everyone’s a little bit responsible, nobody’s at all responsible. Then when the catastrophic results become big enough they can no longer be ignored, we shake our heads and say, “Nobody could have seen it coming. Nobody could have prevented this.”

Legend has it that, long ago, there were government regulations that could prevent the most egregious uses of technology. Today, in most of the developed world, regulations of this kind are like white rhinos: rare, and being hunted to extinction, despite the best efforts of a dedicated few to preserve them.

Fortunately, there seems to be a growing consensus that tech companies rarely have our best interests in mind. People are thinking more critically about technology, and while figures like Elon Musk still have their cults of personality, it does seem like the general public is less likely to heap unconditional praise on industry giants than it was ten or twenty years ago. It would be great if that skepticism resulted in better government oversight, but at the very least it means more bad publicity and more expensive lawsuits when the industry steps out of line. And so my advice to the industry is to ask the important questions before you go to market. Big companies: stop firing your AI ethics boards. Startups: invest in ethical oversight early, just like you invest in your legal team. Hire people who can point out the negative consequences your product might have, and listen to them. Don’t just look for ways to shift the blame to someone else. You might not be the first to market with your world-changing product, you might not get stupidly rich quite as fast as you want, but maybe you’ll avoid being a bunch of mindless jerks who’ll be the first against the wall when the revolution comes.

¹ In C++, members of a struct are public by default; members of a class are private by default.

Last modified on 2023-09-12