Labeling Data is a Mistake

Published by Had Seddiqi on August 15, 2022

The biggest problems in ML start with data. The best way to get good data is to build an app where users generate good data. Ideally this app is enough of a value add to be financially sustainable too.

Labeling is basically a hack to approximate that process. Instead of attracting users, you hire them for cheap. Instead of building a great product with your users in mind, you buy a labeling tool and shoehorn a problem into one of the templates. There’s no real customizability, and worse, there’s no empathy.

Beyond making labeling a terrible job, this approach also makes many ML problems intractable. Doctors, lawyers, and chemical engineers don’t sit down and label data one example at a time. Yet many who work on those problems understand much better ways of solving them. The best teams will build their products, processes, and organizations more seriously, with the technology needed to do the job. (I believe this is a major way AI startups will differentiate themselves, but it will take time to see.)

One obvious barrier to doing better is cost. Building a full-fledged product is expensive and takes time. Doing it well takes a variety of skills and experience. ML teams often don’t get the resources they need, even if they have good product sense for the problem they’re solving.

One way through this problem is tools. Often ML engineers have a good enough view into the problem to do much better than a labeling tool. But the ML skillset rarely includes UI programming or backend engineering. It’s not trivial to build an app, but these days entire startups are built singlehandedly. And the space of what ML apps need to do isn’t exceptionally large: doing better than an off-the-shelf labeling tool doesn’t take much.

I’ll have to stop here for now, as long as I want to keep my current work in stealth. 🙂

(But if this interests you, I’ll happily share my less-baked ideas.)
