HACKER Q&A
📣 yogoism

In-house or outsourced data annotation? (2025)


While big tech often outsources data annotation to firms like Scale AI, Turing, and Mercor, companies such as Tesla and Google run in-house teams.

Which approach do you think is better for AI and robotics development, and how will this trend evolve?

Please share your data annotation insights and experiences.


  👤 PaulShin Accepted Answer ✓
Interesting question. As the founder of an AI collaboration platform (Markhub), we live and breathe this problem every day. My take is that the best approach isn't a simple choice between in-house vs. outsourced, but a hybrid model focused on the quality and context of the data.

For our foundational capabilities (e.g., text summarization), we start with powerful base models like Gemini and fine-tune them. The real value, though, comes from our proprietary data, and for that, outsourcing is not an option.

Here's our approach: Our own product, Markhub, is our primary annotation tool.

When our early users give feedback, for example by circling a button on a screenshot and commenting "This color is wrong," they are, in effect, creating a complete piece of labeled data: [Image] + [Area of Interest] + [Instruction].
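
To make the [Image] + [Area of Interest] + [Instruction] triple concrete, here is a minimal Python sketch of how one such record might be represented and serialized. It is an illustration only, not Markhub's actual schema: the class names, field names, and the screenshot path are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Hypothetical area of interest, in pixels from the image's top-left corner."""
    x: int
    y: int
    width: int
    height: int


@dataclass(frozen=True)
class FeedbackLabel:
    """One labeled example: image + area of interest + instruction."""
    image_path: str      # which screenshot the user commented on (assumed field)
    region: BoundingBox  # the part of the image the user circled (assumed field)
    instruction: str     # the user's comment, kept verbatim (assumed field)


def to_training_record(label: FeedbackLabel) -> str:
    """Serialize a label to one JSON line, e.g. for a JSONL training file."""
    return json.dumps(asdict(label), sort_keys=True)


# Example: the "This color is wrong" comment from the text above.
example = FeedbackLabel(
    image_path="screenshots/checkout.png",  # hypothetical path
    region=BoundingBox(x=120, y=340, width=96, height=32),
    instruction="This color is wrong",
)
record = to_training_record(example)
```

Keeping the comment verbatim and storing the region as plain pixel coordinates is one simple choice; a real pipeline would likely also record who gave the feedback and when.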

We call this "Collaborative Annotation" or "In-Workflow Labeling." The data quality is high because it's generated by domain experts (our users) as a natural byproduct of their daily work, full of real-world context. This is something an external annotation firm, working outside that workflow, can't replicate.

So, to answer your question on how the trend will evolve: I believe the next wave will be tools that let teams create their own high-context training data simply by doing their work. The annotation process will become invisible, integrated into the collaboration flow itself.