In-house or outsourced data annotation? (2025)

Question

While big tech often outsources data annotation to firms like Scale AI, TURING, and Mercor, companies such as Tesla and Google run in-house teams. Which approach do you think is better for AI and robotics development, and how will this trend evolve? Please share your data annotation insights and experiences.

PaulShin · Accepted Answer

Interesting question. As the founder of an AI collaboration platform (Markhub), we live and breathe this problem every day. My take is that the best approach isn't a simple choice between in-house vs. outsourced, but a hybrid model focused on the quality and context of the data.
For our foundational models (e.g., text summarization), we start with powerful base models like Gemini and fine-tune them. But the real magic happens with our proprietary data, and for that, outsourcing is not an option.
Here's our approach: Our own product, Markhub, is our primary annotation tool.
When our early users give feedback—like circling a button on a screenshot and commenting "This color is wrong"—they are, in effect, creating a perfect piece of labeled data: [Image] + [Area of Interest] + [Instruction].
We call this "Collaborative Annotation" or "In-Workflow Labeling." The data quality is incredibly high because it's generated by domain experts (our users) as a natural byproduct of their daily work, full of real-world context. This is something an external annotation firm can never replicate.
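To make the [Image] + [Area of Interest] + [Instruction] triple concrete, here is a minimal sketch of how one such feedback event could be turned into a training record. The class names, field names, file path, and JSON layout are all assumptions for illustration, not Markhub's actual schema.

```python
from dataclasses import dataclass, asdict
import json


@dataclass(frozen=True)
class Region:
    """Axis-aligned box around the circled area, in pixel coordinates (hypothetical)."""
    x: int
    y: int
    width: int
    height: int


@dataclass(frozen=True)
class FeedbackEvent:
    """One piece of user feedback captured during normal review work (hypothetical)."""
    image_path: str  # the screenshot being reviewed
    region: Region   # the area the user circled
    comment: str     # the user's free-text comment


def to_training_record(event: FeedbackEvent) -> str:
    """Serialize a feedback event as one JSON line: image + area of interest + instruction."""
    record = {
        "image": event.image_path,
        "area_of_interest": asdict(event.region),
        "instruction": event.comment.strip(),
    }
    return json.dumps(record, sort_keys=True)


# Example: the "circle a button and comment" case described above.
event = FeedbackEvent(
    image_path="screenshots/checkout.png",  # made-up path
    region=Region(x=120, y=340, width=96, height=32),
    comment="This color is wrong",
)
line = to_training_record(event)
print(line)
```

Writing one JSON object per line keeps the records easy to append as feedback arrives and easy to stream into a fine-tuning job later; any real pipeline would also need consent handling and de-duplication, which this sketch leaves out.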
So, to answer your question on how the trend will evolve: I believe the future isn't a binary choice between in-house and outsourced. The next wave will be tools that allow teams to create their own high-context training data simply by doing their work. The annotation process will become invisible, seamlessly integrated into the collaboration flow itself.