HACKER Q&A
📣 mackwell

How do I sell unique training data?


We have all seen the recent large deals tech companies have made to buy access to various types of data for training their models (e.g., Reddit, Photobucket). I have also seen articles about the industry's ever-growing need for unique media and data. They seem to suggest that there is a market, with brokers looking for new sources that are not online. These buyers seem willing to pay, but I don't see an obvious way to sell.

I believe I have access to troves that have never been and never will be online. Some quick research has not turned up any obvious online marketplace or anyone to talk to.

Is anyone here in this business, or does anyone have advice or resources for people like me who want to explore selling or licensing training data?


👤 ChrisArchitect Accepted Answer ✓
Related today:

Cloudflare's new marketplace lets websites charge AI bots for scraping

https://news.ycombinator.com/item?id=41625903


👤 mmarian
The sales process is the same as for any other B2B product. You need to figure out what the data is worth and who your customers are.

And make sure you're confident about that value. For example, in many workflows, data that covers only 10% of the population is useless.

I wouldn't worry about the licensing details as a startup. They won't matter until you can afford the lawyers, and the reputational damage, that come with pursuing someone who has broken the license.


👤 vvnh2018
We are building a marketplace for this. Feel free to email me to set up a meeting: vvnh2018@gmail.com

👤 dofato89
Interested in this too.