HACKER Q&A
📣 alliewithane

TikTok scraping – maximize signal when only 5% of content is useful?


Hey everyone.

I'm working on a machine learning project that needs a lot of TikTok video data related to ads and behavior. I'm gonna do a bunch of transcription and analyses on individual videos after getting their IDs.

I'm facing a problem right now. I estimate that only about 5% of the data fetched by tools like ensembledata, through the degrees of freedom they allow (individual hashtag / keyword search), is actually useful.

I understand this is possibly a confounding problem.

My question is: has anyone here worked on something similar? How did you approach this?

Did you use iterative/stratified sampling?

Thank you!


  👤 huffledruidpuff Accepted Answer ✓
You can try headless browsers and crawling based on your business logic.
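
One way to turn "business logic" into something concrete is to treat each hashtag or keyword as a stratum and shift the fetch budget toward the ones that actually yield useful videos. Below is a rough Python sketch of that idea using Thompson sampling. This is a swapped-in illustration, not something ensembledata or TikTok provides: `adaptive_sample`, `make_fake_fetcher`, the hashtags, and the yield numbers are all hypothetical. In a real pipeline `fetch_batch` would wrap whatever scraper or API client you use, and `is_useful` would be a cheap filter or a manual label on a small subsample.

```python
import random


def adaptive_sample(strata, fetch_batch, is_useful, rounds=50, seed=0):
    """Iteratively pick which stratum (hashtag/keyword) to fetch from next.

    Each stratum keeps a Beta(hits, misses) belief about its useful-video rate.
    Every round we draw one plausible rate per stratum and fetch a batch from
    the stratum with the highest draw (Thompson sampling).
    """
    rng = random.Random(seed)
    hits = {s: 1 for s in strata}    # Beta prior: alpha = 1
    misses = {s: 1 for s in strata}  # Beta prior: beta = 1
    kept = []
    for _ in range(rounds):
        choice = max(strata, key=lambda s: rng.betavariate(hits[s], misses[s]))
        for video in fetch_batch(choice):
            if is_useful(video):
                hits[choice] += 1
                kept.append(video)
            else:
                misses[choice] += 1
    return kept, hits, misses


def make_fake_fetcher(true_yield, batch_size=20, seed=1):
    """Hypothetical stand-in for a real scraper: returns simulated videos."""
    rng = random.Random(seed)

    def fetch_batch(stratum):
        return [
            {"tag": stratum, "useful": rng.random() < true_yield[stratum]}
            for _ in range(batch_size)
        ]

    return fetch_batch


if __name__ == "__main__":
    # Made-up yields, purely to exercise the loop.
    true_yield = {"#ad": 0.30, "#sponsored": 0.10, "#fyp": 0.01}
    kept, hits, misses = adaptive_sample(
        list(true_yield), make_fake_fetcher(true_yield), lambda v: v["useful"]
    )
    for tag in true_yield:
        fetched = hits[tag] + misses[tag] - 2  # subtract the prior counts
        print(tag, "fetched:", fetched, "useful:", hits[tag] - 1)
```

Keep the per-stratum fetched and useful counts around. Because the loop deliberately oversamples high-yield hashtags, the kept videos are not a representative sample, and you'd need those counts to reweight if you later want unbiased estimates.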