TikTok scraping – maximize signal when only 5% of content is useful?

Question

Hey everyone. I'm working on a machine learning project that needs a lot of TikTok video data related to ads and behavior. After collecting the video ids, I'm going to run transcription and other analyses on the individual videos.

I'm facing a problem right now. I estimate that only about 5% of the data fetched by tools like ensembledata is useful, given the degrees of freedom it allows (individual hashtag / keyword search). I understand this may be a confounding problem.

My question: has anyone here worked on something similar? How did you approach it? Did you use iterative or stratified sampling?

Thank you!

huffledruidpuff · Accepted Answer

You can try headless browsers and crawling based on your business logic.
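
If you go the iterative sampling route mentioned in the question, one possible way to steer the crawl is to label a small pilot sample per hashtag or keyword and then spend most of the next fetch budget on the queries that yielded useful videos. Below is a minimal sketch of that idea using Thompson sampling over per-query hit rates. The hashtag names, the pilot counts, and the helper names `allocate_budget` and `update` are all made up for illustration; fetching and labeling are left out because they depend on your tool.

```python
import random


def allocate_budget(stats, budget, rng):
    """Split a fetch budget across queries with Thompson sampling.

    stats maps query -> (useful_count, not_useful_count) from labeled samples.
    Returns a dict mapping query -> number of fetches to spend on it.
    """
    allocation = {query: 0 for query in stats}
    for _ in range(budget):
        # Draw a plausible hit rate per query from its Beta posterior
        # (uniform Beta(1, 1) prior plus the observed counts).
        draws = {
            query: rng.betavariate(useful + 1, not_useful + 1)
            for query, (useful, not_useful) in stats.items()
        }
        # Spend this unit of budget on the query with the highest draw.
        best = max(draws, key=draws.get)
        allocation[best] += 1
    return allocation


def update(stats, query, useful, not_useful):
    """Fold newly labeled results for one query back into the counts."""
    old_useful, old_not_useful = stats[query]
    stats[query] = (old_useful + useful, old_not_useful + not_useful)


# Hypothetical pilot: 40 labeled videos per query (assumed numbers).
stats = {
    "#ad": (2, 38),
    "#sponsored": (9, 31),
    "#tiktokmademebuyit": (1, 39),
}

rng = random.Random(0)  # fixed seed so the plan is reproducible
plan = allocate_budget(stats, budget=200, rng=rng)
print(plan)

# After fetching and labeling the next batch, update and re-plan.
update(stats, "#ad", useful=1, not_useful=9)
```

Queries with a low hit rate still get an occasional fetch, so a hashtag that looked bad in a small pilot can recover. Keep in mind that the resulting dataset is biased toward high-yield queries, so if you need population-level estimates you would have to reweight by how each query was sampled.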