I'm working on a machine learning project that needs a lot of TikTok video data related to ads and user behavior. Once I have the video IDs, I'll be doing a bunch of transcription and analysis on individual videos.
Here's the problem I'm facing: I estimate that only about 5% of the data can be fetched with tools like EnsembleData, given the limited degrees of freedom they allow (searching by individual hashtag or keyword).
I understand this could be a confounding problem, since the videos reachable through search may not be representative of the whole.
My question is: has anyone here worked on something similar, and how did you approach it?
Did you use iterative or stratified sampling?
Thank you!