Well, another factor is provenance: they don’t keep track of exactly where they got their data from. Sometimes they do at the dataset level, but almost never for an individual sample. “We found CSAM somewhere on maybe Reddit or Imgur or Pinterest” is practically worthless.