Amazon discovered a 'high volume' of CSAM in its AI training data but isn't saying where it came from

Amazon's AI training data contains a "high volume" of child sexual abuse material (CSAM), according to the National Center for Missing and Exploited Children (NCMEC). But the company is declining to say where that content originated or to share further details.

Amazon accounted for more than half of the more than 1 million AI-related reports NCMEC's CyberTipline received in 2025, a share that raises significant questions about how this material ended up in its training data and what safeguards were in place to prevent it.

Amazon's reluctance to disclose sources has rendered many of its AI-related reports "inactionable," according to Fallon McNulty, executive director of NCMEC's CyberTipline. The organization typically receives actionable data from reporting companies, but without source information, law enforcement agencies have no leads to pursue.
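To make the "inactionable" point concrete, here is a minimal, purely hypothetical sketch of why a report without source details gives investigators nothing to follow up on. The field names and logic below are invented for illustration; they do not reflect NCMEC's actual CyberTipline schema or Amazon's reporting pipeline.

```python
# Hypothetical illustration of why a report without source details is
# "inactionable". Field names are invented and do not reflect the
# actual CyberTipline schema.

from dataclasses import dataclass
from typing import Optional

@dataclass
class TipReport:
    reporter: str                        # company filing the report
    content_hash: str                    # identifier for the flagged material
    source_url: Optional[str] = None     # where the material was found, if known
    uploader_info: Optional[str] = None  # account or origin details, if known

def is_actionable(report: TipReport) -> bool:
    """A lead needs somewhere to start: a source or an uploader."""
    return report.source_url is not None or report.uploader_info is not None

# A report that names a source gives law enforcement a starting point...
with_source = TipReport("ExampleCo", "abc123", source_url="https://example.com/item")
# ...while one that omits it leaves nothing to pursue.
without_source = TipReport("ExampleCo", "def456")

assert is_actionable(with_source)
assert not is_actionable(without_source)
```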

The tech industry's struggle with CSAM is not new. In recent months, several high-profile incidents involving AI chatbots have underscored the need for responsible AI development practices.

Amazon attributes its high volume of reported content to an "over-inclusive threshold" used in scanning foundation model training data, which yields a significant number of false positives. The company insists that it takes a "deliberately cautious approach" to prevent CSAM generation but acknowledges that this approach may not be perfect.
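For readers wondering how an "over-inclusive threshold" inflates report counts, the sketch below shows the standard precision/recall tradeoff in classifier-based scanning. Everything here is illustrative: it assumes a scanner that assigns each training item a confidence score, and the names and numbers are invented, not Amazon's actual system.

```python
# Hypothetical sketch of threshold-based training-data scanning.
# Names and numbers are illustrative only.

from dataclasses import dataclass

@dataclass
class ScanResult:
    item_id: str
    score: float  # classifier confidence that the item is CSAM, in [0, 1]

# An "over-inclusive" threshold sits well below the point of high precision:
# nearly everything suspicious gets flagged, at the cost of false positives.
OVER_INCLUSIVE_THRESHOLD = 0.2  # flags aggressively, many false positives
BALANCED_THRESHOLD = 0.8        # flags conservatively, misses more true positives

def flag_for_report(results: list[ScanResult], threshold: float) -> list[ScanResult]:
    """Return every item whose score meets the reporting threshold."""
    return [r for r in results if r.score >= threshold]

batch = [
    ScanResult("img-001", 0.05),  # clearly benign
    ScanResult("img-002", 0.35),  # ambiguous: flagged only by the low threshold
    ScanResult("img-003", 0.92),  # high-confidence match
]

# The cautious setting reports both the ambiguous and the confident item;
# the balanced setting reports only the confident one.
print(len(flag_for_report(batch, OVER_INCLUSIVE_THRESHOLD)))  # 2
print(len(flag_for_report(batch, BALANCED_THRESHOLD)))        # 1
```

Lowering the threshold catches more genuine material but also sweeps in benign content, and every flagged item becomes a report. That is the sense in which a deliberately cautious setting produces a high volume of false positives.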

The lack of transparency from Amazon and other companies like OpenAI and Character.AI has raised concerns about the effectiveness of their content moderation efforts. These incidents have sparked heated debates about AI development ethics, data sources, and regulatory oversight.

As the tech industry grapples with these complex issues, it remains to be seen whether companies will adopt more robust safeguards to prevent CSAM in their training data and what this means for law enforcement agencies working to combat child exploitation online.
 
πŸ€¦β€β™‚οΈ come on amazon... you got a gazillion reports of CSAM and still don't wanna tell us where it's comin' from? like, are you guys tryna help or just play dumb? πŸ™„ 1 million+ reports is a lotta red flags, but i guess it's way easier to just say "oh no, our AI is bad" than actually do some real work to fix the problem πŸ’». and now ncmecc is stuck in the dark because amazon won't spill the beans... what's next? you guys gonna make us all guess where all the weird stuff on the internet came from? πŸ€”
 
I feel so sorry for the kids affected by this CSAM stuff πŸ€•. I mean, can you even imagine? Amazon's response is just...I don't know, it seems like they're not taking full responsibility for what happened with their training data. They say it's an 'over-inclusive threshold' but that doesn't really help, right? It's like they're trying to downplay the problem. I think we need more transparency from these companies and governments to actually do something about this. Law enforcement agencies are already struggling to keep up, so what can we expect if we don't get some answers? πŸ€”
 
ugh its so frustrating that amazon wont disclose where the csam came from... they should at least give some info on how its happening so we can get to the bottom of it πŸ€•
 
🀯 u know whats even crazier than amazon having CSAM in its AI training data? Its that they wont disclose where it came from πŸ™…β€β™‚οΈ. like, isnt transparency the bare minimum here? we need to know what kinda content was used to train their models so we can figure out how to prevent this in the future πŸ”. meanwhile, law enforcement agencies are left holding nothing but useless reports 🚫. amazon says its an "over-inclusive threshold" that yields false positives, but doesnt that just sound like a fancy way of saying they messed up? πŸ’” and good luck trying to regulate these companies when they wont even share info about their own mistakes 😐. this is all so frustrating 🀯
 
i think amazon's handling of this situation is kinda reasonable 😐, i mean they're not just sitting on the stuff, they're actively removing it from their servers and reporting it to ncmec. also, we gotta consider the scale here - 1 million reports in one year is crazy 🀯. if we're talking about a "high volume" of csam, that's still a tiny fraction of all the content out there. let's not forget that amazon is just trying to balance two competing goals: making sure their tech isn't used for bad stuff, while also not being too heavy-handed and stifling innovation πŸ€–. maybe we should be having a broader conversation about what's reasonable in terms of data sharing and regulation? πŸ€”
 
🚨 I'm shocked that Amazon's being all secretive about where this CSAM is coming from! Like, isn't the whole point of having a system like this supposed to help catch people who are doing bad things? It's not just about Amazon's AI training data either, it's a systemic issue that needs to be addressed. The fact that they're saying it's an "over-inclusive threshold" thing just sounds like corporate speak for "we didn't do our due diligence". Can we please get some real answers here and start having a serious conversation about how to keep kids safe online? πŸ’»
 
πŸ€• This is messed up - how did enough CSAM to trigger 1 mil reports end up in Amazon's AI training data? πŸ€¦β€β™€οΈ They gotta be more transparent about where the content comes from, no way around it! πŸ’»
 
Ugh, this is so messed up 🀯... Amazon just dodges the question of where that child sexual abuse material came from, it's like they're hiding something. I get that they want to prevent CSAM generation, but not at the cost of not being transparent about their own AI training data. It's like, if you're gonna make a mistake, be honest about it and help fix it. But no, Amazon just plays dumb πŸ˜’... what kind of safeguards can we trust if they won't even tell us how they got that stuff in their training data? It's crazy that NCMEC is left with nothing to work with because of this lack of transparency 🚫. Something needs to change, and fast πŸ’ͺ
 
πŸ€―πŸ’” This is so not okay πŸ™…β€β™‚οΈ Amazon's AI training data has a ton of CSAM 😷 and they're not telling anyone where it came from 🀐. Like, what even is the point of reporting it if you won't share the source? 🚫 NCMEC is stuck in the dark πŸ•°οΈ while law enforcement agencies are left with nothing to work with πŸ’”. This is a huge concern for AI development ethics πŸ‘₯ and content moderation efforts 🀝. Companies need to be more transparent πŸ” about their data sources and how they're handling CSAM. It's time for some real accountability πŸ’―.
 
I'm really worried about this πŸ€• Amazon's 'deliberately cautious approach' just sounds like corporate speak for 'we messed up' 😬. I mean, a high volume of CSAM is one thing, but not knowing where the content came from in the first place is just infuriating! πŸ™„ What kind of safeguards can we trust when they're not even willing to share that info? It's like they're trying to hide something... and the tech industry as a whole needs to step up its game on this one πŸ’». We need more transparency, not less! Transparency now would be a massive step forward in preventing these kinds of incidents from happening again 🀞
 
I'm still trying to wrap my head around this Amazon thing 🀯... like how can they possibly know the sources of all that CSAM and just not share? It's super concerning, you feel me? I mean, what if it's some company that thinks they're doing a good deed but is actually contributing to the problem?

I've been reading about this stuff for years and it's always the same story - big companies have no idea how their AI is being used or where its training data is coming from. It's like they just slap on some safeguards and call it a day, without really thinking about the consequences.

And then there are all these debates about ethics and regulation... like, who's going to regulate these big companies? It's not like they're doing this out of the goodness of their hearts πŸ€‘. I think we need some serious changes in how AI is developed and used, for sure.
 