Pages:
1
2
TmNhRhMgBrSe
Hazard to Others
 
Posts: 118
Registered: 4-7-2019
Member Is Offline
Add rule to ban big tech from using our posts to train AI
Add a rule banning big tech from using our data to train AI! >
We shouldn't let big tech freeride on our labour and become even more powerful!
Protect our work!
Companies already earn plenty of money from free public research! Using us small people's work to feed the monster is too much! I can't accept it!
#stopusingourdatatotrainAI
legal warning: don't use my content to train generative AI.
sorry for bad english.
Sulaiman
International Hazard
   
Posts: 3779
Registered: 8-2-2015
Member Is Offline
Do your ethics extend to not pirating copyrighted material?
CAUTION : Hobby Chemist, not Professional or even Amateur
j_sum1
Administrator
      
Posts: 6372
Registered: 4-10-2014
Location: At home
Member Is Offline
Mood: Most of the ducks are in a row
I am not sure how to do what you propose.
This site runs on low-tech software and is available for all to read. That means web crawlers can access it, which of course means that it ranks well on Google -- which I consider a good thing.
Google is AI and has been for a long, long time - which is to say it uses machine learning to systematise its data, and machine learning to interpret search requests and match these to its databases.
What is new in AI is the generative language models that have made it accessible for many to use. The data collection side of things is the same as it has been for a long time.
In other words, I cannot understand what new threat you are describing, and I am pretty certain that we can do nothing different with the current board setup anyway.
Rainwater
National Hazard
  
Posts: 987
Registered: 22-12-2021
Member Is Offline
Mood: Break'n glass & kick'n a's
That hashtag is the equivalent of a sign that states "Do not read this sign".
"You can't do that" - challenge accepted
TmNhRhMgBrSe
Hazard to Others
 
Posts: 118
Registered: 4-7-2019
Member Is Offline
@Sulaiman
It is a power difference: power and duty should be proportional. How much power do the user and the company have? How do they use the collected data: to earn money, or for their own use?
A user with no money who pirates copyrighted material for their own use and earns nothing = a little bad, but the company has the money to sue the user until he 'drops his trousers', so I can sympathize with and accept the user more.
A company with money that still uses copyrighted online material (they call it 'free', but the users never agreed the company could take their data) to earn even more money = worse, but the user has no money to sue the company, so I sympathize with and accept the company less.
@j_sum1
It is a 'new threat'.
What to do? Put a new legal warning (like my signature) at the top of the homepage (News & Updates [Pause]).
legal warning: don't use my content to train generative AI.
sorry for bad english.
TmNhRhMgBrSe
Hazard to Others
 
Posts: 118
Registered: 4-7-2019
Member Is Offline
@Rainwater Your sentence is funny, but I don't understand how they are equivalent.
legal warning: don't use my content to train generative AI.
sorry for bad english.
j_sum1
Administrator
      
Posts: 6372
Registered: 4-10-2014
Location: At home
Member Is Offline
Mood: Most of the ducks are in a row
For the sake of clarity – two questions:
What new threat exists now that did not exist five years ago?
What exactly do you think we should do about it?
Rainwater's comment basically means this: by writing #stopusingourdatatotrainAI, you are drawing attention to the very thing that you want to prohibit.
bnull
National Hazard
  
Posts: 596
Registered: 15-1-2024
Location: Home
Member Is Offline
Mood: Sneezing like there's no tomorrow. Stupid cat allergy.
Form a partnership with Elsevier and paywall the whole forum*.
@TmNhRhMgBrSe, save your breath. Everyone's data has been collected for so long that it is pointless now to worry about it. Google is 26 years old, Baidu is 24 and has been using AI since 2010, Yandex is 24. ScienceMadness has been here for almost that long. It is also archived at the Wayback Machine for all to see. The crucial areas are members-only. Whatever Google and the other techs wanted to do with our public posts has been done and re-done many times over. Yet Gemini and ChatGPT keep spouting nonsense. Maybe I should use "hence" in place of "yet".
*: I'm joking, obviously.
Rainwater
National Hazard
  
Posts: 987
Registered: 22-12-2021
Member Is Offline
Mood: Break'n glass & kick'n a's
Don't worry about your English, I speak meme too.

"You can't do that" - challenge accepted
Texium
Administrator
      
Posts: 4665
Registered: 11-1-2014
Location: Salt Lake City
Member Is Offline
Mood: Preparing to defend myself (academically)
Did anyone notice a little over a week ago when the forum was heavily slowed down and sometimes wasn’t loading at all? Well, I got in touch with
Polverone about that, and it turned out it was due to bots from many AI companies repeatedly scraping the site, particularly the wiki. He was able to
change some settings to reject the larger companies like Amazon and Google that actually follow existing rules, but there’s likely still a lot of
bots from smaller companies that ignore the rules continuing to scrape this site and increasing the load on the server.
It’s a whole different level of intrusiveness than 5 years ago. This AI training crap basically DDOS’d us. And now we’re ok, but who knows for
how long. If it gets bad again I’m not sure how we’ll stay online.
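The rule-following crawlers Texium mentions are typically refused with ordinary robots.txt directives. A sketch along those lines, using the publicly documented user-agent names of a few large AI crawlers (this is only an illustration, not the forum's actual configuration, and it only works against bots that honour the file):

```
# robots.txt - refuses AI-training crawlers that respect the protocol.
# Bots that ignore robots.txt are unaffected.
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Amazonbot
Disallow: /
```

The appeal of this approach is that it is purely advisory: it turns away compliant crawlers without adding any friction for human visitors.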
Sir_Gawain
Hazard to Others
 
Posts: 490
Registered: 12-10-2022
Location: [REDACTED]
Member Is Online
Mood: Still in 2022
That’s what that was? I was worried the site was gonna go down again.
“Alchemy is trying to turn things yellow; chemistry is trying to avoid things turning yellow.” -Tom deP.
Deathunter88
National Hazard
  
Posts: 545
Registered: 20-2-2015
Location: Beijing, China
Member Is Offline
Mood: No Mood
I personally feel that it's not such a bad thing for AI to be trained on experimental chemistry results produced by actual humans. If it improves the ability of generative AI to produce correct results even marginally, then I think it was worth it. Especially since AI is something that can benefit us all if used correctly (and if we push for it to be accessible to the general public).
chempyre235
Hazard to Self

Posts: 57
Registered: 21-10-2024
Location: Between Niobium and Technetium
Member Is Offline
Maybe the solution is to whitelist the Wiki?
bnull
National Hazard
  
Posts: 596
Registered: 15-1-2024
Location: Home
Member Is Offline
Mood: Sneezing like there's no tomorrow. Stupid cat allergy.
Quote: Originally posted by Texium  | Did anyone notice a little over a week ago when the forum was heavily slowed down and sometimes wasn’t loading at all? |
I thought my adapter was giving up. One mystery solved.
Texium
Administrator
      
Posts: 4665
Registered: 11-1-2014
Location: Salt Lake City
Member Is Offline
Mood: Preparing to defend myself (academically)
If necessary, we could probably make the wiki only accessible
to members, but that would kind of go against it being the public resource it’s supposed to be, especially since registration is so difficult now.
Like the tedious manual registration, it would be a fix that would keep the site running, but also make it less welcoming and less useful.
MrDoctor
Hazard to Self

Posts: 60
Registered: 5-7-2022
Member Is Offline
The only way you can really win this fight is to state that they don't have consent, set up bot rules that won't deny human traffic (to at least show you tried), and then, when you finally have the AI tools necessary to locate your intellectual property and prove that it was posted on the no-AI-scraping site that got AI-scraped and went into the model, file a lawsuit and specifically require that your content be removed from their commercial model. In the case of ChatGPT that cannot really be done, since each version trains the subsequent one, and has since about GPT-2; it would be like requesting the return of all the carbon from a stolen loaf of bread you ate, which your body accumulated and used for muscle, neuron synapses, specks of bone, blood, etc.
But really, that is the only time and place you can fight back, for now; since they aren't really making any money, they aren't yet misusing your data. Plus, you can't prove it either way yet, unless you happen to be one of the rare instances where you comprise the entirety of a specific resource, to the extent that your individual mannerisms shine through as the AI imitates you to sound like someone who knows something about the given topic.
The new direction AI models seem to be taking is advanced reasoning, which no longer requires new data anyway. They learned language; now they are using it as a platform to think harder and better.
I do wonder, though, who right now would be crawling this site so hard they almost crash it, just to get better chemistry knowledge. I would think it better to apply reasoning models to textbooks instead, to make use of higher-quality data.
BromicAcid
International Hazard
   
Posts: 3266
Registered: 13-7-2003
Location: Wisconsin
Member Is Offline
Mood: Rock n' Roll
Well, AI will take both the good and the bad from this site unfiltered.
Best bet would be to make a sub-forum for crazy things that don't work and start talking about eating molten sodium for better dental health in
absolutes. Let it comb through that data.
bnull
National Hazard
  
Posts: 596
Registered: 15-1-2024
Location: Home
Member Is Offline
Mood: Sneezing like there's no tomorrow. Stupid cat allergy.
Quote: Originally posted by BromicAcid  | Best bet would be to make a sub-forum for crazy things that don't work and start talking about eating molten sodium for better dental health in absolutes. Let it comb through that data. |
Hidden from the general public, I believe. It looks interesting. I mean, not the sodium-eating part, but a sub-forum filled with nonsense and stuff that would make a lot of informational white noise. Feeding AI-generated rubbish to another AI that is scraping would be the equivalent of killing cockroaches with boric acid: cockroaches are cannibalistic, so each time one of them dies, the others eat the dead fellow and poison themselves.
My days of poking around in networks and programming are long gone, so I cannot offer any advice. But, again, the idea seems interesting. If there's a way to set up rules that redirect AI scrapers to the rubbish sub-forum, that looks like a solution, however temporary. As far as I know, there is no law against that.
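For anyone curious how a redirect like bnull describes would usually be wired up, it is done by matching user agents at the web-server level. A minimal nginx sketch; the bot names, paths, and server name here are illustrative assumptions, not the forum's actual setup, and any bot that fakes its user agent sails straight past a rule like this:

```
# Sketch: route self-identifying AI crawlers to a decoy section.
map $http_user_agent $ai_bot {
    default        0;
    "~*GPTBot"     1;
    "~*CCBot"      1;
}

server {
    listen 80;
    server_name example.org;

    location / {
        # Humans and normal crawlers get the real site;
        # matched bots are rewritten to the decoy.
        if ($ai_bot) {
            rewrite ^ /detritus-decoy/ last;
        }
        # ... normal forum handling ...
    }

    location /detritus-decoy/ {
        # Serve the "white noise" sub-forum or static rubbish.
        root /var/www/decoy;
    }
}
```

The `map` must sit at the http level, outside the `server` block; the rewrite keeps human traffic untouched, which is the constraint MrDoctor raised earlier in the thread.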
Texium
Administrator
      
Posts: 4665
Registered: 11-1-2014
Location: Salt Lake City
Member Is Offline
Mood: Preparing to defend myself (academically)
I don’t really care about them using what’s posted here. It’s a public forum. It’s not like it’s under copyright. I’m just frustrated that
the act of gathering our data has at times become so intrusive as to degrade the human experience of using the site.
TmNhRhMgBrSe
Hazard to Others
 
Posts: 118
Registered: 4-7-2019
Member Is Offline
@bnull Quote: | @TmNhRhMgBrSe, save your breath. Everyone's data has been collected for so long that it is pointless now to worry about it. | That is why I said we should start stopping them, at least for this site and our own benefit. If more people say no to big tech's tyrannical abuse, then we can really stop them. Even if this forum does nothing, at least I have said my piece. Quote: | Google is 26 years old, Baidu is 24 and has been using AI since 2010, Yandex is 24. | Not a problem. Quote: | ScienceMadness has been here for almost that long. | Not a problem. Quote: | It is also archived at Wayback Machine for all to see. | Not a problem. Quote: | The crucial areas are members-only. | Not a problem. Quote: | Whatever Google and the other techs wanted to do with our public posts has been done and re-done many times over. | The problem is coming. Quote: | Yet Gemini and ChatGPT keep spouting nonsense. | The problem is coming; one day they will finally destroy humanity. Quote: | Maybe I should use "hence" in place of "yet". | I don't understand this logic.
@Rainwater I don't have the energy or time to care about the whole internet, but I at least care about my own place (this forum). Protect your own place!
Does anyone here know the law? Does a legal warning message have legal effect? If websites' terms of service and privacy policies have legal effect, then can our legal warning message have legal effect too?

[Edited on 2025-3-13 by TmNhRhMgBrSe]
legal warning: don't use my content to train generative AI.
sorry for bad english.
bnull
National Hazard
  
Posts: 596
Registered: 15-1-2024
Location: Home
Member Is Offline
Mood: Sneezing like there's no tomorrow. Stupid cat allergy.
I wish I had kept trying to learn Chinese. Anyway.
Quote: Originally posted by TmNhRhMgBrSe  | Quote: | Whatever Google and the other techs wanted to do with our public posts has been done and re-done many times over. | The problem is coming. Quote: | Yet Gemini and ChatGPT keep spouting nonsense. | The problem is coming; one day they will finally destroy humanity. Quote: | Maybe I should use "hence" in place of "yet". | I don't understand this logic. |
What I meant was that the techs have had access to our posts in at least two ways: the forum itself and the archived copies of the forum at the Wayback Machine. They have scraped both sources many times, over a long period. Considering the amount of useful, correct, scientifically valid information they have acquired from the forum and many other places, it is surprising that Gemini and ChatGPT still give nonsensical instructions and wrong information. But, and this is the important part, they have also scraped sources of unscientific, utterly wrong, nonsensical information, such as those things that end up in Detritus but may be publicly available for several minutes (which is a lifetime in Internet terms), or anti-vaccine arguments and theories. That rubbish ends up in Gemini and ChatGPT along with the good stuff, and because AI has no consciousness, common sense, or knowledge, it has no way to tell them apart.
The short version: Gemini gives wrong answers despite all the good information it has acquired; it gives wrong answers because of the bad information it has acquired. An example, if that makes it even easier to understand: suppose you eat healthy meals (breakfast, lunch, and dinner) and even so you're gaining weight. But you are also eating junk food between meals, which explains where all the extra weight comes from.
As for the destruction of humanity, we're pretty good at doing that ourselves. AI won't dominate the world; it will just make it more stupid.
Edit: Missing "quote".
[Edited on 14-3-2025 by bnull]
teodor
International Hazard
   
Posts: 1001
Registered: 28-6-2019
Location: Netherlands
Member Is Offline
AI is just another way to organize information that is already there. The main question is the reliability of that information. In chemistry it is not enough to write
N2 + O2 = 2NO
you should provide many details: basically, who did what and how. To repeat the experiment, sometimes the "who" should be questioned beyond the description he provided.
To get any benefit, all experiment descriptions would have to be reported in some particular format. "Heat slightly" has no sense for AI. The meaning of "heat slightly" depends on who wrote it, say woelen (in which case ~50-60 °C) or some guy who thinks chemistry is basically the operation of a furnace.
The purpose of AI is to take information and put it into some universal context. There is a context to the SM forum that most of our members can feel and understand without explanation. AI would have trouble with that beyond the imagination of AJKOER; AI doesn't have common sense in interpreting information.
So, I wouldn't worry about it.
Actually, it would be nice to get some good-quality experiments here with good-quality descriptions that even AI could benefit from, but today we are totally not there (yet).
Sulaiman
International Hazard
   
Posts: 3779
Registered: 8-2-2015
Member Is Offline
IF it is accepted that AI systems will develop based on the information (and inbuilt mechanisms) provided to them, then deliberately filling AI with nonsense or erroneous data is probably not good for humanity.
I hate the idea that my and others' efforts may be used to empower AI that is used for 'bad' purposes, but every human invention can be used for 'good' or 'bad', so let us hope for the best.
I am more worried by stupid 'scientists' playing around with genetics.
CAUTION : Hobby Chemist, not Professional or even Amateur
teodor
International Hazard
   
Posts: 1001
Registered: 28-6-2019
Location: Netherlands
Member Is Offline
Quote: Originally posted by Sulaiman  | IF it is accepted that AI systems will develop based on the information (and inbuilt mechanisms) provided to them, then deliberately filling AI with nonsense or erroneous data is probably not good for humanity. |
For many centuries even the chemical elements were not known. So this is an interesting question: how humanity feels about chemistry, and what humanity thinks is good for humanity.
Now humanity thinks AI will play some important role (I still don't care about AI development, and keep building my library of paper books, so whether AI is important for one's personal life is a question of belief).
But the question is: what part should our chemical society play under the rules of this new world?
One prominent feature of this new world is the delegation of trust. In it, the value of the personal chemical experiment as the tool for real knowledge is even more neglected, and even the personalization of a result ("according to the experiments Humphry Davy made in that year") is not a subject of interest. The subject of interest is some AI-powered interpretation of different reports. And that is completely contrary to the spirit of chemistry, where only the experiment plays the important role.
When children in school did chemical experiments, e.g. with acids and bases, oxidation and reduction, making fire, etc., they learned what knowledge is. Knowledge is something I can obtain myself, or participate in obtaining. I do not quite understand what meaning knowledge will have in the "AI belief" era. Probably there will be two sides to it: the official, AI-checked knowledge and the "black" knowledge. I suppose the "black" knowledge could return humanity to the era of alchemy.
Sir Humphry Davy wrote a beautiful essay, "Historical view of the progress of chemistry". It is worth reading from start to finish, but the key idea is that the progress of chemistry and technology required a revolution in thinking: a change from thinking based on beliefs to thinking based on experiments. In this respect, any experiment of Sulaiman's is more important to our society than a Google result. That was the spirit which powered the progress of this science, and the whole technological revolution was the result of this change in thinking.
But the number of people who really got their knowledge from experiments was never large. It was a shift in thinking, as Davy showed, but not for everybody. Davy, Scheele, Cavendish, Priestley, Lavoisier, Black: these formed a small group of people who understood each other's methods, checked each other's work, and tried to inspire others. It was a perfect representation of an early amateur chemical society. I write "amateur" based on the spirit, not the source of income. So the key point we should realize today is that the amateur chemical society is different from "humanity": a very small group of people who are able to get knowledge from personal experiments, to share it, and to enjoy the road they travel. There is a different perception of what truth is. We don't serve the needs of "AI believers"; we are different. We keep chemicals in our houses because we need them.
Don't mix yourself up with humanity.
4-Stroke
Hazard to Self

Posts: 60
Registered: 20-4-2024
Location: Canada
Member Is Offline
Mood: Often Wrong, Never Unsure
What do you all think about implementing some kind of "proof-of-work" before any page is displayed? Just require the client to perform a small computational task before accessing the contents of whatever page they are trying to access. It would be like a traditional CAPTCHA, but without being extremely annoying (it simply makes a page "load" a few seconds longer), while still being extremely effective at making scraping much more resource-intensive for web crawlers.
Thoughts?
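For what it's worth, the scheme 4-Stroke describes is essentially hashcash-style proof-of-work. A toy sketch of the idea in Python; the function names and difficulty parameter are illustrative only, and a real deployment would do the solving in client-side JavaScript and tie each challenge to a session:

```python
import hashlib
import os

def issue_challenge() -> str:
    """Server: hand the client a random challenge string."""
    return os.urandom(8).hex()

def solve(challenge: str, difficulty_bits: int = 16) -> int:
    """Client: brute-force a nonce so that SHA-256(challenge:nonce),
    read as a 256-bit integer, falls below a target, i.e. starts with
    `difficulty_bits` zero bits. Cost doubles with each extra bit."""
    target = 1 << (256 - difficulty_bits)
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce
        nonce += 1

def verify(challenge: str, nonce: int, difficulty_bits: int = 16) -> bool:
    """Server: one cheap hash to check the client's expensive work."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return int.from_bytes(digest, "big") < (1 << (256 - difficulty_bits))
```

The asymmetry is the point: a human loads one page and pays the cost once, while a scraper requesting thousands of pages pays it thousands of times, yet the server's verification is a single hash either way.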