Sciencemadness Discussion Board
Subject: Add a rule to ban big tech from using our posts to train AI
TmNhRhMgBrSe
Hazard to Others
Posts: 118 | Registered: 4-7-2019
posted on 8-3-2025 at 18:06

Add a rule banning big tech from using our data to train AI! >:(
We shouldn't let big tech free-ride on our labour and become even more powerful!
Protect our work!
Companies already earn plenty of money from free public research. Using small people's work on top of that to feed the monster is too much. I can't accept it!

#stopusingourdatatotrainAI




legal warning: don't use my content to train generative AI.
sorry for bad english.
Sulaiman
International Hazard
Posts: 3779 | Registered: 8-2-2015
posted on 8-3-2025 at 21:39

Do your ethics extend to not pirating copyrighted material?



CAUTION : Hobby Chemist, not Professional or even Amateur
j_sum1
Administrator
Posts: 6372 | Registered: 4-10-2014 | Location: At home
Mood: Most of the ducks are in a row
posted on 8-3-2025 at 22:02

I am not sure how to do what you propose.

This site runs on low-tech software and is available for all to read. That means webcrawlers can access it, which of course also means it ranks well on Google, and I consider that a good thing.
Google is AI and has been for a long time, which is to say it uses machine learning to systematise its data, and machine learning to interpret search requests and match them to its databases.

What is new in AI is the language generative models that have made it accessible to many to use. The data collection side of things is the same as it has been for a long time.

In other words, I cannot understand what new threat you are describing and I am pretty certain that we can do nothing different with the current board setup anyway.
Rainwater
National Hazard
Posts: 987 | Registered: 22-12-2021
Mood: Break'n glass & kick'n a's
posted on 8-3-2025 at 23:45


That hashtag is the equivalent of a sign that states "Do not read this sign".



"You can't do that" - challenge accepted
TmNhRhMgBrSe
Hazard to Others
Posts: 118 | Registered: 4-7-2019
posted on 8-3-2025 at 23:53

@Sulaiman
It is a matter of power difference: power and duty should be proportional. How much power do the user and the company each have? How do they use the collected data: to earn money, or for their own use?
A user with no money who pirates copyrighted material for personal use, and earns nothing from it, is a little bad; but a company with money can sue that user into the ground, so the user deserves more sympathy.
A company with money that uses copyrighted online material (they call it 'free', but the users never agreed that the company could take their data) to earn even more money is worse; and the user has no money to sue the company, so the company deserves less sympathy.

@j_sum1
Quote: Originally posted by j_sum1
What is new in AI is the language generative models that have made it accessible to many to use.
That is the 'new threat'.
What to do? Put a new legal warning (like the one in my signature) at the top of the homepage (News & Updates).




legal warning: don't use my content to train generative AI.
sorry for bad english.
TmNhRhMgBrSe
Hazard to Others
Posts: 118 | Registered: 4-7-2019
posted on 8-3-2025 at 23:56



@Rainwater, your sentence is funny, but I don't understand how it is equivalent.



legal warning: don't use my content to train generative AI.
sorry for bad english.
j_sum1
Administrator
Posts: 6372 | Registered: 4-10-2014 | Location: At home
Mood: Most of the ducks are in a row
posted on 9-3-2025 at 00:57

For the sake of clarity, two questions:
  • What new threat exists now that did not exist five years ago?
  • What exactly do you think we should do about it?

Rainwater's comment basically means this: by writing #stopusingourdatatotrainAI, you are drawing attention to the very thing that you want to prohibit.
bnull
National Hazard
Posts: 596 | Registered: 15-1-2024 | Location: Home
Mood: Sneezing like there's no tomorrow. Stupid cat allergy.
posted on 9-3-2025 at 05:10



Quote: Originally posted by j_sum1
What exactly do you think we should do about it?

Form a partnership with Elsevier and paywall the whole forum*. :P

@TmNhRhMgBrSe, save your breath. Everyone's data has been collected for so long that it is pointless now to worry about it. Google is 26 years old, Baidu is 24 and has been using AI since 2010, Yandex is 24. ScienceMadness has been here for almost that long. It is also archived at the Wayback Machine for all to see. The crucial areas are members-only. Whatever Google and the other techs wanted to do with our public posts has been done and re-done many times over. Yet Gemini and ChatGPT keep spouting nonsense. Maybe I should use "hence" in place of "yet".

*: I'm joking, obviously.




Quod scripsi, scripsi.

B. N. Ull

We have a lot of fun stuff in the Library.

Read The ScienceMadness Guidelines. They exist for a reason.
Rainwater
National Hazard
Posts: 987 | Registered: 22-12-2021
Mood: Break'n glass & kick'n a's
posted on 9-3-2025 at 10:45



Quote: Originally posted by TmNhRhMgBrSe
@Rainwater your sentence funny but i don't understand how equal.

Don't worry about your English, I speak meme too.

[Attached: images.jpeg - 22kB]

[Attached: images-1.jpeg - 45kB]




"You can't do that" - challenge accepted
Texium
Administrator
Posts: 4665 | Registered: 11-1-2014 | Location: Salt Lake City
Mood: Preparing to defend myself (academically)
posted on 9-3-2025 at 12:29



Did anyone notice a little over a week ago when the forum was heavily slowed down and sometimes wasn't loading at all? Well, I got in touch with Polverone about that, and it turned out it was due to bots from many AI companies repeatedly scraping the site, particularly the wiki. He was able to change some settings to reject the larger companies like Amazon and Google that actually follow existing rules, but there are likely still a lot of bots from smaller companies that ignore the rules, continuing to scrape this site and increasing the load on the server.

It's a whole different level of intrusiveness than 5 years ago. This AI training crap basically DDoS'd us. We're OK now, but who knows for how long. If it gets bad again, I'm not sure how we'll stay online.
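The "existing rules" that the larger companies follow are typically robots.txt directives, which only well-behaved crawlers honor. A minimal sketch of such a policy, checked with Python's standard-library parser; the user-agent tokens (GPTBot, CCBot) are published crawler names, but the policy itself is an illustration, not the forum's actual configuration:

```python
# Sketch: verify what a robots.txt policy would allow, using only the
# standard library. Compliant bots fetch this file and obey it; rogue
# scrapers simply ignore it.
from urllib.robotparser import RobotFileParser

policy = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""

rp = RobotFileParser()
rp.parse(policy.splitlines())

print(rp.can_fetch("GPTBot", "https://example.org/talk/"))       # False
print(rp.can_fetch("SomeBrowser", "https://example.org/talk/"))  # True
```

As Texium notes, this only stops the bots that choose to comply; the rest need server-side user-agent or IP blocking.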




Come check out the Official Sciencemadness Wiki
They're not really active right now, but here's my YouTube channel and my blog.
Sir_Gawain
Hazard to Others
Posts: 490 | Registered: 12-10-2022 | Location: [REDACTED]
Mood: Still in 2022
posted on 9-3-2025 at 12:41



That's what that was? I was worried the site was gonna go down again.

"Alchemy is trying to turn things yellow; chemistry is trying to avoid things turning yellow." -Tom deP.
Deathunter88
National Hazard
Posts: 545 | Registered: 20-2-2015 | Location: Beijing, China
posted on 9-3-2025 at 19:11



I personally feel that it's not such a bad thing for AI to be trained on experimental chemistry results from actual humans. If it improves the ability of generative AI to produce correct results even marginally, then I think it was worth it. Especially since AI is something that can benefit us all if used correctly (and if we push for it to be accessible to the general public).
chempyre235
Hazard to Self
Posts: 57 | Registered: 21-10-2024 | Location: Between Niobium and Technetium
posted on 10-3-2025 at 06:47



Maybe the solution is to whitelist the Wiki?
bnull
National Hazard
Posts: 596 | Registered: 15-1-2024 | Location: Home
Mood: Sneezing like there's no tomorrow. Stupid cat allergy.
posted on 10-3-2025 at 06:51



Quote: Originally posted by Texium
Did anyone notice a little over a week ago when the forum was heavily slowed down and sometimes wasn't loading at all?

I thought my adapter was giving up. One mystery solved.




Quod scripsi, scripsi.

B. N. Ull

We have a lot of fun stuff in the Library.

Read The ScienceMadness Guidelines. They exist for a reason.
Texium
Administrator
Posts: 4665 | Registered: 11-1-2014 | Location: Salt Lake City
Mood: Preparing to defend myself (academically)
posted on 10-3-2025 at 07:04



Quote: Originally posted by chempyre235
Maybe the solution is to whitelist the Wiki?

If necessary, we could probably make the wiki accessible to members only, but that would kind of go against it being the public resource it's supposed to be, especially since registration is so difficult now.

Like the tedious manual registration, it would be a fix that would keep the site running, but also make it less welcoming and less useful.




Come check out the Official Sciencemadness Wiki
They're not really active right now, but here's my YouTube channel and my blog.
MrDoctor
Hazard to Self
Posts: 60 | Registered: 5-7-2022
posted on 10-3-2025 at 22:32



The only way you can really win this fight is to state that they don't have consent, set up bot rules that won't deny human traffic (to at least show you tried), and then, once you finally have the AI tools necessary to locate and prove your intellectual property inside a model trained on the no-AI-scraping site that got AI-scraped anyway, file a lawsuit and specifically demand that your content be removed from their commercial model. In the case of ChatGPT that cannot really be done, since each version has helped train the subsequent one since around GPT-2; it would be like demanding the return of all the carbon from a stolen loaf of bread you ate, after your body has used it for muscle, synapses, specks of bone, blood, and so on.

But really, that's the only time and place you can fight back, for now. Since they aren't really making any money, they aren't yet misusing your data. You can't prove it either way yet anyway, unless you happen to be one of the rare cases where you comprise the entirety of a specific resource, to the extent that your individual mannerisms shine through as the AI imitates you to sound like someone who knows something about the given topic.

The new direction AI models seem to be taking is advanced reasoning, which no longer requires new data anyway. They learned language; now they are using it as a platform to think harder and better.
I do wonder, though, who right now would be crawling this site so hard they almost crash it, just to get better chemistry knowledge. I would have figured it better to apply reasoning models to textbooks instead, to utilize higher-quality data.
BromicAcid
International Hazard
Posts: 3266 | Registered: 13-7-2003 | Location: Wisconsin
Mood: Rock n' Roll
posted on 11-3-2025 at 03:53



Well, AI will take both the good and the bad from this site, unfiltered.

Best bet would be to make a sub-forum for crazy things that don't work and start talking, in absolutes, about eating molten sodium for better dental health. Let it comb through that data.




Shamelessly plugging my attempts at writing fiction: http://www.robvincent.org
bnull
National Hazard
Posts: 596 | Registered: 15-1-2024 | Location: Home
Mood: Sneezing like there's no tomorrow. Stupid cat allergy.
posted on 11-3-2025 at 06:52



Quote: Originally posted by BromicAcid
Best bet would be to make a sub-forum for crazy things that don't work and start talking about eating molten sodium for better dental health in absolutes. Let it comb through that data.

Hidden from the general public, I believe. It looks interesting. I mean, not the sodium-eating part, but a sub-forum filled with nonsense, stuff that would make a lot of informational white noise. Feeding AI-generated rubbish to a scraping AI would be the equivalent of killing cockroaches with boric acid: cockroaches are cannibalistic, so each time one of them dies, the others eat the dead fellow and poison themselves.

My days of poking around in networks and programming are long gone, so I cannot offer any advice. But, again, the idea seems interesting. If there's a way to set up rules that redirect AI scrapers to the rubbish sub-forum, that looks like a solution, however temporary. As far as I know, there is no law against that.
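The redirect idea above could be sketched as a user-agent router: requests that identify as known scraper bots get served a decoy section, everyone else sees the real page. The token list and the /decoy prefix are assumptions for illustration, not actual forum configuration; in practice this would live in the web server's rewrite rules rather than application code:

```python
# Sketch of user-agent based routing: send self-identified scrapers to a
# decoy path and pass everyone else through unchanged. Rogue bots that
# fake a browser user-agent would slip past this, of course.
SCRAPER_TOKENS = ("GPTBot", "CCBot", "Bytespider")  # illustrative list

def route(user_agent: str, path: str) -> str:
    """Return the path this request should actually be served."""
    if any(tok.lower() in user_agent.lower() for tok in SCRAPER_TOKENS):
        return "/decoy" + path  # the white-noise sub-forum
    return path

print(route("Mozilla/5.0 (compatible; GPTBot/1.0)", "/talk/thread=1"))
print(route("Mozilla/5.0 (Windows NT 10.0)", "/talk/thread=1"))
```

The same matching can be expressed in nginx or Apache rewrite rules, which avoids touching the forum software at all.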




Quod scripsi, scripsi.

B. N. Ull

We have a lot of fun stuff in the Library.

Read The ScienceMadness Guidelines. They exist for a reason.
Texium
Administrator
Posts: 4665 | Registered: 11-1-2014 | Location: Salt Lake City
Mood: Preparing to defend myself (academically)
posted on 11-3-2025 at 06:55



I don't really care about them using what's posted here. It's a public forum; it's not like it's under copyright. I'm just frustrated that the act of gathering our data has at times become so intrusive as to degrade the human experience of using the site.

Come check out the Official Sciencemadness Wiki
They're not really active right now, but here's my YouTube channel and my blog.
TmNhRhMgBrSe
Hazard to Others
Posts: 118 | Registered: 4-7-2019
posted on 13-3-2025 at 06:25



@bnull
Quote:
@TmNhRhMgBrSe, save your breath. Everyone's data has been collected for so long that it is pointless now to worry about it.
That is why I said we should start stopping them, at least for this site and for our benefit. If more people say no to big tech's tyrannical abuse, then we can really stop them. Even if this forum does nothing, at least I have said my piece.
Quote:
Google is 26 years old, Baidu is 24 and has been using AI since 2010, Yandex is 24.
Not the problem.
Quote:
ScienceMadness has been here for almost that long.
Not the problem.
Quote:
It is also archived at Wayback Machine for all to see.
Not the problem.
Quote:
The crucial areas are members-only.
Not the problem.
Quote:
Whatever Google and the other techs wanted to do with our public posts has been done and re-done many times over.
The problem is coming.
Quote:
Yet Gemini and ChatGPT keep spouting nonsense.
The problem is coming; one day they will finally destroy humanity.
Quote:
Maybe I should use "hence" in place of "yet".
I don't understand this logic.

@Rainwater, I don't have the energy or time to care about the whole internet, but I can at least care about my own place (this forum). Protect your own place!
Does anyone here know the law? Does a legal warning message have legal effect? If a website's terms of service and privacy policy have legal effect, then can our legal warning message have legal effect too?

[Attached: protect your place.png - 233kB]

[Edited on 2025-3-13 by TmNhRhMgBrSe]




legal warning: don't use my content to train generative AI.
sorry for bad english.
bnull
National Hazard
Posts: 596 | Registered: 15-1-2024 | Location: Home
Mood: Sneezing like there's no tomorrow. Stupid cat allergy.
posted on 13-3-2025 at 12:28



I wish I had kept trying to learn Chinese. Anyway.

Quote: Originally posted by TmNhRhMgBrSe
Quote:
Whatever Google and the other techs wanted to do with our public posts has been done and re-done many times over.
problem coming
Quote:
Yet Gemini and ChatGPT keep spouting nonsense.
problem coming, finally have 1 day they will destroy human
Quote:
Maybe I should use "hence" in place of "yet".
i dont understand this logic.

What I meant was that the techs have had access to our posts in at least two ways: the forum itself and the archived copies of the forum at the Wayback Machine. They have scraped both sources many times over a long period. Considering the amount of useful information they have acquired from the forum and many other places, correct and scientifically valid information, it is surprising that Gemini and ChatGPT still give nonsensical instructions and wrong information. But, and this is the important part, they have also scraped sources of unscientific, utterly wrong, nonsensical information, such as the things that end up in Detritus but may be publicly available for several minutes (a lifetime in Internet terms), or the anti-vaccine arguments and theories. That rubbish ends up in Gemini and ChatGPT along with the good stuff, and because AI has no consciousness, common sense, or knowledge, it has no way to tell them apart.

The short version: Gemini gives wrong answers despite all the good information it has acquired, but it gives wrong answers because of the bad information it has acquired. An example, if that makes it even easier to understand: suppose you eat healthy meals (breakfast, lunch, and dinner) and even so you're gaining weight. But you are also eating junk food between meals, which explains where all the extra weight comes from.

As for the destruction of humanity, we're pretty good at doing that ourselves. AI won't dominate the world; it will just make it more stupid.

Edit: Missing "quote".

[Edited on 14-3-2025 by bnull]




Quod scripsi, scripsi.

B. N. Ull

We have a lot of fun stuff in the Library.

Read The ScienceMadness Guidelines. They exist for a reason.
teodor
International Hazard
Posts: 1001 | Registered: 28-6-2019 | Location: Netherlands
posted on 14-3-2025 at 03:37



AI is just another way to organize information that is already there. The main question is the information's reliability. In chemistry it is not enough to write

N2 + O2 = 2NO

you should provide many details: basically, who did what and how. To repeat the experiment, sometimes the "who" should be questioned beyond the description he provided.
To get any benefit, all experiment descriptions should be reported in some particular format. "Heat slightly" makes no sense to an AI. The meaning of "heat slightly" depends on who wrote it, say woelen (in which case ~50-60 °C) or some guy who thinks chemistry is basically the operation of a furnace.
The purpose of AI is to take information and put it in some universal context.
There is a certain context of the SM forum that most of our members can feel and understand without explanation.
AI would have troubles with that beyond the imagination of AJKOER. AI doesn't have common sense in interpreting information.
So, I wouldn't worry about it.
Actually, it would be nice to get some good-quality experiments here with good-quality descriptions that even AI could benefit from, but today we are totally not there (yet).
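The point about "heat slightly" being ambiguous suggests what a machine-readable experiment record could look like: every vague phrase replaced by an explicit value. The field names below are invented for illustration; no such agreed forum format exists:

```python
# Sketch of a structured experiment record: "heat slightly" becomes an
# explicit temperature range, and the reporter is named so the reader
# can judge the source. All field names are illustrative assumptions.
experiment = {
    "reaction": "N2 + O2 -> 2 NO",
    "steps": [
        {
            "action": "heat",
            "target_c": (50, 60),   # "slightly", quantified
            "duration_min": 10,
        },
    ],
    "observed": "faint brown gas at the electrode gap",
    "reporter": "woelen",
}

lo, hi = experiment["steps"][0]["target_c"]
print(f"heat to {lo}-{hi} C")  # heat to 50-60 C
```

A record like this is trivially parseable, which is exactly what free-text forum posts are not.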
Sulaiman
International Hazard
Posts: 3779 | Registered: 8-2-2015
posted on 14-3-2025 at 04:51



IF it is accepted that AI systems will develop based on the information (and inbuilt mechanisms) provided to them,
then deliberately filling AI with nonsense or erroneous data is probably not good for humanity.

I hate the idea that my and others' efforts may be used to empower AI that is used for 'bad' purposes,
but every human invention can be used for 'good' or 'bad', so let us hope for the best.

I am more worried by stupid 'scientists' playing around with genetics.




CAUTION : Hobby Chemist, not Professional or even Amateur
teodor
International Hazard
Posts: 1001 | Registered: 28-6-2019 | Location: Netherlands
posted on 14-3-2025 at 07:20



Quote: Originally posted by Sulaiman
IF it is accepted that AI systems will develop based on the information (and inbuilt mechanisms) provided to them,
deliberately filling AI with nonsense or erroneous data is probably not good for humanity.


For many centuries even the chemical elements were not known. So this is an interesting question: how humanity feels about chemistry, and what humanity thinks is good for humanity.
Now humanity thinks AI will play some important role (I still don't care about AI development and am building my library of paper books, so whether AI is important for one's personal life is a question of belief).
But the question is: what role should our chemical society play under the rules of this new world?

One prominent feature of this new world is the delegation of trust.

In this new world, the value of the personal chemical experiment as a tool for real knowledge is neglected even further, and even the personalization of a result ("according to the experiments Humphry Davy made in that year") is not a subject of interest. The subject of interest is some AI-powered interpretation of different reports.

And this is completely contrary to the spirit of chemistry, where only the experiment plays the important role.

When children in school did chemical experiments, e.g. with acids and bases, oxidation and reduction, making fire, etc., they were learning what knowledge is. Knowledge is something I can obtain myself, or participate in obtaining. I do not quite understand what meaning knowledge should have in the "AI belief" era. There would probably be two sides to it: official AI-checked knowledge and "black" knowledge. I suppose the "black" knowledge could return humanity to the era of alchemy.

Sir Humphry Davy has a beautiful essay, "Historical View of the Progress of Chemistry". It is worth reading from start to finish, but the key idea is that the progress of chemistry and technology required a revolution in thinking: a change from thinking based on beliefs to thinking based on experiments. In this respect, any experiment of Sulaiman's is more important to our society than a Google result. That was the spirit which powered the progress of this science, and the whole technological revolution was the result of this change in thinking.

But the number of people who really got knowledge from experiments was never large. It was a shift in thinking, as Davy showed, but not for everybody. Davy, Scheele, Cavendish, Priestley, Lavoisier, Black: these formed a small group of people who understood each other's methods, checked each other's work, and tried to inspire others. This was a perfect representation of an early amateur chemical society. I write "amateur" based on the spirit, not the source of income. So the key point we should realize today is that an amateur chemical society is different from "humanity": a very small group of people who are able to get knowledge from personal experiments, share them, and enjoy the road they travel. There is a different perception of what truth is. We don't serve the needs of "AI believers"; we are different. We keep chemicals in our houses because we need them.
Don't mix yourself up with humanity.
4-Stroke
Hazard to Self
Posts: 60 | Registered: 20-4-2024 | Location: Canada
Mood: Often Wrong, Never Unsure
posted on 29-3-2025 at 14:24



What do you all think about implementing some kind of "proof-of-work" before any page is displayed? Just require the client to perform a small computational task before accessing the contents of whatever page it is trying to reach. It would be like a traditional CAPTCHA, but without being extremely annoying (it simply makes a page "load" a few seconds longer), while still being extremely effective at making scraping much more resource-intensive.

Thoughts?
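The proposal above is essentially a hashcash-style scheme: the server issues a nonce and a difficulty, the client burns CPU finding a counter whose hash meets the target, and the server verifies the answer with a single hash. A minimal sketch, assuming SHA-256 and a hex-digit difficulty check for simplicity; the nonce format and parameter names are illustrative, and deployed systems differ in detail:

```python
# Hashcash-style proof-of-work sketch: cheap to verify, expensive to
# solve, and the cost scales exponentially with the difficulty in bits.
import hashlib

def solve(nonce: str, bits: int) -> int:
    """Brute-force a counter whose SHA-256 digest meets the target."""
    target = "0" * (bits // 4)  # compare leading hex digits
    counter = 0
    while True:
        digest = hashlib.sha256(f"{nonce}:{counter}".encode()).hexdigest()
        if digest.startswith(target):
            return counter
        counter += 1

def verify(nonce: str, counter: int, bits: int) -> bool:
    """Server-side check: one hash, regardless of how hard solving was."""
    digest = hashlib.sha256(f"{nonce}:{counter}".encode()).hexdigest()
    return digest.startswith("0" * (bits // 4))

c = solve("session-42", 8)         # ~hundreds of attempts at 8 bits
print(verify("session-42", c, 8))  # True
```

A few bits of difficulty is imperceptible to one human reader but adds up quickly for a scraper requesting thousands of pages.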