Ubya
International Hazard
Posts: 1247
Registered: 23-11-2017
Location: Rome-Italy
Member Is Offline
Mood: I'm a maddo scientisto!!!
|
|
anti "spam-bot" Bot?
This board always had some spam posts, but I think many of us remember the mess that was 2019 and the daily floods that the whole community had to
manage.
I'm bringing up the issue again simply because I think sending an email to an admin to be manually registered by them is quite an inefficient process.
It has worked for nearly 2 years, but since I've been neck-deep in automating stuff I figured I could drop my 2 cents.
I have 2 ideas.
The first idea is to simply write a script that checks new posts/topics and deletes them if they are spam, and possibly bans the users who
wrote them after 3 offenses (or any other threshold).
This would be the same kind of bot Melgar made at the time.
The second idea is to keep the current system but automate it. New users could send an email request to a dedicated address, and a script
would read it, evaluate it, create the account and then send them the details via email.
I've personally played a lot with web scraping (I have more data than I need on stuff I shouldn't have) and bots (mostly Discord, but integrated with a wide
range of things like WooCommerce, email, Minecraft, image generation and more), so in theory these 2 ideas are already doable.
For long-time lurkers, creating an account the current way probably isn't too much of a hassle, and the fact that registering is no longer a 2-minute
process has turned away a lot of people who would join just to get their homework done by us. So in a sense we have fewer registrations, but
they are of higher value. At the same time, I don't like making the whole thing harder than it has to be.
From personal experience, if I had to register now I'd be quite put off by this process.
What do you guys think?
Oh, and a question for staff: what's the ratio of valid users you register vs users you decide aren't worth registering?
---------------------------------------------------------------------
feel free to correct my grammar, or any mistakes i make
---------------------------------------------------------------------
|
|
Oxy
Hazard to Others
Posts: 140
Registered: 1-12-2020
Member Is Offline
|
|
Quote: Originally posted by Ubya |
The first idea is to simply write a script that checks new posts/topics and deletes them if they are spam, and possibly bans the users who
wrote them after 3 offenses (or any other threshold).
This would be the same kind of bot Melgar made at the time.
|
In general this is a simple solution that will work. There is one problem, however: how would you classify post content as spam? We could use a
simple algorithm, but it probably won't be a robust one. We could use more sophisticated spam detection techniques, but those may require
additional resources, and I don't know if we could use them as we might need to pay for them.
Quote: Originally posted by Ubya |
The second idea is to keep the current system but automate it. New users could send an email request to a dedicated address, and a script
would read it, evaluate it, create the account and then send them the details via email.
|
Yes, this is simple. But I suspect that having a human read the request is itself a spam protection mechanism, as whoever reads the mail can judge
whether the email address and content look legit.
|
|
Tsjerk
International Hazard
Posts: 3032
Registered: 20-4-2005
Location: Netherlands
Member Is Offline
Mood: Mood
|
|
There is a much, much simpler solution. There has been an update for XMB board 1.9.11, namely 1.9.12. Yes, 11 years after the last update some bugs
were fixed and most importantly, reCaptcha was implemented in the registration process.
I've been saying this for over a year now. Polverone doesn't read his messages.
http://www.sciencemadness.org/talk/viewthread.php?tid=18651&...
|
|
Ubya
International Hazard
Posts: 1247
Registered: 23-11-2017
Location: Rome-Italy
Member Is Offline
Mood: I'm a maddo scientisto!!!
|
|
Yeah, this would solve the issue at its core. Even though reCaptcha has been cracked, it is way better than having nothing.
|
|
Ubya
International Hazard
Posts: 1247
Registered: 23-11-2017
Location: Rome-Italy
Member Is Offline
Mood: I'm a maddo scientisto!!!
|
|
Quote: Originally posted by Oxy | Quote: Originally posted by Ubya |
The first idea is to simply write a script that checks new posts/topics and deletes them if they are spam, and possibly bans the users who
wrote them after 3 offenses (or any other threshold).
This would be the same kind of bot Melgar made at the time.
|
In general this is a simple solution that will work. There is one problem, however: how would you classify post content as spam? We could use a
simple algorithm, but it probably won't be a robust one. We could use more sophisticated spam detection techniques, but those may require
additional resources, and I don't know if we could use them as we might need to pay for them.
|
Bots create posts following some rules; once you know them, you can detect spam posts pretty easily.
At the time, for example, a lot of spam posts were in Russian. Nobody here writes in Russian, so one of the anti-spam checks would be to see how
many Cyrillic characters are used in a post; if it is a high percentage, there's a very high probability it is spam.
You can also look for links. The purpose of some spam bots is to increase traffic to sketchy websites, so if a post has many links to spammy
websites, the post itself is spam (there are huge public blacklists).
Spambots also apply some SEO rules to their posts, like repeating the same set of keywords to increase SEO visibility, keywords we'd
rarely use in a chemistry post, so keeping track of them and their percentage is another check.
I wouldn't worry about paying for anything advanced.
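To make the three checks above concrete, here is a rough Python sketch. It is only an illustration, not Melgar's bot and not anything that runs on this board: the blacklist entries, the keyword list and both thresholds are made-up placeholders that would have to be tuned against real spam collected here.

```python
import re
from collections import Counter

# Placeholder values for illustration only (assumptions, not board data).
BLACKLISTED_DOMAINS = {"spam.example", "pills.example"}
SPAM_KEYWORDS = {"casino", "loan", "viagra"}

URL_RE = re.compile(r"https?://([^/\s]+)", re.IGNORECASE)
WORD_RE = re.compile(r"[^\W\d_]+")  # runs of letters, any alphabet


def cyrillic_ratio(text):
    """Fraction of the letters in `text` that fall in the Cyrillic block."""
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return 0.0
    cyrillic = sum(1 for c in letters if "\u0400" <= c <= "\u04ff")
    return cyrillic / len(letters)


def blacklisted_links(text):
    """Number of links whose host is a blacklisted domain or a subdomain of one."""
    count = 0
    for host in URL_RE.findall(text):
        host = host.lower().split(":")[0]  # drop an optional port
        if any(host == d or host.endswith("." + d) for d in BLACKLISTED_DOMAINS):
            count += 1
    return count


def keyword_ratio(text):
    """Fraction of the words in `text` that are known spam keywords."""
    words = [w.lower() for w in WORD_RE.findall(text)]
    if not words:
        return 0.0
    counts = Counter(words)
    return sum(counts[k] for k in SPAM_KEYWORDS) / len(words)


def spam_flags(text, max_cyrillic=0.5, max_keywords=0.2):
    """Return the names of the checks that fired; an empty list means no check fired."""
    flags = []
    if cyrillic_ratio(text) > max_cyrillic:
        flags.append("cyrillic")
    if blacklisted_links(text) > 0:
        flags.append("blacklisted_link")
    if keyword_ratio(text) > max_keywords:
        flags.append("keywords")
    return flags
```

Returning the list of checks that fired, rather than deleting right away, would let a mod look at borderline posts before anything is removed or anyone is banned.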
Quote: Originally posted by Oxy |
Quote: Originally posted by Ubya |
The second idea is to keep the current system but automate it. New users could send an email request to a dedicated address, and a script
would read it, evaluate it, create the account and then send them the details via email.
|
Yes, this is simple. But I suspect that having a human read the request is itself a spam protection mechanism, as whoever reads the mail can judge
whether the email address and content look legit. |
That is something I need to ask the admins: what are their criteria for telling a spammy registration request from a legit one?
If we implement a specific email request format, automating the reading, data extraction and registration becomes as easy as reading a form. If the
email is more vague it is still doable, but the error rate would be higher (imagine someone asking for a specific password while someone else
says anything is fine, or a user asking for a username that already exists).
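As a sketch of what "as easy as reading a form" could look like, assume a request format with one "Username:" line and one "Reason:" line in a plain-text body. That format, the username rule and the function name are all my own assumptions for the example; nothing like this exists on the board today. Anything that doesn't parse cleanly is handed back to a human instead of being guessed at.

```python
import re
from email import message_from_string
from email.utils import parseaddr

# Hypothetical username rule for the example, not the board's actual rule.
USERNAME_RE = re.compile(r"^[A-Za-z0-9_\-]{3,25}$")


def parse_request(raw_email, existing_usernames):
    """Extract sender and requested username from a structured request email."""
    msg = message_from_string(raw_email)
    _, sender = parseaddr(msg.get("From", ""))
    body = msg.get_payload()
    if not isinstance(body, str):
        # Multipart mail (attachments, HTML) is left to a human.
        return {"status": "manual_review", "reason": "multipart message"}

    # Collect "Key: value" lines from the body.
    fields = {}
    for line in body.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip().lower()] = value.strip()

    username = fields.get("username", "")
    if not sender or not USERNAME_RE.match(username):
        return {"status": "manual_review", "reason": "missing or malformed fields"}
    if username.lower() in {u.lower() for u in existing_usernames}:
        return {"status": "rejected", "reason": "username already taken"}
    return {
        "status": "ok",
        "email": sender,
        "username": username,
        "reason_given": fields.get("reason", ""),
    }
```

The password is deliberately not part of the form: the script (or the admin) would pick a temporary one, which avoids the "someone asks for a specific password" case entirely.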
|
|
Tsjerk
International Hazard
Posts: 3032
Registered: 20-4-2005
Location: Netherlands
Member Is Offline
Mood: Mood
|
|
Quote: Originally posted by Ubya |
Yeah, this would solve the issue at its core. Even though reCaptcha has been cracked, it is way better than having nothing. |
I'm pretty sure the people who can get a bot past reCaptcha wouldn't spam SM. And besides, reCaptcha is an external service which is under continuous
development by Google, so it is updated automatically. I don't know what version is implemented and if that version is still maintained, but I guess
it is.
|
|
aab18011
Hazard to Self
Posts: 74
Registered: 11-7-2019
Location: Connecticut, USA
Member Is Offline
Mood: Moving out and setting up shop in my new chemistry hobbit hole
|
|
If I may add my two cents as well,
I have been messing around with programming for a while and just finished some classes on AI detection methodology.
One can use a bunch of previously detected spam accounts, messages and links (labeled by a human eye, of course) as training data for the AI. The AI can be
trained and tested until it recognizes spam 99.99% of the time.
And it would be both free and relatively easy. All you need is training data, some free time to get it learning and time to test it out.
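For a feel of the "train on human-labeled examples" idea, here is a very small Python sketch using a naive Bayes classifier. This is just one simple stand-in that I picked for the example, not necessarily the method from those classes, and a toy model like this will not get anywhere near the accuracy figure above. It only shows the train/classify loop; the class and method names are my own.

```python
import math
import re
from collections import Counter

TOKEN_RE = re.compile(r"[a-z0-9']+")


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return TOKEN_RE.findall(text.lower())


class NaiveBayesSpamFilter:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Add one human-labeled post; `label` is "spam" or "ham"."""
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def _log_score(self, tokens, label, vocab_size):
        # log P(label) + sum of log P(token | label), smoothed.
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)
        total_words = sum(self.word_counts[label].values())
        for tok in tokens:
            count = self.word_counts[label][tok]
            score += math.log((count + 1) / (total_words + vocab_size))
        return score

    def is_spam(self, text):
        """Classify a post. Needs at least one training post per label."""
        tokens = tokenize(text)
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        spam = self._log_score(tokens, "spam", len(vocab))
        ham = self._log_score(tokens, "ham", len(vocab))
        return spam > ham
```

In practice the "ham" side would be trained on ordinary posts from the board and the "spam" side on whatever the mods saved from the 2019 floods, and the result would be checked on posts the model has never seen before trusting it.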
I am the one who boils to dryness, fear me...
H He Li B C(12,14) Na S Cl Mn Fe Cu Zn Ba Ag Sn I U(238)
"I'd rather die on my feet than live on my knees" -Emiliano Zapata
|
|
Texium
Administrator
Posts: 4580
Registered: 11-1-2014
Location: Salt Lake City
Member Is Offline
Mood: PhD candidate!
|
|
All of the email requests I’ve received have looked legit. I haven’t turned any down. I assume it’s just not worth it for spammers to manually
send an email to someone just for a chance to spam a forum, or to write a script that will email specific people to trick them into making spam
accounts. As impractical as the system is for us and new members, it’s even less practical for spammers, and that’s the whole point.
I handle requests the same way I do with the wiki. I ask what they want their username to be if they didn’t tell me in their first email, then I
open registrations temporarily, make the account with a temporary password, close registration, and tell the new user their password, encouraging them
to change it when they log in. Funny enough, most accounts that I registered in this way still haven’t posted anything despite going to the trouble
to reach out to me.
|
|
woelen
Super Administrator
Posts: 8012
Registered: 20-8-2005
Location: Netherlands
Member Is Offline
Mood: interested
|
|
Requests for getting an account on sciencemadness always are personal emails, and each request differs from others. My reply also is a natural
language email, which also is (somewhat) variable. I choose a password, when making an account, and in the email response I give that password, just
being part of a sentence. I also ask the people to change their password as soon as they are in.
The entire process of manually creating accounts is a very good measure against spam. Automated bots simply cannot deal with that process, due to
variations in free-format text, no free choice of password, variable response time (sometimes just a few hours, sometimes a few days if I am away for
a while). The amount of work for me in the process is approx. 10 minutes per week. I don't think I have ever created a spam account.
If a software update gave us a good mechanism against automated spambots, then I would encourage the use of such a solution, but I do not think
it is worth the effort to make your own mechanisms for registration, spam discovery and spam removal. Melgar's solution sort of worked, but it was not
flawless and sometimes there also were issues with legit posts.
|
|
Oxy
Hazard to Others
Posts: 140
Registered: 1-12-2020
Member Is Offline
|
|
Quote: Originally posted by Ubya |
Bots create posts following some rules; once you know them, you can detect spam posts pretty easily.
At the time, for example, a lot of spam posts were in Russian. Nobody here writes in Russian, so one of the anti-spam checks would be to see how
many Cyrillic characters are used in a post; if it is a high percentage, there's a very high probability it is spam.
You can also look for links. The purpose of some spam bots is to increase traffic to sketchy websites, so if a post has many links to spammy
websites, the post itself is spam (there are huge public blacklists).
Spambots also apply some SEO rules to their posts, like repeating the same set of keywords to increase SEO visibility, keywords we'd
rarely use in a chemistry post, so keeping track of them and their percentage is another check.
I wouldn't worry about paying for anything advanced.
|
I would not be so sure about counting links, as people here also add them, often more than one per post. The same keyword multiple times, hmmm...
elimination, bromine, reaction, stirred, added, filtered and many, many more.
The Russian example looks like a simple one, and it would work until someone mixes in Latin letters and drops below the
threshold. Or someone posts a procedure from a Russian book, which would then be a false positive. The problem will be spammers using the Latin alphabet,
and to be honest, I've only seen bots here which were using it. There the script will not do its job, as it will quickly fail, producing a lot of
false positives or negatives. Spam will stay on the board and legit messages will be marked as potential spam, waiting for a mod's reaction.
For this purpose I would rather look for a tool that can process natural language and train it against the board database. Some spam should be collected
as well. Then we could train a neural network and make it very, very efficient. There are even research papers about spam detection systems with neural
networks.
I have another idea, probably the best and the cheapest, which worked great on a board I used a lot in the past. There was a strict policy
that a user's first 10 or 20 posts had to be approved by a moderator. Before approval they were not visible; once the user passed the threshold he could
use the board normally without any overhead from waiting for a mod. That would require finding more mods or making the existing ones more active, but it would do
the trick. I also propose removing all accounts without any posts which are older than 1 or 2 months. Bots use this type of account; they
almost always had exactly 1 post after submitting the spam message. That could even be implemented simply as a database job.
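The cleanup job could be roughly as small as the sketch below. It uses Python's built-in sqlite3 only so the example is self-contained; the board itself runs on a different database, and the table and column names here (members, regdate, postnum) are assumptions for illustration, not a confirmed XMB schema.

```python
import sqlite3
import time

SECONDS_PER_DAY = 86400


def purge_idle_accounts(conn, max_age_days=60, now=None):
    """Delete accounts with zero posts that registered more than max_age_days ago.

    Assumes a hypothetical table members(username, regdate, postnum) where
    regdate is a Unix timestamp. Returns the number of deleted accounts.
    """
    now = int(time.time()) if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    cur = conn.execute(
        "DELETE FROM members WHERE postnum = 0 AND regdate < ?",
        (cutoff,),
    )
    conn.commit()
    return cur.rowcount
```

A real job would probably want to list the affected accounts first (same WHERE clause with SELECT) so an admin can check that no legit long-time lurker is about to be removed.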
If you need any code/db help I would be glad to help. It would be a good idea to migrate to the newer version as Tsjerk proposed.
reCaptcha could also be implemented without migration, I suppose.
Also, another idea: we could add a captcha to the post form for every user with fewer than, let's say, 5 or 10 posts.
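Both post-count rules (moderator approval for the first posts, captcha for the very first ones) boil down to the same kind of threshold check. A minimal Python sketch, with hypothetical names and with the thresholds taken from the numbers floated above:

```python
from dataclasses import dataclass


@dataclass
class PostingPolicy:
    """Thresholds are counted in posts already approved for the user."""
    moderation_threshold: int = 10  # first 10 posts need mod approval
    captcha_threshold: int = 5      # first 5 posts need a captcha

    def requirements(self, approved_posts):
        """What a user with this many approved posts must pass to post."""
        return {
            "needs_captcha": approved_posts < self.captcha_threshold,
            "needs_moderation": approved_posts < self.moderation_threshold,
        }

    def is_visible(self, author_approved_posts, approved_by_mod):
        """A post is public once a mod approved it or its author is past the threshold."""
        return approved_by_mod or author_approved_posts >= self.moderation_threshold
```

Counting only approved posts (rather than all submitted ones) matters here: otherwise a bot could get past the threshold just by submitting enough junk into the queue.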
|
|
Tsjerk
International Hazard
Posts: 3032
Registered: 20-4-2005
Location: Netherlands
Member Is Offline
Mood: Mood
|
|
I implemented reCaptcha v3 on my employer's website, on a request form to be precise. Version 3 works so flawlessly you don't even know it's there
anymore. You don't have to check the familiar pictures with the question "which pictures have stoplights" and such.
The software recognizes things like mouse movements and browser settings and determines whether you are a bot or not that way. My employer's website
is a WordPress one, so a simple plugin was enough. For SM it would only require an update from 1.9.11 to 1.9.12 and the reCaptcha should work out of
the box.
The use is free up to a million times a month.
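For anyone curious what happens on the server side with v3: the form sends a token along, the server posts that token plus its secret key to Google's siteverify endpoint, and gets back JSON with a success flag, a score between 0.0 and 1.0 and the action name. The Python sketch below only illustrates that flow; it is not how XMB 1.9.12 (which is PHP) does it, I don't know which reCaptcha version XMB ships, and the 0.5 cut-off and function names are assumptions.

```python
import json
import urllib.parse
import urllib.request

VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def fetch_verification(secret, token, timeout=10):
    """POST the token to the siteverify endpoint and return the parsed JSON reply."""
    data = urllib.parse.urlencode({"secret": secret, "response": token}).encode()
    with urllib.request.urlopen(VERIFY_URL, data=data, timeout=timeout) as resp:
        return json.load(resp)


def looks_human(result, expected_action, min_score=0.5):
    """Decide from a siteverify reply whether to accept the request.

    min_score is an assumed starting point; it should be tuned on real traffic.
    """
    return (
        result.get("success") is True
        and result.get("action") == expected_action
        and result.get("score", 0.0) >= min_score
    )
```

A registration handler would call fetch_verification with the token from the form and only create the account if looks_human(reply, "register") is true, where "register" is whatever action name the form used.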
Apparently there are still a lot of bots randomly trying to get into the registration form, as now and then one comes through. This must happen when
the system is temporarily activated by the admins to make someone an account.
[Edited on 17-12-2021 by Tsjerk]
|
|
aab18011
Hazard to Self
Posts: 74
Registered: 11-7-2019
Location: Connecticut, USA
Member Is Offline
Mood: Moving out and setting up shop in my new chemistry hobbit hole
|
|
Quote: Originally posted by woelen | Requests for getting an account on sciencemadness always are personal emails, and each request differs from others. My reply also is a natural
language email, which also is (somewhat) variable. I choose a password, when making an account, and in the email response I give that password, just
being part of a sentence. I also ask the people to change their password as soon as they are in.
The entire process of manually creating accounts is a very good measure against spam. Automated bots simply cannot deal with that process, due to
variations in free-format text, no free choice of password, variable response time (sometimes just a few hours, sometimes a few days if I am away for
a while). The amount of work for me in the process is approx. 10 minutes per week. I don't think I have ever created a spam account.
If a software update gave us a good mechanism against automated spambots, then I would encourage the use of such a solution, but I do not think
it is worth the effort to make your own mechanisms for registration, spam discovery and spam removal. Melgar's solution sort of worked, but it was not
flawless and sometimes there also were issues with legit posts. |
I'd strongly argue for the use of an AI. The finished script is very small, easy to use, and very, very fast. If they are trained well enough
they can literally pretend to be human. I'm not saying it doesn't take some time to set up, but if you guys are interested, it could be of great use. I
can provide resources when I get some time. I can even put together a small model to show you what it can do.
|
|
Tsjerk
International Hazard
Posts: 3032
Registered: 20-4-2005
Location: Netherlands
Member Is Offline
Mood: Mood
|
|
Really, everyone, please stop thinking about hacking scripts into the forum's code, training AI, migration of the forum and what not.
The 1.9.12 update is backwards compatible, meaning no one will ever notice a thing besides an 11 changing to a 12 somewhere. No one has to write any
code, that has already been done by the XMB community. Tested for over a year now.
All that has to happen is Polverone running an update on the backend.
|
|
karlos³
International Hazard
Posts: 1520
Registered: 10-1-2011
Location: yes!
Member Is Offline
Mood: oxazolidinic 8)
|
|
Keep in mind "sleeper agent" bots.
We have experienced them next door, and even a few more "sophisticated" ones (asking for the admin; the first one was suspicious to me, but by the
second one I realized it was just a fucking script making them ask for that).
And some others where you can obviously tell they are scripted with stolen or bought mail accounts; they register and go offline.... Joe and I
have annihilated literally hundreds of them, and they all had different scripts for how to act.
Some turned to instant spamming, and some registered and never turned up for several months, and then started spamming.
I still remember the huge battle of that Sunday morning in 2019.... dead bot corpses everywhere (oh wait, that was only in my mind after looking at the
mod log ).
verrückt und wissenschaftlich
|
|
Tsjerk
International Hazard
Posts: 3032
Registered: 20-4-2005
Location: Netherlands
Member Is Offline
Mood: Mood
|
|
I saw a couple of them here that had literally registered that same day, one a couple of weeks ago and one longer ago.
|
|
karlos³
International Hazard
Posts: 1520
Registered: 10-1-2011
Location: yes!
Member Is Offline
Mood: oxazolidinic 8)
|
|
You were an observer of that huge battle?
We literally had to take turns hacking them to pieces.
I hope you did not get traumatized by this
|
|
Tsjerk
International Hazard
Posts: 3032
Registered: 20-4-2005
Location: Netherlands
Member Is Offline
Mood: Mood
|
|
No I was talking about some bots, just a couple of them, which registered here
on SM. They spammed the same day as they registered, not too long ago.
|
|
karlos³
International Hazard
Posts: 1520
Registered: 10-1-2011
Location: yes!
Member Is Offline
Mood: oxazolidinic 8)
|
|
Oh yeah, that is typically the way they enter next door too.
But the SMF2 boards use different software of course, and we have 2-3 certain clues that tell us it's bots.
It's confidential though
They do not have that here (or might, not sure what functions are available to our staff here).
That's why we can easily get rid of the sleeper agent bots as well.
Some will slip through as always.
But it is always a learning experience.
But the few years between the different board software make quite a difference... much harder to introduce bots here, it seems, but much harder to
realize it's bots.
It's the other way around with the SMF board engine.
[Edited on 17-12-2021 by karlos³]
|
|
Ubya
International Hazard
Posts: 1247
Registered: 23-11-2017
Location: Rome-Italy
Member Is Offline
Mood: I'm a maddo scientisto!!!
|
|
Quote: Originally posted by Tsjerk | Really, everyone, please stop thinking about hacking scripts into the forum's code, training AI, migration of the forum and what not.
The 1.9.12 update is backwards compatible, meaning no one will ever notice a thing besides an 11 changing to a 12 somewhere. No one has to write any
code, that has already been done by the XMB community. Tested for over a year now.
All that has to happen is Polverone running an update on the backend. |
That's without a doubt the best thing to do; my idea was more of something to implement in case this drags on for longer.
|
|
j_sum1
Administrator
Posts: 6320
Registered: 4-10-2014
Location: At home
Member Is Offline
Mood: Most of the ducks are in a row
|
|
I have not noticed any spam for a long time. And trolls seem to be pretty much absent too. Although it seems like a cumbersome system (and I had my
doubts at first), it seems to be working really well.
This contrasts significantly with 2019 when I would delete 120 spam messages before breakfast each morning and remove another 50-60 a couple of times
later in the day. Glad to not be still doing that.
J.
|
|
karlos³
International Hazard
Posts: 1520
Registered: 10-1-2011
Location: yes!
Member Is Offline
Mood: oxazolidinic 8)
|
|
Quote: Originally posted by j_sum1 | I have not noticed any spam for a long time. And trolls seem to be pretty much absent too. Although it seems like a cumbersome system (and I had my
doubts at first), it seems to be working really well.
This contrasts significantly with 2019 when I would delete 120 spam messages before breakfast each morning and remove another 50-60 a couple of times
later in the day. Glad to not be still doing that.
J. |
Although this does not necessarily have to do with any changes!
We had such a huge wave in 2019 too, in early 2019, and the next two years had barely a percent of that amount of spambots daily, some weeks not even
a single one.
I don't know, maybe some botnet got taken down that year, who knows?
We just reacted, that was enough work already...
Why that happened, we have only theories.
But fact is, we never had a single spambot in the three years before.
Our registration is also still open, as opposed to SM.
So it has nothing to do with the closed registration, I suppose (except that you do not have the three spambots a week we might get at worst now, nor
our trolls, although they do not find much ground next door in any case ).
My own suspicion is that it was a botnet, now that I've heard it was that bad here too!
I never knew, never would have expected it being that bad here too
You guys know I reported immediately whenever I noticed any spam post, but I guess I just saw a fraction of what you got here.
Wow.
Does that make more sense?
We should maybe exchange information a little better in the future, because that time span (worst in mid-2019? almost over at the end of the year?) makes this
seem like something that attacked us both with the same focus?
No idea.
But it looks like this.
Joe or I will contact one of the SM staff if we experience that again; it might really have been concerted? (on specific words in chemistry boards
maybe, who knows)
And who knows this either?
That number is surprisingly close to what we experienced.
Something else: the bots mostly (despite us not requiring a real email) all used Google or yandex.ru mail accounts.
So it's likely those were stolen, and that was definitely a concerted action.
Have you guys noticed something similar with the bot flood back then?
I would assume so...
What about the theory with a bot net being taken down?
Could have happened.
It almost came to a halt and we were always open like a fucking barn
|
|
Tsjerk
International Hazard
Posts: 3032
Registered: 20-4-2005
Location: Netherlands
Member Is Offline
Mood: Mood
|
|
I'm pretty sure that if registration were opened again the boys would be back here within an hour. They already get through once every three
months or so with just the admins opening it for ten minutes to register someone.
I think it is well known XMB 1.9.11 is vulnerable, a quick Google for the term "powered by XMB 1.9.11" (use the quotes) gives you a list of boards
using the software, so these boards are targeted.
You don't need a botnet to terrorize a board, a single laptop is sufficient, you only need a lot of email addresses. But there are enough shady
parties giving away email addresses for free without too much verification whether you are a human or not.
I could probably whip up some scripts in a day, maybe two, which would spam the shit out of random XMB boards. I would use Robot Framework, a lovely tool for test automation with a nice Selenium library available. Include
some Python email library so you don't have to worry about checking the verification emails via the GUI and you are good to go.
Covering the blockage of your IP address is also easy, just run your bot via TOR, normal websites are available via TOR, but the requests would come
from different IPs every time.
[Edited on 18-12-2021 by Tsjerk]
|
|
karlos³
International Hazard
Posts: 1520
Registered: 10-1-2011
Location: yes!
Member Is Offline
Mood: oxazolidinic 8)
|
|
Luckily neither our troll(s) or bots were that smart... they always, both, did that(even the trolls we knew of that they at least read that we don't
log IP's did really use that... buncha morons ).
|
|
Texium
Administrator
Posts: 4580
Registered: 11-1-2014
Location: Salt Lake City
Member Is Offline
Mood: PhD candidate!
|
|
Quote: Originally posted by karlos³ | Something else: the bots mostly (despite us not requiring a real email) all used Google or yandex.ru mail accounts.
So it's likely those were stolen, and that was definitely a concerted action.
Have you guys noticed something similar with the bot flood back then?
I would assume so... |
Yes, we had noticed that many of the bots were using yandex email addresses. If I’m not mistaken, we banned that domain from registering, and that
helped with stemming the flow of new spammers somewhat. It wasn’t enough though.
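A domain ban like that is a one-line check at registration time. Here is a small Python sketch of the idea; the function names are hypothetical, and "yandex.ru" is simply the domain mentioned in this thread, since I don't know exactly which domains were banned or how the board software stores the list.

```python
# Domain taken from the thread; the real ban list is an assumption here.
BLOCKED_DOMAINS = {"yandex.ru"}


def email_domain(address):
    """Return the lowercased part after the last '@' (empty string if none)."""
    _, sep, domain = address.strip().rpartition("@")
    return domain.lower().rstrip(".") if sep else ""


def registration_allowed(address, blocked=BLOCKED_DOMAINS):
    """Reject addresses without a domain or on a blocked domain or its subdomains."""
    domain = email_domain(address)
    if not domain:
        return False
    return not any(domain == b or domain.endswith("." + b) for b in blocked)
```

As noted above, this only slows things down: most of the bots seen next door used Google addresses, and banning that domain would lock out a large share of legit users too.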
|
|
karlos³
International Hazard
Posts: 1520
Registered: 10-1-2011
Location: yes!
Member Is Offline
Mood: oxazolidinic 8)
|
|
Quote: Originally posted by Texium | Quote: Originally posted by karlos³ | Something else: the bots mostly (despite us not requiring a real email) all used Google or yandex.ru mail accounts.
So it's likely those were stolen, and that was definitely a concerted action.
Have you guys noticed something similar with the bot flood back then?
I would assume so... | Yes, we had noticed that many of the bots were using yandex email addresses. If I’m
not mistaken, we banned that domain from registering, and that helped with stemming the flow of new spammers somewhat. It wasn’t enough though.
|
I'd say in total the yandex bots made up like 20-30%, but the Google mails were "almost" (not sure if not all; were there even others? I can't
remember a single one, I would have noticed though) the total remainder of them.
None of them ever used a fictional mail though.
Only the one or other smarter troll did, but smart people do not keep on trolling for months or years.
I suddenly coughed and it almost sounded like "feedeechemist", whatever that word means
|
|