Pages:
1
..
17
18
19
20
21
..
28 |
streety
Hazard to Others
Posts: 110
Registered: 14-5-2018
Member Is Offline
|
|
It's difficult to get a true impression of the number and frequency of spam posts from occasional visits to the forum so I decided to collect some
data. I wrote a simple script that downloaded the "Today's Posts" page every 5 minutes and recorded any new topics. If any topics it had seen
previously were missing it checked whether they were deleted and recorded when that happened.
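For anyone curious how such a monitor might work, here is a minimal sketch of the comparison step only. It is not streety's actual script. It assumes each poll has already been reduced to a set of topic ids; `diff_snapshots` and the record layout are hypothetical names, and fetching and parsing the page is left out.

```python
from datetime import datetime, timedelta, timezone


def diff_snapshots(seen, current_ids, now):
    """Update `seen` with one poll of the topic list.

    seen: dict of topic id -> {"first_seen": datetime, "gone_at": datetime or None}
    current_ids: set of topic ids found on the page at this poll
    Returns (new_ids, missing_ids) for this poll.
    """
    # Topics on the page that have never been recorded before.
    new_ids = sorted(current_ids - set(seen))
    for tid in new_ids:
        seen[tid] = {"first_seen": now, "gone_at": None}

    # Topics recorded earlier that are no longer on the page.
    missing_ids = sorted(
        tid for tid, rec in seen.items()
        if tid not in current_ids and rec["gone_at"] is None
    )
    for tid in missing_ids:
        # A real monitor would first confirm the topic was deleted
        # (it may simply have dropped off the page); that check is omitted.
        seen[tid]["gone_at"] = now
    return new_ids, missing_ids


seen = {}
t0 = datetime(2018, 8, 1, 0, 0, tzinfo=timezone.utc)
diff_snapshots(seen, {101, 102}, t0)
new_ids, missing_ids = diff_snapshots(seen, {102, 103}, t0 + timedelta(minutes=5))
print(new_ids, missing_ids)  # [103] [101]
```

With a 5 minute polling interval, the time between `first_seen` and `gone_at` can only be resolved to about 5 minutes.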
The results were quite interesting so I thought I would share what I found.
I stopped the script ~12 hours ago and at that point it had been running for 6 days, 2 hours. During that time I recorded 645 spam topics for a rate
of 106/day.
The histogram below shows the time it took to delete each spam topic.
The minimum was barely above 5 minutes so I probably missed some topics that were deleted so quickly they were gone before my script downloaded the
page.
The average was 84 minutes. 32% were deleted within 30 minutes, 47% within 60 minutes, and 79% within 120 minutes. The median was 65 minutes.
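The figures above can be reproduced with a few lines of arithmetic. This is a sketch under assumptions: the helper names are hypothetical, and the list of delays is made-up illustration data, not the recorded data set. Only the 645 topics and the 6 days, 2 hours come from the post.

```python
from statistics import mean, median


def rate_per_day(count, days, hours=0):
    """Average number of topics per day over the observation window."""
    return count / (days + hours / 24)


def fraction_within(delays_min, limit_min):
    """Share of topics deleted within `limit_min` minutes."""
    return sum(d <= limit_min for d in delays_min) / len(delays_min)


# Figures quoted in the post: 645 spam topics in 6 days, 2 hours.
print(round(rate_per_day(645, days=6, hours=2)))  # 106

# Made-up delays in minutes, for illustration only.
example_delays = [6, 25, 40, 70, 110, 200]
summary = {
    "mean": mean(example_delays),
    "median": median(example_delays),
    "within_60": fraction_within(example_delays, 60),
}
```

A mean well above the median, as reported in the post (84 versus 65 minutes), is what a long tail of slowly deleted topics would produce.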
Messages posted around 6-7am board time seemed to take the longest time to be removed.
I've stopped the script for the moment but can restart it if there is continued interest in this type of analysis.
|
|
diddi
National Hazard
Posts: 723
Registered: 23-9-2014
Location: Victoria, Australia
Member Is Offline
Mood: Fluorescent
|
|
excellent info streety
Beginning construction of periodic table display
|
|
j_sum1
Administrator
Posts: 6335
Registered: 4-10-2014
Location: At home
Member Is Offline
Mood: Most of the ducks are in a row
|
|
Thanks for doing this streety. We have not had good information on numbers until now. I think the volume is pretty consistent with what I have
observed.
What I am most interested in is the time of day that spam appears. If there is a strong pattern then it is useful to know the heaviest times. It is
worth my effort to do a targeted clean up if I know that a flood is coming.
To clarify... The time of day on that graph is the time that the spam appears and not the time it is deleted???
Help me translate the times on your graph to my local time zone.
Edit
There are some interesting diagonal stripes on that graph.
Looks like a spam bot was posting at regular five-minute intervals and then all the posts were deleted at once.
[Edited on 8-8-2018 by j_sum1]
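A small worked example shows why that interpretation would give a diagonal stripe. The posting times and the deletion time below are invented for illustration. If a bot posts every five minutes and everything is removed in one sweep, each later post survives five minutes less than the one before it.

```python
def minutes_to_deletion(post_minutes, deleted_at):
    """Time each post survived, given one shared deletion time (all in minutes)."""
    return [deleted_at - t for t in post_minutes]


# Hypothetical bot run: one post every 5 minutes, all deleted at minute 60.
posts = [0, 5, 10, 15, 20]
print(minutes_to_deletion(posts, deleted_at=60))  # [60, 55, 50, 45, 40]
```

Plotted as time to deletion against time posted, those points fall on a straight line with a slope of -1.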
|
|
streety
Hazard to Others
Posts: 110
Registered: 14-5-2018
Member Is Offline
|
|
The x axis is the time posted and uses the default for the board, i.e. GMT-8. I think Sydney would be 18 hours ahead.
This is adjusted for time deleted.
This is color coded for the day. It does seem like the uptick around 6-10am occurs on multiple days. Whether that pattern will continue or not is an
open question.
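The time-zone conversion can be checked with fixed offsets. This is a sketch assuming the board runs at a constant GMT-8 (as stated above) and that Sydney is on AEST (UTC+10, no daylight saving in August). `board_to_sydney` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone

BOARD = timezone(timedelta(hours=-8))   # board default, GMT-8
SYDNEY = timezone(timedelta(hours=10))  # assumption: AEST, UTC+10


def board_to_sydney(board_time):
    """Convert a naive board-time datetime to Sydney local time."""
    return board_time.replace(tzinfo=BOARD).astimezone(SYDNEY)


posted = datetime(2018, 8, 7, 6, 0)  # 6am board time
print(board_to_sydney(posted).strftime("%Y-%m-%d %H:%M"))  # 2018-08-08 00:00
```

So 6am on the graph is midnight in Sydney, which matches the 18 hour difference.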
[edit]
I've just looked at the users and there is a surprise there as well.
There were 216 members linked to spam posts.
I assumed they were registering and then immediately posting spam. Most accounts did follow that pattern but 15 accounts were registered days before
they started posting. For these 15 accounts the median delay was 8 days. There were two accounts around 40-50 days and one account that was registered
71 days before it started posting.
[/edit]
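The delay between registration and first spam post could be computed along these lines. This is a sketch only: the account dates are invented, and `delayed_accounts` with its `min_days` cut-off is a hypothetical helper, not the analysis actually used.

```python
from datetime import date
from statistics import median


def delayed_accounts(accounts, min_days=1):
    """Delays in days for accounts that waited at least `min_days` before posting.

    accounts: iterable of (registered, first_spam_post) date pairs.
    """
    delays = [(first_post - registered).days for registered, first_post in accounts]
    return sorted(d for d in delays if d >= min_days)


# Invented example accounts: (registration date, date of first spam post).
accounts = [
    (date(2018, 8, 6), date(2018, 8, 6)),   # posted on the day of registration
    (date(2018, 7, 30), date(2018, 8, 7)),  # waited 8 days
    (date(2018, 5, 28), date(2018, 8, 7)),  # waited 71 days
]
delays = delayed_accounts(accounts)
print(delays, median(delays))  # [8, 71] 39.5
```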
[Edited on 8-8-2018 by streety]
|
|
WGTR
National Hazard
Posts: 971
Registered: 29-9-2013
Location: Online
Member Is Offline
Mood: Outline
|
|
That means that some of those accounts are old enough that they are no longer "new", and require a manual delete of the spam. That explains why some
posts tend to linger after being reported.
I've noticed something before, where a bot will register and make an innocuous post like "Hi, nice post!" or something like that, and then wait some
weeks before posting spam, presumably to get around a spam deletion script.
Do you see a pattern on the types of posts/accounts that register and then wait several weeks to post, or is it pretty much the same as those that
post immediately?
|
|
streety
Hazard to Others
Posts: 110
Registered: 14-5-2018
Member Is Offline
|
|
Nothing jumps out at me. It seems like a mix of English characters and Cyrillic. The number of posts is similar per user at ~3. The topics seem
varied.
|
|
WGTR
National Hazard
Posts: 971
Registered: 29-9-2013
Location: Online
Member Is Offline
Mood: Outline
|
|
I'm afraid that spam_laurie just had a very short and meaningless existence on sciencemadness. Her life was snuffed out in the flower of her youth. I
wish I could say that I miss her, but I truly don't.
|
|
Abromination
Hazard to Others
Posts: 432
Registered: 10-7-2018
Location: Alaska
Member Is Offline
Mood: 1,4 tar
|
|
Maybe I'm simply not a good observer, but it appears that in the last few days, when someone spams, the spam appears in all subforums (except
Detritus, ironically).
Everyone knows it's a bot, that's obvious (Xrumertest maybe), but I had never noticed them doing this before. They would usually just post in genchem
and occasionally some of the others.
List of materials made by ScienceMadness.org users:
https://docs.google.com/spreadsheets/d/1nmJ8uq-h4IkXPxD5svnT...
--------------------------------
Elements Collected: H, Li, B, C, N, O, Mg, Al, Si, P, S, Fe, Ni, Cu, Zn, Ag, I, Au, Pb, Bi, Am
Last Acquired: B
Next: Na
--------------
|
|
WGTR
National Hazard
Posts: 971
Registered: 29-9-2013
Location: Online
Member Is Offline
Mood: Outline
|
|
So perhaps if a new account posts to all sub forums in a short time, the spam deletion script should nuke the account? That's one idea. There are
quite a few spam accounts that only post once or twice, but I've seen some like you say.
I've corresponded with woelen and j_sum1, and a possible solution is in the works. Everybody please hang in there until then and keep reporting
spammers as soon as they pop their slimy little heads into the daylight.
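The "posts to all subforums in a short time" idea could be expressed as a simple check. This is only a sketch of the heuristic: the thresholds (5 forums within 30 minutes), the forum names and the function name are all assumptions, and nothing here reflects how the board's deletion script works.

```python
from datetime import datetime, timedelta


def posts_in_many_forums(posts, min_forums=5, window=timedelta(minutes=30)):
    """True if one account posted in `min_forums` distinct forums within `window`.

    posts: list of (posted_at, forum_name) tuples for a single account.
    """
    posts = sorted(posts)
    for i, (start, _) in enumerate(posts):
        # Distinct forums hit within `window` of this post.
        forums = {forum for when, forum in posts[i:] if when - start <= window}
        if len(forums) >= min_forums:
            return True
    return False


t0 = datetime(2018, 8, 10, 9, 0)
names = ["genchem", "orgchem", "biochem", "energetics", "misc"]  # placeholder names
burst = [(t0 + timedelta(minutes=i), name) for i, name in enumerate(names)]
print(posts_in_many_forums(burst))  # True
```

A check like this would miss the accounts that only post once or twice, so it could only complement member reports.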
|
|
Texium
Administrator
Posts: 4619
Registered: 11-1-2014
Location: Salt Lake City
Member Is Offline
Mood: PhD candidate!
|
|
Right now our solutions are quite patchy. Ultimately we will need to migrate to newer software. That's inevitable, as we can't just keep patching up
this old ship forever. But, until we do that, we're just going to have to keep bailing her out as best as we can. Thanks for your patience, everyone.
And streety, thank you for presenting those analytics, that was very interesting. I can confirm that spammers sometimes get killed within 5 minutes
because there are times when a new one pops up as I'm refreshing Today's Posts and I hit it within the minute it was posted!
|
|
Abromination
Hazard to Others
Posts: 432
Registered: 10-7-2018
Location: Alaska
Member Is Offline
Mood: 1,4 tar
|
|
I'm still trying to figure out the aim of this PHD troll. Why would he bug us? I'm sure he would have better things to do than constantly come back and
interfere with the calm and order of a hugely diverse international forum of chemists.
|
|
Abromination
Hazard to Others
Posts: 432
Registered: 10-7-2018
Location: Alaska
Member Is Offline
Mood: 1,4 tar
|
|
Maybe you wouldn't be rejected if you didn't behave like you do.
I almost pity you. You should be concerned if a high school student is telling you that.
|
|
j_sum1
Administrator
Posts: 6335
Registered: 4-10-2014
Location: At home
Member Is Offline
Mood: Most of the ducks are in a row
|
|
You err in thinking there is some logic behind it.
There is no reason. The guy is not reasonable. That's more or less the point.
There are some indicators that our troll is also a spammer. I think he gets his jollies messing with us when he gets bored of his meaningless
existence pushing spam for pennies.
On the spam front I get the impression that we are staying ahead of things at the moment. Thanks for everyone's vigilance in reporting. I think the
spam is getting obliterated pretty quickly. I would love to see some stats if you have them, streety.
As far as a long term solution goes, we basically won't get much headway without a software tweak or migrating to a new system. Both of these require
approval and input from Polverone, who has been frequently absent of late. This means glacier-slow progress, unfortunately.
We have had a bit of a discussion on a workaround involving passwords on the forums. There are numerous problems with this idea, principally that a
password would be required to view as well as to post. I think it would also blind search engines, which would mean no spam but probably few new
members. It would place an obstacle in front of infrequent old members. And if Google can't find the board (which is what I suspect) then goodbye to a lot of
the board's usefulness.
|
|
j_sum1
|
Thread Closed 18-8-2018 at 17:15 |
j_sum1
|
Thread Opened 18-8-2018 at 18:34 |
streety
Hazard to Others
Posts: 110
Registered: 14-5-2018
Member Is Offline
|
|
After the previous run of 6 days I stopped the script, so there is no new update. I have been updating it to produce something more robust and, after discussing
with woelen what would be needed, beginning to develop a script to actively delete spam posts.
I've just now set the spam monitoring script running again on a more permanent basis so will begin providing updates going forward.
The intention for the active part of the script will be to submit a spam report to trigger the automatic deletion script on the server. This might not
work for all the spam but seemed like the safest first step as very little damage can be done by sending a U2U.
As far as I know the most recent description of the automatic deletion script is from page 8 of this topic:
Quote: Originally posted by Polverone | I have tweaked the reporting process, somewhat inspired by violet sin. Before spam had to be reported by two trusted members or one moderator before
it was auto-deleted. Now it will be auto-deleted if it is reported by one trusted member and one 'semi-trusted' member. Everyone who has been
registered at least 60 days and has at least 50 posts automatically becomes a member of the 'semi-trusted' group. This should reduce the waiting time
for spam deletion, and means that any frequent user of the site can help stamp out spam. |
The idea is to make this more aggressive by
- expanding the pool of members able to trigger a deletion (but still requiring 2 members)
- automatically triggering deletion if a new member starts multiple new topics
I'm currently looking at whether we can use machine learning to be more aggressive on deletion while keeping some safeguards, for example by
triggering deletion on the report of a single user instead of two if the machine learning model also thinks a post is spam.
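The reporting rule quoted from Polverone can be written out as a small predicate. This is a sketch of one reading of that description, not the server script. It assumes that a single moderator report still suffices, that a second trusted member counts at least as much as a semi-trusted one, and that duplicate reports from the same member count once. The `Reporter` class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reporter:
    name: str
    days_registered: int
    posts: int
    trusted: bool = False
    moderator: bool = False


def is_semi_trusted(r):
    """Semi-trusted: registered at least 60 days and at least 50 posts."""
    return r.days_registered >= 60 and r.posts >= 50


def should_auto_delete(reporters):
    """Apply the quoted rule to the members who reported one post."""
    unique = list({r.name: r for r in reporters}.values())
    if any(r.moderator for r in unique):
        return True  # assumption: one moderator report is still enough
    trusted = [r for r in unique if r.trusted]
    qualifying = [r for r in unique if r.trusted or is_semi_trusted(r)]
    # One trusted reporter plus a second, different qualifying reporter.
    return len(trusted) >= 1 and len(qualifying) >= 2


alice = Reporter("alice", days_registered=400, posts=900, trusted=True)
bob = Reporter("bob", days_registered=61, posts=50)
print(should_auto_delete([alice, bob]))  # True
```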
I'll finish with some historical statistics. For the machine learning model I have examples of spam posts but I also need examples of genuine posts. I
could use everything on the forum but the first posts from users would be best. There are 8835 members on the forum with at least one post. Somewhat
to my surprise the majority (5062) started a new topic with their first post. Most of those (4206) posted within a day of registering on the forum.
|
|
JJay
International Hazard
Posts: 3440
Registered: 15-10-2015
Member Is Offline
|
|
Question: When you see a spammer posting several new posts, which one do you report? The oldest one?
|
|
WGTR
National Hazard
Posts: 971
Registered: 29-9-2013
Location: Online
Member Is Offline
Mood: Outline
|
|
Usually I do, but it depends on which one(s) have the most view counts. Usually the oldest one looks like several people have already looked at it
(and have presumably reported it).
|
|
fusso
International Hazard
Posts: 1922
Registered: 23-6-2017
Location: 4 ∥ universes ahead of you
Member Is Offline
|
|
Quote: Originally posted by streety | The x axis is the time posted and uses the default for the board, i.e. GMT-8. I think Sydney would be 18 hours ahead.
This is adjusted for time deleted.
This is color coded for the day. It does seem like the uptick around 6-10am occurs on multiple days. Whether that pattern will continue or not is an
open question.
[edit]
I've just looked at the users and there is a surprise there as well.
There were 216 members linked to spam posts.
I assumed they were registering and then immediately posting spam. Most accounts did follow that pattern but 15 accounts were registered days before
they started posting. For these 15 accounts the median delay was 8 days. There were two accounts around 40-50 days and one account that was registered
71 days before it started posting.
[/edit]
[Edited on 8-8-2018 by streety] | @streety can you please also analyse the frequencies of real posts?
|
|
fusso
International Hazard
Posts: 1922
Registered: 23-6-2017
Location: 4 ∥ universes ahead of you
Member Is Offline
|
|
Highlight reported posts
Quote: Originally posted by Polverone | I have tweaked the reporting process, somewhat inspired by violet sin. Before spam had to be reported by two trusted members or one moderator before
it was auto-deleted. Now it will be auto-deleted if it is reported by one trusted member and one 'semi-trusted' member. Everyone who has been
registered at least 60 days and has at least 50 posts automatically becomes a member of the 'semi-trusted' group. This should reduce the waiting time
for spam deletion, and means that any frequent user of the site can help stamp out spam. |
To let others know that a post has already been reported, put an icon before the subject if a trusted member has reported it. Put another icon there
if a semi-trusted member has reported it. This can reduce the number of reports and U2Us.
|
|
fusso
International Hazard
Posts: 1922
Registered: 23-6-2017
Location: 4 ∥ universes ahead of you
Member Is Offline
|
|
What should we write when reporting trolls?
Should we write "spam" or "troll" when reporting trolls?
|
|
streety
Hazard to Others
Posts: 110
Registered: 14-5-2018
Member Is Offline
|
|
fusso, what would you like to know about the legitimate posts? I can extract quite a lot of information from the backup.
The icons would be a good idea if we continue to struggle with spam. I'm hoping we are close to a major improvement.
I've been able to train a machine learning model to detect spam posts with high accuracy and have now modified the monitoring script to send spam
reports. If run by an admin or mod account the reports should trigger the automatic deletion script on the server. To avoid mistakes it will only
trigger if:
- a new user (less than 48 hours old) starts two or more new topics detected as spam
- a new user starts one topic detected as spam that is also reported by a member
As the spammers change their content the performance of the model will eventually degrade but it will be easy to update it.
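The two trigger conditions can be sketched as a single decision function. The model itself is not shown: `flagged_by_model` stands in for whatever the classifier returns. The sketch assumes that "new user" means less than 48 hours old in both conditions. `Topic` and `should_send_report` are hypothetical names.

```python
from dataclasses import dataclass

NEW_USER_MAX_AGE_HOURS = 48  # "less than 48 hours old"


@dataclass
class Topic:
    flagged_by_model: bool            # the model's verdict for this topic
    reported_by_member: bool = False  # a member has also reported it


def should_send_report(account_age_hours, topics):
    """Decide whether to send a spam report for one account's new topics."""
    if account_age_hours >= NEW_USER_MAX_AGE_HOURS:
        return False  # only new accounts are acted on automatically
    flagged = [t for t in topics if t.flagged_by_model]
    if len(flagged) >= 2:
        return True  # two or more topics detected as spam
    # Exactly one (or zero) flagged topic: require a member report on it too.
    return any(t.reported_by_member for t in flagged)


print(should_send_report(3, [Topic(True), Topic(True)]))  # True
print(should_send_report(3, [Topic(True)]))               # False
```

Requiring either two detections or one detection plus a human report means a single false positive from the model cannot trigger a deletion on its own.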
|
|
diddi
National Hazard
Posts: 723
Registered: 23-9-2014
Location: Victoria, Australia
Member Is Offline
Mood: Fluorescent
|
|
just write
"spam"
|
|
j_sum1
Administrator
Posts: 6335
Registered: 4-10-2014
Location: At home
Member Is Offline
Mood: Most of the ducks are in a row
|
|
For interest's sake...
I have not cleared my u2u trash for about four months. (Late April or early May.) The only thing in there is spam reports.
You are all doing a grand job. And even though the bots are persistent, I get the feeling we are staying ahead of the mess.
But this does illustrate the size of the problem.
|
|
fusso
International Hazard
Posts: 1922
Registered: 23-6-2017
Location: 4 ∥ universes ahead of you
Member Is Offline
|
|
Quote: Originally posted by streety | fusso, what would you like to know about the legitimate posts? I can extract quite a lot of information from the backup.
The icons would be a good idea if we continue to struggle with spam. I'm hoping we are close to a major improvement.
I've been able to train a machine learning model to detect spam posts with high accuracy and have now modified the monitoring script to send spam
reports. If run by an admin or mod account the reports should trigger the automatic deletion script on the server. To avoid mistakes it will only
trigger if:
- a new user (less than 48 hours old) starts two or more new topics detected as spam
- a new user starts one topic detected as spam that is also reported by a member
As the spammers change their content the performance of the model will eventually degrade but it will be easy to update it. | I'd like to know the distribution of posts in different time periods of a day.
|
|
streety
Hazard to Others
Posts: 110
Registered: 14-5-2018
Member Is Offline
|
|
This first figure is the frequency of posting over the complete period of the board:
This is the figure showing the frequency of posts for each hour of the day. The time is the same as the board (UTC-8) so the peak is approximately
12pm Pacific time, 3pm Eastern time, 8pm UTC, 6am in Sydney.
In putting together these figures I discovered two odd posts with a post date of 1969. In the database they are represented by timestamp values of 0 and
1. They are also clearly spam, but the thread then has 8 pages of legitimate content.
https://www.sciencemadness.org/whisper/viewthread.php?tid=21...
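The 1969 dates are what timestamps of 0 and 1 look like when shown in board time, assuming the database stores Unix timestamps (seconds since 1970-01-01 00:00 UTC). The sketch below shows that, along with one way of counting posts per hour of the day. The function names are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

BOARD_TZ = timezone(timedelta(hours=-8))  # board time, UTC-8


def board_time(unix_ts):
    """Convert a Unix timestamp to a datetime in board time."""
    return datetime.fromtimestamp(unix_ts, tz=BOARD_TZ)


def posts_per_hour(timestamps):
    """Count posts by hour of day (0-23) in board time."""
    return Counter(board_time(ts).hour for ts in timestamps)


# Timestamp 0 is midnight UTC on 1 January 1970, i.e. 4pm the day before at UTC-8.
print(board_time(0).isoformat())  # 1969-12-31T16:00:00-08:00
```

Posts like these would all land in the 4pm bucket of an hourly histogram, so it may be worth excluding zero-like timestamps before plotting.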
|
|
Diachrynic
Hazard to Others
Posts: 226
Registered: 23-9-2017
Location: western spiral arm of the galaxy
Member Is Offline
Mood: zenosyne
|
|
Can someone confirm this pattern? The bots seem to post in a certain order when spamming. Just an observation. Might be interesting to check if the
pattern holds. (I just realized there is not so much order as I had hoped with that last one. Still.)
we apologize for the inconvenience
|
|