DubaiAmateurRocketry
National Hazard
Posts: 841
Registered: 10-5-2013
Location: LA, CA, USA
Member Is Offline
Mood: In research
|
|
response to the spams
Sometimes I log in, see some new posts, get happy at the chance to learn something, and then find out it's just some random advertisement. Why not make
some rules, such as not allowing new members to post anything for 24 hours or so? Why are they even advertising on a chemistry forum?
|
|
Amos
International Hazard
Posts: 1406
Registered: 25-3-2014
Location: Yes
Member Is Offline
Mood: No
|
|
This shouldn't be much of a problem, as the titles rarely pertain to anything that could be interpreted as chemistry, and the usernames are always
strings of random characters. They do get deleted (eventually), so just know the signs and avoid them.
|
|
Mailinmypocket
International Hazard
Posts: 1351
Registered: 12-5-2011
Member Is Offline
Mood: No Mood
|
|
Sometimes I actually check out what the spam is and it always leads me to wonder: does it actually work on some people?
It's all so poorly written and formatted, not to mention the title, which is usually gibberish. To go through the trouble of typing up all that shite,
there must be some benefit to them on some forums or elsewhere that they post their garbage. But how and where? I could never imagine deciding to
purchase something advertised like that...
|
|
Metacelsus
International Hazard
Posts: 2539
Registered: 26-12-2012
Location: Boston, MA
Member Is Offline
Mood: Double, double, toil and trouble
|
|
It's search engine optimization, probably. Crawlers can fall for stuff that people don't.
Also, this thread should be merged to the existing one.
[Edited on 7-1-2015 by Cheddite Cheese]
|
|
Loptr
International Hazard
Posts: 1348
Registered: 20-5-2014
Location: USA
Member Is Offline
Mood: Grateful
|
|
Do we need a software engineer to devote some time to putting together a solution for this?
I am one such guy, and would be willing.
Here is what I would do.
1) Add a CAPTCHA to the registration page to keep bots from being able to automatically register with this known forum software.
2) Add a CAPTCHA to post threads and replies. This need not be required every time; it could instead become
required due to a trigger, such as a time interval or frequency of posts.
3) I would also add a report button to the thread listing pages, such as "Today's Posts", so that obvious posts can be reported without having to go
into the thread, click the report button, and then enter an explanation. You would just click "report", and it would get added to the existing queue
with a template message, such as "Possible spam reported by {user} at {date/time}; posted by {user}", or my personal favorite "Spammy Spam".
Unless this forum's PHP source code is obfuscated or utilizes some weird compiled CGI executables, this should be as easy as stealing candy from a baby (but
now that I am a parent, if you steal candy from my baby, you will get hurt).
Do I have a taker?
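To make item 2 concrete, here is a minimal Python sketch of a trigger-based CAPTCHA check. It only illustrates the logic (the forum itself runs on PHP), and the class name, method names, and thresholds are all assumptions made up for this example.

```python
import time
from collections import deque


class CaptchaTrigger:
    """Decide when a poster should be asked to solve a CAPTCHA again.

    Hypothetical helper: the thresholds below are illustrative guesses.
    """

    def __init__(self, max_posts=3, window_seconds=60.0, recheck_seconds=3600.0):
        self.max_posts = max_posts              # posts allowed inside the window
        self.window_seconds = window_seconds    # length of the frequency window
        self.recheck_seconds = recheck_seconds  # how long a solved CAPTCHA stays valid
        self.recent_posts = deque()             # timestamps of recent posts
        self.last_solved = None                 # when a CAPTCHA was last solved

    def record_post(self, now=None):
        self.recent_posts.append(time.time() if now is None else now)

    def record_solved(self, now=None):
        self.last_solved = time.time() if now is None else now
        self.recent_posts.clear()               # solving resets the frequency trigger

    def captcha_required(self, now=None):
        now = time.time() if now is None else now
        # Forget posts that have fallen out of the frequency window.
        while self.recent_posts and now - self.recent_posts[0] > self.window_seconds:
            self.recent_posts.popleft()
        if self.last_solved is None:
            return True                          # never proved to be human
        if now - self.last_solved > self.recheck_seconds:
            return True                          # time-interval trigger
        return len(self.recent_posts) >= self.max_posts  # frequency trigger
```

A poster who has solved one CAPTCHA is left alone until they either post too quickly or the interval expires.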
|
|
Brain&Force
Hazard to Lanthanides
Posts: 1302
Registered: 13-11-2013
Location: UW-Madison
Member Is Offline
Mood: Incommensurately modulated
|
|
There is a CAPTCHA on the registration page and flood protection is enabled.
We do need a thread report button, however.
At the end of the day, simulating atoms doesn't beat working with the real things...
|
|
Loptr
International Hazard
Posts: 1348
Registered: 20-5-2014
Location: USA
Member Is Offline
Mood: Grateful
|
|
There is also the potential of using a Bayesian filter to score how closely each post resembles a known sample of spam, and then push suspect posts
onto a queue for review by an administrator to make the final call. This is the way the Spam folder in your email account works.
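For anyone unfamiliar with the idea, here is a minimal naive Bayes sketch in plain Python. It is an illustration under stated assumptions, not a proposal for the actual implementation: the tokenizer, the class and function names, and the 0.9 threshold are all invented for this example, and a real filter would need far more training data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Very crude tokenizer: lowercase runs of letters, digits, and apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesSpamFilter:
    """Tiny multinomial naive Bayes with add-one smoothing (illustrative only)."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def spam_probability(self, text):
        # Assumes at least one spam and one ham sample have been trained.
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        log_score = {}
        for label in ("spam", "ham"):
            # Prior: share of training posts carrying this label.
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                # Add-one smoothing so an unseen word never yields zero.
                count = self.word_counts[label][word] + 1
                score += math.log(count / (total_words + len(vocab)))
            log_score[label] = score
        # Turn the two log scores into P(spam | text), clamped to avoid overflow.
        diff = max(-700.0, min(700.0, log_score["ham"] - log_score["spam"]))
        return 1.0 / (1.0 + math.exp(diff))


def route(post_text, spam_filter, threshold=0.9):
    # The filter never deletes; it only diverts suspect posts to a human.
    if spam_filter.spam_probability(post_text) >= threshold:
        return "review_queue"
    return "publish"
```

The important design point is in `route`: the filter's output only decides whether a post goes to the review queue, and a person makes the final call.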
[Edited on 7-1-2015 by Loptr]
|
|
Loptr
International Hazard
Posts: 1348
Registered: 20-5-2014
Location: USA
Member Is Offline
Mood: Grateful
|
|
Quote: Originally posted by Brain&Force | There is a CAPTCHA on the registration page and flood protection is enabled.
We do need a thread report button, however. |
What about a CAPTCHA on the login page? If they are going to login, force them to prove they are human. Maybe even a question/answer based CAPTCHA?
EDIT: I used to be a moderator at GovernmentSecurity.org, and administrator of a couple of private computer-security related forums. I have
been legit for a number of years, now that I started using my abilities to make a nice income, but I am still just as capable and security-minded.
[Edited on 7-1-2015 by Loptr]
|
|
unionised
International Hazard
Posts: 5126
Registered: 1-11-2003
Location: UK
Member Is Offline
Mood: No Mood
|
|
Is it possible to have a "Mark as spam" button which deletes the post and the poster automatically if enough people mark it?
It might be prudent to weight the people reporting spam in accord with something like years of membership here or number of posts to stop someone
registering multiple accounts just to delete a valid post for "political" reasons.
So, for example, if at least three people who have at least 2000 posts between them report something as spam it's automatically sent to the bin and
the account blocked.
That way, any of us can help stem the tide, rather than only the mods, who are busy (possibly busy eating cheese nips, but that's not the point).
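The rule in the example above is simple enough to sketch in a few lines of Python. This is only an illustration; the `Reporter` type and the function name are hypothetical, and the defaults are just the numbers from the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reporter:
    name: str
    post_count: int


def should_auto_remove(reporters, min_reporters=3, min_combined_posts=2000):
    """Return True when enough established members have reported a post."""
    # Count each member once, even if they clicked the button repeatedly.
    unique = {r.name: r for r in reporters}
    combined_posts = sum(r.post_count for r in unique.values())
    return len(unique) >= min_reporters and combined_posts >= min_combined_posts
```

Three freshly registered sock-puppet accounts have no posts between them, so they cannot trip the threshold on their own.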
|
|
Loptr
International Hazard
Posts: 1348
Registered: 20-5-2014
Location: USA
Member Is Offline
Mood: Grateful
|
|
Quote: Originally posted by unionised | Is it possible to have a "Mark as spam" button which deletes the post and the poster automatically if enough people mark it.
It might be prudent to weight the people reporting spam in accord with something like years of membership here or number of posts to stop someone
registering multiple accounts just to delete a valid post for "political" reasons.
So, for example, if at least three people who have at least 2000 posts between them report something as spam it's automatically sent to the bin and
the account blocked.
That way, any of us can help stem the tide, rather than only the mods, who are busy (possibly busy eating cheese nips, but that's not the point).
|
It could work something like this: if a user says something is spam, and it turns out to be spam, then their truth value increases. This truth value can be
affected by many things, including post count, length of membership, etc., but when the reverse happens, and a reported item is not spam, then
the truth value decreases, and so does their reliability in reporting spam. This could be used to determine whether a reported post bypasses the verification
queue moderated by admins/whoever and goes bye-bye instead. This would still allow abuse by any more egomaniacal upper echelon that we might have, who
could use it to silence certain subjects or people.
I am sure there are published algorithms that I could scrounge up, as well, since this is not a unique problem.
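As a rough sketch of the truth-value idea, and not one of the published algorithms, something like the following Python would do. Every name, weight, and threshold here is an assumption picked for illustration.

```python
def initial_trust(post_count, years_member):
    """Starting trust from post count and membership length (arbitrary weights)."""
    return min(0.9, 0.1 + 0.0002 * post_count + 0.05 * years_member)


def update_trust(trust, report_was_correct, step=0.1):
    """Nudge trust toward 1.0 after a confirmed report, toward 0.0 after a false one."""
    target = 1.0 if report_was_correct else 0.0
    return trust + step * (target - trust)


def bypasses_review(reporter_trusts, threshold=2.0):
    """A reported post skips the moderator queue only if the combined trust is high."""
    return sum(reporter_trusts) >= threshold
```

Because each update moves trust only a fraction of the way toward its target, one lucky or unlucky report does not swing a member's standing very far.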
One of the topics I studied in college was machine learning, and in particular collaborative filtering, which is a group of algorithms that are used
by Amazon, eBay, etc., to push items to people because they consider the users similar. If one user likes something, and another user is similar to them,
then there is a likelihood they will also like that item.
[Edited on 7-1-2015 by Loptr]
I have been on this forum long enough, and not really contributed much so far, because I am more of a computer science nerd than a chemistry buff
(but I am trying to change that), and this is one way I CAN give back. Also, because seeing this much spam irritates me, and my real-life job is about
eliminating manual intervention and automating processes, in a certain, errm, "industry".
[Edited on 7-1-2015 by Loptr]
|
|
mayko
International Hazard
Posts: 1218
Registered: 17-1-2013
Location: Carrboro, NC
Member Is Offline
Mood: anomalous (Euclid class)
|
|
An additional strategy might target the gibberish nature of the usernames - they'd make good passwords, so perhaps one could use password-quality code
to screen and reject suspicious usernames?
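A crude stand-in for that idea, sketched in Python: instead of real password-quality code, this uses two simple heuristics (long consonant runs and a low share of vowels). The function names and cut-offs are assumptions, and it will certainly misjudge some legitimate names, especially non-English ones.

```python
VOWELS = set("aeiouy")


def longest_consonant_run(name):
    """Length of the longest run of consecutive consonant letters."""
    longest = current = 0
    for ch in name.lower():
        if ch.isalpha() and ch not in VOWELS:
            current += 1
            longest = max(longest, current)
        else:
            current = 0  # vowels, digits, and punctuation break the run
    return longest


def looks_random(name, max_run=4, min_vowel_ratio=0.2):
    """Flag usernames that look like keyboard mash rather than a chosen name."""
    letters = [ch for ch in name.lower() if ch.isalpha()]
    if not letters:
        return True  # digits and symbols only
    vowel_ratio = sum(ch in VOWELS for ch in letters) / len(letters)
    return longest_consonant_run(name) > max_run or vowel_ratio < min_vowel_ratio
```

A flagged name would be better sent to a moderator than rejected outright, given how rough the heuristic is.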
Some scammers actually use that to their advantage: sending emails claiming to be Nigerian royalty is cheap; the expensive step is filtering the
responses. By making their initial volley so terrible, they enrich their responses with the ultra-gullible:
Quote: |
By sending an email that repels all but the most gullible the scammer gets the most promising marks to self-select, and tilts the true to false
positive ratio in his favor.
| (source)
Sometimes I try to get scammers to sign petitions, to argue with one another, etc. to make them work for their ill-gotten gains. This can backfire,
however, as Ray Smuckles once learned.
al-khemie is not a terrorist organization
"Chemicals, chemicals... I need chemicals!" - George Hayduke
"Wubbalubba dub-dub!" - Rick Sanchez
|
|
Brain&Force
Hazard to Lanthanides
Posts: 1302
Registered: 13-11-2013
Location: UW-Madison
Member Is Offline
Mood: Incommensurately modulated
|
|
Quote: Originally posted by Loptr | Quote: Originally posted by Brain&Force | There is a CAPTCHA on the registration page and flood protection is enabled.
We do need a thread report button, however. |
What about a CAPTCHA on the login page? If they are going to login, force them to prove they are human. Maybe even a question/answer based CAPTCHA?
EDIT: I used to be a moderator at GovernmentSecurity.org, and administrator of a couple of private computer-security related forums. I have
been legit for a number of years now that I started using my abilities to make a nice income, but still just as capable and security-minded
[Edited on 7-1-2015 by Loptr]
[Edited on 7-1-2015 by Loptr]
[Edited on 7-1-2015 by Loptr] |
The spammers are human. This is the problem.
At the end of the day, simulating atoms doesn't beat working with the real things...
|
|
Loptr
International Hazard
Posts: 1348
Registered: 20-5-2014
Location: USA
Member Is Offline
Mood: Grateful
|
|
Quote: Originally posted by Brain&Force | Quote: Originally posted by Loptr | Quote: Originally posted by Brain&Force | There is a CAPTCHA on the registration page and flood protection is enabled.
We do need a thread report button, however. |
What about a CAPTCHA on the login page? If they are going to login, force them to prove they are human. Maybe even a question/answer based CAPTCHA?
EDIT: I used to be a moderator at GovernmentSecurity.org, and administrator of a couple of private computer-security related forums. I have
been legit for a number of years now that I started using my abilities to make a nice income, but still just as capable and security-minded
[Edited on 7-1-2015 by Loptr]
[Edited on 7-1-2015 by Loptr]
[Edited on 7-1-2015 by Loptr] |
The spammers are human. This is the problem. |
They're human? I thought they were spambots. That is a lot of human effort for any sort of blackhat SEO.
Anyway, a Bayesian filter could still be put to use to screen the posts and file suspect ones away into a "junk" folder for further
review.
http://en.wikipedia.org/wiki/Naive_Bayes_spam_filtering
Why Bayesian filtering is the most effective anti-spam technology - achieving a 98%+ spam detection rate using a mathematical approach
http://www.gfi.com/whitepapers/why-bayesian-filtering.pdf
[Edited on 7-1-2015 by Loptr]
|
|
diddi
National Hazard
Posts: 723
Registered: 23-9-2014
Location: Victoria, Australia
Member Is Offline
Mood: Fluorescent
|
|
Quote: Originally posted by Loptr | Quote: Originally posted by unionised | Is it possible to have a "Mark as spam" button which deletes the post and the poster automatically if enough people mark it.
It might be prudent to weight the people reporting spam in accord with something like years of membership here or number of posts to stop someone
registering multiple accounts just to delete a valid post for "political" reasons.
So, for example, if at least three people who have at least 2000 posts between them report something as spam it's automatically sent to the bin and
the account blocked.
That way, any of us can help stem the tide, rather than only the mods, who are busy (possibly busy eating cheese nips, but that's not the point).
|
It could work something like this: if a user says something is spam, and it turns out to be spam, then their truth value increases. This truth value can be
affected by many things, including post count, length of membership, etc., but when the reverse happens, and a reported item is not spam, then
the truth value decreases, and so does their reliability in reporting spam. This could be used to determine whether a reported post bypasses the verification
queue moderated by admins/whoever and goes bye-bye instead. This would still allow abuse by any more egomaniacal upper echelon that we might have, who
could use it to silence certain subjects or people.
I am sure there are published algorithms that I could scrounge up, as well, since this is not a unique problem.
One of the topics I studied in college was machine learning, and in particular collaborative filtering, which is a group of algorithms that are used
by Amazon, eBay, etc., to push items to people because they consider the users similar. If one user likes something, and another user is similar to them,
then there is a likelihood they will also like that item.
[Edited on 7-1-2015 by Loptr]
I have been on this forum long enough, and not really contributed much so far, because I am more of a computer science nerd than a chemistry buff
(but I am trying to change that), and this is one way I CAN give back. Also, because seeing this much spam irritates me, and my real-life job is about
eliminating manual intervention and automating processes, in a certain, errm, "industry".
[Edited on 7-1-2015 by Loptr] |
All this is great... let's get it in place.
Can I have 1 truth point for supporting this thread?
|
|
Chemosynthesis
International Hazard
Posts: 1071
Registered: 26-9-2013
Member Is Offline
Mood: No Mood
|
|
Even though this is an international forum, it would be very easy to subject post strings to (hopefully not butchering this with non-compsci jargon)
homology sequences of known English/French/language of intellectual commerce. I will be reading up on Bayesian filtering to see how it can be applied to
homology searches later, so I am obviously deferential to others' expertise in bouncing around ideas.
It should be very easy to compare letter frequencies of known languages, maybe determine what percentage of words used appear in a robust dictionary,
etc. I have played around with this in a couple languages and am far from a computer scientist.
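For illustration, both checks fit in a short Python sketch. The letter frequencies are rounded, commonly quoted approximations for English, and the function names and the tiny word list used in the usage note are assumptions made for this example.

```python
import math
import re
from collections import Counter

# Approximate English letter frequencies in percent (rounded, for illustration).
ENGLISH_FREQ = {
    "e": 12.7, "t": 9.1, "a": 8.2, "o": 7.5, "i": 7.0, "n": 6.7, "s": 6.3,
    "h": 6.1, "r": 6.0, "d": 4.3, "l": 4.0, "c": 2.8, "u": 2.8, "m": 2.4,
    "w": 2.4, "f": 2.2, "g": 2.0, "y": 2.0, "p": 1.9, "b": 1.5, "v": 1.0,
    "k": 0.8, "j": 0.15, "x": 0.15, "q": 0.1, "z": 0.07,
}


def english_similarity(text):
    """Cosine similarity between the text's letter counts and English frequencies."""
    counts = Counter(ch for ch in text.lower() if ch in ENGLISH_FREQ)
    if not counts:
        return 0.0
    dot = sum(counts[ch] * ENGLISH_FREQ[ch] for ch in counts)
    text_norm = math.sqrt(sum(n * n for n in counts.values()))
    english_norm = math.sqrt(sum(f * f for f in ENGLISH_FREQ.values()))
    return dot / (text_norm * english_norm)


def dictionary_share(text, known_words):
    """Fraction of the words in the text that appear in a given word list."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    return sum(word in known_words for word in words) / len(words)
```

For example, `dictionary_share("the acid was added slowly", {"the", "acid", "was", "added"})` gives 0.8. Both functions make a single pass over the post, so the cost grows only with the length of the text.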
What I am curious about is how computationally intensive any of these post screening techniques are, since they would seem not to inconvenience the
poster.
Also, if a trust/truth system is implemented, I hope it is secret for two reasons: 1) security through obscurity in preventing easy gaming of the
system and 2) just to avoid the problems of grading previous postings' veracities, and whatever social bickering/appeal to poster authority that may
entail. Trust could be a simple Boolean of X posts/time. It seems we already have some version of this concept implemented in that reported posters,
fast posters, etc. can have all of their posts completely deleted from the forum automatically in their first 30 days or so of membership.
[Edited on 8-1-2015 by Chemosynthesis]
|
|
Loptr
International Hazard
Posts: 1348
Registered: 20-5-2014
Location: USA
Member Is Offline
Mood: Grateful
|
|
The Bayesian filtering would act just like the junk folder in your email client. It will get things wrong, and that is why you don't let it decide to
delete things. Instead, it would mark a post for further review before it is presented to the members of the forum. We also need to have this queue push
suspect posts to the reviewers, so that posts don't get forgotten about and left unreviewed. This would keep the reviewers from abusing a member by
skipping the review of his post.
I don't know if I am actually in favor of members, regardless of their trust level, being able to abolish a post just because. I feel this is a duty
of the staff, and that it should be made as easy as possible for them to review, decide, and move on. I threw the trust idea out there as the result
of some work I did on collective/collaborative filtering and quorum protocols, where an agent's decisions are graded on their acceptance by its peers.
I think the first step would be to put the focus on Bayesian techniques, and just make sure to keep the spam signatures up to date as the spam
evolves. This should solve some of the spam problems we have here. I am almost positive there is an existing technology that could be bent a little to
work in this situation, unlike the trust system, which would take a bit of time just to get the design right and proofed out.
As for the computational complexity, it can vary greatly, but I like to hit the lowest hanging fruit first before even thinking about making it
complicated. I am not afraid of complexity, unlike some people I know, but actually embrace it; however, only when it is justified.
CAVEAT:
I do have this to say, though. If you post a reply in a thread, and it gets flagged as spam, it's likely you will get very irritated as it might be a
little while before it shows up. It will DEFINITELY have its pros and cons. Also, I have never tried this with the type of communication that
occurs on a forum, so there is the possibility it might not work as well as it does in other applications.
I am just presenting an option that is available if it's something that people want.
[Edited on 8-1-2015 by Loptr]
|
|
Polverone
Now celebrating 21 years of madness
Posts: 3186
Registered: 19-5-2002
Location: The Sunny Pacific Northwest
Member Is Offline
Mood: Waiting for spring
|
|
Hi Loptr,
Sorry I haven't kept up with this thread lately. I appreciate your offer. I would be glad if you could take the time to add a CAPTCHA to registration,
and to add convenient spam reporting buttons where appropriate. You can fork my github repository of the XMB forum here, and then I can reintegrate
your changes if you submit pull requests that I can review: https://github.com/mattbernst/xmbforum
I have already implemented Bayesian filtering for posts and profiles, trained on hand-selected samples of link spam submitted here, and implemented on
top of scikit-learn. The spam that you see linger for an extended period is the stuff that survives the automated spam filter. My filtering is
conservative because unlike with email that you can retrieve from the junk folder, there is no going back with database deletions. At least I haven't
added the additional functionality that I would need for "undo." The volume of content is too high to perform manual human review on every flagged
post.
One other thing that could be useful if you could locate it: reliable language detection via open source software. If someone shows up here and
initially posts in Chinese or Japanese it is always spam. I tried implementing language detection using NLTK, but NLTK's algorithms seem to only
properly deal with phonetic languages. It misidentified a number of sample Chinese spam posts as various European languages.
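For the narrow case of spotting Chinese or Japanese first posts, full language detection may not be needed: checking which Unicode script the letters belong to could be enough. The sketch below uses only Python's standard `unicodedata` module; the function names and the 0.5 threshold are assumptions, and this is not a replacement for a proper language detector.

```python
import unicodedata

# Unicode character names for Chinese ideographs and Japanese kana start with these.
CJK_NAME_PREFIXES = ("CJK UNIFIED IDEOGRAPH", "HIRAGANA", "KATAKANA")


def cjk_share(text):
    """Fraction of the letters in text that are CJK ideographs or kana."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    cjk = sum(unicodedata.name(ch, "").startswith(CJK_NAME_PREFIXES) for ch in letters)
    return cjk / len(letters)


def first_post_looks_cjk(text, threshold=0.5):
    """Flag a new member's first post when most of its letters are Chinese or Japanese."""
    return cjk_share(text) >= threshold
```

Since it looks at script rather than letter statistics, it cannot confuse Chinese with a European language, though it also cannot tell Chinese from Japanese kanji.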
[Edited on 1-10-2015 by Polverone]
PGP Key and corresponding e-mail address
|
|
|