Sciencemadness Discussion Board - response to the spams


Sciencemadness Discussion Board » Fundamentals » Miscellaneous » response to the spams

DubaiAmateurRocketry

National Hazard

Posts: 841
Registered: 10-5-2013
Location: LA, CA, USA

Mood: In research

posted on 7-1-2015 at 07:14

response to the spams

Sometimes I log in, see some new posts, get happy inside at the chance to learn something, and then figure out it's some random advertisement. Hell, why not make a rule that new members aren't allowed to post anything for their first 24 hours or so? Why are they even advertising on a chemistry forum?
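A minimal sketch of the 24-hour rule, assuming the forum stores each member's registration time as a timezone-aware timestamp. The constant `NEW_MEMBER_COOLDOWN` and the function `can_post` are hypothetical names, not part of the forum software.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical waiting period before a newly registered account may post.
NEW_MEMBER_COOLDOWN = timedelta(hours=24)


def can_post(registered_at: datetime, now: Optional[datetime] = None) -> bool:
    """Return True once the account is at least NEW_MEMBER_COOLDOWN old."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - registered_at >= NEW_MEMBER_COOLDOWN
```

A fixed delay only slows spammers down; someone posting by hand can simply wait it out.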

Amos

International Hazard

Posts: 1406
Registered: 25-3-2014
Location: Yes

Mood: No

posted on 7-1-2015 at 07:22

This shouldn't be much of a problem, as the titles rarely pertain to anything that could be interpreted as chemistry, and the usernames are always strings of random characters. They do get deleted (eventually), so just learn the signs and avoid them.

Mailinmypocket

International Hazard

Posts: 1351
Registered: 12-5-2011

Mood: No Mood

posted on 7-1-2015 at 07:49

Sometimes I actually check out what the spam is and it always leads me to wonder: does it actually work on some people?

It's all so poorly written and formatted, not to mention the title, which is usually gibberish. If they go through the trouble of typing up all that shite, there must be some benefit to them on some of the forums, or wherever else they post their garbage. But how, and where? I could never imagine deciding to purchase something advertised like that...

Metacelsus

International Hazard

Posts: 2539
Registered: 26-12-2012
Location: Boston, MA

Mood: Double, double, toil and trouble

posted on 7-1-2015 at 07:58

It's search engine optimization, probably. Crawlers can fall for stuff that people don't.

Also, this thread should be merged with the existing one.

[Edited on 7-1-2015 by Cheddite Cheese]

As below, so above.

My blog: https://denovo.substack.com

Loptr

International Hazard

Posts: 1348
Registered: 20-5-2014
Location: USA

Mood: Grateful

posted on 7-1-2015 at 08:00

Do we need a software engineer to devote some time to putting together a solution for this?

I am one such guy, and would be willing.

Here is what I would do.

1) Add a CAPTCHA to the registration page to keep bots from being able to automatically register with this known forum software.

2) Add a CAPTCHA to thread and reply posting. It need not be required every time; instead, it could be triggered by something like a time interval or the frequency of posts.

3) I would also add a report button to the thread listing pages, such as "Today's Posts", so that obvious spam can be reported without having to go into the thread, click the report button, and then enter an explanation. You would just click "report", and the post would be added to the existing queue with a template message, such as "Possible spam reported by {user} at {date/time}; posted by {user}", or my personal favorite, "Spammy Spam".
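Items 2 and 3 can be sketched as two small functions, assuming the forum records when a member last solved a CAPTCHA and the timestamps of their recent posts. All names and thresholds here are hypothetical and would need tuning against real posting patterns.

```python
from datetime import datetime, timedelta
from typing import List

# Hypothetical thresholds.
CAPTCHA_INTERVAL = timedelta(hours=12)   # re-challenge once the last solve is this old
POST_WINDOW = timedelta(minutes=10)      # look-back window for posting frequency
POSTS_BEFORE_CHALLENGE = 5               # challenge once this many posts fall in the window


def captcha_required(last_solved: datetime, recent_posts: List[datetime],
                     now: datetime) -> bool:
    """Require a CAPTCHA if the last one is stale or the member is posting fast."""
    if now - last_solved > CAPTCHA_INTERVAL:
        return True
    in_window = [t for t in recent_posts if now - t <= POST_WINDOW]
    return len(in_window) >= POSTS_BEFORE_CHALLENGE


def spam_report_message(reporter: str, poster: str, when: datetime) -> str:
    """Fill in the one-click report template from item 3."""
    stamp = when.strftime("%Y-%m-%d %H:%M")
    return f"Possible spam reported by {reporter} at {stamp}; posted by {poster}"
```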

Unless this forum's PHP source code is obfuscated or relies on some weird compiled CGI executables, this should be as easy as stealing candy from a baby (but now that I am a parent, if you steal candy from my baby, you will get hurt).

Do I have a taker?

Brain&Force

Hazard to Lanthanides

Posts: 1302
Registered: 13-11-2013
Location: UW-Madison

Mood: Incommensurately modulated

posted on 7-1-2015 at 08:07

There is a CAPTCHA on the registration page and flood protection is enabled.

We do need a thread report button, however.

At the end of the day, simulating atoms doesn't beat working with the real things...

Loptr

International Hazard

Posts: 1348
Registered: 20-5-2014
Location: USA

Mood: Grateful

posted on 7-1-2015 at 08:08

There is also the option of using a Bayesian filter to score how closely a post resembles a known sample of spam, and then push suspect posts onto a queue for an administrator to make the final call. This is how the Spam folder in your email account works.
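A minimal sketch of the idea, assuming a small hand-labelled training set. This is a textbook multinomial naive Bayes classifier written against the Python standard library, not the forum's actual filter; the class, `screen_post`, and the 0.9 threshold are hypothetical. Posts scoring at or above the threshold are held for review rather than deleted.

```python
import math
import re
from collections import Counter
from typing import Dict, Iterable, List

REVIEW_THRESHOLD = 0.9  # hypothetical: scores at or above this are held for a moderator


def tokenize(text: str) -> List[str]:
    """Lower-case word tokens; a real filter would also look at links, etc."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesSpamFilter:
    """Multinomial naive Bayes with add-one (Laplace) smoothing.

    Both classes must be trained before scoring.
    """

    def __init__(self) -> None:
        self.word_counts: Dict[str, Counter] = {"spam": Counter(), "ham": Counter()}
        self.doc_counts: Dict[str, int] = {"spam": 0, "ham": 0}

    def train(self, texts: Iterable[str], label: str) -> None:
        for text in texts:
            self.word_counts[label].update(tokenize(text))
            self.doc_counts[label] += 1

    def spam_probability(self, text: str) -> float:
        vocab_size = len(set(self.word_counts["spam"]) | set(self.word_counts["ham"]))
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            total_words = sum(self.word_counts[label].values())
            # log P(label) + sum over tokens of log P(word | label)
            score = math.log(self.doc_counts[label] / total_docs)
            for word in tokenize(text):
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + vocab_size))
            log_scores[label] = score
        # Normalise the two log scores into P(spam | text) without overflow.
        top = max(log_scores.values())
        spam = math.exp(log_scores["spam"] - top)
        ham = math.exp(log_scores["ham"] - top)
        return spam / (spam + ham)


def screen_post(spam_filter: NaiveBayesSpamFilter, text: str,
                review_queue: List[str]) -> str:
    """Hold suspect posts for human review instead of deleting them."""
    if spam_filter.spam_probability(text) >= REVIEW_THRESHOLD:
        review_queue.append(text)
        return "held"
    return "published"
```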

[Edited on 7-1-2015 by Loptr]

Loptr

International Hazard

Posts: 1348
Registered: 20-5-2014
Location: USA

Mood: Grateful

posted on 7-1-2015 at 08:12

Quote: Originally posted by Brain&Force

There is a CAPTCHA on the registration page and flood protection is enabled.

We do need a thread report button, however.

What about a CAPTCHA on the login page? If they are going to log in, force them to prove they are human. Maybe even a question/answer-based CAPTCHA?
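A question/answer CAPTCHA can be sketched in a few lines. The question bank, function names, and accepted answers below are hypothetical examples, not anything the forum ships with. Like any CAPTCHA, it screens out bots, not people posting by hand.

```python
import secrets
from typing import Dict, Tuple

# Hypothetical question bank; a real deployment would want many more entries.
QUESTIONS: Dict[str, Tuple[str, ...]] = {
    "What is the chemical symbol for sodium?": ("na",),
    "How many hydrogen atoms are in one water molecule?": ("2", "two"),
    "Which gas makes up most of Earth's atmosphere?": ("nitrogen", "n2"),
}


def pick_question() -> str:
    """Choose a question at random to show on the login form."""
    return secrets.choice(list(QUESTIONS))


def check_answer(question: str, answer: str) -> bool:
    """Accept any listed answer, ignoring case and surrounding whitespace."""
    return answer.strip().lower() in QUESTIONS.get(question, ())
```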

EDIT: I used to be a moderator at GovernmentSecurity.org, and administrator of a couple of private computer-security-related forums. I have been legit for a number of years now, ever since I started using my abilities to make a nice income, but I am still just as capable and security-minded.

[Edited on 7-1-2015 by Loptr]


unionised

International Hazard

Posts: 5126
Registered: 1-11-2003
Location: UK

Mood: No Mood

posted on 7-1-2015 at 08:13

Is it possible to have a "Mark as spam" button which deletes the post and the poster automatically if enough people mark it?
It might be prudent to weight the people reporting spam according to something like years of membership here or number of posts, to stop someone from registering multiple accounts just to delete a valid post for "political" reasons.
So, for example, if at least three people who have at least 2000 posts between them report something as spam, it's automatically sent to the bin and the account blocked.
That way, any of us can help stem the tide, rather than only the mods, who are busy (possibly busy eating cheese nips, but that's not the point).
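The example rule above can be sketched as follows. The names are hypothetical, and counting repeat reports from the same account only once is an assumption not spelled out in the post.

```python
from dataclasses import dataclass
from typing import List

MIN_REPORTERS = 3          # at least three distinct people...
MIN_COMBINED_POSTS = 2000  # ...with at least 2000 posts between them


@dataclass
class Reporter:
    name: str
    post_count: int


def auto_remove(reporters: List[Reporter]) -> bool:
    """Apply the example rule, counting each reporting account only once."""
    distinct = {r.name: r for r in reporters}.values()
    combined = sum(r.post_count for r in distinct)
    return len(distinct) >= MIN_REPORTERS and combined >= MIN_COMBINED_POSTS
```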

Loptr

International Hazard

Posts: 1348
Registered: 20-5-2014
Location: USA
Member Is Offline

Mood: Grateful

posted on 7-1-2015 at 08:25

Quote: Originally posted by unionised

It could work something like this: if a user says something is spam, and it turns out to be spam, then their truth value increases. This truth value can be affected by many things, including post count, length of membership, etc., but when the reverse happens, and a reported item turns out not to be spam, the truth value decreases, and so does their reliability in reporting spam. The truth value could then determine whether a reported post bypasses the verification queue moderated by admins/whoever and goes bye-bye immediately. This would still allow abuse by any more egomaniacal upper echelon we might have, who could use it to silence certain subjects or people.
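One way the truth value could be tracked is sketched below. The weighting (a capped head start from post count and membership length, plus a running track record) and every constant are hypothetical choices made for illustration; making the penalty larger than the reward is an assumed design choice so that careless reporting costs more than accurate reporting earns.

```python
from dataclasses import dataclass

# Hypothetical tuning constants.
REWARD = 1.0         # added when a report is confirmed as spam
PENALTY = 2.0        # subtracted when a reported item turns out not to be spam
BYPASS_SCORE = 10.0  # reports from members at or above this skip the review queue


@dataclass
class ReporterTrust:
    post_count: int
    years_member: float
    track_record: float = 0.0  # confirmed reports minus weighted rejected ones

    def truth_value(self) -> float:
        # Seniority gives a capped head start; the track record dominates over time.
        return min(self.post_count / 500, 4.0) + min(self.years_member, 4.0) + self.track_record

    def record_outcome(self, was_spam: bool) -> None:
        self.track_record += REWARD if was_spam else -PENALTY

    def bypasses_review(self) -> bool:
        return self.truth_value() >= BYPASS_SCORE
```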

I am sure there are published algorithms that I could scrounge up, as well, since this is not a unique problem.

One of the topics I studied in college was machine learning, in particular collaborative filtering, a group of algorithms used by Amazon, eBay, etc. to push items to people whom they consider similar to each other. If one user likes something, and another user is similar to them, then there is a likelihood the second user will also like that item.
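For readers unfamiliar with collaborative filtering, a minimal user-based version looks like this. It illustrates the general technique only, not how Amazon or eBay actually do it; the data layout and function names are hypothetical.

```python
import math
from typing import Dict, Set

# Hypothetical data layout: the set of items each user has liked.
Likes = Dict[str, Set[str]]


def cosine_similarity(a: Set[str], b: Set[str]) -> float:
    """Cosine similarity between two users' binary like-vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def recommend(user: str, likes: Likes) -> Dict[str, float]:
    """Score items the user hasn't liked by summing the similarity of users who did."""
    scores: Dict[str, float] = {}
    for other, items in likes.items():
        if other == user:
            continue
        sim = cosine_similarity(likes[user], items)
        for item in items - likes[user]:
            scores[item] = scores.get(item, 0.0) + sim
    return scores
```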

[Edited on 7-1-2015 by Loptr]

I have been on this forum for a while without really contributing much, because I am more of a computer science nerd than a chemistry buff (but I am trying to change that), and this is one way I CAN give back. Also, seeing this much spam irritates me, and my real-life job is about eliminating manual intervention and automating processes, in a certain, errm, "industry".

[Edited on 7-1-2015 by Loptr]

mayko

International Hazard

Posts: 1218
Registered: 17-1-2013
Location: Carrboro, NC

Mood: anomalous (Euclid class)

posted on 7-1-2015 at 08:55

An additional strategy might target the gibberish nature of the usernames - they'd make good passwords, so perhaps one could use password-quality code to screen and reject suspicious usernames?
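A rough sketch of the username screen, borrowing the Shannon-entropy measure used by simple password-strength meters. The extra checks (digits mixed into letters, long consonant runs) and all thresholds are hypothetical guesses; a flagged name would be better sent for review than rejected outright, since some legitimate handles look random.

```python
import math
import re
from collections import Counter


def shannon_entropy(s: str) -> float:
    """Bits per character of the string's own character distribution."""
    counts = Counter(s)
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def looks_random(username: str) -> bool:
    """Flag names that are long, high-entropy, and mix digits or pile up consonants."""
    name = username.lower()
    if len(name) < 8:
        return False
    mixes_digits = bool(re.search(r"[a-z]\d|\d[a-z]", name))
    long_consonant_run = bool(re.search(r"[bcdfghjklmnpqrstvwxz]{5,}", name))
    return shannon_entropy(name) > 3.0 and (mixes_digits or long_consonant_run)
```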

Quote: Originally posted by Mailinmypocket

Sometimes I actually check out what the spam is and it always leads me to wonder: does it actually work on some people?

Some scammers actually use that to their advantage: sending emails claiming to be Nigerian royalty is cheap; the expensive step is filtering the responses. By making their initial volley so terrible, they enrich their responses with the ultra-gullible:

Quote:

By sending an email that repels all but the most gullible the scammer gets the most promising marks to self-select, and tilts the true to false positive ratio in his favor.

(source)

Sometimes I try to get scammers to sign petitions, to argue with one another, etc. to make them work for their ill-gotten gains. This can backfire, however, as Ray Smuckles once learned.

al-khemie is not a terrorist organization
"Chemicals, chemicals... I need chemicals!" - George Hayduke
"Wubbalubba dub-dub!" - Rick Sanchez

Brain&Force

Hazard to Lanthanides

Posts: 1302
Registered: 13-11-2013
Location: UW-Madison

Mood: Incommensurately modulated

posted on 7-1-2015 at 09:03

Quote: Originally posted by Loptr

Quote: Originally posted by Brain&Force

There is a CAPTCHA on the registration page and flood protection is enabled.

We do need a thread report button, however.

The spammers are human. This is the problem.

At the end of the day, simulating atoms doesn't beat working with the real things...

Loptr

International Hazard

Posts: 1348
Registered: 20-5-2014
Location: USA

Mood: Grateful

posted on 7-1-2015 at 11:55

Quote: Originally posted by Brain&Force

Quote: Originally posted by Loptr

Quote: Originally posted by Brain&Force

There is a CAPTCHA on the registration page and flood protection is enabled.

We do need a thread report button, however.

The spammers are human. This is the problem.

They're human? I thought they were spambots. It would take a lot of human effort to accomplish any sort of blackhat SEO that way.

Anyway, a Bayesian filter could still be used to screen the posts and file suspect ones away in a "junk" folder for further review.

http://en.wikipedia.org/wiki/Naive_Bayes_spam_filtering

Why Bayesian filtering is the most effective anti-spam technology - achieving a 98%+ spam detection rate using a mathematical approach
http://www.gfi.com/whitepapers/why-bayesian-filtering.pdf

[Edited on 7-1-2015 by Loptr]

diddi

National Hazard

Posts: 723
Registered: 23-9-2014
Location: Victoria, Australia

Mood: Fluorescent

posted on 7-1-2015 at 13:33

Quote: Originally posted by Loptr

Quote: Originally posted by unionised

All this is great... let's get it in place.
Can I have 1 truth point for supporting this thread?

Chemosynthesis

International Hazard

Posts: 1071
Registered: 26-9-2013

Mood: No Mood

posted on 8-1-2015 at 04:03

Even though this is an international forum, it would be very easy to compare post strings against (hopefully I'm not butchering this with non-compsci jargon) homology sequences of known English/French/other languages of intellectual commerce. I will be reading up on Bayesian filtering later to see how it can be applied to homology searches, so I am obviously deferential to others' expertise in bouncing around ideas.
It should be very easy to compare the letter frequencies of known languages, or to determine what percentage of the words used appear in a robust dictionary, etc. I have played around with this in a couple of languages, and I am far from a computer scientist.
What I am curious about is how computationally intensive any of these post-screening techniques would be, since they would seem not to inconvenience the poster.
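The dictionary-percentage check is cheap: each token is a hash-set lookup, so the cost grows linearly with the length of the post. A minimal sketch follows; the tiny word list is a hypothetical stand-in for a real dictionary file, and any cut-off for "too few real words" would have to be chosen from real data.

```python
import re
from typing import Set

# Hypothetical stand-in for a real word list loaded from a dictionary file.
DICTIONARY: Set[str] = {
    "the", "of", "and", "to", "a", "in", "is", "acid", "sodium", "chloride",
    "reaction", "with", "water", "i", "added", "slowly", "solution",
}


def dictionary_coverage(text: str, words: Set[str] = DICTIONARY) -> float:
    """Fraction of alphabetic tokens found in the word list (0.0 for empty text)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    return sum(token in words for token in tokens) / len(tokens)
```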

Also, if a trust/truth system is implemented, I hope it is kept secret, for two reasons: 1) security through obscurity, to prevent easy gaming of the system, and 2) to avoid the problems of grading previous postings' veracity, and whatever social bickering/appeals to poster authority that may entail. Trust could be a simple Boolean of X posts/time. It seems we already have some version of this concept implemented, in that reported posters, fast posters, etc. can have all of their posts completely deleted from the forum automatically in their first 30 days or so of membership.
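Both ideas fit in a few lines. In this sketch the post and age thresholds for trust are hypothetical values (kept server-side, in keeping with the secrecy point), and the 30-day purge window follows the post's rough description of the existing behaviour rather than the forum's actual code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical thresholds for "X posts / time".
TRUSTED_MIN_POSTS = 50
TRUSTED_MIN_AGE = timedelta(days=30)
PURGE_WINDOW = timedelta(days=30)  # "first 30 days or so" of membership


@dataclass
class Member:
    registered_at: datetime
    post_count: int
    reported: bool = False
    flood_flagged: bool = False  # e.g. tripped the flood protection


def is_trusted(m: Member, now: datetime) -> bool:
    """A simple Boolean: enough posts AND an old enough account."""
    return m.post_count >= TRUSTED_MIN_POSTS and now - m.registered_at >= TRUSTED_MIN_AGE


def should_purge(m: Member, now: datetime) -> bool:
    """New members who are reported or post too fast lose all their posts."""
    return now - m.registered_at < PURGE_WINDOW and (m.reported or m.flood_flagged)
```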

[Edited on 8-1-2015 by Chemosynthesis]

Loptr

International Hazard

Posts: 1348
Registered: 20-5-2014
Location: USA

Mood: Grateful

posted on 8-1-2015 at 07:40

The Bayesian filtering would act just like the junk folder in your email client. It will get things wrong, and that is why you don't let it decide to delete things. Instead, it would mark a post for further review before it is presented to the members of the forum. We would also need this queue to push suspect posts to the reviewers, so that posts don't get forgotten and go unreviewed. This would keep the reviewers from abusing a member by skipping the review of his post.
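A push-style review queue can be sketched as round-robin assignment plus first-in, first-out review, so a reviewer always gets the oldest outstanding post and cannot quietly skip one. The class and method names are hypothetical.

```python
from collections import deque
from itertools import cycle
from typing import Deque, Dict, Iterator, List


class ReviewQueue:
    """Hold flagged posts and assign each one to a reviewer in turn."""

    def __init__(self, reviewers: List[str]) -> None:
        if not reviewers:
            raise ValueError("need at least one reviewer")
        self._next_reviewer: Iterator[str] = cycle(reviewers)
        self.assignments: Dict[str, Deque[int]] = {r: deque() for r in reviewers}

    def flag(self, post_id: int) -> str:
        """Push the flagged post to the next reviewer and return their name."""
        reviewer = next(self._next_reviewer)
        self.assignments[reviewer].append(post_id)
        return reviewer

    def next_for(self, reviewer: str) -> int:
        """Oldest outstanding post for this reviewer (raises IndexError if none)."""
        return self.assignments[reviewer].popleft()
```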

I don't know if I am actually in favor of members, regardless of their trust level, being able to abolish a post just because. I feel this is a duty of the staff, and that it should be made as easy as possible for them to review, decide, and move on. I threw the trust idea out there as a result of some work I did on collective/collaborative filtering and quorum protocols, where an agent's decisions are graded by how well its peers accept them.

I think the first step would be to focus on Bayesian techniques, and just make sure to keep the spam signatures up to date as the spam evolves. This should solve some of the spam problems we have here. I am almost positive there is an existing technology that could be bent a little to work in this situation, unlike the trust system, which would take a while just to get the design right and proven out.

As for the computational complexity, it can vary greatly, but I like to pick the lowest-hanging fruit first before even thinking about making it complicated. I am not afraid of complexity, unlike some people I know; I actually embrace it, but only when it is justified.

CAVEAT:

I do have this to say, though. If you post a reply in a thread and it gets flagged as spam, you will likely get very irritated, since it might be a little while before it shows up. It will DEFINITELY have its pros and cons. Also, I have never tried this with the type of communication that occurs on a forum, so it might not work as well as it does in other applications.

I am just presenting an option that is available, if it's something that people want.

[Edited on 8-1-2015 by Loptr]

Polverone

Now celebrating 21 years of madness

Posts: 3186
Registered: 19-5-2002
Location: The Sunny Pacific Northwest

Mood: Waiting for spring

posted on 10-1-2015 at 14:06

Hi Loptr,

Sorry I haven't kept up with this thread lately. I appreciate your offer. I would be glad if you could take the time to add a CAPTCHA to registration, and to add convenient spam reporting buttons where appropriate. You can fork my github repository of the XMB forum here, and then I can reintegrate your changes if you submit pull requests that I can review: https://github.com/mattbernst/xmbforum

I have already implemented Bayesian filtering for posts and profiles, trained on hand-selected samples of link spam submitted here, and implemented on top of scikit-learn. The spam that you see linger for an extended period is the stuff that survives the automated spam filter. My filtering is conservative because unlike with email that you can retrieve from the junk folder, there is no going back with database deletions. At least I haven't added the additional functionality that I would need for "undo." The volume of content is too high to perform manual human review on every flagged post.
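Since database deletions can't be undone, one common workaround is a soft delete: the filter hides a post instead of deleting the row, so a mistaken call can be reversed. The sketch below assumes a SQL table with a hypothetical `hidden` column (not XMB's actual schema) and a hypothetical 0.99 threshold; `spam_probability` stands in for whatever score the existing classifier produces.

```python
import sqlite3
from typing import List

# Hypothetical: only near-certain spam is hidden without a human looking at it.
AUTO_HIDE_THRESHOLD = 0.99


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical table layout used only for this illustration.
    conn.execute(
        "CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT, "
        "hidden INTEGER NOT NULL DEFAULT 0)"
    )


def auto_moderate(conn: sqlite3.Connection, post_id: int, spam_probability: float) -> bool:
    """Hide, rather than delete, a post the filter is very confident about."""
    if spam_probability < AUTO_HIDE_THRESHOLD:
        return False
    conn.execute("UPDATE posts SET hidden = 1 WHERE id = ?", (post_id,))
    return True


def undo_hide(conn: sqlite3.Connection, post_id: int) -> None:
    """Restore a post that was hidden by mistake; the row was never deleted."""
    conn.execute("UPDATE posts SET hidden = 0 WHERE id = ?", (post_id,))


def visible_post_ids(conn: sqlite3.Connection) -> List[int]:
    rows = conn.execute("SELECT id FROM posts WHERE hidden = 0 ORDER BY id")
    return [row[0] for row in rows]
```

Hidden rows can be purged for real later, once nobody has objected within some grace period.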

One other thing that could be useful, if you could locate it: reliable language detection via open source software. If someone shows up here and initially posts in Chinese or Japanese, it is always spam. I tried implementing language detection using NLTK, but NLTK's algorithms seem to deal properly only with phonetic languages. It misidentified a number of sample Chinese spam posts as various European languages.
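For the narrow case described here, full language detection may not be needed: checking what share of characters fall in the Unicode blocks for Chinese characters and Japanese kana is a script check, not language identification, but it cannot confuse Chinese text with a European language. A minimal sketch; the function names and the 0.3 threshold are hypothetical.

```python
def is_cjk(ch: str) -> bool:
    """True for Hiragana, Katakana, or CJK Unified Ideographs (incl. Extension A)."""
    cp = ord(ch)
    return (
        0x3040 <= cp <= 0x309F      # Hiragana
        or 0x30A0 <= cp <= 0x30FF   # Katakana
        or 0x3400 <= cp <= 0x4DBF   # CJK Unified Ideographs Extension A
        or 0x4E00 <= cp <= 0x9FFF   # CJK Unified Ideographs
    )


def cjk_fraction(text: str) -> float:
    """Share of non-whitespace characters that are Chinese/Japanese script."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(is_cjk(c) for c in chars) / len(chars)


def looks_cjk(text: str, threshold: float = 0.3) -> bool:
    return cjk_fraction(text) >= threshold
```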

[Edited on 1-10-2015 by Polverone]

PGP Key and corresponding e-mail address
