Ongoing Issues with the Spam Filter

104 posts
MegB

A continuation of the spam filter thread ...

Doug

It really is a pain. It might be okay if it would reliably accept the CAPTCHA solutions but it doesn't.

NDPP

I have repeatedly attempted to post, without success - rejected each time with a prompt telling me the previous CAPTCHA entry was incorrect. This is not working folks...

Unionist

[b]PLEASE TURN IT OFF[/b].

At least one babbler who is unable to get past CAPTCHA because of vision issues has PMed me. That's unacceptable.

TURN IT OFF and put it back when it's working properly. And put it on the registration page.

We all understand and sympathize with the mods' problem of wasting hours dealing with the spammers, but a solution which excludes babblers is no solution at all.

MegB

For people with vision issues there is an audio link for the CAPTCHA code.  Unfortunately it doesn't work any better than the alpha-numeric one.

Tech support has been notified of the issues, but will likely be unable to do anything until Monday.  In the meantime, please keep trying.

Boom Boom

I support the idea of a spam filter, but the best place for it would be the Registration Page.

As for me, I'm 61 with vision and hearing issues, and CAPTCHA is really difficult for me. I've been trying to post all day. I'm trying again and will keep trying, but I hope the techies put the spam filter only on the Registration Page.

ETA: finally got an easy CAPTCHA!

ETA: is there a reason why CAPTCHA letters and numbers are so difficult to read?

ETA: EVERY post I try to send ends up with this message attached: We are sorry, but the spam filter on this site decided that your submission could be spam. Please fill in the CAPTCHA below to get your submission accepted.

Aristotleded24

Let's see if this post makes it through....

Unionist

[On behalf of Aristotleded24:]

Aristotleded24 wrote:

NDPP wrote:
I have repeatedly attempted to post, without success - rejected each time with a prompt telling me the previous CAPTCHA entry was incorrect. This is not working folks...

I'm having the same problem. Several of us have been here for years; you should know which IP addresses are safe.

N.Beltov

test Czech Slovak Czech Slovak check check check ...

ETA: I'm not finding this spam protection particularly onerous. Annoying, yes.

Boom Boom

testing, one two three....

I fully support the use of a spam filter for rabble/babble, because moderators should not have to spend time deleting spam. However, I believe just using the spam filter on the Registration Page - not every babble thread - would solve the main problem of  spambots, because they would not be able to register.

I have problems with CAPTCHA because I am both hearing and vision impaired, by the way.

ETA: looks like the spam filter is turned off - didn't get a CAPTCHA message with this post. :)

6079_Smith_W

I just tried to post a link and it wouldn't even give me the option of using captcha. I had to chop it up so the hyperlink was broken.

MegB

Boom Boom wrote:

testing, one two three....

I fully support the use of a spam filter for rabble/babble, because moderators should not have to spend time deleting spam. However, I believe just using the spam filter on the Registration Page - not every babble thread - would solve the main problem of  spambots, because they would not be able to register.

I have problems with CAPTCHA because I am both hearing and vision impaired, by the way.

ETA: looks like the spam filter is turned off - didn't get a CAPTCHA message with this post. :)

The software is 'learning' ... but it's taking a while. I expect that in a few days the CAPTCHA thingy will be a rare occurrence.

Boom Boom

Thanks, RW and CF - sorry I've been such a pest on this issue. I look forward to the days of spam-free babble and Mods using their talents on other issues! :)

Son of a gun - this edited post triggered CAPTCHA, whereas the original post did not! :(

I wonder if it is the 'smiley faces' that trigger CAPTCHA???

We are sorry, but the spam filter on this site decided that your submission could be spam. Please fill in the CAPTCHA below to get your submission accepted.

Fidel

testing 123 ...

wage zombie

Boom Boom wrote:

ETA: is there a reason why CAPTCHA letters and numbers are so difficult to read?

Yes, it's by design. The idea is to try to ensure that the poster is human and not a bot.  So letters are often printed out slanted, with different fonts and thicknesses for each letter.  The bots that try to convert image to text try to figure out what letters are in the image.  Unfortunately, making it harder for bots to read means making it harder for humans to read.

KenS

It has gone beyond being hard.

I've never had a problem doing them anywhere else. But for about 24 hours I could not get through no matter how many times I tried.

My guess is that my access was made worse by my slow connection.

Something changed in the last couple hours. I still always get a Captcha, but I get through.

It just occurred to me that my low speed is less sluggish than normal, so maybe that's why I get through now.

MegB

KenS wrote:

It has gone beyond being hard.

I've never had a problem doing them anywhere else. But for about 24 hours I could not get through no matter how many times I tried.

My guess is that my access was made worse by my slow connection.

Something changed in the last couple hours. I still always get a Captcha, but I get through.

It just occurred to me that my low speed is less sluggish than normal, so maybe that's why I get through now.

I'm pretty sure it has more to do with the fact that you're attempting to post, logging out, and then trying again.  The software is learning that you aren't a spammer.  That seems to be the case with many babblers, but the connection speed is an interesting idea.  I wish I knew more about the actual software ... what version it is, how many versions past beta-testing it is, etc.

KenS

Greek to me.

When it wasn't working, I would try just a simple line every few hours. It never worked until a couple of hours ago, when my connection was faster, whether or not there is a causal relation.

KenS

Hey,

since everything on-line is slo-mo for me... I noticed while waiting for the squirrels to turn the cage that on the last CAPTCHA I had got the last letter wrong. Definitely. But it let me through anyway.

MegB

KenS wrote:

Hey,

since everything on-line is slo-mo for me... I noticed while waiting for the squirrels to turn the cage that on the last CAPTCHA I had got the last letter wrong. Definitely. But it let me through anyway.

That's bizarre.

KenS

CAPTCHA gone for me.

[Haven't tried anything 'weird' like a hyperlink.]

Boom Boom

One time today CAPTCHA would not let me through, and on my second try I did the exact same code as the first time, and got through.

Another time, just a few hours ago, I deliberately typed in the wrong CAPTCHA code, and got through the second time.

Weird, for sure.

Boom Boom

I'm getting CAPTCH'd whenever I go back to edit a post, which strikes me as unfair, because it's basically the same post that went through before with just some changes.

Unionist

KenS wrote:

Hey,

since everything on-line is slo-mo for me... I noticed while waiting for the squirrels to turn the cage that on the last CAPTCHA I had got the last letter wrong. Definitely. But it let me through anyway.

Good filter! It allows for human error! Error-free typing can only be done by a rrobot.

Bubbles

This spam filter seems to have it backwards. Why is this anti-spam robot being educated by humans and not by spam robots? Does it not make more sense for it to learn what robotic spam is, as opposed to human babblings? It would be a lot quicker, since robots can generate spam a lot faster than humans can babble. Also, humans are far more complex than robots.

Slumberjack

A few days of teething pains and a little patience will be worth it if they're able to come up with something that will benefit everyone here, including the beleaguered mods.

M. Spector

Could someone please explain how anti-spam robots are supposed to "learn" what is spam and what isn't, through some sort of trial and error process?

So far the anti-spam robots have failed to "learn" a single thing, as far as I can see. Why aren't they [b]programmed[/b] by humans to know some basic truths about spam? Like the fact that anyone who is logged into a user account that has previously passed not one, not two, but a dozen Captcha tests is unlikely to be a spambot? 

If you were going to "teach" a spam-fighting robot certain basic truths, wouldn't it be pretty close to the top of your list on the "curriculum" to teach it how to recognize posters who have already proven their credentials as non-spammers?

And hasn't anti-spam technology already progressed far beyond the obvious limitations of these robot drone gate-keepers? My ISP manages to stop 99% of the spam that comes to my email inbox. How hard can it be?

It's not the robots that need teaching, it's the people who program them and set them to do tasks that they are clearly unsuited for.

ETA: One more question, if the robots will permit: What's with this "Site Upgrade"? Is it just a euphemism for a robot-controlled stop-and-frisk program?

Catchfire

Thanks everyone for the feedback about the spam filter. Popular, it ain't. I find it annoying as well, and its inner workings seem arcane to me. However, the spam that was driving Rebecca and me towards early retirement, madness and death has almost completely stopped.

The filter learns by flagging: every time Rebecca or I flag a spam post, the filter "learns" what spam looks like here. One of the problems may arise because many of our spammers are actually humans, and so their posts resemble ours. The deterrent of the CAPTCHA may make their work less profitable, which is why spam is down, as opposed to the filter blocking their posts. That's all fine, but if it is also deterring babblers, that's obviously not ideal (although it depends who you are...). Find out more about how it works here.

We don't have the capacity to ban or otherwise flag IP addresses, so unfortunately, longtime babblers will continue to be treated with suspicion by our border patrol.

Finally, while we would all like our non-profit website and open-source software to immediately let in all the good posts and filter out all the spam with 100% accuracy, that's simply not feasible. The filter will do some growing, and if it becomes ultimately unworkable, we'll turn it off. I don't personally find it particularly onerous (although I suspect this post will be flagged ETA: Yep!), and I've already noticed an improvement. So again: please bear with us.
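The flag-driven learning Catchfire describes can be illustrated with a toy example. How Mollom actually scores posts is not described anywhere in this thread, so the sketch below swaps in a plain naive Bayes word classifier as a stand-in; the class and method names (`FlagTrainedFilter`, `flag`, `spam_probability`) are hypothetical. It also hints at why human-written spam is hard to catch: if flagged spam uses the same words as ordinary posts, the two word distributions look alike and the score drifts towards 50/50.

```python
import math
import re
from collections import Counter


class FlagTrainedFilter:
    """Toy word-count classifier that 'learns' only from moderator flags.

    Hypothetical illustration: naive Bayes stands in for whatever model the
    real filter uses, which is not documented here.
    """

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.post_counts = {"spam": 0, "ham": 0}

    @staticmethod
    def tokens(text):
        # Lower-case words and numbers; punctuation is ignored.
        return re.findall(r"[a-z0-9']+", text.lower())

    def flag(self, text, label):
        """Record a moderator decision: label is 'spam' or 'ham' (not spam)."""
        self.word_counts[label].update(self.tokens(text))
        self.post_counts[label] += 1

    def spam_probability(self, text):
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_posts = sum(self.post_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            # Add-one (Laplace) smoothing keeps unseen words from zeroing a class.
            prior = (self.post_counts[label] + 1) / (total_posts + 2)
            denom = sum(self.word_counts[label].values()) + len(vocab) + 1
            score = math.log(prior)
            for word in self.tokens(text):
                score += math.log((self.word_counts[label][word] + 1) / denom)
            log_scores[label] = score
        # Normalise the two log scores into a probability for the spam class.
        top = max(log_scores.values())
        weights = {k: math.exp(v - top) for k, v in log_scores.items()}
        return weights["spam"] / (weights["spam"] + weights["ham"])


if __name__ == "__main__":
    f = FlagTrainedFilter()
    f.flag("cheap watches buy now discount", "spam")
    f.flag("buy cheap pills discount now", "spam")
    f.flag("the NDP caucus vote on the budget", "ham")
    f.flag("thread about the spam filter and captcha", "ham")
    print(f.spam_probability("cheap discount pills"))
    print(f.spam_probability("budget vote in caucus"))
```

With no flags at all, every post scores exactly 0.5: the filter has nothing to go on until moderators start labelling posts.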

M. Spector

Catchfire wrote:

We don't have the capacity to ban or otherwise flag IP addresses, so unfortunately, longtime babblers will continue to be treated with suspicion by our border patrol.

Nonsense. IP addresses don't enter into it. It doesn't matter whether I'm accessing babble from my home, office, school, wi-fi café, or helicopter. All the robot needs to know is that I'm logged in as "M. Spector" and therefore not a spambot. Once "M. Spector" has passed one Captcha test, that account should [b]automatically[/b] be put on the robot's safe list.

Now you're going to tell me there is no safe list? 

Catchfire

M. Spector, it's not "nonsense." It's about the capacity of the software. Currently, there is no "safe list" of which I am aware. Unless you have a background in software or web development, such accusations will likely continue to have little bearing on reality.

wage zombie

M. Spector wrote:

Could someone please explain how anti-spam robots are supposed to "learn" what is spam and what isn't, through some sort of trial and error process?

There are many different ways that anti-spam tools can learn what is spam.  I can't tell if you actually want to hear about that or if it's just a rhetorical question.

Quote:

So far the anti-spam robots have failed to "learn" a single thing, as far as I can see. Why aren't they [b]programmed[/b] by humans to know some basic truths about spam? Like the fact that anyone who is logged into a user account that has previously passed not one, not two, but a dozen Captcha tests is unlikely to be a spambot? 

It depends how the system is designed.  A system that gave free reign provided that exactly one dozen captchas were entered would be a pretty easy system to game.  Additionally, bots that try to "read" captcha images might do better with some captchas than others.

Quote:

If you were going to "teach" a spam-fighting robot certain basic truths, wouldn't it be pretty close to the top of your list on the "curriculum" to teach it how to recognize posters who have already proven their credentials as non-spammers?

Sure, that would be a great strategy. I suspect babble is not currently built to handle such a solution. If there were a user role within Drupal for "trusted babblers", then it would probably be easy enough to turn the filter off for users with that role. I doubt there is such a role. Could they add such a role? Sure, but then it becomes a much more complex problem. What other benefits come with the role? How do babblers become "trusted"? Would it be a manual process, with babblers added one by one by mods, or some automatic process based on number of posts? If it were automatic, would they need to allow some people to be added manually? And so on. All of these things are possible to do, but they go beyond setting up a spam filter, which was probably urgently needed.

Quote:

And hasn't anti-spam technology already progressed far beyond the obvious limitations of these robot drone gate-keepers? My ISP manages to stop 99% of the spam that comes to my email inbox. How hard can it be?

Apples to oranges, not really a comparison that can be made.

I would agree that there are much better ways to deal with spam. Here's one of the keynotes from this year's DrupalCon, where Clay Shirky talks about better ways of handling spam: http://www.archive.org/details/drupalconchi_day2_keynote_clay_shirky (The first 12 minutes are about the conference; the talk itself is about 40 minutes.)

Quote:

It's not the robots that need teaching, it's the people who program them and set them to do tasks that they are clearly unsuited for.

Still winning hearts and minds of workers everywhere eh Spector?

Quote:

ETA: One more question, if the robots will permit: What's with this "Site Upgrade"? Is it just a euphemism for a robot-controlled stop-and-frisk program?

Yeah, babble is a police state, and we are having our rights taken away right now by unreasonable search and seizure. What's really disgusting about all of this is that the NDP has yet to put out a statement on this oppression.
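The "trusted babbler" idea wage zombie outlines (a user role that skips the CAPTCHA, granted either manually by mods or automatically past some post count) could look roughly like the decision function below. This is a hypothetical sketch, not Drupal or Mollom code: `User`, `TRUSTED_ROLE`, `is_trusted`, `needs_captcha` and the 100-post threshold (the arbitrary figure floated in this thread) are all assumptions.

```python
from dataclasses import dataclass, field

# Hypothetical settings; the site as described has no such role or threshold.
TRUSTED_ROLE = "trusted babbler"
TRUSTED_POST_THRESHOLD = 100  # arbitrary cutoff


@dataclass
class User:
    name: str
    post_count: int = 0
    roles: set = field(default_factory=set)


def is_trusted(user):
    """Manual route (mods add the role) or automatic route (post count)."""
    if TRUSTED_ROLE in user.roles:
        return True
    return user.post_count >= TRUSTED_POST_THRESHOLD


def needs_captcha(user, filter_flagged_post):
    """Ask for a CAPTCHA only when the content filter is suspicious
    and the author has not earned the trusted role."""
    if user is None:  # anonymous visitor: always defer to the filter
        return filter_flagged_post
    return filter_flagged_post and not is_trusted(user)
```

Even in this small form, the policy questions raised above (who grants the role, and how many posts count as enough) show up as constants and branches that someone at rabble would have to decide on.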

Aristotleded24

You know, it's been a few days, there has been some discussion and explanation from the mods, and I share Spector's frustrations on this issue. For some, it is a minor annoyance, others have had much more severe difficulty. In my case, I've got a post for the front page blog that I have been waiting to post for a couple of days now, and I still can't get it through. I do not consider this issue to have been resolved.

Catchfire

Aristotled, I don't consider the issue resolved either. I sympathize with your frustrations, and I've sent the info you sent me to the techies. They don't work weekends, though, so it will be a few days before we can address everything to satisfaction.

Boom Boom

I don't understand why CAPTCHA is so selective - I consistently get CAPTCHA'd on half the babble threads I post on, not on the other half.

M. Spector

wage zombie wrote:

There are many different ways that anti-spam tools can learn what is spam. I can't tell if you actually want to hear about that or if it's just a rhetorical question.

I'd be delighted if you could explain to me exactly what it is that the anti-spam robot "learns" after it has challenged a dozen posts made by the same registered user on no grounds whatsoever (not even the presence of a "suspicious" hyperlink) and has been proven wrong each and every time by the successful negotiation of the Captcha test.

wage zombie wrote:

It depends how the system is designed. A system that gave free reign (sic) provided that exactly one dozen captchas were entered would be a pretty easy system to game. Additionally, bots that try to "read" captcha images might do better with some captchas than others.

That's exactly what I object to. It's obvious that no amount of proof of bona fides (if that's what Captcha is really all about) will suffice to call off the robots. Unless somebody comes to their senses and reprograms or removes the robot, there will never be an end to the baseless challenges to the credentials of every poster on rabble and babble.

wage zombie wrote:

How do babblers become "trusted"?

Isn't that what we are being told the robot is supposed to be "learning" to do? And what level of trust do you consider sufficient to be allowed to post on babble without being challenged by the robot? Would I have to answer 100 Captchas correctly in order to persuade it that I am not a spam robot? 1000? Would any number satisfy you (or it)?

wage zombie wrote:

I would agree that there are much better ways to deal with spam.

Thank you. I agree entirely.

wage zombie wrote:

Yeah, babble is a police state, and we are having our rights taken away right now by unreasonable search and seizure. What's really disgusting about all of this is that the NDP has yet to put out a statement on this oppression.

You said that, not me. But calling this a "Site Upgrade" is as ironic as calling torture an "enhanced interrogation technique".

N.Beltov

wage zombie wrote:
Yeah, babble is a police state, and we are having our rights taken away right now by unreasonable search and seizure. What's really disgusting about all of this is that the NDP has yet to put out a statement on this oppression.

lol!

wage zombie

M. Spector wrote:

That's exactly what I object to. It's obvious that no amount of proof of bona fides (if that's what Captcha is really all about) will suffice to call off the robots. Unless somebody comes to their senses and reprograms or removes the robot, there will never be an end to the baseless challenges to the credentials of every poster on rabble and babble.

It's clear that it's not workable as it is.  Nobody disputes that.  They will fix it, and enough complaints have already come forth.  I doubt further complaints will have any effect on the time it takes to fix it.  Because the time it takes to fix it actually does not depend on how annoyed people get.

Quote:

wage zombie wrote:
How do babblers become "trusted"?

Isn't that what we are being told the robot is supposed to be "learning" to do?

No.  The robot is learning to judge whether content is spam.

And you seem to have misunderstood my question "how do babblers become trusted?" This would be for rabble to define. I'd say having a hundred lifetime babble posts would be a fine, if arbitrary, threshold for becoming a "trusted babbler". That would be for rabble to decide. Or mods could just bless people. But the "how" is not just that; it is also how it is going to work, i.e. in the code.

Quote:

And what level of trust do you consider sufficient to be allowed to post on babble without being challenged by the robot? Would I have to answer 100 Captchas correctly in order to persuade it that I am not a spam robot? 1000? Would any number satisfy you (or it)?

No. Having a certain number of captchas to pass is a poor system. You suggested turning the filter off after a domain passed captchas. I'm saying that a) that's not how the filter is designed to work and b) while that might seem like a good, simple system that would make this less uncomfortable for us right now, there's actually a lot more to the problem than that.

Again, I think there are better ways to deal with spam.  Having a recommendation engine, where content gets voted up (or down), is one such way.  However, a recommendation engine is a lot more complex of a thing to add on, and it could be a controversial proposal on babble.  The link I provided a few posts up is a video where someone is talking about these ideas.

Quote:

wage zombie wrote:

I would agree that there are much better ways to deal with spam.

Thank you. I agree entirely.

I'm not saying that the current situation is workable.

Most software projects fail, and it is actually very, very hard to "get it right", despite the idea that it's just clicking around on buttons.

Fidel

Does anyone know JavaScript? I've seen a few ideal candidates for hacking on userscripts.org but don't have time. One is pretty big, and you'd need to be a frickin idiot savant to figure out the neural net end of it, which they don't have working for their own purposes according to the comments. Might just need some tweaking. But I'm currently tied up with four thousand things on the go, so I'm out of the loop.

M. Spector

wage zombie wrote:

I doubt further complaints will have any effect on the time it takes to fix it. Because the time it takes to fix it actually does not depend on how annoyed people get.

Since you seem to have assumed the role of complaints gatekeeper, I will point out to you that my interventions here have not been about repeating complaints made by others, but about challenging the official disinformation campaign concerning the so-called "spam filter". Specifically, I have challenged the following:

  • The Orwellian characterization of this new regime as a "Site Upgrade".
  • The assurances that the spam robot is "learning" how to do its job properly, and until it does, we all just have to kowtow to its demands.
  • The casual acceptance of the principle of guilty-until-proven-innocent-and-not-even-then [there is no "safe list", remember?] by people who should know better.
  • The red herring of IP addresses, when the identity of the poster is not in question.

It's all about rabble treating its readers and contributors with disrespect. That's not something the IP tech people can fix.

wage zombie wrote:

No. Having a certain number of captchas to pass is a poor system. You suggested turning the filter off after a domain passed captchas. I'm saying that a) that's not how the filter is designed to work and b) while that might seem like a good, simple system that would make this less uncomfortable for us right now, there's actually a lot more to the problem than that.

I didn't suggest "turning the filter off after a domain passed captchas". I suggest allocating a teensy bit of memory to the spam filter so that it can have a "safe list" of posters' NAMES (NOT domains or IP addresses!) who have passed the Captcha test and don't have to be treated like suspected spammers forever after.

I did suggest turning off the filter altogether until someone can get it right.

Snert

Quote:
The casual acceptance of the principle of guilty-until-proven-innocent-and-not-even-then

Do you regard having to prove that you are who you are BY LOGGING IN to be a similarly unacceptable assumption of your guilt?

pogge

Catchfire wrote:

We don't have the capacity to ban or otherwise flag IP addresses

You do in a stock installation of Drupal 7. Something to look forward to!

Snert

Quote:

We don't have the capacity to ban or otherwise flag IP addresses

Is that new? I seem to recall it being done back in the days of Michelle, for example (and I'm talking about post-upgrade).

Lard Tunderin Jeezus

As another ongoing issue, I find that the site has been unable to connect with alarming regularity since the upgrade. It was painfully slow to begin with, but is approaching unusable as of late...

M. Spector

Snert wrote:

Quote:
The casual acceptance of the principle of guilty-until-proven-innocent-and-not-even-then

Do you regard having to prove that you are who you are BY LOGGING IN to be a similarly unacceptable assumption of your guilt?

I would if I had to log in again every time I wanted to preview, submit, or edit a post. That's why the software REMEMBERS your log-in until you log out. It's not exactly rocket surgery to write software that does that.

Logging in is simply proof of your identity. The problem with the so-called spam filter is not that it wants me to confirm my identity every time, but that it doesn't [b]care[/b] about my identity. The fact that I'm logged in and have passed the Captcha test repeatedly counts for nothing. I remain under constant suspicion.

If I try to make a post with too many hyperlinks, I don't even get the opportunity to pass the Captcha test once again. And nobody has told me how many hyperlinks I am allowed to include in a post. Only the robot knows for sure.  

Catchfire

Pogge: yes, I am looking forward to the installation of the next Drupal upgrade. There are a lot of goodies that will make babblers' lives easier.

Snert: I'm not sure; banning IP addresses might have been possible pre-upgrade, but Michelle was never able to ban them post-upgrade. We can see them (so we can suspend sock puppets in advance), but we can't ban them.

M. Spector: I find your tone aggressive, abusive and disingenuous. The accusation that Rebecca and I have "disrespect" for babblers is offensive. Usually you confine your posts to things you know a great deal about, rather than, as in this case, nothing. That probably has something to do with why your posts now read like a paranoid caricature, lacking your usual grounding in historical context. I have no idea how you expect us mods to respond to you in good faith while you continue your utterly disrespectful attack on our motives, capabilities and word.

KenS

Just venting. [Although maybe there is a surprise in here for the techies.]

Talk about seriously irritating:

Click on the Post after doing the CAPTCHA, and end up at the CAPTCHA provider's website.

[With the additional irritation of losing the post if I didn't copy it.]

This has happened several times now.

What LTJ said: a few times the site just will not load, at the same time my slo speed pulls up the most difficult of sites for it.

Catchfire

LTJ: we haven't implemented a site upgrade, although there have been a number of changes to the site. The spam filter is external software, so it is likely not affecting load times. I agree that the site is slow in general, so if you are noticing a change in load times, it is more likely to do with the incredible increase in web traffic rabble has experienced. Last month was our highest ever, with a 39% increase from the previous month. Fixing the speed of the site is a long-term goal.

KenS: that hasn't happened to me. What button or link are you clicking on that takes you to the Mollom site?

KenS

Catchfire: it is when I am doing something like copying the text before I do the Captcha [because I'm likely to lose it]. There may be other things I am doing at the same time while waiting for the usual slo-mo loading... not sure. But I'm pretty sure it only happens either before or in the middle of doing the Captcha- before I hit any button.

Life, the unive...

This has happened to me too, KenS, a number of times in fact. I am going to give up. I'll check back in a few weeks. Oh, and I have had a number of posts rejected over and over even though I keep typing the CAPTCHA right each time a new one pops up. The site is becoming 'read only'.

Slumberjack

I usually copy the text before posting, just in case things go all wonky.  Usually two or three attempts are required to get through the Captcha, but the good news for all of my readers is that I'm not in the least deterred by it.

Topic locked