Giant function or crazy simple RegExp?

Working with some old code I ran across a 37-line JavaScript function that seems to just strip non-alphanumeric characters from a text <input>. It took a string as its argument and returned a sanitized string with underscores in place of the bogus characters. There was also a function to throw up a custom modal dialog telling the user their input was being ‘corrected’ before submission (but not giving them a chance to stop and change it if they’d like).

The string replace method was called 30 separate times, once for each of 30 different illegal characters:

value = value.replace(/[&]/, "_");
value = value.replace(/[%]/, "_");  

That’ll get the job done (for the first occurrence of each character, anyway, since none of those patterns has the g flag), in a very verbose and tedious way… But… How about just chaining the .replace() method, like:

value = value.replace(/[&]/,"_").replace(/[%]/,"_");  //etc.
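One thing worth checking with the per-character calls, chained or not: a regex without the g flag only replaces the first match. Here's a quick sketch of the difference; the sample string is made up for illustration and isn't from the old code.

```javascript
// Hypothetical sample input, not taken from the original function.
const sample = "a&b&c";

// No g flag: only the first "&" is replaced.
const firstOnly = sample.replace(/[&]/, "_");

// With the g flag: every "&" is replaced.
const allMatches = sample.replace(/[&]/g, "_");

console.log(firstOnly);  // "a_b&c"
console.log(allMatches); // "a_b_c"
```

So if the old function was meant to catch every bad character, each of those 30 patterns would have needed a g on the end as well.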

But even that’s a bit silly. I thought, “Why not just put all the characters into a SINGLE regex and do a global search for those characters and replace them all at once?” That makes sense, right?

value = value.replace(/[\`~!@#$%^&*()\-_=+\[\]{};:'"<>?,.\/\\|\s]/g, "_");

That smashes it down to a single line (down from 30). But… What if we don’t have all the characters we want to block accounted for there? Someone could try to sneak in a © or ¢. There’s got to be a way to only ALLOW the proper (alpha, number, and underscore) characters. Of course there is!

value = value.replace(/\W/g, "_");

Is that really all I needed? Alright then. 30 lines down to ONE super-simple RegExp replace and we’re good to go.
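To see the difference between the two one-liners, here's a small side-by-side sketch. The sample strings are my own, picked only to show a © slipping past the explicit character list. One caveat: \W is ASCII-only, so accented letters get replaced too, which may or may not be what you want.

```javascript
// The explicit "block these characters" pattern from above.
const denylist = /[\`~!@#$%^&*()\-_=+\[\]{};:'"<>?,.\/\\|\s]/g;

// The "allow only A-Z, a-z, 0-9 and _" pattern.
const nonWord = /\W/g;

// Hypothetical sample input containing a character the denylist misses.
const text = "50% off \u00a9 2010"; // "50% off © 2010"

const viaDenylist = text.replace(denylist, "_");
const viaNonWord = text.replace(nonWord, "_");

console.log(viaDenylist); // "50__off_©_2010" (the © sneaks through)
console.log(viaNonWord);  // "50__off___2010"

// \W treats any non-ASCII letter as a non-word character.
const accented = "caf\u00e9".replace(nonWord, "_"); // "café"
console.log(accented); // "caf_"
```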

Quite silly.

Here are some resources that helped set all of this silliness straight:

Kind of:


The last bit of info that did the trick (the “Character Classes” section): yes, really, an ancient JavaScript Kit link