Archive for the ‘Regular Expressions’ category

C#: Lower Case All XML Tags with Regex

April 24th, 2009

When you accept an XML document from an uncontrolled source and process it with LINQ to XML, it’s sometimes useful to convert all tags and attributes to lower case first. LINQ to XML is case-sensitive, and you can’t always rely on the program producing the XML to follow your casing standard for elements and attributes.

So here’s a quick and dirty single line of code that will accomplish just this in C# using a regular expression:

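// Lower-case every tag (everything between '<' and '>').
// ToLowerInvariant avoids culture-specific casing rules, and no RegexOptions
// are needed because the pattern uses neither '^', '$' nor '.'.
string lowered = Regex.Replace(
    xml,
    @"<[^<>]+>",
    m => m.Value.ToLowerInvariant());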

And here’s that functionality wrapped up in an extension method. It extends string rather than XElement, because the casing has to be fixed before the XML is parsed:

public static class XElementExt
{
    // Lower-cases every tag (everything between '<' and '>') in the XML text.
    public static string LowerCaseTags(this string xml)
    {
        return Regex.Replace(
            xml,
            @"<[^<>]+>",
            m => m.Value.ToLowerInvariant());
    }
}

Note: The Regex class is defined in System.Text.RegularExpressions

Here’s an example of the resulting effect.

Before:

<ParentNode>
   <ChildItem TestAttribute="ValueCasing" >
	This text Will not Be Harmed!
   </ChildItem>
</ParentNode>

After:

<parentnode>
   <childitem testattribute="valuecasing" >
	This text Will not Be Harmed!
   </childitem>
</parentnode>

You’ll notice that with this method everything between ‘<’ and ‘>’ is converted to lower case, not just the element and attribute names. Attribute values therefore lose any special casing they may have had (as do comments and processing instructions, which the pattern also matches), which may or may not be a problem for what you’re doing.
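If the attribute values need to keep their casing, one possible approach is to lower-case only the element and attribute names. The sketch below is written in JavaScript rather than C#, and the function name lowerCaseTagNames is made up for illustration. It assumes attribute values are quoted and contain no ‘>’ character, and it skips comments and processing instructions.

```javascript
// Lower-case element and attribute names, leaving attribute values untouched.
// Assumption: attribute values are quoted and never contain '>'.
function lowerCaseTagNames(xml) {
  // One tag: '<', optional '/', the element name, then the rest up to '>'.
  // Names may not start with '!' or '?', so comments and <?xml ...?> are skipped.
  return xml.replace(/<(\/?)([^\s<>\/!?]+)([^<>]*)>/g, function (match, slash, name, rest) {
    // In the rest of the tag, copy quoted values as they are and
    // lower-case only the names that are followed by '='.
    var attrs = rest.replace(/("[^"]*"|'[^']*')|([^\s=\/"']+)(?=\s*=)/g, function (m, quoted, attrName) {
      return quoted !== undefined ? quoted : attrName.toLowerCase();
    });
    return '<' + slash + name.toLowerCase() + attrs + '>';
  });
}

console.log(lowerCaseTagNames('<ChildItem TestAttribute="ValueCasing" >Text</ChildItem>'));
// prints: <childitem testattribute="ValueCasing" >Text</childitem>
```

This is still plain text processing, not XML parsing, so it is only suitable as a pre-processing step for input that is otherwise well-formed.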

Javascript: 1337-Speak Translator

February 3rd, 2009

I borrowed this idea from a friend of mine who wrote the original implementation in C#, which I ported to JavaScript because, well, I like playing in JavaScript.

This is a simple method of converting between English and leet-speak. It uses JavaScript’s regular expression library to do most of the heavy lifting. Because many of the leet tokens contain regular expression metacharacters, each one has to be completely escaped before it’s used in a regular expression. For this I borrowed some code from Simon Willison.
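To see why the escaping matters, here is a minimal sketch using the leet token for ‘p’ from the table in the script. The helper escapeForRegExp is a hypothetical stand-in written for this example; it is not the RegExp.escape function borrowed from Simon Willison.

```javascript
// Hypothetical helper: put a backslash in front of every regex metacharacter.
function escapeForRegExp(text) {
  return text.replace(/[\\^$.*+?()[\]{}|\/]/g, '\\$&');
}

var leetP = '|*'; // the leet token for 'p'

// new RegExp(leetP) would throw a SyntaxError, because '*' has nothing to repeat.
// Escaped, the token becomes the pattern \|\* and matches itself literally.
var safePattern = new RegExp(escapeForRegExp(leetP), 'g');
var decoded = '|*|*'.replace(safePattern, 'p');

console.log(decoded); // prints: pp
```

Tokens such as ‘()’ are a quieter problem: unescaped they are a valid pattern, just not one that matches the literal parentheses.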

Here’s the HTML, pretty basic:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" 
 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<body style="padding: 10px">
<div style="border: solid 1px Black; 
            padding: 5px; width: 350px; 
            background-color: White;">
    <label for="input">
        Enter message here:</label><br />
    <textarea id="input" name="input" rows="10" cols="40" 
    style="font-weight: bold;
           background-image: url('leetBG.png'); 
           background-attachment: fixed; 
           background-position: 160px 165px;
           background-repeat: no-repeat;"></textarea>
    <br />
    <input type="submit" value="Translate" 
     onclick="translateText();" />
    <select id="conversionType">
        <option value="e">English -> 1337</option>
        <option value="3">1337 -> English</option>
    </select>
</div>
</body>
</html>

And here’s the important stuff, the JavaScript:

<script type="text/javascript">
    // Create the Phrase translations arrays
    var PhrasesEnglish = 
        new Array('crap', 'dude', 'hacker',
                  'hacks', 'you', 'cool', 'oh my god',
                  'fear', 'power', 'own',
                  'what the hell', 'elite', 'for the win', 
                  'oh really', 'good game');
    var PhrasesLeet = 
        new Array('carp', 'dood', 'haxor', 'hax', 'joo',
                  'kewl', 'omg', 'ph43', 'powwah', 'pwn', 
                  'wth', 'leet', 'ftw', 'o rly', 'gg');
 
    // Create the Letter translations arrays
    var LettersEnglish = 
        new Array('n', 'b', 'k', 'd', 'e', 'f', 'g', 'h',
                  'p', 'm', 'r', 'l', 'o', 'q', 's', 't',
                  'u', 'x', 'w', 'y', 'z', 'c', 'a', 'j', 
                  'i', 'v', ' ');
    var LettersLeet = 
        new Array('/\\/', '|}', '|X', '[)', '3', '|=', 'gee', '|-|',
                  '|*', '(\\/)', '|2', '1', '()', '0', '$', '+',
                  '|_|', '><', '\\X/', '\'/', '2', '<', '/\\', '_|', 
                  '|', '\\/', '  ');
 
    // Translates text in input area to/from leet speak
    function translateText() {
        var inputString = document.getElementById('input').value;
 
        if (document.getElementById('conversionType').value == "e") {
            for (var i = 0; i < PhrasesEnglish.length; ++i)
                inputString = inputString.replace(
                        new RegExp(PhrasesEnglish[i], "gi"),
                        PhrasesLeet[i]
                        );
 
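            // Translate every letter in a single pass so that the output of
            // one substitution (e.g. the 'X' in '|X' for 'k') is not
            // translated again by a later one ('x' -> '><').
            inputString = inputString.replace(/[a-z ]/gi, function (ch) {
                var index = LettersEnglish.indexOf(ch.toLowerCase());
                return index === -1 ? ch : LettersLeet[index];
            });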
        }
        else {
            for (var i = 0; i < LettersLeet.length; ++i)
                inputString = inputString.replace(
                        new RegExp(RegExp.escape(LettersLeet[i]), "g"),
                        LettersEnglish[i]
                        );
 
            for (var i = 0; i < PhrasesLeet.length; ++i)
                inputString = inputString.replace(
                        new RegExp(RegExp.escape(PhrasesLeet[i]), "g"),
                        PhrasesEnglish[i]
                        );
        }
 
        document.getElementById('input').value = inputString;
    }
 
    // This function is used to escape any special regular expression
    // characters in the search strings used to convert from leet to
    // english. Taken from: http://simonwillison.net/2006/Jan/20/escape/
    RegExp.escape = function(text) {
      if (!arguments.callee.sRE) {
        var specials = [
          '/', '.', '*', '+', '?', '|', '$',
          '(', ')', '[', ']', '{', '}', '\\'
        ];
        arguments.callee.sRE = new RegExp(
          '(\\' + specials.join('|\\') + ')', 'g'
        );
      }
      return text.replace(arguments.callee.sRE, '\\$1');
    }
</script>

So that’s it! It’s quite a bit of code, and there’s probably a better way of doing it but it was a lot of fun. Please post any suggestions or questions you might have.

Determine Credit Card Type with Javascript

November 7th, 2008

I don’t understand why most online commerce sites ask the user to select what type of credit card they are going to enter instead of discerning the type programmatically and displaying feedback to the user. So here’s a small example I came up with on how to do just this.

Here’s the HTML:

<html>
  <body>
    <form>
      <input id="ccNumber" onChange="SetTypeText(this.value)" />
      <br />
      <div id="cardType"></div>
    </form>
  </body>
</html>

Here’s the important part, the JavaScript:

<script type="text/javascript">
 
        function SetTypeText(number)
        {
            var typeField = document.getElementById("cardType");
            typeField.innerHTML = GetCardType(number);
        }
 
        function GetCardType(number)
        {            
            var re = new RegExp("^4");
            if (number.match(re) != null)
                return "Visa";
 
            re = new RegExp("^(34|37)");
            if (number.match(re) != null)
                return "American Express";
 
            re = new RegExp("^5[1-5]");
            if (number.match(re) != null)
                return "MasterCard";
 
            re = new RegExp("^6011");
            if (number.match(re) != null)
                return "Discover";
 
            return "";
        }
</script>
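As a possible variation, the same prefix checks can be kept in a small table, so supporting another card type means adding one row. This is only a sketch: the names CARD_PREFIXES and detectCardType are made up for illustration, and stripping non-digits first is an added assumption so that leading spaces or dashes don’t hide the prefix.

```javascript
// Prefix rules taken from GetCardType above, checked in the same order.
var CARD_PREFIXES = [
  { type: 'Visa', pattern: /^4/ },
  { type: 'American Express', pattern: /^3[47]/ },
  { type: 'MasterCard', pattern: /^5[1-5]/ },
  { type: 'Discover', pattern: /^6011/ }
];

// Returns the card type for a (possibly partial) number, or '' if unknown.
function detectCardType(number) {
  // Assumption: ignore spaces, dashes, and any other non-digit characters.
  var digits = String(number).replace(/\D/g, '');
  for (var i = 0; i < CARD_PREFIXES.length; i++) {
    if (CARD_PREFIXES[i].pattern.test(digits)) {
      return CARD_PREFIXES[i].type;
    }
  }
  return '';
}

console.log(detectCardType(' 3782 8224')); // prints: American Express
```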

Credit Card Regular Expression

November 7th, 2008

For the project I’m currently working on, I need to verify credit card numbers entered by the user. I found a regular expression online that did almost all of it, but it lacked a few necessary validations, such as Discover cards and 13-digit Visas, so I modified it to work with almost all forms of Visa, MasterCard, American Express, and Discover cards.

It also supports white space and dashes in between blocks of numbers, as would be found on an actual credit card.

Here it is:
^((((4\d{3})|(5[1-5]\d{2})|(6011))[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4})|(3[47][\d\s-]{13})|(4[\d\s-]{12}))$

It’s not 100% perfect for catching invalid Discover or 13-digit Visa cards but it will recognize valid ones. For best results, strip out any non-digits from the input string before running it through the regular expression.
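Here is a minimal JavaScript sketch of the strip-then-match approach. The function name isSupportedCardNumber and the digits-only pattern are assumptions written for this example, using the same prefixes and lengths as the expression above; the sample numbers are just illustrative inputs.

```javascript
// Assumed digits-only pattern: 16-digit Visa/MasterCard/Discover,
// 15-digit American Express, or 13-digit Visa.
var CARD_DIGITS = /^(?:(?:4\d{3}|5[1-5]\d{2}|6011)\d{12}|3[47]\d{13}|4\d{12})$/;

function isSupportedCardNumber(input) {
  // Drop spaces, dashes, and anything else that is not a digit.
  var digits = String(input).replace(/\D/g, '');
  return CARD_DIGITS.test(digits);
}

console.log(isSupportedCardNumber('4111-1111-1111-1111')); // prints: true
console.log(isSupportedCardNumber('3782 822463 10005'));   // prints: true
console.log(isSupportedCardNumber('1234 5678 9012 3456')); // prints: false
```

Once the separators are gone the pattern no longer has to allow for them, which makes the length of each card type explicit. Note that this only checks the prefix and length, not whether the number is actually a valid card.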

Cheers!

C#: Regular Expressions

September 27th, 2008

I’ve decided to document what little knowledge I have on using Regular Expressions in C#. Nothing grand, just a list of formats, special characters and usage.

Control Characters:

Character  Matches
.          Any character except the newline (\n)
$          The end of the string, or just before a final newline; with RegexOptions.Multiline, the end of each line
^          The beginning of the string; with RegexOptions.Multiline, the beginning of each line. As the first character inside ‘[]’ it means “not.”
+          One or more of the preceding element
*          Zero or more of the preceding element
?          Zero or one of the preceding element
\          Escapes a special character, or introduces a special character set such as \d
( )        Groups a subexpression and captures the text it matches
[ ]        Matches any single character from the listed characters or ranges
{ }        Specifies how many times to match the preceding element, e.g. {3} or {2,5}
|          Logical OR: matches either the expression on its left or the one on its right
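The control characters above behave the same way in most regular expression engines, so they are easy to try out. The sketch below uses JavaScript, like the script listings in the posts above, rather than C#; the patterns and sample strings are made up for illustration.

```javascript
// Each entry: a pattern, a string it should match, and one it should not.
var examples = [
  { pattern: /^ab+c$/,       yes: 'abbbc', no: 'ac' },      // ^ $ +
  { pattern: /^ab*c$/,       yes: 'ac',    no: 'abd' },     // *
  { pattern: /^colou?r$/,    yes: 'color', no: 'colouur' }, // ?
  { pattern: /^a.c$/,        yes: 'a-c',   no: 'a\nc' },    // . (not a newline)
  { pattern: /^1\+1$/,       yes: '1+1',   no: '11' },      // \ as an escape
  { pattern: /^(cat|dog)s$/, yes: 'dogs',  no: 'cats!' },   // ( ) and |
  { pattern: /^[a-c]+$/,     yes: 'abc',   no: 'abd' },     // [ ] with a range
  { pattern: /^[^0-9]+$/,    yes: 'abc',   no: 'a1' },      // [^ ] meaning "not"
  { pattern: /^\d{2,3}$/,    yes: '123',   no: '1234' }     // { }
];

// Returns a description of every example that does not behave as labeled.
function findFailures(list) {
  var failures = [];
  for (var i = 0; i < list.length; i++) {
    var ex = list[i];
    if (!ex.pattern.test(ex.yes)) failures.push(ex.pattern + ' should match ' + ex.yes);
    if (ex.pattern.test(ex.no)) failures.push(ex.pattern + ' should not match ' + ex.no);
  }
  return failures;
}

console.log(findFailures(examples)); // prints: []
```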

Special Character Sets:

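Character  Matches
\w         Any word character. Same as [A-Za-z0-9_] with RegexOptions.ECMAScript; by default .NET also matches Unicode letters and digits
\W         Any non-word character (the inverse of \w)
\s         Any whitespace character. Same as [ \f\n\r\t\v] with RegexOptions.ECMAScript; by default .NET also matches Unicode separator characters
\S         Any non-whitespace character (the inverse of \s)
\d         Any digit. Same as [0-9] with RegexOptions.ECMAScript; by default .NET matches any Unicode decimal digit
\D         Any non-digit (the inverse of \d)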
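A short sketch of the character sets in use, again in JavaScript like the script listings in the posts above. In JavaScript \w and \d are limited to ASCII, so results for non-ASCII text can differ from .NET’s defaults. The function name summarize and the sample text are made up for illustration.

```javascript
// Split a piece of text up using the special character sets.
function summarize(text) {
  return {
    words: text.match(/\w+/g) || [],   // runs of word characters
    numbers: text.match(/\d+/g) || [], // runs of digits
    tokens: text.split(/\s+/)          // pieces separated by whitespace
  };
}

// The tab and the newline both count as whitespace for \s.
var result = summarize('id_7\tcosts $15\nnow');

console.log(result.words);   // prints: [ 'id_7', 'costs', '15', 'now' ]
console.log(result.numbers); // prints: [ '7', '15' ]
console.log(result.tokens);  // prints: [ 'id_7', 'costs', '$15', 'now' ]
```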
