Searching for social security numbers in a file using a regular expression and egrep

egrep is a version of grep that supports extended regular expressions. It can be used to find all Social Security numbers in a file using a simple regex. US Social Security numbers are written in the format 123-45-6789. This can be broken down into a regex containing a character class matching three digits, a dash, a character class matching two digits, a dash, and finally a character class matching four digits.


egrep "[0-9]{3}[-][0-9]{2}[-][0-9]{4}" file
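To try the command without touching real data, you can run it against a throwaway file first. The following is only a sketch: the temporary file and the numbers in it are made up for illustration, and it uses grep -E, which is the equivalent spelling of egrep.

```shell
# Create a temporary sample file (contents are invented for this example).
sample=$(mktemp)
printf '%s\n' \
  'name: alice ssn: 123-45-6789' \
  'name: bob ssn: 987654321' \
  'order id: 12-345-678' > "$sample"

# Same pattern as above: 3 digits, dash, 2 digits, dash, 4 digits.
matches=$(grep -E "[0-9]{3}[-][0-9]{2}[-][0-9]{4}" "$sample")
printf '%s\n' "$matches"

rm -f "$sample"
```

Only the line with the dashed number is printed. The undashed number and the differently grouped order id are skipped.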

If the file contains Social Security numbers written without dashes, this regex will not match them. To improve upon the regex, you can use the ? operator, which matches 0 or 1 occurrences of the preceding item, here the character class containing the dash.

egrep "[0-9]{3}[-]?[0-9]{2}[-]?[0-9]{4}" file
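Here is a quick way to see what the looser pattern accepts. This is a sketch with invented test lines, again using grep -E as the equivalent of egrep. Each dash is optional on its own, so a half-dashed number matches too, and so does any run of nine or more digits.

```shell
# Create a temporary sample file (contents are invented for this example).
sample=$(mktemp)
printf '%s\n' \
  '123-45-6789' \
  '123456789' \
  '123-456789' \
  '12345' \
  'phone 5551234567' > "$sample"

# Each dash may appear 0 or 1 times.
loose=$(grep -E "[0-9]{3}[-]?[0-9]{2}[-]?[0-9]{4}" "$sample")
printf '%s\n' "$loose"

rm -f "$sample"
```

Every line except 12345 is printed, including the phone number, which is not a Social Security number at all.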

Placing the dash in its own character class is not required. I find that it makes the regex easier to read.
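For comparison, here is a small sketch that runs the pattern both ways, with the dash inside brackets and as a bare character, against an invented sample file. The variable names are just for illustration.

```shell
# Create a temporary sample file (contents are invented for this example).
sample=$(mktemp)
printf '%s\n' '123-45-6789' '123456789' 'no numbers here' > "$sample"

# Dash wrapped in its own character class.
with_class=$(grep -E "[0-9]{3}[-]?[0-9]{2}[-]?[0-9]{4}" "$sample")

# Dash written as a plain character.
bare_dash=$(grep -E "[0-9]{3}-?[0-9]{2}-?[0-9]{4}" "$sample")

# Both forms select the same lines.
[ "$with_class" = "$bare_dash" ] && echo "same matches"

rm -f "$sample"
```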

My next post will show you how to match a US telephone number using a slightly more complicated regular expression.

  • Dave Christian

    Good example, but it’s a little more complicated than that (isn’t everything?).
    While that regex is highly inclusive, you may want to consider this:

    (^|[[:space:]])[0-9]{3}[-][0-9]{2}[-][0-9]{4}[[:space:]]

    This ensures that the number is preceded by either the beginning of a line or a whitespace character, and that a whitespace character follows it. You'd be amazed how much random data will match a Social Security number.

    For an outline of the SSN dos and don'ts (e.g. no SSN starts with '666'), see:

    http://stevemorse.org/ssn/ssn.html

    Here's one that is close, but not quite there, that I dug up from a past life. It was designed for the Java regex engine:


    ^(?!000)(?!588)(?!666)(?!69[1-9])(?!73[4-9]|7[4-5]\d|76[0123])([0-6]\d{2}|7([0-6]\d|7[012]))([ -])?(?!00)\d\d\3(?!0000)\d{4}$
