Searching for social security numbers in a file using a regular expression and egrep

egrep is a version of grep that supports extended regular expressions (it is equivalent to grep -E). You can use egrep with a simple regex to find every social security number in a file. All US social security numbers have the format 123-45-6789: three digits, a dash, two digits, a dash, and four digits. In regex terms, that is the character class [0-9] repeated three times ({3}), a dash, [0-9] repeated twice ({2}), a dash, and finally [0-9] repeated four times ({4}).


egrep "[0-9]{3}[-][0-9]{2}[-][0-9]{4}" file
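A quick way to try the pattern without creating a file is to feed a few sample lines to grep -E (the same as egrep) on standard input. The names and numbers below are made up for the demo, and find_ssns is a hypothetical helper name, not a standard command.

```shell
#!/bin/sh
# find_ssns: hypothetical helper wrapping the dashed SSN pattern.
# grep -E accepts the same extended regex syntax as egrep.
find_ssns() {
  grep -E "[0-9]{3}[-][0-9]{2}[-][0-9]{4}" "$@"
}

# Made-up sample lines: only Alice's number has the 123-45-6789 layout,
# so only her line is printed.
printf '%s\n' 'Alice 123-45-6789' 'Bob 987654321' 'Carol 555-1234' | find_ssns
```

Bob's undashed number and Carol's seven-digit phone number are not matched.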

If the file contains social security numbers without dashes, this regex will not match them. To handle both forms, add the ? operator after each dash. The ? operator matches zero or one occurrence of the preceding item, here the [-] character class.

egrep "[0-9]{3}[-]?[0-9]{2}[-]?[0-9]{4}" file
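One caveat worth testing: the pattern is not anchored, so once the dashes are optional it also matches the first nine digits of any longer run of digits, such as an account number. The sketch below uses made-up sample lines and assumes GNU grep, whose -w option only accepts matches bounded by non-word characters.

```shell
#!/bin/sh
# Made-up sample lines: dashed, undashed, and a 12-digit number
# that is not an SSN.
sample='Alice 123-45-6789
Bob 987654321
Dave 123456789012'

# With each dash optional, all three lines match: the first nine
# digits of Dave's 12-digit number also satisfy the pattern.
printf '%s\n' "$sample" | grep -E "[0-9]{3}[-]?[0-9]{2}[-]?[0-9]{4}"

# With -w the match must not touch other letters or digits, so
# only Alice's and Bob's lines are printed.
printf '%s\n' "$sample" | grep -Ew "[0-9]{3}[-]?[0-9]{2}[-]?[0-9]{4}"
```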

Placing the dash in its own character class is not required. I find that it makes the regex easier to read.

My next post will show you how to match a US telephone number using a slightly more complicated regular expression.

Display the contents of a Red Hat RPM or Debian deb package

Displaying the contents of an RPM or deb package is simple. Each of these can be thought of as an archive of files plus install scripts. To list the files in the archive, run the following:

Red Hat rpm:


rpm -qlp package.rpm

Debian deb:

dpkg -c package.deb
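If you move between both kinds of systems, a small shell function can pick the right command from the file extension. This is a sketch: list_pkg is a hypothetical name, it assumes package files end in .rpm or .deb, and rpm or dpkg must be installed for the corresponding branch to work.

```shell
#!/bin/sh
# list_pkg: hypothetical helper that lists the files inside a package
# file, choosing the tool by file extension.
list_pkg() {
  case "$1" in
    *.rpm) rpm -qlp "$1" ;;   # query (-q) the file list (-l) of a package file (-p)
    *.deb) dpkg -c "$1" ;;    # list the contents of a .deb archive
    *)
      echo "list_pkg: don't know how to list '$1'" >&2
      return 1
      ;;
  esac
}
```

Usage: list_pkg package.rpm or list_pkg package.deb. Any other extension prints an error and returns a non-zero status.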

On Debian/Ubuntu systems I use a program called apt-file, which searches for files provided by any package available to your system, even if that package is not installed. (Run apt-file update once to download its index before the first search.) This comes in handy when you are building a program from source that depends on libraries you don’t have yet. aptitude search matches package names, not the files inside packages, so it won’t find a library by its file name.

I downloaded the traceroute deb and listed its contents using the command above. It provides a library called libsupp.a. If I were building an application that depends on libsupp.a, I wouldn’t be able to find it with aptitude, but apt-file shows that it’s provided by traceroute:


dcolon@gold:~$ aptitude search libsupp.a
dcolon@gold:~$
dcolon@gold:~$ apt-file search libsupp.a
traceroute: /usr/lib/libsupp.a
dcolon@gold:~$

Logical AND and OR Using AWK

Before learning about AWK’s logical AND operator, I used to chain a pair of grep commands together to find lines containing two search terms:

grep abc filename | grep def

The first grep keeps lines containing abc, and the second keeps only those that also contain def. This can be shortened into a single AWK command:

awk '/abc/&&/def/' filename
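To check that the two forms select the same lines, you can run both over the same input and compare. The sample lines below are made up for the demo.

```shell
#!/bin/sh
# Made-up input: only the first and last lines contain both abc and def.
input='abc def
abc only
def only
xyz defabc'

# Two chained greps: keep lines with abc, then of those keep lines with def.
printf '%s\n' "$input" | grep abc | grep def

# Single awk program: the pattern is true when both regexes match the line,
# and a pattern with no action prints the whole line.
printf '%s\n' "$input" | awk '/abc/&&/def/'
```

Both commands print "abc def" and "xyz defabc". The order of the terms in a line does not matter to either form.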

AWK also provides a logical OR, written as a pair of pipes (||):

awk '/abc/||/def/' filename

For the OR case, you can also use the equivalent egrep command:

egrep 'abc|def' filename
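A small side-by-side run, again on made-up input, shows that the awk OR and the egrep alternation select the same lines.

```shell
#!/bin/sh
# Made-up input: the first three lines contain abc or def; the last has neither.
input='abc here
def here
abc and def
neither'

# awk OR: print lines matching either regex.
printf '%s\n' "$input" | awk '/abc/||/def/'

# egrep alternation (grep -E is the same as egrep) selects the same lines.
printf '%s\n' "$input" | grep -E 'abc|def'
```

A line that contains both terms, such as "abc and def", is printed once, not twice, in either form.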

To get comfortable with AWK, try using it instead of grep for a week. AWK has many more features than just printing fields from a file.

Quickly Generate Sequential Bind Zone Files

For a large number of servers in a given IP block, I frequently see people use the following naming convention, which embeds the IP address in the hostname:


ip-10.10.10.1.mydomain.com
ip-10.10.10.2.mydomain.com
...
ip-10.10.10.255.mydomain.com

This naming convention makes it easy to script the creation of your zone files. To generate the forward DNS records for this zone (mydomain.com), I would do the following:

for i in $(seq 1 255)
do
   echo "ip-10.10.10.$i        IN A      10.10.10.$i" >> db.mydomain.com
done

Conversely, to generate the reverse zone for the 10.10.10.0/24 block (the 10.10.10.in-addr.arpa zone), do the following:

for i in $(seq 1 255)
do
   echo "$i             IN PTR    ip-10.10.10.$i.mydomain.com." >> 10.10.10.0.in-addr.arpa
done

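seq 1 255 produces the numbers 1 through 255. On each iteration, echo prints one record for the current value of $i, and >> appends that line to the zone file. Because >> appends, running a loop a second time adds every record again, so start from an empty file if you regenerate.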

I like to use tabs instead of spaces between columns. At an interactive bash prompt, pressing Tab triggers completion instead of inserting a tab, so you need to press Control-V first to insert a literal tab. To add two tabs, press Control-V, Tab, Control-V, Tab.
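As an alternative to Control-V, printf turns \t in its format string into a tab, which also survives copy and paste into a script. The sketch below regenerates both files that way, keeping the post's file names and 10.10.10.x addressing, and redirects each loop's output once with > after done, so each run replaces the file rather than appending to it. Like the loops above, it only produces the A and PTR records; the SOA and NS records a zone also needs are not generated here.

```shell
#!/bin/sh
# Forward zone: one A record per address, tab-separated columns.
for i in $(seq 1 255)
do
  printf 'ip-10.10.10.%s\tIN\tA\t10.10.10.%s\n' "$i" "$i"
done > db.mydomain.com

# Reverse zone for 10.10.10.0/24: one PTR record per address.
for i in $(seq 1 255)
do
  printf '%s\tIN\tPTR\tip-10.10.10.%s.mydomain.com.\n' "$i" "$i"
done > 10.10.10.0.in-addr.arpa
```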

There are obviously many other ways to do this. For example, you can use a C-style for loop:


for ((i=1; i<=255; i++));