Change the Extension on a Large Number of Files

Here’s a common problem. You untar an archive and it contains a large number of files that end in .jpeg. You need them to have a .jpg extension. At first glance, you might be tempted to try this:
mv *.jpeg *.jpg

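Give it a try and you will find it doesn’t work. The shell expands *.jpeg into the list of matching files before mv runs, and mv has no way to rename by pattern. The correct way to change the extension is to iterate over all of the files. Let’s start with the following: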

dcolon@dcolonbuntu:/tmp/foobar$ ls
friday.jpeg  monday.jpeg  Saturday.jpeg  sunday.jpeg  Thursday.jpeg  Tuesday.jpeg  WEDNESDAY.jpeg
dcolon@dcolonbuntu:/tmp/foobar$

Once in the loop, you need to save the part of the filename that comes before the extension. I use a temporary variable and awk to extract it.

Here is my solution:

dcolon@dcolonbuntu:/tmp/foobar$ for i in *.jpeg
> do
>    basefilename=$(echo "$i" | awk -F.jpeg '{print $1}')
>    mv "$i" "$basefilename.jpg"
> done
dcolon@dcolonbuntu:/tmp/foobar$ ls
friday.jpg  monday.jpg  Saturday.jpg  sunday.jpg  Thursday.jpg  Tuesday.jpg  WEDNESDAY.jpg
dcolon@dcolonbuntu:/tmp/foobar$

I use double quotes around the variables to cope with filenames that contain spaces. In the awk statement I use the entire extension, .jpeg, as my field separator. Using -F. will fail if you have a filename like foo.bar.jpeg.
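
The same rename can also be sketched without awk by using the shell’s suffix-removal parameter expansion, ${i%.jpeg}, which strips a trailing .jpeg and leaves the rest of the name alone. This is only an illustration: the scratch directory and the sample filenames below are made up.

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample files, including an extra dot and a space.
touch "monday.jpeg" "foo.bar.jpeg" "two words.jpeg"

for i in *.jpeg
do
   # ${i%.jpeg} removes the shortest trailing match of ".jpeg".
   mv -- "$i" "${i%.jpeg}.jpg"
done

ls
```

Because no external command runs for each file, this version also avoids starting an awk process per filename.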

Bash Hostname Completion

One of the better-known features in bash is command and filename tab completion. Installing the bash-completion package builds on this. The package enables hostname completion and a lot more. After installing bash-completion, add the following to your .bashrc:

. /etc/bash_completion
complete -F _known_hosts ssh
 
I also suggest adding:
complete -F _known_hosts ping
complete -F _known_hosts traceroute

Hostname completion relies on the ssh known_hosts file. Most modern distributions hash the ~/.ssh/known_hosts file for security reasons. This prevents hostname completion from working. If you are comfortable turning off hostname hashing, then add the following to your ~/.ssh/config:
HashKnownHosts no
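
To see whether an existing known_hosts file is hashed, one rough check is to count the lines that begin with |1|, the marker OpenSSH puts at the start of a hashed host entry. The sample file below is fabricated for illustration (the hostnames and keys are placeholders); point the grep commands at ~/.ssh/known_hosts to check a real file.

```shell
# Build a small fake known_hosts file; hosts and keys are placeholders.
sample=$(mktemp)
cat > "$sample" <<'EOF'
|1|c2FsdHNhbHQ=|aGFzaGhhc2g= ssh-ed25519 AAAAexamplekey1
web01.example.com ssh-ed25519 AAAAexamplekey2
db01.example.com,10.0.0.5 ssh-ed25519 AAAAexamplekey3
EOF

# Hashed entries begin with "|1|"; plain entries begin with the hostname.
hashed=$(grep -c '^|1|' "$sample")
plain=$(grep -vc '^|1|' "$sample")

echo "hashed entries: $hashed"
echo "plain entries:  $plain"
```

Only the plain entries can feed hostname completion, since the hashed ones no longer contain a readable hostname.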

If you had to turn off hostname hashing, you will need to re-populate your known_hosts file. I suggest creating a list of all of your hosts and logging into them in a for loop:


for i in $(cat hostlist)
do
   ssh -n -o StrictHostKeyChecking=no "$i" "uname -a"
done

This assumes that you are using ssh keys. If you are not, you will need to type your password for each host. At this point you can use hostname tab completion. If you added the two additional complete commands, the same hostname completion will work for ping and traceroute.

Copy and Paste for KeePass Under Linux

If you are like most modern Internet users, you subscribe to dozens of services and websites and need an account for each one. Using the same username and password on all of these sites is easy but a terrible idea from a security perspective. Enter KeePass. KeePass is an encrypted software safe for all of your usernames and passwords. It is multi-platform and open source, which is important to me. I use it under Linux, Mac OS X, Windows, iPhone/iPad, and BlackBerry. To use version 2.0 under Linux, you need to run it under Mono.

One problem I ran into when using KeePass under Linux is that copy/paste does not work in its default configuration. Google pointed out that X11 on Linux has two copy buffers, the PRIMARY selection and the CLIPBOARD. Details can be found here. This led me to the tool autocutsel and a post on Superuser. After running:

$ autocutsel &
$ autocutsel -s PRIMARY &

I can now copy/paste from KeePass.

I recommend using a very long passphrase for your master password. I synchronize my KeePass database through Dropbox so I can access it from any of my devices, wherever I am, as long as I have an Internet connection.

Matching a US Telephone Number With egrep Using Regular Expressions

This is the follow-up to my post on searching for social security numbers. US telephone numbers are commonly written in the following formats, all of which can be matched with a single regular expression.
(215) 555-1212
215-555-1212
215 555 1212
215.555.1212
2155551212

The phone number can be broken down into a series of character classes. In egrep, character classes are written inside square brackets. The character class [0-9] represents a single digit from 0 through 9. You can expand on this and match a run of characters from the class by following it with a number inside curly braces. [3-7]{3} matches exactly three digits, each in the range 3 through 7. We will use this notation to build the three parts of the phone number.

You can also build character classes containing specific characters or symbols. After the first three digits of the phone number there are a few possibilities for what comes next. It can be a right paren, a hyphen, a space, or a period. It can also be a right paren followed by a space, as in (215) 555-1212, or nothing at all. The ? operator matches zero or one instance of whatever precedes it. Putting these concepts together, we build one character class for the right paren and one for the separator, and apply the ? operator to each: [)]?[-. ]? The hyphen is listed first inside the brackets so that it is taken literally; between two other characters it would define a range, as it does in [0-9].

OK, let’s combine all of these concepts to build our regex. This is just one solution that I’ve come up with to match a phone number. The beauty of Unix is that there are many other correct solutions, some of which are probably better than mine.

egrep "[(]?[2-9]{1}[0-9]{2}[)]?[-. ]?[2-9]{1}[0-9]{2}[-. ]?[0-9]{4}" filename

This reads: zero or one left paren, followed by a single digit in the range two through nine, followed by two digits in the range zero through nine, followed by zero or one right paren, followed by zero or one hyphen, period, or space, followed by a single digit in the range two through nine, followed by two digits in the range zero through nine, followed by zero or one hyphen, period, or space, followed by four digits in the range zero through nine.
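
A quick way to sanity-check a pattern like this is to run it over the sample formats and count the matching lines. The sketch below is only an illustration: the temporary file, the two extra lines that should not match, and the variable names are made up, and the pattern is a variant that treats the right paren and the separator as two separate optional pieces.

```shell
# Sample lines: the five formats from the text plus two that should not match
# (an area code and an exchange that each start with 1).
sample=$(mktemp)
cat > "$sample" <<'EOF'
(215) 555-1212
215-555-1212
215 555 1212
215.555.1212
2155551212
115-555-1212
215-155-1212
EOF

# Assumed variant of the pattern: optional "(", area code, optional ")",
# optional separator, exchange, optional separator, last four digits.
pattern='[(]?[2-9][0-9]{2}[)]?[-. ]?[2-9][0-9]{2}[-. ]?[0-9]{4}'

# grep -E uses the same extended regex syntax as egrep; -c counts matching lines.
matched=$(grep -Ec "$pattern" "$sample")
echo "matching lines: $matched"
```

Keeping a small file of should-match and should-not-match lines like this makes it easy to re-test the pattern after every change.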

Until the explosion of cell phones, US area codes followed the format: a digit from 2-9, then a 0 or 1, then a digit from 0-9. When additional area codes were needed to accommodate the growing number of phone numbers, the requirement that the middle digit be a 0 or 1 was dropped.

If you have any questions about this article, please post in the comments section.

Update:
I updated the regex thanks to the feedback from Boris. The first and fourth digits cannot be a zero or one, so I created two separate character classes to accommodate that requirement.

Searching for social security numbers in a file using a regular expression and egrep

egrep is a version of grep that supports extended regular expressions. egrep can be used to find all social security numbers in a file using a simple regex. All US social security numbers have the format: 123-45-6789. This can be broken into a regex containing a character class of three numeric digits, a dash, a character class of two numeric digits, a dash, and finally a character class of four numeric digits.


egrep "[0-9]{3}[-][0-9]{2}[-][0-9]{4}" file

If the file contains social security numbers without dashes this regex will not match. To improve upon the regex you can use the ? operator that matches exactly 0 or 1 occurrence of the preceding character class.

egrep "[0-9]{3}[-]?[0-9]{2}[-]?[0-9]{4}" file

Placing the dash in its own character class is not required. I find that it makes the regex easier to read.
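
To see the difference between the two patterns, they can be run against a small sample file. This is a sketch with made-up data: the file contents and variable names are invented for illustration, and grep -E is used in place of egrep since both accept the same extended syntax.

```shell
# Fabricated sample: one dashed SSN, one undashed SSN, one phone number.
sample=$(mktemp)
cat > "$sample" <<'EOF'
name: A, ssn: 123-45-6789
name: B, ssn: 987654321
name: C, phone: 215-555-1212
EOF

# Dashes required: only the first line matches.
with_dashes=$(grep -Ec '[0-9]{3}[-][0-9]{2}[-][0-9]{4}' "$sample")

# Dashes optional: the undashed number on the second line matches too.
optional_dashes=$(grep -Ec '[0-9]{3}[-]?[0-9]{2}[-]?[0-9]{4}' "$sample")

echo "dashes required: $with_dashes"
echo "dashes optional: $optional_dashes"
```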

My next post will show you how to match a US telephone number using a slightly more complicated regular expression.

Display the contents of a Red Hat RPM or Debian deb package

Displaying the contents of an RPM or deb package is simple. Each of these can be thought of as an archive of files plus an install script. To view the files in the archive execute the following:

Red Hat rpm:


rpm -qlp package.rpm

Debian deb:

dpkg -c package.deb

For Debian/Ubuntu systems I use a program called apt-file that allows you to search for files provided by any package that is available to your system even if that package is not installed. This comes in handy if you are building a program from source that has libraries that it depends on. Finding a library is not as easy as finding a program using aptitude.

I downloaded the traceroute deb and displayed the contents using the command above. It provides a library called libsupp.a. If I were building an application that depends on libsupp.a, I wouldn’t be able to easily find it using aptitude. apt-file would show that it’s provided by traceroute.


dcolon@gold:~$ aptitude search libsupp.a
dcolon@gold:~$
dcolon@gold:~$ apt-file search libsupp.a
traceroute: /usr/lib/libsupp.a
dcolon@gold:~$

Logical AND and OR Using AWK

Before learning about AWK’s logical AND operator I used to string a pair of grep commands together to find two search terms:

grep abc filename | grep def

to find lines with both abc and def. This can be shortened into a single AWK command:

awk '/abc/&&/def/' filename

A logical OR is also provided, using a pair of pipes (||):

awk '/abc/||/def/' filename

You can also use the equivalent egrep command:

egrep 'abc|def' filename

To get comfortable with AWK, try using it instead of grep for a week. AWK has many more features than just printing fields from a file.
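
The AND and OR forms are easy to try on a small file. The sketch below uses a made-up four-line sample; the file contents and variable names are only for illustration.

```shell
# Fabricated sample with one line for each combination of the two terms.
sample=$(mktemp)
cat > "$sample" <<'EOF'
abc only
def only
abc and def
neither
EOF

# Logical AND: print only lines that contain both terms.
both=$(awk '/abc/&&/def/' "$sample")

# Logical OR: count lines that contain at least one term.
either=$(awk '/abc/||/def/ {n++} END {print n+0}' "$sample")

echo "both:   $both"
echo "either: $either"
```

The second command also shows one of those extra features: a pattern can be paired with an action block, here a counter that is printed once all input has been read.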

Quickly Generate Sequential BIND Zone Files

For a large number of servers in a given IP block, I frequently see people use the following notation, which incorporates the IP address into the hostname:


ip-10.10.10.1.mydomain.com
ip-10.10.10.2.mydomain.com
...
ip-10.10.10.255.mydomain.com

Using this type of notation makes it easy to script the creation of your zone file. To generate the forward DNS for this zone (mydomain.com), I would do the following:

for i in $(seq 1 255)
do
   echo "ip-10.10.10.$i        IN A      10.10.10.$i" >> db.mydomain.com
done

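Conversely, to generate the 10.10.10.in-addr.arpa reverse zone (the zone name lists the network octets in reverse order, which happens to read the same for 10.10.10), do the following:

for i in $(seq 1 255)
do
   echo "$i             IN PTR    ip-10.10.10.$i.mydomain.com." >> 10.10.10.in-addr.arpa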
done

This creates the sequence from 1 to 255. For each iteration, you generate a line using echo for the given value of $i. Each line is appended to your zone file.

I like to use tabs instead of spaces between columns. When you type the echo statement at the prompt, the tab key triggers completion, so you need to precede each tab with a control-v. If you want to add two tabs, hit ‘control-v’ followed by the tab key, then another ‘control-v’ and another tab.
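
An alternative that avoids typing literal tabs is printf, which turns \t in its format string into a tab character. This is a sketch rather than a drop-in replacement: it writes to a temporary file (the zonefile variable is made up for the example) instead of a real zone file.

```shell
# Write to a scratch file so an existing zone file is not modified.
zonefile=$(mktemp)

for i in $(seq 1 255)
do
   # \t in the printf format becomes a real tab between the columns.
   printf 'ip-10.10.10.%s\tIN A\t10.10.10.%s\n' "$i" "$i" >> "$zonefile"
done

# Show the first few generated records.
head -n 3 "$zonefile"
```

The format string lives in the script as plain text, so the tabs survive copy and paste, which literal tab characters often do not.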

There are obviously many other ways to do this. Instead of seq, you can use a C-style for loop:


for ((i=1; i<=255; i++));