Over the weekend I started studying Perl and quickly realized I was going to be better off if I reviewed regular expressions before I got too far into things. I went back and found my copy of “Mastering Regular Expressions” and dove right in. Now, maybe it’s just me, but I find that I really enjoy the problem-solving aspect of regular expressions. I thought it might be fun to put up a regular challenge on the blog that needs to be solved via regular expressions.
So, I figure I’d start off with one that caught me today. Here is the situation:
You’ve got a regular text file filled with usernames. You want to be able to read this file into a program to populate an array, but there are random blank lines throughout the file. What regex would you use to find and remove all empty lines in the file?
For consistency’s sake, I’ve populated just such a file and made it available here.
Rules: Use any tool you want (perl, sed, vim, etc.). The resulting file must contain all of the original usernames (15 in total), one per line, with no blank lines from start to finish. Please share your solution in the comments!
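To make the setup concrete, here is a rough Perl sketch of the kind of program I’m writing (the structure and names are just an illustration, not a required shape for your answer). Read this way, the blank lines end up as empty entries in the array, which is exactly what your regex needs to weed out:

#!/usr/bin/perl
use strict;
use warnings;

# Sketch only: naively read usernames.txt into an array.
# Blank lines in the file become empty entries in @usernames,
# which is the problem the challenge asks you to solve.
open(my $fh, '<', 'usernames.txt') or die "Could not open usernames.txt: $!\n";
chomp(my @usernames = <$fh>);
close($fh);

printf "%d entries read (should be 15 once the blanks are gone)\n", scalar @usernames;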
sed -i '/^$/ d' usernames.txt
I came across this little introduction lately. As a non-geek it reads … interesting…
http://phi.lho.free.fr/programming/RETutorial.en.html
egrep "\w" usernames.txt
grep . usernames.txt
If the file has a chance of duplicates, I’d actually do:
sort -u usernames.txt |grep .
VIM:
:g/^\s*$/d
[dcarter@host ~]$ grep . usernames.txt
jstark
bchilds
cedwards
arhodes
mrowley
prhodes
asmith
bcorey
hnakamura
ppetrelli
cbyrd
jstrazzo
sraver
bgates
ltorvalds
grep -v '^$'
cat usernames.txt | sort | uniq | tail -n `cat usernames.txt | wc -w`
just because I hate regexes and avoid them if I can.
get-content C:\Scripts\usernames.txt | foreach {if ($_.length -ne 0) {write-host $_}}
xargs printf "%s\n" < usernames.txt
(and this week’s Useless Use of Cat Award goes to…)
ruby -pe 'next if $_=~/^$/' < usernames.txt
Ok, now suppose the usernames aren’t all on separate lines. Sometimes there are multiple usernames on one line, comma separated, maybe there’s a space after the comma, but maybe there’s not. Like this:
jstark
bchilds, cedwards,arhodes
mrowley
prhodes
asmith
bcorey
hnakamura
ppetrelli
cbyrd
jstrazzo
sraver
bgates
ltorvalds
Oh, my answer for the one I just suggested is at http://student.seas.gwu.edu/~mac/files/regex.html
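I haven’t reproduced what’s behind that link here, but one way to attack the comma variant in Perl (just a sketch, and only one of many reasonable approaches) is to split every line on runs of commas and/or whitespace and throw away the empty pieces:

perl -ne 'print "$_\n" for grep { /\S/ } split /[\s,]+/' usernames.txt

Because the split pattern treats any mix of commas and whitespace as a separator, it handles both the blank lines and the “comma, maybe a space” case in one pass.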
Python:
import re

for x in re.sub("\n+", ":", file("usernames.txt").read().strip()).split(":"):
    print x
sed '/^\s*$/d' usernames.txt
perl -ne 'print unless (/^\s*$/)' usernames.txt
perl -ne 'print if (/\S/)' usernames.txt
egrep -v '^\s*$' usernames.txt
awk '!/^\s*$/ { print }' usernames.txt
vim -c 'sil g/^\s*$/d' usernames.txt
Although it’s aimed at sed, I’ve always found this a useful source of regex solutions: http://sed.sourceforge.net/sed1line.txt
Mackenzie:
sed -e 's/,/\n/g' -e 's/[ \t]//g' -e '/^$/d' usernames.txt
Converts commas to newlines, strips whitespace, then removes blanks.
Mark.
Regex can be avoided in most cases and should be when possible.
Try this:
awk '/^[a-zA-Z]/{print $1}' usernames.txt
Unlike just removing the newlines, that prevents cases where some crazy admin has commented out users or whatnot in /etc/passwd.
If that was a standard passwd file you could do something more like:
awk -F: '/^[a-zA-Z]/{if ($3 != 0) print $1}' /etc/passwd
If it is an Ubuntu / Debian server where normal uids start at 1000, you could do this for a listing of usernames:
awk -F: '/^[a-zA-Z]/{if ($3 >= 1000 && $1 != "nobody") print $1}' /etc/passwd
Usernames must start with a letter and can’t start with a number, as I understand it, so that should work fairly well.
Just delete blank lines:
perl -pi -e 's/\A\s+\z//' usernames.txt
With commas too:
perl -pi -e 's/,\s*/\n/g;s/\A\s+\z//' usernames.txt
import sys

for line in sys.stdin:
    usernames = [name.strip() for name in line.split(",")]
    if not any(usernames):
        continue
    for username in usernames:
        print username
Readability – 1
The way of the Pathologically Eclectic Rubbish Lister – 0
#!/usr/bin/perl
open(FILE, "usernames.txt") || die "Could not open usernames.txt: $!\n";
foreach (<FILE>) {if ($_ =~ /\w+/) {print "$_"}}
close(FILE);
Of course you’d have to be crazy to use perl for something that easy in grep/awk!!
awk -v RS="[\n ,]+" '{print}' < u.txt
Late to the challenge, but here’s the regex I whipped out off the top of my head:
s/\n\W*\n/\n/g
My perl code slurps the entire file to a variable and regexes that variable, then splits on newline to an array.
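For anyone curious what that slurp-and-split approach can look like in practice, here is a rough Perl sketch (my own illustration of the idea, not the commenter’s actual code):

#!/usr/bin/perl
use strict;
use warnings;

# Slurp the whole file into one scalar, collapse blank lines with a
# single regex, then split on newlines to build the array.
open(my $fh, '<', 'usernames.txt') or die "Could not open usernames.txt: $!\n";
my $text = do { local $/; <$fh> };          # slurp mode: read everything at once
close($fh);

$text =~ s/\n\s*\n/\n/g;                    # squeeze out blank and whitespace-only lines
my @usernames = grep { /\S/ } split /\n/, $text;   # grep guards against a leading blank line

print "$_\n" for @usernames;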
emacs22
C-M-%
C-q C-j C-q C-j +
C-q C-j
and it will then ask you “[y/n]?”
I second “grep .” as the easiest solution.
Perl/Ruby shouldn’t be too complicated either:
perl -lane 'print if /./'
ruby -lane 'print if /./'
To filter out whitespace-only lines (not just empty ones):
perl -lane 'print if /\S/'
ruby -lane 'print if /\S/'