Over the weekend I started studying Perl and quickly realized I was going to be better off if I reviewed regular expressions before I got too far into things. I went back and found my copy of “Mastering Regular Expressions” and dove right in. Now, maybe it’s just me, but I find that I really enjoy the problem-solving aspect of regular expressions. I thought it might be fun to put up a regular challenge on the blog that needs to be solved via regular expressions.
So, I figure I’d start off with one that caught me today. Here is the situation:
You’ve got a regular text file filled with usernames. You want to be able to read this file into a program to populate an array, but there are random blank lines throughout the file. What regex would you use to find and remove all empty lines in the file?
For consistency’s sake, I’ve populated just such a file and made it available here.
Rules: Use any tool you want (perl, sed, vim, etc.). The resulting file must contain all of the original usernames (15 in total), one per line, with no blank lines from start to finish. Please share your solution in the comments!
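To make the setup concrete, here is a rough Perl sketch of the kind of program I’m writing (the structure and names are just an illustration, not a required shape for your answer). Read this way, the blank lines end up as empty entries in the array, which is exactly what your regex needs to weed out:

#!/usr/bin/perl
use strict;
use warnings;

# Sketch only: naively read usernames.txt into an array.
# Blank lines in the file become empty entries in @usernames,
# which is the problem the challenge asks you to solve.
open(my $fh, '<', 'usernames.txt') or die "Could not open usernames.txt: $!\n";
chomp(my @usernames = <$fh>);
close($fh);

printf "%d entries read (should be 15 once the blanks are gone)\n", scalar @usernames;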
sed -i '/^$/ d' usernames.txt
I came across this little introduction lately. As a non-geek it reads … interesting…
http://phi.lho.free.fr/programming/RETutorial.en.html
egrep "\w" usernames.txt
grep . usernames.txt
If the file has a chance of duplicates, I’d actually do:
sort -u usernames.txt |grep .
VIM:
:g/^\s*$/d
[dcarter@host ~]$ grep . usernames.txt
jstark
bchilds
cedwards
arhodes
mrowley
prhodes
asmith
bcorey
hnakamura
ppetrelli
cbyrd
jstrazzo
sraver
bgates
ltorvalds
grep -v '^$'
cat usernames.txt | sort | uniq | tail -n `cat usernames.txt | wc -w`
just because I hate regexes and avoid them if I can.
get-content C:\Scripts\usernames.txt | foreach {if ($_.length -ne 0) {write-host $_}}
xargs printf "%s\n" < usernames.txt
(and this week’s Useless Use of Cat Award goes to…)
ruby -pe 'next if $_=~/^$/' < usernames.txt
Ok, now suppose the usernames aren’t all on separate lines. Sometimes there are multiple usernames on one line, comma separated, maybe there’s a space after the comma, but maybe there’s not. Like this:
jstark
bchilds, cedwards,arhodes
mrowley
prhodes
asmith
bcorey
hnakamura
ppetrelli
cbyrd
jstrazzo
sraver
bgates
ltorvalds
Oh, my answer for the one I just suggested is at http://student.seas.gwu.edu/~mac/files/regex.html
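I haven’t reproduced what’s behind that link here, but one way to attack the comma variant in Perl (just a sketch, and only one of many reasonable approaches) is to split every line on runs of commas and/or whitespace and throw away the empty pieces:

perl -ne 'print "$_\n" for grep { /\S/ } split /[\s,]+/' usernames.txt

Because the split pattern treats any mix of commas and whitespace as a separator, it handles both the blank lines and the “comma, maybe a space” case in one pass.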
Python:
import re

for x in re.sub("\n+", ":", file("usernames.txt").read().strip()).split(":"):
    print x
sed '/^\s*$/d' usernames.txt
perl -ne 'print unless (/^\s*$/)' usernames.txt
perl -ne 'print if (/\S/)' usernames.txt
egrep -v '^\s*$' usernames.txt
awk '!/^\s*$/ { print }' usernames.txt
vim -c 'sil g/^\s*$/d' usernames.txt
Although it’s aimed at sed, I’ve always found this a useful source of regex solutions: http://sed.sourceforge.net/sed1line.txt
Mackenzie:
sed -e 's/,/\n/g' -e 's/[ \t]//g' -e '/^$/d' usernames.txt
Converts commas to newlines, strips whitespace, then removes blanks.
Mark.
Regex can be avoided in most cases and should be when possible.
Try this:
awk '/^[a-zA-Z]/{print $1}' usernames.txt
Unlike just removing the newlines, that prevents cases where some crazy admin has commented out users or whatnot in /etc/passwd.
If that was a standard passwd file you could do something more like:
awk -F: '/^[a-zA-Z]/{if ($3 != 0) print $1}' /etc/passwd
If it is an Ubuntu / Debian server where normal uids start at 1000, you could do this for a listing of usernames:
awk -F: '/^[a-zA-Z]/{if ($3 >= 1000 && $1 != "nobody") print $1}' /etc/passwd
Usernames must start with a letter and can’t start with a number, as I understand it, so that should work fairly well.
Just delete blank lines:
perl -pi -e 's/\A\s+\z//' usernames.txt
With commas too:
perl -pi -e 's/,\s*/\n/g;s/\A\s+\z//' usernames.txt
import sys

for line in sys.stdin:
    usernames = [name.strip() for name in line.split(",")]
    if not any(usernames):
        continue
    for username in usernames:
        print username
Readability – 1
The way of the Pathologically Eclectic Rubbish Lister – 0
#!/usr/bin/perl
open(FILE, "usernames.txt") || die "Could not open usernames.txt: $!\n";
foreach (<FILE>) {if ($_ =~ /\w+/) {print "$_"}}
close(FILE);
Of course you’d have to be crazy to use perl for something that easy in grep/awk!!
awk -v RS="[\n ,]+" '{print}' < u.txt
Late to the challenge, but here’s the regex I whipped out off the top of my head:
s/\n\W*\n/\n/g
My perl code slurps the entire file to a variable and regexes that variable, then splits on newline to an array.
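For anyone curious what that slurp-and-split approach can look like in practice, here is a rough Perl sketch (my own illustration of the idea, not the commenter’s actual code):

#!/usr/bin/perl
use strict;
use warnings;

# Slurp the whole file into one scalar, collapse blank lines with a
# single regex, then split on newlines to build the array.
open(my $fh, '<', 'usernames.txt') or die "Could not open usernames.txt: $!\n";
my $text = do { local $/; <$fh> };          # slurp mode: read everything at once
close($fh);

$text =~ s/\n\s*\n/\n/g;                    # squeeze out blank and whitespace-only lines
my @usernames = grep { /\S/ } split /\n/, $text;   # grep guards against a leading blank line

print "$_\n" for @usernames;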
emacs22
C-M-%
C-q C-j C-q C-j +
C-q C-j
and it will then ask you “[y/n]?”
I second “grep .” as the easiest solution.
Perl/Ruby shouldn’t be too complicated either:
perl -lane 'print if /./'
ruby -lane 'print if /./'
To filter out whitespace-only lines (not just empty ones):
perl -lane 'print if /\S/'
ruby -lane 'print if /\S/'