Regex Challenge

By | 2008/11/17

Over the weekend I started studying Perl and quickly realized I was going to be better off if I reviewed regular expressions before I got too far into things.  I went back and found my copy of “Mastering Regular Expressions” and dove right in.  Now, maybe it’s just me, but I find that I really enjoy the problem-solving aspect of regular expressions.  I thought it might be fun to put up a regular challenge on the blog that needs to be solved via regular expressions.

(Obligatory xkcd reference)

So, I figured I’d start off with one that caught me today.  Here is the situation:

You’ve got a regular text file filled with usernames.  You want to be able to read this file into a program to populate an array, but there are random blank lines throughout the file.  What regex would you use to find and remove all empty lines in the file?

For consistency’s sake, I’ve populated just such a file and made it available here.

Rules: Use any tool you want (perl, sed, vim, etc.).  The file must contain all original usernames (total of 15), one per line, with no blank lines start to finish.  Please share your solution in the comments!
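To make the goal concrete, here is a minimal Python sketch of the end result: the usernames in an array with the blank lines dropped. The function name `read_usernames` and the inline sample text are made up for illustration (the sample is not the real file), and this is only one of many ways to do it.

```python
import re

# Matches a line that is empty or contains only whitespace.
BLANK_LINE = re.compile(r"^\s*$")


def read_usernames(text):
    """Return the non-blank lines of text, with surrounding whitespace stripped."""
    return [line.strip() for line in text.splitlines() if not BLANK_LINE.match(line)]


# Hypothetical sample standing in for the contents of usernames.txt.
sample = "jstark\n\nbchilds\n   \ncedwards\n\n\narhodes\n"
usernames = read_usernames(sample)
print(usernames)
```

To run it against the real file, you would pass in `open("usernames.txt").read()` instead of the sample string.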

26 thoughts on “Regex Challenge”

  1. Joseph Hall

    sed -i '/^$/ d' usernames.txt

  2. David Owen

    grep . usernames.txt

    If the file has a chance of duplicates, I’d actually do:

    sort -u usernames.txt |grep .

  3. Derek Carter

    [dcarter@host ~]$ grep . usernames.txt
    jstark
    bchilds
    cedwards
    arhodes
    mrowley
    prhodes
    asmith
    bcorey
    hnakamura
    ppetrelli
    cbyrd
    jstrazzo
    sraver
    bgates
    ltorvalds

  4. eddie

    cat usernames.txt | sort | uniq | tail -n `cat usernames.txt | wc -w`

    just because I hate regexes and avoid them if I can.

  5. cert

    get-content C:\Scripts\usernames.txt | foreach {if ($_.length -ne 0) {write-host $_}}

  6. lucas

    xargs printf "%s\n" < usernames.txt

    (and this week’s Useless Use of Cat Award goes to…)

  7. phoenyx

    ruby -pe 'next if $_=~/^$/' < usernames.txt

  8. Mackenzie

    Ok, now suppose the usernames aren’t all on separate lines. Sometimes there are multiple usernames on one line, comma separated, maybe there’s a space after the comma, but maybe there’s not. Like this:

    jstark

    bchilds, cedwards,arhodes

    mrowley

    prhodes

    asmith
    bcorey

    hnakamura
    ppetrelli
    cbyrd

    jstrazzo
    sraver
    bgates

    ltorvalds

  9. ak

    Python:

    import re

    for x in re.sub("\n+", ":", open("usernames.txt").read().strip()).split(":"):
        print(x)

  10. dominiko

    sed '/^\s*$/d' usernames.txt

    perl -ne 'print unless (/^\s*$/)' usernames.txt

    perl -ne 'print if (/\S/)' usernames.txt

    egrep -v '^\s*$' usernames.txt

    awk '!/^\s*$/ { print }' usernames.txt

    vim -c 'sil g/^\s*$/d' usernames.txt

  11. Mark Drago

    Mackenzie:
    sed -e 's/,/\n/g' -e 's/[ \t]//g' -e '/^$/d' usernames.txt

    Converts commas to newlines, strips whitespace, then removes blanks.

    Mark.
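For comparison, the three steps described above (commas to newlines, strip whitespace, drop blanks) can be collapsed into a single regex split. This is a Python sketch rather than a translation of the sed command; the function name `split_usernames` and the sample text are assumptions for illustration.

```python
import re


def split_usernames(text):
    """Split on any run of commas and/or whitespace, then drop empty pieces."""
    # A run of separators (commas, spaces, tabs, newlines) counts as one break.
    return [name for name in re.split(r"[,\s]+", text) if name]


# Hypothetical sample in the mixed format from comment 8.
sample = "jstark\n\nbchilds, cedwards,arhodes\n\nmrowley\n"
print(split_usernames(sample))
```

The final filter matters because a leading or trailing separator leaves an empty string at the ends of the split result.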

  12. Jeff Schroeder

    Regex can be avoided in most cases and should be when possible.

    Try this:
    awk '/^[a-zA-Z]/{print $1}' usernames.txt

    Unlike just removing the newlines, that prevents cases where some crazy admin has commented out users or whatnot in /etc/passwd.

    If that was a standard passwd file you could do something more like:
    awk -F: '/^[a-zA-Z]/{if ($3 !~ 0) print $1}' /etc/passwd

    If it is an Ubuntu / Debian server where normal uids start at 1000, you could do this for a listing of usernames:
    awk -F: '/^[a-zA-Z]/{if ($3 >= 1000 && $1 != "nobody") print $1}' /etc/passwd

    From my understanding, usernames must start with a letter and can’t start with a number, so that should work fairly well.

  13. Kai

    Just delete blank lines:
    perl -pi -e 's/\A\s+\z//' usernames.txt
    With commas too:
    perl -pi -e 's/,\s*/\n/g;s/\A\s+\z//' usernames.txt

  14. adam-collard

    import sys

    for line in sys.stdin:
        usernames = [name.strip() for name in line.split(",")]
        if not any(usernames):
            continue
        for username in usernames:
            print(username)

    Readability – 1
    The way of the Pathologically Eclectic Rubbish Lister – 0

  15. meyer

    #!/usr/bin/perl
    open(FILE,"usernames.txt") || die "Could not open usernames.txt: $!\n";
    foreach () {if ($_ =~ /\w+/) {print "$_"}}
    close(FILE);

    Of course you’d have to be crazy to use perl for something that easy in grep/awk!!

  16. meyer

    boo that should be foreach \…

  17. Mike

    Hm… why do I have to post this three times? :-/

    awk -v RS="[\n ,]+" '{print}' < u.txt

  18. Wolfger

    Late to the challenge, but here’s the regex I whipped out off the top of my head:
    s/\n\W*\n/\n/g

    My perl code slurps the entire file to a variable and regexes that variable, then splits on newline to an array.
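The slurp, substitute, then split approach described above might look roughly like this in Python. This is only a sketch of the idea, not the Perl code from the comment; the function name `usernames_from_slurp` and the sample text are assumptions.

```python
import re


def usernames_from_slurp(text):
    """Collapse runs of blank lines in the whole text, then split on newlines."""
    # Same pattern as the substitution above: a newline, any non-word
    # characters (including further newlines), then another newline.
    collapsed = re.sub(r"\n\W*\n", "\n", text)
    # strip() drops a leading or trailing newline before splitting into a list.
    return collapsed.strip().split("\n")


# Hypothetical sample standing in for the slurped file contents.
sample = "jstark\n\nbchilds\n\n\ncedwards\n"
print(usernames_from_slurp(sample))
```

Because `\W` also matches newlines, a single substitution pass handles any number of consecutive blank lines.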

  19. Khalil Fazal

    emacs22

    C-M-%

    C-q C-j C-q C-j +

    C-q C-j

    and it will then ask you “[y/n]?”

  20. G N

    I second “grep .” as the easiest solution.

    Perl/Ruby shouldn’t be too complicated either:

    perl -lane 'print if /./'
    ruby -lane 'p if /./'

    To filter whitespace lines (not only blanks):

    perl -lane 'print if /\S/'
    ruby -lane 'print if /\S/'

Comments are closed.