Regex Challenge

Over the weekend I started studying Perl and quickly realized I would be better off reviewing regular expressions before I got too far into things. I went back, found my copy of “Mastering Regular Expressions” and dove right in. Now, maybe it’s just me, but I really enjoy the problem-solving aspect of regular expressions. I thought it might be fun to put up a recurring challenge on the blog that needs to be solved with regular expressions.

(Obligatory xkcd reference)

So, I figured I’d start off with one that caught me today. Here is the situation:

You’ve got a regular text file filled with usernames.  You want to be able to read this file into a program to populate an array, but there are random blank lines throughout the file.  What regex would you use to find and remove all empty lines in the file?

For consistency’s sake, I’ve populated just such a file and made it available here.

Rules: Use any tool you want (perl, sed, vim, etc.). The resulting file must contain all 15 original usernames, one per line, with no blank lines from start to finish. Please share your solution in the comments!
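
If you want to check a solution against these rules, a small script can do it. The sketch below is only one way to go about it: the function name `check_usernames`, the sample data, and the assumption that the cleaned output is already in a string are all made up for illustration and are not part of the challenge.

```python
def check_usernames(text, expected_count=15):
    """Return True if text has exactly expected_count lines and none are blank."""
    lines = text.splitlines()
    # An empty or whitespace-only line breaks the "no blank lines" rule.
    if any(not line.strip() for line in lines):
        return False
    return len(lines) == expected_count


# Hypothetical sample data standing in for the cleaned file contents.
cleaned = "\n".join(f"user{i}" for i in range(15)) + "\n"
print(check_usernames(cleaned))        # a clean 15-line file
print(check_usernames("a\n\nb\n", 2))  # a blank line in the middle
```

It says nothing about how the blank lines were removed, so it does not spoil the puzzle.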

  1. Joseph Hall
    November 17th, 2008 at 18:15 | #1

    sed -i '/^$/ d' usernames.txt

  2. seb
    November 17th, 2008 at 18:29 | #2

    I came across this little introduction lately. As a non-geek, it reads … interesting…
    http://phi.lho.free.fr/programming/RETutorial.en.html

  3. November 17th, 2008 at 18:33 | #3

    egrep "\w" usernames.txt

  4. David Owen
    November 17th, 2008 at 18:50 | #4

    grep . usernames.txt

    If the file has a chance of duplicates, I’d actually do:

    sort -u usernames.txt |grep .

  5. lebinh
    November 17th, 2008 at 18:57 | #5

    VIM:
    :g/^\s*$/d

  6. November 17th, 2008 at 19:11 | #6

    [dcarter@host ~]$ grep . usernames.txt
    jstark
    bchilds
    cedwards
    arhodes
    mrowley
    prhodes
    asmith
    bcorey
    hnakamura
    ppetrelli
    cbyrd
    jstrazzo
    sraver
    bgates
    ltorvalds

  7. Josh
    November 17th, 2008 at 19:18 | #7

    grep -v '^$'

  8. eddie
    November 17th, 2008 at 19:44 | #8

    cat usernames.txt | sort | uniq | tail -n `cat usernames.txt | wc -w`

    just because I hate regexes and avoid them if I can.

  9. cert
    November 17th, 2008 at 20:06 | #9

    get-content C:\Scripts\usernames.txt | foreach {if ($_.length -ne 0) {write-host $_}}

  10. lucas
    November 17th, 2008 at 20:45 | #10

    xargs printf "%s\n" < usernames.txt

    (and this week’s Useless Use of Cat Award goes to…)

  11. phoenyx
    November 17th, 2008 at 20:56 | #11

    ruby -pe 'next if $_=~/^$/' < usernames.txt

  12. November 17th, 2008 at 20:57 | #12

    Ok, now suppose the usernames aren’t all on separate lines. Sometimes there are multiple usernames on one line, comma separated, maybe there’s a space after the comma, but maybe there’s not. Like this:

    jstark

    bchilds, cedwards,arhodes

    mrowley

    prhodes

    asmith
    bcorey

    hnakamura
    ppetrelli
    cbyrd

    jstrazzo
    sraver
    bgates

    ltorvalds

  13. November 17th, 2008 at 21:02 | #13

    Oh, my answer for the one I just suggested is at http://student.seas.gwu.edu/~mac/files/regex.html

  14. November 17th, 2008 at 21:22 | #14

    Python:

    import re

    for x in re.sub("\n+", ":", open("usernames.txt").read().strip()).split(":"):
        print(x)

  15. November 17th, 2008 at 23:49 | #15

    sed '/^\s*$/d' usernames.txt

    perl -ne 'print unless (/^\s*$/)' usernames.txt

    perl -ne 'print if (/\S/)' usernames.txt

    egrep -v '^\s*$' usernames.txt

    awk '!/^\s*$/ { print }' usernames.txt

    vim -c 'sil g/^\s*$/d' usernames.txt

  16. Keith
    November 18th, 2008 at 01:58 | #16

    Although it’s aimed at sed, I’ve always found this a useful source of regex solutions: http://sed.sourceforge.net/sed1line.txt

  17. November 18th, 2008 at 05:52 | #17

    Mackenzie:
    sed -e 's/,/\n/g' -e 's/[ \t]//g' -e '/^$/d' usernames.txt

    Converts commas to newlines, strips whitespace, then removes blanks.

    Mark.
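
The three steps Mark describes (commas to newlines, strip whitespace, drop blanks) can also be collapsed into a single split on runs of commas and whitespace. Here is a minimal Python sketch of that variation, assuming the file contents are already in a string. The name `split_usernames` and the sample text are made up for illustration.

```python
import re


def split_usernames(text):
    """Split on any run of commas and/or whitespace and drop empty pieces."""
    # [,\s]+ treats ",", ", ", "\n\n" and similar runs as a single separator.
    return [name for name in re.split(r"[,\s]+", text) if name]


sample = "jstark\n\nbchilds, cedwards,arhodes\n\nmrowley\n"
print(split_usernames(sample))
```

Note that this treats any whitespace as a separator, so it only suits usernames that contain no spaces themselves.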

  18. November 18th, 2008 at 06:53 | #18

    Regex can be avoided in most cases and should be when possible.

    Try this:
    awk '/^[a-zA-Z]/{print $1}' usernames.txt

    Unlike just removing the newlines, that prevents cases where some crazy admin has commented out users or whatnot in /etc/passwd.

    If that was a standard passwd file you could do something more like:
    awk -F: '/^[a-zA-Z]/{if ($3 != 0) print $1}' /etc/passwd

    If it is an Ubuntu/Debian server, where normal uids start at 1000, you could do this for a listing of usernames:
    awk -F: '/^[a-zA-Z]/{if ($3 >= 1000 && $1 != "nobody") print $1}' /etc/passwd

    From my understanding, usernames must start with a letter and can’t start with a number, so that should work fairly well.

  19. Kai
    November 18th, 2008 at 08:18 | #19

    Just delete blank lines:
    perl -pi -e 's/\A\s+\z//' usernames.txt
    With commas too:
    perl -pi -e 's/,\s*/\n/g;s/\A\s+\z//' usernames.txt

  20. November 18th, 2008 at 14:13 | #20

    import sys

    for line in sys.stdin:
        usernames = [name.strip() for name in line.split(",")]
        if not any(usernames):
            continue
        for username in usernames:
            print(username)

    Readability – 1
    The way of the Pathologically Eclectic Rubbish Lister – 0

  21. meyer
    November 18th, 2008 at 15:04 | #21

    #!/usr/bin/perl
    open(FILE,"usernames.txt") || die "Could not open usernames.txt: $!\n";
    foreach () {if ($_ =~ /\w+/) {print "$_"}}
    close(FILE);

    Of course you’d have to be crazy to use perl for something that easy in grep/awk!!

  22. meyer
    November 18th, 2008 at 15:05 | #22

    Boo, that should be foreach (<FILE>) …

  23. November 19th, 2008 at 06:27 | #23

    Hm… why do I have to post this three times? :-/

    awk -v RS="[\n ,]+" '{print}' < u.txt

  24. November 20th, 2008 at 08:35 | #24

    Late to the challenge, but here’s the regex I whipped out off the top of my head:
    s/\n\W*\n/\n/g

    My perl code slurps the entire file to a variable and regexes that variable, then splits on newline to an array.
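
The slurp-then-split approach described above can be sketched in Python as follows. This is an illustration under assumptions, not the commenter's Perl: the name `slurp_and_split` and the sample text are made up, and the `strip()` call is an addition to handle blank lines at the very start or end of the file, which the substitution alone would leave behind.

```python
import re


def slurp_and_split(text):
    """Collapse blank-line runs in the whole string, then split into a list."""
    # Same pattern as the comment: a newline, any non-word characters, a newline.
    collapsed = re.sub(r"\n\W*\n", "\n", text.strip())
    return collapsed.split("\n") if collapsed else []


sample = "\njstark\n\n\nbchilds\n   \ncedwards\n"
print(slurp_and_split(sample))
```

One thing to keep in mind: `\W` also matches punctuation, so a line made up only of characters such as `!!!` would be removed along with the blank ones.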

  25. Khalil Fazal
    December 4th, 2008 at 19:22 | #25

    emacs22

    C-M-%

    C-q C-j C-q C-j +

    C-q C-j

    and it will then ask you “[y/n]?”

  26. January 2nd, 2009 at 09:46 | #26

    I second “grep .” as the easiest solution.

    Perl/Ruby shouldn’t be too complicated either:

    perl -lane 'print if /./'
    ruby -lane 'print if /./'

    To filter out whitespace-only lines (not only empty ones):

    perl -lane 'print if /\S/'
    ruby -lane 'print if /\S/'
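
The two pairs of one-liners above differ in whether a line containing only spaces or tabs counts as blank: `/./` keeps such a line, while `/\S/` drops it. Here is a short Python sketch of the same distinction, with made-up sample data and illustrative function names:

```python
import re


def keep_nonempty(lines):
    """Keep lines with at least one character of any kind (like `grep .`)."""
    return [line for line in lines if re.search(r".", line)]


def keep_nonblank(lines):
    """Keep lines with at least one non-whitespace character."""
    return [line for line in lines if re.search(r"\S", line)]


# Four lines: a name, an empty line, a spaces-only line, another name.
lines = "jstark\n\n   \nbchilds\n".splitlines()
print(keep_nonempty(lines))  # the spaces-only line survives
print(keep_nonblank(lines))  # only the two names remain
```

Which of the two you want depends on how the file was produced. If stray spaces are possible, the `\S` test is the safer choice.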