Regex Challenge

Over the weekend I started studying Perl and quickly realized I would be better off reviewing regular expressions before I got too far into things.  I went back and found my copy of “Mastering Regular Expressions” and dove right in.  Now, maybe it’s just me, but I really enjoy the problem-solving aspect of regular expressions.  I thought it might be fun to post a recurring challenge on the blog, each one to be solved with regular expressions.

(Obligatory xkcd reference)

So, I figured I’d start off with one that caught me today.  Here is the situation:

You’ve got a regular text file filled with usernames.  You want to be able to read this file into a program to populate an array, but there are random blank lines throughout the file.  What regex would you use to find and remove all empty lines in the file?

For consistency’s sake, I’ve populated just such a file and made it available here.

Rules: Use any tool you want (perl, sed, vim, etc.).  The file must contain all original usernames (15 in total), one per line, with no blank lines from start to finish.  Please share your solution in the comments!
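
To make the goal concrete, here is a minimal Python sketch of one way to meet the rules. It is only an illustration, not the "official" answer: the function name `non_blank_lines` and the short sample string are made up for the example, and the real file would be read from disk instead.

```python
import re

# Matches any line that contains at least one non-whitespace character.
# re.MULTILINE makes ^ and $ anchor at every line, not just the whole string.
NON_BLANK = re.compile(r"^.*\S.*$", re.MULTILINE)

def non_blank_lines(text):
    """Return the non-blank lines of text, stripped of stray whitespace."""
    return [line.strip() for line in NON_BLANK.findall(text)]

# Hypothetical sample standing in for the contents of usernames.txt.
sample = "jstark\n\nbchilds\n  \t\ncedwards\n"
for name in non_blank_lines(sample):
    print(name)
```

The result is already a list, which is what you would want if the point is to populate an array in a program.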

Comments

26 Responses to “Regex Challenge”

  1. Joseph Hall on November 17th, 2008 6:15 pm

    sed -i '/^$/ d' usernames.txt

  2. seb on November 17th, 2008 6:29 pm

    I came across this little introduction lately. As a non-geek, it reads … interesting…
    http://phi.lho.free.fr/programming/RETutorial.en.html

  3. Indiangeek on November 17th, 2008 6:33 pm

    egrep "\w" usernames.txt

  4. David Owen on November 17th, 2008 6:50 pm

    grep . usernames.txt

    If the file has a chance of duplicates, I’d actually do:

    sort -u usernames.txt |grep .

  5. lebinh on November 17th, 2008 6:57 pm

    VIM:
    :g/^\s*$/d

  6. Derek Carter on November 17th, 2008 7:11 pm

    [dcarter@host ~]$ grep . usernames.txt
    jstark
    bchilds
    cedwards
    arhodes
    mrowley
    prhodes
    asmith
    bcorey
    hnakamura
    ppetrelli
    cbyrd
    jstrazzo
    sraver
    bgates
    ltorvalds

  7. Josh on November 17th, 2008 7:18 pm

    grep -v '^$'

  8. eddie on November 17th, 2008 7:44 pm

    cat usernames.txt | sort | uniq | tail -n `cat usernames.txt | wc -w`

    just because I hate regexes and avoid them if I can.

  9. cert on November 17th, 2008 8:06 pm

    get-content C:\Scripts\usernames.txt | foreach {if ($_.length -ne 0) {write-host $_}}

  10. lucas on November 17th, 2008 8:45 pm

    xargs printf "%s\n" < usernames.txt

    (and this week’s Useless Use of Cat Award goes to…)

  11. phoenyx on November 17th, 2008 8:56 pm

    ruby -pe 'next if $_=~/^$/' < usernames.txt

  12. Mackenzie on November 17th, 2008 8:57 pm

    Ok, now suppose the usernames aren’t all on separate lines. Sometimes there are multiple usernames on one line, comma separated, maybe there’s a space after the comma, but maybe there’s not. Like this:

    jstark

    bchilds, cedwards,arhodes

    mrowley

    prhodes

    asmith
    bcorey

    hnakamura
    ppetrelli
    cbyrd

    jstrazzo
    sraver
    bgates

    ltorvalds

  13. Mackenzie on November 17th, 2008 9:02 pm

    Oh, my answer for the one I just suggested is at http://student.seas.gwu.edu/~mac/files/regex.html

  14. ak on November 17th, 2008 9:22 pm

    Python:

    import re

    for x in re.sub("\n+", ":", open("usernames.txt").read().strip()).split(":"):
        print(x)

  15. dominiko on November 17th, 2008 11:49 pm

    sed '/^\s*$/d' usernames.txt

    perl -ne 'print unless (/^\s*$/)' usernames.txt

    perl -ne 'print if (/\S/)' usernames.txt

    egrep -v '^\s*$' usernames.txt

    awk '!/^\s*$/ { print }' usernames.txt

    vim -c 'sil g/^\s*$/d' usernames.txt

  16. Keith on November 18th, 2008 1:58 am

    Although it’s aimed at sed I’ve always found this a useful source of regex solutions : http://sed.sourceforge.net/sed1line.txt

  17. Mark Drago on November 18th, 2008 5:52 am

    Mackenzie:
    sed -e 's/,/\n/g' -e 's/[ \t]//g' -e '/^$/d' usernames.txt

    Converts commas to newlines, strips whitespace, then removes blanks.

    Mark.
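
For readers following along in Python, the same three steps (commas to newlines, strip whitespace, drop blanks) can be collapsed into a single split. This is a hedged sketch and not a line-for-line translation of the sed command: the function name `split_usernames` is invented here, and it assumes usernames never contain spaces or commas.

```python
import re

def split_usernames(text):
    """Split text into usernames on any run of commas and/or whitespace."""
    # [,\s]+ treats commas, spaces, tabs and newlines as one separator,
    # so "a, b,c" and blank lines are handled by the same pattern.
    # The filter drops the empty strings left at the start or end.
    return [name for name in re.split(r"[,\s]+", text) if name]

# A few lines from Mackenzie's sample above.
sample = "jstark\n\nbchilds, cedwards,arhodes\n\nmrowley\n"
print(split_usernames(sample))
```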

  18. Jeff Schroeder on November 18th, 2008 6:53 am

    Regex can be avoided in most cases and should be when possible.

    Try this:
    awk '/^[a-zA-Z]/{print $1}' usernames.txt

    Unlike just removing the newlines, that prevents cases where some crazy admin has commented out users or whatnot in /etc/passwd.

    If that was a standard passwd file you could do something more like:
    awk -F: '/^[a-zA-Z]/{if ($3 != 0) print $1}' /etc/passwd

    If it is an Ubuntu / Debian server where normal uids start at 1000, you could do this for a listing of usernames:
    awk -F: '/^[a-zA-Z]/{if ($3 >= 1000 && $1 != "nobody") print $1}' /etc/passwd

    From my understanding, usernames must start with a letter and can’t start with a number, so that should work fairly well.
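
The passwd filtering described above can also be sketched in Python. This is only an illustration under stated assumptions: the function name `regular_users`, the `min_uid` parameter, and the sample entries are all hypothetical, and the 1000 cutoff is the Debian/Ubuntu convention mentioned in the comment, not a universal rule.

```python
import re

def regular_users(passwd_text, min_uid=1000):
    """Return login names of ordinary users from passwd-format text."""
    users = []
    for line in passwd_text.splitlines():
        # Skip blank lines and anything not starting with a letter,
        # such as entries commented out with '#'.
        if not re.match(r"[a-zA-Z]", line):
            continue
        fields = line.split(":")
        # Field 1 is the name, field 3 the numeric uid.
        if len(fields) < 3 or not fields[2].isdigit():
            continue
        name, uid = fields[0], int(fields[2])
        if uid >= min_uid and name != "nobody":
            users.append(name)
    return users

# Made-up sample entries, not taken from any real system.
sample = """root:x:0:0:root:/root:/bin/bash
#olduser:x:1001:1001::/home/olduser:/bin/bash
nobody:x:65534:65534:nobody:/nonexistent:/usr/sbin/nologin

jstark:x:1000:1000::/home/jstark:/bin/bash
"""
print(regular_users(sample))
```

Comparing the uid as an integer avoids the trap of matching it as text, where a pattern like "0" would also hit uids such as 100 or 1000.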

  19. Kai on November 18th, 2008 8:18 am

    Just delete blank lines:
    perl -pi -e 's/\A\s+\z//' usernames.txt
    With commas too:
    perl -pi -e 's/,\s*/\n/g;s/\A\s+\z//' usernames.txt

  20. adam-collard on November 18th, 2008 2:13 pm

    import sys

    for line in sys.stdin:
        usernames = [name.strip() for name in line.split(",")]
        if not any(usernames):
            continue
        for username in usernames:
            print(username)

    Readability - 1
    The way of the Pathologically Eclectic Rubbish Lister - 0

  21. meyer on November 18th, 2008 3:04 pm

    #!/usr/bin/perl
    open(FILE,"usernames.txt") || die "Could not open usernames.txt: $!\n";
    foreach () {if ($_ =~ /\w+/) {print "$_"}}
    close(FILE);

    Of course you’d have to be crazy to use perl for something that easy in grep/awk!!

  22. meyer on November 18th, 2008 3:05 pm

    boo that should be foreach \…

  23. Mike on November 19th, 2008 6:27 am

    Hm… why do I have to post this three times? :-/

    awk -v RS="[\n ,]+" '{print}' < u.txt

  24. Wolfger on November 20th, 2008 8:35 am

    Late to the challenge, but here’s the regex I whipped out off the top of my head:
    s/\n\W*\n/\n/g

    My perl code slurps the entire file to a variable and regexes that variable, then splits on newline to an array.
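
The slurp-then-split approach described here looks roughly like this in Python. It is a sketch of the same idea, not Wolfger's actual Perl: the function name `slurp_and_split` is invented, and the `strip()` call is an addition so that leading or trailing blank lines do not leave empty entries in the list.

```python
import re

def slurp_and_split(text):
    """Collapse blank runs between lines, then split into a list."""
    # \n\W*\n matches a newline, any run of non-word characters
    # (further newlines, spaces, tabs), and a closing newline.
    # Caveat: \W also matches punctuation, so a line made up only of
    # punctuation would be swallowed as if it were blank.
    collapsed = re.sub(r"\n\W*\n", "\n", text.strip())
    return collapsed.split("\n")

# Hypothetical stand-in for the slurped file contents.
sample = "\njstark\n\nbchilds\n \n\ncedwards\n"
print(slurp_and_split(sample))
```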

  25. Khalil Fazal on December 4th, 2008 7:22 pm

    emacs22

    C-M-%

    C-q C-j C-q C-j +

    C-q C-j

    and it will then ask you “[y/n]?”

  26. G N on January 2nd, 2009 9:46 am

    I second “grep .” as the easiest solution.

    Perl/Ruby shouldn’t be too complicated either:

    perl -lane 'print if /./'
    ruby -lane 'print if /./'

    To filter whitespace-only lines too (not just empty ones):

    perl -lane 'print if /\S/'
    ruby -lane 'print if /\S/'
