Can You Improve This Command Line Magic?

By | 2007/08/13

Today I needed to come up with some more command line magic. You might remember the post I did about digging out of holes with some command line magic. Today’s goal was to do some math, or more specifically, to find an average of numbers from data in a text file. I asked around in IRC for some solutions and one user (the genius coder) came up with something in about two minutes. Can you improve this or do it with another language? Bash? Python? C? I’d like to see other implementations if you’d like to take the challenge. It’s not much but gets your mind going…

Here is a link to an example file I’m sourcing from. Not the exact same, but you get the idea. Basically taking quarterly data and averaging it. (Let’s not get into a discussion on a better way to store this data in the first place, I’m just looking at clean ways to average the data.)

First submission here:

cat quarterly.txt | grep Q3 | cut -d "=" -f2 | perl -e '$a=0;$b=0;while(<>){$a++;$b+=$_;}print $b/$a."\n";'

I’m sure some of you Perl mongers can play some golf with this. Who else wants to try?
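For anyone who wants to see the steps spelled out, here is a rough Python sketch of what that pipeline does: keep the Q3 lines, take whatever follows the "=", and average it. It assumes every data line looks like `Q3=123`; the function name and the sample values are made up for illustration. One difference from `grep Q3`: the sketch compares the key exactly instead of matching "Q3" anywhere on the line.

```python
def average_for_quarter(lines, quarter="Q3"):
    """Average the numbers on lines of the assumed form 'Q3=123'."""
    values = []
    for line in lines:
        # Split once on '=', like `cut -d "=" -f2` in the shell version.
        key, sep, value = line.strip().partition("=")
        if sep and key == quarter:
            values.append(float(value))
    if not values:
        raise ValueError("no lines found for " + quarter)
    return sum(values) / len(values)


# Hypothetical sample data, not the real quarterly.txt.
sample = ["Q1=10", "Q2=20", "Q3=30", "Q4=40",
          "Q1=12", "Q2=22", "Q3=34", "Q4=44"]
print(average_for_quarter(sample))
```

To run it against a real file, something like `average_for_quarter(open("quarterly.txt"))` should do, since the function only needs an iterable of lines.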

19 thoughts on “Can You Improve This Command Line Magic?”

  1. Martijn van de Streek

    Congratulations!

    You’ve won the “Useless Use of ‘cat’ Award”! 🙂

    Try:

    grep Q3 <quarterly.txt | cut -d "=" -f2 | perl -e '$a=0;$b=0;while(<>){$a++;$b+=$_;}print $b/$a."\n";'

  2. matt harrison

    Not sure if it is an improvement (7 lines longer). (But much more readable IMHO). (Also note that I couldn’t get your sample file, 500 error).

    import sys

    count = 0
    sum_items = 0
    for line in open(sys.argv[1]):
        if "Q3" in line:
            sum_items += float(line.split("=")[-1])
            count += 1
    print(sum_items/count)

  3. matt harrison

    I apologize for the spacing, your comment mechanism isn’t python friendly (and doesn’t have a preview so I’m not going to spend time messing with html tags)

  4. vidakris

    I would definitely use an ancient and very handy package named unixstat, the following way:

    dm s1

  5. vidakris

    The previous post had some problems, so let’s try it again:

    cat datafile | dm s1 | stats mean

  6. Hans Fugal

    It’s not command-line golf, but it’s an improvement:

    #!/usr/bin/ruby
    quarters = {}
    ARGF.each_line do |l|
      q,n = l.strip.split("=")
      (quarters[q] ||= []).push n.to_i
    end
    quarters.each do |k,v|
      avg = v.inject(0) {|sum,n| sum + n} / v.size.to_f
      puts "#{k}: #{avg}"
    end

  7. Hans Fugal

    And here’s some golf, though not nearly as impressive as the awk example:

    grep Q3 | ruby -nla -F'=' -e 'x||=0;n||=0.0;x+=$F[1].to_i;n+=1;p x/n' | tail -1

  8. Pat

    perl -naF'=' -e '++$c{$F[0]} and $s{$F[0]}+=$F[1];END{print $_.":".$s{$_}/$c{$_},"\n" foreach keys %c}' quarterly.txt

    Bonus, prints average for all quarters 🙂
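The same all-quarters idea can be sketched in Python with two dictionaries, one for the running total and one for the count per key, mirroring `%s` and `%c` in the Perl version. This is only an illustration: the function name and sample data are hypothetical, and it assumes each line has the form `Qn=number`.

```python
from collections import defaultdict


def averages_by_quarter(lines):
    """Return {quarter: mean} for lines of the assumed form 'Q1=10'."""
    totals = defaultdict(float)  # sum of values per quarter
    counts = defaultdict(int)    # number of values per quarter
    for line in lines:
        key, sep, value = line.strip().partition("=")
        if not sep:
            continue  # skip blank lines or lines without '='
        totals[key] += float(value)
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in counts}


# Hypothetical sample data, not the real quarterly.txt.
sample = ["Q1=10", "Q2=20", "Q3=30", "Q4=40",
          "Q1=12", "Q2=22", "Q3=34", "Q4=44"]
for quarter, mean in sorted(averages_by_quarter(sample).items()):
    print("%s:%s" % (quarter, mean))
```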

  9. matt harrison

    I guess I’m interested in the long-term aspects of this “command line wizardry”. Is it a run-once command? Something that you run often? Something that others will need to run? Does it actually need to be a “command line” command or is a program sufficient? (I’m assuming the latter, since you asked for C examples.)

    Every time I try to learn more than the basic sed or awk commands I come back to just writing the thing in Python. It’s more portable (i.e. it should run on Windows), easier for my brain to grok, actually readable, and takes less time. Rather than searching through many examples of trying to do it in one line, I can just pound out a 7-line solution in a minute that I am confident about.

    flame on
    BTW, I think from all of the other contributed code, the python version is much more readable….. I’m not convinced as to why I wouldn’t use python here….
    flame off

  10. Stuart Langridge

    python -c "import sys; x=[float(x[3:]) for x in sys.stdin if x.startswith('Q3=')]; print(sum(x)/len(x))"

  11. Carsten

    perl -ane 'next unless /Q3=(\d+)/; $a+=$1; $b++;END{print $a/$b,"\n"}' quarterly.txt

    It’s not nice, but easy to read.

  12. Tommaso

    Just another way. It uses bash arithmetic evaluation; it’s a pity bash can’t do floating point.

    echo "scale=10;" $[ `grep Q3 quarterly.txt | sed -e 's/Q3=/+/'` ]/`grep Q3 quarterly.txt | wc -l` | bc

    While this one leaves all the computation to bc:

    echo "scale=10;" \(0 `grep Q3 quarterly.txt | sed -e 's/Q3=/+/'`\)/`grep Q3 quarterly.txt |wc -l`|bc

  13. Coucouf

    Weren’t you talking about C ?
    Here’s one try that gives the mean for each quarter.

    #include <stdio.h>

    int main(int argc, char* argv[])
    {
        char xBuffer[16];
        int xSum[4] = {0, 0, 0, 0};
        int xNumElements[4] = {0, 0, 0, 0};
        FILE* xFile = fopen("quarterly.txt", "r");
        while(fgets(xBuffer, 15, xFile) != NULL)
        {
            int xKey, xValue;
            sscanf(xBuffer, "Q%d=%d", &xKey, &xValue);
            xSum[xKey-1] += xValue;
            xNumElements[xKey-1]++;
        }
        int i;
        for(i=0; i

  14. Coucouf

    Oops, sorry but the less-than sign doesn’t work in comments. 🙁
    Let’s do it with text…

        for(i=0; i lesser_than 4; i++)
            printf("Mean for Q%d is %f\n", i+1, xSum[i]/(float)xNumElements[i]);
    }

  15. marvin

    I’d prefer awk. Much faster than perl.
    The first version has a drawback, though: it accumulates floating-point error along the way, so if you use it on a very large dataset you might get an imprecise result.
    See Numerical Recipes in C:
    http://www.nrbook.com/a/bookcpdf/c14-1.pdf

    So my suggestion:
    $ awk -F\= 'BEGIN {a=0.0}
    > /Q1/ {a+=($2-a)/NR}
    > END {print a}' quarterly.txt

    Also uses less memory, one variable less 😉
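The incremental update used here, mean += (x - mean) / k, can be sketched in Python as follows. For the result to be the true mean, k has to be the number of matching values seen so far, so this sketch keeps its own counter for the selected quarter rather than using the overall line number. The function name and sample data are hypothetical, and lines are assumed to look like `Q1=10`.

```python
def running_mean_for_quarter(lines, quarter="Q1"):
    """Incrementally average values on lines of the assumed form 'Q1=10'."""
    mean = 0.0
    count = 0  # counts matching lines only, not every line read
    for line in lines:
        key, sep, value = line.strip().partition("=")
        if not sep or key != quarter:
            continue
        count += 1
        # Move the mean toward the new value by 1/count of the gap,
        # so no large running sum is ever stored.
        mean += (float(value) - mean) / count
    return mean


# Hypothetical sample data, not the real quarterly.txt.
sample = ["Q1=10", "Q2=20", "Q3=30", "Q4=40",
          "Q1=12", "Q2=22", "Q3=34", "Q4=44"]
print(running_mean_for_quarter(sample))
```

Note that an input with no matching lines returns 0.0 here, which a real script would probably want to report differently.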

  16. marvin

    This works too, all arithmetic is floating point in awk.


    awk -F\= '/Q1/ {a+=($2-a)/NR} END {print a}' quarterly.txt
