Can You Improve This Command Line Magic?

Today I needed to come up with some more command line magic. You might remember the post I did about digging out of holes with some command line magic. Today’s goal was to do some math, or more specifically, to find the average of numbers stored in a text file. I asked around in IRC for some solutions and one user (the genius coder) came up with something in about two minutes. Can you improve this or do it in another language? Bash? Python? C? I’d like to see other implementations if you’d like to take the challenge. It’s not much, but it gets your mind going…

Here is a link to an example file I’m sourcing from. Not the exact same, but you get the idea. Basically taking quarterly data and averaging it. (Let’s not get into a discussion on a better way to store this data in the first place, I’m just looking at clean ways to average the data.)

First submission here:

cat quarterly.txt | grep Q3 | cut -d "=" -f2 | perl -e '$a=0;$b=0;while(<>){$a++;$b+=$_;}print $b/$a."\n";'

I’m sure some of you Perl mongers can play some golf with this. Who else wants to try?

19 thoughts on “Can You Improve This Command Line Magic?”

Ubuntu Tutorials 2007/08/13

First improvement. Can you beat this?

awk -F\= '/Q3/ {a++;b+=$2}END{print b/a}' quarterly*

Martijn van de Streek 2007/08/13

Congratulations!

You’ve won the “Useless Use of ‘cat’ Award”! 🙂

Try:
grep Q3 < quarterly.txt | cut -d "=" -f2 | perl -e '$a=0;$b=0;while(<>){$a++;$b+=$_;}print $b/$a."\n";'

Martijn van de Streek 2007/08/13

(argh.. should have used <)

matt harrison 2007/08/13

Not sure if it is an improvement (7 lines longer). (But much more readable IMHO). (Also note that I couldn’t get your sample file, 500 error).

import sys

count = 0
sum_items = 0
for line in open(sys.argv[1]):
    if "Q3" in line:
        sum_items += float(line.split("=")[-1])
        count += 1
print(sum_items / count)

matt harrison 2007/08/13

I apologize for the spacing, your comment mechanism isn’t python friendly (and doesn’t have a preview so I’m not going to spend time messing with html tags)

vidakris 2007/08/13

I would definitely use an ancient and very handy package named unixstat, in the following way:

dm s1

vidakris 2007/08/13

The previous post had some problems, so let’s try it again:

cat datafile | dm s1 | stats mean

Hans Fugal 2007/08/13

It’s not command-line golf, but it’s an improvement:

#!/usr/bin/ruby
quarters = {}
ARGF.each_line do |l|
  q,n = l.strip.split("=")
  (quarters[q] ||= []).push n.to_i
end
quarters.each do |k,v|
  avg = v.inject(0) {|sum,n| sum + n} / v.size.to_f
  puts "#{k}: #{avg}"
end

Hans Fugal 2007/08/13

And here’s some golf, though not nearly as impressive as the awk example:

grep Q3 | ruby -nla -F'=' -e 'x||=0;n||=0.0;x+=$F[1].to_i;n+=1;p x/n' | tail -1

Pat 2007/08/13

perl -naF'=' -e '++$c{$F[0]} and $s{$F[0]}+=$F[1];END{print $_.":".$s{$_}/$c{$_},"\n" foreach keys %c}' quarterly.txt

Bonus, prints average for all quarters 🙂
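
For comparison, here is a minimal Python sketch of the same per-quarter grouping. It assumes each line of the file looks like Q3=42; the sample file isn't reproduced in this thread, so that format and the function name quarterly_averages are assumptions rather than anything the original poster confirmed.

```python
from collections import defaultdict


def quarterly_averages(lines):
    """Return {quarter: mean} for lines shaped like 'Q3=42' (assumed format)."""
    totals = defaultdict(float)  # running sum per quarter
    counts = defaultdict(int)    # number of values seen per quarter
    for line in lines:
        key, sep, value = line.strip().partition("=")
        if not sep:
            continue  # skip blank lines and lines without an '='
        totals[key] += float(value)
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}
```

Something like quarterly_averages(open("quarterly.txt")) would then give one mean per quarter, since an open file iterates line by line.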

matt harrison 2007/08/13

I guess I’m interested in the long-term aspects of this “command line wizardry”. Is it a run-once command? Something that you run often? Something that others will need to run? Does it actually need to be a “command line” command or is a program sufficient? (I’m assuming the latter, since you asked for C?!? examples)

Every time I try to learn more than the basic sed or awk commands I come back to just writing the thing in Python. It’s more portable (i.e. should run on Windows), easier for my brain to grok, actually readable, and takes less time. Rather than searching through many examples of trying to do it in one line, I can just pound out a 7-line solution in a minute that I am confident about.

flame on
BTW, I think from all of the other contributed code, the python version is much more readable….. I’m not convinced as to why I wouldn’t use python here….
flame off

Stuart Langridge 2007/08/13

python -c "import sys; x=[float(x[3:]) for x in sys.stdin if x.startswith('Q3=')]; print(sum(x)/len(x))"

Carsten 2007/08/13

perl -ane 'next unless /Q3=(\d+)/; $a+=$1; $b++;END{print $a/$b,"\n"}' quarterly.txt

It’s not nice, but easy to read.

Tommaso 2007/08/14

Just another way: it uses bash arithmetic evaluation. It’s a pity bash is float-dumb.

echo "scale=10;" $[ `grep Q3 quarterly.txt | sed -e 's/Q3=/+/'` ]/`grep Q3 quarterly.txt | wc -l` | bc

while this leaves all the computation to bc:

echo "scale=10;" \(0 `grep Q3 quarterly.txt | sed -e 's/Q3=/+/'`\)/`grep Q3 quarterly.txt |wc -l`|bc

Coucouf 2007/08/14

Weren’t you talking about C ?
Here’s one try that gives the mean for each quarter.

#include <stdio.h>

int main(int argc, char* argv[])
{
    char xBuffer[16];
    int xSum[4] = {0, 0, 0, 0};
    int xNumElements[4] = {0, 0, 0, 0};
    FILE* xFile = fopen("quarterly.txt", "r");
    while(fgets(xBuffer, 15, xFile) != NULL)
    {
        int xKey, xValue;
        sscanf(xBuffer, "Q%d=%d", &xKey, &xValue);
        xSum[xKey-1] += xValue;
        xNumElements[xKey-1]++;
    }
    int i;
    for(i=0; i

Coucouf 2007/08/14

OK, the

Coucouf 2007/08/14

Oops, sorry but the lesser than sign doesn’t work in comments. 🙁
Let’s do it with text…

for(i=0; i lesser_than 4; i++)
printf(“Mean for Q%d is %f\n”, i+1, xSum[i]/(float)xNumElements[i]);
}

marvin 2007/08/14

I’d prefer awk. Much faster than Perl.
The first version has a drawback, though. It accumulates floating-point error along the way, so if you use it on a very large dataset you might get an imprecise result.
See Numerical Recipes in C:
http://www.nrbook.com/a/bookcpdf/c14-1.pdf

So my suggestion:
$ awk -F\= 'BEGIN {a=0.0}
> /Q1/ {a+=($2-a)/NR}
> END {print a}' quarterly.txt

Also uses less memory, one variable less 😉

marvin 2007/08/14

This works too, all arithmetic is floating point in awk.

awk -F\= '/Q1/ {a+=($2-a)/NR} END {print a}' quarterly.txt
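
The running-mean update above can also be sketched in Python. One assumption is worth flagging: this version divides by a count of matching lines only, whereas awk's NR counts every line read, so the two agree only when the file contains nothing but the quarter being averaged. The function name running_mean and the Q1=42 line format are illustrative, not taken from the original file.

```python
def running_mean(lines, quarter="Q1"):
    """Incrementally average the values on lines like 'Q1=42' (assumed format)."""
    mean = 0.0
    seen = 0  # counts only the lines for the requested quarter
    prefix = quarter + "="
    for line in lines:
        line = line.strip()
        if not line.startswith(prefix):
            continue
        seen += 1
        # Move the mean toward the new value by 1/seen of the gap,
        # so no large running sum is ever stored.
        mean += (float(line[len(prefix):]) - mean) / seen
    return mean
```

With no matching lines the function returns 0.0 rather than dividing by zero, which is a design choice of this sketch and not something the awk version does.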
