Today I needed to come up with some more command line magic. You might remember the post I did about digging out of holes with some command line magic. Today’s goal was to do some math, or more specifically, to find an average of numbers from data in a text file. I asked around in IRC for some solutions and one user (the genius coder) came up with something in about two minutes. Can you improve this or do it with another language? Bash? Python? C? I’d like to see other implementations if you’d like to take the challenge. It’s not much but gets your mind going…
Here is a link to an example file I’m sourcing from. Not the exact same, but you get the idea. Basically taking quarterly data and averaging it. (Let’s not get into a discussion on a better way to store this data in the first place, I’m just looking at clean ways to average the data.)
First submission here:
cat quarterly.txt | grep Q3 | cut -d "=" -f2 | perl -e '$a=0;$b=0;while(<>){$a++;$b+=$_;}print $b/$a."\n";'
I’m sure some of you Perl mongers can play some golf with this. Who else wants to try?
First improvement. Can you beat this?
awk -F\= '/Q3/ {a++;b+=$2}END{print b/a}' quarterly*
Congratulations!
You’ve won the “Useless Use of ‘cat’ Award”!
Try:
grep Q3 < quarterly.txt | cut -d "=" -f2 | perl -e '$a=0;$b=0;while(<>){$a++;$b+=$_;}print $b/$a."\n";'
(argh.. should have used <)
Not sure if it is an improvement (7 lines longer). (But much more readable IMHO). (Also note that I couldn’t get your sample file, 500 error).
import sys

count = 0
sum_items = 0
# Average the value after "=" on every line that mentions Q3
for line in open(sys.argv[1]):
    if "Q3" in line:
        sum_items += float(line.split("=")[-1])
        count += 1
print(sum_items / count)
I apologize for the spacing, your comment mechanism isn’t python friendly (and doesn’t have a preview so I’m not going to spend time messing with html tags)
I would definitely use an ancient and very handy package named unixstat, in the following way:
dm s1
The previous post had some problems, so let’s try it again:
cat datafile | dm s1 | stats mean
It’s not command-line golf, but it’s an improvement:
#!/usr/bin/ruby
quarters = {}
ARGF.each_line do |l|
  q,n = l.strip.split("=")
  (quarters[q] ||= []).push n.to_i
end
quarters.each do |k,v|
  avg = v.inject(0) {|sum,n| sum + n} / v.size.to_f
  puts "#{k}: #{avg}"
end
And here’s some golf, though not nearly as impressive as the awk example:
grep Q3 | ruby -nla -F'=' -e 'x||=0;n||=0.0;x+=$F[1].to_i;n+=1;p x/n' | tail -1
perl -naF'=' -e '++$c{$F[0]} and $s{$F[0]}+=$F[1];END{print $_.":".$s{$_}/$c{$_},"\n" foreach keys %c}' quarterly.txt
Bonus: prints the average for all quarters.
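For anyone who wants the same per-quarter grouping in Python, here is a minimal sketch. It assumes every data line looks like `Q3=42` (that shape is inferred from the one-liners in this thread, since the sample file isn't reproduced here), and the name `quarter_means` and the sample values are made up for illustration.

```python
from collections import defaultdict


def quarter_means(lines):
    """Return {quarter: mean} for lines shaped like 'Q3=42' (assumed format)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for line in lines:
        key, sep, value = line.strip().partition("=")
        if not sep:
            continue  # skip blank lines and anything without an '='
        totals[key] += float(value)
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}


if __name__ == "__main__":
    # Made-up sample data, not the file from the post.
    sample = ["Q1=10", "Q2=20", "Q3=30", "Q1=30", "Q3=50"]
    for quarter, mean in sorted(quarter_means(sample).items()):
        print("%s: %s" % (quarter, mean))
```

To run it against a real file you could pass `open("quarterly.txt")` instead of the sample list, since the function only needs an iterable of lines.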
I guess I’m interested in the long-term aspects of this “command line wizardry”. Is it a run-once command? Something that you run often? Something that others will need to run? Does it actually need to be a “command line” command or is a program sufficient? (I’m assuming the latter, since you asked for C examples?!)
Every time I try to learn more than the basic sed or awk commands I come back to just writing the thing in Python. It’s more portable (i.e., it should run on Windows), easier for my brain to grok, actually readable, and takes less time. Rather than searching through many examples of trying to do it in one line, I can just pound out a 7-line solution in a minute that I am confident about.
flame on
BTW, I think from all of the other contributed code, the python version is much more readable….. I’m not convinced as to why I wouldn’t use python here….
flame off
python -c "import sys; x=[float(x[3:]) for x in sys.stdin if x.startswith('Q3=')]; print(sum(x)/len(x))"
perl -ane 'next unless /Q3=(\d+)/; $a+=$1; $b++;END{print $a/$b,"\n"}' quarterly.txt
It’s not nice, but easy to read.
Just another way: this one uses bash arithmetic evaluation for the sum. It’s a pity bash can’t do floating point.
echo "scale=10;" $[ `grep Q3 quarterly.txt | sed -e 's/Q3=/+/'` ]/`grep Q3 quarterly.txt | wc -l` | bc
While this one leaves all the computation to bc:
echo "scale=10;" \(0 `grep Q3 quarterly.txt | sed -e 's/Q3=/+/'`\)/`grep Q3 quarterly.txt |wc -l`|bc
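As a rough point of comparison, here is a sketch of what the bc variant computes, written with Python's `decimal` module. It assumes that bc with `scale=10` cuts the quotient off after ten decimal places rather than rounding it, which is my understanding of bc's division; the function name `bc_style_mean` is made up for this example.

```python
from decimal import Decimal, ROUND_DOWN


def bc_style_mean(values, scale=10):
    """Mean of decimal strings, cut off after `scale` places (assumed bc behaviour)."""
    numbers = [Decimal(v) for v in values]
    if not numbers:
        raise ValueError("no values to average")
    exact = sum(numbers) / Decimal(len(numbers))
    # Decimal(1).scaleb(-scale) is 1E-10 for scale=10, i.e. ten decimal places.
    return exact.quantize(Decimal(1).scaleb(-scale), rounding=ROUND_DOWN)


if __name__ == "__main__":
    # Made-up values standing in for the numbers after "Q3=".
    print(bc_style_mean(["10", "20", "25"]))
```

Unlike the first bash version, this also accepts values with a fractional part, because nothing is forced through integer arithmetic.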
Weren’t you talking about C?
Here’s one try that gives the mean for each quarter.
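#include <stdio.h>
int main(int argc, char* argv[])
{
    char xBuffer[16];
    int xSum[4] = {0, 0, 0, 0};
    int xNumElements[4] = {0, 0, 0, 0};
    FILE* xFile = fopen("quarterly.txt", "r");
    if (xFile == NULL)
        return 1; /* could not open the data file */
    while(fgets(xBuffer, sizeof xBuffer, xFile) != NULL)
    {
        int xKey, xValue;
        /* skip lines that are not of the form Qn=value with n in 1..4 */
        if (sscanf(xBuffer, "Q%d=%d", &xKey, &xValue) != 2 || xKey < 1 || xKey > 4)
            continue;
        xSum[xKey-1] += xValue;
        xNumElements[xKey-1]++;
    }
    fclose(xFile);
    int i;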
for(i=0; i
OK, the
Oops, sorry but the lesser than sign doesn’t work in comments. š
Let’s do it with text…
    for(i=0; i lesser_than 4; i++)
        printf("Mean for Q%d is %f\n", i+1, xSum[i]/(float)xNumElements[i]);
}
I’d prefer awk. Much faster than perl..
The first version has a drawback, though. It accumulates floating-point error along the way, so if you use it on a very large dataset you might get an imprecise result.
See Numerical Recipes in C:
http://www.nrbook.com/a/bookcpdf/c14-1.pdf
So my suggestion:
$ awk -F\= 'BEGIN {a=0.0}
> /Q1/ {n++; a+=($2-a)/n}
> END {print a}' quarterly.txt
It still needs a counter (n here), since NR counts every line in the file rather than just the Q1 lines.
This works too, all arithmetic is floating point in awk.
awk -F\= '/Q1/ {a+=($2-a)/++n} END {print a}' quarterly.txt
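The incremental update used in these awk versions can be spelled out in Python as below. This is only a sketch: it keeps its own counter of matching values rather than relying on the record number, and the names `running_mean` and `quarter_running_mean`, along with the `Q1=10` line shape, are assumptions made for illustration.

```python
def running_mean(values):
    """Update the mean one value at a time instead of keeping a running sum."""
    mean = 0.0
    count = 0
    for value in values:
        count += 1
        # Move the mean toward the new value by 1/count of the difference.
        mean += (value - mean) / count
    return mean


def quarter_running_mean(lines, quarter="Q1"):
    """Running mean over lines shaped like 'Q1=10' (assumed format)."""
    prefix = quarter + "="
    values = (float(line.split("=", 1)[1]) for line in lines if line.startswith(prefix))
    return running_mean(values)


if __name__ == "__main__":
    # Made-up data with mixed quarters; only the Q1 lines are counted.
    print(quarter_running_mean(["Q1=10", "Q2=99", "Q1=30"]))
```

The intermediate value never grows much beyond the data itself, which is the point of the trick; whether that matters in practice depends on how large and how uneven the dataset is.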