Since the post covered multi-line AWK in bash scripts, I took some
liberties with the formatting.
-------------------------------------------------
* calculate the average closing price, grouped per year
cat netflix.csv | awk -F'[,-]' '
  {count[$1]++; sum[$1]+=$7}
  END {
    for(year in sum) {
      print year, sum[year]/count[year]
    }
  }
' | sort -n | sed 1d
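By keeping the numerator (sum) and denominator (count) of the average in separate arrays keyed by year, we can calculate each average in the END block.
Assuming the file starts with a header row, that row ends up as a non-numeric key, which sort -n places first and sed 1d then drops.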
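To make the field numbering concrete, here is a minimal sketch on a made-up sample. It assumes the column layout Date,Open,High,Low,Close,Volume,Adj Close (an assumption, the real file may differ), so that splitting on both commas and hyphens puts the year in $1 and the close in $7. It skips the header with NR > 1 rather than the sort/sed trick, which is a swapped-in alternative and not what the one-liner above does.

```shell
# Hypothetical sample in the assumed column layout.
csv='Date,Open,High,Low,Close,Volume,Adj Close
2015-01-02,10,12,9,10,100,10
2015-01-05,10,12,9,20,200,20
2016-01-04,10,12,9,30,300,30'

averages=$(printf '%s\n' "$csv" | awk -F'[,-]' '
  NR > 1 { count[$1]++; sum[$1] += $7 }   # skip header; $1 = year, $7 = close
  END {
    for (year in sum) {
      printf "%s %.2f\n", year, sum[year] / count[year]
    }
  }
' | sort -n)

printf '%s\n' "$averages"
```

The for-in loop visits keys in no particular order, which is why the output still goes through sort -n.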
* calculate the max closing price, grouped per month
cat netflix.csv | awk -F'[,-]' '
  {month = $1 "-" $2}
  $7 > max[month] {max[month] = $7}
  END {
    for(month in max) {
      print month, max[month]
    }
  }
' | sort -n | sed 1d
If we are interested in each month of each year (as opposed to "all Aprils"), we can
concatenate the year ($1) and the month ($2) as a new variable called month.
If the closing price ($7) is greater than the current max, we have a new max.
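For the "all Aprils" reading, a rough sketch would key the array on the month ($2) alone. The sample rows and the column layout (Date,Open,High,Low,Close,Volume,Adj Close) are assumptions for illustration. The "in" check and the "+ 0" are extras not in the original: they make the first value for a key always count and force a numeric comparison.

```shell
# Hypothetical sample in the assumed column layout.
csv='Date,Open,High,Low,Close,Volume,Adj Close
2015-04-01,10,12,9,40,100,40
2016-04-01,10,12,9,55,200,55
2016-05-02,10,12,9,30,300,30'

monthly_max=$(printf '%s\n' "$csv" | awk -F'[,-]' '
  NR == 1 { next }                        # skip the header row
  # $2 = month, $7 = close; keep the largest close seen for each month
  !($2 in max) || $7 + 0 > max[$2] { max[$2] = $7 + 0 }
  END {
    for (m in max) {
      print m, max[m]
    }
  }
' | sort -n)

printf '%s\n' "$monthly_max"
```

Here both April rows collapse into a single "04" entry, regardless of year.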
* calculate the median volume, in 2015
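cat netflix.csv | awk -F'[,-]' '
  $1 == 2015 {volume[i++] = $8}
  END {
    asort(volume)
    print volume[int((i + 1) / 2)]
  }
'
We keep appending values to volume by incrementing the index i. asort(...) is a gawk extension; it sorts the array by value (numerically here, since the volumes are numeric fields) and renumbers it from 1 to i.
The median, for all practical purposes, is found in the middle, at int((i + 1) / 2). The int() matters: with an odd count, a plain i/2 is a fractional index that selects an empty element.
You can find Q1 and Q3 near int(i/4) and int(i*3/4), respectively.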
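Since asort is specific to gawk, here is a hedged sketch of a portable variant that sorts with sort -n outside of awk and then picks positions from the sorted stream. The sample rows and the column layout (Date,Open,High,Low,Close,Volume,Adj Close) are assumptions, rank() is a helper made up for this sketch, and the positions use the nearest-rank rule (ceil of n/4, n/2, 3n/4), which is one of several quartile conventions.

```shell
# Hypothetical sample; after splitting on [,-], $1 = year and $8 = volume.
csv='Date,Open,High,Low,Close,Volume,Adj Close
2015-01-02,10,12,9,10,500,10
2015-01-05,10,12,9,20,100,20
2015-01-06,10,12,9,20,300,20
2015-01-07,10,12,9,20,200,20
2015-01-08,10,12,9,20,400,20
2016-01-04,10,12,9,30,900,30'

# Stage 1 extracts the 2015 volumes, stage 2 sorts them numerically,
# stage 3 loads the sorted values and prints the nearest-rank positions.
stats=$(printf '%s\n' "$csv" |
  awk -F'[,-]' '$1 == 2015 { print $8 }' |
  sort -n |
  awk '
    # rank() is a made-up helper: ceil(cnt * num / den) in integer arithmetic
    function rank(cnt, num, den) { return int((cnt * num + den - 1) / den) }
    { v[++n] = $1 }                 # v[1..n] arrives already sorted
    END {
      if (n == 0) exit              # no rows for that year
      print "Q1", v[rank(n, 1, 4)]
      print "median", v[rank(n, 1, 2)]
      print "Q3", v[rank(n, 3, 4)]
    }
  ')

printf '%s\n' "$stats"
```

With the five 2015 volumes sorted as 100, 200, 300, 400, 500, the positions are 2, 3 and 4.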