Goal Docs
::
working with tables.
We show below an html highlighted version of the Goal file
found at
examples/purchases.goal
in the repos.
In the
original post,
the R solution is as follows:
purchases |>
group_by(country) |>
filter(amount <= median(amount) * 10) |>
summarize(total = sum(amount - discount))
Below are several possible translations to Goal, a couple
ones close to the R solution, and some others that
efficiently and flexibly perform grouping after computing
amount-discount,
which in R would’ve required replacing first the discount
column with the difference.
/ Translation to Goal of example from:
/ https://www.sumsar.net/blog/pandas-feels-clunky-when-coming-from-r/
/ Post pointed out by oantolin in the matrix chat.
(env"GOALLIB")or"GOALLIB"env"../lib"
import!"fmt math table"
pp:fmt.tbl[;0i-1;0i-1;"%.1f"]
purchases:{(*'x)!1_'x}@csv read"data/purchases.csv"
purchases:@[purchases;"amount""discount";"i"$]
/ Simple solution without using table.by.
pp(..x@<country)purchases.. [
(a;d): :[;%country]='(amount;discount)
..[country:?country;total:+/'(a-d)@'&~a>10*math.med'a]]
/ OUTPUT:
/ === Table 11x2 ===
/ country total
/ ----------- -----
/ "Australia" 540
/ "Brazil" 414
/ "Canada" 270
/ "France" 450
/ "Germany" 513
/ "India" 648
/ "Italy" 567
/ "Japan" 621
/ "Spain" 594
/ "UK" 432
/ "USA" 8455
/ =============================================================================
/ Slightly faster but longer solution without using table.by (avoids ?S by
/ reusing classification indices with &S!¿by, though that's insignificant for
/ such a short csv, and it computes flat amount-discount before grouping).
pp(..x@<country)purchases.. [
a:=amount!by:%country; ad:=(amount-discount)!by
..[country:&country!¿by;total:+/'ad@'&~a>10*math.med'a]]
/ =============================================================================
/ Solution close to the first one but using table.by[c;f;t] with f of rank 1.
/ Here x in f is a derived table after grouping each column.
pp(..x@<country)table.by..country[
....[total:+/'(a-discount)@'&~a>10*math.med'a:amount]]purchases
/ Note how the ..country after table.by merges with the other arguments,
/ following x..field[b] expansion rule to x["field";b].
/ =============================================================================
/ Solution using table.by[c;f;t] with f of rank 2 and flat amount-discount
/ computation before grouping. Here x in f is the original table and y is a
/ constant function returning grouping indices.
pp(..x@<country)table.by..country[
....[total:+/'(y=amount-discount)@'&~a>10*math.med'a:y=amount]]purchases