Goal Docs :: working with tables.

Translating an R example into Goal

We show below an html highlighted version of the Goal file found at examples/purchases.goal in the repos. In the original post, the R solution is as follows:

purchases |>
  group_by(country) |>
  filter(amount <= median(amount) * 10) |>
  summarize(total = sum(amount - discount))

Below are several possible translations to Goal, a couple ones close to the R solution, and some others that efficiently and flexibly perform grouping after computing amount-discount, which in R would’ve required replacing first the discount column with the difference.

/ Translation to Goal of example from:
/ https://www.sumsar.net/blog/pandas-feels-clunky-when-coming-from-r/
/ Post pointed out by oantolin in the matrix chat.
(env"GOALLIB")or"GOALLIB"env"../lib"
import!"fmt math table"
pp:fmt.tbl[;0i-1;0i-1;"%.1f"]
purchases:{(*'x)!1_'x}@csv read"data/purchases.csv"
purchases:@[purchases;"amount""discount";"i"$]
/ Simple solution without using table.by.
pp(..x@<country)purchases.. [
  (a;d): :[;%country]='(amount;discount)
  ..[country:?country;total:+/'(a-d)@'&~a>10*math.med'a]]
/ OUTPUT:
/ === Table 11x2 ===
/     country total
/ ----------- -----
/ "Australia"   540
/    "Brazil"   414
/    "Canada"   270
/    "France"   450
/   "Germany"   513
/     "India"   648
/     "Italy"   567
/     "Japan"   621
/     "Spain"   594
/        "UK"   432
/       "USA"  8455
/ =============================================================================
/ Slightly faster but longer solution without using table.by (avoids ?S by
/ reusing classification indices with &S!¿by, though that's insignificant for
/ such a short csv, and it computes flat amount-discount before grouping).
pp(..x@<country)purchases.. [
  a:=amount!by:%country; ad:=(amount-discount)!by
  ..[country:&country!¿by;total:+/'ad@'&~a>10*math.med'a]]
/ =============================================================================
/ Solution close to the first one but using table.by[c;f;t] with f of rank 1.
/ Here x in f is a derived table after grouping each column.
pp(..x@<country)table.by..country[
  ....[total:+/'(a-discount)@'&~a>10*math.med'a:amount]]purchases
/ Note how the ..country after table.by merges with the other arguments,
/ following x..field[b] expansion rule to x["field";b].
/ =============================================================================
/ Solution using table.by[c;f;t] with f of rank 2 and flat amount-discount
/ computation before grouping. Here x in f is the original table and y is a
/ constant function returning grouping indices.
pp(..x@<country)table.by..country[
  ....[total:+/'(y=amount-discount)@'&~a>10*math.med'a:y=amount]]purchases