Home. Documentation for Goal. Last update: 2024-11-13.

2 Tutorial #

This tutorial aims at giving a practical introduction to Goal, showcasing the language in a couple of practical examples after a short introductory tour of the language. People with array programming experience might prefer to skip the introduction or even jump directly into the concise reference document with short usage examples in the Help chapter.

This tutorial is presented as the result of an interactive REPL session, like when you type the command goal. If you want, you can test things and experiment as you follow the tutorial. Also, for a better experience using the REPL (to get typical keyboard shortcuts), you can install the readline wrapper rlwrap program (available as a package in most systems) and then use rlwrap goal instead.

2.1 Introduction #

2.1.1 Arithmetic #

Arithmetic is similar to that of most programming languages, as you can see in the following interactive REPL session:

  2+3 / addition
5
  5-3 / subtraction
2
  2*3 / multiplication
6
  3%2 / division (returns a float)
1.5
  2!5 / remainder (2 as divisor)
1
  -2!5 / quotient (2 as divisor)
2
  2&3 / minimum
2
  2|3 / maximum
3

A minor surprise might be that division is spelled %, because / already has other uses, like commenting. Perhaps more surprising is that the remainder and quotient operator is spelled ! and takes the divisor as its left argument: as with various other operators, the meaning depends on the left argument’s domain and type. Arithmetic operators are also used for common string handling tasks:

  "a"+"b" / concatenation
"ab"
  "abc.ext"-".ext" / removal of suffix (if it exists)
"abc"
  "a"*3 / repeat string
"aaa"

Comparisons behave like arithmetic operators and return numeric values: 0 for false, 1 for true.

  2<3
1
  2>3
0
  2=3
0

2.1.2 Arrays #

Goal has a relatively small number of types. We just saw that integers, like 2 or 0, can also be interpreted as booleans. There were a couple of floats too, like 1.5. Strings can be built in a variety of ways, like with double-quoting "some text\n" or a raw string quoting construct rq`literal backslash: \`. There are a few other scalar types, also called atomic types, like functions, regular expressions, handles, and error values, as we’ll see later.

Arrays are immutable sequential collections that can contain scalar values or nested arrays. In K and Goal, arrays are free form and often just called lists, in contrast to many other array languages where arrays have higher-dimensional rectangular shapes. Arrays can be built in several ways:

3 5 7 / array with 3 integers (stranding notation)
3,5 7 / same using join operator between atom 3 and array 5 7
,3 / enlist integer atom 3 in an array of length 1
"a" "b" "c" / array with 3 strings
"a" "b" 5 / array with 2 strings and an integer
(3;"a";5 7) / array with 3 elements: integer, string, nested array

The stranding notation makes it very easy to write arrays of numeric and string literals. More complex arrays, containing nested arrays, other atom types, or variables, require the join operator or the generic list notation with parens and semicolons (or synonymous newlines). Also, note that the generic list notation can only be used for lists with at least two elements, because without semicolons parens simply represent a parenthesized expression. A list with a single element is written with the , operator in prefix application, as with ,3 in the examples above.

Arrays containing non-string and non-numeric types, nested arrays or a mix of types, are called generic arrays.

2.1.3 Vectorization #

What makes Goal an array language is the generalization of operations to whole immutable arrays.

  2 4+1 / addition of an array of integers with an integer atom
3 5
  8 9%2 4 / division on arrays of integers (returns floats)
4.0 2.25
  3!4 5 6 / remainder for array of integers
1 2 0
  2 3 4=3 / element-wise equality (returns an array of 0s and 1s)
0 1 0
  "a"*3 2 1 0 / repeat string (returns an array of strings)
"aaa" "aa" "a" ""

This vectorization extends beyond basic operations and concerns any operation where it makes sense, like for example number parsing, string formatting, and array indexing.

  "n"$"1.5" "2.5" / parse numbers from strings
1.5 2.5
  "s"$1.5 2.5 / format values
"1.5" "2.5"
  "%.2f"$1.579 2.5 / sprintf-like formatting for floats
"1.58" "2.50"
  7 8 9[2 1] / bracket indexing at positions 2 and 1
9 8
  7 8 9@2 1 / indexing using “apply at” operator @
9 8
  (6 7;8 9;10 11)[;0] / deep indexing: get all rows, first column
6 8 10

In most programming languages, operators work on scalar immutable values (also called atoms), like numbers and sometimes strings too, but containers are handled using loops, higher order functions, or explicit recursion. This means that such languages do not need many operators: there aren’t that many interesting basic operations when working with scalars.

When working at the immutable array level, there is a larger range of interesting pure operations. For example:

  2#6 7 8 9 / take first two values
6 7
  2#(6 7;8 9;10 11) / same with generic array
(6 7
 8 9)
  5#6 7 8 9 / take 5 values, repeat if there aren't enough
6 7 8 9 6
  5#1 / repeat 5 times a single atom
1 1 1 1 1
  2_6 7 8 9 / drop first two values
8 9

By working on immutable arrays, these operations can be used in the same way as arithmetic operators on scalars are in a formula, without worrying about state.

2.1.4 Dyads and monads #

In the examples so far, most operators we saw were dyadic, meaning they took two arguments. In Goal, as in most array languages, operators can also be used monadically, taking only one argument, with a meaning that may or may not be related to the dyadic one but generally has some kind of mnemonic.

The polysemic nature of the operators is one of the things that makes them so concise and versatile, yet intuitive in the same way the polysemic nature of natural languages is for us. Here are some examples of monadic uses:

  ,3 4 / enlist: nest array 3 4 in a list of length 1
,3 4
  #7 8 9 / length
3
  *7 8 9 / first
7
  _4.2 / floor
4.0
  !"Unicode-space separated\tfields" / fields
"Unicode-space" "separated" "fields"
  !10 / enum
0 1 2 3 4 5 6 7 8 9
  @10 / type: "i" for integers, "s" for strings ...
"i"
  &0 0 1 0 0 0 1 / where (indices of 1s)
2 6
  |7 8 9 / reverse
9 8 7

The number and versatility of operators may seem daunting at first: there’s surely a learning curve there, but you’ve probably learnt harder things already, so take it easy and check the help whenever you need to as you progress. In time, you might end up loving using such a powerful notation!

2.1.5 Verbs and Adverbs #

Monadic and dyadic operators in array languages are often called verbs. While most common array transformations can be performed with them directly, more complex kinds of iterations might still require recursion or higher-order operators. The latter are called adverbs, because of how they modify verbs. The modified verb is called a derived verb. There are three adverb operators in Goal: fold /, scan \, and each '. They are quite versatile and can be used in a variety of ways. A few examples:

  +/!10 / sum
45
  +\!10 / cumulative sum
0 1 3 6 10 15 21 28 36 45
  #'(4 5;6 7 8) / length of each nested list
2 3

It’s important that the adverb tightly follow the verb it modifies, without spaces, otherwise it’s not an adverb but special syntax, like for example / for comments.

Other forms of those adverbs allow for other kinds of functional iterations, like the “converge” form or the seeded “while” and “fold while” forms.
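
As a small sketch of such forms, here is how the seeded variants of scan that repeat a function a given number of times, or while a condition holds, look in a REPL session (the function 2* and the seed 4 are arbitrary choices for illustration):

  3(2*)\4 / apply 2* three times, keeping intermediate results
4 8 16 32
  (100>)(2*)\4 / keep doubling while the value is less than 100
4 8 16 32 64 128

Replacing \ with / in both lines returns only the final value instead of all the intermediate ones.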

Following the same natural language metaphor, we call noun any expression used as a value in an operation or statement. Note that verb, adverb and noun notions are purely syntactic, because Goal’s grammar is context-free. In particular, while verbs and adverbs represent functions, the reverse is not true. As we’ll see in a later section, most ways of creating functions in Goal result in nouns. Actually, even verbs and adverbs can be nominalized, for example using parens around them, so while + is a verb, (+) is a noun, despite the fact that both represent the same function.

  *(+;-) / first of generic array containing nominalized verbs + and -
+
  (nan)^1.0 0n 2.5 / weed out NaNs using nominalized nan verb
1.0 2.5

Adverbs can also modify nouns to form a noun-derived verb, including nouns representing non-function values, like with join and split for strings.

  ","/"a" "b" "c" / join
"a,b,c"
  ","\"a,b,c" / split
"a" "b" "c"

2.1.6 Control flow #

Control flow tends to be less explicit than in scalar programming languages, thanks to the powerful verbs and adverbs, but sometimes explicit conditionals are useful. Goal provides a ?[cond;then;else] syntax form for if-then-else conditionals, as well as logical syntax keywords and and or with short-circuit behavior.

  ?[3>0;"3 is positive";"uh?"]
"3 is positive"
  (3>0)and"3 is positive"
"3 is positive"
  (-3>0)and"-3 is positive"
0

Note that there are several kinds of false values in Goal, like numerical 0, NaN and negative infinity, "", empty arrays and error values.
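
A short sketch of how some of those false values behave in conditionals (the strings used here are arbitrary):

  ?["";"true";"false"] / the empty string is false
"false"
  ?[!0;"true";"false"] / an empty array is false
"false"
  ""or"default" / or returns the first true value
"default"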

2.1.7 Evaluation Order #

Most programming languages give precedence to some operators over others. This is not practical in array languages, given the large number of operators. Instead, all verbs use the same precedence and are right-associative.

  2*3+4
14
  (2*3)+4
10

In contrast, adverbs are left-associative, attaching tightly to the nominalized verb or noun they follow.

  +/'(1 2;3 4) / sum in each sublist: read as (+/)'
3 7
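
Right-associativity also applies to a chain of identical verbs, which can be a surprise with subtraction. A minimal sketch with arbitrary numbers:

  10-2-3 / read as 10-(2-3)
11
  (10-2)-3
5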

2.1.8 Variables #

Variables can be defined as follows:

  a:2 / assignment (prevents echo in REPL)
  a
2
  a+:3 / assignment operation (like a:a+3)
  a
5
  (b;c.d):6 7 / list assignment
  b
6
  c.d / dot-prefixed variable name
7

Variables can also be interpolated within strings.

  "a = $a; b = $b; c.d = $c.d"
"a = 5; b = 6; c.d = 7"

2.1.9 Dictionaries #

Dictionaries are simply a pair of key and value arrays of the same length. They are created with the dyadic verb !, and many operators work on them in natural ways.

  d:"a""b"!1 2 / keys!values (same as d:..[a:1;b:2] with dict syntax)
  d"b" / get value associated with key
2
  d,"b""c"!3 4 / merging dicts: upsert semantics
!["a" "b" "c"
  1 3 4]
  .d / get values
1 2
  !d / get keys (monadic use of ! on dict)
"a" "b"

Goal offers more advanced dict and table functionality that’s out of scope for this tutorial: check the help and the Working with tables chapter for learning about those.

2.1.10 Errors #

Goal has two kinds of errors: panics and error values. The former are usually reserved for fatal programming errors and may be produced by builtins, for example due to a type error, or manually using panic. The latter are generated manually using the error keyword or produced by some builtins, in particular for IO (input/output).

  error"msg" / generate custom error
error["msg"]
  rx "[a-z" / attempt to compile regexp from string
error["error parsing regexp: missing closing ]: `[a-z`"]
  read"missing-file.txt" / attempt to read a file into a string
error[!["msg" "op" "path" "err"
        "open missing-file.txt: no such file or directory" "open" "missing-file.txt" "file does not exist"]]
  1+"a" / invalid operation: panics with message and displays error location
'ERROR i+y : bad type "s" in y
1+"a"
 ^

Error values are false values, which can be useful in conditionals. The type verb @ returns "e" for errors and can be used to unambiguously confirm that a value is an error. Also, it’s possible to use the ' syntax for returning errors early, like we’ll see in a scripting example later. Note how error values are not limited to plain strings and can be any kind of value, like a dictionary, as the last example above illustrates. See the question about error values in the FAQ for a deeper understanding of how error values work.
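
As a small sketch of the points above, using a made-up error message:

  @error"msg" / type of an error value
"e"
  (error"msg")or"fallback" / error values are false
"fallback"
  ?[error"msg";"ok";"failed"]
"failed"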

2.1.11 Functions #

Functions are first-class citizens. User-defined functions can be created via lambda-like expressions and can, like all values, be assigned to variables:

  f:{[name;ext]"${name}.$ext"}
  f["fname";"csv"]
"fname.csv"
  f[;"csv"]"fname" / same with projection fixing second argument
"fname.csv"
  g:{2+x} / same as {[x]2+x} but using default argument x
  g 3
5
  g[3] / the same as g 3 or g@3
5
  g:2+ / same with projection fixing left argument: same as {2+x}
  g 3
5

For convenience, if no formal arguments are specified between square brackets, x, y and z can be used as implicit argument names. Also, projection syntax can be used when deriving a new function by fixing some arguments of another. Both features are very useful for defining many short functions, often used inline followed by an adverb.

  (-2!)\10 / converge form of the scan operator with function left
10 5 2 1 0
  f[;"csv"]'"fname1" "fname2" / apply projection for each name
"fname1.csv" "fname2.csv"

Within functions, several statements can be separated with semicolons or newlines, and early return can be obtained by using a colon : before the value we want to return, typically at the beginning of a conditional’s branch. Note that depending on how it’s used, the colon : can have other uses, like assignment if it follows an identifier, but there can never be any confusion.

For example, the following multi-statement function returns a string formatting the minimum and maximum of a numeric list, but returns early "min=?; max=?" if the list is empty.

  minMax:{(#x)or:"min=?; max=?"; min:&/x; max:|/x; "min=$min; max=$max"}
  minMax 3 -2 7 5
"min=-2; max=7"
  minMax[!0]
"min=?; max=?"

Note that &/ and |/ on empty numeric lists return respectively the largest and smallest numbers (of integer type in this case). This is usually a good behavior, but we went a fancier route above for the sake of example.

It’s worth noting that user-defined functions with lambda notation, as well as variables and array literals, are grammatically nouns, unlike primitive operators that work by default as verbs or adverbs. This means they are parsed as nouns, so parens are never needed around them for nominalization, but application sometimes needs to be explicit, with square brackets or @, when they might be parsed as the left argument of some primitive verb or adverb instead.

  {x<0}^1 -3 4 -2 / weed out negative values
1 4
  (0>)^1 -3 4 -2 / same with projection (parens needed)
1 4
  a:3 -4 5 -6 7 -8 9
  b:0 0 1 0 0 0 1
  a@&b / index/apply (@) array where (&) 1s
5 9
  a[&b] / same with bracket indexing
5 9
  a&b / min (array used as left argument of &)
0 -4 1 -6 0 -8 1

2.2 Examples #

2.2.1 Word frequency #

Word frequency analysis is a simple problem that highlights well some basic verbs. It’s also an opportunity to showcase a simple use of regexps, as well as basic IO.

We’ll use as text source the first novel of the I, Mor-Eldal free (as in freedom) fantasy trilogy, a copy of which is available here exported in markdown format.

The first step is reading the file into a string and storing it into a variable.

  s:read"01-yo-mor-eldal-en.md"
  &s / number of bytes
570236

Now we’re going to split the string into words using a regexp. A basic approach would be using a regexp like rx/[A-Za-z-]+/, but this only works if there are no non-ASCII letters. A somewhat more robust approach that will work for more languages may instead use a regexp like rx/[\p{L}-]+/. This makes use of a particular Unicode property that matches all kinds of letters as understood by Unicode.

  words:_rx/[\p{L}-]+/[s;-1]

This stores into a variable words all matches of the given regexp. The -1 argument specifies the maximal number of desired matches, and a negative number means any number of matches. Note how the regular expression can be applied like a function. Finally, the verb _ lowercases all letters in the words, so that we can then compare their frequency in a more realistic manner.
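
To see the difference between both regexps on a small scale, here is a sketch on a short made-up string containing a non-ASCII letter:

  rx/[A-Za-z-]+/["Mor-Eldal, el Ladrón";-1] / ASCII-only: splits at the ó
"Mor-Eldal" "el" "Ladr" "n"
  _rx/[\p{L}-]+/["Mor-Eldal, el Ladrón";-1] / Unicode letters, then lowercase
"mor-eldal" "el" "ladrón"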

  #words / number of words
103946
  5#words / take first 5 words
"i" "mor-eldal" "the" "necromancer" "thief"

We’ll now get into frequency computing. The monadic form of the verb % is used to classify elements of an array. It will return an array of integers that will attribute to each distinct element a number, starting from zero. For example:

  %"a""b""a""c""b""b"
0 1 0 2 1 1

Then, we can use the monadic form of the verb = to perform index-counting, to know how many times each class occurs, in other words, how many zeros, ones, twos ... there are.

  =0 1 0 2 1 1
2 3 1

This shows us that "a" (class 0) has 2 occurrences, "b" (class 1) has 3, and "c" (class 2) has only one.

We are now ready to perform the same with our word data.

  freq:=%words
  #freq / number of classes = number of distinct words
6131
  5#freq / take first 5 elements
4976 51 5292 2 17

If we match this with the first 5 words, we now can say that "i" has 4976 occurrences, and "the" has 5292.

To get a decreasing list of matchings between words and frequencies, we can sort down a dictionary:

  d:>(?words)!freq

The verb ? used in monadic form returns a new list of words without duplicates, preserving only the first occurrences of each element. Then, the verb > sorts down the dictionary by its values, in this case the frequencies. We can now query the 5 most used words:

  5#d
!["the" "i" "and" "a" "to"
  5292 4976 3628 2654 2521]
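
The whole pipeline can be tried on a tiny list first. A minimal sketch, where w is a hypothetical variable holding made-up data:

  w:"a""b""a""c""b""b"
  >(?w)!=%w / distinct words as keys, counts as values, sorted down
!["b" "a" "c"
  3 2 1]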

Reading the default presentation of a dictionary can be hard if there are many keys and values. The following utility function provides a basic solution by putting both the keys and values in the same list, and flipping its columns and rows using the monadic form of the verb +.

  tbl:{+(!x;.x)}
  tbl 10#d
("the" 5292
 "i" 4976
 "and" 3628
 "a" 2654
 "to" 2521
 "you" 1739
 "he" 1582
 "of" 1579
 "it" 1434
 "me" 1361)

A somewhat more involved exercise, which we’ll leave to the reader, would be for example to study word frequency in restricted text windows (using the windows i^y verb form on a list of lines, for example), and search for unwanted repetitions that wouldn’t fit the style.

2.2.2 Handling simple climate data #

Handling CSV data of various kinds is something array languages are particularly well-suited for. In this section, we’ll parse simple daily climate data and process it to obtain a few daily summary results that will be included into a larger monthly summary.

Instead of proceeding in a REPL session as previously in this tutorial, we’ll write a script file, suitable for being called periodically to process a new day’s data. Because there’s no echo showing intermediate results in that case, you can use an output keyword, like say or print, to output a string representation of a value to standard output. Alternatively, you can use a logging \ before a value, not following tightly a noun: that will format and print the value on standard error (acting as identity and doing nothing more).

Assume we have a set of files with daily climate data, named following the year-month-day order convention, as in 20060102.csv. We provide example files for two dates: 20230512.csv and 20230513.csv.

The first day starts like this:

2023-05-12T00:00 11.3 87 1014.5 1085.720
2023-05-12T00:01 11.3 87 1014.6 1085.720
2023-05-12T00:02 11.3 87 1014.5 1085.720
2023-05-12T00:03 11.3 87 1014.6 1085.720
2023-05-12T00:04 11.3 87 1014.6 1085.720
...

The next day looks like this:

2023-05-13T00:00 13.8 76 1015.7 1123.540
2023-05-13T00:01 13.9 76 1015.6 1123.540
2023-05-13T00:02 13.9 76 1015.6 1123.540
2023-05-13T00:03 13.9 76 1015.5 1123.540
2023-05-13T00:04 13.9 76 1015.6 1123.540
...

They have five columns: date (one record per minute), temperature (°C), relative humidity (%), air pressure (hPa), and accumulated precipitation (mm).

For temperature, relative humidity and air pressure, we want to get the mean, maximum and minimum values, as well as the first time at which maximum and minimum occur. Also, because there could be some missing entries or nonsensical erroneous values, we want to know the number of valid records of each type.

For precipitation, we want to know the day’s total precipitation, as well as some basic intensity data: the amount and time of the 1-hour window with most precipitation. We’ll have to take into account practical issues, like any missing entries or the possibility of reaching the maximum recordable precipitation by the collecting device we use (2500 in our case), at which point it would be reset to zero again.

Because we want to make a script, we’ll want to use the array ARGS of the arguments passed to goal. The first argument would be the name of the script, which we’ll call climday.goal, and the second would be the date of the day in 20060102 format. We’ll first do some basic checking on arguments and get the date:

(2=#ARGS)or:error"USAGE: goal climday.goal date
date should be in 20060102 format"
date:ARGS 1 / date from examples is "20230512" or "20230513"

In case of an incorrect number of arguments, we return with : a usage error produced with the monadic verb error. When using : to return early from global code, the program will exit with status 1 if the returned value is an error. Also, note the usage of the syntax keyword or with short-circuiting behavior.

We then read the csv file into variables.

(dates;temp;rh;pres;prec):" "csv 'read"${date}.csv"

This first calls read on a file corresponding to the given date. Note ' just before read. When not preceded by a noun or verb (without spaces), ' does nothing if the result is not an error, but returns it early otherwise (like : would). The latter could happen for example if the file doesn’t exist or is not readable. The dyadic verb csv parses the space-separated CSV text into a list of columns, which we assign to various variables at once.
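
To get a feel for what csv returns, here is a sketch in a REPL session on a made-up two-line, two-column input (t and v are hypothetical variable names):

  (t;v):" "csv"00:00 11.3\n00:01 11.4"
  t / first column, as strings
"00:00" "00:01"
  "n"$v / second column, parsed into numbers
11.3 11.4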

For convenience and easier reasoning later, we replace dates with their unix epoch value:

dates:time["unix";dates;"2006-01-02T15:04"]

This makes use of the verb time which is described in the help. Here, we ask for the unix time of the dates column, using the layout string "2006-01-02T15:04" for parsing.

Other columns contain numeric strings, so we’ll parse them into numbers.

(temp;rh;pres;prec):"n"$(temp;rh;pres;prec) / parse into numbers

We will first treat the case of temperature, relative humidity and air pressure, as they can be handled in a similar way and without caring about missing values.

A helper function for formatting the times corresponding to a maximum or minimum will come in handy:

fmtclock:time["15:04";]

We now write a meanMaxMin function that will take three parameters: a numeric data column c, a filter function f for discarding nonsensical values, and a format string fmt for displaying the mean, maximum and minimum.

meanMaxMin:{[c;f;fmt] (
  fmt${(+/x)%#x}fc:f^c / mean
  fmt$c[i:*&c=|/fc]    / max
  fmtclock dates[i]    / max-time
  fmt$c[i:*&c=&/fc]    / min
  fmtclock dates[i]    / min-time
  $#fc                 / number of records
)}

We’ll now explain the interesting bits, in particular those using features we haven’t covered yet. In the line:

  fmt${(+/x)%#x}fc:f^c / mean

The c parameter contains numerical data, like temp, rh or pres. The filtering code fc:f^c removes from c the values for which the filter function f returns a true value, and it then stores the result into a variable fc. The filter could be for example {(-20>x)|49<x} to discard bogus temperatures that wouldn’t make any sense (in the current location). Finally, {(+/x)%#x} computes the mean, and fmt$ formats the result according to format fmt, for example "%.1f".
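
Both steps can be tried separately in a REPL session. A minimal sketch with made-up integer readings, two of which are out of range:

  {(-20>x)|49<x}^11 99 13 -50 / weed out values below -20 or above 49
11 13
  "%.1f"${(+/x)%#x}11 13 / mean of the remaining values, with one decimal
"12.0"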

Next is the line computing the maximum.

  fmt$c[i:*&c=|/fc]    / max

The maximum value is obtained simply with |/fc, but, for getting the time next, we want to know when it happens. We compute a boolean vector c=|/fc of positions where the maximum value appears in the original c. The indices of positions with a 1 are obtained by calling “where” & on the result. Using “first” * on the list of indices returns the index of the first occurrence of the maximum value in c. We store that index in i and use it to get the time dates[i]. The minimum is obtained in a similar way. Finally, the number of records is just the number of elements that remain after applying the filter, and we format it into a string with $ (in the default way for integers).
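
The index computation can be followed step by step on a small made-up list, where the maximum appears twice:

  c:3 9 4 9
  c=|/c / 1s where the maximum value appears
0 1 0 1
  &c=|/c / indices of those 1s
1 3
  *&c=|/c / first such index
1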

Handling precipitation is a bit more complicated, because we have cumulative precipitation instead of minutely precipitation. Also, when going over 2500, the accumulator overflows and goes back to zero. We’ll therefore handle both things and convert our data into minutely precipitation.

prec:{x-»x}prec-*prec:prec+{2500*+\x<»x}prec / minutely precipitation

First, prec+{2500*+\x<»x}prec cancels any resets at 2500. The x<»x part puts a 1 at the places where resets, if any, occur, by comparing x with itself shifted right by one, with a 0 as filler on the left, using the right-shift verb » (which can also be spelled rshift). The sum scan then transforms the result so that each element corresponds to the number of resets up to that point, so that multiplying by 2500 gives the amount that was discarded due to resets. The {x-»x}prec-*prec part transforms the obtained cumulative precipitation into minutely precipitation. Total precipitation can now be obtained and formatted easily with "%.3f"$+/prec.
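
The same transformation can be followed on a toy example in a REPL session. In this sketch, p holds made-up integer readings and we pretend the accumulator wraps around at 10 instead of 2500:

  p:7 9 1 3 / hypothetical readings, with a reset after 9
  p:p+{10*+\x<»x}p / add back the amount discarded by the reset
  p
7 9 11 13
  {x-»x}p-*p / differences between consecutive readings
0 2 2 2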

As we said before, we also want to compute the 1-hour window with most precipitation. This requires some further processing of the precipitation data, filling any missing records with 0.

unix:time["unix";date;"20060102"]
mdates:unix+60*!1440             / minutes of the day
prec:.(mdates!1440#0),dates!prec / fill missing minutes with zeros

This creates an array mdates with all the dates corresponding to minutes in the day. It then merges a template dictionary mdates!1440#0, filled with zeros, with a dictionary corresponding to recorded dates and precipitation data.

We can now compute the precipitation in all 60-minute windows of the day with prec1h:+/60^prec. The time of the 1-hour window with maximum precipitation can then be obtained with mdates@*>prec1h. The *>prec1h call is an idiom that returns the index of the first occurrence of the maximum value, obtained through the descending sorting permutation indices returned by >. It’s a simpler way to write *&prec1h=|/prec1h.
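
The same idea can be tried at a smaller scale in a REPL session. The following sketch uses made-up values and windows of 3 elements instead of 60, assuming the windows verb form behaves as described in the help:

  +/3^0 1 4 2 0 0 / sums over all windows of 3 consecutive values
5 7 6 2
  *>5 7 6 2 / index of the window with the largest sum
1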

We gather a record with all the desired daily results:

record:,//(
  date:time["2006-01-02";unix]
  / mean, max, max-time, min, min-time, nrecords for temp, rh, pres
  meanMaxMin[temp;{(-20>x)|49<x};"%.1f"]
  meanMaxMin[rh;{(0>x)|100<x};"%.0f"]
  meanMaxMin[pres;{(960>x)|1060<x};"%.1f"]
  "%.3f"$+/prec             / total precipitation
  "%.3f"$|/prec1h:+/60^prec / max precipitation in 1 hour
  fmtclock mdates@*>prec1h  / time of 1-hour window with max
)

Note the ,// at the beginning that transforms any nested list into a flat list (applies join-over until convergence). For the first example date, if we display the record with say, we get:

2023-05-12 12.9 16.8 16:16 10.3 08:59 1435 79 89 10:23 63 16:08 1435 1013.9
1015.8 20:41 1012.5 04:52 1435 37.820 11.920 04:26

(wrapped for display purposes here, but it’s a single line)
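
The flattening done by ,// can be observed on a small nested list in a REPL session (the values here are made up for illustration):

  ,//("2023-05-12";("12.9" "16.8";"1435"))
"2023-05-12" "12.9" "16.8" "1435"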

All that is left is adding this record for the given date’s day to a monthly CSV file, updating it if there is already one.

month:time["200601";unix]
mdata:read["${month}.csv"]or""  / read data of the month
mcsv:{x[;0]=date}^+" "csv mdata / remove record for date if already present
mcsv,:,record                   / add our new record at the end
mcsv@:<mcsv[;0]                 / sort records again by date
'"${month}.csv"print" "csv+mcsv / write the file again

One new thing here is the monadic use of + applied to the result of the csv verb to flip columns and rows, so that we get a list of the CSV’s records, which in this case is more convenient because we want to add/replace a particular record, not a column. Another novelty is the monadic form of < which returns a sorting permutation for its input, here the date column of the monthly data. The assignment operation @: replaces old mcsv by indexing it on the sorting permutation. Finally, the dyadic form of print allows printing the result to a specific file given as left argument. Also, note how the verb csv is used both for parsing and serializing, depending on whether the input is a string or a list of columns.
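
The sorting step can be tried in a REPL session on a couple of made-up records, stored here in a hypothetical variable r:

  r:("2023-05-13" "14.2";"2023-05-12" "12.9") / two records, out of order
  <r[;0] / ascending permutation for the date column
1 0
  *r@<r[;0] / first record after sorting
"2023-05-12" "12.9"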

Running the script now for both rainy days produces this result in the monthly summary file:

2023-05-12 12.9 16.8 16:16 10.3 08:59 1435 79 89 10:23 63 16:08 1435 1013.9
1015.8 20:41 1012.5 04:52 1435 37.820 11.920 04:26
2023-05-13 14.2 17.4 14:48 12.5 10:03 1437 80 88 11:01 69 14:28 1437 1016.2
1017.2 10:38 1014.8 02:17 1437 62.160 35.740 08:32

(both lines wrapped for display purposes here)

Well, this example was a bit long! Some things could still be improved, like more robust and informative error handling in case of invalid dates or data that should normally not happen. Also, instead of directly replacing the monthly file at the end, it would be safer to write it first to another temporary file, to avoid corrupting the file in case of a power outage during the write (though it can be re-obtained by running the script for all days of the month). Still, this script does some actually useful things without much code, so I hope this example does highlight some of the strengths of array programming!

To finish, we reproduce the complete script below:

(2=#ARGS)or:error"USAGE: goal climday.goal date
date should be in 20060102 format"
date:ARGS 1 / date from examples is "20230512" or "20230513"
(dates;temp;rh;pres;prec):" "csv 'read"data/${date}.csv"
dates:time["unix";dates;"2006-01-02T15:04"]
(temp;rh;pres;prec):"n"$(temp;rh;pres;prec) / parse into numbers
fmtclock:time["15:04";]
meanMaxMin:{[c;f;fmt] (
  fmt${(+/x)%#x}fc:f^c / mean
  fmt$c[i:*&c=|/fc]    / max
  fmtclock dates[i]    / max-time
  fmt$c[i:*&c=&/fc]    / min
  fmtclock dates[i]    / min-time
  $#fc                 / number of records
)}
prec:{x-»x}prec-*prec:prec+{2500*+\x<»x}prec / minutely precipitation
unix:time["unix";date;"20060102"]
mdates:unix+60*!1440             / minutes of the day
prec:.(mdates!1440#0),dates!prec / fill missing minutes with zeros
record:,//(
  date:time["2006-01-02";unix]
  / mean, max, max-time, min, min-time, nrecords for temp, rh, pres
  meanMaxMin[temp;{(-20>x)|49<x};"%.1f"]
  meanMaxMin[rh;{(0>x)|100<x};"%.0f"]
  meanMaxMin[pres;{(960>x)|1060<x};"%.1f"]
  "%.3f"$+/prec             / total precipitation
  "%.3f"$|/prec1h:+/60^prec / max precipitation in 1 hour
  fmtclock mdates@*>prec1h  / time of 1-hour window with max
)
month:time["200601";unix]
mdata:read["data/${month}.csv"]or""  / read data of the month
mcsv:{x[;0]=date}^+" "csv mdata      / remove record for date if already present
mcsv,:,record                        / add our new record at the end
mcsv@:<mcsv[;0]                      / sort records again by date
'"data/${month}.csv"print" "csv+mcsv / write the file again

2.3 Learn more #

At this point, you should have a grasp of the spirit of the language. You probably want to check out the Help chapter and experiment with simple problems of your own. You then might want to follow with the FAQ, or jump directly into the Working with tables chapter.