This tutorial aims at giving a practical introduction to Goal, showcasing the language in a couple of practical examples after a short introductory tour of the language. People with array programming experience might prefer to skip the introduction or even jump directly into the concise reference document with short usage examples in the Help chapter.
This tutorial is presented as the result of an interactive REPL session, like the one you get when you type the command goal. If you want, you can test things and experiment as you follow the tutorial.
Also, for a better experience using the REPL (to get typical keyboard shortcuts), you can install the rlwrap readline wrapper program (available as a package on most systems) and then use rlwrap goal instead.
Arithmetic is similar to that of most programming languages, as you can see in the following interactive REPL session:
2+3 / addition
5
5-3 / subtraction
2
2*3 / multiplication
6
3%2 / division (returns a float)
1.5
2!5 / remainder (2 as divisor)
1
-2!5 / quotient (2 as divisor)
2
2&3 / minimum
2
2|3 / maximum
3
A minor surprise might be that division is %, because / already has other uses, like commenting. Perhaps more surprising is that the remainder and quotient operator is spelled ! and has its arguments reversed: as with various operators, the meaning depends on the left argument’s domain and type.
Arithmetic operators are also used for common string handling tasks:
"a"+"b" / concatenation
"ab"
"abc.ext"-".ext" / removal of suffix (if it exists)
"abc"
"a"*3 / repeat string
"aaa"
Comparisons behave like arithmetic operators and return numeric values: 0 for false, 1 for true.
2<3
1
2>3
0
2=3
0
Goal has a relatively small number of types. We just saw that integers, like 2 or 0, can also be interpreted as booleans. There were a couple of floats too, like 1.5. Strings can be built in a variety of ways, like with double-quoting "some text\n" or a raw string quoting construct rq`literal backslash: \`. There are a few other scalar types, also called atomic types, like functions, regular expressions, handles, and error values, as we’ll see later.
Arrays are immutable sequential collections that can contain scalar values or nested arrays. In K and Goal, arrays are free form and often just called lists, in contrast to many other array languages where arrays have higher-dimensional rectangular shapes. Arrays can be built in several ways:
3 5 7 / array with 3 integers (stranding notation)
3,5 7 / same using join operator between atom 3 and array 5 7
,3 / enlist integer atom 3 in an array of length 1
"a" "b" "c" / array with 3 strings
"a" "b" 5 / array with 2 strings and an integer
(3;"a";5 7) / array with 3 elements: integer, string, nested array
The stranding notation makes it very easy to write arrays of numeric and string literals. More complex arrays, containing nested arrays, other atom types, or variables, require the join operator or the generic list notation with parens and semicolons (or synonymous newlines). Also, note that the generic list notation can only be used for lists with at least two elements, because without semicolons parens simply represent a parenthesized expression. A list with a single element is written with the , operator in prefix position, as with ,3 in the above examples.
Arrays containing non-string and non-numeric types, nested arrays or a mix of types, are called generic arrays.
What makes Goal an array language is the generalization of operations to whole immutable arrays.
2 4+1 / addition of an array of integers with an integer atom
3 5
8 9%2 4 / division on arrays of integers (returns floats)
4.0 2.25
3!4 5 6 / remainder for array of integers
1 2 0
2 3 4=3 / element-wise equality (returns an array of 0s and 1s)
0 1 0
"a"*3 2 1 0 / repeat string (returns an array of strings)
"aaa" "aa" "a" ""
This vectorization extends beyond basic operations and applies to any operation where it makes sense, such as number parsing, string formatting, and array indexing.
"n"$"1.5" "2.5" / parse numbers from strings
1.5 2.5
"s"$1.5 2.5 / format values
"1.5" "2.5"
"%.2f"$1.579 2.5 / sprintf-like formatting for floats
"1.58" "2.50"
7 8 9[2 1] / bracket indexing at positions 2 and 1
9 8
7 8 9@2 1 / indexing using “apply at” operator @
9 8
(6 7;8 9;10 11)[;0] / deep indexing: get all rows, first column
6 8 10
In most programming languages, operators work on scalar immutable values (also called atoms), like numbers and sometimes strings too, but containers are handled using loops, higher order functions, or explicit recursion. This means that such languages do not need many operators: there aren’t that many interesting basic operations when working with scalars.
When working at the immutable array level, there is a larger range of interesting pure operations. For example:
2#6 7 8 9 / take first two values
6 7
2#(6 7;8 9;10 11) / same with generic array
(6 7
 8 9)
5#6 7 8 9 / take 5 values, repeat if there aren't enough
6 7 8 9 6
5#1 / repeat 5 times a single atom
1 1 1 1 1
2_6 7 8 9 / drop first two values
8 9
By working on immutable arrays, these operations can be used in the same way as arithmetic operators on scalars are in a formula, without worrying about state.
In the examples until now, most operators we saw were dyadic, meaning they took two arguments. In Goal, like most array languages, operators can also be used monadically, taking only one argument, with a meaning that may or may not be related to the dyadic one but generally has some kind of mnemonic.
The polysemic nature of the operators is one of the things that makes them so concise and versatile, yet intuitive in the same way the polysemic nature of natural languages is for us. Here are some examples of monadic uses:
,3 4 / enlist: nest array 3 4 in a list of length 1
,3 4
#7 8 9 / length
3
*7 8 9 / first
7
_4.2 / floor
4.0
!"Unicode-space separated\tfields" / fields
"Unicode-space" "separated" "fields"
!10 / enum
0 1 2 3 4 5 6 7 8 9
@10 / type: "i" for integers, "s" for strings ...
"i"
&0 0 1 0 0 0 1 / where (indices of 1s)
2 6
|7 8 9 / reverse
9 8 7
The number and versatility of operators may seem daunting at first: there’s surely a learning curve there, but you’ve probably learnt harder things already, so take it easy and check the help when you need as you progress. In time, you might end up loving using such a powerful notation!
Monadic and dyadic operators in array languages are often called verbs. While most common array transformations can be performed with them directly, more complex kinds of iterations might still require recursion or higher-order operators. The latter are called adverbs, because of how they modify verbs. The modified verb is called a derived verb. There are three adverb operators in Goal: fold /, scan \, and each '. They are quite versatile and can be used in a variety of ways. A few examples:
+/!10 / sum
45
+\!10 / cumulative sum
0 1 3 6 10 15 21 28 36 45
#'(4 5;6 7 8) / length of each nested list
2 3
It’s important that the adverb tightly follow the verb it modifies, without spaces, otherwise it’s not an adverb but special syntax, like for example / for comments.
Other forms of those adverbs allow for other kinds of functional iterations, like the “converge” form or the seeded “while” and “fold while” forms.
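For readers coming from scalar languages, the three examples above correspond roughly to a reduction, a running reduction, and a map. The following Python sketch is only an analogy, not a description of how Goal implements adverbs; the variable names are illustrative.

```python
from functools import reduce
from itertools import accumulate
import operator

values = list(range(10))  # plays the role of !10

# fold: combine all elements with + (compare with +/!10)
total = reduce(operator.add, values)

# scan: keep every intermediate result of the fold (compare with +\!10)
running = list(accumulate(values, operator.add))

# each: apply a function to every element (compare with #'(4 5;6 7 8))
lengths = [len(sub) for sub in [[4, 5], [6, 7, 8]]]

print(total)    # 45
print(running)  # [0, 1, 3, 6, 10, 15, 21, 28, 36, 45]
print(lengths)  # [2, 3]
```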
Following the same natural language metaphor, we call noun any expression used as a value in an operation or statement.
Note that verb, adverb and noun notions are purely syntactic, because Goal’s grammar is context-free. In particular, while verbs and adverbs represent functions, the reverse is not true. As we’ll see in a later section, most ways of creating functions in Goal result in nouns. Actually, even verbs and adverbs can be nominalized, for example using parens around them, so while + is a verb, (+) is a noun, despite the fact that both represent the same function.
*(+;-) / first of generic array containing nominalized verbs + and -
+
(nan)^1.0 0n 2.5 / weed out NaNs using nominalized nan verb
1.0 2.5
Adverbs can also modify nouns to form a noun-derived verb, including nouns representing non-function values, like with join and split for strings.
","/"a" "b" "c" / join
"a,b,c"
","\"a,b,c" / split
"a" "b" "c"
Control flow tends to be less explicit than in scalar programming languages, thanks to the powerful verbs and adverbs, but sometimes explicit conditionals are useful. Goal provides a ?[cond;then;else] syntax form for if-then-else conditionals, as well as two logical syntax keywords with short-circuit behavior: and, or.
?[3>0;"3 is positive";"uh?"]
"3 is positive"
(3>0)and"3 is positive"
"3 is positive"
(-3>0)and"-3 is positive"
0
Note that there are several kinds of false values in Goal, like numerical 0, NaN and negative infinity, "", empty arrays and error values.
Most programming languages give precedence to some operators over others. This is not practical in array languages, given the large number of operators. Instead, all verbs use the same precedence and are right-associative.
2*3+4
14
(2*3)+4
10
In contrast, adverbs are left-associative, attaching tightly to the nominalized verb or noun they follow.
+/'(1 2;3 4) / sum in each sublist: read as (+/)'
3 7
Variables can be defined as follows:
a:2 / assignment (prevents echo in REPL)
a
2
a+:3 / assignment operation (like a:a+3)
a
5
(b;c.d):6 7 / list assignment
b
6
c.d / dot-prefixed variable name
7
Variables can also be interpolated within strings.
"a = $a; b = $b; c.d = $c.d"
"a = 5; b = 6; c.d = 7"
Dictionaries are simply a pair of key and value arrays of the same length. They are created with the dyadic verb !, and many operators work on them in natural ways.
d:"a""b"!1 2 / keys!values (same as d:..[a:1;b:2] with dict syntax)
d"b" / get value associated with key
2
d,"b""c"!3 4 / merging dicts: upsert semantics
!["a" "b" "c"
  1 3 4]
.d / get values
1 2
!d / get keys (monadic use of ! on dict)
"a" "b"
Goal offers more advanced dict and table functionality that’s out of scope for this tutorial: check the help and the Working with tables chapter for learning about those.
Goal has two kinds of errors: panics and error values. The former are usually reserved for fatal programming errors and may be produced by builtins, for example due to a type error, or manually using panic. The latter are generated manually using the error keyword or produced by some builtins, in particular for IO (input/output).
error"msg" / generate custom error
error["msg"]
rx "[a-z" / attempt to compile regexp from string
error["error parsing regexp: missing closing ]: `[a-z`"]
read"missing-file.txt" / attempt to read a file into a string
error[!["msg" "op" "path" "err"
        "open missing-file.txt: no such file or directory" "open" "missing-file.txt" "file does not exist"]]
1+"a" / invalid operation: panics with message and displays error location
'ERROR i+y : bad type "s" in y
1+"a"
 ^
Error values are false values, which can be useful in conditionals. The type verb @ returns "e" for errors and can be used to unambiguously confirm that a value is an error. Also, it’s possible to use the ' syntax for returning errors early, as we’ll see in a scripting example later. Note how error values are not limited to plain strings and can be any kind of value, like a dictionary, as the read example above illustrates. See the question about error values in the FAQ for a deeper understanding of how error values work.
Functions are first-class citizens. User-defined functions can be created via lambda-like expressions and can, like all values, be assigned to variables:
f:{[name;ext]"${name}.$ext"}
f["fname";"csv"]
"fname.csv"
f[;"csv"]"fname" / same with projection fixing second argument
"fname.csv"
g:{2+x} / same as {[x]2+x} but using default argument x
g 3
5
g[3] / the same as g 3 or g@3
5
g:2+ / same with projection fixing left argument: same as {2+x}
g 3
5
For convenience, if no formal arguments are specified between square brackets, x, y and z can be used as implicit argument names. Also, projection syntax can be used when deriving a new function by fixing some arguments of another. Both features are very useful for defining many short functions, often used inline followed by an adverb.
(-2!)\10 / converge form of the scan operator with function left
10 5 2 1 0
f[;"csv"]'"fname1" "fname2" / apply projection for each name
"fname1.csv" "fname2.csv"
Within functions, several statements can be separated with semicolons or newlines, and early return can be obtained by using a colon : before the value we want to return, typically at the beginning of a conditional’s branch. Note that depending on how it’s used, the colon : can have other uses, like assignment if it follows an identifier, but there can never be any confusion.
For example, the following multi-statement function returns a string formatting the minimum and maximum of a numeric list, but returns early "min=?; max=?" if the list is empty.
minMax:{(#x)or:"min=?; max=?"; min:&/x; max:|/x; "min=$min; max=$max"}
minMax 3 -2 7 5
"min=-2; max=7"
minMax[!0]
"min=?; max=?"
Note that &/ and |/ on empty numeric lists return respectively the largest and smallest numbers (of integer type in this case). This is usually a good behavior, but we went a fancier route above for the sake of example.
It’s worth noting that user-defined functions with lambda notation, as well as variables and array literals, are grammatically nouns, unlike primitive operators that work by default as verbs or adverbs. This means they are parsed as nouns, so parens are never needed around them for nominalization, but application sometimes needs to be explicit, with square brackets or @, when they might be parsed as the left argument of some primitive verb or adverb instead.
{x<0}^1 -3 4 -2 / weed out negative values
1 4
(0>)^1 -3 4 -2 / same with projection (parens needed)
1 4
a:3 -4 5 -6 7 -8 9
b:0 0 1 0 0 0 1
a@&b / index/apply (@) array where (&) 1s
5 9
a[&b] / same with bracket indexing
5 9
a&b / min (array used as left argument of &)
0 -4 1 -6 0 -8 1
Word frequency analysis is a simple problem that highlights well some basic verbs. It’s also an opportunity to showcase a simple use of regexps, as well as basic IO.
We’ll use as text source the first novel of the I, Mor-Eldal free (as in freedom) fantasy trilogy, a copy of which is available here exported in markdown format.
The first step is reading the file into a string and storing it into a variable.
s:read"01-yo-mor-eldal-en.md"
&s / number of bytes
570236
Now we’re going to split the string into words using a regexp. A basic approach would be using a regexp like rx/[A-Za-z-]+/, but this only works if there are no non-ASCII letters. A somewhat more robust approach that will work for more languages may instead use a regexp like rx/[\p{L}-]+/. This makes use of a particular Unicode property that matches all kinds of letters as understood by Unicode.
words:_rx/[\p{L}-]+/[s;-1]
This stores into a variable words all matches of the given regexp. The -1 argument specifies the maximal number of desired matches, and a negative number means any number of matches. Note how the regular expression can be applied like a function. Finally, the verb _ lowercases all letters in the words, so that we can then compare their frequency in a more realistic manner.
#words / number of words
103946
5#words / take first 5 words
"i" "mor-eldal" "the" "necromancer" "thief"
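As a point of comparison for readers used to scalar languages, here is a minimal Python sketch of the same tokenization step. Python's standard re module has no \p{L} class, so the sketch assumes that the character class [^\W\d_] (word characters that are neither digits nor underscore) is a close enough stand-in for "any letter"; it is an approximation, and the function name tokenize is hypothetical.

```python
import re

# Runs of letters and hyphens; [^\W\d_] approximates Unicode letters.
WORD = re.compile(r"(?:[^\W\d_]|-)+")

def tokenize(text):
    """Return all letter/hyphen runs found in text, lowercased."""
    return [match.lower() for match in WORD.findall(text)]

print(tokenize("I, Mor-Eldal: the Necromancer Thief"))
# ['i', 'mor-eldal', 'the', 'necromancer', 'thief']
```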
We’ll now get into frequency computing. The monadic form of the verb % is used to classify elements of an array. It returns an array of integers that assigns a number to each distinct element, starting from zero. For example:
%"a""b""a""c""b""b"
0 1 0 2 1 1
Then, we can use the monadic form of the verb = to perform index-counting, to know how many times each class occurs, in other words, how many zeros, ones, twos ... there are.
=0 1 0 2 1 1
2 3 1
This shows us that "a" (class 0) has 2 occurrences, "b" (class 1) has 3, and "c" (class 2) has only one.
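The two steps above can be sketched in Python as follows. This only mirrors the behavior shown in the examples (class numbers in order of first appearance, then one count per class); classify and index_count are hypothetical helper names, not Goal functions.

```python
def classify(items):
    """Give each distinct item a class number, in order of first appearance."""
    classes = {}
    # setdefault returns the existing class, or registers the next free number
    return [classes.setdefault(item, len(classes)) for item in items]

def index_count(codes):
    """Count how many times each class number 0, 1, 2, ... occurs."""
    counts = [0] * (max(codes) + 1 if codes else 0)
    for code in codes:
        counts[code] += 1
    return counts

codes = classify(["a", "b", "a", "c", "b", "b"])
print(codes)               # [0, 1, 0, 2, 1, 1]
print(index_count(codes))  # [2, 3, 1]
```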
We are now ready to perform the same with our word data.
freq:=%words
#freq / number of classes = number of distinct words
6131
5#freq / take first 5 elements
4976 51 5292 2 17
If we match this with the first 5 words, we can now say that "i" has 4976 occurrences, and "the" has 5292.
To get a decreasing list of matchings between words and frequencies, we can sort down a dictionary:
d:>(?words)!freq
The verb ? used in monadic form returns a new list of words without duplicates, preserving only the first occurrences of each element. Then, the verb > sorts down the dictionary by its values, in this case the frequencies. We can now query the 5 most used words:
5#d
!["the" "i" "and" "a" "to"
5292 4976 3628 2654 2521]
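For comparison, a rough Python equivalent of "count, then sort down by value, then take the first n" is sketched below. The helper name top_words is hypothetical, and how ties are ordered here (first occurrence wins, because Python's sort is stable) is a property of this sketch, not a statement about Goal.

```python
from collections import Counter

def top_words(words, n):
    """Return the n most frequent words as (word, count) pairs."""
    counts = Counter(words)  # keys are kept in order of first occurrence
    ranked = sorted(counts.items(), key=lambda pair: pair[1], reverse=True)
    return ranked[:n]

sample = ["the", "i", "the", "a", "i", "the"]
print(top_words(sample, 2))  # [('the', 3), ('i', 2)]
```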
Visualizing the default presentation of a dictionary can be hard if there are many keys and values. The following utility function provides a basic solution by putting both the keys and values in the same list, and flipping its columns and rows using the monadic form of the verb +.
tbl:{+(!x;.x)}
tbl 10#d
("the" 5292
 "i" 4976
 "and" 3628
 "a" 2654
 "to" 2521
 "you" 1739
 "he" 1582
 "of" 1579
 "it" 1434
 "me" 1361)
A somewhat more involved exercise, which we’ll leave to the reader, would be for example to study word frequency in restricted text windows (using the windows i^y verb form on a list of lines, for example), and search for unwanted repetitions that wouldn’t fit the style.
Handling CSV data of various kinds is something array languages are particularly well-suited for. In this section, we’ll parse simple daily climate data and process it to obtain a few daily summary results that will be included into a larger monthly summary.
Instead of proceeding in a REPL session as previously in this tutorial, we’ll write a script file, suitable for being called periodically to process a new day’s data. Because there’s no echo showing intermediate results in that case, you can use an output keyword, like say or print, to output a string representation of a value to standard output. Alternatively, you can use a logging \ before a value, not tightly following a noun: that will format and print the value on standard error (acting as identity and doing nothing more).
Assume we have a set of files with daily climate data, named following the year-month-day order convention, as in 20060102.csv. We provide example files for two dates: 20230512.csv and 20230513.csv. The first day starts like this:
2023-05-12T00:00 11.3 87 1014.5 1085.720
2023-05-12T00:01 11.3 87 1014.6 1085.720
2023-05-12T00:02 11.3 87 1014.5 1085.720
2023-05-12T00:03 11.3 87 1014.6 1085.720
2023-05-12T00:04 11.3 87 1014.6 1085.720
...
The next day looks like this:
2023-05-13T00:00 13.8 76 1015.7 1123.540
2023-05-13T00:01 13.9 76 1015.6 1123.540
2023-05-13T00:02 13.9 76 1015.6 1123.540
2023-05-13T00:03 13.9 76 1015.5 1123.540
2023-05-13T00:04 13.9 76 1015.6 1123.540
...
They have five columns: date (one record per minute), temperature (°C), relative humidity (%), air pressure (hPa), and accumulated precipitation (mm).
For temperature, relative humidity and air pressure, we want to get the mean, maximum and minimum values, as well as the first time at which maximum and minimum occur. Also, because there could be some missing entries or nonsensical erroneous values, we want to know the number of valid records of each type.
For precipitation, we want to know the day’s total precipitation, as well as some basic intensity data: the amount and time of the 1-hour window with most precipitation. We’ll have to take into account practical issues, like any missing entries or the possibility of reaching the maximum recordable precipitation by the collecting device we use (2500 in our case), at which point it would be reset to zero again.
Because we want to make a script, we’ll want to use the array ARGS of the arguments passed to goal. The first argument would be the name of the script, which we’ll call climday.goal, and the second would be the date of the day in 20060102 format. We’ll first do some basic checking on arguments and get the date:
(2=#ARGS)or:error"USAGE: goal climday.goal date
date should be in 20060102 format"
date:ARGS 1 / date from examples is "20230512" or "20230513"
In case of an incorrect number of arguments, we use : to return a usage error produced with the monadic verb error. When using : to return early from global code, the program will exit with status 1 if the returned value is an error. Also, note the use of the or syntax keyword with short-circuiting behavior.
We then read the csv file into variables.
(dates;temp;rh;pres;prec):" "csv 'read"${date}.csv"
This first calls read on a file corresponding to the given date. Note the ' just before read. When not preceded by a noun or verb (without spaces), ' does nothing if the result is not an error, but returns it early otherwise (like : would). The latter could happen for example if the file doesn’t exist or is not readable. The dyadic verb csv parses the space-separated CSV text into a list of columns, which we assign to various variables at once.
For convenience and easier reasoning later, we replace dates with their unix epoch value:
dates:time["unix";dates;"2006-01-02T15:04"]
This makes use of the verb time, which is described in the help. Here, we ask for the unix time of the dates column, using the layout string "2006-01-02T15:04" for parsing.
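To make the conversion concrete for readers unfamiliar with layout strings, here is a small Python sketch of the same idea: parse a minute-resolution timestamp into unix seconds, and format unix seconds back as a clock time. It assumes the timestamps are interpreted as UTC, which is an assumption of this sketch; to_unix and clock are hypothetical names.

```python
from datetime import datetime, timezone

def to_unix(stamp):
    """Parse a 'YYYY-MM-DDTHH:MM' timestamp (assumed UTC) into unix seconds."""
    parsed = datetime.strptime(stamp, "%Y-%m-%dT%H:%M")
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())

def clock(unix_seconds):
    """Format unix seconds (UTC) as 'HH:MM'."""
    return datetime.fromtimestamp(unix_seconds, tz=timezone.utc).strftime("%H:%M")

start = to_unix("2023-05-12T00:00")
print(to_unix("2023-05-12T00:01") - start)  # 60
print(clock(to_unix("2023-05-12T16:16")))   # 16:16
```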
Other columns contain numeric strings, so we’ll parse them into numbers.
(temp;rh;pres;prec):"n"$(temp;rh;pres;prec) / parse into numbers
We will first treat the case of temperature, relative humidity and air pressure, as they can be handled in a similar way and without caring about missing values.
A helper formatting function for formatting the times corresponding to a maximum or minimum will come in handy:
fmtclock:time["15:04";]
We now write a meanMaxMin function that will take three parameters: a numeric data column c, a filter function f for discarding nonsensical values, and a format string fmt for displaying the mean, maximum and minimum.
meanMaxMin:{[c;f;fmt] (
fmt${(+/x)%#x}fc:f^c / mean
fmt$c[i:*&c=|/fc] / max
fmtclock dates[i] / max-time
fmt$c[i:*&c=&/fc] / min
fmtclock dates[i] / min-time
$#fc / number of records
)}
We’ll now explain the interesting bits, in particular those using features we haven’t covered yet. In the line:
fmt${(+/x)%#x}fc:f^c / mean
The c parameter contains numerical data, like temp, rh or pres. The filtering code fc:f^c removes from c the values for which the filter function f returns a true value, and it then stores the result into a variable fc. The filter could be for example {(-20>x)|49<x} to discard bogus temperatures that wouldn’t make any sense (in the current location). Finally, {(+/x)%#x} computes the mean, and fmt$ formats the result according to format fmt, for example "%.1f".
Next is the line computing the maximum.
fmt$c[i:*&c=|/fc] / max
The maximum value is obtained simply with |/fc, but, for getting the time next, we want to know when it happens. We compute a boolean vector c=|/fc of positions where the maximum value appears in the original c. The indices of positions with a 1 are obtained by calling “where” & on the result. Using “first” * on the list of indices returns the index of the first occurrence of the maximum value in c. We store that index in i and use it to get the time dates[i].
The minimum is obtained in a similar way. Finally, the number of records is just the number of elements that remain after applying the filter, and we format it into a string with $ (in the default way for integers).
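The logic described above can be summarized with the following Python sketch, which leaves out the string formatting. It assumes at least one value survives the filter, and it looks up the first position of each extreme in the original column, as the text describes; mean_max_min and is_bogus are hypothetical names.

```python
def mean_max_min(column, times, is_bogus):
    """Return (mean, max, max-time, min, min-time, valid count) for a column."""
    kept = [value for value in column if not is_bogus(value)]
    mean = sum(kept) / len(kept)  # assumes kept is not empty
    highest, lowest = max(kept), min(kept)
    # list.index gives the first position of the value in the original column
    return (
        mean,
        highest, times[column.index(highest)],
        lowest, times[column.index(lowest)],
        len(kept),
    )

temps = [11.3, 99.9, 12.0, 12.0, 10.5]
times = ["00:00", "00:01", "00:02", "00:03", "00:04"]
result = mean_max_min(temps, times, lambda t: t < -20 or t > 49)
print(result[1:])  # (12.0, '00:02', 10.5, '00:04', 4)
```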
Handling precipitation is a bit more complicated, because we have cumulative precipitation instead of per-minute precipitation. Also, when going over 2500, the accumulator overflows and goes to zero again. We’ll therefore handle both things and convert our data into per-minute precipitation.
prec:{x-»x}prec-*prec:prec+{2500*+\x<»x}prec / minutely precipitation
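Unpacked into explicit steps, the conversion performed by the line above might look like the following Python sketch. It is an illustration under the assumptions stated in the text (non-negative readings, and a reset to zero after reaching 2500), not a translation of Goal's implementation; per_minute and limit are hypothetical names.

```python
from itertools import accumulate

def per_minute(cumulative, limit=2500):
    """Turn cumulative readings with resets into per-minute amounts."""
    if not cumulative:
        return []
    # 1 where a reading is lower than the previous one (a reset), else 0
    resets = [0] + [int(cur < prev) for prev, cur in zip(cumulative, cumulative[1:])]
    # number of resets so far, times the amount lost at each reset
    offsets = [limit * count for count in accumulate(resets)]
    unwrapped = [value + offset for value, offset in zip(cumulative, offsets)]
    # start the day at zero, then take successive differences
    shifted = [value - unwrapped[0] for value in unwrapped]
    return [cur - prev for prev, cur in zip([0] + shifted, shifted)]

print(per_minute([2498.0, 2499.5, 0.5, 1.0]))  # [0.0, 1.5, 1.0, 0.5]
```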
First, prec+{2500*+\x<»x}prec cancels any resets at 2500. The x<»x part puts a 1 at the places where resets, if any, occur, by comparing x with itself shifted right by one, with a 0 as filler on the left, using the right-shift verb » (which can also be spelled rshift). The sum scan then transforms the result so that each element is the number of resets up to that point; multiplying by 2500 gives the amount that was discarded due to resets. The {x-»x}prec-*prec part transforms the obtained cumulative precipitation into per-minute precipitation. Total precipitation can now be obtained and formatted easily with:
"%.3f"$+/prec
As we said before, we also want to compute the 1-hour window with most precipitation. This requires some further processing of the precipitation data, filling any missing records with 0.
unix:time["unix";date;"20060102"]
mdates:unix+60*!1440 / minutes of the day
prec:.(mdates!1440#0),dates!prec / fill missing minutes with zeros
This creates an array mdates with all the dates corresponding to minutes in the day. It then merges a template dictionary mdates!1440#0, filled with zeros, with a dictionary corresponding to recorded dates and precipitation data.
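The template-and-merge idea can be sketched in Python with an ordinary dictionary update. This is an analogy only; fill_minutes and its parameters are hypothetical names, and the sketch assumes every recorded time falls on one of the template's minutes.

```python
def fill_minutes(day_start, recorded_times, values, minutes=1440):
    """Return one value per minute of the day, using 0 for missing minutes."""
    # template: every minute of the day mapped to zero
    filled = {day_start + 60 * minute: 0 for minute in range(minutes)}
    # recorded values overwrite the zeros at their own timestamps
    filled.update(zip(recorded_times, values))
    return list(filled.values())

# Four-minute "day" with records only at minutes 1 and 3
print(fill_minutes(0, [60, 180], [1.5, 2.0], minutes=4))  # [0, 1.5, 0, 2.0]
```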
We can now compute the precipitation in all 60-minute windows of the day with prec1h:+/60^prec. The time of the 1-hour window with maximum precipitation can then be obtained with mdates@*>prec1h. The *>prec1h call is an idiom that returns the index of the first occurrence of the maximum value, obtained through the descending sorting permutation indices returned by >. It’s a simpler way to write *&prec1h=|/prec1h.
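As a cross-check of the intended result (one total per 60-minute window, then the position of the first maximum), here is a straightforward Python sketch. It favors clarity over speed, and window_sums and first_argmax are hypothetical names rather than Goal primitives.

```python
def window_sums(values, size=60):
    """Sum of every run of `size` consecutive values."""
    return [sum(values[i:i + size]) for i in range(len(values) - size + 1)]

def first_argmax(values):
    """Index of the first occurrence of the maximum value."""
    return max(range(len(values)), key=values.__getitem__)

sums = window_sums([0, 1, 2, 0, 3, 0], size=2)
print(sums)                # [1, 3, 2, 3, 3]
print(first_argmax(sums))  # 1
```

The index returned by first_argmax is the position of the window's first minute, which is why it can be used directly to look up the window's start time.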
We gather a record with all the desired daily results:
record:,//(
date:time["2006-01-02";unix]
/ mean, max, max-time, min, min-time, nrecords for temp, rh, pres
meanMaxMin[temp;{(-20>x)|49<x};"%.1f"]
meanMaxMin[rh;{(0>x)|100<x};"%.0f"]
meanMaxMin[pres;{(960>x)|1060<x};"%.1f"]
"%.3f"$+/prec / total precipitation
"%.3f"$|/prec1h:+/60^prec / max precipitation in 1 hour
fmtclock mdates@*>prec1h / time of 1-hour window with max
)
Note the ,// at the beginning that transforms any nested list into a flat list (applies join-over until convergence). For the first example date, if we display the record with say, we get:
2023-05-12 12.9 16.8 16:16 10.3 08:59 1435 79 89 10:23 63 16:08 1435 1013.9
1015.8 20:41 1012.5 04:52 1435 37.820 11.920 04:26
(wrapped for display purposes here, but it’s a single line)
All that is left is adding this record for the given date’s day to a monthly CSV file, updating it if there is already one.
month:time["200601";unix]
mdata:read["${month}.csv"]or"" / read data of the month
mcsv:{x[;0]=date}^+" "csv mdata / remove record for date if already present
mcsv,:,record / add our new record at the end
mcsv@:<mcsv[;0] / sort records again by date
'"${month}.csv"print" "csv+mcsv / write the file again
One new thing here is the monadic use of + applied to the result of the csv verb to flip columns and rows, so that we get a list of the CSV’s records, which in this case is more convenient because we want to add/replace a particular record, not a column. Another novelty is the monadic form of <, which returns a sorting permutation for its input, here the date column of the monthly data. The assignment operation @: replaces the old mcsv by indexing it on the sorting permutation. Finally, the dyadic form of print allows printing the result to a specific file given as left argument. Also, note how the verb csv is used both for parsing and serializing, depending on whether the input is a string or a list of columns.
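The "remove, append, sort" update described above can be sketched in Python as follows, with records represented as lists whose first field is the date. The helper name upsert_record is hypothetical; sorting works on the date strings directly because the year-month-day format sorts chronologically.

```python
def upsert_record(records, record):
    """Replace the record with the same date (first field), keeping date order."""
    date = record[0]
    kept = [row for row in records if row[0] != date]  # drop any previous entry
    kept.append(record)                                # add the new one
    return sorted(kept, key=lambda row: row[0])        # restore date order

month = [["2023-05-13", "14.2"], ["2023-05-12", "old"]]
print(upsert_record(month, ["2023-05-12", "12.9"]))
# [['2023-05-12', '12.9'], ['2023-05-13', '14.2']]
```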
Running now the script for both rainy days produces this result in the summary monthly file:
2023-05-12 12.9 16.8 16:16 10.3 08:59 1435 79 89 10:23 63 16:08 1435 1013.9
1015.8 20:41 1012.5 04:52 1435 37.820 11.920 04:26
2023-05-13 14.2 17.4 14:48 12.5 10:03 1437 80 88 11:01 69 14:28 1437 1016.2
1017.2 10:38 1014.8 02:17 1437 62.160 35.740 08:32
(both lines wrapped for display purposes here)
Well, this example was a bit long! Some things could still be improved, like more robust and informative error handling in case of invalid dates or data that should normally not happen. Also, instead of directly replacing the monthly file at the end, it would be safer to write it first to another temporary file, to avoid corrupting the file in case of a power outage during the write (though it can be re-obtained by running the script for all days of the month). Still, this script does some actually useful things without much code, so I hope this example does highlight some of the strengths of array programming!
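The temporary-file approach mentioned above is a general technique, independent of Goal. A minimal Python sketch of it follows, assuming a filesystem where renaming within the same directory replaces the target in one step; write_atomically is a hypothetical helper name.

```python
import os
import tempfile

def write_atomically(path, text):
    """Write text to a temporary file, then rename it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    # create the temporary file next to the target so the rename stays local
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(text)
            tmp.flush()
            os.fsync(tmp.fileno())  # ask the OS to push the data to disk
        os.replace(tmp_path, path)  # swap the new file into place
    except BaseException:
        # on failure, leave the original file alone and clean up
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

With this approach, an interruption during the write leaves at worst a stray temporary file, while the previous monthly file stays intact.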
To finish, we reproduce the complete script below:
(2=#ARGS)or:error"USAGE: goal climday.goal date
date should be in 20060102 format"
date:ARGS 1 / date from examples is "20230512" or "20230513"
(dates;temp;rh;pres;prec):" "csv 'read"data/${date}.csv"
dates:time["unix";dates;"2006-01-02T15:04"]
(temp;rh;pres;prec):"n"$(temp;rh;pres;prec) / parse into numbers
fmtclock:time["15:04";]
meanMaxMin:{[c;f;fmt] (
fmt${(+/x)%#x}fc:f^c / mean
fmt$c[i:*&c=|/fc] / max
fmtclock dates[i] / max-time
fmt$c[i:*&c=&/fc] / min
fmtclock dates[i] / min-time
$#fc / number of records
)}
prec:{x-»x}prec-*prec:prec+{2500*+\x<»x}prec / minutely precipitation
unix:time["unix";date;"20060102"]
mdates:unix+60*!1440 / minutes of the day
prec:.(mdates!1440#0),dates!prec / fill missing minutes with zeros
record:,//(
date:time["2006-01-02";unix]
/ mean, max, max-time, min, min-time, nrecords for temp, rh, pres
meanMaxMin[temp;{(-20>x)|49<x};"%.1f"]
meanMaxMin[rh;{(0>x)|100<x};"%.0f"]
meanMaxMin[pres;{(960>x)|1060<x};"%.1f"]
"%.3f"$+/prec / total precipitation
"%.3f"$|/prec1h:+/60^prec / max precipitation in 1 hour
fmtclock mdates@*>prec1h / time of 1-hour window with max
)
month:time["200601";unix]
mdata:read["data/${month}.csv"]or"" / read data of the month
mcsv:{x[;0]=date}^+" "csv mdata / remove record for date if already present
mcsv,:,record / add our new record at the end
mcsv@:<mcsv[;0] / sort records again by date
'"data/${month}.csv"print" "csv+mcsv / write the file again
At this point, you should have a grasp of the spirit of the language. You probably want to check out the Help chapter and experiment with simple problems of your own. You then might want to follow with the FAQ, or jump directly into the Working with tables chapter.