[personal profile] chaource
Part 1.
Part 2.
Part 3.

The join calculus has some further limitations, beyond those I already described:

  • It is impossible to define a reaction with a variable number of input reagents, say a reaction that starts when a(x) is present or when a(x) and b(y) are present. It is also impossible to define a reaction with duplicated reagents. We cannot have a reaction that starts when, say, exactly three reagents a(x), a(y), a(z) are present.

  • More generally, we cannot have reactions with a computed set of input reagents. For instance, if we wanted to define a reaction with 1000 input reagents, we cannot write the input reagents as "a_n for n = 0, ..., 999". We would have to write each of the reagents a000, a001, ..., a999 explicitly.

  • It is not possible to specify reactions that depend on the set of input reagents. A reaction starts when its input reagents, statically specified, are present; the chemical engine only checks that the reagents are present. We cannot specify any additional computations to determine the set of input reagents for a reaction.

    For example, if we wanted to have 1000 reagents named a000, a001, ..., a999, and a reaction that starts when any 100 of them are present, the chemical engine would have to perform a computation before deciding whether to start a reaction. By design, the join calculus prohibits any nontrivial computations before starting a reaction.

  • For the same reason, we cannot specify that a reaction starts only if some computed condition holds on the decoration values of the input reagents.

    So we cannot have a reaction a(x) & b(y) = ... that starts only when some computed condition p(x,y) holds on the values x, y.
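For illustration, here is a sketch of the usual workaround (my own sketch; "p" and "consume" are hypothetical names, not defined anywhere in this tutorial): since a condition on x and y cannot appear in the reaction header, we test it inside the reaction body and re-inject the reagents when it fails:

```ocaml
(* Sketch only: "p" is a hypothetical predicate and "consume" a
   hypothetical output channel. *)
def a(x) & b(y) =
  if p x y
  then consume(x,y)   (* the real payload runs only when p holds *)
  else a(x) & b(y)    (* otherwise put both reagents back and retry *)
```

This is not equivalent to a true guarded reaction: the chemical machine may keep picking the same pair a(x), b(y) and spin forever, so the re-injection trick must be used with care.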


Nevertheless, it seems that the join calculus can express essentially all concurrent computations. I would like to consider three examples now:

  1. Given an array of functions, produce an array of their result values. The functions should be evaluated asynchronously in parallel.
  2. Given an array of integers, compute their sum. All pairwise summations should be evaluated in parallel in arbitrary order, and the partial sums should be also added in parallel in arbitrary order.
  3. Sort an array of integers using the merge-sort algorithm, again doing as much as possible in parallel.


Now let us figure out the implementation of these examples in join calculus. We will be using the purely "chemical" approach to concurrent computation. We will never say the words "semaphore", "thread", "deadlock", "mutex", or "synchronize". Instead, we will talk about reagents and reactions. My goal is to see what kind of tricks and patterns emerge from this paradigm.

Example 1. Evaluate many functions in parallel and store results

Each evaluation of a function must be a separate reaction. The payload of that reaction can assign the result value to the corresponding element of the array. So we only need one reagent for this reaction. Let this reagent be called "a" and carry, as decoration values, the function, the array, and the array index.

let n = 10;;
let tasks = (* an array of functions; for simplicity, each just computes 1 *)
  Array.make n (fun () -> 1);;
let results = Array.make n 0;;
def a(f,arr,i) = arr.(i) <- f(); 0;;
for i = 0 to n-1 do spawn a(tasks.(i), results, i) done;;


This works; however, the "results" array is updated asynchronously, and the final array will be ready at an unknown time, when all n reactions have finished. Certainly, we would like to know when this happens! We need a counter that shows how many computations remain, and starts a new reaction when all computations are finished. This counter also needs a separate reaction for updating itself. How will we write the reaction for the counter?

As we found at the end of the previous tutorial, it is impossible to wait until no more "a" reagents are present. Therefore, the counter should explicitly know how many reactions were started.

To maintain the counter, we need to check how many "a" reactions have been completed. Let us produce an additional reagent, "ready", at the end of each "a" reaction, so that the counter could react with the "ready" reagents.

So we modify the reaction for "a" like this:

def a(f,arr,i) = arr.(i) <- f(); ready()
and ready() & remains(k) = if k>1 then remains(k-1) else all_done()
and all_done() = print_string "all done\n"; 0;;
spawn remains(n);;


The advantage of this approach is that the counting is performed asynchronously, independently of the "a" reaction. Thus, all "a" reactions can run in parallel, producing many "ready" reagents, while the "remains" reagent counts and consumes each of the "ready" reagents one by one.
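This counting pattern is reusable. Here is a sketch in JoCaml style (the helper "make_countdown" is my own invention, not a library function) that packages the counter and hands back the "ready" channel:

```ocaml
(* Sketch of a reusable countdown (hypothetical helper): after the
   returned channel "ready" has been injected n times, on_done() is
   injected exactly once. *)
let make_countdown n on_done =
  def ready() & remains(k) =
    if k > 1 then remains(k-1) else on_done()
  in
  spawn remains(n);
  ready
```

Each "a" reaction would then simply end with ready(), without naming the counter channels explicitly.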

This code works; here is the full test code for JoCaml. I added some printing so that we can watch the progress.

# let test n =
    let tasks = Array.make n (fun () -> 1) in
    let results = Array.make n 0 in
    def a(f,arr,i) = arr.(i) <- f(); print_int i; print_newline(); ready()
    and ready() & remains(k) = if k>1 then remains(k-1) else all_done()
    and all_done() = print_string "all done\n"; 0 in
    spawn remains(n);
    for i = 0 to n-1 do spawn a(tasks.(i),results,i) done
  in
  test 10;;
0
2
1
3
5
6
4
7
9
8
all done
- : unit = ()


As we can see, the tasks were processed concurrently, in a somewhat arbitrary order.

Example 2. Compute the sum of an array of integers concurrently

Suppose we have an array, say, {10,20,30,40,50}. We need to add some pairs, then add partial sums, etc., until all is added together. A first idea would be to inject reagents a(10), a(20), a(30), a(40), a(50) and organize reactions such that all numbers are eventually added together. It remains to define the necessary reactions.

A reaction such as a(x) & a(y) = a(x+y) will not work because reagents cannot be duplicated. How can we put two "a" reagents together in a reaction? Obviously, we need other reagents here. If we define the reaction "a(x) & b(y) = b(x+y)", then we merely convert a's to b's, and the reactions will stop. We will not have succeeded in bringing a pair of "a" reagents together. So we need two further reagents and two reactions:

def a(x) & b() = c(x) or a(x) & c(y) = a(x+y)

Note that we use the "or" keyword; this is required in JoCaml for reactions that use the same reagents.

This has essentially the same effect as the disallowed reaction a(x) & a(y) = a(x+y), as long as we initially inject exactly as many "b" reagents as "a" reagents. The number of "b" reagents then always remains equal to the number of "a" reagents. (Strictly speaking, the chemical machine could also choose to convert every "a" into a "c" before any "a" & "c" reaction fires, leaving only "c" reagents and no possible reactions; we rely on the scheduler to keep mixing the two reactions.)

So we find that the restriction to non-repeating reagents is not really a limitation of the computing power of the join calculus. We can always define additional reagents and reactions to achieve the same effect as in a reaction with repeating reagents.
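To check this claim, here is a small plain-OCaml simulation (my own sketch, not part of the chemical machine) that replays the two reactions in one particular order, starting from equally many "a" and "b" reagents:

```ocaml
(* A sequential replay of the two reactions
     a(x) & b() = c(x)   and   a(x) & c(y) = a(x+y),
   starting from n reagents a(x) and n reagents b().
   The chemical machine may choose other orders; this replay picks one
   order that keeps alternating the two reactions. *)
let simulate values =
  let rec step az bz cz =
    match az, bz, cz with
    | [x], 1, [] -> x                 (* final a(x) & b() = c(x): done *)
    | x :: rest, b, [] when b > 0 ->
        step rest (b - 1) [x]         (* a(x) & b() = c(x) *)
    | x :: rest, b, y :: _ ->
        step ((x + y) :: rest) b []   (* a(x) & c(y) = a(x+y) *)
    | _ -> failwith "no reaction possible"
  in
  step values (List.length values) []

let () = assert (simulate [10; 20; 30; 40; 50] = 150)
```

In this order, every "c" produced by the first reaction is immediately consumed by the second, and the final value is the total sum.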

Having defined these reactions, we will have achieved that eventually all "a(x)" reagents come together, and the total sum ends up on a single reagent: the last "a(z)" reacts with the one remaining "b" and leaves a single "c(z)" whose value z equals the sum of all the initially given values. However, we again face the same problem as in the previous example: namely, we will not know when this happens. There is no way to check that only one reagent "a(x)" is present in the soup.

It seems that we again need some kind of "counter" that keeps track of how much remains to be done. What kind of reaction will this counter have?

The only place where we perform the summation of a pair is the reaction a(x) & c(y) = a(x+y). Therefore, we need to count how many such reactions have finished, so that the last "a(x+y)" reagent knows that it is the last one.

If we use the same technique as in the previous example, we will force the summation to proceed serially: the "counter" consumes each "ready" reagent one by one. So here we need a different technique.

It seems that we need to put the progress information into the "a" reagents. For instance, the "a" reagent can carry the number of sums already performed as a second value: "a(x,k)" represents a partial sum of "k" elements, and the value of the partial sum is "x". Initially we will inject "a(x,1)" for each x from the given array. Let us see how to arrange the necessary reactions.

At the end of the calculation, we will be left with a single "a(z,n)" reagent whose value "z" will be our resulting sum, while "n" will be equal to the total number of elements. There will be also a single "b" reagent (since the numbers of "a" and "b" are always equal). Therefore, the last reaction "a & b = c" will happen at the end of the calculation. It is this reaction that we can use to generate the final "all_done" reagent. Let us put the value "n" on each of the "b" reagents from the beginning; these values will be used to check that "a(z,n)" is indeed the last reagent. Our reactions therefore look like this:

def a(x,k) & b(n) = if k=n then all_done() else c(x,k)
or a(x,k) & c(x',k') = a(x+x',k+k');;

In order to start the reactions, we need to inject "n" copies of "b(n)" as well as "n" reagents "a(x,1)".

Here is some test code for JoCaml:
# let test n =
    let inputs = Array.make n 0 in
    let () = for i=0 to n-1 do inputs.(i) <- 10*(i+1) done in
    def all_done(z) = Printf.printf "all done, result=%d\n" z; 0 in
    def a(x,k) & b(n) = if k=n then all_done(x) else c(x,k)
    or a(x,k) & c(x',k') = a(x+x',k+k') in
    for i=0 to n-1 do spawn a(inputs.(i), 1) & b(n) done;;
val test : int -> unit =
# test 10;;
- : unit = ()
all done, result=550
#


Example 3. Sort an array of integers using the parallel merge-sort algorithm

Suppose our array is [4,2,6,1,5,3]. We would like to begin by sorting adjacent pairs, obtaining the three arrays [2,4], [1,6], [3,5]. Then we merge the partial arrays: [1,2,4,6], [3,5]. Finally, the lonely array [3,5] needs to be merged in.

In the previous examples, we could have performed the concurrent computations in any order. The situation is different now: we do not want to merge arrays in an arbitrary order; this would increase the asymptotic complexity of the algorithm, since we would frequently be merging arrays of unequal size. (For instance, if we always merged the current partial result with a single-element array, a merge of a length-k array with one element costs about k steps, so sorting would take about 1+2+...+(n-1), i.e. O(n^2), steps instead of O(n log n).) We need to organize the computation so that, e.g., an array of four is merged only with another array of four, unless no other such arrays exist or could be created.

Let us try to think how to organize the computation. Merging of two arrays definitely needs to be a single reaction, and the resulting sorted array needs to be carried by a reagent, say "sorted[1,2,4,6]". However, if each reagent carries one array, and if we inject reagents for each partial array, say "sorted[2,4]", "sorted[1,6]", "sorted[3,5]", we cannot prevent reactions from happening between any two sorted arrays of any size. This method will not implement the sorting algorithm correctly.

What we need is to organize the reactions hierarchically. The merging of two arrays must be performed only when the two arrays are at the same level in the hierarchy.

Our main problem here is that the chemical machine does not do any computations in order to decide which reactions to start. It starts reactions whenever reagents are available. It is impossible to specify that the reagent "a(x)" starts a reaction only if its value "x" satisfies some condition. Therefore, we need to process this information ourselves.

We could explicitly store the information about which parts have been merged -- say, in a list of some kind, carried by a special "dispatcher" reagent. This means we would have to write complicated "dispatcher" code that starts all other reactions.

Alternatively, we could define separate reactions for merging of partial arrays of different size. Let us investigate the latter possibility.

What we would ideally have is that the smallest arrays are merged pairwise in parallel; then the next-level arrays are merged; and so on. Each merging needs a different reagent. Suppose that the smallest arrays are carried by the reagents "a". Whenever two "a"'s merge, say "a[4]" with "a[2]", we would get a reagent "b[2,4]" with a sorted partial array. Whenever two "b"s merge, say "b[2,4]" and "b[1,6]", we get a "c[1,2,4,6]" reagent, and so on. So we would need to define as many different reagents as we have levels in the binary tree that represents the sorting scheme. If the initial array has length 1024, we would need at least ten different reagents ("a", "b", "c", ..., "j"). The problem is that the required set of reagents would have to be computed at run time. However, the join calculus does not allow us to define a computed set of new reagents. All the reagents and their reactions need to be statically defined!

It turns out we can overcome this limitation elegantly (i.e. without writing a lot of complicated logic) by using the basic tools of functional programming: locally defined reagents and recursion. The basic scheme of "merge-sort" is that we split the initial array into two roughly equal parts, sort each part recursively, and merge the two parts. The splitting is performed by a certain reagent, and this reagent needs to be recursive, since it delegates sorting to itself. The result of sorting each part is a sorted list in a reagent that we call "sorted". However, the "sorted" reagents need to merge only with "sorted" reagents at the same level of the hierarchy. We need to prevent the "sorted" reagents from merging with any other sorted results.

One way to achieve this is to define the "sorted" reagents locally within the splitting reagent. Then the "sorted" reagents will be lexically confined to the body of the splitting reagent; no other reagents will be able to enter any reactions with these locally defined "sorted" reagents. This will be the basic idea of our solution.

What we would like to do, therefore, is to implement the splitting as a recursive definition of the reagent "mergesort". This reagent carries the initial, unsorted array, e.g. "mergesort[5,1,3,4,2]", and immediately starts a reaction, "mergesort(arr) = ...". When this reaction is finished, it should inject the reagent "sorted[1,2,3,4,5]", with the resulting sorted array as the value.

The reaction "mergesort(arr) = ..." should work like this: If the array has length 1, it is already sorted, and we can emit the reagent "sorted(arr)" right away. Otherwise we split the array "arr" in two parts and call "mergesort" on each part. These will eventually produce two "sorted" reagents, which we will then merge.

So our initial idea of the code is this:

def mergesort(arr) =
  if (arr has length 1) then sorted(arr)
  else
    def sorted(x) & sorted(y) = sorted(array_merge x y)
    (* this is not legal in join calculus, but this is what we really want *)
    in
    let (arr1, arr2) = array_split arr in
    mergesort(arr1) & mergesort(arr2) (* recursively sort two halves *)
    (* results will be sorted(arr1') and sorted(arr2'),
       which will be merged recursively *)


But this code will not work, for two reasons. First, we are not allowed to have a reaction with two identical reagents. Well, this problem is minor, and we already know how to fix it: Any reaction of the form
a(x) & a(y) = something(x,y)
can be always replaced by two "or"-coupled reactions with two new reagents,
def a(x) & b() = c(x) or a(x) & c(y) = something(x,y)

The second problem is deeper: if we define the "sorted" reagent locally inside "mergesort", then "sorted" will be undefined outside "mergesort". This is a problem because we want to use "mergesort" recursively. We expect that "mergesort(...)" yields a "sorted(...)" at the end, so that all the "sorted" reagents can be merged. If "sorted" is undefined outside "mergesort", we cannot use it to obtain the results of the reactions started by "mergesort(arr1)" and "mergesort(arr2)". These reactions will produce reagents that are defined locally inside them and are invisible in the scope where these reactions were started. Then our merging reaction "sorted & sorted = sorted" will never start, since it is defined for our local "sorted" reagent, not for the (two different) "sorted" reagents that are locally defined inside the recursive calls "mergesort(arr1)" and "mergesort(arr2)".

So we must define the "sorted" reagent outside "mergesort". However, if all "mergesort" reactions yield the same "sorted" reagent, the result of every "mergesort" will be free to combine arbitrarily with other results, and we lose the hierarchical organization of the merging reactions. Somehow we still need to have different "sorted" agents for every level of "mergesort".

The solution is to define two different "sorted" reagents, one outside and one inside "mergesort". The external "sorted" reagent (named differently, say "sorted_res") will be passed to the "mergesort" reagent as a parameter, i.e. as a reagent constructor. The end result of the "mergesort" reaction should be one external "sorted_res" reagent. The local "sorted" reagent defined inside "mergesort" will be used to merge the recursive results. This local "sorted" reagent will also be passed to the two recursively called "mergesort" reagents. In this way, these "mergesort" reagents will each produce a locally defined "sorted" reagent, which cannot react with any external "sorted" reagents (or with the "sorted" reagents defined inside deeper recursive calls). The "sorted" reagents will be able to react only with the ones defined within the same scope. This automatically enforces the hierarchy of reactions that we need.

Here is the complete test code for JoCaml:
let array_split arr =
  let n = Array.length arr in
  if n < 2 then raise (Invalid_argument "array_split")
  else
    let i = n/2 in
    (Array.sub arr 0 i, Array.sub arr i (n-i))
;;

let array_merge arr1 arr2 is_less =
  let arr = Array.append arr1 arr2 in  (* an array of the correct total length *)
  let rec merge i1 i2 i =
    if i1 = Array.length arr1 && i2 = Array.length arr2
    then arr (* all done *)
    else
      (* the first element of arr1 is chosen only under this condition *)
      let (x, new_i1, new_i2) =
        if i1 < Array.length arr1
           && (i2 = Array.length arr2 || is_less arr1.(i1) arr2.(i2))
        then (arr1.(i1), i1+1, i2)
        else (arr2.(i2), i1, i2+1)
      in
      (arr.(i) <- x; merge new_i1 new_i2 (i+1))
  in
  merge 0 0 0
;;
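A quick sanity check of array_merge in plain OCaml (it relies on the definition just given; the test values are my own):

```ocaml
(* Merging two sorted arrays with the comparison (<) *)
let () =
  assert (array_merge [| 1; 4; 6 |] [| 2; 3; 5 |] (<) = [| 1; 2; 3; 4; 5; 6 |]);
  (* one side empty: the result is just the other side *)
  assert (array_merge [||] [| 1; 2 |] (<) = [| 1; 2 |])
```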

let string_of_array arr string_of_elt separator =
  let rec arr2str i result =
    if i = Array.length arr then result
    else arr2str (i+1)
           (result ^ (if result = "" then "" else separator) ^ string_of_elt arr.(i))
  in
  arr2str 0 ""
;;

# def mergesort(arr, sorted_res) =
    if Array.length arr <= 1 then sorted_res(arr)
    else
      let (part1, part2) = array_split arr in
      def sorted(x) & a() = b(x)
      or sorted(x) & b(y) = sorted_res(array_merge x y (<)) in
      mergesort(part1, sorted) & mergesort(part2, sorted) & a()
  in (* mergesort is now defined; now we set up the first call to it *)
  def print_result(arr) =
    Printf.printf "finished: [%s]" (string_of_array arr string_of_int ";"); 0
  in (* now we call mergesort with an externally provided "sorted_res" argument equal to "print_result" *)
  spawn mergesort([| 3; 2; 5; 1; 4 |], print_result);;
- : unit = ()
#
finished: [1;2;3;4;5]


Mixing global and local reagents

In the example just given, we defined the "mergesort" reaction recursively, and we defined new reactions locally in the body of the "mergesort" reaction. These local reactions used new (locally defined) reagents "sorted", "a", "b" as well as the previously defined reagent "sorted_res". Can we freely mix new, locally defined reagents with global or previously defined reagents? Experimentation shows that this does not always work:

# def a() & b() = print_string "reaction a & b\n"; reply true to b & a();;
val a : unit Join.chan =
val b : unit -> bool =
# def a() & c() = print_string "reaction a & c\n"; reply true to c & a();;
val a : unit Join.chan =
val c : unit -> bool =
# spawn a();;
- : unit = ()
# c();;
reaction a & c
- : bool = true
# (* now try the reaction a&b *)
b();; (* this blocks forever *)


The second "def" statement introduces a new, locally defined "a" reagent, shadowing the previous "a" and the "a&b" reaction completely. When we say "spawn a()", we inject the new "a" reagent, not the old "a" with which "b" can react.

A local "def" statement cannot add a new reaction to a previously defined reagent! When a reagent, such as "a", is being defined, the chemical machine needs to know at once all the reactions in which "a" appears as an input reagent. Suppose we have reagents "a" and "b" in the soup, and a reaction "a() & b() = ..." is defined. Then, it is not possible to add more reactions starting from "a" in a local expression; this (in my understanding) contradicts the semantics of the join calculus. Here is what would happen if we allowed such definitions, for example, an additional reaction "a() = ..." in a local expression. Suppose that the (fictitious) keyword "def_more" adds reactions to an already defined reagent.

def a() & c() = 0
or b() & c() = d();;
def_more a() = b() in spawn c();;
spawn a();;


We have set up the reactions so that "a", "b", "c", and "d" are globally defined. The reaction "a = b" is (supposedly) defined only locally within the expression "spawn c()". Upon evaluating that expression, however, no reaction has started yet, since the expression "spawn c()" does not block; it returns right away, injecting "c" into the soup. We also intend that the reaction "a = b" should be inactive outside the expression "spawn c()". So the reaction "a = b" seems to be permanently invisible! It seems that the reagent "a" had an extra reaction defined in a local expression, but this reaction must remain undefined (and inactive) unless this local expression is "active". However, it is meaningless to say that "an expression is active"; an expression is not a process but a computed value. So this kind of code does not actually define any extra reactions at all. The effect must be the same as if we had never defined the reaction "a = b".

To summarize: Each "def" statement introduces new and locally defined reagents; any reagents appearing on the left-hand side of a locally defined reaction are new and will shadow previously defined reagents with the same name. (Previously defined reagents can be used only on the right-hand side of a local "def", i.e. as output reagents of a new reaction.) The chemical machine can define local reagents, but not additional "locally defined" reactions with any previously defined input reagents. There is no such concept as a "locally defined reaction". Reactions defined for local reagents are just as permanently and globally defined as all other reactions. All reactions are defined statically; this means that any reactions defined anywhere in the code are always active in the chemical machine, even though the local reagents may be invisible outside some expressions.

Conclusion

The chemical machine (the join calculus) does not support any computations on the set of reagents, and yet we can impose any recursive (inductively defined) structure on the reagents. We can organize reagents in a list or in a tree, for instance. This is possible because we are allowed to define new reagents at run time in local scopes, as long as we define them statically.

In the example with "merge-sort", we effectively imposed a tree hierarchy on reagents by using local definitions and recursion. To the chemical machine, all the "sorted" reagents defined by various recursive invocations of "mergesort" are different reagents, because they are defined in different local scopes. At the same time they are all statically defined. The system can thus guarantee consistency of reactions.

For example, if we were allowed to define an "array" of 100 reagents and a function that injects reagent number n, and later try to inject a reagent number 101, we would get an error such as "array index out of bounds". These errors are simply not possible when all reagents are defined statically. The join calculus does not allow us to inject a reagent that somehow becomes undefined later.

So far I have not seen any real limitations of the join calculus. It has very few features and yet can express many different ways of organizing concurrent computations. Is this what they call a "highly expressive" calculus?
