# Introduction

Since the introduction of the Fortran 2008 standard, Fortran is a parallel language. Unlike the parallel extensions [OpenMP](https://www.openmp.org/) or [OpenACC](https://www.openacc.org/), coarray parallelism is built into the language core, so there are fewer problems with interaction between different standards and different standards bodies.

This tutorial aims to introduce Fortran coarrays to the general user. A general familiarity with modern Fortran is assumed. People who are not familiar with Fortran, but are familiar with other imperative languages like C, might need to refer to other sources such as the [FortranWiki](http://fortranwiki.org/fortran/show/HomePage) to check what individual language constructs mean.

## What is the idea behind coarrays?

Coarrays follow the idea of a [Partitioned global address space](https://en.wikipedia.org/wiki/Partitioned_global_address_space) or PGAS. In PGAS, there are several images executing. Each image has its own local memory. It is, however, possible to access the memory of other images via special constructs. This is more loosely coupled than the thread model, where threads share variables unless explicitly directed otherwise.

Using PGAS means that coarray Fortran can be used on a massively parallel computing system as well as a shared-memory implementation on a single, multi-CPU computer.

## A remark on compiling and running the example programs

If you want to try out the example programs, you need to have a coarray-capable compiler and know how to compile and run the programs. Setting the number of images is done in a compiler-dependent manner, usually via a compiler option, an environment variable, or, if the system is MPI-based, as an argument to `mpirun`.

# Images and synchronization

One central concept of coarray Fortran is that of an image. When a program is run, it starts multiple copies (or, possibly, one copy) of itself.
Each image runs in parallel until completion, and works independently of other images unless the programmer specifically asks for synchronization.

## A first example

Here is a coarray variant of the classic "Hello world" program:

```
program main
  implicit none
  write (*,*) "Hello from image", this_image(), "of", num_images()
end program main
```

This program will output something like

```
Hello from image 2 of 4
Hello from image 4 of 4
Hello from image 3 of 4
Hello from image 1 of 4
```

depending on how many images you run. It shows the use of two important functions: the number of images that are running can be found with the `num_images()` function, and the number of the current image with `this_image()`. Both of these are functions that are built into the language (so-called intrinsic functions).

## Basic Synchronization

Usually, some kind of ordering has to be imposed on the images to do anything useful. This can be done with the `SYNC ALL` statement, which partitions the program into what the Fortran standard calls segments. Anything before one `SYNC ALL` statement will get executed before anything after the `SYNC ALL` statement.

Here is an example program, where each image prints both a Hello and a Goodbye message.
Assume you want to make sure that each Hello message is printed before any Goodbye message, then this is *not* the way to do it:

```
program main
  implicit none
  write (*,*) "Hello from image", this_image(), "of", num_images()
  write (*,*) "Goodbye from image", this_image(), "of", num_images()
end program main
```

The output will look something like

```
Hello from image 4 of 4
Goodbye from image 4 of 4
Hello from image 3 of 4
Goodbye from image 3 of 4
Hello from image 1 of 4
Hello from image 2 of 4
Goodbye from image 1 of 4
Goodbye from image 2 of 4
```

What you can do instead to put things into order is to insert `SYNC ALL` between the two `write` statements, like this:

```
program main
  implicit none
  write (*,*) "Hello from image", this_image(), "of", num_images()
  sync all
  write (*,*) "Goodbye from image", this_image(), "of", num_images()
end program main
```

which will get the intended result:

```
Hello from image 2 of 4
Hello from image 4 of 4
Hello from image 3 of 4
Hello from image 1 of 4
Goodbye from image 1 of 4
Goodbye from image 2 of 4
Goodbye from image 4 of 4
Goodbye from image 3 of 4
```

The `SYNC ALL` statements do not have to be in the same place in the program. For example, this program will print the "Hello" message from image 1 later than all the others:

```
program main
  implicit none
  if (this_image() == 1) sync all
  write (*,*) "Hello from image", this_image()
  if (this_image() /= 1) sync all
end program
```

Output is (for example)

```
Hello from image 2
Hello from image 4
Hello from image 3
Hello from image 1
```

# Coarrays

In order to be really useful, the images need a way to exchange data with other images. This can be done with coarrays.

A coarray is just a normal variable, of any type, which can be either a scalar or an array. Like for any other variable, there is one instance for each image.
A coarray has one important property: It is possible to access data on another image, both for reading and writing, using normal Fortran syntax. Let us see how this works.

## Syntax of simple coarrays

Coarrays are declared either by using the `codimension` attribute or by using square brackets in addition to the usual parentheses. The final codimension is unknown at compile-time (and can usually be selected at run-time). This is expressed by using a `*` as the codimension.

The following declaration declares an integer coarray:

```
integer :: a[*]
```

as does this line:

```
integer, codimension[*] :: a
```

It is a matter of taste and line length which variant is used.

Accessing this coarray is done by putting the coindex in square brackets. For the simple case above, the coindex is equal to the image number, i.e. the value that `this_image()` returns on that image. So, this statement prints the value of `a` on image 5:

```
integer :: a[*]
print *,a[5]
```

and this sets the value of `a` on image 3 to 42:

```
integer :: a[*]
a[3] = 42
```

or you can even use I/O to set the value:

```
integer :: a[*]
read (*,*) a[3]
```

Of course, when these code fragments are run, the referenced image has to exist.

## Simple use of coarrays

As previously mentioned, the images run independently unless otherwise directed. The most important rule is that changes to coarrays only get propagated to other images via synchronization. So, for example, this fragment will *not* work as might be expected:

```
if (this_image() == 3) then
   a[2] = 42
end if
print *,a[2]
```

but this will:

```
if (this_image() == 3) then
   a[2] = 42
end if
sync all
print *,a[2]
```

You could access the variable `a` declared as above on its own image by using `a[this_image(a)]`. While correct, there is a shortcut; you can simply use `a` in that case.

So, here is a small example where image number 1 sums up the image numbers and prints the sum together with the expected value. This uses a rather common idiom, where all images do work, while only one of them does I/O.
```
program main
  implicit none
  integer :: me[*]
  integer :: i, s, n
  me = this_image()
  sync all ! Do not forget this.
  if (this_image() == 1) then
     s = 0
     n = num_images()
     do i=1, n
        s = s + me[i]
     end do
     write (*,'(*(A,I0))') "Number of images: ", n, " sum: ", s, &
          " expected: ", n*(n+1)/2
  end if
end program main
```

With four images, this gives the result

```
Number of images: 4 sum: 10 expected: 10
```

Here is another example: A program where each image writes "Greetings from", its own image number and the number of the receiving image into a character coarray on the image whose number is one higher, or on image 1 for the last image. Each image then prints out the greeting it received from the other image. Here is the program:

```
program main
  implicit none
  character (len=30) :: greetings[*]
  integer :: me, n, you
  me = this_image()
  n = num_images()
  if (me /= n) then
     you = me + 1
  else
     you = 1
  end if
  write (unit=greetings[you],fmt='(A,I0,A,I0)') &
       "Greetings from ", me, " to ", you
  sync all
  write (*,'(A)') trim(greetings)
end program main
```

and here is its output with four images:

```
Greetings from 3 to 4
Greetings from 1 to 2
Greetings from 2 to 3
Greetings from 4 to 1
```

## Coarrays as arrays

All examples so far have used coarrays which were scalars, but they can be arrays as well. A somewhat contrived example:

```
program main
  implicit none
  real, dimension(10) :: a[*]
  integer :: i
  call random_number(a)
  a = a**2
  sync all
  if (this_image () == num_images()) then
     do i=1,num_images()-1
        a = a + a(:)[i]
     end do
     print '(*(F8.5))',a
  end if
end program main
```

which will print, for each of the 10 array elements, the sum over all images of the squared random numbers, something which could look like

```
 2.14682 2.70696 2.50518 3.09663 2.81545 1.88543 4.53160 2.67531 2.29398 2.96503
```

You will need the array reference `(:)` before the coarray reference `[i]`, and you can use the full power of the array indexing that Fortran provides.
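
Combining an array section with a coindex can be sketched as follows. This is only an illustration, not part of the original examples: the program name `halo` and the variables `first_two` and `left` are made up for this sketch. Each image fills its local array with values that identify it, then reads just the first two elements from its left neighbour (wrapping around to the last image) and checks them.

```fortran
program halo
  implicit none
  integer, parameter :: n = 4
  integer :: a(n)[*]          ! array coarray, one copy per image
  integer :: first_two(2)     ! hypothetical local buffer for the remote data
  integer :: me, left, i

  me = this_image()
  ! Fill the local array with values that identify the owning image.
  a = [(10*me + i, i=1,n)]
  sync all                    ! make the data visible before any remote read

  ! Left neighbour, wrapping around from image 1 to the last image.
  left = merge(num_images(), me - 1, me == 1)

  ! Array section (1:2) first, then the coindex [left].
  first_two = a(1:2)[left]

  ! Self-check: the values must be the ones the neighbour stored.
  if (any(first_two /= [10*left + 1, 10*left + 2])) error stop "unexpected data"

  write (*,'(A,I0,A,I0,A,2(1X,I0))') "Image ", me, " read from image ", &
       left, ":", first_two
end program halo
```

Only the two requested elements are transferred, which is the usual reason for using a section instead of `a(:)[left]`.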
## Lower cobounds not equal to one

If you feel like it, you can also set the lower cobound of a coarray to some other value. If you are a fan of C and like zero lower bounds, the following is valid:

```
integer :: a[0:*]
```

or if you are a fan of Douglas Adams, you can use

```
integer :: a[42:*]
```

Actually, declaring a coarray as `a[*]` is only a shortcut for declaring the coarray as `a[1:*]` with a lower cobound of 1.

There is a subtlety to the use of `this_image()`: Without any arguments, it gives you the image number. When it has a coarray argument, it will give you the coindex that you need to access the coarray on the current image. For example, in this program

```
program main
  integer :: a[42:*]
  print *, this_image(), this_image(a)
end program main
```

you will need a coindex of 42 to access the coarray on the first image, and the program will print

```
4 45
2 43
1 42
3 44
```

## An example program

A classic example is the estimation of pi/4 by Monte Carlo simulation. This program divides the unit square into n strips along the x-axis, one per image, then distributes points randomly and checks if they are inside or outside the unit circle.

```
program main
  implicit none
  integer, parameter :: blocks_per_image = 2**16
  integer, parameter :: block_size = 2**10
  real, dimension(block_size) :: x, y
  integer :: in_circle[*]
  integer :: i, n_circle, n_total
  real :: step, xfrom
  n_total = blocks_per_image * block_size * num_images()
  step = 1./real(num_images())
  xfrom = (this_image() - 1) * step
  in_circle = 0
  do i=1, blocks_per_image
     call random_number(x)
     call random_number(y)
     in_circle = in_circle + count((xfrom + step * x)**2 + y**2 < 1.)
  end do
  sync all
  if (this_image() == 1) then
     n_circle = in_circle
     do i=2, num_images()
        n_circle = n_circle + in_circle[i]
     end do
     print *,"pi/4 is approximately", real(n_circle)/real(n_total), "exact", atan(1.)
  end if
end program main
```

## Multi-dimensional coarrays

It is also possible to have coarrays with more than one codimension.
This can be useful, for example, when using a computational grid. The way to declare such a coarray is, for example,

```
real :: a[2,*]
```

The asterisk always has to be the last codimension. If you have four images running, this declaration will give you `a[1,1]`, `a[2,1]`, `a[1,2]` and `a[2,2]`.

For coarrays with multiple codimensions, `this_image(a)` will give you all the coindices for accessing the coarray on the current image, like this:

```
program main
  integer :: a[2,2:*]
  print *, this_image(), this_image(a)
end program main
```

What happens if the number of images is not divisible by two in the above example? The answer is complex, and it is best to avoid this case for now.

## Allocatable coarrays

It is often not sufficient to fix the size of a problem at compile-time. Therefore, Fortran introduced allocatable arrays, where the bounds can be set at run-time. This has also been extended to allocatable coarrays. This is especially useful if the coarrays hold a large amount of data.

An allocatable coarray can be declared with the syntax

```
real, dimension(:), codimension[:], allocatable :: a
```

(note the colons in the declarations, and the square brackets for the codimension) and allocated with

```
allocate (a(n)[*])
```

Like a regular allocatable variable, it will be deallocated automatically when going out of scope. `SOURCE` and `MOLD` can also be specified.

One important thing to notice is that coarray sizes have to agree on all images, otherwise unpredictable things will happen; at best, there will be an error message.

If you want to, you can adjust the bounds. This, for example, would be legal:

```
from = (this_image() - 1) * n + 1
to = this_image () * n
allocate (a(from:to)[*])
```

and give you an index running from `1` to `num_images() * n`, but you would still have to specify the correct coindices.

`ALLOCATE` and `DEALLOCATE` also do implicit synchronization, so you can use the allocated coarrays directly; there is no need to specify any `SYNC` variant.
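
The pieces above can be put together in a minimal, self-checking sketch. It is an illustration under stated assumptions rather than a reference implementation: the program name `alloc_demo` and the fixed size `n = 5` are made up here, and the same `n` is used on every image so that the sizes agree. The explicit `sync all` is still needed after the images have *written* their data; the implicit synchronization only covers the allocation itself.

```fortran
program alloc_demo
  implicit none
  real, allocatable :: a(:)[:]   ! deferred shape and deferred coshape
  integer :: n, i

  n = 5                          ! assumed size; must agree on all images
  allocate (a(n)[*])             ! implicit synchronization of all images

  a = real(this_image())         ! every image fills its own copy
  sync all                       ! required before reading remote data

  if (this_image() == 1) then
     do i = 1, num_images()
        ! Each image stored its own image number in every element.
        if (any(a(:)[i] /= real(i))) error stop "unexpected value"
     end do
     write (*,'(A,I0,A)') "Checked ", num_images(), " image(s)"
  end if

  deallocate (a)                 ! implicit synchronization again
end program alloc_demo
```

Because `DEALLOCATE` synchronizes, the other images cannot release their copies while image 1 is still reading them.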
# More advanced synchronization

`SYNC ALL` is not everything that may be needed for synchronization; Fortran allows for more fine-grained control.

## `SYNC IMAGES`

Suppose not every image needs to communicate with every other image, but only with a specific set. It is possible to use `SYNC IMAGES` for this purpose. `SYNC IMAGES` takes as argument an image, or a list of the images with which it should synchronize, for example

```
if (this_image () == 2) sync images ([1,3])
```

This will hold execution of image number two until a corresponding `SYNC IMAGES` statement has been executed on images 1 and 3:

```
if (this_image () == 1) sync images (2)
if (this_image () == 3) sync images (2)
```

The following example uses `SYNC IMAGES` for a pairwise exchange of greetings between different images:

```
program main
  implicit none
  character (len=30) :: greetings[*]
  integer :: me, n, you
  me = this_image()
  n = num_images()
  if (mod(n,2) == 1 .and. me == n) then
     greetings = "Hello, myself"
  else
     you = me + 2 * modulo(me,2) - 1
     write (unit=greetings[you],fmt='(A,I0,A,I0)') &
          "Greetings from ", me, " to ", you
     sync images (you)
  end if
  write (*,'(A)') trim(greetings)
end program main
```

Here is an idiom to have image 1 prepare something and have all images wait on image 1, plus have image 1 wait on all other images:

```
program main
  implicit none
  if (this_image() == 1) then
     write (*,'(A)') "Preparing things on image 1"
     sync images(*)
  else
     sync images(1)
  end if
  write (*,'(A,I0)') "Using prepared things on image ", this_image()
end program
```

Two images can issue `SYNC IMAGES` statements to each other multiple times. Execution will only continue once the number of `SYNC IMAGES` statements executed on both sides matches.

Here is a slightly more complex example. Assume you want to write "Hello, world" from each image in reverse sequence (because you can).
Here is a program to do this:

```
program main
  implicit none
  integer :: me
  me = this_image()
  if (me < num_images()) sync images(me + 1)
  print *,"Hello, world from", this_image()
  if (me > 1) sync images (me - 1)
end program main
```

Let's look at what happens with this program: All images but the one with the highest number wait until the image with one number higher has synchronized with them, so they get stuck (temporarily) in the first `SYNC IMAGES` statement. The image with the highest number does not execute that statement, but runs straight through to the print statement and synchronizes with the one below, which then executes its print statement and synchronizes with the one below it, and so on until image 1 is reached. Output could look like

```
Hello, world from 4
Hello, world from 3
Hello, world from 2
Hello, world from 1
```

## `CRITICAL` and `END CRITICAL`

Sometimes, it is desirable to protect some resource from interference from other images. This can be done via the `CRITICAL` and `END CRITICAL` statements. The syntax is simple:

```
CRITICAL
  ! Only one image may execute this part at a time
END CRITICAL
```

## `LOCK` and `UNLOCK`

While `CRITICAL` allows for some protection, people might want something more fine-grained. For this, there is the `LOCK_TYPE` from `ISO_FORTRAN_ENV`. The `LOCK` and `UNLOCK` statements allow one to manipulate such a lock. To be useful, this variable has to be a coarray.

An example: Let us assume we want to calculate the factorial of the number of images in a parallel way. One possibility would be

```
program main
  use, intrinsic :: iso_fortran_env, only: lock_type
  implicit none
  type(lock_type), codimension[*] :: lck
  integer, codimension[*] :: i
  if (this_image() == 1) i = 1
  sync all
  lock (lck[1])
  i[1] = i[1] * this_image()
  unlock (lck[1])
  sync all ! Wait until every image has contributed its factor.
  if (this_image() == 1) print *,i
end program main
```

For four images, this will dutifully print `24`.

# Collective subroutines

Data transfer between images can be repetitive to write.
For example, setting a value on all images would require an explicit DO loop over all images, plus explicit synchronization. To simplify this, the Fortran 2018 standard introduced the collective subroutines. Using these subroutines, you can transfer data between images using normal (i.e. non-coarray) variables.

## Setting a value on all images - `CO_BROADCAST`

You use the subroutine `CO_BROADCAST` to set the value of variables on all images from one particular image. This variable can be an array or a scalar. Here is an example:

```
program main
  integer, dimension(3) :: a
  if (this_image () == 1) then
     a = [2,3,5]
  end if
  call co_broadcast (a, 1)
  write (*,*) 'Image', this_image(), "a =", a
end program main
```

The call to `co_broadcast` works as if `a` on every image had been assigned the value of `a` on image 1. `a` is *not* a coarray (no square brackets), and no explicit synchronization is needed. The compiler does that for you. The example output is

```
Image 2 a = 2 3 5
Image 4 a = 2 3 5
Image 3 a = 2 3 5
Image 1 a = 2 3 5
```

## Common reductions - sum, maximum, minimum

You often want to know the sum, maximum or minimum of something that is calculated on each image. This is common enough that there is a subroutine for each of these tasks: `CO_SUM`, `CO_MAX` and `CO_MIN`, respectively. You can apply these subroutines to scalars or arrays.

These subroutines take as argument the variable to be reduced, plus an optional argument `RESULT_IMAGE` where the result should be stored. If you supply that image number, then the result is only stored on the corresponding image, and the variables on all other images become undefined. If you do not supply `RESULT_IMAGE`, the result is stored on every image.
Here is an example without using `RESULT_IMAGE`:

```
program main
  integer :: a
  a = this_image()
  call co_sum(a)
  write (*,*) this_image(), a
end
```

with the output

```
2 10
4 10
3 10
1 10
```

And here is a variant which uses `RESULT_IMAGE` to assign the value to image 1 only:

```
program main
  implicit none
  integer :: me, n
  me = this_image ()
  n = num_images()
  call co_sum (me, result_image = 1)
  if (this_image() == 1) then
     write (*,'(*(A,I0))') "Number of images: ", n, " sum: ", me, &
          " expected: ", n*(n+1)/2
  end if
end program main
```

with the output

```
Number of images: 4 sum: 10 expected: 10
```

Here is another example which calculates the sum, minimum and maximum of a value which is calculated for each image. The program prints out the values for each image, then the minimum, maximum and sum of each element.

```
program main
  implicit none
  integer, parameter :: n = 3
  integer :: i
  real, dimension(n) :: val
  real, dimension(n) :: val_min, val_max, val_sum
  val = [(cos(0.2*i*this_image()),i=1,n)]
  write (*,'(I4," ",3F12.5)') this_image(), val
  val_min = val
  call co_min (val_min, result_image = 1)
  val_max = val
  call co_max (val_max, result_image = 1)
  val_sum = val
  call co_sum (val_sum, result_image = 1)
  if (this_image() == 1) then
     write (*,'(A,3F12.5)') "Min: ", val_min, "Max: ", val_max, &
          "Sum: ", val_sum
  end if
end program main
```

The output is, for four images

```
   4      0.69671    -0.02920    -0.73739
   2      0.92106     0.69671     0.36236
   1      0.98007     0.92106     0.82534
   3      0.82534     0.36236    -0.22720
Min:      0.69671    -0.02920    -0.73739
Max:      0.98007     0.92106     0.82534
Sum:      3.42317     1.95093     0.22310
```

## Generalized reduction - `CO_REDUCE`

The reduction that you need may not be among the supported ones above. In that case, you can define your own function to do the reduction and call `CO_REDUCE`. The function needs to be `PURE`, and it needs to apply the operation to its two arguments. It also needs to be commutative, so `f(a,b)` needs to do the same thing as `f(b,a)`.
The following example checks, for each element of the logical array `flag`, whether it is true on all images, similar to what the `ALL` intrinsic does for normal Fortran arrays.

```
program main
  implicit none
  integer, parameter :: n = 3
  integer :: i
  logical, dimension(n) :: flag
  flag = [(cos(0.2*i*this_image()) > 0.,i=1,n)]
  write (*,'(I4," ",3L2)') this_image(), flag
  call co_reduce (flag, both, result_image=1)
  if (this_image() == 1) then
     write (*,'(A5,3L2)') "All: ", flag
  end if
contains
  pure function both (lhs,rhs) result(res)
    logical, intent(in) :: lhs,rhs
    logical :: res
    res = lhs .AND. rhs
  end function both
end program main
```

And here is its output:

```
   2  T T T
   3  T T F
   4  T F F
   1  T T T
All:  T F F
```

# Errors, error discovery and program termination

What happens when errors occur and images terminate needs to be defined carefully. Fortran has facilities to detect failure on individual compute nodes and offers possibilities to deal with them.

## Image states

There are three states that an image can be in: It can be an

- *active image* if it is running normally
- *stopped image* if it has been terminated normally by reaching the end of the main program or by executing a `STOP` statement.
- *failed image* when an image stopped working for some reason (for example a hardware failure) or executed a `FAIL IMAGE` statement.

Once an image is in a stopped or failed state, there is no coming back - it will always remain in that state.

An image can also be terminated by an *error condition*; all other images should then also be terminated by the system as soon as possible. This is what usually happens when you try to allocate an already allocated variable or open a non-existent file for reading without specifying a `STAT` (or, for I/O statements, `IOSTAT`) variable.

## Look at the state you are in

If you synchronize with a failed or stopped image, try to allocate or deallocate a variable there, or do other similar things, what is the system to do?
Without direction from the programmer, it will simply terminate the program (an error condition, as above). This is not very useful as a fail-safe tactic.

However, the programmer can specify a `STAT` and optionally the `ERRMSG` arguments to catch the error and act accordingly. It is then possible to compare the value returned for the `STAT` argument against predefined values from `iso_fortran_env` and then use the intrinsic functions `FAILED_IMAGES()` and `STOPPED_IMAGES()` to look up which ones failed.

```
program main
  use, intrinsic :: iso_fortran_env, only : STAT_FAILED_IMAGE, STAT_STOPPED_IMAGE
  implicit none
  integer :: sync_stat
  sync all (stat=sync_stat)
  ! Test the variable that was passed as STAT= above.
  if (sync_stat /= 0) then
     if (sync_stat == STAT_FAILED_IMAGE) then
        print *,"Failed images: ", failed_images()
     else if (sync_stat == STAT_STOPPED_IMAGE) then
        print *,"Stopped images: ", stopped_images()
     else
        print *,"Unforeseen error, aborting"
        error stop
     end if
  end if
end program main
```

# Getting it to work

## Using gfortran

The [GNU Fortran compiler](https://gcc.gnu.org/onlinedocs/gfortran/) supports coarrays through [OpenCoarrays](http://www.opencoarrays.org/). If you do not have it in your Linux distribution, you can follow the [installation instructions](https://github.com/sourceryinstitute/OpenCoarrays/blob/main/INSTALL.md). Compilation then will be done via

```
$ mpif90 -fcoarray=lib hello.f90 -lcaf_mpi
```

and the program can then be run by

```
$ mpiexec -n 10 ./a.out
```

Another possibility is the [shared memory coarray branch](https://gcc.gnu.org/git/?p=gcc.git;a=tree;h=refs/heads/devel/coarray_native;hb=refs/heads/devel/coarray_native). This will work without any additional libraries and is currently under active development, but does not yet have all features implemented.

## Using ifort

If you use `ifort`, you can use the `-coarray` option, as in

```
$ ifort -coarray hello.f90
```

and then run the executable. This will give you the shared memory version. For more details refer to the manpage of ifort.

## Using NAG Fortran

If you use `nagfor`, you can use the `-coarray` option, as in

```
$ nagfor -coarray hello.f90
```

and then run the executable. This will give you the shared memory version. For more details refer to the manpage of nagfor.
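
Whichever compiler you use, a tiny program can help confirm that the build and launch steps really produce the setup you expect. The following is a minimal sketch, not taken from any compiler's documentation; the program name `show_setup` is made up for illustration. It relies only on `compiler_version()` and `compiler_options()` from `iso_fortran_env` and on `num_images()`, and lets image 1 report the result.

```fortran
program show_setup
  use, intrinsic :: iso_fortran_env, only: compiler_version, compiler_options
  implicit none

  ! Only one image does the I/O, as in the earlier examples.
  if (this_image() == 1) then
     write (*,'(2A)') "Compiler: ", compiler_version()
     write (*,'(2A)') "Options:  ", compiler_options()
     write (*,'(A,I0)') "Images:   ", num_images()
  end if
end program show_setup
```

If the reported number of images is 1 although you asked for more, the image count was probably not passed in the way your compiler expects.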