The algorithms of antiquity: Eratosthenes

Eratosthenes, or rather Eratosthenes of Cyrene (275-194 BC), was an ancient Greek scientific writer, mathematician, astronomer and poet (basically a polymath). Born in Cyrene (modern Libya), he was educated at Alexandria and Athens, eventually becoming entrusted with the care of the library at Alexandria. Circa 200 BC he developed an algorithm, known as the “Sieve of Eratosthenes,” for finding all prime numbers up to a specified integer. Myth suggests he inscribed the series of odd numbers on parchment and then cut out the composite numbers; the parchment, with all its holes, resembled a sieve, which is why the algorithm bears his name. Eratosthenes also proposed a calendar with a leap day every fourth year, long before the Julian calendar, and calculated the circumference of the Earth.
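
The sieve translates directly into a few lines of code. Here is a minimal sketch in Fortran (the upper limit of 100 is an arbitrary choice): the multiples of each prime up to √100 are crossed out, and whatever is never crossed out is prime.

program sieve
   implicit none
   integer, parameter :: limit = 100      ! arbitrary upper bound
   logical :: is_prime(2:limit)
   integer :: i, j
   is_prime = .true.
   do i = 2, int(sqrt(real(limit)))
      if (is_prime(i)) then
         do j = i*i, limit, i             ! cross out every multiple of i
            is_prime(j) = .false.
         end do
      end if
   end do
   do i = 2, limit                        ! print the survivors
      if (is_prime(i)) write(*,'(i4)', advance='no') i
   end do
   write(*,*)
end program sieve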

[Image: Eratosthenes’s method to calculate the circumference of the earth]

Eratosthenes surmised that in Aswan (Syene) on June 21, the sun would be directly overhead and cast no shadow. At the same moment, the sun would cast shadows in Alexandria, about 600 miles north of Aswan. So at noon on June 21, 230 BC, Eratosthenes measured the height of a pole and the length of its shadow in Alexandria, and found that angle ∠1 measured about 7.2°. Since the sun’s rays are effectively parallel, and “alternate interior angles of parallel lines are congruent”, ∠1 ≅ ∠2. Therefore ∠2 = 7.2°, or 1/50 of a circle. Taking the distance between Aswan and Alexandria to be 5000 stades, Eratosthenes found the circumference of the earth (c) to be:

7.2/360 = 5000/c
c = 250,000 stades★
1 stade = 185 m
c = 250,000 × 185 m = 46,250 km

★ What is a stade? A stade was an ancient Greek unit of length, defined as 600 Greek feet. Several different stade lengths were in use in the ancient world (possibly six or more), and which one Eratosthenes meant is a hotly debated topic.

Engels [2] suggests Eratosthenes used the Attic stade of 184.98 m. Many people have instead cited the itinerary stade of 157 m [3]; one such work is the empirical determination by Lev Vasilevich Firsov, who compared 81 distances given by Eratosthenes and Strabo against straight-line distances measured by modern methods and arrived at an average of 157.7 m [4]. Engels [2] questions the validity of this figure and argues why the Attic stade makes more sense. One of the arguments is that many first-century AD authors equate 1 Roman mile with 8 stades; a Roman mile is roughly 1480 m, so 1/8 of a mile is 185 m [1]. Recent work also tends to favour 185 m [5].

Now the actual equatorial circumference is 40,075 km, so the calculation is not terrible considering the amount of approximation going on (a 15% difference). Using the itinerary stade of 157 m instead, the result is 39,250 km, which is only a 2% error (and hardly seems believable). The fact that Eratosthenes’s result is not 100% accurate does not detract from the fact that what he achieved was remarkable. There has been at least one modern experiment performed to replicate Eratosthenes’s method [6].
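
Just to double-check the arithmetic, here is a short Fortran sketch comparing the two stade lengths against the modern equatorial figure of 40,075 km quoted above:

program circumference
   implicit none
   real, parameter :: modern = 40075.0            ! modern equatorial circumference, km
   real :: c_attic, c_itin
   c_attic = 5000.0 * (360.0/7.2) * 0.185         ! 250,000 stades at 185 m, in km
   c_itin  = 5000.0 * (360.0/7.2) * 0.157         ! 250,000 stades at 157 m, in km
   print '(a,f8.0,a,f5.1,a)', 'Attic stade (185 m):     ', c_attic, ' km, ', &
         100.0*abs(c_attic-modern)/modern, '% off'
   print '(a,f8.0,a,f5.1,a)', 'Itinerary stade (157 m): ', c_itin, ' km, ', &
         100.0*abs(c_itin-modern)/modern, '% off'
end program circumference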

Further Reading:

  1. Walkup, N., “Eratosthenes and the Mystery of the Stades” Dissertation (2005)
  2. Engels, D., “The Length of Eratosthenes’ Stade”, American Journal of Philology, 106(3), pp.298-311 (1985)
  3. Dreyer, J. L. E., “The well of Eratosthenes”, The Observatory, 37, pp.352-353 (1914)
  4. Firsov, L. V., “Eratosthenes’ Calculation of the Earth’s Circumference and the Length of the Hellenistic Stade”, Journal of Ancient History, 121, pp.154-175 (1972)
  5. Shcheglov, D. A., “The so-called ‘Itinerary Stade’ and the Accuracy of Eratosthenes’ Measurement of the Earth”, KLIO, 100(1), pp.153-177 (2018)
  6. Longhorn, M., Hughes, S., “Modern replication of Eratosthenes’ measurement of the circumference of Earth”, Physics Education, 50(2), pp.175-178 (2015)

The algorithms of antiquity: Babylonia

Although we equate algorithms with computing, they have a much longer lineage; some have been around since antiquity.

Let’s start with the Babylonians. Babylonia was named for its capital Babylon, famed for its Hanging Gardens, situated in the southern region of Mesopotamia, which combined the territories of Sumer and Akkad. Babylonians wrote using cuneiform, a script that goes back to about 3000 BC. The Babylonians were well versed in mathematics, using a base-60, or sexagesimal, numeral system. From this system we derive the modern-day usage of 60 seconds in a minute, 60 minutes in an hour, and 360 (60 × 6) degrees in a circle. Less well known is that the Babylonians worked with floating-point sexagesimal numbers. Babylonian mathematicians were adept at arithmetic calculations, including algebraic equations. The caveat is that they had no real notation to express the algebra; instead they used a step-by-step set of rules to represent each formula. On one ancient clay tablet, YBC 7289, an approximation of the square root of 2 has been found, represented in sexagesimal notation as (1; 24, 51, 10), which is 1.41421296 in decimal form. This differs from the true value by only 0.0000006.

[Image: YBC 7289, ca. 1800-1600 BC]
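
That last claim is easy to verify: a few lines of Fortran (just a sketch) convert the tablet’s sexagesimal digits to decimal and compare the result with √2.

program ybc7289
   implicit none
   integer, parameter :: d(0:3) = [1, 24, 51, 10]   ! the sexagesimal digits 1;24,51,10
   real(kind=8) :: v
   integer :: i
   v = 0.0d0
   do i = 0, 3
      v = v + d(i) / 60.0d0**i                      ! 1 + 24/60 + 51/60**2 + 10/60**3
   end do
   print '(a,f11.9)', '1;24,51,10 in decimal = ', v
   print '(a,f11.9)', 'sqrt(2)               = ', sqrt(2.0d0)
   print '(a,f11.9)', 'difference            = ', sqrt(2.0d0) - v
end program ybc7289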

One of their interesting algorithms demonstrated knowledge of the Pythagorean theorem well before Pythagoras. In 1894, Father Jean-Vincent Scheil excavated a clay tablet during an archaeological expedition at Sippar, southwest of Baghdad. The tablet, known as Si.427, dates from the period 1900-1600 BC. At the time its significance was not known; however, recent analysis by mathematician Dr. Daniel Mansfield suggests it is a cadastral document, used by surveyors to define land boundaries [1,2]. The ancient surveyors used Pythagorean triples to construct accurate right angles. Plimpton 322, another Babylonian clay tablet, contains a table which lists two of the three numbers in what are now called Pythagorean triples, i.e., integers a, b, and c satisfying a² + b² = c² (although in all honesty there seems to be a lot of debate over its contents).
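
The tablets record the triples themselves rather than a recipe for producing them, but as an illustration of what a triple is, Euclid’s much later parametrization (a = m² − n², b = 2mn, c = m² + n², for integers m > n > 0) will generate as many as you like. A small Fortran sketch:

program triples
   implicit none
   integer :: m, n, a, b, c
   do m = 2, 5
      do n = 1, m-1
         a = m*m - n*n             ! Euclid's parametrization guarantees
         b = 2*m*n                 !   a**2 + b**2 = c**2
         c = m*m + n*n
         write(*,'(3i6)') a, b, c  ! e.g. 3 4 5, 8 6 10, 5 12 13, ...
      end do
   end do
end program triples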

In 1972 Donald Knuth published “Ancient Babylonian Algorithms” [3]. In it he describes a series of examples of “Babylonian programming” – algorithms recorded on various tablets to perform calculations, and basically solve algebraic problems.

✽ Note that quite a few descriptions of Babylonian tablets seem to cite a translation of a Pythagorean algorithm from a ca. 1900 BC tablet by a Dennis Ramsey – I have not been able to find the original source of this anywhere.

Further Reading:

  1. 3,700-Year-Old Babylonian Clay Tablet is Earliest Known Example of Applied Geometry, SCI NEWS (Aug.5, 2021)
  2. How ancient Babylonian land surveyors developed a unique form of trigonometry — 1,000 years before the Greeks, The Conversation (Aug.4, 2021)
  3. Knuth, D. E., “Ancient Babylonian algorithms,” Communications of the ACM, 15(7), pp.671–677 (1972)
  4. Beery, J. L., Swetz, F. J., “The Best Known Old Babylonian Tablet?”, MAA Convergence
  5. Cuneiform Digital Library Initiative

R.I.P. 🪦 archive.org

One of the best ideas on the internet was the one which provided digitized copies of books and magazines that are out of print – the website archive.org. Fantastic idea, because it gives the lay-person digital access to books that can usually only be found in scattered libraries, or in off-site storage facilities that require a request to be made. The problem has always been one of access, and the ability to browse a book, rather than having a requirement to see a specific series of pages. Some books I end up buying; others just aren’t available, or are only available at a huge cost because they are so rare. I would even pay for such a service, but I guess nothing good lasts forever.

What I like losing the least is access to 80+ years of Popular Photography. I would happily pay to access this sort of information, but it isn’t available anywhere. So now, instead of being able to browse digital versions, I shall have to go old-school and physically go to the reference library and request archived copies. It’s not the end of the world, but it is for those without easy access to such information.

So recently archive.org lost a lawsuit brought by a publisher complaining about copyright. Now I get copyright from the perspective of the author, and yes, books still in print should not be digitized and available for free. However, out-of-print books I think are fair game. They will in all likelihood never be printed again, and won’t make any publisher any money, so why bother? Because of greed. And before you say anything about lack of royalties for authors, believe me, the only authors that make money are those whose books are still selling 20+ years after being published. Publishers and bookstores make the lion’s share of the money. Copyright in the US lasts for the life of the author plus 70 years (for works published in or after 1978). Works published after 1923 but before 1978 are protected for 95 years from the date of publication. This is really somewhat ridiculous considering patents only last about 20 years. So that means a book published in 1950, which hasn’t been in print since 1965, will only be out of copyright and freely accessible in 2045.

The only people that lose here are the people who would like to read something that they have little or no access to, because they either live in a place that doesn’t have great libraries, or can’t get hold of a book that’s been out of print for 40 years. It’s about doing the right thing when it comes to knowledge.

Wading through the software mediocrity

It’s funny that we still write programs. I mean we have software to do most things now. Word processing (with quasi-intelligent editing) – check. Numerical software – check. Photo-editing software – check. Smartphone software – check. Software that lets anyone build a website, and an e-store – check. I mean there isn’t much left to do. Yeah, I’m sure there are a few scientificky things that need code, and new mechanical things that likewise will need new code, but everything else that we use day-in-day-out is pretty much working.

From here on in, we’re just making minute adjustments to existing software. If iOS has some issues, they get fixed, some minor adjustments are made, maybe the layout of some things changes, some more emojis are added, and then the iPhones are updated. In the last 10 years that hasn’t resulted in too many awe-inspiring changes. The same with the writing software I use, Scrivener – a great piece of software, but at this point it’s hard to add more functionality to it. Drawing software, photo-editing software, it’s all pretty much the same – it has evolved to the point where there is little or no evolution necessary. Even the web is easy. If you want a web store or a blog, you don’t have to build it from scratch anymore. Most systems, like Squarespace, allow users to use pre-built website templates and drag-and-drop elements to create and modify webpages. It’s super easy. You don’t need to have any programming skills to build websites for people – but you do need design skills, because it still needs to look good.

What I’m getting at is that there isn’t really the same frontier of applications that there was 30, 20, or even 10 years ago. When the iPhone first appeared in 2007 there was a frenzy of app-writing activity. Companies still write apps, but there isn’t anything exciting about it really. So where does that leave us? Small changes and bug fixes will likely be automated in the future (if they aren’t already), so what new exciting software will be created in the future? More AI? Hardly exciting stuff – build an AI and let it eat data and grow in intelligence.

The reality is that the software world has reached a plateau. Normal everyday software has no need to evolve. Technology like televisions hasn’t evolved in years, self-driving cars are not as intelligent as people think they are, and AI isn’t that useful yet (although Google Translate is pretty cool). Where are the Trek replicators, the Trek medical scanner? Maybe we’ll see them sometime in the future, but not before we break through the software stagnancy we are currently experiencing. We need some sort of pivotal change in computing to move software forward. This will likely be precipitated by someone, not something – the same way that many technological breakthroughs have occurred. Until then, we’ll just have to wade through the software mediocrity.

Tips for first-year computer science students

Below are some of the posts I have written over the years dealing with various aspects of being a student in computer science.

The original Googol

Before Google there was Googol.

Edward Kasner of Columbia University was very interested in large number estimates. Kasner described various large number scenarios, and popularized the concept of a “googol” in his 1940 book Mathematics and the Imagination. Here is an excerpt from the book:

“A very distinguished scientific publication recently came forth with the revelation that the number of snow crystals necessary to form the ice age was a billion to the billionth power. This is very startling and also very silly. A billion to the billionth power looks like this: 1000000000^1000000000. A more reasonable estimate, and a somewhat smaller number would be 10^30. As a matter of fact, it has been estimated that if the entire universe, which you will concede is a trifle larger than the earth, were filled with protons and electrons, so that no vacant space remained, the total number of protons and electrons would be 10^110.”

Kasner was looking for a name for 10^100, and in 1920 asked his nine-year-old nephew, Milton Sirotta, for a suitable name. Milton coined the term googol, and so 10^100 became known as a googol. Later the even larger 10^googol became known as a googolplex. The name may have come from the American comic strip Take Barney Google, F’rinstance, which debuted in 1919.

✿ The “very distinguished scientific publication” turns out to be Science News Letter, in particular the issue of December 18, 1937 (Vol. 32, No. 871). The article by Dr. Frank Thone was about the uniqueness of snowflakes, and was titled “No Two Alike” (pp. 394-396). Thone asks the question “How many separate snow crystals to make a whole Ice Age?”, and then suggests “billions to the billionth power!”. It was really more of an idea than a fact.

What is Meansort?

No, it’s not an unkind, nasty or malicious sorting algorithm.

Meansort is a variation on Quicksort which uses the mean value of the elements being partitioned as the pivot. The mean to be used at each partitioning step is determined during the scan of the previous step; the first partitioning step simply uses the left element. The algorithm was proposed by Dalia Motzkin in a 1983 issue of Communications of the ACM [1]. The paper describes the algorithm and compares it with Quicksort. A year later, Motzkin published a technical correspondence addressing some typographical errors and giving more information on the algorithm [2]. Like Quicksort, Meansort is also recursive.

The code for meansort() is a recursive subroutine written in Fortran.

recursive subroutine meansort(m,n,mean)
   ! sort k(m:n), partitioning around the supplied (integer) mean
   integer, intent(in) :: m, n, mean
   integer :: temp, pmean, i, j, left_sum, right_sum
   if (m < n) then
      left_sum = 0
      right_sum = 0
      i = m
      j = n
      do
         ! scan from the left over keys smaller than the mean, summing them
         do while (k(i) < mean .and. i /= j)
            left_sum = left_sum + k(i)
            i = i + 1
         end do
         ! scan from the right over keys greater than or equal to the mean
         do while (k(j) >= mean .and. i /= j)
            right_sum = right_sum + k(j)
            j = j - 1
         end do
         if (i /= j) then
            ! swap the two out-of-place keys and continue scanning
            temp = k(i)
            k(i) = k(j)
            k(j) = temp
         else
            exit
         end if
      end do
      if (i == m) then
         ! no key was smaller than the supplied mean (the mean equals the
         ! smallest key in the segment); retry with a pivot one larger so
         ! that at least one key lands in the left partition
         call meansort(m, n, mean+1)
         return
      end if
      right_sum = right_sum + k(j)           ! the boundary key joins the right partition
      pmean = int(left_sum / real(i-m))      ! truncated mean of the left partition
      call meansort(m, i-1, pmean)
      pmean = int(right_sum / real(n-j+1))   ! truncated mean of the right partition
      call meansort(j, n, pmean)
   end if
end subroutine meansort

The array storing the integers is represented by k:

integer, dimension(100) :: k

In this case, because the subroutine meansort() is contained within the main program, it can access the array k by host association. For the initial call of meansort(), m=1, n=100, and mean is simply the left element, k(1).
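
As a rough sketch (the program name, the random fill, and the array size of 100 are just placeholder choices), a driver might look like this, with the meansort() subroutine from above placed after the contains statement:

program test_meansort
   implicit none
   integer, dimension(100) :: k
   real :: r(100)
   integer :: mean0
   call random_number(r)
   k = int(r * 1000)          ! fill k with pseudo-random integers in 0..999
   mean0 = k(1)               ! copy the left element so the mean argument does not alias k
   call meansort(1, 100, mean0)
   print '(10i5)', k
contains
   ! ... the recursive subroutine meansort(m,n,mean) from above goes here ...
end program test_meansort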

Further reading:

  1. Motzkin, D., “Meansort”, Communications of the ACM, 26(4), pp.250-251 (1983)
  2. Motzkin, D., “Technical Correspondence”, Communications of the ACM, 27(7), pp. 719-722 (1984)

Thinking of cheating in computer science?

Since ChatGPT arrived there has been a lot of discussion about cheating in universities. I mean how many students would realistically turn to using AI to cheat?

Likely quite a few. The studies on it abound.

And for those wondering whether using AI to complete an assignment is cheating or not, it’s often not that simple. If you were using it to find resources for a paper, that may be all well and good, but if you are asking it to write a program for you, well that’s academic misconduct. Why? Because it’s not an original piece of work; the AI created the program – not the student. Should the AI get the degree? A recent Forbes article suggested that more than half of students believe using ChatGPT to complete assignments is cheating. That 51% is obviously the students with some integrity. The 20% who disagreed are probably the ones who do cheat on a regular basis (the rest were sitting on the fence).

I mean some CS students already cheat by outsourcing their work, paying to have someone else do it – the phenomenon known as contract cheating. It happens, and it’s hard to catch. More people cheat because there are more tools with which to cheat. I mean there are even some websites that blatantly show you how to circumvent code checkers like MOSS. Some have likened the use of AI in education to calculators and Wikipedia. Hardly: to use a calculator you still had to know what you were calculating, and Wikipedia is just a huge reservoir of information, information you still have to interpret, analyze, and verify to make sure it is actually correct. Hardly the same as AI.

Letting students get away with cheating is not really an option. Would you want to drive over a bridge designed by someone who cheated throughout their engineering degree? What about a doctor who cheated in an anatomy class? Hardly. So what should be done to fix this situation? For things like essays, just as AI is used to create papers, it will be used to identify them. Turnitin now has an AI-detection add-on, and a high level of detection accuracy. But it’s likely not the sole solution. In computer science things get a little trickier. There are programs like MOSS to detect plagiarized programs, but likely better tools are needed. The question is how many different versions of a program can ChatGPT create? I mean if 100 students ask ChatGPT to write the same assignment, do you really think it’s going to generate different versions of the program? I mean why would it? They are going to be very similar, and hence much easier to spot.

There are a number of real solutions for CS, but people aren’t going to like them, mainly because they are more suited towards small classes. There is no doubt that AI may finally force some instructors to deal with the way they design assessments in courses.

  • Oral assessments – Reduce the percentage of assignments and projects and incorporate a final exam in the form of an oral assessment, worth at least 50% of the grade. People who cheat likely won’t pass.
  • Large-scale projects – Instead of having small assignments, have more continuous large-scale projects, perhaps group projects where there is some level of peer assessment. This may involve changing the whole way a CS degree runs, but it would be more experiential, and weed out the cheaters early. It could also incorporate some level of oral assessment.
  • New assignment topics – Don’t make assignments that have a lot of existing material. AI creates programs from existing knowledge. If it doesn’t know about a topic, it will be harder to generate a program.
  • AI detectors – Like GPTZero which can detect AI-written text, and maybe code in the future.

Look, at the end of the day, if you’re in computer science and you cheat, that’s on you. You may not get caught, you may even graduate having cheated all the way through. But it’s your integrity that is at stake. If you have lied about your work at university, what’s to say you won’t lie about it in a job? Will you actually be able to get a job in computing? Maybe not considering most companies ask some crazy questions during interviews. And even if you get a job, somewhere along the way your cheating will catch up with you. That’s how karma works.

If you cheat, you are only cheating yourself.

NOTE: A lot of instructors likely don’t report plagiarism because of the mountain of paperwork it creates. Also in many universities, the instructor has no control over the investigation, or penalties. Maybe if they did, penalties for cheating would be harsher, and fewer people would be inclined to cheat.

Further reading