Index Zero - Medium

A Simple Explanation of OOP

Caleb Winston — Sat, 25 May 2019 17:26:49 GMT

Types

You already know how you can declare variables and assign values to them.

For example, you can declare a variable called x.

int x;

And you can assign the value 5 to the variable x.

x = 5;

You can also declare a variable called y.

String y;

And you can assign the value "hello world" to the variable y.

y = "hello world";

The values you assigned to these two variables were of different types.

The variable x was assigned a value of 5, which is of type int.

The variable y was assigned a value of "hello world", which is of type String.

Both of these types are defined in the Java language.

But the Java language has a way for you to define types as well.

You can define the structure of values of type Car by defining a class.

class Car {
    int mileage;
    String name;
}

You can then declare a variable of type Car.

Car mySedan;

You can then assign a value to the mySedan variable.

mySedan = new Car();
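
To see these pieces working together, here is a minimal sketch in Java. It reuses the Car class from above; the CarDemo class and its makeCar helper are hypothetical names added only for illustration.

```java
// The Car type defined above: every Car value has a mileage and a name.
class Car {
    int mileage;
    String name;
}

// Hypothetical helper class, not part of the original example.
class CarDemo {
    // Creates a new Car value and fills in its two fields.
    static Car makeCar(String name, int mileage) {
        Car car = new Car();   // a new value of type Car
        car.name = name;       // fields are read and written with a dot
        car.mileage = mileage;
        return car;
    }
}
```

Calling CarDemo.makeCar("sedan", 42000) gives back a Car whose name is "sedan" and whose mileage is 42000. A Car created with new Car() and left untouched starts with a mileage of 0 and a name of null, which are Java’s default field values.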

And that’s the core idea of object-oriented programming (OOP): creating classes that define your own types of values.

A Simple Explanation of OOP was originally published in Index Zero on Medium, where people are continuing the conversation by highlighting and responding to this story.

To-do Lists

Caleb Winston — Tue, 09 Oct 2018 16:01:32 GMT

Index Out of Bounds ~ October 2018 ~ More stuff by me can be found on my website at calebwin.github.io.


On Heaps

Caleb Winston — Wed, 03 Oct 2018 15:16:19 GMT

A heap of data

Almost every data structure ever invented was designed for doing specific things really well. For example, lists are really good for storing stuff that’s ordered. Sets are really good for storing stuff where we don’t care about order. Similarly, heaps are really good at storing stuff where the only thing we care about is the one with the highest priority.

For example, let’s say we want to store data for a bunch of tasks on a to-do list.

A to-do list

And we have a priority attached to each task…

A to-do list with priorities

And we really want to be able to quickly get the task with highest priority at any moment… What data structure could we use?

As it turns out, there’s one data structure that is perfect for doing this — heaps.

What heaps do

Like sets, heaps don’t store data in a strict order. The only order they maintain is that the element with the highest priority sits on top, with all the other elements somewhere beneath it.

How heaps store data

And heaps really only have three main operations.

Add an element to the heap with a certain priority
Get the element with highest priority
Remove the element with highest priority

As you can see, none of these operations do anything with any of the elements with less than the highest priority.
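
Java’s standard library already ships a heap, java.util.PriorityQueue, so the three operations can be tried out directly. The sketch below is one way to do that. The HeapOperations class and its drain method are made-up names, and Comparator.reverseOrder() is passed in because PriorityQueue puts the smallest element on top by default.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Hypothetical demo class showing the three heap operations.
class HeapOperations {
    // Adds every priority to a heap, then removes them one by one.
    // Returns the priorities in the order the heap handed them out.
    static int[] drain(int[] priorities) {
        // reverseOrder() makes the largest priority come out first.
        PriorityQueue<Integer> heap = new PriorityQueue<>(Comparator.reverseOrder());
        for (int p : priorities) {
            heap.add(p);             // operation 1: add an element
        }
        int[] out = new int[priorities.length];
        int i = 0;
        while (!heap.isEmpty()) {
            int top = heap.peek();   // operation 2: get the highest priority
            heap.poll();             // operation 3: remove the highest priority
            out[i++] = top;
        }
        return out;
    }
}
```

Draining the priorities 3, 10, 7 this way hands them back as 10, 7, 3, no matter what order they were added in.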

So you might be wondering: what happens when we remove the highest element? And what happens to the current highest element when we add a new element with higher priority?

Well, removing the top element simply brings the next highest element to the top of the heap.

Removing element with highest priority from a heap

And adding a new element with highest priority will bury the previous element with highest priority.

Adding new element with highest priority to a heap

How to implement heaps (with trees)

So how can we structure our heaps? Well, a tree-like structure turns out to work pretty well.

A tree we can use to store a heap’s elements

The above tree has a couple of properties that make it perfect for implementing a heap.

Every element node has a priority greater than each of its children nodes
Every row of the heap must be filled as much as possible from left to right

As long as we can maintain a tree with these two properties satisfied, we can implement an efficient heap that elements can be added to and removed from.

Let’s take a look at implementing such a tree by first looking at how we would implement addition to the heap.

How to implement addition to heaps (with trees)

The simplest way of doing this is to add an element all the way at the end of the bottom-most row of the tree.

Adding a new element with priority 9 to the end of the bottom-most row

As you can see, it certainly satisfies the property that our tree must be filled completely on each row from left to right. If we put our new element anywhere else, this property would no longer be true.

However, notice that the other property is not satisfied: each element node must have a greater priority than all of its child nodes, but the element node with priority 7 is not greater than its new child with priority 9.

The simple solution to this problem is that whenever we add a new element at the end of the tree, we repeatedly swap the new element with its parent element until its parent element does indeed have a greater priority than the new element.

So in this case, we first check if the new element with priority 9 is less than its parent node with priority 7. It’s not, so we swap.

Swapping the elements with priorities 9 and 7

Next, we check if the new element with priority 9 is less than its parent node with priority 10. It is, so we’re done — we have added a new element and satisfied both properties of our tree.

So adding new elements to our heap requires us to do two things in our tree —

Add a new node with the new element at the end of the bottom-most row
Repeatedly swap the new element with its parent node as long as it has a greater priority than its parent

How to implement removal from heaps (with trees)

Heaps have another really important operation they need to be able to do — removing the element with highest priority.

There happens to be a really simple way of removing the top-most element from a tree while preserving both of our heap properties.

Swap the top-most node with the node at the end of the bottom-most row
Remove the node at the end of the bottom-most row
Repeatedly swap the new top-most node with the child with the greatest priority as long as that child has a greater priority than the node we’re swapping.

Let’s look at an example.

A tree we want to remove the top-most element from

Let’s say we want to remove the top-most node with priority of 10. We can first swap it with the last node on the bottom-most level, the one with priority 7.

The same tree with the top-most element swapped with the bottom-most element

Our element with highest priority is now at the end at the bottom, so we can easily remove the node without losing the property that every row is filled as completely as possible.

The same tree with the element with highest priority removed

Now we have the element with highest priority removed. But as you can probably see, we have lost the other property that each node must have a higher priority than all of its children. The new node at the top of the tree no longer has a greater priority than all of its children.

We can solve this problem by repeatedly swapping the top-most element with its child with greatest priority as long as that child has a greater priority than the element we’re swapping.

So first we look at the children of the element with priority 7.

The children of the new top-most element

Well, the child with largest priority is the one with priority 9 and that is greater than the priority of the top-most element (7). So we swap.

The top-most element swapped with its child with greater priority

Now that we have swapped the top-most element with one of its children, we take a look at its new set of children at this new position in the tree.

The children of the element at its new position in the tree

If we look at the children of the element now, we can see that none of them have a greater priority than the element. So the element is now in its correct place in the tree and both properties are satisfied.

Now let’s see how we can implement this 3-step process for removal, and the 2-step process for addition we looked at earlier, with an array.

How to implement heaps (with arrays)

We’ve looked at how we could implement heaps fairly easily with trees. It so happens that we can implement these “heap trees” fairly easily with a simple array.

We can have an array that stores each element left to right and top to bottom in that order.

For example, let’s take a look at the following tree representation of a heap —

A tree representation of a heap

We can store the data for this tree in the following array —

An array to store the data of our tree representation

As you can see, we simply order the elements by how they appeared in our tree from left to right, top to bottom.
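
Reading a tree “left to right, top to bottom” is a breadth-first traversal. As a rough sketch of how a tree could be flattened into that array order, assuming a hypothetical HeapNode class that holds a priority and a list of children (neither HeapNode nor HeapFlattener comes from the article):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical tree node: a priority plus its children, left to right.
class HeapNode {
    final int priority;
    final List<HeapNode> children = new ArrayList<>();

    HeapNode(int priority) {
        this.priority = priority;
    }
}

class HeapFlattener {
    // Visits the tree row by row and returns the priorities in array order.
    // Assumes root is not null.
    static List<Integer> toArrayOrder(HeapNode root) {
        List<Integer> out = new ArrayList<>();
        Queue<HeapNode> queue = new ArrayDeque<>();
        queue.add(root);
        while (!queue.isEmpty()) {
            HeapNode node = queue.remove();  // next node, top to bottom
            out.add(node.priority);
            queue.addAll(node.children);     // its children, left to right
        }
        return out;
    }
}
```

For a root with priority 10 whose children are 7, 9, and 5, where the 7 has children 6 and 3, this produces the order 10, 7, 9, 5, 6, 3.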

Let’s see how we can realistically implement addition and removal from heaps with arrays.

How to implement addition to heaps (with arrays)

If we want to add a new element to a heap, the first thing we do is add the new element to the end.

A new element added at the end with priority 8

But the end might not be the correct spot for this new element with priority 8. To get the element to its correct spot, we should first look at its parent.

If we know the index we added the new element at (in this case it’s 6), we can quite easily find its parent element.

Finding the parent element of the new element we just added

If we take the index of the new element, subtract 1, divide by the number of children of each node (3 in this case), and truncate the answer (remove everything after the decimal point), we get the index of the parent.
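
As a quick check of that arithmetic, here is the same calculation in Java, where dividing one int by another already throws away everything after the decimal point. The TernaryIndex class and the CHILDREN_PER_NODE constant are names invented for this sketch.

```java
// Hypothetical helper for a heap where every node has up to 3 children.
class TernaryIndex {
    static final int CHILDREN_PER_NODE = 3;

    // Returns the array index of the parent of the element at the given index.
    // Only meaningful for index >= 1, since the element at index 0 has no parent.
    static int parentIndex(int index) {
        return (index - 1) / CHILDREN_PER_NODE; // integer division truncates
    }
}
```

For the new element at index 6, this gives (6 - 1) / 3 = 5 / 3, which truncates to 1.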

All we have to do then is compare the priority of the new element we just added with the priority of its parent.

Comparing the priority of the new element we just added with the priority of its parent element

Since our new element has a priority of 8 while its parent has a priority of only 7, we swap them.

The new element swapped with its parent with smaller priority

We now look at the new element’s new parent.

The parent of the new element at its new position

But now we can see that its parent actually does have a greater priority as it should. So we don’t need to swap and we leave our new element as it is.

As you can see, adding a new element to our “heap array” is really just the same two-step process we discussed earlier:

Add a new node with the new element at the end of the bottom-most row (now that’s just the end of the array)
Repeatedly swap the new element with its parent node (the element at position (n-1)/3 where n is the current index of new element) as long as it has a greater priority than its parent.
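
The two steps above can be sketched in Java roughly as follows. This is an illustration under a few assumptions rather than a definitive implementation: priorities are plain ints, every node has up to 3 children as in the pictures, and the class and method names (TernaryMaxHeap, add, peek, toList) are made up for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical array-backed heap where the largest priority sits at index 0.
class TernaryMaxHeap {
    private static final int CHILDREN_PER_NODE = 3;
    private final List<Integer> priorities = new ArrayList<>();

    void add(int priority) {
        // Step 1: put the new element at the end of the array.
        priorities.add(priority);
        int index = priorities.size() - 1;

        // Step 2: swap it with its parent while it has the greater priority.
        while (index > 0) {
            int parent = (index - 1) / CHILDREN_PER_NODE;
            if (priorities.get(index) <= priorities.get(parent)) {
                break; // the parent is at least as large, so we are done
            }
            Collections.swap(priorities, index, parent);
            index = parent;
        }
    }

    // Returns the highest priority; assumes the heap is not empty.
    int peek() {
        return priorities.get(0);
    }

    // Returns a copy of the underlying array order, for inspection.
    List<Integer> toList() {
        return new ArrayList<>(priorities);
    }
}
```

Adding 10, 7, 9, 5, 6, 3 and then 8 to an empty heap leaves the array as [10, 8, 9, 5, 6, 3, 7]: the 8 lands at index 6, swaps with the 7 at index 1, and then stops because its new parent is the 10.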

How to implement removal from heaps (with arrays)

Removing the top element from our heap is also pretty straightforward with arrays. The first thing we do is swap it with the last element.

The top-most element swapped with the last element

Now we can safely remove the element from the end of the array.

The (former) top-most element removed

But now we have a bit of a problem. We’ve safely removed the element with priority 9, but remember the element we swapped it with, the one with priority 7? That’s now in the wrong place.

This can be fixed by repeatedly swapping the element with its greatest priority child as long as the child has a greater priority than the element.

We first find its children using the index of the top-most element (initially it’s 0). We simply multiply the current index by the number of children of each node (in our case, that’s 3), then add 1, 2, or 3 to get the indices of the element’s children.

The children of the top-most element

If we look at the children, we can find the one with highest priority.

The priorities of the children of the top-most element

The one with priority 8 is the largest and its priority is also larger than the current top-most element (7), so we need to swap.

The top-most element swapped with its largest child

We can now look at the children of the element in its new position.

The children of the top-most element in its new position

Since none of its children in its new position have a larger priority than it, we can stop.

Once again, we can see that the process of removing an element from a heap is not really any different from the process we had discussed earlier —

Swap the top-most node with the node at the end of the bottom-most row
Remove the node at the end of the bottom-most row
Repeatedly swap the new top-most node with the child with the greatest priority as long as that child has a greater priority than the node we’re swapping.
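
Here is one way those three steps might look in Java. It is only a sketch: it assumes the list passed in is already in valid heap order with up to 3 children per node, and TernaryHeapRemoval and removeTop are hypothetical names.

```java
import java.util.Collections;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical helper that removes the top of an array-backed heap.
class TernaryHeapRemoval {
    static final int CHILDREN_PER_NODE = 3;

    // Removes and returns the highest priority from a list in heap order.
    static int removeTop(List<Integer> heap) {
        if (heap.isEmpty()) {
            throw new NoSuchElementException("heap is empty");
        }
        int last = heap.size() - 1;
        Collections.swap(heap, 0, last); // step 1: swap top with last
        int top = heap.remove(last);     // step 2: remove the last element

        // Step 3: move the new top down while a child has a greater priority.
        int index = 0;
        while (true) {
            int largest = index;
            for (int k = 1; k <= CHILDREN_PER_NODE; k++) {
                int child = CHILDREN_PER_NODE * index + k;
                if (child < heap.size() && heap.get(child) > heap.get(largest)) {
                    largest = child;
                }
            }
            if (largest == index) {
                break; // no child is larger, so the element is in place
            }
            Collections.swap(heap, index, largest);
            index = largest;
        }
        return top;
    }
}
```

Starting from the array [9, 8, 6, 5, 7], removeTop returns 9 and leaves [8, 7, 6, 5]: the 7 is moved to the top, then swapped with its largest child, the 8.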

So there you have it! 👏 Heaps are a simple data structure that are really good at storing data where the only thing you care about is the element with highest priority. Now you know what they are and now you know how to implement them! 🙂

If you enjoyed reading this article, you can find more articles by me and projects I’m working on (such as this Markov chain compiler) at calebwin.github.io.


On Disjoint Sets

Caleb Winston — Mon, 17 Sep 2018 14:44:32 GMT

Maybe you’ve heard about merge-find sets, or union-find forests, or perhaps disjoint sets. Surely such a pedantic name is that of a correspondingly complicated data structure… right? 🤨

Well… not really — conceiving a data structure that can store disjoint sets is not actually all that hard.

What are sets?

Before we take a look at disjoint sets, what exactly are sets? A set is really just a collection of elements.

A visual representation of a single set with three elements

As you can see from the visualization above, sets don’t maintain an order of elements. We can perform pretty much any operation you’d expect on a collection of data (addition, deletion, size, etc.) except for operations that require an order of the elements to be known.

For example, a few things sets can’t do are inserting at an “index”, popping, or pushing elements, etc.

Another interesting thing about sets: what happens when we add an element that already exists in the set?

What should happen when we add a duplicate element

That’s right: the set ends up unchanged. We can either overwrite the element that was already there or simply not do anything.

There’s a simple reason why sets don’t store duplicates. In other data structures such as lists and dictionaries, every element is different in at least one way. Even if two entries have the same value, they have to have different indices or keys. But in a set, we’re just storing the elements by themselves in no particular order. So we can’t distinguish a 3 we add in first from a 3 we add in later.
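
This behavior is easy to observe with Java’s built-in java.util.HashSet. In the small sketch below, SetDuplicates and countDistinct are made-up names used only for illustration.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo of how a set treats duplicate elements.
class SetDuplicates {
    // Adds each value in turn and reports how many elements the set ends up with.
    static int countDistinct(int[] values) {
        Set<Integer> set = new HashSet<>();
        for (int v : values) {
            // add returns false and leaves the set unchanged
            // when v is already present.
            set.add(v);
        }
        return set.size();
    }
}
```

Adding 3, 1, and then 3 again results in a set of size 2, because the second 3 can’t be told apart from the first.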

What are disjoint sets?

Now that you know what sets are, a group of disjoint sets is really just a bunch of sets where each set has a representative element.

A bunch of disjoint sets and their representative elements

That’s really all a data structure holding disjoint sets needs to keep track of — a bunch of elements and the representative element of the set they belong to.

The operations you can perform on disjoint sets are also quite simple -

Make a new set with a new representative element
Find the representative element of the set an element belongs to
Union the sets two elements belong to

The first operation is making a new disjoint set and putting an element in it. This element will be its representative element. This is shown in the below visualization.

Making a new set with the element 4

The second operation is finding the representative element of the set an element belongs to. In a previous picture, we saw that the representative element of the set with {5, 3, 2} is 5. So the representative element of 3 is 5 because 3 is in the set which has 5 as its representative element.

Finding the representative element of the set that contains 4

The third operation is taking the union of the sets that contain two elements. So what’s a union? A union of two sets is what we get when we take all the elements of one set and all the elements of the other set and put them together into the same set. Here’s what that looks like.

Taking the union of two sets

As you can see, we pretty much just took the elements in both sets and put them together in the same set. However there are two other things you should notice.

We kept only one 3 even though there was one in both sets. This is something we have to do because, as we mentioned earlier, we can’t distinguish the 3 from the first set from the 3 from the other set. So instead, we just keep one of them.
We also had to pick a new representative element for the new set. Naturally, we picked one of the two representative elements of the original sets. But why did we pick the 9 and not the 5? The answer is that it doesn’t really matter that much which one we pick. Depending on how we implement our disjoint set data structure, we might choose to pick either one.

Now that we’ve seen the two things we need to keep track of for disjoint sets (all the elements, the representative elements of the sets they belong to) and the three operations we need to be able to perform (make a new set, find a representative element, and union two sets), let’s move on to how we would actually implement these sets.

How to implement disjoint sets

So how exactly do we implement a data structure that stores disjoint sets?

One way we can store disjoint sets is with a tree-like structure. For example, we can represent two disjoint sets as follows.

Two disjoint sets represented by trees.

If we use some sort of tree-like structure, all we have to do to make a set is create a new “tree” with the representative element of the new set at the root, or the top, of the tree. To find the representative element of the set an element belongs to, we simply have to follow the element up the tree till we get to the representative element at the top. To take the union of two sets, we simply have to make the root of one set’s tree point to the root of the other set’s tree.

Taking the union of two sets

So it looks like trees are a great way of representing disjoint sets. We can store everything we want to and perform the operations we want to. But what do we actually use to store the trees?

As it turns out, arrays can be used quite effectively to store these “disjoint set trees”. Each element corresponds to an index in the array, and the value stored at that index is the parent of that element in the tree. Here’s what that would look like.

An array that stores the tree representation of the disjoint set(s)

As you can see, the values at indices 2, 3, and 7 represent the parents of elements 2, 3, and 7 in the disjoint set tree. They all have a value of 5 because that’s what their parent element is in the tree.

You can also see that the value at index 5 is -1. When we are trying to find the representative element of the set an element belongs to, we can keep moving to the parent of the current element until we reach an index whose value is -1. Then we know to stop, because that index is the representative element.

So just to summarize — here’s how we can perform our three operations on disjoint sets using an array.

We can make a new set by simply storing -1 in the array at the index of the value we want the set’s representative element to have.

Making a new disjoint set with a representative element of 9

We can find the representative element of the set an element belongs to by repeatedly moving a pointer to the parent element of the original element.

Finding an element in our disjoint sets using our array

Lastly, we can also take the union of two sets by simply setting the parent value of the representative element of one set to the representative element of the other set.
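
The three array operations summarized above could be sketched in Java like this. A few things here are assumptions made for the example and not part of the description: the array has a fixed capacity chosen up front, unused slots hold a hypothetical marker value of -2, and the names DisjointSets, makeSet, find, and union are illustrative.

```java
import java.util.Arrays;

// Hypothetical array-backed disjoint sets over the integers 0..capacity-1.
class DisjointSets {
    private static final int ROOT = -1;           // marks a representative element
    private static final int NOT_IN_ANY_SET = -2; // assumed marker for unused slots
    private final int[] parent;

    DisjointSets(int capacity) {
        parent = new int[capacity];
        Arrays.fill(parent, NOT_IN_ANY_SET);
    }

    // Makes a new set whose only (and representative) element is the given one.
    void makeSet(int element) {
        parent[element] = ROOT;
    }

    // Follows parents upward until reaching the representative element.
    int find(int element) {
        if (parent[element] == NOT_IN_ANY_SET) {
            throw new IllegalArgumentException(element + " is not in any set");
        }
        while (parent[element] != ROOT) {
            element = parent[element];
        }
        return element;
    }

    // Merges the sets containing a and b; b's representative becomes the root.
    void union(int a, int b) {
        int rootA = find(a);
        int rootB = find(b);
        if (rootA != rootB) {
            parent[rootA] = rootB;
        }
    }
}
```

For example, after making sets for 5, 2, 3, and 7 and calling union(2, 5), union(3, 5), and union(7, 5), find(3) returns 5. Making a set for 9 and calling union(5, 9) then makes 9 the representative of all five elements.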

How to make disjoint sets work for stuff other than numbers

Up to this point, we’ve just been looking at disjoint sets where all the elements are numbers. However, it is possible for our disjoint set data structure to support any type of data. The simplest solution is to maintain a dictionary that maps data in the type we want to corresponding numbers in our disjoint sets.

A dictionary that can map from strings to integer indices in our disjoint set array
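
A minimal sketch of such a dictionary in Java, using java.util.HashMap, might look like the following. The ElementIndex class and its indexFor method are hypothetical, and handing out indices in order of first appearance is just one possible numbering scheme.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical mapping from strings to indices in a disjoint set array.
class ElementIndex {
    private final Map<String, Integer> indexOf = new HashMap<>();

    // Returns the index for a name, assigning the next unused index
    // the first time that name is seen.
    int indexFor(String name) {
        Integer existing = indexOf.get(name);
        if (existing != null) {
            return existing;
        }
        int next = indexOf.size(); // indices are handed out as 0, 1, 2, ...
        indexOf.put(name, next);
        return next;
    }
}
```

With this in place, a string can be translated to its index with indexFor before each make, find, or union call. Asking for "apple", then "pear", then "apple" again yields 0, 1, and 0.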

How to optimize our disjoint set data structure

There are actually a few simple optimizations we can do to speed up the operations our disjoint set data structure can do. We’re going to take a look at the two most common ones.

The first is union-by-height. It turns out that even though we can randomly choose which element becomes the representative element of the union of two sets, we can increase performance by choosing the representative element of the set with the taller tree. That way the shorter tree hangs beneath the taller one, and the combined tree stays as short as possible.

The second is path compression. When we try to find the representative element of an element’s set, the representative element we return in the end is technically the representative element of every element we come across. So, we can modify our find function to simply set the parent of each element we come across to the representative element we ultimately find.
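
Both optimizations fit in a few lines. The sketch below is one possible way to write them in Java, not a definitive implementation. To keep it short, it assumes every element from 0 to size - 1 starts out in its own set, and the names OptimizedDisjointSets, height, and parentOf are invented for the example. Note that once paths are compressed, the stored heights are only upper bounds on the real tree heights.

```java
import java.util.Arrays;

// Hypothetical disjoint sets with union-by-height and path compression.
class OptimizedDisjointSets {
    private final int[] parent; // -1 marks a representative element
    private final int[] height; // upper bound on the height of each root's tree

    OptimizedDisjointSets(int size) {
        parent = new int[size];
        height = new int[size];
        Arrays.fill(parent, -1); // every element starts as its own set
    }

    // Finds the representative and points every element on the way at it.
    int find(int element) {
        if (parent[element] == -1) {
            return element;
        }
        int root = find(parent[element]);
        parent[element] = root; // path compression
        return root;
    }

    // Hangs the shorter tree beneath the root of the taller tree.
    void union(int a, int b) {
        int rootA = find(a);
        int rootB = find(b);
        if (rootA == rootB) {
            return; // already in the same set
        }
        if (height[rootA] < height[rootB]) {
            parent[rootA] = rootB;
        } else if (height[rootA] > height[rootB]) {
            parent[rootB] = rootA;
        } else {
            parent[rootB] = rootA; // equal heights: pick one, and it grows by 1
            height[rootA]++;
        }
    }

    // Exposes the stored parent so the effect of compression can be inspected.
    int parentOf(int element) {
        return parent[element];
    }
}
```

After union(0, 1), union(2, 3), and union(1, 3), element 3 still points at 2, which in turn points at 0. A single call to find(3) returns 0 and rewires 3 to point at 0 directly, so the next lookup is one step shorter.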

So that’s the disjoint set data structure. 👏 Disjoint sets aren’t as “general-purpose” as other data structures out there, but they can be very useful in specific situations, and now you know how to implement them! 👍

(If you want to take a look at a coded implementation for one, here is one I did in sixty lines in the Lua programming language)


On Bags

Caleb Winston — Mon, 10 Sep 2018 15:49:42 GMT

The data structure, that is

A bag found in the wild

There are many data structures out there with various complex implementations. Each one has its advantages and disadvantages. Of all data structures, however, there’s one that’s probably the simplest way of storing data and super easy to implement — the bag. That’s what we’ll be focusing on in this article. Let’s go 🚀

But what’s a data structure?

So first of all, what exactly are data structures? If you take a look at the Wikipedia page for data structure, this is the definition you get:

A data structure is a data organization, management and storage format that enables efficient access and modification. More precisely, a data structure is a collection of data values, the relationships among them, and the functions or operations that can be applied to the data.

If this seems a bit complex, here’s a slightly simpler definition that I think can work well for our purposes —

A data structure is a format for structures that store data

That’s really all a data structure is. A simple example of one is a string. Strings are structures that store data. More specifically, they store characters in a linear sequence. In Java, we can create a string as follows:

String s = new String("Hello World!");
// or
String s = "Hello World!";

Let’s say we want a data structure that stores a linear sequence of anything: characters, integers, booleans, even strings. We call these kinds of data structures generic. They are data structures that can store data of any type.

There are many linear data structures that store sequences of stuff, but we’re here to talk about a data structure that stores stuff in no particular sequence.

That’s right — we’re talking about bags!

A representation of an empty bag

What are bags?

Most implementations of the bag data structure work quite simply.

Once you create one, pretty much all you can do is add an element, check if an element is in the bag, find the size, and check if the bag is empty.

Here’s what adding would look like in our visual representation —

Adding a new element to the bag

Here’s what our bag might look like after several additions —

The same bag with more elements added

As you can see, these elements don’t keep any order. Unlike strings, for example, bags don’t store elements of data in a sequence.

So there are really only two things bags need to keep track of for the four operations mentioned above to be possible.

Everything a bag implementation needs to keep track of and be able to do

How to implement bags

So now that you know what bags are supposed to keep track of and what they are supposed to be able to do, let’s take a look at how we would go about implementing one.

First things first — we need to keep track of the stuff in the bag, and the size of the bag. As it turns out, a simple array will work fine for internally keeping track of what’s in our bag and we can use an integer for its size.

What we can use to store the stuff we need to keep track of

Now to finish our implementation of the bag data structure, we just have four operations we need to implement.

Add an element to the bag
Check if an element is in the bag
Get the size of the bag
Check if the bag is empty

At this point, the implementations of three of these operations should be pretty self-explanatory.

To check if an element is in the bag, we can just iterate over our array with a for loop and check if the element matches any of the elements we iterate through.
To get the size of the bag, we simply have to return the size variable we are keeping track of.
To check if the bag is empty, all we have to do is check if the size variable is 0 or not.

But how do we add an element?

It might seem that all we have to do is insert the element into the array at position size. When size is 0, we insert at position 0. When size is 1, we insert at position 1. (Note that we are assuming that indices start from 0 here; if they were to start from 1, we would insert at position size + 1.)

4 elements inserted into our array

But here’s our problem with this implementation of the add operation: what happens when we try to insert three or more elements here?

The array won’t have space for them. It’s only of size 6. It can only hold data for 6 elements in the bag and no more.

But that’s not how bags are supposed to work! Bags are supposed to allow you to add in as many elements as you want — not just 6.

So how can we solve this? We can’t just have a really huge array and hope that someone using our bag doesn’t need more space than that.

A simple solution is to resize our array whenever we run out of space.

Resizing our internal array when it’s full

We can simply create a new array twice the size (or 4x size or any factor you want) and then copy over everything from the original array to our new resized array.
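
Putting the array, the size variable, and the four operations together, a bag could be sketched in Java along these lines. This is an illustration under stated assumptions rather than a definitive implementation: the starting capacity of 6 simply mirrors the pictures, the array doubles when full, and the capacity method is a hypothetical extra that exists only so the resizing can be observed.

```java
import java.util.Arrays;
import java.util.Objects;

// Hypothetical generic bag backed by an array that grows when it fills up.
class Bag<T> {
    private Object[] items = new Object[6]; // starting capacity is an assumption
    private int size = 0;

    // Adds an element, doubling the internal array first if it is full.
    void add(T item) {
        if (size == items.length) {
            // Creates an array twice the size and copies everything over.
            items = Arrays.copyOf(items, items.length * 2);
        }
        items[size] = item; // insert at position size
        size++;
    }

    // Checks every stored element for a match.
    boolean contains(T item) {
        for (int i = 0; i < size; i++) {
            if (Objects.equals(items[i], item)) {
                return true;
            }
        }
        return false;
    }

    int size() {
        return size;
    }

    boolean isEmpty() {
        return size == 0;
    }

    // Not one of the four bag operations; shown only to make resizing visible.
    int capacity() {
        return items.length;
    }
}
```

Adding a seventh element to a new Bag<String> triggers one resize, so size() reports 7 while capacity() reports 12.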

And that’s really all there is to the bag data structure. 👏 Bags are one of the simplest data structures out there and now you know how to implement them! 👍

(If you want to take a look at a coded implementation for one, here is one I did in forty lines (!) in the Lua programming language)
