<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[Stories by Optalysys on Medium]]></title>
        <description><![CDATA[Stories by Optalysys on Medium]]></description>
        <link>https://medium.com/@optalysys?source=rss-41961c3986e0------2</link>
        <image>
            <url>https://cdn-images-1.medium.com/fit/c/150/150/1*rx3XpuZ32Ae2zt0XBaA2Jg.jpeg</url>
            <title>Stories by Optalysys on Medium</title>
            <link>https://medium.com/@optalysys?source=rss-41961c3986e0------2</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Wed, 08 Apr 2026 11:25:09 GMT</lastBuildDate>
        <atom:link href="https://medium.com/@optalysys/feed" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[The distinction between FHE and TEEs: The Downfall attack]]></title>
            <link>https://medium.com/optalysys/the-distinction-between-fhe-and-tees-the-downfall-attack-a89eb5793d52?source=rss-41961c3986e0------2</link>
            <guid isPermaLink="false">https://medium.com/p/a89eb5793d52</guid>
            <category><![CDATA[encryption]]></category>
            <category><![CDATA[information-security]]></category>
            <category><![CDATA[hardware-security]]></category>
            <dc:creator><![CDATA[Optalysys]]></dc:creator>
            <pubDate>Wed, 16 Aug 2023 16:24:08 GMT</pubDate>
            <atom:updated>2023-08-16T16:24:08.653Z</atom:updated>
            <content:encoded><![CDATA[<p>As a company focused on accelerating Homomorphic Encryption, we are often asked questions about the distinction between the security provided by FHE and other technical solutions for protecting data-in-use.</p><p>These questions often focus on existing hardware-level protections that are intended to isolate and defend data-in-use from the rest of a computing system. Some degree of hardware-based protection is present in almost every computing system; however, in the context of protecting <em>very</em> sensitive data, this hardware model can be extended to provide isolation of this data from even the system administrators themselves.</p><p>This isolation forms the basis of the Trusted Execution Environment, or TEE. Most of the largest CPU manufacturers provide some mechanism for creating a TEE, such as Intel Software Guard Extensions (<a href="https://www.intel.com/content/www/us/en/architecture-and-technology/software-guard-extensions.html">SGX</a>), ARM <a href="https://www.trustonic.com/technical-articles/what-is-trustzone/">TrustZone</a>, or AMD Secure Encrypted Virtualization (<a href="https://www.amd.com/en/developer/sev.html">SEV</a>).</p><p>While specifics differ, the common thread connecting these approaches lies in the implementation of a secure <strong>enclave</strong>, a region of encrypted memory.
Notionally, even if data or instructions are read from these regions, they would be unintelligible; this information can only be decrypted and used on the fly when within the CPU.</p><p>This provides additional protection against many potential threats, but it’s not infallible, and a recently released hardware attack methodology has once again highlighted the challenges that face TEEs.</p><p>The <em>short</em> answer to the question regarding the distinction between FHE and TEEs is that <em>only the former</em> offers total security for data when in use within a processing system.</p><p>If you’re interested in the <strong>long</strong> answer, read on. We’ll start by describing how information can be extracted from computing systems without directly attacking the core security measures, and then take a look at some recent examples of when memory-level protections have failed. We then go on to describe the foundational distinction between the protections offered by FHE, and the root of the failure that enables attacks on conventional microprocessing architectures.</p><h3>Side channel attacks</h3><p>It’s extremely rare for a properly-implemented and well-designed cryptographic solution to be defeated head-on, by attacking the actual principles on which it is based. For example, AES encryption is considered to be nigh-unbreakable <strong>in the context</strong> where an attacker doesn’t have the secret key.</p><p>However, if an attacker can recover the secret key by, say, <a href="https://www.notion.so/ISO-18033-8-1c0baff15cd24146a9393c5b0ae12896?pvs=21">monitoring the time taken</a> for your CPU to execute the key generation step, then they’ve found a way of attacking the system that lies outside of the aspects that can be covered by cryptography. 
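</p><p>To make the timing leak concrete, here is a toy sketch (in Python; the secret and guesses are entirely hypothetical, and the step count stands in for measured time) of why an early-exit comparison of secret values leaks information, together with the standard constant-time remedy:</p>

```python
import hmac

def naive_compare(a: bytes, b: bytes):
    """Early-exit comparison: returns (equal, bytes inspected).
    The number of steps taken is the timing side channel, since it
    reveals how much of the guessed prefix was correct."""
    steps = 0
    for x, y in zip(a, b):
        steps += 1
        if x != y:
            return False, steps
    return len(a) == len(b), steps

secret = b"hunter2!"

# Two wrong guesses take measurably different amounts of "time":
_, t1 = naive_compare(secret, b"aaaaaaaa")  # wrong at byte 0 -> 1 step
_, t2 = naive_compare(secret, b"huntaaaa")  # wrong at byte 4 -> 5 steps
assert t2 > t1  # the work done leaks the length of the matching prefix

# The usual remedy is a constant-time comparison, which inspects
# every byte regardless of where the first mismatch occurs:
assert hmac.compare_digest(secret, b"hunter2!")
assert not hmac.compare_digest(secret, b"huntaaaa")
```

<p>An attacker who can repeat such measurements recovers the secret one byte at a time; this is exactly the class of leak that constant-time implementations are designed to close.</p><p>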
We would say that the attacker has successfully used <strong>side-channel information</strong>, and this kind of attack is known as a <strong><em>side-channel attack</em></strong>.</p><p>These attacks can be ingenious and surprising; we recently saw an attack against keycard-controlled doors that can successfully recover a secret by <a href="https://arstechnica.com/information-technology/2023/06/hackers-can-steal-cryptographic-keys-by-video-recording-connected-power-leds-60-feet-away/">analysing video</a> of a power LED blinking. The pattern of the blinks reflects the instantaneous power draw of the system, which can give the attackers some clue as to the operations that the system is performing.</p><p>The tools attackers have for finding these clues aren’t limited to just LEDs either; past examples of hardware side-channel attacks (to name but a few) include <a href="http://cs.tau.ac.il/~tromer/acoustic/ec04rump/">analysing the high-frequency noise</a> generated by systems working on cryptographic algorithms, or even deciphering the subtle differences in sound produced by <a href="https://link.springer.com/article/10.1007/s10207-019-00449-8">keystrokes on a phone screen</a> or <a href="https://www.bleepingcomputer.com/news/security/new-acoustic-attack-steals-data-from-keystrokes-with-95-percent-accuracy/">keyboard</a>. It’s also been noticed that <a href="https://www.arm.com/blogs/blueprint/dl-sca">AI is increasingly used to assist attackers</a> in accurately analysing the complex data gathered in the course of such an attack.</p><p>Of course, some of these attacks are <strong>possible</strong> but may require specific or unrealistic conditions to replicate. In practical terms, the range of side-channel attacks that can be deployed <em>in the field</em> is smaller than the range of all known side-channel attacks.</p><p>However, where we <strong><em>do</em></strong> find practical side-channel attacks, defending against them is notoriously tricky.
Now the <strong>attack surface</strong>, the range of potential vulnerabilities that we have to consider, is no longer neatly covered by cryptography, or the protocols that can be built on top of it. Instead, we have to account for the messy business of computing as a <em>physical</em> phenomenon.</p><p>For example, in the case of the AES timing attack, we can prevent this attack by ensuring that our key-generation algorithm runs in <strong><em>constant time</em></strong>, by which we mean deliberately programming the algorithm such that it takes the same amount of time to run regardless of the specifics of the key.</p><p>That’s a <em>relatively</em> straightforward solution that can be executed in software, and most good-quality cryptographic libraries (e.g. <a href="https://github.com/jedisct1/libsodium">Libsodium/NaCl</a>) already implement safe behaviours in the vulnerable algorithms.</p><p>Addressing hardware problems is more involved; in the case of the power LED attack outlined above, the answer is to apply a capacitor or similar filter to the LED power trace in the circuit. The objective here is to smooth out the power supplied to the LED, and thus limit or prevent the tell-tale flickering.</p><p>Fixing the problem in hardware like this might mean that a product line or two is affected, which is expensive; there’s the question of remediation, recalls, refunds and so on. But what happens when a practical side-channel attack hits a system that is in use <strong>everywhere</strong>, with no practical alternative?</p><h3>Spectre and Meltdown…</h3><p>That’s a nightmare scenario that has repeatedly hit the chip-making industry. A few years ago, the <a href="https://meltdownattack.com/">Spectre and Meltdown</a> attacks generated international headlines when they revealed hardware vulnerabilities in every Intel processor manufactured since 1995, as well as a number of ARM-based processors.
<a href="https://www.extremetech.com/computing/326558-all-amd-cpus-found-harboring-meltdown-like-security-flaw">AMD-based processors</a> were also found to be vulnerable to the same essential principle behind these attacks.</p><p>The technical specifics of these attacks are complex, but the core idea is that Spectre and Meltdown allowed attackers to overcome the <strong>isolation boundary</strong> model of security that segregates interactions between processes in a CPU.</p><p>As applied in an Intel CPU, these boundary layers are termed “protection rings”, which describe the conceptual level of control over hardware resources assigned to applications and processes. These rings range from Ring 3 (the lowest level of security, occupied by applications) through to Ring 0, which is reserved for the core OS processing kernel, with the intermediate levels reserved for things such as hardware drivers and virtual machines.</p><p>Nominally, this permission-based isolation means that data stored in memory under Ring 0 is protected from malicious applications, and so can be used to store sensitive information such as passwords and encryption keys when in use. However, while applications don’t have direct control over memory, they do have the capacity to <strong><em>request</em></strong> resources from processes running in Ring 0.
It’s this request mechanism that allowed Spectre and Meltdown to work.</p><p>Both attacks exploit a property of CPUs called <strong><em>speculative execution</em></strong>, a technique used to accelerate processing by allowing the CPU to “guess” possible future commands that might be sent to it, execute the outcome of each command, and then retain and act on only the outcome corresponding to the <strong>actual</strong> instruction that it receives.</p><p>Executing these commands before they are actually received speeds up the overall execution of program flow control, and is an extremely helpful trick for bridging the gap in latency induced by retrieving information held in slower RAM.</p><p><a href="https://meltdownattack.com/meltdown.pdf">Meltdown</a> and <a href="https://spectreattack.com/spectre.pdf">Spectre</a> manipulated this behaviour to leak information from Ring 0 memory. While slightly different with respect to their mechanism of action, the essential idea behind both is that by submitting requests over data held in protected memory, <strong>even though the request would eventually be refused,</strong> the CPU would still speculatively execute the outcome. This would temporarily place protected information in the CPU cache, which could then be dumped to unprotected memory.</p><p>By repeatedly performing this attack, the entire contents of the kernel memory (including passwords and encryption keys) could be extracted to an insecure environment and read by a malicious application. In short, almost every processor in use at the time harboured a critical flaw that would allow attackers to access areas of a computer’s memory that would normally be off-limits and read its most important secrets.</p><p><strong><em>Fixing</em></strong> this problem was fraught; simply disabling speculative execution would have slowed many workloads to a crawl.
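</p><p>Before moving on to the fix, the core trick can be illustrated with a deliberately simplified software model (Python; all names and values are hypothetical). Real attacks recover the leaked byte from hardware cache timing rather than from a readable set, but the shape of the leak is the same:</p>

```python
# Toy model of a speculative-execution leak. Purely illustrative:
# a real attack infers the byte via cache timing, not a readable set.
public_array = b"AB"            # memory the application may read
kernel_memory = b"rootpw!!"     # adjacent memory it must not read
memory = public_array + kernel_memory

cache = set()                   # models which cache lines were touched

def speculative_read(index: int) -> None:
    """The permission check will eventually reject any index beyond
    the public array, but the CPU 'guesses' ahead and performs the
    load anyway; the rolled-back load still leaves a secret-dependent
    trace in the cache."""
    if index < len(memory):       # the speculative path
        cache.add(memory[index])  # side effect survives the rollback

recovered = bytearray()
for i in range(len(kernel_memory)):
    speculative_read(len(public_array) + i)  # out-of-bounds request
    recovered.append(cache.pop())            # 'probe' the cache state

assert bytes(recovered) == kernel_memory
```

<p>In the real attacks, the attacker cannot read the cache directly; instead they time accesses to a probe array to discover which line was loaded, recovering protected memory one byte at a time.</p><p>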
Fortunately, updates to the microcode of the processors could be made remotely, removing the need for what would likely have been the biggest and most expensive product recall in history.</p><h3>… and now Downfall</h3><p>Now the curse of speculative execution side-channels has struck again, this time in the form of the <a href="https://downfall.page/">Downfall</a> attack.</p><p>However, what’s particularly interesting is that the isolation boundaries affected by the attack don’t just include the standard Security Ring model that was violated by the Spectre and Meltdown attacks, but also extend to Intel Software Guard Extensions (SGX).</p><p>In brief, the authors of the Downfall attack determined that the “Gather” instruction can leak the content of a vector register on the CPU. As this leaks information that is <strong>inside</strong> the CPU (and thus can’t be retained in an encrypted form), this attack, like Spectre and Meltdown, bypasses the in-memory encryption that forms a key part of the security of a secure enclave.</p><p>A more detailed explanation of Gather instruction manipulation is <a href="https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/gather-data-sampling.html">provided by Intel themselves</a>, who are releasing a microcode update.</p><h3>Why is FHE different?</h3><p>The unifying factor in all of these side-channel attacks lies in the fact that in regular computing hardware, data <strong>must</strong> be decrypted at some point in order to be used. And, as we have repeatedly seen, it is <strong>incredibly</strong> difficult to ensure that there are no hardware side channels that could be used to leak this decrypted information.
The <a href="https://en.wikipedia.org/wiki/Software_Guard_Extensions">Wikipedia article on SGX</a> contains a list of known attacks on the SGX architecture and their remediations, which goes some way towards highlighting the scale of the challenge.</p><p>However, the existence of this list doesn’t preclude the possibility that there are as-yet <strong><em>unknown</em></strong> attacks that could be launched against the principle of a TEE, nor does it rule out the worst-case possibility that malicious actors have discovered a new vulnerability that they are keeping to themselves.</p><p>FHE doesn’t face this problem; in fact, in the security model that we’re most concerned about (i.e. executing computations in untrusted computing environments), FHE is an effective guard against all hardware-based side-channel attacks.</p><p>Not only does the information remain encrypted at all points, <strong><em>including</em></strong> when inside hardware processing environments (thus neutralising the attack vectors pursued by speculative execution attacks, which rely on dumping information that is decrypted on-the-fly within the CPU), but no cryptographic material that could be used to decrypt this information is held anywhere in the untrusted system.</p><p>As information is never decrypted, there’s no need to send a decryption key for use in on-the-fly decryption; the only additional cryptographic material needed is a “server” key, which facilitates bootstrapping, as well as the keys or hints required for operations such as ciphertext rotation and relinearisation. None of these pieces of information are sensitive, as they cannot be used to decrypt or otherwise break the security of the encrypted information.
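</p><p>The underlying idea, computation on ciphertexts by a party that never holds the decryption key, can be sketched with the Paillier cryptosystem. Paillier is additively homomorphic rather than fully homomorphic, and the parameters below are toy values far too small to be secure, but the trust relationship is the same: the server computes without any sensitive material.</p>

```python
import math, random

# Toy parameters; a real deployment would use ~2048-bit primes.
p, q = 293, 433
n, n2 = p * q, (p * q) ** 2
lam = math.lcm(p - 1, q - 1)                  # Carmichael λ(n)
g = n + 1                                     # standard generator choice
mu = pow((pow(g, lam, n2) - 1) // n, -1, n)   # decryption constant

def encrypt(m: int) -> int:
    r = random.randrange(1, n)                # fresh randomness per ciphertext
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c: int) -> int:
    return ((pow(c, lam, n2) - 1) // n * mu) % n

# The 'server' multiplies ciphertexts while holding only the public
# modulus; the plaintexts are added underneath the encryption.
a, b = encrypt(20), encrypt(22)
assert decrypt((a * b) % n2) == 42
```

<p>The server in this sketch holds only the public modulus and the ciphertexts, and still produces a useful result; nothing it touches can be used to recover the plaintexts.</p><p>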
In short, there’s no point in the process where an attacker could ever gain access to data protected under FHE.</p><p>By removing hardware side-channel attacks, FHE instead obliges attackers to directly attack the security of an encryption method backed by decades of research. This approach ensures that data remains protected not only for years, but for decades and more.</p><hr><p><a href="https://medium.com/optalysys/the-distinction-between-fhe-and-tees-the-downfall-attack-a89eb5793d52">The distinction between FHE and TEEs: The Downfall attack</a> was originally published in <a href="https://medium.com/optalysys">Optalysys</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[FHE: An alternative to client-side scanning?]]></title>
            <link>https://medium.com/optalysys/fhe-an-alternative-to-client-side-scanning-e58491b1c00?source=rss-41961c3986e0------2</link>
            <guid isPermaLink="false">https://medium.com/p/e58491b1c00</guid>
            <category><![CDATA[privacy]]></category>
            <category><![CDATA[privacy-enhancing-tech]]></category>
            <category><![CDATA[attestation]]></category>
            <category><![CDATA[optical-computing]]></category>
            <category><![CDATA[fhe]]></category>
            <dc:creator><![CDATA[Optalysys]]></dc:creator>
            <pubDate>Mon, 31 Jul 2023 13:12:08 GMT</pubDate>
            <atom:updated>2023-07-31T13:12:08.748Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*82tvhnI96atIm-uGSf-WZA.png" /></figure><p>With <a href="https://medium.com/u/9326ad4c51d1">Joseph Wilson</a> and <a href="https://medium.com/u/f0fec005a59a">Florent Michel</a></p><p>During our recent presentation at the <a href="https://www.linkedin.com/posts/antigranular_datascience-privacy-event-activity-7090011003911905280-XfWQ?utm_source=share&amp;utm_medium=member_desktop">Eyes-Off data summit</a> in Dublin, we were asked a question about FHE. The question went something like this:</p><blockquote><em>“If you can evaluate encrypted information, couldn’t that property be used to evaluate the content of a message? And doesn’t that property break the security of encryption?”</em></blockquote><p>The answer varies, depending on the security model. In the “classic” symmetric example of FHE, we can outsource the processing of some data to a cloud that we might not otherwise trust.</p><p>In this model, we hold the decryption key and send no supplementary information back to the server. The server can only perform the processing and thus learns nothing (and <em>can</em> learn nothing) about the outcome.</p><p>However, how can we be sure that the server has run the <em>right</em> process? What if someone swapped out the intended algorithm for something that could analyse something different about the information we supplied to it?</p><p>In the symmetric case, we would receive an incorrect result from the calculation. Guarding against that possibility is important for the symmetric-key case.</p><p>However, this is especially important for some applications, in which the use-model might involve some third party receiving information about the outcome of the computation. This is of particular interest in models of FHE that feature <em>multi-party computation.
</em>In that case, we want to be certain that <em>only</em> the agreed-upon process was run and that the third party can only learn what we are comfortable with them knowing.</p><p>The above invokes the notion of an <em>attestation</em>, a proof that can be used to guarantee that a specific operation has been performed, even when we can’t directly see that process ourselves.</p><p>Attestation is a common theme in another area of confidential computing, specifically with respect to <em>trusted execution environments</em>, or TEEs. A TEE is a hardware-based solution to the problem of trusted computing; not only is the hardware itself often physically isolated from the rest of a computing system and inaccessible even to system administrators, but everything that happens inside the TEE is subjected to additional checks such as cryptographic validation, which can be used to attest to the fact that only authorised code is run.</p><p>Notionally, this provides some additional security. However, what if the entity that authorises the code and the entity that submits the data aren’t the same? How can we design a system that ensures trust across <em>all</em> parties, including the users of a service?</p><p>Ultimately, this is a very interesting question. In this article, we expand on the answer that we gave at the conference. There, we selected an example from a subject that has been in the news a lot as of late: the question of how we prevent end-to-end encrypted services being used to share child abuse material.</p><p>We’ll start by filling in some of the background to this debate, and then put forward a toy model in which we apply FHE to the evaluation of messages that are sent over an E2EE service. Here, we <em>want</em> the server to be able to learn something about an image, but we don’t want to compromise the overall security of the system.
In the process, we demonstrate the basic idea of how attestation would be required to ensure the validity of the computation, and then discuss what would be required for the deployment of such a system at scale.</p><h3>Background</h3><p>Moderating the internet is hard work. Any platform that allows users to communicate and share content faces a slew of difficulties when it comes to deciding what’s allowable on their service.</p><p>Be it contentious yet legal free speech, cyberbullying, or other sources of potential harm (such as pro-anorexia communities), there are many aspects of running an online service that can run afoul of legal and social conditions. At the same time, services are also expected to provide and protect free expression, and where necessary to shield users from harm.</p><p>Achieving this balance is not trivial. The scope and reach of the internet and the scale at which some of the biggest platforms operate ensure that many companies find themselves in the crosshairs of regulators around the world, especially those platforms dedicated to social media. The extent to which these platforms are expected to censor or allow given forms of speech often devolves into a legal and political battleground.</p><p>However, one rare point of universal agreement is the prohibition of child sexual abuse material, or CSAM for short. Combating the sharing of CSAM over the internet is rightly a priority for both law enforcement and tech companies the world over.</p><p>Despite the shared desire to tackle this problem, the question of <strong><em>how</em></strong> to achieve this is one of the most fraught topics in an already complicated field. And as of late, the spread of end-to-end encryption (E2EE) has further soured the relationships between regulators and the tech sector.</p><p>End-to-end encryption is a critical technology that ensures the security and privacy of messages you send over the internet.
For example, when you communicate with your bank over the internet, neither you nor the bank want people to be able to see the content of those messages. If those messages are intercepted, we want them to be unreadable to anyone who shouldn’t have access. You also want to be able to verify that it is indeed the bank that you’re talking to!</p><p>Encryption allows us to scramble these messages in such a way that it’s nearly impossible to <strong>un</strong>scramble them without knowing a specific piece of information. Thanks to what is known as <strong>public-key</strong> encryption and message authentication techniques, it’s possible to safely exchange all the information needed to protect your communications, even with people you haven’t met in person. This allows us to securely communicate with people and services all over the globe.</p><p>Under end-to-end encryption, only the sender and receiver of the messages can unscramble the messages that they send to each other. This is essential to the security and functionality of the modern internet; without it, many of the utilities that we take for granted such as banking apps, e-commerce and private messaging platforms would be too vulnerable to function correctly.</p><p>However, when it comes to the fight against CSAM, end-to-end encryption poses a problem. In brief, the difficulty of detecting CSAM on encrypted messaging apps such as WhatsApp is that E2EE ensures that there <em>is no point</em> in the messaging pipeline where the contents of the message can be intercepted and examined.</p><p>That’s obviously a necessary property for protecting communications. However, the argument of regulators is that the nature of this protection offers no distinction between cybercriminals and other malicious parties, who have no business reading anybody’s messages, and law enforcement, who might have legitimate reasons to examine a message.
And when it comes to legitimate reasons, intercepting and eliminating child abuse is the ultimate argument.</p><p>Of course, encryption and privacy have always been a significant battleground in the tech space, and even within governments there are often dissenting views. For example, in the UK, the Home Office has pushed back strongly against Meta’s plans to introduce end-to-end encryption to Facebook Messenger, while the Information Commissioner’s Office (itself a UK government body) takes the view that E2EE has a proportionally greater effect in protecting children by preventing criminals and abusers from accessing their messages.</p><p>Proposals that try to find a way around this problem have previously been driven by suggestions for circumventing the encryption itself. That could involve weakening or removing the core cryptography by creating “back-doors” for law enforcement use, or storing users’ private decryption keys in a centralised way that is accessible to law enforcement (a “key escrow” method). However, both of these methods introduce unacceptable security flaws in what should be a very secure process, and the dangers of implementing these approaches vastly outweigh the benefits.</p><h3>Client-side scanning</h3><p>To try and find a different way around the difficulties of E2EE, a recent push in this sector has come through the suggested use of so-called “client-side” scanning.</p><p>The reasoning and methodology behind client-side scanning goes something like this: If we want to preserve the integrity of end-to-end encryption <strong><em>and</em></strong> evaluate messages for illegal content, we have to check messages <strong>before</strong> they enter the encrypted pipeline.</p><p>Under a regulatory regime in which client-side scanning is mandated, companies such as WhatsApp would have to incorporate an approved technology that can detect CSAM into their platforms.
This technology, which would notionally be “accredited” by the relevant regulatory authority, would then inspect user messages before they are encrypted and sent.</p><p>However, amongst the many major criticisms levelled against <strong><em>client-side</em></strong> scanning is the argument that a technology that can read and evaluate user messages (and, crucially, report on the content) still damages the protection of end-to-end encryption.</p><p>Not only does this capability introduce an attack vector that could be used by malicious parties, but it leaves open the possibility for mission creep, in which <strong>authorised</strong> parties start scanning for content beyond the original remit under the guise of preventing other crimes.</p><p>There’s no easy answer to this criticism, and previous industry-led attempts to apply such technologies have led to bruising clashes between tech companies and privacy advocates. Apple’s own attempt to develop and implement a client-side scanning capability for CSAM in iCloud met with such resistance that the plans were shelved after just a couple of months.</p><p>Besides the core technical aspect of encryption, which is understood to be essential to securing the modern web, tech companies are well aware of the damage that failing to respect user privacy can do to their business, and are now much more cautious about implementing measures that could be seen to be infringing on users’ rights.</p><p>However, this is now putting them on a collision course with regulators and law enforcement, who are seeking to solve the issue by leaving tech companies no other choice but to comply or withdraw products and services from the market.
While the full scope of this battle is taking place at the supranational level, an interesting test-case for the impact of such legislation looks set to present itself in the form of the UK’s forthcoming Online Safety Bill.</p><p>The content and function of this bill exceeds the scope of checking for CSAM alone, but does include provisions for client-side scanning. If fully implemented, the consequences would be extremely wide-ranging and would provide a significant real-world test-case for the impact of such approaches on user privacy.</p><p>Despite the possible utility of this bill as a test-case, it’s uncertain how useful the specific outcome will be. So broad-ranging are the mandatory changes that many companies (including WhatsApp and Signal) have indicated that they would simply leave the UK as a market.</p><p>But what alternatives are there?</p><h3>An alternative to client-side scanning?</h3><p>Before we move on to specific technical considerations, we have to ask ourselves what an <strong><em>ideal</em></strong> solution to this problem would look like. With that in place, we can then start to consider if it is <strong>possible</strong> to build this ideal solution using the technologies that exist in 2023, and then move on to the question of whether this is <strong>achievable</strong>.</p><p>When thinking about this ideal solution, we take the following principles as essential:</p><ul><li>There’s never a legitimate reason to send CSAM to anybody else, and there is no privacy or data freedom argument that could reasonably extend to the right to send or receive this material.</li><li>A user’s privacy and security over all other material should be maintained to the same standard as under contemporary end-to-end encryption. <strong>Only</strong> CSAM should be detected.</li></ul><p>Here’s what an ideal solution would look like under these constraints.
For this first pass, we’ll describe things in terms of high level abstractions; we won’t get into specific details just yet:</p><h4>Ideal solution</h4><p>We have two users who want to exchange messages (including images) over an encrypted communications channel.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/742/1*vzO3TLXRAeP8FP9bG9SKXw.png" /></figure><p>In the middle of this communications channel, we want to construct a system that can detect any CSAM material, regardless of whether or not this material is encrypted. Even on detecting CSAM, this system cannot view or decrypt the message; it can only communicate the fact that the message <strong>contains</strong> CSAM.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/742/1*C94g1EAbmRqV1wCVjRGfng.png" /></figure><p>All communications between the two users pass through this system and are assessed. At the same time, the users receive some proof or verification attached to each message (an <em>attestation</em>) that the system is <em>only</em> able to detect CSAM. This guards against the problem of “mission creep”.</p><p>This solution would resolve the following outstanding difficulties:</p><ul><li>It would halt the ability of criminals to send CSAM over a communications network and provide a mechanism for law enforcement to detect and investigate any such attempt.</li><li>It would preserve the same protections offered by end-to-end encryption for all message content besides CSAM.</li></ul><p>So with the above in mind, we now ask ourselves: <strong>is it possible</strong> to design and build such a solution?</p><p>Thus far, the hardest question to answer has always been whether it is possible to build a system that can evaluate encrypted information. Under the existing encryption methods used by platforms such as WhatsApp, the answer is <strong>no</strong>. 
These encryption methods don’t preserve the underlying structure of the data, so we can’t perform any kind of analysis on it.</p><p>However, the technology that allows for computation over encrypted data <strong>does exist</strong>. Fully homomorphic encryption, or FHE, is a novel method of encrypting data in such a way that we can perform <strong><em>any</em></strong> computation that we wish on that data, without needing to decrypt it first.</p><p>So with that in mind, let’s take a look at a <strong><em>possible</em></strong> solution in this context.</p><h4>Possible solution</h4><p>As before, our two users want to securely exchange messages. For the majority of their communications (text, etc.), they use an existing E2EE technology such as the protocol that underpins Signal and WhatsApp. However, their image messages are encrypted such that we can apply an FHE-based computation to the contents.</p><p>Whenever a user sends an image, this image is first sent to the messaging service’s servers for analysis.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/987/1*Z8HL1PCXpIZkAEOiQQ_Etw.png" /></figure><p>Let us assume first that an algorithm exists that can <strong><em>perfectly assess whether an image contains CSAM</em></strong>. This is still a big assumption, but we’ll come back to re-examine this point later.</p><p>Under FHE, we can execute this algorithm on the image <strong><em>without having to decrypt the image</em></strong>. Of course, to maintain the guarantee of <em>only</em> evaluating for CSAM, we also have to provide an attestation that we ran only the correct detection algorithm.</p><p>With respect to the actual code itself, we could attest to the validity of the algorithm through a TEE-based approach. However, for the end-user, we can also consider a method of attestation that doesn’t involve some trusted third party. 
In this sense, the attestation is instead a form of <em>zero-knowledge proof</em> by which the party that applies the algorithm is <em>forced </em>to act honestly, as efforts to apply a different algorithm would be detected by the users.</p><p>Here’s a toy model that describes what we’re talking about. Executing the detection algorithm on an encrypted image file will also alter the underlying content of the file in some way, and we may be able to use this change as a way of assessing whether a specific, agreed-upon algorithm was applied.</p><p>For example, consider a simple “perceptual hashing” approach to detecting whether an image contains a depiction of something that we want to filter. We’ll use a very basic “averaging” approach as an example. In this method, we:</p><ul><li>Convert the image to greyscale</li><li>Downscale the image</li><li>Convert the downscaled image to a 1D string of bits</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/670/1*kEZ8N-VABKxWv2eqCrAH8Q.png" /></figure><p>The output of this process is a “perceptual hash”, a binary string (represented above as the greyscale bar) that encodes a reduced representation of the image. The above is a very simplistic approach, but is intended to be relatively robust against attempts to disguise the content. Unlike a traditional cryptographic hash, a perceptual hash does not need to match <em>exactly</em>; we can evaluate how <em>close</em> a given hash of an image is to any hash that we hold in a database of forbidden content.</p><p>In the context of FHE, we could add some small additional information that is only known to the sender and receiver, which will (under the application of the correct algorithm) change in some predictable way. 
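</p><p>As an illustration, the “averaging” hash described above can be sketched in a few lines of pure Python. This is a toy (the image is represented as nested lists of RGB tuples, and all names are our own), not any production detection algorithm:</p>

```python
# Toy "averaging" perceptual hash: greyscale -> downscale -> bit string.
# Purely illustrative; real perceptual hashes are far more robust.

def average_hash(image, size=8):
    h, w = len(image), len(image[0])
    # Step 1: convert to greyscale via a simple channel average.
    grey = [[sum(px) / 3 for px in row] for row in image]
    # Step 2: downscale by averaging size x size blocks of pixels.
    cells = []
    for i in range(size):
        for j in range(size):
            block = [grey[y][x]
                     for y in range(i * h // size, (i + 1) * h // size)
                     for x in range(j * w // size, (j + 1) * w // size)]
            cells.append(sum(block) / len(block))
    # Step 3: threshold each cell against the mean to get a 1D bit string.
    mean = sum(cells) / len(cells)
    return [1 if c > mean else 0 for c in cells]

def hamming_distance(h1, h2):
    # Perceptual hashes are compared by closeness, not exact equality.
    return sum(a != b for a, b in zip(h1, h2))
```

<p>In use, an image would be flagged when the Hamming distance between its hash and a hash in the database falls below some threshold, rather than requiring an exact match.</p><p>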
An example of such a change (using the algorithm described above) is given below.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/480/1*K4DEUxJDzABFJdvDAhKxyw.png" /></figure><p>We could then append this information to the image prior to encryption, generating a hash that contains both the image and this information.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/800/1*QMDQOuQF7eMp6X9queOaNw.png" /></figure><p>On delivery of a file containing the output of the encrypted algorithm, the recipient could then reconstruct the 2-dimensional version of the hash and thus prove that the correct algorithm has been applied. Because the computation happens under FHE and the actual content is invisible, this prevents the server from acting maliciously by falsifying the attestation.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/801/1*mzdh-JMV6wWaZDNtRUfqFw.png" /></figure><p>This is of course a very simple approach; more complex approaches that cover a wider range of security considerations could be applied, but this is a basic illustration of the concept.</p><p>So, the output of this computation should ideally give us one or more encrypted files; these contain values that indicate:</p><ul><li>Whether the image contains CSAM</li><li>That the agreed-upon algorithm was run</li></ul><p>We can then send the original encrypted image alongside the additional encrypted material onwards to the recipient. However, there are a few more points that we’d need to consider to eliminate potential sources of weakness. 
We need to ensure that these are addressed.</p><p>Under FHE, it is possible to construct a <a href="https://eprint.iacr.org/2020/180.pdf">system</a> whereby the secret information needed for decryption can be spread across more than one party.</p><p>We can make use of this property to ensure that, while the recipient can <em>participate</em> in the decryption of the additional encrypted material (which is necessary for ensuring the attestation is valid), they alone cannot see the result of the analysis in the clear, and only the messaging service is capable of performing the final decryption step. This prevents the recipient from being able to manipulate the response to suggest the message is “clean”. In this suggested protocol, we do not need to trust either the user or their messaging client to be honest.</p><p>This also applies to the server; we can have the server perform a partial decryption on the attestation, which can be finalised and completely decrypted by the user’s device, thus preventing the server from seeing or manipulating the attestation.</p><p>We should, for the sake of completeness, also periodically rotate the secret keys held by both the server and the client device under this model. This guards against the possibility of the server being able to reconstruct the client key over time by introducing small errors into the hash, and then examining the response (a so-called “reaction attack”).</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*_a31iTOpk_v2EFUVqwWcYA.png" /></figure><p>With respect to the <strong>security</strong> aspects, the above solution is based on technology and methods that currently exist. However, we now need to consider the practical complexities.</p><h4>The achievable solution</h4><p>First of all, we cannot assume the existence of a “perfect” algorithm for detecting CSAM. 
This would be an algorithm that perfectly detects all such material <strong>even under attempts to disguise the image contents</strong>, such as changing the image aspect ratio, trimming the image, or introducing specially-crafted noise to produce incorrect outputs from the algorithm. Such an algorithm would also not generate <strong><em>false positives</em></strong>, where legal material (including legal adult material) would be incorrectly identified as CSAM.</p><p>In reality, scanning algorithms intended to detect CSAM (including those proposed for use in client-side scanning) are known to produce both false negatives and false positives, and despite efforts to overcome attacks there are <a href="https://eprint.iacr.org/2021/1531.pdf">still known ways</a> of evading these technologies. This includes attacks against the “perceptual hashing” techniques used to detect instances of CSAM that are already known to the authorities, as well as techniques used to detect <strong><em>novel</em></strong> CSAM such as artificial intelligence networks trained to detect new images.</p><p>However, by far the most significant <strong><em>practical</em></strong> limitation thus far lies in the speed at which we can execute algorithms under FHE.</p><p>Such a system would be required to work at <strong><em>least</em></strong> at a national scale. As of 2017, WhatsApp users were sending over 4.5 Bn images per day, and that number is certain to have grown since then. 
Analysing the sheer number of images sent over WhatsApp in a country the size of the UK would already be difficult even if the images were not encrypted; doing so with a technology that is a <strong>million times slower</strong> when run on a typical CPU than unencrypted computing would be a non-starter.</p><p>In fact, what we’re proposing here isn’t exactly new; fully homomorphic encryption has <a href="https://arxiv.org/abs/2207.09506">already been suggested</a> by GCHQ as a mechanism for securely implementing approaches to CSAM detection. Despite the notional capabilities of the technology, the conclusion then was that the computational cost of FHE was simply too high.</p><p>Of course, there are other ways of making this both faster and less intrusive. For starters, if we can reduce the number of messages we want to scan, then we can significantly reduce the total compute required. This might involve a “simple” solution, such as randomly checking only a handful of messages. While this could at best only catch a sub-sample of abuse images, the knowledge that the system performs random checks might be enough to deter users from sending CSAM over the service.</p><p>Alternatively, we could apply a more complex heuristic, such as checking messages sent by users exhibiting behaviours or patterns of communication that might be associated with CSAM networks.</p><p>These approaches certainly help, but even random sampling is going to require a massive improvement in FHE throughput. So, if we’re to implement a solution closer to the ideal than client-side scanning, we need FHE to be faster. That’s where we at <a href="http://www.optalysys.com">Optalysys</a> come in.</p><p>Ultimately, the technical and social implementation of such a system is a broader subject than we could possibly address in a single article. 
However, in light of ongoing developments in the FHE space, it’s time to start thinking beyond the blunt instrument of client-side scanning as a solution to this problem.</p><hr><p><a href="https://medium.com/optalysys/fhe-an-alternative-to-client-side-scanning-e58491b1c00">FHE: An alternative to client-side scanning?</a> was originally published in <a href="https://medium.com/optalysys">Optalysys</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[FHE and machine learning: A student perspective with examples]]></title>
            <link>https://medium.com/optalysys/fhe-and-machine-learning-a-student-perspective-with-examples-88d70664a6cb?source=rss-41961c3986e0------2</link>
            <guid isPermaLink="false">https://medium.com/p/88d70664a6cb</guid>
            <category><![CDATA[cryptography]]></category>
            <category><![CDATA[optical-computing]]></category>
            <category><![CDATA[homomorphic-encryption]]></category>
            <category><![CDATA[data-science]]></category>
            <category><![CDATA[machine-learning]]></category>
            <dc:creator><![CDATA[Optalysys]]></dc:creator>
            <pubDate>Fri, 18 Nov 2022 15:08:35 GMT</pubDate>
            <atom:updated>2022-11-18T17:31:54.190Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*H2TizxA6SYBK_eftpS2f0g.png" /></figure><p>With <a href="https://medium.com/u/cc5e48b35617">Peter Li</a>, <a href="https://medium.com/u/f0fec005a59a">Florent Michel</a> and <a href="https://medium.com/u/9326ad4c51d1">Joseph Wilson</a></p><p>When it comes to both protecting and <em>using</em> sensitive data, fully homomorphic encryption is a technology that offers security and peace of mind. By maintaining full cryptographic security for data at <em>every</em> stage, including during processing, FHE enables data scientists and developers to collaboratively apply common data analysis tools to the most valuable information without the risks that we commonly associate with data sharing.</p><p>Applying machine learning models under this paradigm introduces some additional hurdles that need to be overcome. However, modern FHE libraries are now making FHE increasingly easy to use without the need for expert knowledge in cryptography.</p><p>To demonstrate this, Optalysys offered an undergraduate internship over the summer of 2022. The goal of this internship was to introduce a student to the world of <em>encrypted</em> data science, and see what was possible given the current state of the field.</p><p>Our intern for 2022, Peter Li, was a second-year (now third-year) undergraduate studying engineering at Cambridge University. 
Peter’s work focused on the deployment of common machine learning models in the FHE space, and his article below covers the application of the Zama Concrete library to several different network models and data-sets.</p><p>For data scientists looking to make the transition into working <em>safely</em> with highly sensitive information, this article features examples of the implementation and execution of several different machine learning techniques in encrypted space.</p><p>More broadly, if you’re an organisation looking for the next significant step in data science, Peter’s work demonstrates that developing useful FHE implementations is now well within scope for data specialists. When paired with Optalysys technology that allows FHE to be deployed at scale and over large data sets, the market for novel applications and services running on previously untouchable data will be primed for massive growth.</p><p>That’s all from us at Optalysys for now; we’ll leave you with Peter’s article below. We’ll be back with our own work on FHE and optical computing in the near future.</p><h4>Table of contents:</h4><ul><li><a href="#4235">Introduction</a></li><li><a href="#4205">Privacy enhancement and FHE</a></li><li><a href="#b45a">Example 1: Logistic Regression — Sklearn Breast Cancer Data set</a></li><li><a href="#d662">Example 2: XGBoost model for predicting corporate credit rating</a></li><li><a href="#9adb">Example 3: Simple Multilayer Perceptron for Wine classification</a></li><li><a href="#4a9a">Bonus section — how accuracy and maximum accumulator bit width requirement scales with quantisation bit width of inputs</a></li><li><a href="#3c71">FHE limitations and an optical solution for FHE</a></li></ul><h3>Introduction</h3><p>Machine learning is a domain of AI dealing with systems that can learn patterns from existing data, and then use this learnt information to make predictions about new data. 
Since 2010, incredibly rapid progress has been made in both the fundamental data science and deployment of machine learning, particularly in the associated domain of <em>deep</em> learning. So what is it that has pushed machine learning to become so prominent?</p><p>There are many reasons why, but a good starting point would be our own abilities with respect to data. Humans are conditioned to think and work in three dimensions: this means we can readily visualise and understand the relationships between datasets with three or fewer variables.</p><p>Unfortunately, many useful sets of data collected in the real world do not fit that criterion. Datasets with tens or hundreds of variables are commonplace in data science, and although techniques for exploring such data exist, it would be very difficult for humans to understand all the key relationships between variables. And if you were dealing with, say, a multi-megapixel image for computer vision, you’d be looking at millions of variables. Needless to say, trying to hand-programme an effective <em>general</em> image recognition algorithm would be a hopeless endeavour.</p><p>This is where machine learning comes in. The ability of computers to pick up on relationships that are otherwise locked away in complex multivariate data sets, and then make predictions based on that information, is incredibly useful for countless applications spanning many different industries. These range from features we take for granted in our daily lives, such as personalised recommendations given by the likes of Netflix and Spotify, to things we don’t often think about, such as analysing market data for automated high-frequency trading.</p><p>We often attribute the popularity of ML to recent advances in deep learning. 
This is indeed true, although it is important to remember that the reason <em>why</em> ML can be used on such a wide scale is thanks to the more holistic improvements in computing infrastructure, as well as the unprecedented abundance of data.</p><p>Back in 2012, <a href="https://twitter.com/IBM/status/181711401335795712">IBM said that 90% of all data ever generated by humanity had been created in the previous two years</a>. The rate of data creation has only accelerated since then. Data is essential for training machine learning algorithms, and increasingly powerful models have come into existence as we harvest ever more of it.</p><p>More powerful models also require vast quantities of storage and compute power, especially in the case of deep learning. More data also means more infrastructure is needed for storage, processing and (secure) transmission of this data. As a result, there has been an increased use of cloud computing for machine learning applications.</p><p>The rise of cloud computing, alongside ever improving computer hardware, addresses the problem of high resource demand for machine learning. It allows companies to outsource the intensive computation associated with ML on a ‘pay for what you need’ basis to third-party servers, saving them the capital cost of buying new hardware. With the growing popularity of machine learning, increasing complexities of deep learning models, and ever increasing quantities of data involved in both training and inference, this trend of machine learning moving to the cloud is unlikely to slow down. Various large cloud providers now offer Machine Learning as a Service (MLaaS), which helps machine learning teams worldwide with tasks such as data preprocessing, model training, tuning and evaluation, or simply Inference as a Service (IaaS). Providers include Microsoft Azure, AWS, Google Cloud Platform and IBM Watson.</p><p>A movement to the Cloud comes with privacy and security concerns. 
For many applications of ML, the data which serves as the foundation for training or making predictions may be sensitive information that the owners would like to keep confidential. An example would be predictive diagnostics in healthcare. If the machine learning model is hosted on the Cloud, then this would involve the hospital sending sensitive health data to a third party Cloud provider. Another example would be predictive analysis of credit score, which would involve sending large quantities of sensitive financial data to a third party provider.</p><p>The involvement of a third party increases the risk of data leaks considerably. Even though data is encrypted during transmission, it must be decrypted once it reaches the third party provider where the ML model is hosted, in order for it to be used. This creates a window for the sensitive data to become compromised, whether through exfiltration by malicious insiders or external hacking attacks. In 2019, there were over 5,000 confirmed data breaches, with billions of personal files leaked. When it comes to Cloud providers given access to large quantities of potentially sensitive data, such risks are not negligible.</p><p>There are of course techniques for preprocessing data to obfuscate the confidential parts of it, but these cannot be solely relied upon. Too much restriction, and the data loses much of its utility. Too little, and there is a significant risk of de-anonymisation.</p><h3>Privacy enhancement and FHE</h3><p>Fortunately, there are promising privacy enhancing technologies (PETs) in development that offer the prospect of being able to enjoy the benefits of machine learning in the cloud without introducing the risks that lead to privacy concerns.</p><p>The biggest game changer in this category is FHE — Fully Homomorphic Encryption. 
The concept of homomorphic encryption is straightforward to understand: you can perform whatever calculation you want directly on the encrypted data, without having to decrypt it first.</p><p>When you do finally decrypt it (usually when it’s back in safe hands, such as with the original data owner), the output is the same as if you had performed the calculation on the underlying cleartext.</p><p>As simple as it sounds, this makes FHE the Holy Grail of cryptography: currently we can securely encrypt data when in transit or storage, but not when data is being processed. Much of the vulnerability of data in the cloud stems from this limitation, and FHE addresses the gap.</p><p>Once confidential data is sent to the cloud, it remains encrypted for its entire journey, and any computation can be performed on the ciphertext. After the desired result (still encrypted) is sent back to the owner of the data, they can use the secret key, which only they have access to, to decrypt it and obtain the result.</p><p>The idea of homomorphic encryption has been around for decades, since the first work following the introduction of RSA (which is itself <em>partially</em> homomorphic) in the late seventies.</p><p>However, though there had been partially homomorphic and somewhat homomorphic encryption schemes through the years, it wasn’t until 2009 that Craig Gentry published the first ever fully homomorphic encryption scheme. Since then, considerable progress has been made by researchers on newer generations of FHE schemes, but technical challenges have kept FHE largely confined to research papers.</p><p>By far the biggest of these barriers is speed; current FHE schemes are impractically slow when executed on conventional computers — up to 1,000,000 times slower than the equivalent operation performed in plaintext. 
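</p><p>The partial homomorphism of textbook RSA mentioned above is easy to demonstrate. The sketch below uses deliberately tiny, insecure parameters, purely to show that multiplying two ciphertexts yields a ciphertext of the product:</p>

```python
# Textbook RSA is multiplicatively homomorphic:
# Enc(a) * Enc(b) mod n decrypts to (a * b) mod n.
# Toy parameters only; real RSA uses padding and ~2048-bit moduli.

p, q = 61, 53            # small primes, for illustration
n = p * q                # public modulus
phi = (p - 1) * (q - 1)
e = 17                   # public exponent
d = pow(e, -1, phi)      # private exponent (modular inverse, Python 3.8+)

def enc(m):
    return pow(m, e, n)

def dec(c):
    return pow(c, d, n)

a, b = 7, 11
product_ct = (enc(a) * enc(b)) % n     # operate on the ciphertexts only
print(dec(product_ct))                 # 77, i.e. a * b
```

<p>Fully homomorphic schemes extend this idea to <em>arbitrary</em> computation, with additions and multiplications composed without limit, which is where the heavy computational cost comes in.</p><p>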
Despite these challenges, significant progress is being made in bringing the advantages of FHE closer to mass adoption than ever.</p><p>Another key appeal of current FHE schemes is that they are quantum-safe: their mathematical basis, the Learning with Errors problem, is secure even against known attacks by quantum computers, to which most contemporary encryption techniques are vulnerable.</p><p>Optalysys are developing specialist <em>hardware</em> for tackling the speed issues of FHE, combining the capabilities of optical computing and silicon photonics with electronics designed around FHE workloads. Efficient hardware is itself a holy grail for FHE, as conventional computing is simply never going to be well suited to the kind of calculations it requires.</p><p>On the developer side, there are a number of increasingly advanced open source libraries that support FHE computation, including those developed by big players in the tech world: <a href="https://github.com/microsoft/SEAL">SEAL</a> by Microsoft, and <a href="https://github.com/homenc/HElib">HElib</a> developed by IBM, as well as the <a href="https://github.com/google/fully-homomorphic-encryption">Google FHE transpiler</a>. We are also particularly interested in the <a href="https://www.openfhe.org/">OpenFHE project</a>, spearheaded by Duality, which aims to unify most of the current FHE schemes and features, as well as to develop new ones such as scheme-switching. The library we use in this article is the <a href="https://www.zama.ai/concrete-framework">Concrete</a> library, developed by the French company Zama.</p><p>The key mission of Concrete is to allow developers to easily convert programs to run with FHE. 
The specific FHE scheme on which Concrete operates is called <a href="https://eprint.iacr.org/2018/421.pdf">TFHE</a>; whenever we talk about the properties of FHE in the context of this article, we are specifically describing the properties of TFHE.</p><p>The core TFHE library is written in Rust, but Zama also provides two Python APIs: <a href="https://github.com/zama-ai/concrete-numpy">Concrete Numpy</a>, which converts regular Numpy operations into equivalent FHE circuits; and <a href="https://www.zama.ai/concrete-ml">Concrete ML</a>, an API designed for machine learning developers that can convert entire ML models directly into FHE.</p><p>In the rest of the article we will focus on the framework Concrete ML, and how it offers a glimpse into a future where data can be passed to machine learning models without ever compromising privacy.</p><p>Despite still being in the alpha stage of development, Concrete ML has already proved remarkably easy to use. It allows a user to convert entire machine learning models written in the popular ML frameworks of Scikit-learn and PyTorch directly into their FHE equivalent, sometimes with only a few extra lines of code.</p><p>Machine learning is often split into the categories of supervised, unsupervised and reinforcement learning. It is worth noting that Concrete ML currently (version 0.2, but version 0.3 has since been released) only supports supervised learning, where the model is trained using labelled data.</p><p>Here, we put together three simple machine learning models that tackle different problems reflecting real-world applications for FHE. All three of the problems fall under the category of classification problems, including both binary and multiclass classification. 
We will present one example from each of the below categories:</p><ul><li>Linear models: Logistic Regression</li><li>Tree-based models: XGBoost</li><li>Deep-learning: Fully Connected Neural Network (MLP)</li></ul><h3>Example 1: Logistic Regression — Sklearn Breast Cancer Data set</h3><p>This is a very simple dataset that can be readily imported from the Scikit-Learn library, used for testing ML algorithms on binary classification. While this is a very simple model on a small dataset, it links closely to an area where we see great potential for FHE deployment: diagnostic analytics.</p><p>Machine learning models have become increasingly adept at predicting the likelihood of a patient having particular diseases, especially with the ever increasing wealth of medical data available. However, medical data also tend to be confidential in nature, and there are increasing privacy concerns as patient data becomes more and more digitised.</p><p>Below we demonstrate how FHE can be used to encrypt data when passed through a machine learning model for inference. For the sake of brevity, we’ll only focus on certain operations; an associated Jupyter notebook can be found <a href="#235f">here</a>. We’ll start with our pre-processing of the Scikit-Learn Breast Cancer dataset.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/da61b4c281316e650966c4d195b0d837/href">https://medium.com/media/da61b4c281316e650966c4d195b0d837/href</a></iframe><p>Data pre-processing is perhaps the most important and time-consuming part of building a machine learning workflow. The breast cancer dataset has 30 explanatory variables, many of which make negligible contributions to model accuracy. 
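</p><p>The actual preprocessing code lives in the notebook embedded above; as a rough stand-in (our own sketch, not the notebook’s code), loading, splitting and scaling the dataset might look like this:</p>

```python
# Rough stand-in for the notebook's loading step (the actual code is in
# the embedded notebook): fetch the dataset, split it, and scale it.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)   # 569 samples, 30 features

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# Standardising features keeps input ranges bounded, which also makes
# later quantisation for FHE better behaved.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)
```

<p>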
Therefore, using the forward feature selection tool from the library Mlxtend, we identified lists of the most helpful features, from 1 to 20 in length.</p><p>The negative logarithmic loss scores are as follows, worked out using Sklearn’s regular logistic regression function:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/bb59030681ee9d56c2bae146d70c98cf/href">https://medium.com/media/bb59030681ee9d56c2bae146d70c98cf/href</a></iframe><p>We can see that once we have identified 10 of the best features, adding additional features gives only very marginal gains in model accuracy, and hence we proceed with 10 explanatory variables for the FHE model.</p><p>Below, we use these explanatory variables to construct three versions of a logistic regression. The first is a “typical” unencrypted example. The second and third are, respectively, a version in which the data is <em>quantised</em> to the same precision as can be achieved using TFHE (but not encrypted), and a version which executes the logistic regression using Concrete ML.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/330b0b8b336dc0ecd3dfcaca648b7f8b/href">https://medium.com/media/330b0b8b336dc0ecd3dfcaca648b7f8b/href</a></iframe><p>The outcomes are as follows:</p><ul><li>Regular Sklearn model accuracy: 94.7368%</li><li>Clear quantised model accuracy: 91.2281%</li><li>Homomorphic model accuracy: 91.2281%</li></ul><p>First, we will briefly talk about what is meant by ‘Clear quantised model’. One constraint of modern FHE methods, including TFHE — which the Concrete framework is based upon — is that the internal calculations are constrained to operations over specific numerical representations. This includes integers (BGV, BFV) and fixed-point values (CKKS). 
This may sound like a problem; being able to use efficient number representations to represent the parameters and inputs for a given model is something we take for granted in unencrypted computing.</p><p>The examples shown here demonstrate that this is not necessarily the case — we can constrain a continuous, or otherwise very large set of numbers to a much smaller, discrete set by <em>quantising</em> it using a certain number of bits. This is an <em>encoding</em> step that maps numbers represented in one way into an equivalent representation that we can work on under a given FHE scheme.</p><p>At the time of writing, TFHE libraries (including the Concrete framework) are limited to quantising values using 8 bits. This limit applies to input and output values, and also to any accumulators that hold intermediary results during the process. The higher the quantisation bit width, the better the precision, and also the more expensive the calculations.</p><p>In our example, we specified that 5 bits are to be used for discretising the inputs, and only 2 (4 different values!) used for the weights within the model. The ‘Clear quantised model accuracy’ line in the above is a way of testing the accuracy of a model under the same parameter quantisation as in an actual FHE process, but without encrypting the values — in other words, it is a plaintext simulator for testing how quantisation affects model accuracy. As we can see in the above, the discretisation only lowered the test accuracy by 3.5%.</p><p>The ‘Homomorphic model accuracy’ is obtained from inference on test data actually encrypted using FHE. The accuracy is identical to that of the clear quantised model.</p><p>The overall drop in accuracy of the model run in FHE is still minimal when compared to the regular Sklearn logistic regression. 
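</p><p>To make the quantisation step concrete, here is a simplified uniform quantiser (our own toy illustration of the encoding described above, not Concrete’s actual quantiser) showing how precision depends on bit width:</p>

```python
# Simplified uniform quantisation: map floats onto 2**n_bits integer
# levels and back again. A toy version of the encoding step described
# above, not Concrete's actual quantiser.

def quantise(values, n_bits):
    lo, hi = min(values), max(values)
    levels = 2 ** n_bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    return [round((v - lo) / scale) for v in values], lo, scale

def dequantise(q, lo, scale):
    return [lo + qi * scale for qi in q]

vals = [0.0, 0.1, 0.25, 0.5, 0.9, 1.0]
for bits in (2, 5, 8):
    q, lo, scale = quantise(vals, bits)
    err = max(abs(a - b) for a, b in zip(vals, dequantise(q, lo, scale)))
    print(f"{bits} bits: max error {err:.4f}")   # error shrinks with bit width
```

<p>The worst-case rounding error is half a quantisation step, so every extra bit roughly halves it; this is why the 5-bit inputs used here lose so little accuracy relative to the clear model.</p><p>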
This error would be reduced even further if a greater number of bits could be allocated for quantisation, and an increase in bit-width is an expected future development in Concrete.</p><p>However, despite the constraints, the above demonstrates the contemporary practicality of FHE: while staying well within an 8-bit limit, we are still able to achieve model accuracies comparable to operations on the plaintext.</p><h3>Example 2: XGBoost model for predicting corporate credit rating</h3><p>Now we tackle a multiclass classification problem using the XGBoost model implemented by Concrete ML, based on the Sklearn-compatible XGBoost implementation. This reflects another area where FHE sees great potential for deployment — Finance.</p><p>Financial data, whether it belongs to an organisation or an individual, also tends to be extremely confidential in nature. Finance is also a prime sector for the deployment of machine learning technology: hedge funds and trading firms may use market data to predict asset price trends; banks may use data collected on their customers to predict their likelihood of defaulting on a loan; and fintech companies may use it to flag up fraudulent payments. It is easy to imagine the revolutionary benefits FHE could bring to finance by eliminating a huge portion of the risk and effort involved in maintaining data security and privacy.</p><p>This time, we’ll be shadowing the work of <a href="https://medium.com/@polanitzer/multiclass-classification-for-corporate-credit-ratings-using-credit-risk-analytics-book-by-harald-98d5940c745a">Roi Polanitzer in using multi-class classification for corporate credit ratings</a>. 
The objective of the original work was to evaluate models that predict the likelihood of a company defaulting on a loan.</p><p>Here, we’ll be applying an XGBoost algorithm to the task of predicting defaults based on common financial metrics such as cost-to-income ratio.</p><p>Of course, in the real world this information is also highly sensitive; if this dataset were leaked, it could provide competing companies with insight into each other’s financial conditions. It would therefore need to be handled with the utmost care by the entity handling the loan, a requirement that is greatly eased by the ability to perform this computation over encrypted data.</p><p>We’ll be using the same dataset of corporate credit ratings as in the original article, which has kindly been provided <a href="https://github.com/Polanitz/Multiclass-Classification-for-Corporate-Credit-Ratings/blob/main/ratings.csv">here</a>. As before, our associated Jupyter notebook for the full process can be found <a href="#d119">here</a>. Again, we start with the necessary pre-analysis to determine the properties of the dataset.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/4e792a5822345436e0fbcc84dff328b0/href">https://medium.com/media/4e792a5822345436e0fbcc84dff328b0/href</a></iframe><p>Taking a peek at the layout of our data:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/d6fa2ee4c16c7022ae654586deb84475/href">https://medium.com/media/d6fa2ee4c16c7022ae654586deb84475/href</a></iframe><p>This verifies that there are 500 rows in the dataset for each of the 10 rating categories. We continue with data preprocessing and perform a grid search, tuning the hyperparameters of the Sklearn model to optimise performance. We perform the same procedure on both the regular Sklearn XGBoost and Concrete ML’s homomorphic equivalent. 
Here, we are using 4 bits for quantisation for the FHE model.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/bb1ca31f8f799a0b53fb7aec7397838b/href">https://medium.com/media/bb1ca31f8f799a0b53fb7aec7397838b/href</a></iframe><p>Using 4 bits for quantisation, we achieved the following:</p><ul><li>Accuracy score of clear model: 0.58</li><li>Accuracy score of clear quantised model: 0.46</li><li>Accuracy score of FHE model: 0.46</li></ul><p>When the number of bits for quantisation is increased to 8, we get the following:</p><ul><li>Accuracy score of clear model: 0.5</li><li>Accuracy score of clear quantised model: 0.5</li><li>Accuracy score of FHE model: 0.5</li></ul><p>While 50–60% may not sound great (the <a href="https://medium.com/@polanitzer/multiclass-classification-for-corporate-credit-ratings-using-credit-risk-analytics-book-by-harald-98d5940c745a">original article</a> saw around 80% accuracy for gradient boosting techniques), for a balanced dataset of 10 classes this <em>is</em> sufficiently accurate for our purposes in examining the performance of an FHE-compatible XGBoost model built in Concrete.</p><p>We first note that the accuracy obtained from the clear model is different in both cases. Indeed, under the larger quantisation, the accuracy of the model <em>drops.</em></p><p>The source of this behaviour comes from the use of a smaller testing set (only 50 instances are chosen from the test set of 1667) — this is done so that the FHE model can run in reasonable time on conventional hardware. The regular XGBoost model is tested using the same small random sample to ensure fair comparison.</p><p>Using a larger testing sample would improve this accuracy estimate. Given the agreement between the clear model and the FHE model and the source of the accuracy issues, we would expect to achieve something similar to the original work in both cases. 
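The sampling noise from a 50-instance test set is large enough to account for the gap between the two clear-model scores. A quick back-of-the-envelope check (a sketch using the standard binomial approximation for the uncertainty of an accuracy estimate; the numbers plugged in are taken from the results above):

```python
import math

def accuracy_stderr(p, n):
    """Standard error of an accuracy estimate from n test samples
    (binomial approximation)."""
    return math.sqrt(p * (1 - p) / n)

n = 50    # test instances actually used
p = 0.5   # measured accuracy

se = accuracy_stderr(p, n)
print(f"resolution: {1 / n:.0%}")    # each correct answer moves accuracy by 2%
print(f"one-sigma noise: {se:.1%}")  # roughly 7 percentage points
```

With a one-sigma uncertainty of around 7 percentage points, a 58% and a 50% measurement on different 50-sample draws are statistically consistent with the same underlying model accuracy.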
However, this would also require considerably faster processing if we wanted it in a practical time-frame. This is of course one of the many motivations for the work that Optalysys are doing.</p><p>As in the previous section, a specific result that is especially worth noting is the effect of quantisation bits on accuracy for the FHE model. With 4 bits, the accuracy of the FHE model dropped from 58% to 46% compared to the unquantised model — a fall of 12 percentage points.</p><p>When we used 8 bits for quantisation, the FHE accuracy was exactly the same as the regular plaintext version (with a resolution of 2%, of course, since only 50 data points are used in the test sample). Once again, there is no noticeable decrease between the quantised plaintext accuracy and that of actual FHE, a reassuring sign.</p><h3>Example 3: Simple Multilayer Perceptron for Wine classification</h3><p>In both the above examples, the FHE implementations of machine learning algorithms were based on the Scikit-learn API, which makes it easy for data scientists or machine learning engineers who are familiar with the framework to get started with Concrete ML.</p><p>Concrete ML also has a PyTorch API that enables models defined using PyTorch to be compiled into FHE circuits that can run inference on homomorphically encrypted data. Because PyTorch is a deep learning framework, this API makes it easy to build well-defined neural networks that also support FHE. Here we have a simple example that uses the Wine Classification dataset that is <a href="https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_wine.html">readily available</a> from Scikit-learn. 
It contains 13 attributes on the levels of different chemicals’ presence in wine, and the target comprises 3 classes corresponding to 3 different wine cultivars in Italy, where it was collected.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/dc4375da93a75c545dbab01b9e19f503/href">https://medium.com/media/dc4375da93a75c545dbab01b9e19f503/href</a></iframe><p>The above code includes the definition and training of the simple PyTorch neural network.</p><p>Currently, FHE-compatible neural networks can only be trained in plaintext. This is because of the huge amount of computation needed. In this case, because the network is quite small and the dataset is fairly straightforward, fewer than 300 iterations, each involving a batch size of 50, are needed before 100% accuracy is achieved on further training batches.</p><p>Modest neural networks used on more realistic datasets often require dozens of epochs with thousands of iterations per epoch before being trained properly. With the speed limitations of current FHE implementations and hardware, training directly in FHE would be unrealistic without Optalysys hardware, although there are certainly privacy use-cases where this would be desirable.</p><p>However, the spotlight for homomorphically encrypted machine learning is on encrypted inference — as real machine learning models can often either be trained using publicly available, or otherwise non-confidential, data, or make use of differential privacy and federated learning techniques to hide the training data.</p><p>Below, we compile the neural network into an FHE circuit, and compare its FHE test accuracy to its plaintext counterparts. 
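Before looking at the compiled results, the plaintext training step described above can be sketched in a few lines. This is a minimal NumPy softmax-regression loop on synthetic 3-class data with 13 features, standing in for the small PyTorch network and the Wine dataset used in the actual notebook:

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic, well-separated 3-class data: 13 features, one cluster per class
n_per_class, n_features, n_classes = 100, 13, 3
centres = rng.normal(scale=3.0, size=(n_classes, n_features))
X = np.vstack([centres[c] + rng.normal(size=(n_per_class, n_features))
               for c in range(n_classes)])
y = np.repeat(np.arange(n_classes), n_per_class)
X = (X - X.mean(axis=0)) / X.std(axis=0)   # standardise features

W = np.zeros((n_features, n_classes))
b = np.zeros(n_classes)
onehot = np.eye(n_classes)[y]
for _ in range(300):                        # ~300 iterations, as in the article
    logits = X @ W + b
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    grad = X.T @ (probs - onehot) / len(X)        # cross-entropy gradient
    W -= 0.05 * grad
    b -= 0.05 * (probs - onehot).mean(axis=0)

acc = ((X @ W + b).argmax(axis=1) == y).mean()
print(f"training accuracy: {acc:.0%}")
```

In the Concrete ML workflow, a network trained in plaintext like this is then quantised and compiled, so only the (cheap) training happens in the clear while inference can run on encrypted inputs.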
3 bits are used for quantisation.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/c6ab7f135b9dcff2bf3197b342949bdd/href">https://medium.com/media/c6ab7f135b9dcff2bf3197b342949bdd/href</a></iframe><p>The resulting accuracies are as follows:</p><ul><li>Test Accuracy: 97.78%</li><li>Test Accuracy Quantized Inference: 93.33%</li><li>Test Accuracy Homomorphic Inference: 93.33%</li></ul><p>Despite being a small network, it is sufficient to achieve a very good accuracy of nearly 98% on the test set. The reason why only 3 bits are used for quantisation is that the 8-bit limit applies also to the intermediary accumulators within the model, and the size of those intermediary values often greatly exceeds that of the inputs. Therefore it is wise to choose a modest bit width, which in this case also achieved excellent results: the accuracy drop from regular inference to TFHE was about 4.5%, which still leaves the TFHE inference accuracy at above 93%; there is also no accuracy drop between the quantised plaintext inference result and that of the TFHE inference — showing that the large FHE ciphertexts perfectly preserved the integrity of the plaintext.</p><h3>Bonus section — how accuracy and maximum accumulator bit width requirement scale with quantisation bit width of inputs</h3><p>In the PyTorch example, we mentioned how the size of the intermediary values stored in accumulators while the neural network is running inference often greatly exceeds that of the inputs. Here we will briefly look at the implications of this on TFHE, and the prospect of running larger deep learning models in TFHE in the future.</p><p>We are going to simulate a larger neural network for image classification, trained on the Fashion MNIST dataset, running in TFHE. Each image contains 784 pixels, making the input size 784; the model is only capable of running inference in TFHE if 2 bits are used for quantising the inputs. 
This leads to an expectedly poor accuracy of only 14%, down from a plaintext accuracy of nearly 90%.</p><p>Any larger bit width results in error, as some of the intermediary results end up requiring more than 8 bits to represent — exceeding TFHE’s current limit.</p><p>However, using the Virtual Lib tool from Concrete ML, we can simulate running the TFHE circuit in plaintext, with quantisation bit widths that exceed the current limitations, and observe how the accuracy, as well as the maximum accumulator size required for storing intermediary values, scales with the quantisation bit width.</p><p>This is important, as the future development pathway for TFHE incorporates higher bit precision. This tool thus aids understanding of the applications where TFHE will be useful in the future.</p><p>Below is the code and the results obtained; the associated notebook can be found <a href="#cafa">here</a>.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/a2fd2a73a51db47967e392a0bfd25e9e/href">https://medium.com/media/a2fd2a73a51db47967e392a0bfd25e9e/href</a></iframe><p>The result obtained is shown below:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/731/1*-ylatumEitUi7Xp50B7tdQ.png" /></figure><p>The plaintext model has an accuracy of 86.4% in this case. As the quantisation bit width (shown on the x-axis) increases, the simulated TFHE model accuracy on the test set (y-axis) increases in a logarithmic growth pattern. We see that from 7 to 8 bits upwards, the quantised model accuracy approaches the asymptote that is the actual model accuracy — which uses 32-bit floating-point representations. 
This shows that excessive numerical precision is often not necessary for image recognition, and demonstrates the suitability of TFHE for similar deep learning applications.</p><p>The small numbers annotating each datapoint are the maximum accumulator bit widths — in other words, the bit width limit that TFHE must support for us to be able to run the model at that precision. We see that they greatly exceed the quantisation bit widths. It is not hard to see why: in this case we have 784 inputs, and therefore the neurons in the hidden layer would need to represent the sum of 784 values. The current limitation of 8 bits is too low, but this limit is bound to increase in the future, especially as better hardware for FHE becomes available.</p><h3>FHE — limitations and an optical solution for FHE</h3><p>The above examples illustrate that, in principle, machine learning can be combined with FHE with astounding ease. FHE has spent most of its lifetime confined to esoteric research papers written by highly specialised experts; recent open-source FHE libraries such as Concrete are testament to this technology becoming ever more accessible to the masses. Using simple APIs as demonstrated in this article, data scientists and machine learning engineers with no prior knowledge of cryptography can very easily transpile statistical analysis/machine learning models into FHE-compatible equivalents. We also bear in mind that Concrete ML is still in its alpha stages of development.</p><p>Of course, the current limitations of TFHE are also evident. The current 8-bit limitation of the Concrete library means that homomorphically encrypted machine learning is currently restricted to simple models on relatively easy datasets. 
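The accumulator growth described in the bonus section can be estimated with a simple worst-case bound (a sketch, ignoring sign-bit subtleties; Concrete ML computes the true maximum from the actual trained weights, which is why the annotated values in the plot are smaller than this bound):

```python
import math

def worst_case_accumulator_bits(n_inputs, input_bits, weight_bits):
    """Upper bound on the bit width needed to hold a dot product of
    n_inputs terms, each the product of an input_bits value and a
    weight_bits value."""
    product_bits = input_bits + weight_bits       # bits for each term
    carry_bits = math.ceil(math.log2(n_inputs))   # extra bits from summing n terms
    return product_bits + carry_bits

# First hidden layer of the Fashion MNIST example: 784 inputs
for b in (2, 3, 4):
    print(b, worst_case_accumulator_bits(784, b, b))
```

Even at only 2 bits per input and weight, the bound is 14 bits for a 784-input layer — well beyond the current 8-bit limit, which is why only very small quantisation widths compile today.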
The loss of accuracy due to quantisation — from having to abide by this limit — is also a concern for slightly larger models.</p><p>However, in practice such ‘simple’ models are used in many, if not most, practical applications of Machine Learning. While very large models like GPT-3, Dall-E, and Stable Diffusion grab headlines by demonstrating features which were until recently firmly in the realm of science fiction, Machine Learning is also extensively used in Health Sciences, Finance, and Engineering to solve smaller, but high-value, problems for which a logistic regression, tree-based approach, or a very small neural network, is the perfect tool. (This is especially the case when the amount of available training data is small.) As we demonstrate above, FHE already provides results which are close to, or identical to, those of non-encrypted models in these cases.</p><p>In order to look at the future potential of FHE, we must look at the root cause of its remaining limitations — the fact that conventional computing hardware is simply not optimised for the specialised type of computation necessary to run FHE. This is due to the nature of the ciphertexts used in most modern FHE schemes, where values are represented by huge vectors containing the coefficients of large polynomials. Multiplying two such ciphertexts together — a very common operation — involves an unreasonably large number of coefficient multiplications, which would be very slow on most modern computing hardware.</p><p>As a result, modern FHE libraries use Fourier or Number-Theoretic Transforms to convert these ciphertexts into a form that is easier to multiply — but this is still extremely slow using most conventional electronics. Hence the main drawback of current FHE technology — speed; the same calculations performed on fully homomorphically encrypted data take up to a million times longer compared to plaintext. 
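The transform trick mentioned above can be sketched in a few lines. This uses NumPy’s FFT for ordinary linear convolution of polynomial coefficients; real TFHE implementations work with negacyclic convolution over specific rings, which this simplified sketch does not reproduce:

```python
import numpy as np

def poly_mul_naive(a, b):
    """Schoolbook polynomial multiplication: O(n^2) coefficient products."""
    out = np.zeros(len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        out[i:i + len(b)] += ai * b
    return out

def poly_mul_fft(a, b):
    """Multiply via FFT: pointwise products in the frequency domain, O(n log n)."""
    n = len(a) + len(b) - 1
    return np.fft.irfft(np.fft.rfft(a, n) * np.fft.rfft(b, n), n)

rng = np.random.default_rng(1)
a = rng.integers(0, 16, size=1024).astype(float)
b = rng.integers(0, 16, size=1024).astype(float)
assert np.allclose(poly_mul_naive(a, b), poly_mul_fft(a, b))
```

For degree-1024 polynomials the schoolbook method performs over a million coefficient multiplications per product, while the FFT route needs only a few transforms and one pointwise multiply — this is exactly the workload that optical Fourier-transform hardware targets.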
This also explains the limitations on precision for Concrete ML: any more bits, and the calculations would become unacceptably slow.</p><p>In order to level up the scale at which FHE can be deployed, we need better hardware specialised for FHE, and this is where companies like Optalysys come in.</p><p>Optalysys has the key to unlocking the power of FHE, by leveraging physical phenomena through optical computing. Existing optical modulators are capable of encoding data into light, and operate in an ultra-fast timeframe greatly exceeding that of conventional integrated circuits. Under the right conditions, light will instantaneously perform a Fourier Transform on the data encoded into it by undergoing the natural phenomenon of diffraction. By constructing optical circuits on silicon, and integrating with conventional integrated circuits specifically designed to work natively with FHE ciphertexts, we can make use of this remarkable natural phenomenon to compute the vast numbers of Fourier Transforms and related operations (such as Number-Theoretic Transforms) required for FHE operations at a rate that conventional accelerators such as FPGAs can’t hope to match.</p><p>As mentioned previously, the technology of FHE is considered to be the Holy Grail of cryptography, for the revolutionary impacts it would yield for cybersecurity, effectively eliminating the risk posed by data theft. 
In a world where rapid advances in fields such as cloud computing and AI often seem to be at odds with privacy and confidentiality, FHE holds the potential to reconcile this conflict — enabling businesses and consumers to take advantage of the benefits of technologies such as machine learning, whilst ensuring their data remains more secure than ever.</p><h3>Appendix: Python Notebooks</h3><h4>Breast Cancer Dataset</h4><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/7a559e4d235b8f1f23cd091edf28ad1b/href">https://medium.com/media/7a559e4d235b8f1f23cd091edf28ad1b/href</a></iframe><h4>XGBoost for Loan Analytics</h4><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/2e288a46202bf1f88410b1c63784c7fb/href">https://medium.com/media/2e288a46202bf1f88410b1c63784c7fb/href</a></iframe><h4>Wine classification perceptron</h4><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/e8c18437c9cd7900fa1c67184dd48adb/href">https://medium.com/media/e8c18437c9cd7900fa1c67184dd48adb/href</a></iframe><h4>Encrypted fashion MNIST</h4><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/a4e6b8abe8c91cc3e398b61c6c7fa1c3/href">https://medium.com/media/a4e6b8abe8c91cc3e398b61c6c7fa1c3/href</a></iframe><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=88d70664a6cb" width="1" height="1" alt=""><hr><p><a href="https://medium.com/optalysys/fhe-and-machine-learning-a-student-perspective-with-examples-88d70664a6cb">FHE and machine learning: A student perspective with examples</a> was originally published in <a href="https://medium.com/optalysys">Optalysys</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[FHE, data sharing and confidentiality]]></title>
            <link>https://medium.com/optalysys/fhe-data-sharing-and-confidentiality-2e47e29c59c?source=rss-41961c3986e0------2</link>
            <guid isPermaLink="false">https://medium.com/p/2e47e29c59c</guid>
            <category><![CDATA[fhe]]></category>
            <category><![CDATA[confidential-computing]]></category>
            <category><![CDATA[optical-computing]]></category>
            <category><![CDATA[data-security]]></category>
            <dc:creator><![CDATA[Optalysys]]></dc:creator>
            <pubDate>Mon, 14 Nov 2022 16:36:57 GMT</pubDate>
            <atom:updated>2022-11-14T16:36:57.076Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*BUPem-zb0GwHyOllkGYCbw.png" /></figure><p>With <a href="https://medium.com/u/9326ad4c51d1">Joseph Wilson</a> and <a href="https://medium.com/u/f0fec005a59a">Florent Michel</a></p><p>The value of data is well recognised, but what does it look like to actually realise this value? Data is often bought and sold in such a way that you can put a nominal price tag on it. However, the value of data is only truly revealed when you can use it to make good decisions.</p><p>Whether you’re figuring out who to advertise to, deciding on capital allocation for business projects or attempting to find correlations between lifestyle factors and illness, the unifying aspect of data is that you need to know something about the problem that you’re targeting. This might be knowledge about the preferences of individuals, the performance of competitors, or how active certain people are.</p><p>Yet the data that is most useful to you is often in the hands of others. Data about your interests is frequently scattered across a fractured landscape of different services, organisations and authorities. This is why collaboration and data sharing between businesses is becoming an increasingly important part of the modern world.</p><p>At the same time, the most valuable data is often the data that can have serious implications in the wrong hands. At the far end of the spectrum of consequences, a failure to manage sensitive data could ruin lives or bring down companies. It could expose the most personal information imaginable to a world of bad actors. 
That’s why this data is almost always <em>confidential</em>; it needs to be kept out of the wrong hands and shared only with organisations and people that you trust.</p><p>However, when companies collaborate on data, there’s a tension between the benefits of collaboration and the risks to confidentiality.</p><p>Data has value because it <em>relates</em> to something. Remove that relationship in the interests of protection, and you remove the value. Preserve that relationship, and the risks associated with exposure run higher.</p><p>But what if there was a way of resolving this tension? What kind of a world would we see, and what would this mean for businesses and organisations?</p><h3>What happens when data is valuable?</h3><p>As an example of just how powerful the ability to collaborate and share data is, consider the COVID 19 pandemic.</p><p>An oft-commented aspect of the vaccine development process is that it typically takes many years, yet the first vaccines were developed with incredible speed.</p><p>This speed was in part a consequence of an understandable acceleration in the regulatory processes that govern clinical trials. This in itself is no surprise; faced with collapsing economic output and social unrest, the severity of the pandemic made finding an answer a top priority for governments around the world.</p><p>However, the vaccine development push was also characterised by a comparatively unheard-of degree of cooperation between companies and other organisations. 
Writing on the subject in the journal <em>Vaccine</em>, <a href="https://www.sciencedirect.com/science/article/pii/S0264410X21011634?via%3Dihub">Druedahl et al</a> identified 93 vaccine development partnerships (collaborations featuring 2 or more distinct organisations) of which 35 explicitly engaged in the sharing of knowledge and expertise.</p><p>This includes the partnership between BioNTech, Fosun Pharma and Pfizer that was responsible for what is by far the most widely produced and distributed vaccine.</p><p>What makes the Pfizer (and related Moderna) vaccines particularly interesting is the use of a novel approach to vaccine technology. In the case of the Pfizer-BioNTech collaboration, BioNTech provided novel mRNA vaccine candidates while Pfizer provided the supporting infrastructure (including clinical trial, legal and manufacturing capabilities) that was essential to proving the capability of the vaccine and delivering the result at scale.</p><p>The development of the Pfizer vaccine is notable not only for the extent and speed of the collaboration, but also for the quantity of unique and valuable information that needed to be shared. This not only included the core biomedical understanding of how the vaccine functioned, but also the data proving that it worked.</p><p>However, in the earliest stages of the pandemic, BioNTech and Pfizer were even sharing information <a href="https://www.pfizer.com/news/press-release/press-release-detail/pfizer-and-biontech-announce-further-details-collaboration">without a formal agreement</a> in place.</p><p>This was in itself <em>quite unusual</em>. 
Companies often collaborate, but the process is typically lengthy and features extensive negotiations regarding aspects such as intellectual property, royalties, and of course the legal implications if confidentiality is violated.</p><p>Even then, only information that <em>must</em> be shared is exchanged.</p><h3>Confidentiality vs collaboration</h3><p>The reason for this is as much about the security of information itself as it is about trusting the other party. Digital systems and telecommunications networks have made the transfer and duplication of information a trivial task. This is mostly a good thing, but the ease with which data can be shared has always been understood to carry enhanced risks too.</p><p>The readiness with which digital information can be copied, coupled with the difficulty of protecting complex modern computing systems, makes important information comparatively easy to steal.</p><p>As a result, the world is well understood to be a hazardous place for valuable data. Quite aside from the ongoing churn of breaches and thefts committed by criminal gangs, reporting on the early stages of the war in Ukraine has been dotted with recurrent concerns about the potential for Russia to lash out with state-sponsored cyberattacks.</p><p>While this particular threat has <a href="https://www.wired.co.uk/article/ukraine-russia-digital-battle">yet to materialise</a> to the extent that was originally predicted, the global cybersecurity threat landscape is extremely broad, ranging from malicious individuals through to state actors with massive resources at their disposal.</p><p>And generally speaking, the more valuable the data, the more effort these threats will put into acquiring it. If the data you have is valuable enough to come to the attention of the biggest threats, maintaining the security of that data can be extremely hard. 
Early knowledge in the vaccine sector (especially of the outcome of the clinical trials) would have been <em>incredibly</em> valuable; <a href="https://www.pwc.com/us/en/services/consulting/cybersecurity-risk-regulatory/library/how-to-prevent-cyber-attacks-on-vaccine-development.html">PWC</a> estimates that up to $110 billion worth of investor money was dependent on the outcome.</p><p>And so it is that another notable aspect of the story of the COVID-19 vaccine is the <em>scale</em> of the espionage efforts that were launched against researchers, including the Pfizer-BioNTech-Fosun Pharma work. Accusations of state-backed hacking targeting vaccine data have been levelled against China, Russia and North Korea, albeit met with strenuous denials.</p><p>However, while large states have at their disposal considerable technical expertise, state sponsored hacking doesn’t necessarily entail sophistication. In the case of the COVID 19 research attacks, <a href="https://blogs.microsoft.com/on-the-issues/2020/11/13/health-care-cyberattacks-covid-19-paris-peace-forum/">Microsoft identified</a> brute-force login attempts and social engineering methods as major attack vectors.</p><p>These seemingly crude methods nevertheless highlight the particular problem with collaboration. <strong>Even if you trust who you’re collaborating with, the more organisations that have access to your commercially sensitive data, the greater the attack surface and the less you can do about it.</strong> All it takes is one weak link in the chain and confidential information can be lost.</p><figure><img alt="https://cdn-images-1.medium.com/max/800/1*YnuOd9HuFpujwA-83ZXVXg.png" src="https://cdn-images-1.medium.com/proxy/1*YnuOd9HuFpujwA-83ZXVXg.png" /><figcaption>Large companies can not only spend significantly on information security, they can directly enforce working policies and protocols to protect data.</figcaption></figure><p>This is the principle that these attacks were leveraging. 
Why put time and effort into more creative or risky techniques when you can simply roll the dice over and over again? You only need to be lucky once; if enough external organisations are admitted into the circle of trust, the ability to control data access becomes more and more limited, and the odds of success climb dramatically. The speed and scope of COVID 19 vaccine collaborations will have made ensuring confidentiality across the full chain of companies, suppliers and even individual academics especially difficult.</p><figure><img alt="https://cdn-images-1.medium.com/max/800/1*rz5WyrkHlGB0HhGzyjb6aQ.png" src="https://cdn-images-1.medium.com/proxy/1*rz5WyrkHlGB0HhGzyjb6aQ.png" /><figcaption>Sharing data outside of the security environment of an organisation relies on everyone having sufficient capability to protect that information. If they don’t, information can only be kept safe insofar as the least-capable participant is able to protect it.</figcaption></figure><h3>The value of data</h3><p>However, it needn’t be this way. Novel technologies offer us a new model of collaboration, one in which information can be shared and worked on while <em>simultaneously</em> shutting out existing attacks. Here’s why that’s revolutionary.</p><p>When it comes to medical research, the current balance is always one of confidentiality against utility. Even identifying potential participants for a clinical trial requires the ability to search over characteristics that can be intensely sensitive, such as age and ethnicity, or medical characteristics such as cardiac function and mental health.</p><p>This is some of the most sensitive data that can be held about a person, and on a global level the confidentiality of this data is often enforced by regulation, such as GDPR in the EU, or HIPAA and the HITECH act in the US.</p><p>Trust is therefore critical to the process, yet the landscape of scientific research (and who carries it out) poses its own challenges. 
While it could be inferred that <a href="https://www.gov.scot/publications/public-acceptability-data-sharing-between-public-private-third-sectors-research-purposes/pages/1/">most people</a> would support the sharing of data for social benefit, clinical development is often for commercial gain, and the entanglement between public, private and third-sector organisations poses difficult questions about who can access this data, how it is transferred and handled, and the validity of its use.</p><p>At the same time, the value of being able to safely gather, share and <em>use</em> this information is beyond question. Data sharing platforms such as <a href="https://vivli.org/">Vivli</a> allow data from medical trials (including the Pfizer-BioNTech COVID 19 studies) to be shared, increasing the amount of scientific knowledge that can be extracted from one or more datasets. Meanwhile, companies such as <a href="https://huma.com/">Huma</a> are leveraging mobile platforms to revolutionise the speed and efficiency with which clinical trials can be set up, run and evaluated.</p><h3>FHE as a confidentiality solution</h3><p>The seemingly intractable problem here lies in severing the link between sensitive data and the individuals who generated it. Existing methods of doing this tend to rely on forms of data anonymization to strip away information that might be useful in associating sensitive information with specific individuals. However, not only does this result in the loss of potentially useful data or accuracy (e.g. removing or aggregating aspects such as address, or adding statistical noise to the dataset), but anonymization itself is not <a href="https://imperialcollegelondon.app.box.com/s/lqqcugie51pllz26uixjvx0uio8qxgo5/file/493461282808">sufficient to guarantee that this process cannot be reversed</a>, especially in an age of widely available computing power.</p><p>Fully homomorphic encryption offers a solution by enabling <em>blind</em> computation. 
This allows the generation of a statistical analysis (be that relatively simple, or a more complex process such as executing a deep learning model) without ever needing to work on the dataset in the clear. Because the underlying data is <em>never</em> decrypted, the individual data-points that could be used to de-anonymize an individual can’t be seen at any point.</p><p>FHE also supports models of secure <em>multi-party</em> computing, allowing organisations to share information and gain insight into data without revealing individual elements of data that could be confidential.</p><p>Under this use model, companies can share data without ever exchanging information in the clear. Centralised data silos can now allow data to flow more freely, unlocking insights and ushering in a new era of information utility.</p><p>Indeed, FHE even has applications in <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8276784/">mobile device contact tracing</a> that overcome some of the <a href="https://eprint.iacr.org/2020/428.pdf">known problems</a> in existing techniques. Through blind computation over the proximity and time duration between devices, users can be notified of their exposure through a process that <em>never</em> reveals any identifying data.</p><h3>Confidentiality at scale</h3><p>But why stop here? Medical research is an exceptionally valuable application, but the capabilities that FHE gives us are not limited to this field.</p><p>What if <em>every</em> organisation could now work with a high level of data sharing <em>and still retain control</em> over critical information?</p><p>In short, what if the incredible rate of progress made in developing the COVID 19 vaccine could be the new normal across <em>every</em> industry, and not an exception?</p><p>This would require a world in which FHE is available to every enterprise, not just those dealing with the most sensitive information. 
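The multi-party idea described above can be illustrated with a toy sketch. This uses additive secret sharing — a far simpler primitive than FHE, and not an FHE scheme itself — but it shows the same principle: parties jointly compute an aggregate without any party seeing another’s input. The three “hospital counts” are invented for illustration:

```python
import secrets

P = 2**61 - 1  # a large prime modulus

def share(value, n_parties):
    """Split value into n random shares that sum to value mod P."""
    shares = [secrets.randbelow(P) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % P)
    return shares

# Three hospitals each hold a private patient count
counts = [1200, 830, 445]
all_shares = [share(c, 3) for c in counts]

# Party i receives one share of every input and adds them locally;
# each individual share is statistically independent of the input
partial_sums = [sum(s[i] for s in all_shares) % P for i in range(3)]

# Only the recombined total is ever revealed
total = sum(partial_sums) % P
print(total)  # -> 2475
```

No single share, and no single party’s partial sum, reveals anything about an individual hospital’s count; only the final total is learned. FHE generalises this idea from sums to arbitrary computation over encrypted values.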
Achieving this vision means realising a world in which FHE is deployable <em>at scale</em>.</p><p>It’s worth noting that the notion of <em>scale</em> invoked here is quite flexible. At the top end, a world truly transformed by FHE is one in which everything that needs to be kept secret online is encrypted as a matter of course. However, on an industry-by-industry basis, there is still a significant amount of commercial value to be realised by enabling even a comparatively limited amount of secret computing.</p><p>What <em>is</em> undeniable is that the ability to realise the bigger vision is no longer a purely academic question. In the 13 years since the first demonstration of a <em>fully</em> homomorphic encryption scheme, an astonishing amount of progress has taken place to make FHE faster, more efficient, and applicable to a greater range of problems. Work continues apace, with an emphasis now on making FHE more accessible to end-users and non-experts. The sole limiting factor to reaching the very top of the scale is now a matter of engineering, overcoming the problems of speed inherent to executing FHE on existing computing architectures.</p><p>At Optalysys, we are solving this last challenge. By tackling a key mathematical operation in FHE using our optical computing technology, we can deliver enormous acceleration while minimising the inefficiency of existing hardware in performing FHE tasks. However, while the core optical technology we have developed is essential to a solution, we also understand that it isn’t the only piece of the puzzle. That’s why we’ve engineered a <em>complete</em> answer to the technical needs of FHE, one that not only brings a new method of computing to bear on the problem, but can be deployed at the scale needed to make these opportunities a reality.</p><p>Engineering this solution is not a trivial task. 
It involves translating a wide range of FHE schemes intended for different use-cases into a hardware architecture that is flexible and powerful enough to support a range of different FHE workflows.</p><p>This is the purpose of the Enable product development that we described in our <a href="https://medium.com/optalysys/optalysys-what-weve-done-and-why-we-did-it-f01591374cb6">last article</a>.</p><p>Enable is designed from the ground up to be an FHE accelerator that can provide <em>universal</em> acceleration for every contemporary FHE scheme.</p><p>This single solution is intended to allow FHE to be deployed at the scale required in the cloud, and, alongside development of our core optical chip technology, it will be the focus of our efforts over the next 18 months.</p><hr><p><a href="https://medium.com/optalysys/fhe-data-sharing-and-confidentiality-2e47e29c59c">FHE, data sharing and confidentiality</a> was originally published in <a href="https://medium.com/optalysys">Optalysys</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Optalysys: What we’ve done (And why we did it)]]></title>
            <link>https://medium.com/optalysys/optalysys-what-weve-done-and-why-we-did-it-f01591374cb6?source=rss-41961c3986e0------2</link>
            <guid isPermaLink="false">https://medium.com/p/f01591374cb6</guid>
            <category><![CDATA[optical-computing]]></category>
            <category><![CDATA[cybersecurity]]></category>
            <category><![CDATA[homomorphic-encryption]]></category>
            <category><![CDATA[encryption]]></category>
            <dc:creator><![CDATA[Optalysys]]></dc:creator>
            <pubDate>Mon, 10 Oct 2022 16:18:03 GMT</pubDate>
            <atom:updated>2022-10-19T15:19:53.736Z</atom:updated>
            <content:encoded><![CDATA[<p>When we launched this blog in early 2021, we opened with a <a href="https://medium.com/optalysys/optalysys-what-we-do-and-why-we-do-it-20ab416c5ad0">description of our technology</a> and our reasons for pursuing this kind of optical computing. Since then, we’ve gone on to demonstrate the application of Fourier-optical computing across many different technical fields, from <a href="https://medium.com/optalysys/ai-super-resolution-using-an-optical-computer-2d419bb180ef">artificial intelligence</a> through to <a href="https://medium.com/optalysys/simulating-wave-propagation-with-the-optical-fourier-engine-f4a9f2e74d28">wave propagation</a> problems.</p><p>The rationale for optical computing that we presented in our original article has not changed. The optical Fourier transform continues to offer an incredibly powerful solution to otherwise intractable computing problems. However, the specifics of how we implement our technology have undergone some slight revisions since our first article.</p><p>These revisions are driven by the technical and commercial needs of a very specific application. While many workflows can benefit significantly from the development and deployment of a high-performance Fourier transform co-processor, it is in the field of <em>encryption</em> that we have found a significant opportunity to unleash the <em>full</em> potential of optics.</p><p>As we move forwards with developments in this revolutionary new field of cyber-security, now is a good time to revisit what it is that Optalysys does and consolidate updates on how our technology has been shaped in the context of this development.</p><h3>The Optical Fourier Transform</h3><p>The optical Fourier transform (OFT) has always had extremely attractive properties for solving Fourier-based calculations. 
As things stand, all-digital implementations of the fast Fourier transform (FFT) algorithm face fundamental limits on the speed at which they can run.</p><p>Modern advances in CMOS electronics simply can’t deliver the gains in raw transform calculation speed that fully homomorphic encryption (FHE) demands, nor are conventional computers well suited to executing FHE algorithms. By contrast, performing these critical operations using the <em>Optical</em> FT benefits from incredibly low execution time.</p><p>Under the optical FT, individual sub-elements of a transform calculation are performed at the speed of light. This optical calculation also consumes no direct power, and produces little to no heat.</p><p>However, using this remarkable capability as part of <em>digital</em> computing requires an efficient way of interfacing electronic and optical computing systems. Solving this problem is by far the most significant aspect of the hardware development work that we do at Optalysys.</p><p>Our approach uses the medium of silicon photonics to integrate both the optical processing stages and supporting digital infrastructure on a single chip-scale die. Not only does the use of silicon photonics provide access to extremely high-speed optical modulation technology, but on-die optical-electronic integration allows us to take advantage of contemporary techniques in the design and construction of multi-chip modules.</p><p>By designing our photonic processing chips to interface with conventional electronic processing cores over next-generation die-to-die interconnects, we can unify enormous transform processing power with the incredibly high data transfer rates that are needed to transfer the calculation results to electronic processing systems.</p><p>In terms of our commercial intentions for this technology, we are now fully focused on developing a system that provides massive acceleration for fully homomorphic encryption (FHE). 
This involves an expansion in the scope of our proprietary hardware to incorporate fully-digital technologies that support the unique requirements of FHE computing. However, the core advantages of this solution are, as ever, provided by the unique capabilities of optics.</p><h3>Fully homomorphic encryption and Optalysys technology</h3><p>The optical Fourier transform gives us a method for calculating mathematical transforms at speeds that cannot be matched by electronics. This is extremely valuable for FHE, as 70–90% of the total calculation burden is taken up solely in calculating the transforms needed for more efficient polynomial multiplication.</p><p>FHE introduces a set of unique challenges in computing, and we’ve had to find solutions to many of these problems. We won’t go into extensive detail on these problems or how we have addressed them (we’ll cover some of them in later articles), but here’s a quick summary.</p><ul><li><strong>Given the data type requirements of FHE, we have transitioned from developing a 2-dimensional optical computing process to a 1-dimensional process.</strong></li></ul><p>Rather than ejecting light from the surface of our silicon photonic system into an external free-space optical environment (<a href="https://medium.com/optalysys/the-mft-system-part-2-free-space-optics-6202ef562f7e">as described</a> in our early article on free-space optics), we now perform the core optical Fourier transform in a free-space optical environment which is constructed <em>in the plane</em> of the silicon die.</p><figure><img alt="https://cdn-images-1.medium.com/max/800/1*VseLwZHKY8Iah2rXU_lkig.png" src="https://cdn-images-1.medium.com/proxy/1*VseLwZHKY8Iah2rXU_lkig.png" /><figcaption>Visualisation of the silicon photonic Etile chiplet for 2-dimensional transforms (left, now deprecated) and 1-dimensional transforms (right). 
The 1D free-space transform stages are now structures integrated into the silicon photonics itself.</figcaption></figure><ul><li><strong>FHE requires large, high-precision transform operations.</strong></li></ul><p>In past articles, we have described how we can use small, low-precision optical transforms to reconstruct <a href="https://medium.com/optalysys/scaling-up-using-fourier-optics-for-bigger-transforms-c0667d4f0221">larger transforms</a> at <a href="https://medium.com/optalysys/higher-numerical-precision-for-optical-fourier-transforms-de515f575e76">higher precision</a>.</p><p>While these articles were written in the context of a 2-dimensional system, the same principles hold for the 1-dimensional approach. To support efficient reconstruction of transforms of sizes that are powers of 2 (as is typical in FHE), we have adjusted the design of our free-space optical transform stages to process 4 data points at extremely high speed.</p><p>We can construct multiple such transform stages on a single die. The speed at which these photonic cores work, coupled with the ability to leverage parallelism, makes exceptionally short work of the transforms needed for FHE.</p><ul><li><strong>Supporting the full range of fully homomorphic schemes requires that we support two distinct yet related transform operations: the fast Fourier transform (FFT) and the number-theoretic transform (NTT).</strong></li></ul><p>While designing an electronic system that is efficient for both of these tasks is difficult, the design and function of the 4-input optical transform stages described above actively support the easy calculation of elements of both the FFT and NTT, allowing us to switch between the two without incurring a runtime penalty.</p><ul><li><strong>FHE features a relatively simple set of mathematical operations that can be performed on encrypted data.</strong></li></ul><p>From these simple operations, we can construct arbitrary computing capabilities. 
However, conventional processing hardware is designed to work on plaintext data structures such as floats and integers, and to a degree of precision (64 bits in modern architectures) that is often not suitable for FHE, introducing additional limitations and bottlenecks. This requires dedicated logic designed around FHE ciphertext data structures.</p><h3>The Optalysys Enable FHE accelerator</h3><p>Accelerating transform operations for fully homomorphic encryption is hugely valuable and is fundamental to our plans for bringing FHE to mass scale and adoption. However, while the speed-ups that can be provided solely from photonics are significant, FHE still presents an extreme challenge. Right now, depending on scheme and application, FHE is approximately a million times slower than unencrypted computing. If we’re going to bring FHE to scale, we need to be even faster.</p><figure><img alt="https://cdn-images-1.medium.com/max/800/1*Di4y8q-7Vo_DKnkCYpwvqw.png" src="https://cdn-images-1.medium.com/proxy/1*Di4y8q-7Vo_DKnkCYpwvqw.png" /><figcaption>FHE runtime by approximate proportion of calculations using contemporary all-digital methods</figcaption></figure><p>Of course, if we reduce <em>only</em> the runtime of the transform operations, the other operations become (proportionally) a much larger part of the processing, so the overall acceleration that can be achieved solely by accelerating transforms is limited by these other operations.</p><figure><img alt="https://cdn-images-1.medium.com/max/800/1*qDUUwWzYlXjBPBFAbfyOMA.png" src="https://cdn-images-1.medium.com/proxy/1*qDUUwWzYlXjBPBFAbfyOMA.png" /><figcaption>Accelerating only the transform operations will eventually yield diminishing returns relative to the total computational load. While the major benefits of Optalysys technology come from our optical approach to transform operations, we realise the need for effective supporting hardware to tackle the non-transform operations. 
This includes very large vector operations such as addition and modular reduction. Combining these two solutions will yield a highly effective FHE acceleration platform.</figcaption></figure><p>Providing a <em>solution</em> for fully homomorphic encryption that addresses the enormous runtime disparity between encrypted and unencrypted computing thus requires more than a single technology. It requires that we tackle both the transform processing <em>and</em> provide an efficient solution for the other operations.</p><p>This is the objective of the FHE solution now under development at Optalysys. This system is called <em>Enable</em>, and it pairs Optalysys photonic computing chiplets with logic dedicated to both the management of transform data and the supplementary operations that are required for greater levels of FHE acceleration.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*otUgI1zwzmt4qf5cwWjZlg.png" /><figcaption>Visualisation of an Enable Multi-Chip Module (MCM). The smaller chiplets towards the edge of the MCM are Optalysys Etile units that provide high-performance FFT and NTT transforms, while the large central chips feature custom logic for handling the enormous data flow from the Etiles and other FHE operations such as addition and modular reduction.</figcaption></figure><p>Optalysys Enable brings mass acceleration to FHE. By targeting both transforms and other operations, the objective of the Enable system is to ultimately bring a factor of 10,000x acceleration to <em>all</em> FHE schemes in a highly scalable form factor.</p><p>To learn more about our Etile and Enable systems, you can visit <a href="https://optalysys.com/our-hardware/">our site</a> for an in-depth overview of the advantages of Optalysys technology in FHE. 
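</p><p>The diminishing-returns effect described above is an instance of Amdahl’s law, and is easy to sketch numerically. The 90% figure below reflects the transform proportions discussed earlier; the speed-up values are illustrative, not Optalysys benchmarks:</p>

```rust
// Amdahl's-law sketch: if a fraction f of total runtime is spent in
// transforms, and only that fraction is accelerated by a factor s, the
// overall speed-up saturates at 1 / (1 - f).
fn overall_speedup(f: f64, s: f64) -> f64 {
    1.0 / ((1.0 - f) + f / s)
}

fn main() {
    // With 90% of runtime in transforms, a 100x transform accelerator
    // alone gives roughly 9.2x overall...
    println!("{:.2}x", overall_speedup(0.9, 100.0));
    // ...and even an infinitely fast transform engine cannot exceed 10x,
    // which is why the non-transform operations also need acceleration.
    assert!((overall_speedup(0.9, 1e15) - 10.0).abs() < 1e-6);
}
```

<p>This is why pairing the photonic transform engines with dedicated logic for the remaining operations matters: it lifts the ceiling that transform-only acceleration would otherwise hit.</p><p>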
Meanwhile, in subsequent articles, we will be presenting world-first demonstrations of FHE applications and techniques that make use of the core optical principles of our systems.</p><hr><p><a href="https://medium.com/optalysys/optalysys-what-weve-done-and-why-we-did-it-f01591374cb6">Optalysys: What we’ve done (And why we did it)</a> was originally published in <a href="https://medium.com/optalysys">Optalysys</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Max, min, and sort functions using Programmable Bootstrapping in Concrete FHE]]></title>
            <link>https://medium.com/optalysys/max-min-and-sort-functions-using-programmable-bootstrapping-in-concrete-fhe-ac4d9378f17d?source=rss-41961c3986e0------2</link>
            <guid isPermaLink="false">https://medium.com/p/ac4d9378f17d</guid>
            <category><![CDATA[cryptography]]></category>
            <category><![CDATA[information-security]]></category>
            <category><![CDATA[homomorphic-encryption]]></category>
            <category><![CDATA[optical-computing]]></category>
            <category><![CDATA[data-science]]></category>
            <dc:creator><![CDATA[Optalysys]]></dc:creator>
            <pubDate>Thu, 24 Mar 2022 13:12:02 GMT</pubDate>
            <atom:updated>2022-03-24T13:12:02.308Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*ayzV3Ml7RXSF1obz2bQBOw.png" /></figure><p><em>With </em><a href="https://medium.com/u/f0fec005a59a?source=post_page-----d7c37d74bbaf--------------------------------"><em>Florent Michel</em></a><em>, </em><a href="https://medium.com/u/9326ad4c51d1?source=post_page-----d7c37d74bbaf--------------------------------"><em>Joseph Wilson</em></a><em> and </em><a href="https://medium.com/u/2158f00e598e?source=post_page-----d7c37d74bbaf--------------------------------"><em>Edward Cottle</em></a><em>.</em></p><h3>TL;DR</h3><p>In this article, we show how to compute the maximum and minimum of an encrypted array of data without decrypting it, using Zama’s <a href="https://github.com/zama-ai/concrete">Concrete</a> implementation of the <a href="https://eprint.iacr.org/2021/091.pdf">TFHE programmable bootstrapping</a>. We then use this ability to design a sorting algorithm which can work on encrypted data. Finally, we comment on the potential for optical acceleration, showing that <a href="https://optalysys.com/">Optalysys technology</a> can accelerate the main bottleneck by more than 100×.</p><h3>Introduction</h3><p>Fully Homomorphic Encryption (FHE) allows for arbitrary calculations to be performed on encrypted data, thus removing the need to decrypt it for analysis. In principle this closes the last remaining gap in data security: using previous forms of encryption, data may only be encrypted when at rest (e.g on a hard drive) or in transit (such as passing through the internet). FHE is the only generalisable approach for protecting data in <em>use</em>.</p><p>However, FHE currently suffers from two major issues that hinder its commercial viability. The first one is speed: calculations on encrypted data can be thousands to millions of times slower than what can be achieved on plaintext data. 
The second is the domain knowledge needed to implement and work with FHE schemes efficiently.</p><p>Fortunately, rapid progress has been made in the years following Craig Gentry’s <a href="https://crypto.stanford.edu/craig/craig-thesis.pdf">seminal thesis</a>, in which the first FHE scheme was proposed. Two recent advances are Zama’s <a href="https://github.com/zama-ai/concrete">Concrete</a> library, and the <a href="https://github.com/google/fully-homomorphic-encryption">C++ transpiler</a> written by the Google FHE team.</p><p>Google’s C++ transpiler is a tool to convert regular C++ code into <a href="https://tfhe.github.io/tfhe/">TFHE</a> code <em>via</em> an XLS intermediate representation generated by <a href="https://github.com/google/xls/tree/main/xls/contrib/xlscc">XLS[cc]</a>, making it straightforward to convert existing code for use in FHE (with <a href="https://github.com/google/fully-homomorphic-encryption/tree/main/transpiler">some limitations</a>) without expert knowledge of the field.</p><p>Zama’s Concrete library is also built on TFHE and uses <a href="https://eprint.iacr.org/2021/091.pdf">programmable bootstrapping</a> to perform any univariate operation in a single bootstrap. This can significantly reduce the number of operations that need to be performed during tasks such as encrypted deep learning inference. It also provides high-level functions for homomorphic operations, including the addition of two ciphertexts, multiplication by a plaintext, multiplication of two ciphertexts, and programmable bootstrapping. Beyond these operations, Concrete provides several sets of default parameters, allowing non-cryptographer developers to start working quickly and safely with FHE. 
(See the <a href="https://docs.zama.ai/">documentation</a> for more details.)</p><p>Concrete has already been used to demonstrate several applications with both industrial and academic interest; over the course of testing our optical processing hardware design, we’ve performed many of our own demonstrations in tasks such as <a href="https://medium.com/zama-ai/homomorphic-rnns-with-numpy-and-concrete-ed890402ab86">recurrent neural network inference</a>, <a href="https://medium.com/zama-ai/concrete-boolean-and-conways-game-of-life-a-tutorial-f2bcfd614131">Conway&#39;s game of Life</a>, and encrypted <a href="https://medium.com/optalysys/encrypted-search-using-fully-homomorphic-encryption-4431e987ba40">string search</a>.</p><p>In this tutorial, we’ll add to these previous efforts by demonstrating how to use Concrete to homomorphically compute the maximum or the minimum values in an encrypted array, without needing access to the plaintext data. Computing maxima and minima can be used, for instance, to sort arrays or to implement certain pooling layers in Deep Learning. We give an example of the former at the end of the tutorial.</p><p>At Optalysys, we are developing optical computing chips that can accelerate the main components of FHE calculations. These tutorials come about in part as an outcome of our efforts to better understand the requirements of this technology. We aim to develop a solution that not only works for FHE, but works as well as possible.</p><p>Beyond this, we also understand the value of working with the community to ensure that FHE continues to develop into a tool which is as user-friendly as possible. This is especially important given that FHE touches upon deeper aspects of the <em>application</em> space in a way that other uses of encryption generally do not. 
With this expansion in capabilities comes a corresponding degree of complexity, and it is essential to the success of FHE that developers, data scientists and other specialists have the resources to help them build the applications of tomorrow.</p><p>Of course, understanding is only half the problem; reaching the stage where that knowledge can be applied requires a massive jump in the speed at which FHE schemes can be executed, and that is the other focus of this article. At Optalysys, we are designing technology that can accelerate the primary bottleneck in FHE operations by a factor of 100, and we have our focus set on even higher accelerations in the future.</p><p>The key to tackling this problem lies in the same mathematical function that our technology was originally designed to conquer: the Fourier transform. This mathematical tool finds uses in every technical discipline, but it is especially important for FHE.</p><p>At the heart of calculations in FHE is the need to multiply very large polynomials together, a task which is <em>considerably</em> easier when executed in Fourier space. This is both well known and well understood, and contemporary FHE schemes executed in electronics already use fast Fourier transform algorithms to make these multiplications easier.</p><p>However, the electronic algorithm comes at the cost of many repeated operations that cannot be run entirely in parallel; when massive volumes of Fourier transforms are required, this inevitably leads to additional time spent on the calculation. In the case of FHE, up to 90% of <em>all</em> the computational load is currently occupied solely by Fourier transform calculations, and that number is expected to grow.</p><p>We take a different approach; we manipulate the properties of optics to perform Fourier transforms at nearly the speed of light. 
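</p><p>To make the link between transforms and polynomial multiplication concrete, here is a plaintext sketch using the Fourier transform’s modular cousin, the number-theoretic transform (NTT): after transforming both polynomials, multiplication reduces to a pointwise product. The modulus 998244353 and primitive root 3 are standard illustrative choices for an example like this, not the parameters of any particular FHE scheme:</p>

```rust
// Polynomial multiplication via the number-theoretic transform (NTT).
// The modulus P = 119 * 2^23 + 1 is prime with primitive root 3, so it
// supports radix-2 NTTs of length up to 2^23. Illustrative values only.
const P: u64 = 998_244_353;

// Modular exponentiation by repeated squaring.
fn pow_mod(mut b: u64, mut e: u64) -> u64 {
    let mut r = 1u64;
    b %= P;
    while e > 0 {
        if e & 1 == 1 { r = r * b % P; }
        b = b * b % P;
        e >>= 1;
    }
    r
}

// In-place iterative radix-2 NTT; `invert` selects the inverse transform.
fn ntt(a: &mut [u64], invert: bool) {
    let n = a.len();
    assert!(n.is_power_of_two());
    // Bit-reversal permutation.
    let mut j = 0;
    for i in 1..n {
        let mut bit = n >> 1;
        while j & bit != 0 { j ^= bit; bit >>= 1; }
        j |= bit;
        if i < j { a.swap(i, j); }
    }
    let mut len = 2;
    while len <= n {
        // w is a primitive len-th root of unity mod P.
        let mut w = pow_mod(3, (P - 1) / len as u64);
        if invert { w = pow_mod(w, P - 2); }
        for start in (0..n).step_by(len) {
            let mut wn = 1u64;
            for k in start..start + len / 2 {
                let (u, v) = (a[k], a[k + len / 2] * wn % P);
                a[k] = (u + v) % P;
                a[k + len / 2] = (u + P - v) % P;
                wn = wn * w % P;
            }
        }
        len <<= 1;
    }
    if invert {
        let n_inv = pow_mod(n as u64, P - 2);
        for x in a.iter_mut() { *x = *x * n_inv % P; }
    }
}

// Multiply two polynomials (coefficient vectors): transform, multiply
// pointwise, transform back. The transforms dominate the cost.
fn poly_mul(a: &[u64], b: &[u64]) -> Vec<u64> {
    let n = (a.len() + b.len() - 1).next_power_of_two();
    let mut fa = a.to_vec(); fa.resize(n, 0);
    let mut fb = b.to_vec(); fb.resize(n, 0);
    ntt(&mut fa, false);
    ntt(&mut fb, false);
    for i in 0..n { fa[i] = fa[i] * fb[i] % P; }
    ntt(&mut fa, true);
    fa.truncate(a.len() + b.len() - 1);
    fa
}

fn main() {
    // (1 + 2x)(3 + 4x) = 3 + 10x + 8x^2
    assert_eq!(poly_mul(&[1, 2], &[3, 4]), vec![3, 10, 8]);
    println!("ok");
}
```

<p>The pointwise product is trivially parallel; it is the forward and inverse transforms that carry the cost, which is why they are the natural target for optical acceleration.</p><p>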
Our technology is a highly scalable <em>optical</em> computer that can be integrated into existing digital systems.</p><p>This unique approach to calculating the Fourier transform has come about at exactly the right moment to provide acceleration for FHE. In this article, we combine our existing system simulators and interfaces with Zama’s Concrete FHE library to get a sense of just how powerful the first generation of our systems will be at tackling one of the most pressing issues in the digital world.</p><h3>High-level description</h3><p>Generally speaking (at least with respect to FHE), there are two different ways to perform computations on encrypted data. The first, which we’ve used extensively in our previous articles, is to express tasks as a sequence of logical operations. In this case, the processes we want to apply to a piece of information are converted into a representation in terms of unary and binary boolean gates; these gate operations on the data can then be evaluated homomorphically. Two very good tools for this are the Google <a href="https://github.com/google/fully-homomorphic-encryption">C++ transpiler</a> and the <a href="https://github.com/zama-ai/concrete/tree/master/concrete-boolean">Concrete Boolean</a> library.</p><p>The second approach, which we will follow here, is to work <em>directly</em> with integers or continuous numbers (to within some level of accuracy) and use the operations provided by a given FHE scheme to reconstruct a desired function. Depending on how we go about things, this approach can be <em>much</em> faster than attempting to do the same operation via homomorphic-Boolean logic. In this article, we’ll apply this approach to the task of computing the maximum and minimum values of an array of numbers using only additions and univariate functions. 
We will again use the <a href="https://github.com/zama-ai/concrete">Concrete</a> library, which makes this task much easier.</p><p>Before heading into our implementation, we’ll briefly explain our thinking with unencrypted examples. Given two numbers <em>a</em> and <em>b</em>:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/357/1*nb6I-_flmloGsREMMV7oJQ.png" /></figure><p>Our objective is to find their maximum using only additions, or functions which take only one argument. One possibility for finding the maximum value is the following:</p><p>1. Define a value <em>c</em> as the opposite of <em>a</em>: <em>c = -a</em>.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/364/1*gm6q0JzQbkRTTuBtoy4ysQ.png" /></figure><p>2. Compute the sum <em>d = b + c</em>.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*50HjJveXbYzh0RzBQJXwoA.png" /></figure><p>3. Define the positive part <em>e</em> of <em>d</em>: <em>e = d</em> if <em>d &gt; 0</em> and <em>e = 0</em> otherwise (here, <em>e</em>=0).</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/107/1*3F8R3byC9e16za6hoN63pA.png" /></figure><p>4. The maximum of <em>a</em> and <em>b</em> is <em>a + e</em>. Here, we can see that <em>a</em> is the maximum value.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*fLVDYXtJ14cYGg6xHE7HEQ.png" /></figure><p>We can perform a similar trick to find the minimum:</p><p>1. Define the opposite <em>c</em> of <em>a</em>: <em>c = -a</em> as before.</p><p>2. Compute the sum <em>d = b + c</em> as before.</p><p>3. Define the negative part <em>e</em> of <em>d</em>: <em>e = d</em> if <em>d &lt; 0</em> and <em>e = 0</em> otherwise.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/182/1*zm0Td0Zg7uEHmyZs2QrFhA.png" /></figure><p>4. The minimum of <em>a</em> and <em>b</em> is <em>a + e</em>. Here, we can see that <em>b</em> is the minimum value.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*vbnTda_EvYwwNIMv-jchRA.png" /></figure><p>This approach takes advantage of the speed benefit that we described above; in FHE, additions and multiplication by a constant (here, -1*<em>a</em>) are typically fast (although not as fast as their plaintext counterparts) because they do not require a bootstrap afterwards. In each of these two algorithms, there is thus only one “slow” operation requiring a bootstrap: taking the positive or negative part of <em>d</em>.</p><p>Once you can compute the minimum and maximum of two numbers, it is easy to compute the maximum and minimum values of a larger set by doing multiple comparisons of pairs of entries. For a set of <em>N</em> elements, this requires <em>N</em>-1 comparisons.</p><h4>Homomorphic version</h4><p>Let us now illustrate this on a larger set. Consider an array of numbers representing a discretised sine function:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*1JDBgbgujDoIbAk9cQ52fA.png" /></figure><p>In this figure, it is easy to see that the maximum is 1 and the minimum is -1:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*OS_lqfva_ghLzZoOY_h3tA.png" /></figure><p>The catch, of course, is that we can’t send this array directly to the server — the point of FHE is that the server where computations are performed does not need access to the plaintext data. Instead, we first <em>encrypt</em> the data before sending it to the server. Each element of the array thus becomes a vector of real values. This vector can be converted to a single number, but one that bears no relation to the original value. 
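</p><p>Before moving to the encrypted version, the construction above is easy to check in plaintext Rust. This sketch is purely illustrative: it restricts itself to the operations that map onto FHE (addition, negation, and a univariate function, which in Concrete becomes a programmable bootstrap), but involves no encryption:</p>

```rust
// Plaintext analogue of the max/min construction above: only additions,
// negation, and the univariate "positive part" / "negative part"
// functions are used. In the homomorphic version, the univariate step
// is the single "slow" operation requiring a bootstrap.
fn positive_part(x: f64) -> f64 { if x > 0.0 { x } else { 0.0 } }
fn negative_part(x: f64) -> f64 { if x < 0.0 { x } else { 0.0 } }

fn max2(a: f64, b: f64) -> f64 {
    // c = -a; d = b + c; e = positive part of d; max = a + e
    a + positive_part(b + (-a))
}

fn min2(a: f64, b: f64) -> f64 {
    // Same steps, taking the negative part of d instead.
    a + negative_part(b + (-a))
}

// N-1 pairwise comparisons extend this to a whole array.
fn array_max(xs: &[f64]) -> f64 {
    xs.iter().copied().reduce(max2).expect("non-empty array")
}

fn main() {
    assert_eq!(max2(3.0, -1.5), 3.0);
    assert_eq!(min2(3.0, -1.5), -1.5);
    assert_eq!(array_max(&[2.0, -7.0, 10.0, -1.0]), 10.0);
    println!("ok");
}
```

<p>The encrypted implementation below follows exactly this structure, with each plaintext step replaced by its homomorphic counterpart.</p><p>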
On encryption, the array now looks essentially random:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*CTO-1ZbbUpLAzt6DqjZh9w.png" /></figure><p>The advantage of this is obvious: since the data looks random (in fact it doesn’t just <em>look</em> random: it can be shown that, without the secret key, it is indeed indistinguishable from random noise), it can be safely sent without risk of data leak. In the following, we will show how we can nonetheless recover the maximum and minimum in encrypted form, as schematically illustrated in the next figure:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*cFfailnTICwnyVk-g8463Q.png" /></figure><p>Here, the dashed lines represent plaintext data; none of these is available to the server. The latter only has access to encrypted data, represented by the blue dots. What we will show is how, using programmable bootstrapping and other FHE operations, we can generate two new ciphers which, when decrypted by the client, give the maximum and minimum of the original array (purple dashed lines).</p><h3>Computing the maximum or minimum of two encrypted values</h3><h4>Preliminaries</h4><p>To run the code in this tutorial, you will need a Rust compiler (instructions to install one can be found on the <a href="https://www.rust-lang.org/">rust-lang website</a>). Run cargo new --lib homomorphic_min_max to create a new Cargo project. Since we use Concrete, you will also need to add it as a dependency by adding concrete = &quot;0.1.11&quot; in the [dependencies] section of the Cargo.toml file.</p><p>Unless said otherwise, the code shown below can go in the src/lib.rs file, which should start with use concrete::*; . This places Concrete&#39;s structures and functions in the current namespace.</p><p>In the following, we will go through the code step by step, explaining as we go. 
A full version of the code is available in the Appendix.</p><p>In the following, we start by creating functions that use Concrete to cryptographically compute the same sequence of steps that we demonstrated on the unencrypted example. We then take these functions and show how we can apply them to <em>arrays</em> of encrypted values, allowing us to find the maximum and minimum values in these arrays without needing to decrypt the information!</p><h4>Max algorithm</h4><p>The first thing we need is the function to compute the maximum of two encrypted values. We can’t just compare the ciphertexts — since a ciphertext should not leak any information about the message it encrypts, there is no way to see which one encrypts the larger value without the decryption key. Instead, we will use Concrete to construct the sequence of additions, comparisons and subtractions used to check for the maximum value, and then generate a new ciphertext which is a (different) encryption of the largest value:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/60cc3c42151c42f4e58c4b77e7d1325d/href">https://medium.com/media/60cc3c42151c42f4e58c4b77e7d1325d/href</a></iframe><p>This function takes six arguments:</p><ul><li>cipher_1 and cipher_2 are two ciphertexts encrypted with the LWE version of TFHE. They encrypt the two values we want to compare.</li><li>ksk is a key-switching key: a public key (by ‘public’, we mean that this key can be safely shared as it cannot reveal any information about the messages) which can convert a cipher encrypting a message m with one key to a different cipher encrypting the same message with a different key. It is needed because the programmable bootstrap will generally change the encryption key; the key-switching key is used to revert to the original one.</li><li>bsk is the bootstrapping key, required to perform the programmable bootstrap. 
Like ksk, it is a public key.</li><li>encoder is, as its name indicates, an encoder, used to map floating-point values to a fixed set of values that the encryption and decryption schemes can work with and back. Strictly speaking, this parameter is not really needed: we could use the same encoder as that of cipher_1 or cipher_2. However, we prefer to include it as an explicit argument to give the function more flexibility, as choosing a different encoder may improve the accuracy in some cases.</li><li>zero is an encryption of the value 0 with a carefully chosen encoder to cancel the drift. Indeed, even if cipher_1 and cipher_2 have the same encoder, performing homomorphic operations will generally change it. This is not a problem when comparing only two numbers as the final encoder will always be able to represent the result (assuming cipher_1 and cipher_2 are correctly encoded). However, it may become an issue if this result is then compared with other values. Adding an encryption of 0 with the right encoder cancels this effect, and ensures the output of the function can be compared with other values.</li></ul><p>Let us go through how this function works line by line:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/816f0ad69445a961862b83b1a4665163/href">https://medium.com/media/816f0ad69445a961862b83b1a4665163/href</a></iframe><p>computes (an encryption of) the difference between the two messages (it first takes the opposite of cipher_1 and adds the result to cipher_2);</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/af35c8ff2a4645ba5876be83d2c8c8a6/href">https://medium.com/media/af35c8ff2a4645ba5876be83d2c8c8a6/href</a></iframe><p>computes the positive part using programmable bootstrapping thanks to the bootstrapping key bsk, a closure computing the positive part of a floating-point number (|x| if x &gt;= 0. { x } else { 0. 
}), and the encoder passed to the function. At this point, cipher_diff_pos is an encryption of the result, but under a different key than the original.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/18f96ade37a69688f5a34c2b8a13e6c8/href">https://medium.com/media/18f96ade37a69688f5a34c2b8a13e6c8/href</a></iframe><p>The above then switches the encryption key back to the original one held by the sender. The line</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/6e602894c4f713419df48f2ce518789d/href">https://medium.com/media/6e602894c4f713419df48f2ce518789d/href</a></iframe><p>then computes the sum of the result and cipher_1. The operation</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/c03082e25b86de4d1dd53b0c03d1af37/href">https://medium.com/media/c03082e25b86de4d1dd53b0c03d1af37/href</a></iframe><p>adds an encryption of 0, which has the effect of re-centering the encoder.</p><p>Finally, the function returns a Result containing either a ciphertext if all operations were successful, or a CryptoAPIError (an error type defined by Concrete) if one of them failed.</p><p><strong>NB:</strong> Technically, the key-switching step is not absolutely necessary: we could also choose the input and output keys of the bootstrapping operation to be the same, and thus not need to change the key afterward. 
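</p><p>In plaintext terms, the function we have just walked through evaluates the branch-free identity max(a, b) = a + pos(b - a), where pos is the positive-part function computed by the programmable bootstrap. A minimal Rust sketch, with no encryption involved (purely illustrative):</p>

```rust
/// Positive part: pos(x) = x if x >= 0, else 0.
/// This is the function the programmable bootstrap evaluates homomorphically.
fn pos(x: f64) -> f64 {
    if x >= 0.0 { x } else { 0.0 }
}

/// Branch-free maximum: max(a, b) = a + pos(b - a).
fn max_branchfree(a: f64, b: f64) -> f64 {
    a + pos(b - a)
}

/// Branch-free minimum: min(a, b) = b - pos(b - a).
/// This is the variant used by the min function described below.
fn min_branchfree(a: f64, b: f64) -> f64 {
    b - pos(b - a)
}
```

<p>Everything apart from pos is an addition or a negation, which TFHE supports directly; the single bootstrap is what makes the comparison possible without ever branching on the data.</p><p>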
In this article, we choose the input and output keys independently to illustrate a very useful property in Concrete, since the runtime overhead of the key switching is quite small in practice.</p><h4>Min algorithm</h4><p>The function to compute the minimum of two values is very similar, and proceeds along the same lines as the maximum function:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/08f2e5e689f537dd94c79711c78836e6/href">https://medium.com/media/08f2e5e689f537dd94c79711c78836e6/href</a></iframe><p>The primary difference is that we add the opposite of the positive part of the difference to cipher_2 instead of directly adding the positive part to cipher_1.</p><h3>Maximum or minimum value of an array of numbers</h3><h4>Maximum</h4><p>We can now use the function compute_max above to compute (an encryption of) the maximum of an array of numbers:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/d250cbbd351528bc172577599d31e3eb/href">https://medium.com/media/d250cbbd351528bc172577599d31e3eb/href</a></iframe><p>We make this function public (pub) so that it can be accessed from other files. Its arguments are similar to those discussed above except for the first one, ciphers, which is an array of ciphertexts.</p><p>The first line of this function may be surprising at first sight: it computes the <em>opposite</em> (in encrypted form) of 0. Of course, this operation will not change the value obtained after decryption (since -0 = 0​). 
But it <em>will</em> change the encoder of the ciphertext.</p><p>This is one of those rare moments where we have to be aware of how Concrete works on a deeper level; if zero originally has the same encoder as the elements of ciphers, then the new cipher obtained after taking the opposite has just the right encoder to cancel the drift that compute_max would otherwise entail.</p><p>The rest of the function follows quite standard coding practice: we convert the array to an iterator, set the result to its first element (or raise an error if the array is empty), then loop over the other elements to find the global maximum.</p><h4>Minimum</h4><p>As before, the function used to compute the minimum of an array is very similar:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/20ff5fef0eeb4589a5a22d5e90ea0b4e/href">https://medium.com/media/20ff5fef0eeb4589a5a22d5e90ea0b4e/href</a></iframe><p>The only differences between this and the maximum array iterator are that we don’t need to take the opposite of 0 and, of course, that we are using our compute_min function instead of compute_max.</p><h3>Example</h3><p>Let us show how to use these functions on a simple example. Create a file src/main.rs containing</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/f75a757032702461ff9882fbe4422de7/href">https://medium.com/media/f75a757032702461ff9882fbe4422de7/href</a></iframe><p>The code shown in the rest of this section should go in this main function.</p><p>We first define the encoder. We will work with values between 0 and 1, with 4 bits of accuracy. We set the number of bits of padding to 2. 
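</p><p>To build intuition for what “4 bits of accuracy” means, here is a rough plaintext model of the encoding as a uniform grid of 16 values over [0, 1]. This illustrates the precision limit only; it is not Concrete’s actual internal representation:</p>

```rust
/// Illustrative model of an encoder over [min, max] with `bits` bits of
/// precision: values are snapped to a uniform grid of 2^bits points.
/// (Not Concrete's real internal layout; precision intuition only.)
fn encode_decode(x: f64, min: f64, max: f64, bits: u32) -> f64 {
    let levels = (1u64 << bits) as f64; // 16 grid steps for 4 bits
    let step = (max - min) / levels;
    // Snap the value to the nearest representable grid point.
    min + ((x - min) / step).round() * step
}
```

<p>With 4 bits, values are only resolved to the nearest 1/16; the 2 bits of padding are, roughly speaking, extra headroom reserved so that homomorphic additions do not overflow the encoding.</p><p>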
(See the <a href="https://docs.zama.ai/concrete/lib/">concrete documentation</a> for more information on the encoders, keys, and parameters.)</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/d6e81748dcd44e31e76fe7936f0c71a2/href">https://medium.com/media/d6e81748dcd44e31e76fe7936f0c71a2/href</a></iframe><p>We then define three secret keys:</p><ul><li>an LWE key to perform encryption and decryption,</li><li>an RLWE key which will be used to generate the bootstrapping key,</li><li>another LWE key related to the RLWE one.</li></ul><p>We use a polynomial size of 1024, with 128 bits of security.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/a4d150bfc09a1d4293302b11d65dc570/href">https://medium.com/media/a4d150bfc09a1d4293302b11d65dc570/href</a></iframe><p>We then define the key-switching key, needed to convert a ciphertext for the key sk_out to a ciphertext for sk_in, and the bootstrapping key. They both take two parameters, which we set to 5. (See the <a href="https://docs.zama.ai/concrete/lib/user/bootstrapped-operations/bootstrapping-a-ciphertext.html#generating-a-bootstrapping-key">Concrete documentation</a>.)</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/e779ac7dbbb729733b7d8bfb9256b1b4/href">https://medium.com/media/e779ac7dbbb729733b7d8bfb9256b1b4/href</a></iframe><p>That’s all for the set-up. 
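</p><p>Before running the encrypted version, it helps to see the plaintext shape of the computation. The array reduction is just a fold of the branch-free pairwise maximum, mirroring the iterator-based loop described earlier (the function names here are illustrative, not those of the embedded code):</p>

```rust
/// Positive part, the function evaluated by the programmable bootstrap.
fn pos(x: f64) -> f64 {
    if x >= 0.0 { x } else { 0.0 }
}

/// Plaintext analogue of the encrypted array maximum: start from the first
/// element (None if the array is empty), then fold the branch-free max.
fn max_array(values: &[f64]) -> Option<f64> {
    let mut iter = values.iter().copied();
    let mut result = iter.next()?;
    for v in iter {
        result += pos(v - result); // max(result, v) without branching
    }
    Some(result)
}
```

<p>The minimum is analogous, folding result = v - pos(v - result) at each step.</p><p>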
We now define the vector of messages:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/6ac82f36c8005d7783b5e1bfa3eacee7/href">https://medium.com/media/6ac82f36c8005d7783b5e1bfa3eacee7/href</a></iframe><p>encrypt it:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/c0e7bb68b967589564ab5eb655981785/href">https://medium.com/media/c0e7bb68b967589564ab5eb655981785/href</a></iframe><p>compute the maximum and minimum values:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/cec7d3d147f137c3b1b23ede2d9adc4f/href">https://medium.com/media/cec7d3d147f137c3b1b23ede2d9adc4f/href</a></iframe><p>decrypt the result:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/b7622ab3ba8b595e64cdfe3ca617f691/href">https://medium.com/media/b7622ab3ba8b595e64cdfe3ca617f691/href</a></iframe><p>round the values:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/080c6436c436f4cea332c6fe93140f87/href">https://medium.com/media/080c6436c436f4cea332c6fe93140f87/href</a></iframe><p>print the results and return Ok(()):</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/b48613f88306a58737d20549f9a7f7ad/href">https://medium.com/media/b48613f88306a58737d20549f9a7f7ad/href</a></iframe><p>Running the program using cargo run --release, you should get:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/adfe62c3421bfd7f7bfe19870e880bff/href">https://medium.com/media/adfe62c3421bfd7f7bfe19870e880bff/href</a></iframe><h3>Sort algorithm</h3><p>While finding the minimum and maximum of a dataset can already be useful, in practice this truly shines when used as a building block for other algorithms, <em>e.g.</em>, sorting an array. 
In this section, we show how to implement a basic sorting algorithm working entirely on encrypted data.</p><h4>Function to compute the maximum and minimum of two numbers</h4><p>The functions compute_min and compute_max written above are fairly fast (by FHE standards) as they require only one bootstrap each. However, they have two drawbacks which we want to avoid for a full sort function:</p><ul><li>They use the function add_centered_inplace, which is not officially documented.</li><li>The encoder correction obtained by adding an encryption of 0 is not always perfect, and may require fine-tuning the offset of the encryption of 0.</li></ul><p>Here we show a different version using the programmable bootstrap to reset the encoder. To minimise the number of such bootstrap operations, the same function returns both the minimum and the maximum of two numbers.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/c88f761b7da22763b254907936b0839e/href">https://medium.com/media/c88f761b7da22763b254907936b0839e/href</a></iframe><p>This function will be about three times slower than the previous ones, but it eliminates the drift in the encoder.</p><h4>Sorting algorithm</h4><p>Once we can find the minimum of an array of numbers, we can sort it by successively computing the minima of the remaining elements. This requires <em>n</em> <em>(n-1) / 2</em> comparisons, where <em>n</em> is the array size. There are much faster sorting algorithms: for instance, the merge sort algorithm requires on the order of <em>n</em> log <em>n</em> comparisons in the worst case (where log denotes the logarithm in base 2). However, these algorithms involve conditionals which are not straightforward to evaluate homomorphically (for instance, merge sort requires taking one element from each of two arrays to merge and determine which is the largest, a process which cannot be directly implemented without leaking information about the data). 
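</p><p>For reference, here is what the naive successive-minima sort looks like in plaintext Rust, written so that every step is a data-independent min/max that maps directly onto the FHE operations above (an illustrative sketch, not the article’s encrypted code):</p>

```rust
/// Positive part, the function the programmable bootstrap evaluates.
fn pos(x: f64) -> f64 {
    if x >= 0.0 { x } else { 0.0 }
}

/// Branch-free (min, max) of a pair, mirroring the combined function that
/// returns both values at once.
fn min_max(a: f64, b: f64) -> (f64, f64) {
    let p = pos(b - a);
    (b - p, a + p) // (min, max)
}

/// Sort by successive minima: n(n-1)/2 pairwise comparisons, with a fixed
/// sequence of operations that does not depend on the (encrypted) data.
fn sort_by_minima(values: &mut [f64]) {
    let n = values.len();
    for i in 0..n {
        for j in (i + 1)..n {
            let (lo, hi) = min_max(values[i], values[j]);
            values[i] = lo; // values[i] ends up as the minimum of values[i..]
            values[j] = hi;
        }
    }
}
```

<p>Because the same operations run regardless of the values, this control flow leaks nothing about the data, which is exactly why it translates so directly to FHE.</p><p>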
For this reason, in this article we will stick to a naive algorithm which, while slower, has the advantage of being easy to implement in FHE.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/d7215ac772402b60391d31ba2f2b6615/href">https://medium.com/media/d7215ac772402b60391d31ba2f2b6615/href</a></iframe><p>As described above, this algorithm works iteratively: it first finds the minimum value of the array, then the minimum value of the other elements, and so on. To use it, replace the last line in the main function (Ok(())) with</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/19beea73d3a5278cbc70496f2d5f46fd/href">https://medium.com/media/19beea73d3a5278cbc70496f2d5f46fd/href</a></iframe><h3>Optical acceleration</h3><p>As described in the introduction, the main problem currently facing FHE is speed: the security it provides comes with a huge runtime penalty of between 1,000× and 1,000,000× depending on the kind of computations being performed. This is far too high for most practical applications, so realising the full potential of FHE will require bringing these numbers down to below 10×.</p><p>Fortunately, progress in this domain is coming fast. On the algorithmic and software side, newer schemes like <a href="https://eprint.iacr.org/2018/421.pdf">TFHE</a> and efficient implementations like <a href="https://zama.ai/concrete/">Concrete</a> bring orders-of-magnitude speed-ups over earlier frameworks. 
Taking a slightly different direction, <a href="https://people.csail.mit.edu/vinodv/6892-Fall2013/BGV.pdf">levelled homomorphic encryption</a> can significantly reduce the runtime for evaluating Boolean circuits with a pre-determined, small or medium depth.</p><p>On the hardware side, the US Defense Advanced Research Projects Agency (DARPA) has <a href="https://www.darpa.mil/news-events/2020-03-02">launched a project</a> to explore hardware acceleration of lattice cryptography in general, and FHE in particular. Promising ideas involve large processor caches and registers to work on large integers, as well as dedicated single instruction multiple data (SIMD) instructions.</p><p>At Optalysys, we are following a different yet complementary approach: we are developing a silicon-photonic chip to compute the Fourier transform, the main bottleneck in FHE operations, faster and more efficiently than is possible on electronic hardware. Our system is modular and can integrate with both cutting-edge FHE software and hardware to compound the acceleration that each approach brings. We have previously described our technology <a href="https://medium.com/optalysys/optalysys-what-we-do-and-why-we-do-it-20ab416c5ad0">in</a> <a href="https://medium.com/optalysys/the-mft-system-part-1-silicon-photonics-de478741b9f7">these</a> <a href="https://medium.com/optalysys/the-mft-system-part-2-free-space-optics-6202ef562f7e">four</a> <a href="https://medium.com/optalysys/the-mft-system-part-3-detection-a675e2bbfdf9">articles</a> (and take a look at <a href="https://optalysys.com/">our website</a> for even more documentation). Moreover, we can give an estimate for how quickly our system could tackle the problem at hand.</p><p>To this end, we have split the above code into two parts: a client side which encrypts and decrypts data, and a server side performing computations on encrypted data, running on different processes with TCP communication between them. 
We have also modified the server side to make use of a simulator for our upcoming optical chip. The code is not publicly released yet, but can be made available to participants in our hardware beta program. We have run the same problem (sorting an array of 10 values) on an Intel i7–9700K CPU @ 3.60GHz and on our optical simulator and measured the runtime. To make the comparison valid, we subtracted the runtime for key generation, encryption, decryption, and client-server communication.</p><p>When running electronically, and after subtracting this overhead, sorting the array took a bit more than 6s. By comparison, the optical simulator indicated that the same operation would run in 0.05s on our silicon-optical chip. Our technology can thus accelerate this sequence of FHE operations by two orders of magnitude. Moreover, this speed-up can be compounded with software improvements and other forms of hardware acceleration, each of which could provide further order-of-magnitude improvements in the coming years.</p><p>Based on these results, we strongly believe that FHE will become a commercial reality and revolutionise how the world exchanges data in the next few years — and we are ready to be an integral part of this adventure.</p><h3>Appendix: Full code</h3><p>For completeness, we show here the full Cargo.toml, lib.rs and main.rs files for this tutorial. 
We create the new project using cargo new --lib homomorphic_min_max, compile it with cargo build --release, and run the example using cargo run --release.</p><p>Cargo.toml:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/6e3973f52f109e19ead838f36693d501/href">https://medium.com/media/6e3973f52f109e19ead838f36693d501/href</a></iframe><p>src/lib.rs:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/d75658a86d6593949d10f5e2909c713e/href">https://medium.com/media/d75658a86d6593949d10f5e2909c713e/href</a></iframe><p>src/main.rs:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/34980a8ebab656d12b99504eb0329619/href">https://medium.com/media/34980a8ebab656d12b99504eb0329619/href</a></iframe><hr><p><a href="https://medium.com/optalysys/max-min-and-sort-functions-using-programmable-bootstrapping-in-concrete-fhe-ac4d9378f17d">Max, min, and sort functions using Programmable Bootstrapping in Concrete FHE</a> was originally published in <a href="https://medium.com/optalysys">Optalysys</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Encrypted search using fully homomorphic encryption]]></title>
            <link>https://medium.com/optalysys/encrypted-search-using-fully-homomorphic-encryption-4431e987ba40?source=rss-41961c3986e0------2</link>
            <guid isPermaLink="false">https://medium.com/p/4431e987ba40</guid>
            <category><![CDATA[computer-science]]></category>
            <category><![CDATA[homomorphic-encryption]]></category>
            <category><![CDATA[data-security]]></category>
            <category><![CDATA[optical-computing]]></category>
            <dc:creator><![CDATA[Optalysys]]></dc:creator>
            <pubDate>Wed, 03 Nov 2021 13:41:07 GMT</pubDate>
            <atom:updated>2022-09-06T14:36:16.450Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*yMhLZbcZDauiHxal8X9Flw.png" /></figure><p>With <a href="https://medium.com/u/f0fec005a59a">Florent Michel</a>, <a href="https://medium.com/u/9326ad4c51d1">Joseph Wilson</a> and <a href="https://medium.com/u/2158f00e598e">Edward Cottle</a></p><p>Information is the lifeblood of the modern internet. <a href="https://techjury.net/blog/how-much-data-is-created-every-day/">As of January 2021</a>, 4.66 billion people were active internet users, each one sending and receiving a torrent of data.</p><p>Quite of lot of this data is (relatively speaking) harmless and can be shared and accessed to various degrees. A vast and lucrative market around analysing this data has sprung up to take advantage of this highly accessible information, albeit with associated concerns about how this data is used.</p><p>However, what about information that is both valuable and <em>sensitive</em>? Too sensitive to run the risk that an unauthorised party could even <em>see</em> it, let alone work with it?</p><h4>The problem of privacy</h4><p>There are plenty of cases like this, and the standards for how this data can be used and shared are exceptionally high. This can be a pretty significant barrier to getting the full value out of that information. Consider the following example:</p><blockquote><em>A man walks into a bank and wishes to open an account. The bank is legally obliged to ensure that the money entering that account isn’t from the proceeds of crime, so the first thing they must do is establish who this man is and where the funds have come from. This process is known as </em>KYC <em>or Know-Your-Customer, a key part of Anti-Money Laundering (AML) compliance.</em></blockquote><p>Sounds simple right? 
Companies, banks and other financial institutions keep computerised records, so surely verification is simply a matter of searching the relevant databases?</p><p>In fact, such checks are a <em>headache</em> for all involved because of the need to preserve privacy. For example, in the EU, GDPR legislation imposes tight restrictions on how personal data can be used and shared. And that is <em>entirely understandable</em>, because the damage that can be done with important information is enormous.</p><p><strong>And all of this is down to a problem with <em>encryption</em>.</strong></p><h4>A new approach to security</h4><p>Information can be stored in encrypted form when travelling or at rest, but under current encryption methods, the data must be decrypted when it is being processed. That means that if you want to <em>search</em> a central repository of information, the information has to (at some point) be decrypted.</p><p>Securing a large-scale repository of unencrypted information against intrusion is already extremely hard. Being able to safely grant access to external parties is even harder, so information sharing between companies and other record-holders is generally a slow process.</p><p>The solution to this (and many other problems) is Fully Homomorphic Encryption, a new model of cryptography that allows computation on encrypted data. Under FHE, an encrypted database can be searched for information without the need to decrypt the information at <em>any</em> point. At a stroke, a major barrier to safe information management can be eliminated, although not without a cost; the computational nature of FHE makes it notoriously slow, even on modern computing systems.</p><p>However, FHE is only slower than unencrypted processing from a <em>computational</em> standpoint. 
When compared against the lengthy (not to mention <em>still fundamentally insecure</em>) processes that currently ensure data security, an FHE-enabled world will not only guarantee privacy but will also work a <em>lot</em> faster in general.</p><h4><strong>Speeding up FHE via optical computing</strong></h4><blockquote><em>With optical computing, we can start to address the computational overhead too; now the FHE world will move even faster, breaking down the limits imposed by current encryption methods. However, to reach that point, we’re still going to need the software tools and the broader algorithmic understanding to make this a reality.</em></blockquote><p>So, in the interests of illustrating the value that can already be unlocked, this article is dedicated to the task of implementing an encrypted search operation using <a href="https://docs.zama.ai/concrete/boolean-lib/how_does_it_work.html">Zama’s Concrete Boolean library</a>, and running it with the assistance of optical Fourier transform hardware.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*2XG3_W13pqKCsrLXq6wQ-Q.png" /><figcaption>Visualisation of the core of our optical Fourier transform system, a chip-scale hybrid CMOS/Photonic processor for ultra-fast Fourier transforms. In this article, we use our system-level simulators (developed for our Beta program) to execute the Fourier transforms required for FHE operations in Concrete-Boolean, allowing us to provide a projected benchmark for the benefits of this technology in performing an encrypted search.</figcaption></figure><p>Concrete Boolean is the library we applied in our <a href="https://medium.com/optalysys/fully-homomorphic-encryption-and-the-game-of-life-d7c37d74bbaf">last article</a>, where we used it to execute Conway’s Game of Life in encrypted space. 
This library provides us with the programming tools that we need to start working with the computational properties of FHE in a direct and secure way.</p><p>Indeed, the speed with which we’ve been able to start using it is testament to the power that it places in the hands of users. Concrete Boolean was released about a month ago; in that time, we’ve been able to put out two articles on non-trivial applications. It’s an extremely encouraging development for the field in general.</p><p>Some FHE applications (such as Conway’s Game of Life) are <em>interesting. </em>This is an application which is <em>useful</em>. By using our optical Fourier transform approach, we can then take that useful application and make it run <em>much</em> faster than it would on conventional electronics.</p><p>How much faster? We put the string searching model we built to the test on our simulator architecture, which replicates the functionality of our physical demonstrator systems. Under the parameters of the commercial-grade system we have designed, this example search algorithm can be made to run <strong>60x<em> </em></strong>faster when the Fourier transforms are executed by an optical process.</p><h4>Article contents:</h4><blockquote><a href="https://medium.com/p/4431e987ba40/#1c9c"><strong><em>String searching</em></strong></a></blockquote><blockquote><a href="https://medium.com/p/4431e987ba40/#0985"><strong><em>Techniques for string searching</em></strong></a></blockquote><blockquote><a href="https://medium.com/p/4431e987ba40/#15c2">The naive approach</a></blockquote><blockquote><a href="https://medium.com/p/4431e987ba40/#5b8e">Tries</a></blockquote><blockquote><a href="https://medium.com/p/4431e987ba40/#bd40">The naive approach: bringing Booleans into the picture</a></blockquote><blockquote><a href="https://medium.com/p/4431e987ba40/#0043">Applications</a></blockquote><blockquote><a href="https://medium.com/p/4431e987ba40/#a47e"><strong><em>Encrypted string search with Concrete 
Boolean</em></strong></a></blockquote><blockquote><a href="https://medium.com/p/4431e987ba40/#a0d4">Import the Concrete-Boolean library</a></blockquote><blockquote><a href="https://medium.com/p/4431e987ba40/#b1ea">Encrypt a string</a></blockquote><blockquote><a href="https://medium.com/p/4431e987ba40/#1a8e">The FHEError type</a></blockquote><blockquote><a href="https://medium.com/p/4431e987ba40/#de37">Checking the equality of two characters</a></blockquote><blockquote><a href="https://medium.com/p/4431e987ba40/#f500">The string search function</a></blockquote><blockquote><a href="https://medium.com/p/4431e987ba40/#e073">An example</a></blockquote><blockquote><a href="https://medium.com/p/4431e987ba40/#9dee">Alternative version</a></blockquote><blockquote><a href="https://medium.com/p/4431e987ba40/#af45">Potential for optical acceleration</a></blockquote><h3>String Searching</h3><p>Search algorithms are common tools for information retrieval and assessment. For example, consider the way that email is currently filtered. This usually involves a range of different techniques, but one of the simplest, and earliest methods (<a href="https://en.wikipedia.org/wiki/Naive_Bayes_spam_filtering">the “naive Bayes” classifier</a>) involved detecting key words or phrases. For example, emails containing the following in the subject line or message body:</p><blockquote><em>“Free money”<br>“Free offer”<br>“$$$”<br>“Weight loss”</em></blockquote><p>Are highly unlikely to be legitimate, and may be actively harmful. These phrases are made of sequences of characters (like any other text), which in computing are represented as <em>strings</em>, sequences of bits that represent characters. 
For example, the “$$$” string in ASCII encoding is a sequence of bytes (8 bit sequences), each of which represents a “$” character:</p><p>00100100 00100100 00100100</p><p>A straightforward key-word based approach to filtering means that detecting this sequence in an email indicates it is likely to be spam.</p><p>This kind of identification is an example of <em>string searching</em>. String searching is an extremely common task in computing; if you’ve ever searched a document for a particular word or typed a message on a phone that auto-predicts what the next word will be, you’ve seen it in action. We’ll use strings as an example in this article, but the approach we take to encrypted searching can be broadly generalised.</p><h3>Techniques for string searching</h3><p>As we did with the Game of Life application in our previous article, we’ll start by exploring approaches for unencrypted information.</p><p>The fundamental problem of string searching can be described compactly as:</p><p><em>“Given string </em><strong><em>S1</em></strong><em> of length </em><strong><em>n</em></strong><em> and search string </em><strong><em>S2</em></strong><em> of length </em><strong><em>m</em></strong><em> where </em><strong><em>m</em></strong><em> ≤ </em><strong><em>n</em></strong><em>, find all instances of </em><strong><em>S2</em></strong><em> in </em><strong><em>S1</em></strong><em>”.</em></p><p>For example, finding the word “fox” in a longer string:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*J44Y7YtVQBQD_vxpRTo9sQ.png" /></figure><h4>The naive approach</h4><p>The simplest way to approach this problem is to iteratively compare sections or “windows” of length <em>m</em> of S1 against S2. Starting from the beginning of S1…</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*Vs39-R9P9RWy5eSbVbG12w.png" /></figure><p>…we check to see if these characters match and then move on to the next sub-string. 
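</p><p>This window scan is easy to write down in plaintext Rust (an illustrative sketch of the naive method, not the encrypted version built later):</p>

```rust
/// Naive string search: slide a window of length m over s1 and compare it
/// against s2 at every offset. Worst case O(n * m) character checks.
fn naive_search(s1: &str, s2: &str) -> Vec<usize> {
    let haystack: Vec<char> = s1.chars().collect();
    let needle: Vec<char> = s2.chars().collect();
    let (n, m) = (haystack.len(), needle.len());
    let mut matches = Vec::new();
    if m == 0 || m > n {
        return matches;
    }
    for start in 0..=(n - m) {
        // Compare this window character by character.
        if haystack[start..start + m] == needle[..] {
            matches.push(start);
        }
    }
    matches
}
```

<p>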
Spaces count as characters, so we end up having to check these too.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*Wqq0zJ-wLnjxRTJcB5BlYg.png" /></figure><p>Eventually, we’ll reach the correct pairing. If we’re checking a long document for multiple instances of a word, we’ll have to continue on like this to the end, making a note of the indices where matches were found.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*4sDm1i5f74vss8lvl5ddlw.png" /></figure><p>This is the <em>naive</em> approach; the first thing that would come to mind. However, consider a different S1 and S2:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/918/1*RxZhGjOwM9OSgw4sgixuxg.png" /></figure><p>Now the string that we’re searching for is right at the end of S1, which means that the total number of checks that we have to perform in the <em>worst-case </em>leads to a runtime complexity of <em>O(</em><strong><em>nm</em></strong><em>).</em></p><p>What if there were other ways of representing string information in a searchable form?</p><h4>Tries</h4><p>If we’re allowed to consider a <em>dictionary</em> of words, we can also use a structured approach. Consider the previous string:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/918/1*RxZhGjOwM9OSgw4sgixuxg.png" /></figure><p>This was the worst-case scenario for the naive example. If we’re allowed to operate on the assumption that these words are stored in a dictionary, there’s a faster way of storing and representing information in the form of a <a href="https://en.wikipedia.org/wiki/Trie#:~:text=In%20computer%20science%2C%20a%20trie,key%2C%20but%20by%20individual%20characters."><em>Trie</em> data structure</a>. 
When we load the string, we can split it on the basis of the “ ” (space) delimiter into a dictionary:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*wtKgafjPCEw1PDTIDb_mSA.png" /></figure><p>We can then build up a tree structure by taking each word and adding each character to sequential nodes. Each node can itself contain a list of other characters, like so:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/906/1*iy4rdSJOvJ2nPeI_j3_W6A.png" /></figure><p>The runtime cost of building this structure is linear, i.e. <em>O(</em><strong><em>n</em></strong><em>)</em>. This is because, in the worst case, we will have to add one node to the trie per character in the dictionary of words.</p><p>By searching over this structure, we can notionally reduce the search time significantly. For example, if we want to search to see if the word “fox” is present, we now only have to perform at <em>worst</em> 3 operations as we search through the nodes. We first query the root node to see if it contains the “f” character; if it does not, then we know “fox” cannot exist in the set of words. If we do find an “f”, then we perform similar checks recursively on the nodes containing “f”, and then “o”, until we find the terminal character.</p><p>The cost of checking a node for a character is <em>O(1)</em> if we use an efficient structure to store subnodes (e.g. a hashmap). This means that the runtime complexity of searching for a given word in the trie is <em>O(</em><strong><em>m</em></strong><em>)</em> where m is the length of the word we are searching for.</p><p>Tries are also more compact than a straightforward dictionary, as shared prefixes are stored only once (e.g. the sequence “Fo”, which starts many words such as “for”, “fortune”, “Fourier” and so on, appears just once in the trie).
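To make this concrete, here is a minimal plain-Rust trie sketch. This is ordinary unencrypted code, and the type and method names are our own, not part of any FHE library:

```rust
use std::collections::HashMap;

// A minimal trie over characters; `end` marks the terminal of a stored word.
#[derive(Default)]
struct Trie {
    children: HashMap<char, Trie>,
    end: bool,
}

impl Trie {
    // Building is O(n) over the total number of characters inserted.
    fn insert(&mut self, word: &str) {
        let mut node = self;
        for c in word.chars() {
            node = node.children.entry(c).or_default();
        }
        node.end = true;
    }

    // Lookup is O(m), where m is the length of the queried word.
    fn contains(&self, word: &str) -> bool {
        let mut node = self;
        for c in word.chars() {
            match node.children.get(&c) {
                Some(next) => node = next,
                None => return false,
            }
        }
        node.end
    }
}

fn main() {
    let mut trie = Trie::default();
    for word in "the quick brown dog jumps over the lazy fox".split(' ') {
        trie.insert(word);
    }
    assert!(trie.contains("fox"));
    assert!(!trie.contains("fo")); // a prefix alone is not a stored word
    println!("trie lookups behave as expected");
}
```

Insertion only allocates a node where a prefix diverges from what is already stored, which is exactly where the compactness comes from.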
Overall, this approach works very well; tries (or similar approaches) are foundational data structures for tasks like auto-complete and spell-checking.</p><p>However, the catch is that implementing such a structure also involves a lot of <em>decisional</em> branching, and that’s an issue for FHE. Over the course of an encrypted computation, there’s no easy way to perform the kinds of comparison that are needed to execute a jump to another section of the computation. So to begin with, we’re going to settle for the naive approach.</p><p>That’s not to say that the idea of using a better-than-naive approach to searching is <em>impossible</em> with FHE; we can certainly improve upon things, so after we’re done with the naive method, we’ll also outline an alternative approach that falls somewhere between the naive approach and the more advanced trie method.</p><h4>The naive approach: bringing Booleans into the picture</h4><p>Thus far, we’ve described string searching in terms of characters, but paid little attention to the way that computers actually work. As we mentioned before, computers don’t see characters; each character is represented by a sequence of bits.</p><p>Bits can be 1 or 0, so we can implement a string check via Boolean operations. This is in fact what’s going on behind the scenes whenever we perform a string check; it’s just that there’s usually one or more layers of abstraction between the code we write and what’s actually going on.</p><p>If we have a word such as “fox” and we’re checking through strings where the characters are encoded with ASCII, then each character is represented by an 8-bit value. There are two Boolean operations which are useful here.</p><p>The first is the “exclusive not-OR” or XNOR gate. 
Here are the inputs and outputs (the “truth table”) for such a gate:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/567/1*9KePhDYupNIVjc_ePJ4eQg.png" /></figure><p>And the second is the AND gate:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/559/1*5J9_LMa2ryJkKb35ypi2Qg.png" /></figure><p>We can combine the utility of these gates like so. Let’s say we have two sequences of bits, e.g.</p><p>S1 = “011000011101110110000101”</p><p>S2 = “01110110”</p><p>And we want to find S2 in S1 via the naive sliding-window approach. For each comparison window, we can XNOR each bit together and then apply a cascade of AND gates to the output. If the bit sequences don’t match…</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/664/1*gqWiKDR0SKd_kNw6GpSjpw.png" /></figure><p>…then the output of the final AND gate will always be zero.</p><p>The <em>only</em> time this process will return a non-zero output bit is if all the bits match:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/664/1*MB17op5GfbV8ME1xSnjQKA.png" /></figure><p>This allows us to reduce a query over multiple binary values to a single binary outcome.</p><h4>Applications</h4><p>This by itself is a pretty big deal! It’s not a new idea, but it is a key piece of understanding before we leap into <em>encrypted</em> string searching, so we’ll stop for a moment and consider what we can do with it.</p><p>Thus far we’ve phrased everything in terms of searching a single string of text to see if it contains another string. That’s fine; it’s been a useful tool for introducing us to the link between Boolean operations and the way that information is represented on computers.</p><p>However, beyond this, if we can convert the problem of searching human-readable text strings into a Boolean model, we can implement this approach to searching over <em>other</em> data types too.
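Putting these pieces together, the whole unencrypted pipeline — bits, an XNOR per bit pair, and a cascade of ANDs per window — can be sketched in plain Rust (helper names are our own):

```rust
// Convert an ASCII string to its bit sequence (8 bits per character, MSB first).
fn to_bits(s: &str) -> Vec<bool> {
    s.bytes()
        .flat_map(|byte| (0..8).rev().map(move |i| (byte >> i) & 1 == 1))
        .collect()
}

// XNOR of two bits: true exactly when they match.
fn xnor(a: bool, b: bool) -> bool {
    !(a ^ b)
}

// Naive sliding window: XNOR every bit pair, then AND the results together;
// a window matches only if the final AND output is true.
fn find_bit_pattern(haystack: &[bool], needle: &[bool]) -> Option<usize> {
    if needle.is_empty() || needle.len() > haystack.len() {
        return None;
    }
    (0..=haystack.len() - needle.len()).find(|&start| {
        needle
            .iter()
            .zip(&haystack[start..])
            .map(|(&a, &b)| xnor(a, b))
            .fold(true, |acc, bit| acc && bit) // cascade of AND gates
    })
}

fn main() {
    let s1 = to_bits("the quick brown fox");
    let s2 = to_bits("fox");
    let pos = find_bit_pattern(&s1, &s2).expect("no match found");
    // The bits at the reported offset really are the needle's bits.
    assert_eq!(&s1[pos..pos + s2.len()], &s2[..]);
    println!("match found at bit offset {pos}");
}
```

Note that a bit-level scan can also match patterns that straddle character boundaries; a character-aligned search would step the window 8 bits at a time instead of 1.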
This is what we need to begin developing the more complex applications of encrypted search.</p><p>In fact, some of these data types are actively <em>easier</em> to work with than text. For ASCII text representations we’re working with a byte per character, which means comparing 8 binary values. That’s not a big deal on a modern machine, but when working with encrypted FHE operations, every additional Boolean gate we need to apply translates into meaningful extra work.</p><p>Text requires an 8-bit representation because written human language contains many different symbols. English contains 26 lower-case characters, 26 upper-case characters, punctuation, and various other symbols that are used often ($!&amp; etc.). If we’re representing the complete English character set with unique binary values, then of course we need quite a few of them.</p><p>However, <em>language</em> doesn’t always mean human communication. Language in the broader sense can be seen as any formal system of symbols.</p><p>For example, DNA is formed from four nucleotide bases: adenine, cytosine, guanine and thymine. This is an example of a symbolic language that only requires 2 bits to represent the full range of characters. In light of the methods we use in our example, an encrypted search over a DNA sequence would be much faster on a per-character basis than over a text string.</p><p>We’ll likely come back to this idea in a later article, but for now we’ll move on to how we translate this into an <em>encrypted</em> search.</p><h3>Encrypted string search with Concrete-Boolean</h3><p>To perform an <em>encrypted</em> string search, we can combine Concrete-Boolean with the Boolean comparison process.
In our example, we’ll be searching the S1 string</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/469/1*4S2bJhkJ_4B6tuqvm4XuoA.png" /></figure><p>For the S2 string</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/87/1*y-yGufp196wlFSMIfCuynw.png" /></figure><p>We begin by encrypting each individual bit of each character in both S1 and S2 as a ciphertext. Each discrete character (e.g “l”) can now be thought of as being represented by a vector of 8 ciphertexts.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/229/1*gjQs2sk4wOu3s4r1mIGsPw.png" /></figure><p>To compare two characters, we first have to perform an encrypted XNOR operation between each ciphertext just as we did before.</p><p>If we do this for each character in S2, we now have a collection of ciphertexts containing the output of all the XNOR operations between the section of S1 and S2. We can then apply encrypted AND operations to these ciphertexts. This will return a single ciphertext containing an encryption of either “0” if the strings don’t match or “1” if they do.</p><p>This is just like our initial Boolean search method, but now everything is encrypted. The only information that can be extracted is whether the S2 string is present in S1 and, if so, where it is in S1. At the same time, the only person or organisation that can retrieve this information is the holder of the decryption key.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*Zze0g-mhmwMwWAoW7a_oYw.png" /></figure><p>Let&#39;s see how this works in practice.</p><h4>Import the Concrete-Boolean library</h4><p>We first import the elements we will use from the Concrete-Boolean library: the gen_keys function used to generate the keys, and the ServerKey, Ciphertext, and ClientKey types. 
We also import Concrete’s CryptoAPIError type and define the number of bits per character.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/294e936cd8532433ed5d4a3e22094fac/href">https://medium.com/media/294e936cd8532433ed5d4a3e22094fac/href</a></iframe><h4>Encrypt a string</h4><p>The next thing we need is a function to encrypt a string into a vector of ciphertexts. The simplest approach is to convert the string to bytes, loop over the bits, and encrypt each of them. Since Concrete-Boolean works with Boolean values rather than integers, we identify 0 with False and 1 with True.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/438822461ff979dc879c5f38a8bfab72/href">https://medium.com/media/438822461ff979dc879c5f38a8bfab72/href</a></iframe><h4>The FHEError type</h4><p>This step is not strictly needed, but will make error handling easier. The string functions we will write may fail for different reasons. Some of them are due to how Concrete-Boolean works, and will result in a CryptoAPIError. But others may not be related to the cryptography algorithm at all. For instance, one of the strings may be empty, and we choose to throw an error in this case. To deal with these different possibilities, we define a custom error type, FHEError. We also implement the From&lt;CryptoAPIError&gt; trait so that a CryptoAPIError can be converted to an FHEError.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/1faa106747e92c1dbf1894d526cf331d/href">https://medium.com/media/1faa106747e92c1dbf1894d526cf331d/href</a></iframe><h4>Checking the equality of two characters</h4><p>We now define the function which checks if two sequences of ciphertexts decrypt to the same sequence of bits.
The function bits_are_equal below takes a server key and two ciphertexts a and b as arguments, and returns another ciphertext which decrypts to true if a and b encrypt the same boolean value or false otherwise.</p><p>The function are_equal does the same for two sequences of ciphertexts, returning an error if they are empty or have different lengths.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/9cb7ba673336e8a25ff851d982bdf25c/href">https://medium.com/media/9cb7ba673336e8a25ff851d982bdf25c/href</a></iframe><h4>The string search function</h4><p>Finally, the function search_word takes a server key and two sequences of ciphertexts a and b as arguments. If a is empty, it returns an error. Otherwise, it returns an encryption of true if the string used to generate a is in that used to generate b or of false otherwise.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/31666810d684c6c9785bda1b1466ad49/href">https://medium.com/media/31666810d684c6c9785bda1b1466ad49/href</a></iframe><h4>An example</h4><p>As an example, the code shown below will look for the words “like” and “hate” in the sentence “I like making FHE easy!”.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/6c91a7cb3700b07b8e8c494e93b69444/href">https://medium.com/media/6c91a7cb3700b07b8e8c494e93b69444/href</a></iframe><p>Here is the result:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/50eca763dc2425d1a42fab927d625c42/href">https://medium.com/media/50eca763dc2425d1a42fab927d625c42/href</a></iframe><h4>Alternative version</h4><p>As we mentioned earlier, this algorithm is not optimal when working with plaintext data: it is much better to first split the text into words, assemble them into a trie structure, and use it to search each word.
There are two reasons why this is not possible in (or, at least not easily transferable to) FHE:</p><ul><li>First, using FHE, we can’t use conditionals based on the plaintext data to terminate the calculation early. This is a fundamental property of all well-designed FHE schemes, as the time needed to perform a calculation would otherwise leak information about the plaintext data.</li><li>Second, if we want to leak no information at all about the string, we can’t even provide the function with the length of each word, as a statistical analysis on these lengths could reveal something about the text. (For instance, some technical fields are more prone to long words than others.)</li></ul><p>However, sometimes faster computation may be worth leaking some information about the original string. If the string is short enough, there is only so much that a statistical analysis will reveal. And, for longer strings, one could imagine padding them with random words to partially hide the information that word lengths reveal.</p><p>In cases where this trade-off is acceptable, one can split the string into individual words, encrypt them, and check a word to be searched against each of them sequentially. For additional security, one may pad or crop each word to a fixed length, so that the only information the server gets is the total number of words. This has the added benefit over the above version that the server does not even know the length of each word to be searched for. 
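As a rough plain-Rust mock-up of this word-split variant, with unencrypted bools standing in for ciphertexts (the function names and the fixed length are our own choices, not Concrete's API):

```rust
// Fixed per-word length, in characters: the server only learns the word count.
const FIXED_LEN: usize = 8;

// Pad with '\0' (or crop) to exactly FIXED_LEN characters, then expand to bits.
fn fixed_len_bits(word: &str) -> Vec<bool> {
    let mut bytes: Vec<u8> = word.bytes().take(FIXED_LEN).collect();
    bytes.resize(FIXED_LEN, 0);
    bytes
        .iter()
        .flat_map(|&b| (0..8).rev().map(move |i| (b >> i) & 1 == 1))
        .collect()
}

// XNOR every bit pair and AND the results: true only if all bits match.
fn words_equal(a: &[bool], b: &[bool]) -> bool {
    a.iter().zip(b).fold(true, |acc, (&x, &y)| acc && !(x ^ y))
}

// O(w * FIXED_LEN) comparisons: one fixed-length check per word in the text.
fn search_word(text: &str, word: &str) -> bool {
    let target = fixed_len_bits(word);
    text.split(' ')
        .map(|w| fixed_len_bits(w))
        .fold(false, |found, w| found || words_equal(&w, &target))
}

fn main() {
    assert!(search_word("I like making FHE easy!", "like"));
    assert!(!search_word("I like making FHE easy!", "hate"));
    println!("word-split search behaves as expected");
}
```

One consequence of cropping is that two words identical in their first FIXED_LEN characters are treated as equal; that is part of the trade-off between leakage, runtime, and accuracy.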
The complexity of the search algorithm then becomes O(<strong><em>wl</em></strong>), where <strong><em>w</em></strong> is the number of words in the string and <strong><em>l</em></strong> is the fixed word length.</p><p>For completeness, we give here the full lib.rs file (a large part of which has already been explained above):</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/cf6f4e02643d83c0c50250138625d897/href">https://medium.com/media/cf6f4e02643d83c0c50250138625d897/href</a></iframe><p>And the associated main.rs file:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/878abef4582116fa370577a232d4d1e5/href">https://medium.com/media/878abef4582116fa370577a232d4d1e5/href</a></iframe><p>This version has a slightly different trade-off than the previous one. Indeed,</p><ul><li>In the previous version, the function performing the computation had access only to the total length of the text and to the length of each word which is searched.</li><li>In this version, it has access to the total number of words in the text but not their individual length nor the length of the words which are searched for.</li></ul><p>In both cases, the small information leak can be resolved (at the expense of a longer runtime) by padding the text with additional words and, in the first version, by padding each word to be searched for with spaces and adding a comparison to an encryption of a space to the function comparing characters.</p><p>As far as the runtime is concerned,</p><ul><li>The first version requires about <strong><em>nm</em></strong> character-to-character comparisons, where <strong>m </strong>is the length of the text and <strong>n </strong>that of the word to be searched for.</li><li>The second version requires <strong><em>wl</em></strong> character-to-character comparisons, where <strong><em>w</em></strong> is the number of words in the text and <strong><em>l</em></strong> the maximum length of a word.</li></ul><p>The second version will thus be
faster on average if <strong><em>l</em></strong> is smaller than roughly the square of the average word length (taking the search word to be of typical length), and slower if it is larger.</p><h4>Potential for optical acceleration</h4><p>The above example runs in about 40s on an Intel i7 CPU @ 3.6 GHz. By comparison, we project (again, based on the parameters of our optical device design) that the Fourier transforms required for the full computation, which are by far the main bottleneck, would take less than 0.4s on the Optalysys optical Fourier transform system. <strong>The chip-scale technology we are developing could thus accelerate this more advanced application by a factor of 100.</strong></p><hr><p><a href="https://medium.com/optalysys/encrypted-search-using-fully-homomorphic-encryption-4431e987ba40">Encrypted search using fully homomorphic encryption</a> was originally published in <a href="https://medium.com/optalysys">Optalysys</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Concrete Boolean and Conway’s Game of Life: A Tutorial]]></title>
            <link>https://medium.com/zama-ai/concrete-boolean-and-conways-game-of-life-a-tutorial-f2bcfd614131?source=rss-41961c3986e0------2</link>
            <guid isPermaLink="false">https://medium.com/p/f2bcfd614131</guid>
            <category><![CDATA[optical-computing]]></category>
            <category><![CDATA[homomorphic-encryption]]></category>
            <category><![CDATA[conways-game-of-life]]></category>
            <dc:creator><![CDATA[Optalysys]]></dc:creator>
            <pubDate>Fri, 29 Oct 2021 09:53:05 GMT</pubDate>
            <atom:updated>2021-10-29T09:56:22.446Z</atom:updated>
            <content:encoded><![CDATA[<h4>A guest article by Optalysys on implementing Conway’s Game of Life using homomorphic encryption</h4><p><em>With </em><a href="https://medium.com/u/f0fec005a59a?source=post_page-----d7c37d74bbaf--------------------------------"><em>Florent Michel</em></a><em>, </em><a href="https://medium.com/u/9326ad4c51d1?source=post_page-----d7c37d74bbaf--------------------------------"><em>Joseph Wilson</em></a><em> and </em><a href="https://medium.com/u/2158f00e598e?source=post_page-----d7c37d74bbaf--------------------------------"><em>Edward Cottle</em></a><em>.</em></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*_4j2WP8bkWS9upoUZ0utYA.png" /></figure><p>Fully Homomorphic Encryption is a remarkably powerful cryptographic technique that allows computation on encrypted data.</p><p>However, cryptography can be a difficult subject. Data is valuable and computers are complex machines; attacks on encrypted data rarely focus on the underlying mathematics of the cryptosystem, but instead target the practical implementation. Ensuring that cryptographic applications are secure requires expert oversight.</p><p>FHE is powerful because it allows for secure computation, but to achieve the full potential of what it offers it has to be accessible to <em>everyone</em> who has a use-case for it. Tools are needed to make it possible for non-specialists to safely implement the next generation of encrypted processing applications. This is what Zama have done with their Concrete Library, and we (Optalysys) are here to demonstrate that.</p><p>We’re a company developing a new kind of optical computing hardware, one that has tremendous promise for addressing the computational overhead of FHE. What we <em>aren’t </em>are specialists in the development of cryptographic libraries. 
However, Concrete is so user-friendly that we’ve not only been able to implement relatively complex FHE applications, but also test them out on our novel hardware concept. More than that, we’ve been able to do this <em>quickly</em>; the example we’re going to show here only took between 3–4 hours to develop.</p><p>In less than a month since the release of Concrete-Boolean, we’ve been able to develop multiple very different FHE applications of varying complexity. We’re currently in the process of writing articles on these too (which takes <em>much</em> longer than the actual development), but when we say that Concrete-Boolean is a groundbreaking advance for both end-users and FHE as a field, we mean it.</p><p>The first of the test applications we created was a classic simulation known as Conway’s Game of Life, carried out entirely in encrypted space. We’ve released <a href="https://medium.com/optalysys/fully-homomorphic-encryption-and-the-game-of-life-d7c37d74bbaf">our own article</a> on how we executed this task using a combination of our optical computing system and Concrete, but here we’ve turned that article into a tutorial focused on Concrete Boolean itself.</p><h3>Introduction to the Game of Life</h3><p>When developing any application, we first need an understanding of what it is that we want to achieve. In this case, we want to implement Conway’s Game of Life through encrypted Boolean operations.</p><p>The Game of Life is an example of a cellular automaton, a simulation with simple rules from which complex behaviours can emerge. It is “played” on a 2-dimensional square grid. The size of this grid isn’t fixed; it can be any size, as long as you have sufficient computing memory. Each cell within this grid can be in one of two states, “alive” or “dead”, and each cell has 8 neighbours (including ones at the edge if the boundaries are treated as periodic). 
The system evolves over discrete update intervals according to the following rules:</p><ol><li><em>Birth: </em>A dead cell with exactly three live neighbours becomes live on the next iteration.</li></ol><figure><img alt="" src="https://cdn-images-1.medium.com/max/358/1*tWEfcAAya2Vf9Aw-IONbIQ.png" /></figure><ol start="2"><li><em>Survival: </em>A live cell with two or three live neighbours lives on to the next iteration.</li></ol><figure><img alt="" src="https://cdn-images-1.medium.com/max/358/1*qEaf-JNvtnFcHW8LpGnASw.png" /></figure><ol start="3"><li><em>Death: </em>A live cell with fewer than two or more than three live neighbours dies on the next iteration.</li></ol><figure><img alt="" src="https://cdn-images-1.medium.com/max/358/1*Er5fN7CaBb45b_lvly2l_w.png" /></figure><p>While simple, these rules can give rise to complex and unexpected behaviours as the system evolves in time. One example is dynamic-but-stable configurations like the one shown below, which move through the grid in repeating patterns.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/993/1*pfzCGKoKpYTm9a5bDDyMVQ.png" /></figure><h4>Boolean construction</h4><p>Concrete Boolean provides us with a way to directly implement the majority of unary or binary boolean gate operations on ciphertexts with a single line of code, so we start by constructing boolean algorithms for the Game of Life.</p><p>Since each cell can be in one of two states (‘alive’ or ‘dead’), the state of any cell may be represented by a single Boolean variable.
We use ‘True’ to indicate that a cell is alive, and ‘False’ for dead.</p><p>After an update, a given cell is alive if and only if one of these two conditions is true:</p><ul><li>It was already alive and had 2 or 3 neighbours.</li><li>It was dead and had exactly 3 neighbours.</li></ul><p>If the variable <em>alive(i) </em>denotes the state of the cell and <em>neighbours(i)</em> its number of neighbours after <em>i</em> iterations, then the state of the cell after <em>i+1 </em>iterations can be written in a boolean way as:</p><p><em>alive(i+1)</em> = (<em>alive(i)</em> AND (<em>neighbours(i)</em> = 3 OR <em>neighbours(i)</em> = 2)) OR (NOT(<em>alive(i)</em>) AND <em>neighbours(i)</em> = 3)</p><p>which can be simplified to:</p><p><em>alive(i+1)</em> = <em>(neighbours(i)</em> = 3) OR (<em>alive(i)</em> AND <em>neighbours(i)</em> = 2).</p><p>This is our update algorithm; we can apply it to each cell on each iteration and the system will evolve according to the rules. However, as part of this algorithm we need to be able to calculate the number of living neighbours each cell has.</p><p>Without encryption, this is straightforward: for each cell, we can just sum the states of the 8 neighbours, with ‘true’ counting for 1 and ‘false’ for 0. This is also technically possible in FHE: given ciphers corresponding to some numbers, summing them (at least under TFHE; other schemes may require more complex operations) will give you a cipher containing their sum. However, there is no similarly easy way to compare the result against the values 2 and 3 without decrypting it. While this is certainly possible using a combination of gate operations and programmable bootstrapping, we’ll give a simpler solution using only the former.</p><p>Additions, like any other computable operations, can be evaluated using boolean circuits. 
For instance, if <em>b1</em> and <em>b2</em> are individual bits, their sum can be expressed as the two-bit number <em>c1 c2 </em>where</p><p><em>c1</em> = <em>b1</em> AND <em>b2<br>and<br>c2 </em>=<em> b1 </em>XOR<em> b2</em></p><p>In our case, since each cell has 8 neighbours, the sum of their states can be encoded in a 4-bit number (we’ll see later that this can actually be reduced to 3 bits). We can use the Boolean addition method to calculate this number for any given cell; this is the central principle through which we can begin building our algorithm for calculating the number of living neighbours.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/704/1*whYghDuyNL-fgGyBRlkIaw.png" /><figcaption>Summing the neighbour cells via Boolean operations and accumulating the result for the centre cell. From the image we can see that there are 3 live cells surrounding the central cell, and this is reflected in the total accumulated binary value (ACC) after addition. This value can then be passed to the earlier algorithm that determines if the cell will be living or dead on the next iteration.</figcaption></figure><p>The problem of determining the number of neighbours for a cell can now be reduced to the following steps:</p><ul><li>Design an accumulator circuit which can sum up to 8 bits (sequentially), with a 4-bit output.</li><li>Use it to compute the number of neighbours of each cell and plug it into the formula used to update its state.</li><li>Convert the circuit to work on encrypted data using Concrete-boolean.</li></ul><p>That’s the core logic completed. We’ll now show how to use the <a href="https://github.com/zama-ai/concrete/tree/master/concrete-boolean">concrete-boolean</a> library to implement a homomorphic version of <a href="https://playgameoflife.com/">Conway’s Game of Life</a>. 
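Before wiring in encryption, the circuits just described can be sanity-checked gate for gate in plain Rust, with ordinary bools standing in for ciphertexts (the function names here are ours, not the library's):

```rust
// Add one bit into a 3-bit accumulator [b2, b1, b0] (modulo 8), gate by gate.
fn add_1(acc: [bool; 3], a: bool) -> [bool; 3] {
    let c0 = acc[2] & a;  // carry out of the low bit:    b0 AND a
    let s0 = acc[2] ^ a;  // new low bit:                 b0 XOR a
    let c1 = acc[1] & c0; // carry out of the middle bit
    let s1 = acc[1] ^ c0; // new middle bit
    let s2 = acc[0] ^ c1; // new high bit (its carry is dropped: counting mod 8)
    [s2, s1, s0]
}

// Sum up to 8 neighbour states, with 'alive' = true counting as 1.
fn count_neighbours(neighbours: &[bool]) -> [bool; 3] {
    neighbours.iter().fold([false; 3], |acc, &n| add_1(acc, n))
}

// alive(i+1) = (neighbours = 3) OR (alive(i) AND neighbours = 2)
fn is_alive(alive: bool, neighbours: &[bool]) -> bool {
    let [b2, b1, b0] = count_neighbours(neighbours);
    let is3 = !b2 & b1 & b0;  // binary 011
    let is2 = !b2 & b1 & !b0; // binary 010
    is3 | (alive & is2)
}

fn main() {
    let three = [true, true, true, false, false, false, false, false];
    let two = [true, true, false, false, false, false, false, false];
    assert!(is_alive(false, &three)); // birth
    assert!(is_alive(true, &two));    // survival
    assert!(!is_alive(false, &two));  // stays dead
    println!("update rule circuit behaves as expected");
}
```

Every operation above is a single AND, OR, XOR or NOT on bits, which is what makes the construction transferable to Concrete-Boolean's encrypted gates.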
In the following sections, we will write some code to:</p><ul><li>Define and encrypt the initial state of the board,</li><li>Update the board homomorphically without decrypting anything,</li><li>Decrypt and print the result.</li></ul><p>This step-by-step guide shows how to write a simple single-threaded version of our example code which prints the output in the terminal. For more advanced users who want to see how this works with multiple processing threads, a multithreaded version with a basic graphical interface can be found <a href="https://github.com/FlorentCLMichel/homomorphic_game_of_life">here</a>.</p><p>In this tutorial, we will assume you have a Rust compiler (it should work with any version of rustc from 1.51) and Cargo installed. Instructions on how to install them can be found on the <a href="https://www.rust-lang.org/tools/install">rust-lang website</a>.</p><p><strong>Note:</strong> The implementation below was designed to be relatively simple, and is not optimised for speed. It is possible to further simplify the boolean circuits, although these tricks are beyond the scope of the tutorial. Remember, every operation counts in FHE, so it’s worth taking the time to optimise.</p><h3>Set-up</h3><p>The first thing we need to do is create a new cargo project using Rust. We can do this using the terminal command cargo new homomorphic_game_of_life.</p><p>Using this command will create a folder labelled homomorphic_game_of_life. Inside this folder will be a new folder called src.</p><p>The only dependency (apart from Rust’s standard library) is concrete-boolean.
To use the code shown in this tutorial, you&#39;ll need to add concrete-boolean = &quot;0.1&quot; to the dependencies section of the Cargo.toml file, which will look like:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/1000cf3d56be5d5cb99c1f8aa4546743/href">https://medium.com/media/1000cf3d56be5d5cb99c1f8aa4546743/href</a></iframe><p>Create a new file labelled lib.rs in the src folder that contains the line:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/8a30ee37402675fdb8809ff657816d8d/href">https://medium.com/media/8a30ee37402675fdb8809ff657816d8d/href</a></iframe><p>This will place the traits and functions from concrete-boolean in the current namespace, and also allow their use in main.rs (which we will write later). Unless stated otherwise, the next pieces of code should also go in lib.rs.</p><p>(What we’re doing here in lib.rs is creating a new <em>library</em> of functions that we can call upon when we execute the main section of our code, the part that establishes the initial starting conditions for the grid and then applies the core algorithm over subsequent iterations.
These functions are specific to this application and implement the boolean algorithms that we described in the last section.)</p><h3>Accumulator</h3><p>We begin by writing an accumulator which will be used to add the states of the neighbours of each cell, with the state ‘alive’ interpreted as 1 and ‘dead’ as 0.</p><p>The function add_1 uses Concrete-Boolean operations to homomorphically add the contents of a ciphertext a (which encodes a boolean value) to a triplet b that encodes three boolean values.</p><p>The triplet b is the accumulator for the central cell; adding a ciphertext a containing True (an encryption of the value “1”) to the accumulator will cause the binary value stored by the accumulator to increment by 1.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/e636a23d2aa17eed050076274d9a6548/href">https://medium.com/media/e636a23d2aa17eed050076274d9a6548/href</a></iframe><p>We use a 3-bit accumulator, which means that we can count from 0 to 7. A cell can have 8 neighbours, but we do not need to distinguish between the values 8 and 0 since these both lead to the same state for the cell after the update.
A 3-bit accumulator is enough, and it means we can cut down on additional complexity.</p><p>We then call this accumulator in our next function to compute the sum of a sequence of ciphertexts modulo 8 (<em>i.e.</em>, where 8 and 0 are identified with one another).</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/072602e76aae0b69837c57328658e7e4/href">https://medium.com/media/072602e76aae0b69837c57328658e7e4/href</a></iframe><p>This function takes three arguments: server_key, of type ServerKey (we will explain why this is needed later; for now, we’ll just say that it’s an essential part of performing homomorphic operations), elements, an array of ciphertexts, and zeros, a triplet of encryptions of false (which may be identical). This function assumes the array elements is not empty and will panic if it is; we could easily change this by replacing it with</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/be904a0c17486a19491a88d4aada1418/href">https://medium.com/media/be904a0c17486a19491a88d4aada1418/href</a></iframe><p>However, in our case, we know that elements will always be of length 8, so the original function sum will do.</p><h3>Checking if a cell is alive after the update</h3><p>We now have the tools we need to perform an encrypted computation of the state of each cell after one update using the following method:</p><p>Given an array containing the ciphertexts for the boolean states of a cell’s neighbours, we can add them together using our sum function, check if the output is 2 or 3, and determine the new state of the cell as follows:</p><ul><li>if the sum is 2 or 3 and the cell is alive, it stays alive,</li><li>if the sum is 3 and the cell is dead, it becomes alive,</li><li>otherwise, the cell is dead after the update.</li></ul><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a
href="https://medium.com/media/2433a08bd8986440f5b44bf044c6b8ed/href">https://medium.com/media/2433a08bd8986440f5b44bf044c6b8ed/href</a></iframe><p>And that’s all! Well, at least as far as the core logic is concerned. We still need to define a Board structure which will store the states of the cells and a way to print it. But this can be achieved using more standard Rust.</p><h3>The Board</h3><p>We now need to define the structure of the board and the operations that can be applied to it. In Rust, we can use a struct to create the board. This is somewhat similar to a <em>class</em> in Python, although the usual conceptual caveats when translating ideas between languages apply.</p><p>We use periodic boundary conditions for the edges of the board, so that each cell has exactly 8 neighbours. In the following code, the function new creates a new board from a number of columns and a vector of ciphertexts defining the initial state (the number of rows is inferred from the vector length). This function assumes that the length of the vector is a multiple of n_cols. (Updating the board will panic if that is not the case; here we aim to keep the code simple, but a production-ready implementation should check the length of the vector and raise an error if it is not a multiple of the number of columns.)</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/d312f81130c80f56fc69d2a0fde5b814/href">https://medium.com/media/d312f81130c80f56fc69d2a0fde5b814/href</a></iframe><p>We also define the update process for updating the cells based on the states of their neighbours as a public function of the board. This process steps through each cell on the board, acquires the ciphertexts corresponding to the neighbouring cells, and then applies the is_alive logic to determine if the cell will be alive or dead after the update. It then pushes the ciphertext output of is_alive to a new array. 
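</p><p>In plaintext form, this update pass can be sketched as follows; the struct layout and names here are illustrative rather than the tutorial’s actual code, with plain bools standing in for the ciphertexts:</p>

```rust
// Plaintext sketch of the board update with periodic boundary conditions.
struct Board {
    n_cols: usize,
    states: Vec<bool>, // row-major, length = n_rows * n_cols
}

impl Board {
    fn update(&mut self) {
        let n_rows = self.states.len() / self.n_cols;
        let mut new_states = Vec::with_capacity(self.states.len());
        for i in 0..n_rows {
            for j in 0..self.n_cols {
                // Gather the 8 neighbours, wrapping around the edges.
                let mut live = 0u8;
                for di in [n_rows - 1, 0, 1] {
                    for dj in [self.n_cols - 1, 0, 1] {
                        if di == 0 && dj == 0 {
                            continue; // skip the cell itself
                        }
                        let ni = (i + di) % n_rows;
                        let nj = (j + dj) % self.n_cols;
                        live += self.states[ni * self.n_cols + nj] as u8;
                    }
                }
                let alive = self.states[i * self.n_cols + j];
                // is_alive logic: 3 live neighbours, or alive with 2.
                new_states.push(live == 3 || (alive && live == 2));
            }
        }
        // Overwrite the old states once every cell has been evaluated.
        self.states = new_states;
    }
}

fn main() {
    // A 2x2 "block" is a still life: it must be unchanged by an update.
    let mut states = vec![false; 16];
    for &(i, j) in &[(1, 1), (1, 2), (2, 1), (2, 2)] {
        states[i * 4 + j] = true;
    }
    let mut board = Board { n_cols: 4, states: states.clone() };
    board.update();
    assert_eq!(board.states, states);
}
```

<p>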
Once every cell has been evaluated, the ciphertexts in the old states array are overwritten by the ciphertexts in the new_states array, and the process is ready to begin again.</p><h3>Main Program</h3><p>OK, that’s all the individual tools we needed to create. We can now start writing our main program algorithm. We start by generating the board and the client and server keys.</p><p>The FHE scheme applied by Concrete is a <em>symmetric</em> version of TFHE. This means that information is encrypted and decrypted with the same key, which must be kept secret. This is the client key, which allows a user to encrypt information, send it for processing and then decrypt the result.</p><p>The role of the client key is a familiar concept in cryptography. The server key occupies a role which is specific to FHE; it’s the thing that enables the <em>bootstrapping</em> process.</p><p>The structure of the server key can be more complex; however, the core idea is that it is an encrypted version of the client key that is used when executing the homomorphic decryption circuit during the bootstrap. The server key needs to be sent with the data to enable processing, which is why we need to include it in our sum function.</p><p>We then define the initial state of the board:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/c5b7ff9dac8588f6183a8ce6bfba5a58/href">https://medium.com/media/c5b7ff9dac8588f6183a8ce6bfba5a58/href</a></iframe><p>Now, your cargo project should contain a folder labelled “src”, containing the files “lib.rs” and “main.rs”, and a “Cargo.toml” file.</p><p>We now need to compile and run this code. In a terminal window, navigate to the homomorphic_game_of_life folder and run the command cargo run --release. This will compile and run the code. 
If you’re using the default settings included in the above code, this will define a 6 by 6 board with a ‘glider’: a configuration which moves diagonally (in this case, from the top left to the bottom right) while keeping its shape. The output is printed in the terminal after each update (it looks a bit better with a square font).</p><p>As you will probably notice, this is a bit slow, taking about a minute per update on an Intel i7 CPU @ 3.6GHz. The main reason is that it makes use of only one thread, while modern CPUs generally have several cores which can work in parallel. And, at first sight, it may seem that the Game of Life should be easy to parallelize: since each cell can be updated independently of the others, and since evaluating the new state of each cell is by far the most intensive part of the code, one can in principle use as many cores in parallel as there are cells in the board with negligible overhead.</p><p>A small difficulty in this regard is that the server key cannot be shared between threads (technically, this is because the ServerKey type does not implement the Sync trait; the reason is that a ServerKey has a buffer which is modified during the bootstrap, so only one thread should access it at any given time). However, there are ways around this, for instance duplicating the server key so that each thread can have its own (an example is given <a href="https://github.com/FlorentCLMichel/homomorphic_game_of_life">here</a>, taking about 10s per update).</p><p>As previously mentioned, there is also certainly room to make the boolean circuit used to update the cell states more efficient: for instance, the sum of the states of three consecutive cells on a row or column could be used for updating one cell on each side.</p><p><em>At Optalysys, we are using optics and silicon photonics to compute Fourier transforms at the speed of light and accelerate Deep Learning, scientific computing, and Homomorphic Encryption. 
Check out </em><a href="https://optalysys.com/"><em>our website</em></a><em> and </em><a href="https://medium.com/optalysys"><em>Medium page</em></a><em> for more information on what we are doing!</em></p><hr><p><a href="https://medium.com/zama-ai/concrete-boolean-and-conways-game-of-life-a-tutorial-f2bcfd614131">Concrete Boolean and Conway’s Game of Life: A Tutorial</a> was originally published in <a href="https://medium.com/zama-ai">Zama</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Fully Homomorphic Encryption and the Game of Life]]></title>
            <link>https://medium.com/optalysys/fully-homomorphic-encryption-and-the-game-of-life-d7c37d74bbaf?source=rss-41961c3986e0------2</link>
            <guid isPermaLink="false">https://medium.com/p/d7c37d74bbaf</guid>
            <category><![CDATA[cryptography]]></category>
            <category><![CDATA[computer-science]]></category>
            <category><![CDATA[encryption]]></category>
            <category><![CDATA[homomorphic-encryption]]></category>
            <category><![CDATA[optical-computing]]></category>
            <dc:creator><![CDATA[Optalysys]]></dc:creator>
            <pubDate>Wed, 20 Oct 2021 16:09:20 GMT</pubDate>
            <atom:updated>2021-10-20T16:09:20.020Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*FtzMta_2GuUwBxPigdiQAQ.png" /></figure><h4>Concrete Boolean for efficient FHE meets optical computing</h4><p><em>With </em><a href="https://medium.com/u/f0fec005a59a"><em>Florent Michel</em></a><em>, </em><a href="https://medium.com/u/9326ad4c51d1"><em>Joseph Wilson</em></a><em> and </em><a href="https://medium.com/u/2158f00e598e"><em>Edward Cottle</em></a><em>.</em></p><p>Fully Homomorphic Encryption (FHE) offers the ability to perform arbitrary operations on encrypted data, providing an elegant solution to one of the largest and hardest-to-solve security vulnerabilities in cloud computing: the need to decrypt data before processing it.</p><p>Imagine a world where organisations can share and collaborate on sensitive data without any risk of it being leaked. A world without the endless stream of database breaches or thefts. A world in which you can have a smart speaker in your house without worrying about who might be listening.</p><p>FHE is the solution, but right now there’s a catch: it is <em>incredibly</em> slow relative to unencrypted processing.</p><p>Thus far, the speed of FHE operations has been the main barrier to its uptake (if you need details, we include a <a href="https://medium.com/p/d7c37d74bbaf/#312a">summary</a> at the end of this article). It’s simply not fast enough to keep up with the vast amounts of data that need to be handled every single day. However, that’s starting to change.</p><p>New tools and new technologies are not just making FHE faster, they’re also making it considerably easier to use. 
The field is rapidly evolving, and we’re starting to see the practical groundwork being laid for applications that go beyond limited demonstrations.</p><p><strong>To highlight the changing face of FHE, in this article we’ll be combining cutting-edge advances in FHE and next-generation optical computing techniques to securely execute a classic example from computer science and complexity theory: Conway’s Game of Life.</strong></p><p><strong>And in our next article, we’ll be demonstrating how to perform a string search using Concrete Boolean. This will allow encrypted analysis of text for keywords, and is a precursor to more complex applications such as searching encrypted databases.</strong></p><p>Several fairly big advances have recently conspired to make these articles possible. The first is that we’ve finished demonstrating how we can use a core 2-dimensional optical Fourier transform to execute <em>arbitrary</em> Fourier transforms of any <a href="https://medium.com/optalysys/scaling-up-using-fourier-optics-for-bigger-transforms-c0667d4f0221">size</a>, <a href="https://medium.com/optalysys/the-optical-fourier-engine-a-fourier-transform-accelerator-in-arbitrary-number-of-dimensions-7497641443a1">shape</a> and <a href="https://medium.com/optalysys/higher-numerical-precision-for-optical-fourier-transforms-de515f575e76">precision</a>.</p><p>This allows us to apply the optical system to FHE schemes that use these operations to make multiplication much more efficient, without having to make any changes to the mathematics of the scheme (as we did in <a href="https://medium.com/optalysys/optical-computing-for-cryptography-fully-homomorphic-encryption-ee817ddc861c">our last article</a> on the subject, where we converted TFHE into a form that used 2d multinomial operations instead of polynomials). 
This in turn ensures that the security properties of these schemes are left untouched by the use of the optical calculation method.</p><p>The second is that we’ve finished making some adaptations to our simulator architecture to help us work with optical Fourier transforms. We’ve described this simulator before in <a href="https://arxiv.org/abs/2103.09044">our paper</a> on optical convolutions for neural networks; these recent updates were intended to make the simulator more modular, allowing us to use different sources for modelling or executing the core optical Fourier transform.</p><p>These sources range from full electronic simulation of the optical result through to results generated via linear combinations of real optical data (as used in our paper and previous articles), with the ultimate goal being to provide a connectivity framework and standard interface for our next generation of physical demonstrator chip systems.</p><p>The third is that <a href="https://zama.ai/">Zama</a>, one of the biggest names in FHE, recently released their <a href="https://github.com/zama-ai/concrete/tree/master/concrete-boolean">Concrete-Boolean</a> library.</p><h3>Concrete Boolean</h3><p>Zama is a big name in FHE for a reason. They’re the developers of a library called <a href="https://github.com/zama-ai/concrete">Concrete</a>, which provides the programming tools needed to implement an FHE scheme known as <a href="https://eprint.iacr.org/2018/421.pdf">TFHE</a> (FHE over the Torus). We’re not going to delve too deeply into the maths behind TFHE here as we’ve covered it to a degree in <a href="https://medium.com/optalysys/optical-computing-for-cryptography-fully-homomorphic-encryption-ee817ddc861c">our previous article on the subject</a>, but there are some deeply clever features in Concrete that make it extremely powerful.</p><p>The major draw of Concrete lies in the ability to perform what is known as “programmable” bootstrapping. 
Bootstrapping is the process in FHE that reduces the noise build-up in a ciphertext after repeated homomorphic operations, and normally represents a major source of computation which is otherwise wasted.</p><p>Programmable bootstrapping turns that around by allowing the bootstrapping stage to be used for useful calculation, specifically the application of non-linear functions to encrypted data. This provides a massive boost in speed to FHE calculations, especially for applications such as encrypted AI processing.</p><p>By using programmable bootstrapping to apply a ReLU activation function, Zama has achieved <a href="https://zama.ai/technology/">cutting-edge performance in homomorphically executing an AI network featuring hundreds of neurons</a>.</p><p>Zama has recently expanded Concrete with a new library that provides users with the tools to directly apply Boolean operations (computing gates) to ciphertexts. This includes the essential NAND and NOR gates, although Concrete Boolean directly supports the majority of computing gates (specifically, the one-input NOT gate, two-input AND, OR, XOR, NAND, NOR, XNOR gates, and the three-input MUX gate).</p><p>This is a major step forward that provides an additional level of abstraction and creates a handy target for “transpilers”, tools that convert source code from one language into another. It is now plausible to write a transpiler that breaks a program written in a language like C or Python down into Concrete Boolean gate operations, similar to the way that a compiler converts code into assembly language for physical circuits. This is an essential tool if complex programs are to be run in encrypted space.</p><p>So how fast is FHE these days? On a laptop running a modestly powerful processor (specifically, an Intel i7 CPU @ 2.6GHz in single-thread mode), the time to evaluate each binary gate under Concrete Boolean ranges between 11 and 18 ms depending on the security parameters used. 
That’s remarkable by the standards of the field, and an indication of how far FHE has come; the days of an entire server rack unit taking between 6 seconds and <a href="https://eprint.iacr.org/2010/520.pdf">half an hour</a> to execute a single bootstrap are long past.</p><p>When working on plaintext data the same laptop processor can evaluate multiple gates within a nanosecond, so FHE is still some way off being described as “quick”. That’s where our optical Fourier transform approach comes in, targeting the 70–90% of the total computational load of FHE that is taken up by performing Fourier transforms for polynomial multiplication. Through the optical method, we can provide enough of a boost in speed and power efficiency to definitively shift the balance and allow for the kind of high-value yet high-intensity applications that make FHE attractive.</p><p>Beyond the immediate utility of providing the gate-level abstraction, Concrete Boolean also takes a lot of the sting out of the cryptography itself by providing default 128-bit security parameter sets.</p><p>In short, not only does Concrete Boolean provide another significant bump in FHE performance by providing an optimised solution for gate operations, but you no longer need to be a cryptography pro to use it safely. Beyond any consideration of performance, that’s the kind of progress needed to make <em>practical</em> FHE a reality.</p><p>To demonstrate, let’s put this to the test.</p><h3>Conway’s Game of Life</h3><p>We’ve chosen Conway’s Game of Life (named after its inventor, the mathematician John Horton Conway) to demonstrate the capabilities of Concrete-Boolean. But what exactly is the Game of Life, and why is it such an important demonstration?</p><p>Well, first off, the Game of Life isn’t a game in the conventional sense; there are no players. It’s a computational process that you set up with some initial conditions and then watch it change over time. 
It also has a very basic rule set.</p><p>The Game of Life is “played” on a 2-dimensional square grid. The size of this grid isn’t fixed; it can be as large or as small as you like (at least, within the limits of computing memory). Each cell within this grid can be in one of two states: “alive” or “dead”, and each cell has 8 neighbours (including ones at the edge if the boundaries are treated as periodic). The system evolves in discrete iterations according to the following rules:</p><ol><li><em>Birth: </em>A dead cell with three (no more, no less) live neighbours becomes live on the next iteration.</li></ol><figure><img alt="" src="https://cdn-images-1.medium.com/max/358/1*tWEfcAAya2Vf9Aw-IONbIQ.png" /></figure><ol start="2"><li><em>Survival: </em>A live cell with two or three live neighbours lives on to the next iteration.</li></ol><figure><img alt="" src="https://cdn-images-1.medium.com/max/358/1*qEaf-JNvtnFcHW8LpGnASw.png" /></figure><ol start="3"><li><em>Death: </em>A live cell with fewer than two or more than three live neighbours dies on the next iteration.</li></ol><figure><img alt="" src="https://cdn-images-1.medium.com/max/358/1*Er5fN7CaBb45b_lvly2l_w.png" /></figure><p>The rules are only <em>slightly</em> more complex than, say, tic-tac-toe, but the behaviour of the system can be wildly unpredictable. The Game of Life is an example of a <strong>cellular automaton</strong>, a simple system from which complex and unexpected behaviour can arise. Stable configurations of cells can arise within the Game of Life; from the above rules, it’s fairly easy to see that the following configuration will remain stable over any number of iterations:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/146/1*zNPmHPSXmjRzvm2w4fQRfQ.png" /></figure><p>This shape is static; it doesn’t change across iterations. What is far more remarkable are the configurations that remain stable even as they move throughout the grid. 
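</p><p>The three rules above collapse into a single per-cell predicate, sketched here in plain Rust (the function name is illustrative):</p>

```rust
// A cell is live on the next iteration iff it has exactly 3 live
// neighbours (birth, or survival), or it is currently live with
// exactly 2 (survival). Every other case is death or staying dead.
fn next_state(live: bool, live_neighbours: u8) -> bool {
    live_neighbours == 3 || (live && live_neighbours == 2)
}

fn main() {
    assert!(next_state(false, 3));  // birth
    assert!(next_state(true, 2));   // survival
    assert!(next_state(true, 3));   // survival
    assert!(!next_state(true, 1));  // death by isolation
    assert!(!next_state(true, 4));  // death by overcrowding
    assert!(!next_state(false, 2)); // stays dead
}
```

<p>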
Below is an example of one such configuration known as a “glider” as it steps through 5 iterations, eventually returning to the original configuration in a different location.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/993/1*pfzCGKoKpYTm9a5bDDyMVQ.png" /></figure><p>Gliders are not the only class of structure that can arise within the game, but they are one of the more recognisable. The Game of Life and the related concepts that arise from cellular automata have had a significant impact on many different scientific fields (<a href="https://www.conwaylife.com/">see this for more information</a>), but why is the ability to run the Game of Life such an important showcase for Concrete Boolean?</p><p>Despite the simple rules, the Game of Life isn’t trivial. While deterministic in the sense that the outcome will always be the same for a given grid and initial starting configuration, it’s also very hard to predict an outcome from intuition. Structure and order arise seemingly spontaneously from random starting conditions; this will be an example of FHE doing something truly <em>computational</em>. In fact, <a href="https://uwe-repository.worktribe.com/output/822575/turing-machine-universality-of-the-game-of-life">the Game of Life is itself <em>Turing complete</em></a><em>, </em>which means that it is theoretically capable of accomplishing any sequence of operations that can be executed on a computer.</p><p>It’s even possible to run the Game of Life <em>within the Game of Life! 
</em>People have also taken things further, constructing functional computing architectures (such as an 8-bit Arithmetic Logic Unit) within the game.</p><iframe src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fwww.youtube.com%2Fembed%2FxP5-iIeKXE8%3Ffeature%3Doembed&amp;display_name=YouTube&amp;url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3DxP5-iIeKXE8&amp;image=https%3A%2F%2Fi.ytimg.com%2Fvi%2FxP5-iIeKXE8%2Fhqdefault.jpg&amp;key=a19fcc184b9711e1b4764040d3dc5c07&amp;type=text%2Fhtml&amp;schema=youtube" width="854" height="480" frameborder="0" scrolling="no"><a href="https://medium.com/media/85be5faf60e1b5b9164da70a509d1a12/href">https://medium.com/media/85be5faf60e1b5b9164da70a509d1a12/href</a></iframe><p>So to summarise, by running the Game of Life through FHE computations, we’re using a physical Turing machine (a desktop computer) to execute a homomorphic Turing-complete process (Concrete-Boolean) to simulate a cellular automaton that is <em>also</em> Turing-complete.</p><h3>Constructing the Game of Life using Concrete Boolean</h3><h4>General idea</h4><p>To build a homomorphic version of the Game of Life, we first need to convert the rules of the Game of Life into boolean expressions. Since each cell can be in one of two states (‘alive’ or ‘dead’), its state may be represented by a single boolean variable. We use ‘True’ to indicate that a cell is alive, and ‘False’ for dead.</p><p>After an update, a given cell is alive if and only if one of these two conditions is true:</p><ul><li>It was already alive and had 2 or 3 neighbours.</li><li>It was dead and had exactly 3 neighbours.</li></ul><p>The first thing we can do is figure out a way to implement this through Booleans in an unencrypted form. 
If the variable <em>alive(i) </em>denotes the state of the cell and <em>neighbours(i)</em> its number of live neighbours after <em>i</em> iterations, then the state of the cell after <em>i+1 </em>iterations is given by:</p><p><em>alive(i+1)</em> = (<em>alive(i)</em> AND (<em>neighbours(i)</em> = 3 OR <em>neighbours(i)</em> = 2)) OR (NOT(<em>alive(i)</em>) AND <em>neighbours(i)</em> = 3)</p><p>which can be slightly simplified to:</p><p><em>alive(i+1)</em> = (<em>neighbours(i)</em> = 3) OR (<em>alive(i)</em> AND <em>neighbours(i)</em> = 2).</p><p>This is the governing expression; we can apply it to each cell on each iteration and the system will evolve according to the rules.</p><p>This is already close to a form that we can work with in Concrete Boolean. However, we still need to compute the number of neighbours. Without encryption, this is easy: for each cell, we can just sum the states of the 8 neighbours, with ‘true’ counting for 1 and ‘false’ for 0. This is also technically possible in FHE: given ciphers corresponding to some numbers, summing them (at least under TFHE; other schemes may require more complex operations) will give you a cipher for their sum. However, there is no easy way to compare the result against the values 2 and 3 without decrypting it. While this is certainly possible using a combination of gate operations and programmable bootstrapping, we’ll give a simpler solution using only the former.</p><p>Additions, like any other computable operations, can be evaluated using boolean circuits. For instance, if <em>b1</em> and <em>b2</em> are two bits, the two-bit number <em>c1 c2 </em>where</p><p><em>c1</em> = <em>b1</em> AND <em>b2<br>and<br>c2 </em>=<em> b1 </em>XOR<em> b2</em></p><p>is their sum. In our case, since each cell has 8 neighbours, the sum of their states can be encoded in a 4-bit number. 
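</p><p>Both ingredients, the one-bit half adder and the governing update expression, can be checked with a short plaintext sketch (names illustrative; in the encrypted version each AND, OR and XOR becomes a homomorphic gate):</p>

```rust
// Half adder: the two-bit number (c1 c2) is the sum of bits b1 and b2,
// with c1 the high (carry) bit and c2 the low (sum) bit.
fn half_adder(b1: bool, b2: bool) -> (bool, bool) {
    let c1 = b1 & b2; // carry
    let c2 = b1 ^ b2; // sum
    (c1, c2)
}

// Governing expression:
// alive(i+1) = (neighbours(i) = 3) OR (alive(i) AND neighbours(i) = 2)
fn alive_next(alive: bool, neighbours: u8) -> bool {
    neighbours == 3 || (alive && neighbours == 2)
}

fn main() {
    assert_eq!(half_adder(true, true), (true, false));  // 1 + 1 = binary 10
    assert_eq!(half_adder(true, false), (false, true)); // 1 + 0 = binary 01
    assert!(alive_next(false, 3) && alive_next(true, 2));
    assert!(!alive_next(false, 2) && !alive_next(true, 4));
}
```

<p>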
We can use the Boolean addition method to calculate this number for any given cell; this is the central principle through which we can begin building our algorithm for calculating the number of living neighbours.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/704/1*whYghDuyNL-fgGyBRlkIaw.png" /><figcaption>Summing the neighbour cells via Boolean operations and accumulating the result for the centre cell. From the image we can see that there are 3 live cells surrounding the central cell, and this is reflected in the total accumulated binary value (ACC) after addition. This value can then be passed to the earlier algorithm that determines if the cell will be living or dead on the next iteration.</figcaption></figure><p>We can therefore reformulate the problem as follows:</p><ul><li>Design an accumulator circuit which can sum up to 8 bits (sequentially), with a 4-bit output.</li><li>Use it to compute the number of neighbours of each cell and plug it into the formula used to update its state.</li><li>Convert the circuit to work on encrypted data using Concrete-boolean.</li></ul><p>Before we turn to the implementation, there’s an additional simplification we can make. We do not actually need to know the exact value of the number of alive neighbours: we only need to know whether it is equal to 2, 3, or something else. In particular, we do not need to distinguish the values 0 and 8, since the cell will be dead in either case. For this reason, we actually only need 3 bits for the output of the adder, with the value 8 identified with 0. This will naturally happen with a 3-bit accumulator as a result of integer overflow.</p><h4>Implementation</h4><p>Let’s now put the above into practice. Our code, like the Concrete library, is written in Rust. More information on the Concrete Boolean library and how to use it can be found in the <a href="https://docs.zama.ai/concrete/boolean-lib/introduction.html">documentation</a>. 
We’re also writing this with replication in mind, so for those so inclined to get to grips with Concrete Boolean we include some of the behind-the-scenes information about how Rust works.</p><p>Here, we start by importing the relevant functions and types from Concrete-boolean. Don’t worry too much about the syntax if you’re unfamiliar with Rust; this is conceptually similar (at least from the user point of view) to importing a library in C or Python:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/9047ffa527614eed80ed37b5edc67baf/href">https://medium.com/media/9047ffa527614eed80ed37b5edc67baf/href</a></iframe><p>Here, gen_keys is the function used to generate the client and server keys, ServerKey is the type of the server_key, and Ciphertext is the type of the ciphertext. For Concrete Boolean, these key and ciphertext types correspond to the default pre-set security parameter options, so we won’t be interfering with them.</p><p><strong>NB: </strong>If you use this code in a library, you may want to add the keyword pub before use so that users have access to these functions and types.</p><p>To use it, you will need to import the concrete-boolean library by adding the following line to the dependencies section of your Cargo.toml file:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/07d019debd2eda560567e37245e55309/href">https://medium.com/media/07d019debd2eda560567e37245e55309/href</a></iframe><p>Let’s first tackle how to build the accumulator. For simplicity, we can break this up into two parts. The first part takes one server key, one encrypted bit, and a tuple of three ciphertexts encoding a 3-bit number, and returns an encryption of their sum. 
This is essentially a simple <em>adder circuit </em>made out of homomorphic Boolean gates:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/5d581579742211f90bd3331ab912e39c/href">https://medium.com/media/5d581579742211f90bd3331ab912e39c/href</a></iframe><p>This code is relatively straightforward: for each bit of the result, we perform an XOR operation with either the input a or the carry, and compute the next carry via an AND (except for the last bit, where no carry is needed). All operations are performed homomorphically thanks to the server key.</p><p>One element which may surprise readers who aren’t familiar with Rust (or C++, which uses a similar notation) is the use of the ampersand (&amp;). This is used to pass objects by reference rather than by value; a function with an argument starting with an ampersand takes the memory address of the object to find it when needed, rather than creating a new object or moving it. (The details are, of course, a bit more involved than that. For the interested reader, the <a href="https://doc.rust-lang.org/book/ch04-02-references-and-borrowing.html">Rust book</a> provides a good introduction to how objects can be passed to functions in Rust.) Unlike in C++, in Rust the ampersand must be specified both in the function declaration and when called.</p><p>Another peculiarity of Rust compared with other popular low-level languages is that a return statement is not always needed. 
Specifically, if the last line of a function is an expression which does not end with a semicolon, then the value of this expression is returned.</p><p>We can now use the above function to write the second part, the accumulator which sums a sequence of encrypted bits modulo 8:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/2963cd8659b1e584c0ccf3e6d80663f0/href">https://medium.com/media/2963cd8659b1e584c0ccf3e6d80663f0/href</a></iframe><p>Again, this function is fairly straightforward. It takes as arguments a server key, a vector of ciphertexts, and a tuple of three encryptions of 0. It then defines the result by summing the first element with zeros, accumulates the other elements, and returns the result. Notice that this function will crash (or ‘panic’ in Rust terminology) if elements is empty. This is not a problem in our case as we know that it will always be of size 8.</p><p>Finally, the function is_alive below returns an encryption of ‘true’ if the cell is alive after the update, and ‘false’ otherwise:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/549d360099a67ce94bc601320151450a/href">https://medium.com/media/549d360099a67ce94bc601320151450a/href</a></iframe><p>This is a direct implementation of the formula we described earlier for updating the state of a cell. The only difference is that we check whether the sum has value 2 <em>or</em> 3 rather than just 2 for the second option. This is because this check is actually simpler, as it does not require us to read the lowest bit.</p><p>The core logic is now implemented! Now we only need to define a board structure and a function to handle the evolution in time. 
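</p><p>The is_alive check can be sanity-tested in plaintext: writing the 3-bit sum as s2 s1 s0 (s0 the lowest bit), “sum = 3” is s0 AND s1 AND NOT s2, while “sum = 2 or 3” needs only the two upper bits. The function name here is illustrative, and the final expression is returned implicitly, Rust-style:</p>

```rust
// Plaintext sketch of the is_alive logic on a 3-bit neighbour count.
fn is_alive_plain(alive: bool, s0: bool, s1: bool, s2: bool) -> bool {
    let sum_is_3 = s0 & s1 & !s2;      // binary 011
    let sum_is_2_or_3 = s1 & !s2;      // binary 01x: ignores the lowest bit
    sum_is_3 | (alive & sum_is_2_or_3) // implicit return (no semicolon)
}

fn main() {
    // sum = 3 (binary 011): live next iteration regardless of current state.
    assert!(is_alive_plain(false, true, true, false));
    // sum = 2 (binary 010): survives only if already alive.
    assert!(is_alive_plain(true, false, true, false));
    assert!(!is_alive_plain(false, false, true, false));
    // sum = 4 (binary 100): dies.
    assert!(!is_alive_plain(true, false, false, true));
}
```

<p>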
For definiteness, we use periodic boundary conditions, which ensure that every cell has 8 neighbours:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/6bf437af6f1bcc85fb99ab6068a0b784/href">https://medium.com/media/6bf437af6f1bcc85fb99ab6068a0b784/href</a></iframe><h4>Example</h4><p>Here is a simple example of use, with a 6 by 6 board:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/dc505558716bfd0c78e253355c1ecc1d/href">https://medium.com/media/dc505558716bfd0c78e253355c1ecc1d/href</a></iframe><p>This initialises and calculates the same ‘glider’ configuration that we showed pictorially before, a pattern that moves diagonally while keeping its shape.</p><p>To be able to see the system evolve, we also need some way of extracting the information each time the system changes. At each iteration, we return a ciphertext that encodes the updated state of the board. The above code also decrypts each of these ciphertexts and prints the result on the terminal (the output thus looks better with a square font). Here is the result:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/82/1*J8DwZiFr5vgiRTC_AYpobQ.gif" /></figure><h3>Optical Acceleration</h3><p>The above code works on electronic computers; you can try it out for yourself by copying the above gists into a single Rust file and building it (you’ll also need to have a <a href="https://www.rust-lang.org/tools/install">Rust compiler</a>, the <a href="http://www.fftw.org/">FFTW</a> Fast Fourier Transform library and the <a href="https://docs.zama.ai/concrete/lib/installation.html#importing-concrete-in-your-project">Concrete library</a>).</p><p>However, as we said above, we aim to demonstrate that key operations in this process can be executed and accelerated by <em>optical</em> Fourier transform hardware. 
To this end, we also wrote a version of the above code that makes use of a simulated optical Fourier transform and returns performance benchmarks.</p><h4>The simulated optical Fourier transform</h4><p>In previous articles, everything we’ve done has used results from physical demonstrator hardware to perform calculations. In this article, we’re using a simulator written in support of our beta program to execute the calculations. This is mainly because executing a serious amount of FHE already requires intensive resources; doing so using the current demonstrator system (which uses thermal modulators that operate in the kHz range) would be slower still, so here we reach for a compromise between speed and accuracy.</p><p>This simulator also uses the software interface designed for the physical system, and will be available to partners on our beta program for benchmarking and testing applications.</p><p>At the heart of the simulation is a 2-dimensional complex Fourier transform executed to 10 bits (signed 4-bit real and imaginary values) of precision. The raw value also contains an amount of simulated optical noise representative of what we see in real optical Fourier transforms. We then take this value and process it according to the steps we’ve described in previous articles for re-shaping and increasing the precision of an optical transform.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*WJh7QUqVaxHqZtkaTcqx3g.png" /><figcaption>Step-by-step overview of the cycle of operations executed by the simulator. The reshaping and precision-increase operations, while executed in software here, are functions that can be executed at very high speed by the supporting digital logic for the optical chip. The optical simulator therefore acts according to the same workflow as the physical system we have designed. 
Performance estimates are given based on the number of individual optical frames required to process the ciphertexts for both the forward and inverse Fourier transform stages.</figcaption></figure><h4>Performance</h4><p>We executed the Game of Life FHE model as described in the previous section, but with the simulated optical Fourier transform process used in place of the standard FFT function used by Rust. The only difference to the code is made through the use of a custom version of Concrete where the call to execute the Fourier transform is replaced with a call to the custom FT function.</p><p>This custom version is somewhat slower than the original, for two reasons. First, since the optical device is fundamentally different than a CPU, the optimal way to perform a Fourier transform on the former is not optimal on the latter. This is because our current optical device effectively works with a radix-49 decomposition for the FFT algorithm (climbing to radix-121 for the final hardware design), while CPUs tend to work better with radix-2 or 4 decompositions. Second, we need to simulate the behaviour of the optical device, which takes longer than just computing a Fourier transform in the usual electronic fashion.</p><p>But the speed of our customised version of Concrete is not directly relevant here: our goal is not to develop a faster version of this library (simulating a non-electronic device would certainly not be the right way to proceed), but to estimate how fast it can run using an optical Fourier transform chip operating with the speed and efficiency of the architecture we have developed. Based on these parameters and the results from our simulations, there is good news here: the system could execute the Game of Life demo much faster than a CPU or GPU!</p><p>To be more specific, for the 6 by 6 glider example, a modern CPU (Intel i7 @ 3.6GHz) takes about one minute per iteration. 
Most of this calculation is expended in executing the Fourier transforms needed for efficient ciphertext multiplication. By comparison, an optical device running at 1GHz (the target frequency of our design) would take less than a second to perform the Fourier transforms required for each iteration. <strong>We can thus expect to reach a speed-up of a factor of 60, simply by using the optical Fourier transform in place of the electronic one!</strong></p><p>In practice, the speed-up for the whole calculation will be smaller because other operations will become the limiting factor. Eliminating the Fourier transform as the primary bottleneck will introduce memory access rates as the limiting factor to FHE, although this is a problem that can be addressed through engineering; overcoming the fundamental limits of the electronic Fourier transform is <em>considerably</em> harder, and our technology is the only hardware that can achieve this improvement.</p><p>Taking these caveats into account, we can realistically expect an order-of-magnitude improvement. And this is just the beginning: greater improvements in speed and efficiency will be realised with next-generation optical devices and software improvements that make better use of the optical Fourier transform.</p><h3>Appendix: A brief intro to FHE (and why it’s currently so slow)</h3><p>We’ve described why FHE is so slow in some detail in a <a href="https://medium.com/optalysys/optical-computing-for-cryptography-fully-homomorphic-encryption-ee817ddc861c">previous article</a>, but here’s a quick(ish) introduction that describes what FHE is, how it works and what that means for calculations.</p><ul><li>Most encryption schemes don’t preserve the property of <em>homomorphism. </em>Under most encryption schemes, adding or multiplying pieces of encrypted information will not return the correct output on decryption. 
This is possible under FHE.</li><li>If you add or multiply two pieces of encrypted data together within an FHE scheme, then when the result is decrypted it should be the same as if you added or multiplied two unencrypted pieces of data together. (Note, however, that the addition and/or multiplication in encrypted space may be more complex than the usual ones.)</li><li>If you can perform additions and multiplications, you can perform the basic operations of Boolean algebra on the data. Ultimately this allows you to perform any sequence of circuit operations that a CPU would, but on encrypted data.</li><li>This is <em>inherently</em> slower than working on unencrypted data, with the bulk of the computation lying in the multiplication of large polynomials. However, to make matters worse, we have to contend with <em>noise.</em></li><li>Practical FHE schemes are built on some variation of the learning-with-errors problem, which means that some noise has to be added to the information as part of making it secure.</li><li>Adding or multiplying encrypted data increases the amount of noise present in the information. If this noise becomes too large, the data cannot be decrypted. This would impose a limit on the <em>depth </em>of the circuit you can construct.</li><li>To get around this problem and allow <em>fully </em>homomorphic encryption, you can construct and run the decryption circuit for the scheme using homomorphic operations.</li><li>This process “decrypts” the data you are working on into a form that is <em>still encrypted,</em> but reduces the noise back down to a manageable level. This process is called “Bootstrapping”, and allows us to evaluate circuits of arbitrary depth.</li><li>However, now you have to keep executing that decryption circuit repeatedly to manage the noise. This makes FHE even slower; generally speaking, it’s about a million times slower than working on unencrypted data.</li></ul><p>So how can we speed this up?
Fortunately, where optical computing is concerned, there’s a very easy win.</p><h4>Optical Fourier transforms for FHE</h4><p>Most FHE schemes use a variant of the learning-with-errors problem known as <em>ring</em>-learning with errors (RLWE), where data is expressed as the coefficients of polynomial functions. This makes the ciphertexts a bit more compact and means that all the multiplication steps involved are polynomial multiplications; these can be efficiently accelerated by using the Fourier transform and the convolution theorem.</p><p>This is well-established; most FHE libraries and tools already require that a dedicated fast Fourier transform library be installed. TFHE (the basis for Concrete) advises the use of <a href="https://www.fftw.org/">FFTW3</a>, which stands for “The Fastest Fourier Transform in the West (version 3)”. However, while these libraries are highly optimised, they’re still limited by the algorithmic complexity of the electronic FFT.</p><p>Despite this limitation, executing Fourier transforms is <em>still</em> more efficient than the naive polynomial multiplication approach (the method typically taught in schools, where the number of operations scales quadratically with the size of the polynomials). While more efficient, this is still remarkably expensive; between 70–90% of the computational work done in executing a sequence of FHE calculations is taken up <em>just by calculating the Fourier transforms!</em></p><p>If you have a very fast, very power efficient optical computing system for executing the Fourier transform, you can <em>vastly </em>accelerate these calculations in a way that no other hardware can.
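For reference, the quadratic schoolbook method mentioned above is easy to write down (a generic sketch of the naive approach, not code from Concrete or TFHE):

```rust
/// Schoolbook polynomial multiplication: every coefficient of `a`
/// multiplies every coefficient of `b`, so the work scales as
/// O(N*M) in the polynomial sizes — the cost that the
/// Fourier-transform route avoids.
fn poly_mul(a: &[i64], b: &[i64]) -> Vec<i64> {
    let mut out = vec![0i64; a.len() + b.len() - 1];
    for (i, &ai) in a.iter().enumerate() {
        for (j, &bj) in b.iter().enumerate() {
            out[i + j] += ai * bj;
        }
    }
    out
}

fn main() {
    // (1 + 2x)(3 + 4x) = 3 + 10x + 8x^2
    assert_eq!(poly_mul(&[1, 2], &[3, 4]), vec![3, 10, 8]);
    println!("ok");
}
```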
An individual optical Fourier transform is nearly instantaneous and is an O(1) operation, which means that the time taken to perform the task doesn’t increase even as you add more data points to the calculation.</p><p>A system that executes a billion 11x11 optical Fourier transforms every second (and can rapidly recombine and reshape the results into an arbitrary transform) can blast through the bulk of the calculations required for FHE faster than any corresponding electronic device. Even systems designed for mass parallel computation (such as GPUs) are at a disadvantage; not only do these systems still need to perform an electronic Fourier transform, but they also need sufficient data to be passed to the system to make effective use of the parallel cores.</p><p>That works for the kind of multiplications you might do for deep learning (we can also accelerate that too, by using the same convolution theorem principles that we apply to FHE to eliminate excess operations in certain deep learning network types), but FHE ciphertexts typically aren’t so large that GPUs are tremendously useful, certainly not to the extent that they need to be to make FHE a practical reality. 
By contrast, an optical Fourier transform system is effectively an extremely fast <em>serial</em> device, delivering maximum performance under a broader range of computational loads.</p><p>In summary, the optical Fourier transform approach is a uniquely powerful tool not just for FHE, but for any process where rapid, large scale Fourier transform calculation is either essential (e.g. signals analysis and telecoms) or useful (convolutions).</p><hr><p><a href="https://medium.com/optalysys/fully-homomorphic-encryption-and-the-game-of-life-d7c37d74bbaf">Fully Homomorphic Encryption and the Game of Life</a> was originally published in <a href="https://medium.com/optalysys">Optalysys</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Higher numerical precision for optical Fourier transforms]]></title>
            <link>https://medium.com/optalysys/higher-numerical-precision-for-optical-fourier-transforms-de515f575e76?source=rss-41961c3986e0------2</link>
            <guid isPermaLink="false">https://medium.com/p/de515f575e76</guid>
            <category><![CDATA[next-generation-tech]]></category>
            <category><![CDATA[optical-computing]]></category>
            <category><![CDATA[fourier-transform]]></category>
            <category><![CDATA[technology]]></category>
            <dc:creator><![CDATA[Optalysys]]></dc:creator>
            <pubDate>Mon, 23 Aug 2021 16:52:35 GMT</pubDate>
            <atom:updated>2021-08-23T16:52:35.100Z</atom:updated>
            <content:encoded><![CDATA[<p>We’ve described before how our optical computing system has some inherent advantages over electronic methods when it comes to executing the Fourier transform. Beyond questions of speed or efficiency, because we’re performing calculations using a near-instantaneous 11x11 optical Fourier transform as a basic element (corresponding, more or less, to a radix-121 decomposition), we can cut down on the fundamental algorithmic complexity of this process.</p><p>What we’re now covering in this article is the second half of a bigger picture, the question of how we can use a physical Fourier transform of a given size and shape to compute <em>arbitrary </em>digital transforms. We’ve already laid out how we can use this ability to construct larger Fourier transforms in terms of spatial <em>dimension</em>, but thus far we haven’t addressed the question of <em>precision</em>.</p><p>Precision in a result matters. Without it, calculations can return answers which are badly wrong. Ensuring the precision of the results returned by our optical system is critical to everything that we see the system being used for.</p><p>Combined with our prior work on extending the size and dimension of the transforms we can process, we can now comprehensively demonstrate how the Fourier-optical method of calculation can be used to construct a <em>general-purpose</em> Fourier transform processor of tremendous efficiency.</p><p>How efficient? Top-range electronic hardware for mass Fourier transform calculation can perform approximately 3 billion 64-bit operations per Joule of electrical energy transferred to the system. <a href="https://medium.com/p/de515f575e76#dfd7"><strong>The combined optical-electronic architecture that we are developing can improve on this <em>by a factor of more than 4!</em></strong></a></p><p>When it comes to an entirely new form of computing, the rules which governed precision in previous technologies can no longer be assumed to hold. 
In this article, we provide rigorous mathematical demonstrations alongside illustrations that lay out the rules for <em>ensuring</em> that the output of a Fourier-optical computing system is both correct and accurate.</p><p>The method we adopt is not just highly efficient from a computational perspective. It also gives us immense flexibility with respect to different applications, as it allows us to provide accurate results at <em>any</em> precision. This allows us to work to the desired precision of any host system where ultra-fast transforms are needed with no changes to the optical architecture.</p><p>This is a long article, and it may be that not all of it is of interest to every reader, so we provide links to the different sections below.</p><h3>Article contents</h3><ul><li><a href="https://medium.com/p/de515f575e76#bb82"><strong>Precision in digital and analogue systems</strong></a></li><li><a href="https://medium.com/p/de515f575e76#ed38"><strong>Bit shifting</strong></a></li><li><a href="https://medium.com/p/de515f575e76#1e2e"><em>Bit shifting in practice</em></a></li><li><a href="https://medium.com/p/de515f575e76#df61"><em>Linear separability in the Fourier transform</em></a></li><li><a href="https://medium.com/p/de515f575e76#2f96"><strong>Increasing accuracy using convolutions: the chirp z-transform and Bluestein’s algorithm</strong></a></li><li><a href="https://medium.com/p/de515f575e76#a162"><em>Chirp z-transforms as generalisations of Fourier transforms</em></a></li><li><a href="https://medium.com/p/de515f575e76#7ccd"><em>Bluestein’s algorithm</em></a></li><li><a href="https://medium.com/p/de515f575e76#c618"><em>Complexity considerations</em></a></li><li><a href="https://medium.com/p/de515f575e76#54f2"><em>Why use Bluestein’s algorithm to compute a Fourier transform?</em></a></li><li><a href="https://medium.com/p/de515f575e76#b90a"><em>Increasing the accuracy of the convolution by splitting the inputs in different frames</em></a></li><li><a 
href="https://medium.com/p/de515f575e76#ac5a"><em>A visualisation of increasing precision</em></a></li><li><a href="https://medium.com/p/de515f575e76#dfd7"><strong>Impact on optical computing efficiency</strong></a></li></ul><h3>Precision in digital and analogue systems</h3><p>In an electronic context, <em>precision</em> refers to the number of bits that are used to describe a number. For example, a 4-bit binary value can represent 16 (2⁴) different levels in base 10 representation.</p><p>Performing calculations to a given precision in digital systems is notionally a solved problem. In practice, precision (and more generally <em>how</em> numbers are represented in computers) is still a subject of ongoing development, especially in an age where advances in silicon feature size have slowed.</p><p>In recent years, this has led to a move towards low-precision computing in certain applications as a way of getting more effective performance out of a silicon manufacturing node. The advantages are immediate; lower-precision values require far less hardware real estate in terms of logic elements, memory and bus width, thus allowing more parallel operations to be performed with the same number of components. In a world where the OPS rate is an important metric, that’s a big deal.</p><p>This move has been performed alongside research into the suitability of low-precision representations. The findings can be surprising; <a href="https://papers.nips.cc/paper/2020/file/13b919438259814cd5be8cb45877d577-Paper.pdf">even 4-bit numerical representations</a> have been found to be suitable for some AI networks.</p><p>However, this research has been performed in purely digital systems. Our optical computing system is at heart a kind of <em>analogue</em> computer. Analogue values are <em>continuous</em>; they don’t take discrete levels, but represent numbers to an arbitrary degree of precision.
In our case, we use the intensity and phase of light as a way of representing complex numerical values and performing calculations on them. Notionally, these properties of light can take <em>any </em>value, which would pose a problem for digital systems that can only accept a limited range.</p><p>In practice, because we are encoding digital values into the light, we should be able to extract discrete values from the output. This is <em>still true</em>, but we face a couple of unique challenges:</p><ol><li>One of the properties of the Fourier transform is that the output of the transformation of a step function is a spatially continuous function that must be accurately sampled.</li><li>The fundamental mathematics behind the Fourier transform mean that the values we retrieve are often <em>irrational</em>, and therefore difficult to quantise.</li></ol><p>We need a way of overcoming both of these problems. Fortunately, they’re quite closely related, which means that we can develop a solution that addresses both at the same time.</p><p>Let’s start from the basics:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/730/1*W-7SqmQad87T-73aVk7f2w.png" /><figcaption>The Fourier transform of a step-function (left) is a continuous sinc function in frequency space (right). 
This is an inherent property of the Fourier transform, not a consequence of analog computing.</figcaption></figure><p>If we consider an input grid of <em>spatially</em> discrete sources of light (with perfect fill-factor of the grid), we can see from the above that the output of these points will be continuous in terms of amplitude even with discrete inputs.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/355/1*DkJVfr9yqQJUREoYXiINyA.png" /><figcaption>A 2-dimensional grid of discrete input amplitudes.</figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*1GOZr5EfW1rb6ZY9uRY4jg.png" /><figcaption>A 3D representation of the above; discrete amplitudes (represented by height and colour) over discrete spatial locations</figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*6vdNF6COwlFh7wDYSiKAgw.png" /><figcaption>The continuous Fourier transform of the discrete values. Ensuring that this is correctly quantised into a digital signal is not trivial; any errors will grow if the result is used as part of a higher-precision calculation</figcaption></figure><p>While spatially discrete sampling of the above should return the correct answer, the continuous nature of the transform means that in practice we need some additional guarantees.
What we need is a sure-fire method of mapping discrete inputs to discrete outputs, like so:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*MYFFy2d0uios32m1PCPbwQ.png" /><figcaption>The problem that we’re trying to solve in this article: how to go from a discrete input to a discrete output in an analog optical computing system such that we can reliably build higher precision in a result.</figcaption></figure><p>Then, even in the presence of noise or other aberrations that might otherwise flip a bit the wrong way, it should be possible to correctly assign each signal to the corresponding bit-value.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/773/1*o9Xewcs0ZzXxzo6y5zMtcg.png" /><figcaption>Quantised 2-bit outputs with some optical noise; as long as the noise falls within half the distance between levels, we can successfully assign each signal to the correct digital representation.</figcaption></figure><p>The above has important implications if we want to be able to increase the precision of a calculation. There is a technique that we can borrow from digital electronics called <em>bit-shifting</em> that allows us to do this, but attempting to use bit-shifting without addressing the above can lead to errors which grow exponentially at each iteration.</p><p>We need a solution to this; one which overcomes this problem in a way that yields results which are both <em>correct </em>(returns the output we would expect for a transform executed to the same precision in a digital system) and <em>deterministic </em>(sending the same input several times should yield the same outcome) when the output of an optical Fourier transform is converted into a digital representation. One which is also fast, and doesn’t hinder the advantage that optics can provide.</p><p>Is there such a solution? There is. 
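The noise-tolerant quantisation illustrated in the caption above can be sketched as a simple rounding rule (our illustration, assuming unit spacing between levels; the names are not from any Optalysys code):

```rust
/// Map a noisy analogue reading onto the nearest of `levels`
/// equally spaced digital outputs. The assignment is correct
/// whenever the noise is smaller than half the spacing between
/// adjacent levels.
fn quantise(value: f64, levels: u32) -> u32 {
    value.round().clamp(0.0, (levels - 1) as f64) as u32
}

fn main() {
    // 2-bit outputs: levels 0..=3, spacing 1.0.
    assert_eq!(quantise(1.93, 4), 2); // noise of -0.07: still level 2
    assert_eq!(quantise(2.40, 4), 2); // noise of +0.40 < 0.5: still level 2
    assert_eq!(quantise(3.20, 4), 3);
    println!("ok");
}
```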
In this article, we lay out how we can use the properties of <em>convolution</em> to ensure that these conditions are met.</p><h3>Bit shifting</h3><p>First, let’s start with a description of how we can reconstruct a higher-precision digital result from operations performed in a lower-precision system. Doing so involves manipulating the properties of binary numbers. Consider a binary value such as</p><p>110011100110 = 3302</p><p>The individual values running from left to right represent decreasing powers of 2; we can re-assemble the base-10 value represented by this binary value by adding these powers of two together.</p><p>100000000000 = 2¹¹ = 2048</p><p>010000000000 = 2¹⁰ = 1024</p><p>000010000000 = 2⁷ = 128</p><p>000001000000 = 2⁶ = 64</p><p>000000100000 = 2⁵ = 32</p><p>000000000100 = 2² = 4</p><p>000000000010 = 2¹ = 2</p><p>2048 + 1024 + 128 + 64 + 32 + 4 + 2 = 3302</p><h4>Bit shifting in practice</h4><p>We can further manipulate this property; for example,</p><p>1100 + 0001 = 1101</p><p>is normally equal to (12+1 = 13), but if we then <em>shift</em> that value to the left by 4 places (here, we do this simply by adding zeroes on the right), this changes the value of the result</p><p>11010000 = 208</p><p>This is the same as directly adding</p><p>11000000 + 00010000 = 208</p><p>Alternatively, if we then perform another operation on a pair of 4-bit values, e.g.</p><p>0101 + 0011 = 1000 (5+3 = 8)</p><p>we can <em>append </em>this result to the end of the first value as we shift it left, resulting in</p><p>11011000 = 216</p><p>which is equivalent to writing 208 + 8 = 216.</p><p>The above process is <em>also</em> equivalent to directly adding 11000101 and 00010011 together! If we can only work on a limited number of bits during an operation, this gives us a way of building up the precision in a result. However, this is only possible if the operation we are performing is <em>linearly separable. 
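The arithmetic above can be checked directly with plain Rust and the standard shift operators:

```rust
fn main() {
    // 1100 + 0001 = 1101 (12 + 1 = 13)
    let first = 0b1100u32 + 0b0001;
    assert_eq!(first, 0b1101);
    // Shifting left by 4 places appends four zeroes: 11010000 = 208
    assert_eq!(first << 4, 208);
    // A second 4-bit operation: 0101 + 0011 = 1000 (5 + 3 = 8)
    let second = 0b0101u32 + 0b0011;
    assert_eq!(second, 0b1000);
    // Appending it to the shifted first result gives 11011000 = 216
    assert_eq!((first << 4) | second, 216);
    // ...the same as adding the full 8-bit values 11000101 and 00010011
    assert_eq!(0b11000101u32 + 0b00010011, 216);
    println!("ok");
}
```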
</em>Here’s what that means for Fourier transforms.</p><h4>Linear separability in the Fourier transform</h4><p>Consider a set of values arranged over a 2-dimensional grid.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*IFtKqegVOMqZ9FbWZPk67Q.png" /></figure><p>We can take the Fourier transform of this grid directly, or we can separate this grid like so…</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*lGtW3xl_4yNUdaiApvZLIg.png" /></figure><p>…and then perform the Fourier transform on each sub-grid, before summing the results to retrieve the Fourier transform of the original grid. The values over which we perform the Fourier transform can thus be <em>linearly separated</em>; you can check this property using the following gist:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/feaa8e2c516ac92863cf3eb9f5090fa9/href">https://medium.com/media/feaa8e2c516ac92863cf3eb9f5090fa9/href</a></iframe><p>The exact result for the above is also an integer-valued complex number.¹ If the system is deterministic and the outputs are also expected to be integers, then we can use linear separability and bit-shifting indefinitely for greater precision in the Fourier transform.</p><p>But what about the case where we’re working with <em>floating-point </em>numbers, the standard digital method for representing decimal values (e.g 1.156832)? These come up quite frequently in mathematics and we have to be able to account for them in our system. We also need to account for them if the size of the Fourier transform in one direction is larger than 4, as even an integer-valued input will then yield (except in specific cases such as a uniform input) a complex-valued output with irrational real and imaginary parts.</p><p>Fortunately, this is a simple process; most programming languages do this automatically (e.g. Python, Julia, and C/C++) or explicitly (e.g. 
Rust), allowing mixed use of floats and integers with conversion performed as needed. Here’s how this is done in electronic systems:</p><ol><li>The integer is converted into a representation in terms of the <em>sign</em> (positive or negative) and a binary representation which is always positive.</li><li>This positive binary number is then converted to a <em>fixed-point</em> representation by shifting the decimal point (hence the “floating-point” name) to the right of the most significant bit in the binary value.</li><li>The number of times the decimal needs to be shifted is called the <em>exponent</em>. The binary value to the right of the decimal is now known as the <em>mantissa</em>.</li><li>The exponent is expressed in an offset-binary representation.</li><li>The floating-point value is represented as a combination of the sign bit, the exponent, and the mantissa</li></ol><p>This operation can be performed very quickly in digital systems. Floating-point values can be converted to integers, processed optically, bit-shifted and converted back into floats <em>much </em>faster than performing the Fourier transform directly on a grid of floats. Despite operating on two different types of number, the outcome of the above methods is identical.</p><p>However, all the above is dependent on ensuring those vital conditions of determinism and correctness in the result, so let’s now take a look at how we ensure this.</p><h3>Increasing accuracy using convolutions: the chirp z-transform and Bluestein’s algorithm</h3><p>In this section, we show how we can solve the discrete-to-discrete mapping problem for an analogue-optical Fourier transform. This algorithm can actually be used to compute a more general transformation called the <em>chirp z-transform.</em></p><p>Chirp-z transforms are a <em>generalisation</em> of the discrete Fourier transform. 
Before we explain what this means, we first consider the term highlighted in the discrete Fourier transform equation below:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*IzXAk1FO3bLGD8HK6qNLdw.png" /></figure><p>This term represents a “complex root of unity”, complex-number solutions to the equation</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/800/1*AUbef1ePEgLNnZEl58CT9w.png" /></figure><p>for some positive integer <em>n</em>. The complex roots of unity all lie on the same circle in the complex plane, called the<em> unit circle</em>, with centre 0 and radius 1.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/389/1*vuacmSSyNvtenYyagMsc7A.png" /><figcaption>Complex roots for n=6 lying on the unit circle</figcaption></figure><p>The symmetries in the complex roots of unity are the key to understanding the utility of most Fast Fourier transform algorithms, as it is this symmetry that allows the FFT to cut down on what would otherwise be duplicate calculations in the discrete Fourier transform. However, this also presents a problem for precision limited systems; the complex roots of unity (when expressed in the usual <em>a + bi</em> form of representing complex numbers) generally² have irrational coefficients.</p><p>To relate this back to an optical Fourier transform calculation, what we are detecting at the photodiode array is also a complex number in the form <em>a + bi</em>, where the values of <em>a</em> and <em>b</em><strong> </strong>are determined by the quantisation of the photocurrent. However, attempting to quantise an irrational value with a limited number of bits will introduce a source of <em>truncation error </em>into the result. If we then bit-shift these results, the magnitude of this error will also grow to the same scale as the maximum shift.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/346/1*3oQhYnTGLGVCnZ2Mrdvqmg.png" /><figcaption>A “pure” complex number. 
The real and imaginary parts can have irrational values.</figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*geOj2-RHS344BA73XMwJSw.png" /><figcaption>The same complex number relative to the discrete grid of points that can be represented with increasingly large binary value representations of complex numbers. The distance between the “true” complex value and the nearest grid point represents a source of inaccuracy known as a truncation error; this error decreases as the precision of the result in bits increases.</figcaption></figure><p>We can see in the above that increasing the precision of the result will reduce the truncation error, but what do we do if we need to increase this precision through bit-shifting? Bluestein’s algorithm gives us a way of overcoming the truncation error by re-phrasing the Fourier transform in terms of a <em>discrete convolution</em>. Since this can be used to compute any chirp-z transform using an optical system, we first briefly describe this transformation, which has its own set of applications.</p><h4>Chirp z-transforms as generalisations of Fourier transforms</h4><p>Formally, a chirp z-transform is a function that takes a complex-valued vector as input and returns another complex vector, obtained by summing its coefficients with weights given by powers of some fixed complex number. More precisely, the chirp z-transform of input size <em>N</em> and output size <em>M</em> with parameter <em>z</em>, where <em>N</em> and <em>M</em> are positive integers and <em>z</em> is a complex number, is the function defined by:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/987/1*Ka6kcOAkRZNXUO-amQnXqw.png" /></figure><p>Notice that, if <em>M = N</em> and <em>z = </em>exp(-2 i π / <em>N</em>), cz(x) is a discrete Fourier transform of <em>x</em>. 
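This special case is easy to check numerically with a direct O(N·M) evaluation of the definition (an illustrative sketch with a hand-rolled complex type; not part of the optical pipeline, and all names are ours):

```rust
#[derive(Clone, Copy, Debug)]
struct C { re: f64, im: f64 }

impl C {
    fn add(self, o: C) -> C { C { re: self.re + o.re, im: self.im + o.im } }
    fn mul(self, o: C) -> C {
        C { re: self.re * o.re - self.im * o.im,
            im: self.re * o.im + self.im * o.re }
    }
    // Integer power by repeated multiplication (fine for small n*m).
    fn pow(self, n: u32) -> C {
        (0..n).fold(C { re: 1.0, im: 0.0 }, |acc, _| acc.mul(self))
    }
}

/// cz(x)_m = sum over n of x_n * z^(n*m), for m = 0 .. m_out-1.
fn chirp_z(x: &[C], z: C, m_out: usize) -> Vec<C> {
    (0..m_out)
        .map(|m| {
            x.iter().enumerate().fold(C { re: 0.0, im: 0.0 }, |acc, (n, &xn)| {
                acc.add(xn.mul(z.pow((n * m) as u32)))
            })
        })
        .collect()
}

fn main() {
    // With M = N and z = exp(-2*pi*i/N), the chirp z-transform is the DFT.
    let n = 4;
    let theta = -2.0 * std::f64::consts::PI / n as f64;
    let z = C { re: theta.cos(), im: theta.sin() };
    let x: Vec<C> = [1.0, 2.0, 3.0, 4.0]
        .iter()
        .map(|&v| C { re: v, im: 0.0 })
        .collect();
    let out = chirp_z(&x, z, n);
    // DFT of [1, 2, 3, 4] is [10, -2+2i, -2, -2-2i]
    assert!((out[0].re - 10.0).abs() < 1e-9 && out[0].im.abs() < 1e-9);
    assert!((out[1].re + 2.0).abs() < 1e-9 && (out[1].im - 2.0).abs() < 1e-9);
    println!("ok");
}
```

Other choices of z (off the unit circle, or with a phase that is not an integer fraction of 2π) run through exactly the same code but no longer reproduce a DFT.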
In this sense, chirp z-transforms are direct generalisations of discrete Fourier transforms.</p><p>As with the DFT and the complex roots of unity, they may be understood geometrically as follows: a chirp z-transform corresponds to summing the components of a vector multiplied by weights which are distributed on a logarithmic spiral in the complex plane, with a fixed angle difference between two consecutive points and starting from 1​.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*J1WofoRLnj2OioZEvZySdQ.png" /><figcaption><em>Illustration of the effect of changing the parameter z​ on the chirp z-transform for N = 12​. Black dots show the positions in the complex plane of the powers of ​z by which the coefficients of the input vectors are multiplied, and the blue line materialises (part of) the logarithmic spiral where they are located. In the middle panel, ​z has modulus 1​ and its phase is an integer fraction of </em>2π<em>​. Its powers are then regularly spaced on the circle with centre ​0 and radius 1​, and the chirp z-transform is equivalent to a discrete Fourier transform. In the left and right panels, the modulus of z is, respectively, smaller than and larger than 1​, and its phase is not an integer fraction of ​</em>2π<em>. The corresponding transformation is thus different from the discrete Fourier transform.</em></figcaption></figure><p>In the next few sections, we show how the chirp z-transform (and thus the discrete Fourier transform, as a particular case) can be reformulated as a convolution. This is a bit more technical, so at the end of this article we also provide a visual interpretation that ties everything together.</p><p>The reason we may want to re-phrase the chirp z-transform as the outcome of a convolution is that we can then ensure that the results fit into known quantisation bands, eliminating the potential for error. 
Of course, any optical noise present in the result will also be carried forwards with the convolution, but as described previously this can be accounted for in the quantisation scheme.</p><p>This rephrasing can be efficiently performed using an algorithm attributed to Leo Bluestein, often referred to as <em>Bluestein’s algorithm</em>. When used in conjunction with the ability of the optical system to perform an instantaneous Fourier transform, this approach allows us to compute a chirp z-transform or discrete Fourier transform with an arbitrary level of precision.</p><p>In the following, for simplicity we will assume that the input size <em>N</em> and output size <em>M</em> are equal. The approach easily generalises to different values by either padding the input with zeros if <em>M &gt; N</em> or truncating the output if <em>M &lt; N</em>.</p><h4>Bluestein’s algorithm</h4><p>The main idea of Bluestein’s algorithm is to rephrase the chirp z-transform as a convolution with periodic boundary conditions, which can be performed efficiently using Fourier transforms. It relies on a very simple observation: for any two integers <em>m</em> and <em>n</em>,
For each positive integer ​<em>L</em>, we define the two operations ⚬ (element-wise multiplication) and ⊛​ (periodic convolution) on complex vectors with <em>L</em> components as follows: for any two such vectors and each integer <em>l</em>​ between 1 and <em>L</em>,</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/883/1*4NtXcESrlq8oZsDsJACJOw.png" /></figure><p>and</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/919/1*4OhfDR22qI246DPl8MV5iQ.png" /></figure><p>We also choose a positive integer <em>​K</em> no smaller than <em>2N-1</em> ​ and define the following two vectors:</p><ul><li>a vector with <em>N</em> elements containing the relevant powers of ​<em>z</em>:</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/798/1*tmOSfRirV4LzOj6cg63iSg.png" /></figure><ul><li>a vector with <em>K</em> elements which contains the relevant negative powers of <em>​z</em> as its first <em>N</em>​ components, the inverse powers of <em>z</em>​ in reversed order as its ​ last <em>N-1</em> components, and is padded with zeros:</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/979/1*KMClj58K_n6LQxAvwgsUWg.png" /></figure><p>Finally, we define the padding function <em>P </em>(input: <em>N</em>-components vector; output: <em>K</em>-components vector) and the truncation function <em>T</em> (input: <em>K</em>-components vector; output: <em>N</em>-components vector) by:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/939/1*hJqEsK6OZZqqlD_ulpD_4A.png" /></figure><p>and</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*z2n1kHQVhJDKTJwUbPa9CQ.png" /></figure><p>Then, for each complex vector <em>x</em> of size <em>N</em>, we have</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/964/1*16A6BLwBvgNJzCrX353sqQ.png" /></figure><p>This means that the chirp z-transform of an input <em>x​</em> can be computed in 5 steps:</p><ol><li>Compute the element-wise product of <em>x</em> 
with <em>c</em>.</li><li>Pad the result with zeros to size <em>K</em>.</li><li>Compute the periodic convolution with a pre-computed vector.</li><li>Truncate the result to size <em>N</em>.</li><li>Compute the element-wise product of the result with <em>c</em>.</li></ol><h4>Complexity considerations</h4><p>We mentioned previously that we can use Bluestein’s algorithm as an efficient tool for executing the chirp-z transform; let us now take a moment to estimate the complexity of each of the steps covered above.</p><p>Steps 1 and 5 involve <em>N-1</em> complex multiplications each (technically <em>N</em>, but the first component of the vector <em>c</em> is 1, so there is no need to perform the multiplication explicitly). This can be done with either <em>4 (N-1)</em> real multiplications and <em>2 (N-1)</em> real additions, or <em>3 (N-1)</em> real multiplications and <em>3 (N-1)</em> real additions.</p><p>Step 2 requires us to append <em>K-N</em> zeros to a vector. In practice, one may directly store the result of step 1 in a larger array pre-filled with zeros. Step 4 simply involves reading <em>N</em> elements from an array. So, steps 1, 2, 4, and 5 all have a linear complexity in <em>K</em>.</p><p>Step 3 is the real bottleneck. Indeed, the convolution of two vectors with length <em>K</em>, if performed using the above definition, requires <em>(K-1)²</em> complex multiplications and <em>K (K-1)</em> complex additions. Except for the smallest values of <em>K</em> (and thus of <em>N</em>), this is much more demanding than all the other steps combined.</p><p>Fortunately, it is possible to significantly reduce this complexity using Fourier transforms.
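</p><p>Before turning to that optimisation, the five steps can be checked end-to-end. The sketch below is our own naive implementation, with the periodic convolution computed straight from its definition; the algebra above guarantees it reproduces the chirp z-transform for any non-zero <em>z</em>:</p>

```python
import numpy as np

def bluestein_czt(x, z, K=None):
    """Chirp z-transform via the five steps above (naive periodic convolution)."""
    N = len(x)
    if K is None:
        K = 2 * N - 1                        # any K >= 2N-1 will do
    n = np.arange(N)
    c = z ** (n * n / 2.0)                   # chirp weights z^(n^2/2)
    # Pre-computable kernel: z^(-n^2/2) in the first N slots, mirrored at the end
    h = np.zeros(K, dtype=complex)
    h[:N] = z ** (-(n * n) / 2.0)
    h[K - N + 1:] = h[1:N][::-1]
    a = np.zeros(K, dtype=complex)
    a[:N] = x * c                            # steps 1 and 2: weight, then zero-pad
    # Step 3: periodic convolution from the definition (O(K^2) here)
    conv = np.array([sum(a[j] * h[(k - j) % K] for j in range(K)) for k in range(K)])
    return c * conv[:N]                      # steps 4 and 5: truncate, re-weight

x = np.arange(1, 9, dtype=complex)
z = np.exp(-2j * np.pi / len(x))
assert np.allclose(bluestein_czt(x, z), np.fft.fft(x))
```

<p>The naive loop in step 3 is exactly the O(K²) bottleneck described above; the Fourier-transform identity below removes it.</p><p>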
Indeed, denoting by ​<em>F</em> the discrete Fourier transform of size <em>K​</em>, we have for any vectors <em>x</em> and <em>y</em> of size <em>K</em>:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/969/1*nomSDNotpRGm-mrvraEUnw.png" /></figure><p>Using a fast Fourier transform algorithm, the asymptotic complexity of this operation becomes <em>O(K </em>log <em>K)</em>. However, this can itself be efficiently accelerated using Fourier-optical hardware, as detailed <a href="https://medium.com/optalysys/optalysys-what-we-do-and-why-we-do-it-20ab416c5ad0">in</a> <a href="https://medium.com/optalysys/the-optical-fourier-engine-a-fourier-transform-accelerator-in-arbitrary-number-of-dimensions-7497641443a1">these</a> <a href="https://medium.com/optalysys/fourier-optical-computing-acceleration-squared-e711641312dc">articles</a>. Notice also that, in Bluestein’s algorithm, one of the two vectors in the convolution is independent of the input ​<em>x</em>. Its Fourier transform can thus be pre-computed, eliminating the need to perform an additional operation using the optical system.</p><h4>Why use Bluestein’s algorithm to compute a Fourier transform?</h4><p>At this point, it may seem like we are going round in circles: we mentioned that discrete Fourier transforms are a special case of chirp z-transforms, but we now need Fourier transforms to compute the latter efficiently. And, indeed, if all you are interested in is the Fourier transform function <em>​F</em>, Bluestein’s algorithm will be of little use to you. There are, however, at least two special cases where it is very useful.</p><p>The first one is to compute Fourier transforms of arbitrary sizes. The reason why most fast Fourier transform algorithms such as the Cooley-Tukey algorithm are efficient is that they split the original Fourier transform into many smaller ones and then recombine the results using the symmetries in the complex roots of unity. 
For instance, if <em>N=</em>2ⁿ for some integer <em>n</em>, one can evaluate a Fourier transform of size <em>N</em> by performing <em>n2ⁿ⁻¹</em> Fourier transforms of size 2 and as many complex multiplications. The resulting asymptotic complexity is <em>O(N </em>log<em> N)</em>, which is much better than the complexity <em>O(N²)</em> of using the raw definition of the Fourier transform. However, this is only efficient if <em>N</em> is the product of many small factors. If <em>N</em> has large prime factors, such algorithms become much less efficient; typically, their complexity (without other optimizations) is at least <em>Ω(p²)</em> where <em>p</em> is the largest prime factor of <em>N</em>.</p><p>Bluestein’s algorithm solves this problem by allowing us to evaluate a Fourier transform of size <em>N</em> using Fourier transforms of a different size <em>K</em>, with the only constraint that <em>K</em> be no smaller than <em>2N-1</em>. For instance, for <em>N = 7919</em> (which is a prime number), it can be evaluated using Fourier transforms of size <em>K = 16384 = 2¹⁴</em>. While <em>K</em> is larger than <em>N</em>, Fourier transforms of size <em>K</em> are typically much faster to evaluate in practice because they can be decomposed into Fourier transforms of size 2.</p><p>The same argument also holds for hardware designed to efficiently compute Fourier transforms with a size different from 2, such as the Optalysys technology: starting from a Fourier transform of size <em>S</em> (and assuming <em>S &gt; 1</em>, of course), a Fourier transform of an arbitrary size <em>N</em> can be computed by using the Cooley-Tukey algorithm to perform Fourier transforms with length equal to the smallest power of <em>S</em> larger than or equal to <em>2N-1</em>, then using Bluestein’s algorithm to reduce this size to <em>N</em>.</p><p>The second important use case is when the Fourier transform cannot be directly computed with the desired accuracy.
Indeed, Bluestein’s algorithm allows us to compute Fourier transforms with an arbitrary level of accuracy by using a base convolution function with a finite accuracy. Let us sketch how this works.</p><h4>Increasing the accuracy of the convolution by splitting the inputs into different frames</h4><p>We focus on the convolution step of Bluestein’s algorithm, as this is the main bottleneck in the process and the one which uses Fourier transforms. The convolution has three important properties:</p><ul><li>It is a bilinear function, <em>i.e.</em>, it is linear in each of its inputs.</li><li>It maps integer vectors to integer vectors: if <em>x</em> and <em>y</em> are two complex vectors with the same number of elements and if the real and imaginary parts of each coefficient of <em>x</em> or <em>y</em> are integers, then each coefficient of <em>x ⊛ y</em> also has integer real and imaginary parts.</li><li>It is bounded: if <em>x</em> and <em>y</em> are two complex vectors with the same number <em>K</em> of elements, and if the absolute values of the real and imaginary parts of each element of <em>x</em> and <em>y</em> are smaller than some number <em>M</em>, then the absolute values of the real and imaginary parts of each element of <em>x ⊛ y</em> are smaller than <em>K M²</em>.</li></ul><p>We’re especially interested in the second point, which allows us to map discrete integer inputs to discrete integer outputs despite the irrational numbers which appear in the Fourier transform.</p><p>The second and third properties imply the following: if each coefficient of <em>x</em> and <em>y</em> can be represented with <em>b</em> bits, where <em>b</em> is a positive integer, then each coefficient of <em>x ⊛ y</em> can be represented exactly with <em>2b+2 + </em>⌈log<em> N</em>⌉ bits (or <em>2 + </em>⌈log<em> N</em>⌉ if <em>b</em> = 1), where the logarithm is in base 2.
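</p><p>These integrality and boundedness properties are easy to verify directly. The sketch below (helper names are ours) convolves two vectors whose real and imaginary parts are signed <em>b</em>-bit integers, and checks that the output parts are exact integers within the stated bit budget:</p>

```python
import numpy as np

def circ_conv(x, y):
    """Periodic convolution, straight from the definition."""
    K = len(x)
    return [sum(x[j] * y[(k - j) % K] for j in range(K)) for k in range(K)]

rng = np.random.default_rng(1)
K, b = 16, 4

def rand_vec():
    # Complex entries whose real and imaginary parts are signed b-bit integers
    re = rng.integers(-2 ** (b - 1), 2 ** (b - 1), K)
    im = rng.integers(-2 ** (b - 1), 2 ** (b - 1), K)
    return [complex(int(r), int(i)) for r, i in zip(re, im)]

out = circ_conv(rand_vec(), rand_vec())
# Integer inputs give exactly-representable integer outputs...
assert all(v.real.is_integer() and v.imag.is_integer() for v in out)
# ...whose parts fit comfortably within about 2b + 2 + ceil(log2 K) signed bits
bound = 2 ** (2 * b + 1 + int(np.ceil(np.log2(K))))
assert all(abs(v.real) < bound and abs(v.imag) < bound for v in out)
```

<p>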
The last term can be reduced if <em>x</em> or <em>y</em> has some vanishing coefficients.</p><p>To return briefly to our earlier description of the chirp-z transform, note that this property does <em>not</em> apply to the generic Fourier transform: since the DFT involves multiplications by complex numbers with irrational real and imaginary parts (except in the particular cases of size 1, 2, or 4), the second property above is <em>not</em> satisfied, and the output will generally not be representable exactly with a finite number of bits even for integer inputs.</p><p>Let us now see how this can be used to increase the accuracy of the convolution performed by a finite-precision compute device.</p><p>Let <em>x</em> and <em>y</em> be two vectors. Assume we wish to compute their convolution with <em>b</em>₁ bits of accuracy for both the input and output, using convolution hardware (for instance, an optical Fourier engine with additional logic to perform the component-wise multiplication) with <em>b₂</em> bits of accuracy, where <em>b₂ &lt; b</em>₁. For simplicity, we assume that <em>b₂ &gt; 2 + </em>⌈log<em> N</em>⌉.³</p><p>One possible strategy is the following (note that it can be adapted and optimized for specific workflows, <em>e.g.</em> if the required accuracy on the input and output are different, or to use powers of an integer different from 2):</p><ol><li>Find two scale factors <em>s</em>ₓ and <em>s</em>ᵧ such that the integer parts of each coefficient of <em>s</em>ₓ <em>x</em> and of <em>s</em>ᵧ <em>y</em> are accurate to <em>b</em>₁ bits.
Define the vectors <em>x̄</em> and ​ ȳ by discarding the fractional parts of the real and imaginary parts of <em>s</em>ₓ <em>x</em>​ and <em>s</em>ᵧ <em>y</em>​, respectively.</li><li>Decompose <em>x̄ </em>and​ ȳ into vectors with at most <em>b</em> bits per real or imaginary part of each coefficient, where <em>b</em> is a positive integer such that <em>2 b + </em>⌈log <em>N</em>⌉<em>​ &lt; b</em>₂ if such an integer exists, or <em>b = 1</em> otherwise, so that</li></ol><figure><img alt="" src="https://cdn-images-1.medium.com/max/981/1*uSpSvfaF_Cobg2_37EceWg.png" /></figure><p>and</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/924/1*ftEs1hgYso-VmUv_4R330w.png" /></figure><p>3. The convolution of​ <em>x</em> and <em>y </em>(the output of which is, as a reminder, the discrete Fourier transform or chirp z-transform that we seek)​ can then be computed using the formula:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/847/1*jfm4evqOnQfuJJRNJJLLgg.png" /></figure><p>where each of the convolutions now only involve real numbers with at most ​ <em>b</em>₂ bits (and can thus be computed using the convolution hardware), and the ​ <em>b</em>₁ under the equal sign means ‘equal within <em>b</em>₁​ bits of accuracy’.</p><p>If <em>x </em>is a vector of outputs from the detection of an optical Fourier transform, the above convolution process will (via the properties of the chirp-z transform) thus produce an output which is <em>exact </em>with respect to the precision of the detection scheme!</p><h4>A visualisation of increasing precision</h4><p>We’ll now tie together everything we’ve outlined above using a visual perspective.</p><p>Let us say that we have already computed a Fourier transform to a high degree of numerical precision. 
For illustrative purposes, we select <em>one</em> of the values of this output and display it in complex space as a red dot.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/346/1*3oQhYnTGLGVCnZ2Mrdvqmg.png" /></figure><p>Let us now assume that we want to use a precision-limited optical device and bit-shifting to perform the same transform to the same degree of precision.</p><p>Using the property of linear separability in the Fourier transform, we can feed the input value into the system as separate “frames” of bit-words which are then encoded into light and emitted into the free-space processing stage of our system, with the Fourier transform of these values being detected by the receiving side of the system.</p><p>In this example, for the purposes of visualisation, we assume a device capable of 2-bit signed complex output. In practice, the system we are developing has a higher output precision.</p><p>In the following, we pick the same output value from the high-precision Fourier transform to focus on. The complex binary outputs <em>closest</em> to this value are represented as a green dot. The purpose of Bluestein’s algorithm can be summarised as ensuring that the quantised outputs of the optical Fourier transform correctly “snap” to these values every time.</p><p>For each frame that we pass through the system, the use of bit-shifting means that we are effectively “zooming in” to the region that contains the higher-precision result on each frame.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*XaMHYcZtKRtEZwK8awi9aA.png" /></figure><p>It may not <em>look</em> like the precision of the result is increasing in the above, but if we instead stack the output of each frame, scaled and positioned relative to the last…</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/714/1*EH9iskXKfjZmi75MJV5kpg.png" /></figure><p>… we can see that outputs of each frame are indeed steadily converging on the correct answer. 
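</p><p>In code, this stacking amounts to a shift-and-add over the per-frame outputs. A toy sketch with exact 2-bit integer frames (assumed values; in practice each frame carries its own quantisation, and real and imaginary parts are handled separately):</p>

```python
# Four 2-bit frames, most significant first, as produced by successive
# bit-shifted passes through the optics (toy values, assumed exact).
frames = [1, 3, 0, 2]

value = 0
for f in frames:
    value = (value << 2) + f   # each new frame refines the running estimate

assert value == 0b01110010     # the reconstructed 8-bit result (114)
```

<p>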
Because the Fourier transform is linearly separable, we can then bit-shift the results to retrieve the higher-precision binary value:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/452/1*9H1UaEkcJhg9t65W0TGjVA.png" /><figcaption>Bit-shifting and summation of the results. In practice, the negative values wouldn’t be subtracted, but would instead be represented in two’s complement form and then added.</figcaption></figure><p>Using this algorithm and the methods outlined above, discrete Fourier transforms with arbitrary accuracy can be efficiently obtained from finite-precision analog hardware. Furthermore, this technique can be efficiently combined with the method of constructing larger Fourier transforms of different sizes and dimensions from the optical output, allowing for the construction of a <em>general-purpose</em> Fourier transform engine that can deliver results of arbitrary size and precision.</p><h3>Impact on optical computing efficiency</h3><p>The above methods make it possible to reliably increase accuracy when digitising an optical Fourier transform. But what additional overheads does this pose for the efficiency and speed of the optical method?</p><p>This is a crucial question. In this section, we break down the process outlined above to give an overview of what this means for performance relative to other approaches to calculating the Fourier transform.</p><p>Our system as designed passes a full frame of data through the optics at a rate of 1 GHz; one 11x11 frame every nanosecond.
The precision of the input is a 4-bit real value; the native precision of the result is a 10-bit complex number with signed 4-bit real and imaginary components.</p><p>The algorithmic complexity of reconstructing a larger Fourier transform using a base 2-dimensional optical transform <a href="https://medium.com/optalysys/scaling-up-using-fourier-optics-for-bigger-transforms-c0667d4f0221">can be shown</a> to be:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/630/0*yeZUJ0wDQvpCUfMe.png" /></figure><p>where <em>nx, ny</em> are the <em>x</em> and <em>y</em> dimensions of the base 2D grid. Precision can be built into a large-transform result obtained using this technique by using the same methods described in this article for a native 11x11 grid.</p><p>With respect to the complexity of boosting precision, we’ve already given a breakdown of the algorithmic complexity of Bluestein’s algorithm in the section on <a href="https://medium.com/p/de515f575e76#c618">complexity considerations</a>, in which we demonstrate that the asymptotic complexity of Bluestein’s algorithm is notionally <em>O(K </em>log<em> K)</em> where <em>K</em> is the size of the x and y vectors in the convolution. This is a consequence of the complexity of the <em>electronic</em> Fourier transform; however, we have access to an <em>O(1)</em> optical transform. As the other operations in Bluestein’s algorithm are linear in time, we can therefore use the optical system to perform the transforms involved in the convolution, reducing the overall complexity to <em>O(K)</em>.</p><p>Normally, this process would involve two forward Fourier transform operations and an inverse transform, all performed in optics.
However, as one of the forward transforms is static and can be pre-computed, its optical pass can be eliminated.</p><p>So how many frames (or, more usefully for transforms of arbitrary dimension, distinct <em>layers</em> of additional transformation) do we require?</p><p>It’s helpful to use a transform of a given size to set some benchmarks; here we choose to estimate the evaluation of a 1331x1331 transform to 60-bit complex precision using a device with a native 11x11 grid working with 10-bit complex numbers. There are three main steps we need to consider.</p><ul><li>First, the 1331x1331 input grid needs to be decomposed into 11x11 ones. This yields 121x121 = 14641 sub-grids.</li><li>Second, the results of the smaller Fourier transforms need to be combined to reconstruct the 1331x1331 Fourier transform. As we sketched in <a href="https://medium.com/optalysys/scaling-up-using-fourier-optics-for-bigger-transforms-c0667d4f0221">this article</a>, this can be done using a two-dimensional equivalent to the Cooley-Tukey algorithm, which requires 3 optical frames for each sub-array.</li><li>Finally, each of the 60-bit inputs needs to be split into 10-bit slices. This step multiplies the number of frames needed by 6²+6 = 42.</li></ul><p>The total number of frames needed to process the result to the required precision is thus 14641×3×42 = 1,844,766. This may look like a lot, but the equivalent on an electronic system requires about 183,857,824 64-bit operations!</p><p>So what does this mean in the context of the hardware that we’re developing? We’ll have a dedicated article out soon on the specs of our system, but we can openly state some of them here.</p><p>The optical computing architecture that Optalysys are developing will process 1,000,000,000 optical frames per second, corresponding to a processing speed of 99.7 giga-operations per second.
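</p><p>Putting that frame rate against the frame count from the 1331x1331 example above gives a feel for end-to-end latency (a back-of-envelope sketch using the figures already quoted):</p>

```python
sub_grids = (1331 // 11) ** 2        # 121 x 121 = 14641 base 11x11 transforms
recombine = 3                        # optical frames per sub-array (Cooley-Tukey)
slices = 6 ** 2 + 6                  # 60-bit inputs split into 10-bit slices
total_frames = sub_grids * recombine * slices
assert total_frames == 1_844_766

frame_rate = 1_000_000_000           # optical frames per second
print(total_frames / frame_rate)     # seconds per full high-precision transform
```

<p>At one frame per nanosecond, the whole 60-bit 1331x1331 transform takes under two milliseconds.</p><p>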
The power draw of the core architecture is just 2W; since we intend to use 4 Etiles to process fully-complex inputs, this yields a normalised processing efficiency of 12.5GopJ (giga operations per Joule). By comparison, the best current Fourier transform hardware for mass FT calculation can perform at <a href="https://developer.nvidia.com/cufft">about 3GopJ</a> for the same operation.</p><p>Not only is this a massive improvement over electronic systems (even at higher precision!), but the combination of speed and the calculation methods we’ve adopted means that the precision of a result can be controlled by <em>software</em> without losing either the speed advantage <em>or</em> wasting architectural real-estate. This means that an optical approach isn’t just suitable for making a general-purpose Fourier transform processor, it also allows us to make the <em>best </em>FT processing core possible.</p><p>We’ve been developing our next-generation optical system for some time now. In our next articles, we’ll be releasing details not only on the operating parameters of the core architecture, but our intentions for making them a reality on the mass production scale.</p><p>Until then, you can visit <a href="http://www.optalysys.com">www.optalysys.com</a> for more information on what we’re up to.</p><p>¹ This property is specific to Fourier transforms of size 1, 2, or (as in the present case) 4 in each direction. 
For other sizes, the Fourier transform of an array of integers will generally involve irrational numbers.</p><p>² Except for <em>n</em> = 1 (the only root is 1 itself), <em>n</em> = 2 (the two roots are 1 and -1) and <em>n</em> = 4 (the four roots are 1, -1, i, and -i).</p><p>³ If that is not the case but <em>b₂ &gt; 1</em>, one can first write <em>x</em> as a sum of several vectors with at most <em>k</em> non-zero coefficients each, where <em>k</em> is a positive integer such that <em>b₂ &gt; 1 + </em>⌈log<em> k</em>⌉, go through the algorithm with <em>x</em> replaced by each of these vectors, replacing <em>N</em> by <em>k</em>, and sum the results.</p><hr><p><a href="https://medium.com/optalysys/higher-numerical-precision-for-optical-fourier-transforms-de515f575e76">Higher numerical precision for optical Fourier transforms</a> was originally published in <a href="https://medium.com/optalysys">Optalysys</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
    </channel>
</rss>