<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[Stories by Tom White on Medium]]></title>
        <description><![CDATA[Stories by Tom White on Medium]]></description>
        <link>https://medium.com/@tom_25234?source=rss-2518aef51e6d------2</link>
        <image>
            <url>https://cdn-images-1.medium.com/fit/c/150/150/1*3FyK5MY5CHIDii-VcVY-wg.jpeg</url>
            <title>Stories by Tom White on Medium</title>
            <link>https://medium.com/@tom_25234?source=rss-2518aef51e6d------2</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Tue, 07 Apr 2026 01:19:29 GMT</lastBuildDate>
        <atom:link href="https://medium.com/@tom_25234/feed" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[Synthetic Abstractions]]></title>
            <link>https://medium.com/@tom_25234/synthetic-abstractions-8f0e8f69f390?source=rss-2518aef51e6d------2</link>
            <guid isPermaLink="false">https://medium.com/p/8f0e8f69f390</guid>
            <category><![CDATA[perception]]></category>
            <category><![CDATA[neural-networks]]></category>
            <category><![CDATA[machine-learning]]></category>
            <category><![CDATA[artificial-intelligence]]></category>
            <category><![CDATA[art]]></category>
            <dc:creator><![CDATA[Tom White]]></dc:creator>
            <pubDate>Thu, 23 Aug 2018 09:18:59 GMT</pubDate>
            <atom:updated>2018-09-03T12:21:10.447Z</atom:updated>
<content:encoded><![CDATA[<p><em>My </em><a href="https://medium.com/artists-and-machine-intelligence/perception-engines-8a46bc598d57"><em>previous post</em></a><em> describes the process and methodology behind my recent series of ink prints. This post is an update on the project that also examines the outputs themselves more closely and what they might represent.</em></p><h3>Perception Engine Series</h3><p>After the <em>Treachery of ImageNet</em> series, I completed a subsequent series of ten similar Riso prints called <em>Perception Engines</em>.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*-DNN_HJoR0e4u2wvPQKupg.jpeg" /><figcaption>Perception Engines: cello, cabbage, hammerhead shark, iron, tick, starfish, binoculars, measuring cup, blow dryer, and jack-o-lantern</figcaption></figure><p>The main change in this series is that I wanted all prints to be in exactly the same style. So all ten in this series are from the same version of the codebase and use nearly identical hyper-parameters. This means that the only differences in the resulting prints come from the “creative objective” — in this case, different ImageNet label targets.</p><p>With the style held constant, it is more evident how the Perception Engines settle on distinct targets from similar starting conditions. In fact, it is possible to interpolate between drawings such as the hammerhead shark and the iron:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*htmP2onECy2Ajurp-3apyw.gif" /><figcaption>Interpolation between two finished drawings with responses on six tested neural net architectures</figcaption></figure><p>This visualization makes it clear that the system is expressing its knowledge of global structure independent of surface features or textures.</p><h3>Generalization and Abstraction</h3><p>A goal of this work is to investigate visual representations independent of any particular neural architecture. Research on adversarial images uses the related term “transferability” for cases where an example crafted against one neural network also “transfers” to another specific target network. This work extends that idea to “generalization” — the result should transfer as broadly as possible to unknown architectures and weights.</p><p>As an example, let’s consider the tick print. It was created using inputs from six neural network architectures: InceptionV3, MobileNet, NASNet, ResNet50, VGG16, and VGG19. However, it has since been shown to generalize to almost every other neural architecture (usually with ridiculously high scores), including those like DenseNet, which did not exist when the print was made.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*IXjPyNg98NZh9VtO1gSNog.jpeg" /><figcaption>A photo of the tick print yields very high scores across neural architectures. The graphs with yellow background are from the six models used to create this print, and the other seven results suggest that this result generalizes well across architectures.</figcaption></figure><p>Another way to quantify how well this print generalizes to new neural architectures is to compare the result with the ImageNet validation data. ImageNet validation images are drawn from the same distribution as the training images but held out from training, so they serve as the best benchmark of how a network should be expected to respond after training. When using trained weights from InceptionResNetV2 (a model not used to make the print), a photo of this print scores higher than all fifty official ImageNet tick validation images.</p>
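<p>A comparison like this can be sketched in a few lines of Python. To be clear, this is an illustrative sketch using the Keras applications API with hypothetical file paths, not the exact code used for this work:</p><pre>
import numpy as np
from tensorflow.keras.applications.inception_resnet_v2 import InceptionResNetV2, preprocess_input
from tensorflow.keras.preprocessing import image

TICK = 78  # ImageNet class index for "tick"

# InceptionResNetV2 was not part of the ensemble used to make the print.
model = InceptionResNetV2(weights="imagenet")

def tick_score(path):
    # Load a photo, preprocess it, and return the softmax score for "tick".
    img = image.load_img(path, target_size=(299, 299))
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    return float(model.predict(x)[0][TICK])

print_score = tick_score("photos/tick_print.jpg")  # hypothetical path
val_scores = [tick_score("imagenet_val/tick_%02d.jpg" % i) for i in range(50)]
print(print_score > max(val_scores))
</pre>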
<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*ci1ZHuHQsOtzaxaYb2d1yQ.jpeg" /><figcaption>Tick print with all fifty tick ImageNet validation images in order of response on InceptionResNetV2</figcaption></figure><p>I was floored by this result: this ink print elicits a “tick response” more strongly than any of the real images the network is expected to encounter. To describe this effect I’ve coined the term “Synthetic Abstraction”. My interpretation is that it is possible for neural networks to create a visual abstraction that serves as a more idealized representation than any specific instance. Just as one is able to imagine the Platonic ideal of a perfect circle from seeing only imperfect examples, this process can similarly succeed in creating a visual abstraction that represents the character of the target class more strongly than any particular instance.</p><h3>Screen Printing</h3><p>Riso prints have limitations in available ink colors and print size. However, the same technique of layering ink used in Riso printing is also used in screen printing. I’ve recently adapted my technique to include screen printing, starting with some larger reprints from my <em>Perception Engines</em> series.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*06Y9VVbAx_2blUuIAoNv3w.png" /><figcaption>Screen-printing process</figcaption></figure><p>There is much more overhead in creating screen prints — each layer needs its own dedicated screen which must be burned in, cleaned, and dried. I’ve faithfully recreated larger and crisper versions of four prints, and thus far the results have been worth the effort. Testing has shown these prints to elicit responses in neural networks as strong as those of the original smaller prints.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*zoLrPB5SgdCvI-GdUuLxFg.jpeg" /><figcaption>Large format (60x60cm) screen prints of cello, hammerhead shark, tick, and binoculars</figcaption></figure><p>These large limited edition prints are currently being shown at <a href="http://naturemorte.com/">Nature Morte</a> gallery in New Delhi as part of their innovative show <a href="http://naturemorte.com/exhibitions/gradientdescent/"><em>Gradient Descent</em></a>, which highlights art created with artificial intelligence techniques.</p><h3>Transferring outside of ImageNet</h3><p>ImageNet has been the perfect laboratory for this first set of experiments: the ontology (the set of labelled classes) is fixed, and there are many trained models available to test how well generalization is working. Similar to physics experiments performed assuming a frictionless table, perfectly aligned ontologies are an idealized case useful as a starting point to gain intuitions. But how well might the system work if we relax the constraint that the systems have to share training data and labels?</p><p>Google, Amazon, and other companies provide online AI services that serve as a useful reference point. For example, Google Cloud considers the cello screen print to be a cello (perhaps even with a cellist). This seems to suggest that results generalize well outside of the training set.</p>
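<p>These services can also be queried programmatically. As a rough sketch (assuming the google-cloud-vision Python client and a hypothetical photo path, again not the exact code used here):</p><pre>
# Sketch: ask Google Cloud Vision for labels on a photo of a print.
from google.cloud import vision

client = vision.ImageAnnotatorClient()
with open("photos/cello_print.jpg", "rb") as f:  # hypothetical path
    img = vision.Image(content=f.read())
for label in client.label_detection(image=img).label_annotations:
    print(label.description, round(label.score, 2))
</pre>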
<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*UOG9_R5TuPYoMliL6kNsXg.png" /><figcaption>Labels assigned to the cello print by Google’s Cloud AI service</figcaption></figure><p>However, Amazon Web Services (Rekognition) registers the same image only as an abstract form — either Art/Modern Art or perhaps a letterform like an ampersand.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*bDewvLRsNncRGYq-_cA40A.png" /><figcaption>Labels assigned to the cello print by Amazon’s Rekognition AI service</figcaption></figure><p>This result is not entirely disappointing since both are reasonable interpretations, and Amazon’s failure to recognize this print as a cello suggests that the sets of labels, training data, and distributions across the services might be strikingly different.</p><p>There was a similar result with two “Hotdog” prints commissioned by the <a href="https://thotcon.org/">THOTCON 0x9 conference</a>. “Hotdog” is an ImageNet category, but here I also did testing against the popular Not Hotdog mobile application. (This targeted approach is much closer to the adversarial-image notion of transferability.) Results were promising: the Not Hotdog app more often than not reported photos of the print to be a hot dog, even under various croppings.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*40iiUInavakDwAO55URe6g.png" /><figcaption>Hotdog print using ImageNet category that transfers well to the Not Hotdog application</figcaption></figure><p>With my next series I plan to look at how to get these prints to transfer to publicly available models trained on other datasets.</p><h3>Shifting Ontology</h3><p>Most recently I have decided to shift away from ImageNet and instead focus more on online services such as those offered by Google and Amazon. One important market thus far for online AI services is filtering “inappropriate content.” Google Cloud exposes a version of its “SafeSearch” filter through its <a href="https://cloud.google.com/vision/">Vision API</a> and offers multiple sub-categories such as Adult, Violence, and Racy. Amazon similarly offers an “Image Moderation” service through its <a href="https://aws.amazon.com/rekognition/">Rekognition API</a>, which can label images as Nudity or Suggestive (e.g. swimwear). And Yahoo has open-sourced its <a href="https://github.com/yahoo/open_nsfw">NSFW classifier</a>, which can report when images are “Not Safe For Work.”</p><p>The Perception Engines pipeline was modified to target these three services. I’ve also updated my screen-printing drawing systems to use several layers of different colored ink.</p>
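<p>Targeting these services means scoring photos of candidate drawings through their APIs. As one illustrative sketch (assuming the boto3 client and a hypothetical photo path), the Amazon Rekognition check might look like this:</p><pre>
# Sketch: check a photo of a print against Amazon Rekognition's
# image moderation service via boto3.
import boto3

client = boto3.client("rekognition")
with open("photos/moderation_print.jpg", "rb") as f:  # hypothetical path
    response = client.detect_moderation_labels(Image={"Bytes": f.read()})
for label in response["ModerationLabels"]:
    print(label["Name"], label["Confidence"])
</pre>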
<p>Here are some of the first results:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*_rni9_HTQf5RnIO-DOxzmQ.png" /><figcaption>Two visual abstractions of what online filters consider “inappropriate content”</figcaption></figure><p>Photos of each of these prints score strongly as:</p><ul><li>“Explicit Nudity” according to Amazon Rekognition</li><li>“Racy” according to Google SafeSearch</li><li>“Not Safe for Work” according to Yahoo NSFW</li></ul><p>This can be verified through the web interfaces for these products.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*RdJvYRyx4ZNOG8RGgF0zUQ.png" /><figcaption>Amazon’s AI service reports this photo as “Explicit Nudity” with high confidence</figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*rAKkijGg8QAco-PFlunTxw.png" /><figcaption>Google’s SafeSearch reports this photo as “Racy” at its highest confidence level</figcaption></figure><p>Note that for all three of these image filters the ontologies are very much misaligned, the criteria are somewhat vague, and there is no access to the training data for any of these classifiers.</p><p>Presumably these models use different training data, so it’s not clear how much overlap they are expected to have. However, the fact that one image can trigger similar reactions across three models suggests that they may have some shared understanding of this print. I don’t have an intuition for why this particular arrangement of bright shapes elicits this response. If “Interpretable Machine Learning” were more mature, one might be able to get the system to provide a succinct explanation. But in this instance I prefer the mystery of not knowing exactly what subgenre of racy NSFW nudity this print might be eliciting in the models.</p><p>Unlike the ImageNet-based prints, I’m not yet sure how strongly these results generalize. I plan on investigating further by continuing this series of “Inappropriate Content” images and hope to use this as an opportunity to try out more variations in style.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Perception Engines]]></title>
            <link>https://medium.com/artists-and-machine-intelligence/perception-engines-8a46bc598d57?source=rss-2518aef51e6d------2</link>
            <guid isPermaLink="false">https://medium.com/p/8a46bc598d57</guid>
            <category><![CDATA[machine-learning]]></category>
            <category><![CDATA[generative-art]]></category>
            <category><![CDATA[artificial-intelligence]]></category>
            <category><![CDATA[neural-networks]]></category>
            <category><![CDATA[computational-design]]></category>
            <dc:creator><![CDATA[Tom White]]></dc:creator>
            <pubDate>Wed, 04 Apr 2018 23:51:17 GMT</pubDate>
            <atom:updated>2018-09-03T12:10:56.317Z</atom:updated>
<content:encoded><![CDATA[<p><em>A visual overview examining the ability of neural networks to create abstract representations from collections of real world objects. An architecture called perception engines is introduced that can construct representational physical objects, powered primarily by computational perception. An initial implementation was used to create several ink prints based on ImageNet categories, and permutations of this technique are the basis of ongoing work.</em></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/850/1*9yjNJErP1S1WnpzULoyK3A.jpeg" /><figcaption>Ink prints: Forklift, Ruler, and Sewing Machine</figcaption></figure><h3>Introduction</h3><p>Can neural networks create abstract objects from nothing other than collections of labelled example images? Neural networks excel at perception and categorization, so it is plausible that with the right feedback loop perception is all you need to drive a constructive creative process. Human perception is an often under-appreciated component of the creative process, so it is an interesting exercise to devise a computational creative process that puts perception front and center. In this work, the creative process involves the production of real-world, non-virtual objects.</p><p>Given an image, a neural network can assign it to a category such as fan, baseball, or ski mask. This machine learning task is known as classification. But to teach a neural network to classify images, it must first be trained using many example images. The perception abilities of the classifier are grounded in the dataset of example images used to define a particular concept.</p><p>In this work, the only source of ground truth for any drawing is this unfiltered collection of training images. For example, here are the first few dozen training images (from over a thousand) in the electric fan category:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/400/1*IS7ug5mnqVKz1isqrdbg_g.gif" /><figcaption>Random samples from the “electric fan” category</figcaption></figure><p>Abstract representational prints are then constructed which are able to elicit strong classifier responses in neural networks. From the point of view of trained neural network classifiers, images of these ink-on-paper prints strongly trigger the abstract concepts within the constraints of a given drawing system. The process developed here is called <em>perception engines</em>, as it uses the perception ability of trained neural networks to guide its construction process. When successful, the technique is found to generalize broadly across neural network architectures. It is also interesting to consider when these outputs do (or don’t) appear meaningful to humans. Ultimately, the collection of input training images is transformed with no human intervention into an abstract visual representation of the category represented.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/742/1*tWpMaX5H5gegyof9BATJpA.png" /><figcaption>Abstract ink print generated from the category “electric fan”</figcaption></figure><h4>First Systems</h4><p>The first perception engine implementations were not concerned with physical embodiment. These pixel-based systems were inspired by and re-purposed the techniques of <em>adversarial examples</em>. Adversarial example research probes machine learning systems with small perturbations designed to cause a classifier to assign an incorrect label.</p>
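<p>For readers unfamiliar with that literature, the gradient-based flavor of the idea can be sketched in a few lines. This is an illustrative sketch assuming tf.keras, not code from this project (and, as described below, perception engines themselves use gradient-free search instead):</p><pre>
# Sketch of a minimal FGSM-style adversarial perturbation.
# x is a preprocessed image batch and y its true label index.
import tensorflow as tf
from tensorflow.keras.applications.mobilenet_v2 import MobileNetV2

model = MobileNetV2(weights="imagenet")
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()

def perturb(x, y, eps=0.01):
    # Take one small step in the direction that increases the
    # classifier's loss on the true label y.
    x = tf.convert_to_tensor(x)
    with tf.GradientTape() as tape:
        tape.watch(x)
        loss = loss_fn([y], model(x))
    return x + eps * tf.sign(tape.gradient(loss, x))
</pre>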
<figure><img alt="" src="https://cdn-images-1.medium.com/max/850/1*YIaer7rfWc--NNSDweg5Jg.png" /><figcaption>Early perception engine outputs: birdhouse, traffic light, school bus</figcaption></figure><p>Adversarial examples are usually constrained to making small changes to existing images. However, <em>perception engines</em> allow arbitrary changes within the constraints of a drawing system. Adversarial techniques also often target specific neural networks. But in this work we hope to create images that generalize across all neural networks and — hopefully — humans as well. So perception engines use ensembles of trained networks with different well-known architectures and also include testing for generalization.</p><h4>Architecture</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*Ttpl3jTN9aabytv6S6xNCg.png" /><figcaption>Perception Engines Architecture</figcaption></figure><p>As the architecture of these early systems settled, the operation could be cleanly divided into three different submodules:</p><ul><li>Drawing system — The system of constraints involved in creating the image. The early systems used lines or rectangles on a virtual canvas, but later systems achieved the same result for marks on a page under various production tolerances and lighting conditions.</li><li>Creative objective — What is the expressive goal? Thus far the focus has been on using neural networks pre-trained on ImageNet with an objective of maximizing response to a single ImageNet class. This is also consistent with most of the adversarial example literature.</li><li>Planning system — How is the objective maximized? Currently random search is used, which is a type of blackbox optimization (meaning no gradient information is used). Though not particularly efficient, it is otherwise a simple technique and works well in practice over hundreds to thousands of iterations (a code sketch of this loop appears below). It also finds a “local maximum”, which in practice means it will converge to a different solution each run.</li></ul><h4>Growing a fan</h4><p>The perception engine architecture uses the random search of the planning module to gradually achieve the objective through iterative refinement. When the objective is to maximize the perception of an electric fan, the system will incrementally draw or refine a proposed design for a fan. Combining these systems is a bit like creating a computational Ouija board: several neural networks simultaneously nudge and push a drawing toward the objective.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/400/1*Mic0QeSW1LOtc0qPTkhUaA.gif" /><figcaption>Early steps in planning the electric fan print</figcaption></figure>
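<p>In code, the planning loop itself is simple. Here is a hedged sketch in which <code>render</code> (rasterize the drawing parameters) and <code>mutate</code> (propose a random change) are hypothetical stand-ins for the drawing system, and the ensemble is a list of (model, preprocess) pairs:</p><pre>
# Sketch of the random-search planning loop: propose a random change
# and keep it only if the ensemble's mean target-class score improves.
import numpy as np

def ensemble_score(params, ensemble, target_class):
    image = render(params)  # hypothetical rasterizer
    scores = [m.predict(p(image))[0][target_class] for m, p in ensemble]
    return np.mean(scores)

def plan(initial_params, ensemble, target_class, iterations=1000):
    best = initial_params
    best_score = ensemble_score(best, ensemble, target_class)
    for _ in range(iterations):
        candidate = mutate(best)  # hypothetical random perturbation
        score = ensemble_score(candidate, ensemble, target_class)
        if score > best_score:  # greedy: keep only improvements
            best, best_score = candidate, score
    return best
</pre><p>Because proposals are random and only improvements are kept, each run settles into its own local maximum, which is why repeated runs converge to different designs.</p>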
<p>Though this is effective when optimizing for digital outputs, additional work is necessary when planning physical objects, which are subject to production tolerances and a range of viewing conditions.</p><h4>Modeling physical artifacts</h4><p>After the proof of concept I was ready to target a physical drawing system. The Riso printer was chosen as a first target; it employs a physical ink process similar to screen printing. This meant all outputs were subject to a number of production constraints, such as a limited number of ink colors (I used about six) and unpredictable alignment between layers of different colors.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/850/1*0g42li0RD0fLRUe7mardQQ.png" /><figcaption>Left: Loading Purple Ink drum into Riso printer<br>Right: “Electric Fan” print before adding second black layer</figcaption></figure><p>At this point in my development I was awarded a grant from Google’s <a href="https://ami.withgoogle.com/">Artists and Machine Intelligence group</a> (AMI). With their support, I was able to print a series of test prints and iteratively improve my software system to model the physical printing process. Each source of uncertainty that could cause a physical object to have variations in appearance is modeled as a distribution of possible outcomes.</p><h4>Issue #1: Layer Alignment</h4><p>It is common for Riso prints to have a small amount of misalignment between layers because the paper must be inserted separately for each different color. This possibility was handled by applying a small amount of random jitter between colors.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/400/1*sS0aB-PwwYp-1gyA4VgvgQ.gif" /><figcaption>Example of layer jitter being applied to produce a distribution of possible alignment outcomes.</figcaption></figure><p>In practice this jitter keeps the final design from being overly dependent on the relative placement of elements across different layers.</p><h4>Issue #2: Lighting</h4><p>The colors of a digital image can be given exactly. But a physical object will be perceived with slightly different colors depending on the ambient lighting conditions. To allow the final print to be effective in a variety of environments, the paper and ink colors were photographed under multiple conditions and then simulated as various possibilities.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/400/1*sbEdIdiYjcts6676JKrsHA.gif" /><figcaption>Variations from applying different lighting conditions.</figcaption></figure><p>The lighting and layer adjustments were independent and could be applied concurrently.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/400/1*5BUaLTBoePCjkPjiifHnAA.gif" /><figcaption>Combining the jitter and lighting variations into a larger distribution of outcomes.</figcaption></figure><h4>Issue #3: Perspective</h4><p>In a physical setting, the exact location of the viewer is not known. To keep the print from being dependent on a particular viewing angle, a set of perspective transformations was also applied. These are generally added during a final refinement stage and are done in addition to the alignment and lighting adjustments.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/400/1*IxKbaU2ZodxMWbVdz2tFng.gif" /><figcaption>Examples of perspective transform being added to model a range of viewing angles.</figcaption></figure>
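<p>Taken together, these three issues turn the objective into an expectation over a distribution of possible appearances. A hedged sketch of how this can be folded into scoring, with the transform helpers as illustrative stand-ins:</p><pre>
# Sketch: score a design as the average over sampled production and
# viewing conditions. The transform helpers are hypothetical stand-ins.
import numpy as np

def sample_appearance(layers, rng):
    layers = [jitter_layer(l, rng) for l in layers]  # Issue #1: alignment
    image = composite(layers)                        # combine ink layers
    image = apply_lighting(image, rng)               # Issue #2: lighting
    return apply_perspective(image, rng)             # Issue #3: viewpoint

def robust_score(layers, ensemble, target_class, samples=8, seed=0):
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(samples):
        image = sample_appearance(layers, rng)
        scores += [m.predict(p(image))[0][target_class] for m, p in ensemble]
    return np.mean(scores)  # expectation over the appearance distribution
</pre>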
<h4>Final Print</h4><p>This system typically runs for many hours on a deep learning workstation in order to generate hundreds to thousands of iterations on a single design. Once the system has produced a candidate, a set of master pages is made. Importantly, the perspective and jitter transforms are disabled to produce these masters in their canonical form. For the fan print, two layers were produced: one for the purple ink and one for black.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*qDdh4hLGxFcOsyXyQNbA-A.png" /><figcaption>Final aligned layers of Electric Fan as they are sent to the printer</figcaption></figure><p>These masters are used to print ink versions on paper.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/742/1*tWpMaX5H5gegyof9BATJpA.png" /><figcaption>An ink print from the master above (no two are exactly alike)</figcaption></figure><h4>Evaluating</h4><p>After printing, a photo is used to test for generalization. This is done by querying neural networks that were not involved in the original pipeline to see if they agree the objective has been met — an analogue of a train/test split across several networks with different architectures. In this case, the electric fan image was produced with the influence of four trained networks, but generalizes well to nine others.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*ymQs5CJUhFDdsMzk_tvegQ.png" /><figcaption>This electric fan design was “trained” with input from inceptionv3, resnet50, vgg16 and vgg19 — and after printing scores well when evaluated on all four of those networks (yellow background). This result also generalizes well to other networks as seen by the strong top-1 scores on nine other networks tested.</figcaption></figure>
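<p>A held-out check of this kind is straightforward to sketch with pretrained Keras models (the photo path is a hypothetical example, and the class index is my assumption of the standard ordering):</p><pre>
# Sketch: test a printed design against networks that were not in the
# optimization ensemble by checking their top-1 prediction.
import numpy as np
from tensorflow.keras.applications import densenet, xception
from tensorflow.keras.preprocessing import image

FAN = 545  # assumed ImageNet class index for "electric fan"

held_out = [
    (densenet.DenseNet121(weights="imagenet"), densenet.preprocess_input, 224),
    (xception.Xception(weights="imagenet"), xception.preprocess_input, 299),
]

for model, preprocess, size in held_out:
    img = image.load_img("photos/fan_print.jpg", target_size=(size, size))
    x = preprocess(np.expand_dims(image.img_to_array(img), 0))
    preds = model.predict(x)[0]
    print(model.name, int(np.argmax(preds)) == FAN, float(preds[FAN]))
</pre>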
<h4>Constraint System as Creativity</h4><p>A philosophical note on creativity and intent: using perception engines inverts the stereotypical creative relationship employed in human-computer interaction. Instead of using the computer as a tool, the Drawing System module can be thought of as a special tool that the neural network itself drives to make its own creative outputs. As the human artist, my main creative contribution is the design of a programmatic drawing system that allows the neural network to express itself effectively and with a distinct style. I’ve designed the constraint system that defines the form, but the neural networks are the ultimate arbiters of the content.</p><h4>Treachery of ImageNet</h4><p>In my initial set of perception engine objects I decided to explicitly caption each image with the intended target concept. Riffing off of Magritte’s <a href="https://en.wikipedia.org/wiki/The_Treachery_of_Images">The Treachery of Images</a> (and not being able to pass up a pun), these first prints were called <em>The Treachery of ImageNet</em>.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*-W54GyHMAOX7ZNuJlUoH2Q.jpeg" /><figcaption>All 12 prints in the Treachery of ImageNet series</figcaption></figure><p>The conceit was that many of these prints would strongly evoke their target concepts in neural networks in the same way people find Magritte’s painting evocative of an actual, non-representational pipe. The name also emphasizes <a href="http://www.image-net.org/">ImageNet</a>’s role in establishing the somewhat arbitrary ontology of concepts used to train these networks (the canonical <a href="http://www.image-net.org/challenges/LSVRC/">ILSVRC</a> subset), which I also tried to highlight by choosing an eclectic set of labels across the series.</p><h4>Ongoing work</h4><p>Additional print work is in various stages of production using the same core architecture. Currently these use the same objective and planner but vary the drawing system, such as using multiple ink layers or a more generic screen printing technique. Some more recent experiments also use very different creative objectives and more radical departures from the current types of drawing systems and embodiments. As these are completed I’ll share incremental results <a href="https://twitter.com/dribnet">on Twitter</a> with occasional write-ups here. Additional photos of all completed prints can be found in my <a href="http://dribnet.bigcartel.com/">online store</a>, which is also used to fund future work.</p><hr><p><a href="https://medium.com/artists-and-machine-intelligence/perception-engines-8a46bc598d57">Perception Engines</a> was originally published in <a href="https://medium.com/artists-and-machine-intelligence">Artists + Machine Intelligence</a> on Medium.</p>]]></content:encoded>
        </item>
    </channel>
</rss>