<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[Imandra Inc. - Medium]]></title>
        <description><![CDATA[Reasoning as a Service @ www.imandra.ai - Medium]]></description>
        <link>https://medium.com/imandra?source=rss----aa39f4f76a9f---4</link>
        <image>
            <url>https://cdn-images-1.medium.com/proxy/1*TGH72Nnw24QL3iV9IOm4VA.png</url>
            <title>Imandra Inc. - Medium</title>
            <link>https://medium.com/imandra?source=rss----aa39f4f76a9f---4</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Mon, 13 Apr 2026 19:31:26 GMT</lastBuildDate>
        <atom:link href="https://medium.com/feed/imandra" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[Managing Complexity with Math and Logic: Changing Stripe Payment Flow with Claude and CodeLogician]]></title>
            <link>https://medium.com/imandra/managing-complexity-with-math-and-logic-changing-stripe-payment-flow-with-claude-and-codelogician-cdd386001e40?source=rss----aa39f4f76a9f---4</link>
            <guid isPermaLink="false">https://medium.com/p/cdd386001e40</guid>
            <category><![CDATA[llm-reasoning]]></category>
            <category><![CDATA[vibe-coding]]></category>
            <category><![CDATA[imandra]]></category>
            <category><![CDATA[reasoning]]></category>
            <category><![CDATA[claude-code]]></category>
            <dc:creator><![CDATA[Denis Ignatovich]]></dc:creator>
            <pubDate>Fri, 13 Mar 2026 20:50:37 GMT</pubDate>
            <atom:updated>2026-03-26T20:33:33.733Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*AB-USO2Tas6Kbo0kAJDkgw.png" /></figure><p><em>There’s a growing belief that AI coding agents can handle complex business logic — payment flows, approval workflows, compliance rules. And they can, sort of. They produce code that </em><strong><em>looks</em></strong><em> correct, </em><strong><em>reads</em></strong><em> as correct, and passes the tests they write for themselves.</em></p><p><em>But what happens when the AI is both the author and the reviewer? It misses the same edge cases in both roles. I recently ran an experiment that made this concrete: I used Claude (via Claude Code) to formalize a Stripe payment flow, add new features, and verify correctness. Claude wrote the IML (Imandra Modeling Language) models, wrote the verification goals, </em><strong><em>and</em></strong><em> called the CodeLogician CLI itself — invoking `codelogician eval check`, `check-vg`, `check-decomp`, and `gen-test` as part of its own reasoning loop. When a verification goal was refuted, Claude read the counterexample, diagnosed the issue, and fixed it — all within the same conversation.</em></p><p><em>The difference between Claude working alone and Claude working with </em><a href="https://codelogician.dev"><em>CodeLogician</em></a><em>/</em><a href="https://www.imandrax.dev"><em>ImandraX</em></a><em> was not subtle. This post walks through exactly what happened. Here’s the </em><a href="https://github.com/imandra-ai/codelogician-demos"><em>repo with complete code</em></a><em>.</em></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*rlLdEYMLOE-1gcEqfzwQZA.png" /></figure><h4>The Process</h4><p>Before diving into the details, here’s the overall workflow. 
The key insight is that this isn’t a linear pipeline — it’s an iterative loop where the formal verifier acts as an adversary to Claude, catching mistakes it can’t find on its own:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*Ih-s6JluIeNSRL-qlPScOg.png" /><figcaption>Claude + CodeLogician/ImandraX process</figcaption></figure><p>The refinement loop in the center is where the real value lives. Crucially, Claude drives this entire loop — it invokes the CodeLogician CLI commands, reads the output, and acts on the results. Each time the verifier refutes a goal, it returns a <strong><em>concrete counterexample </em></strong>— not a vague warning, but an exact input that breaks the claim. Claude then decides: is this a bug in the model, or a flaw in the property? Either way, the fix is precise.</p><h4>The Starting Point: A Production Stripe Payment Flow</h4><p>The system under test is a real payment processing flow built on the Stripe API: create order, attach payment method, create and confirm a PaymentIntent with manual capture, approve, capture, refund, cancel, and handle disputes via webhooks. It’s a Flask application with ~700 lines of Python — the kind of code Claude generates confidently and well.</p><p>The goal: formalize this as a pure state machine, add two new features (high-risk approval policy and 3DS/SCA challenge flow), verify the result is correct, and generate exhaustive tests.</p><p>As <a href="https://codelogician.dev/docs/getting-started/tackle-complexity-with-math-and-logic/">CodeLogician’s documentation puts it</a>: there are two categories of programming. The first — UI rendering, CRUD endpoints, data serialization — is where AI coding agents excel. 
The second — financial transactions, approval workflows, risk assessment — is where “complexity grows combinatorially rather than linearly.” This payment flow is squarely in the second category.</p><h4>Step 1: Formalize in IML</h4><p>I had Claude translate the Python payment flow into IML (Imandra Modeling Language), a pure functional language with formal semantics used by CodeLogician’s automated reasoning engine. The result is a state machine with 10 order statuses, 8 actions, and 7 transition functions:</p><pre>type order = {<br>  amount                : int;<br>  amount_captured       : int;<br>  amount_refunded       : int;<br>  status                : order_status;<br>  requires_review       : bool;<br>  approved_for_capture  : bool;<br>  payment_intent_exists : bool;<br>  latest_charge_exists  : bool;<br>}</pre><p>Each transition function is pure — takes an order, returns a new order:</p><pre>let capture_payment (amt: int) (o: order) : order =<br>  if may_capture o &amp;&amp; 0 &lt; amt &amp;&amp; amt &lt;= o.amount then<br>    { o with<br>      amount_captured = amt;<br>      status = Captured<br>    }<br>  else o</pre><p>This was admitted cleanly — ImandraX accepted all definitions. Claude was confident. I was confident. 
We were both wrong.</p><h4>Step 2: The Verifier Finds What Claude Can’t</h4><p>Claude then wrote verification goals — properties that should hold for <em>every possible input</em> — and ran them through ImandraX using <em>`codelogician eval check-vg`</em>.</p><p>The first property: every transition should preserve the amount invariant (<em>`0 &lt;= amount_refunded &lt;= amount_captured &lt;= amount`</em>).</p><p><strong>Result: REFUTED.</strong> <strong><em>ImandraX returned this counterexample for `capture_payment`</em></strong>:</p><pre>amt = 1<br>o = {amount = 8957; amount_captured = 8957; amount_refunded = 1238;<br>     status = ApprovedForCapture; ...}</pre><p>The bug: <em>`capture_payment`</em> sets <em>`amount_captured = amt`</em> but <em>doesn’t reset</em> <em>`amount_refunded`</em>. After capture, the order has <em>`amount_captured = 1`</em> but <em>`amount_refunded = 1238`</em> — violating <em>`amount_refunded &lt;= amount_captured`</em>.</p><p>This isn’t an exotic edge case. It’s a straightforward bug that occurs whenever an order is re-captured after a partial refund. But Claude didn’t think about re-capture sequences when writing the code, and it wouldn’t think about them when writing tests either.</p><p>The fix is one line:</p><pre>{ o with<br>  amount_captured = amt;<br>  amount_refunded = 0;    (* &lt;-- this was missing *)<br>  status = Captured<br>}</pre><h4>Step 3: The Verifier Finds What Claude <em>Thinks</em> It Knows</h4><p>The second class of failures was more subtle. Claude wrote this goal:</p><pre>let goal_no_capture_without_approval (amt: int) (o: order) =<br>  not o.approved_for_capture ==&gt;<br>  (capture_payment amt o).status &lt;&gt; Captured</pre><p><strong>Refuted again.</strong> In the counterexample, the order is <em>already</em> Captured. <em>`capture_payment`</em> returns it unchanged. The output status is Captured — not because capture succeeded, but because it was already there. The property was logically wrong. 
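</p><p>That refutation is easy to replay outside the prover. Here is a minimal Python sketch — a hypothetical mirror of a small slice of the IML model (field set trimmed, names illustrative, not the generated code):</p>

```python
from dataclasses import dataclass, replace

# Hypothetical Python mirror of a slice of the IML order model (trimmed).
@dataclass(frozen=True)
class Order:
    status: str                 # e.g. "Authorized", "ApprovedForCapture", "Captured"
    approved_for_capture: bool
    amount: int
    amount_captured: int

def capture_payment(amt: int, o: Order) -> Order:
    # Guard modeled after the article's may_capture check (illustrative).
    may = (o.status in ("Authorized", "ApprovedForCapture")
           and o.approved_for_capture and 0 < amt <= o.amount)
    return replace(o, status="Captured", amount_captured=amt) if may else o

# The counterexample: an order that is already Captured. The guard fails,
# the order comes back unchanged -- and its status is still Captured.
o = Order(status="Captured", approved_for_capture=False,
          amount=100, amount_captured=100)
out = capture_payment(50, o)
print(out.status)  # "Captured" -- the naive property is refuted vacuously
```

<p>Nothing was captured; the output status is Captured only because the input already was. That is exactly the gap between the informal intent and the formal statement.</p><p>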
Claude confused “capture shouldn’t <em>change</em> the status to Captured” with “the output shouldn’t <em>be</em> Captured.”</p><p>The corrected goal:</p><pre>let goal_no_capture_without_approval (amt: int) (o: order) =<br>  inv_amounts o &amp;&amp;<br>  not o.approved_for_capture &amp;&amp;<br>  o.status &lt;&gt; Captured ==&gt;<br>  (capture_payment amt o).status &lt;&gt; Captured</pre><p>This is exactly what CodeLogician’s documentation describes: “LLMs explore likely paths. Logic explores all reachable paths.” Claude stated what it <em>meant</em> informally, but the formal statement didn’t match. The verifier showed exactly where the gap was.</p><h4>Step 4: Adding New Features with Confidence</h4><p>With the base model verified, Claude added two new features to the IML model:</p><ul><li><strong><em>High-risk flag</em></strong>: orders marked <em>`high_risk`</em> require 2 approvals before capture (replacing the simple boolean <em>`approved_for_capture`</em> with an integer <em>`approval_count`</em>)</li><li><strong><em>3DS/SCA challenge</em></strong>: orders with <em>`three_ds_required`</em> must complete a 3DS authentication step before authorization</li></ul><p>The <em>`may_capture`</em> predicate grew from a simple boolean check to a multi-condition gate:</p><pre>let may_capture (o: order) =<br>  (o.status = Authorized || o.status = ApprovedForCapture) &amp;&amp;<br>  o.latest_charge_exists &amp;&amp;<br>  ((not o.three_ds_required) || o.three_ds_completed) &amp;&amp;<br>  o.approval_count &gt;= required_approvals o</pre><p>Claude wrote 5 new verification goals for the new features:</p><pre>(* 3DS blocks authorization until completed *)<br>let goal_3ds_blocks_authorization (o: order) =<br>  o.status = PaymentIntentCreated &amp;&amp;<br>  o.payment_intent_exists &amp;&amp;<br>  o.three_ds_required &amp;&amp;<br>  not o.three_ds_completed ==&gt;<br>  (confirm_payment_intent o).status = RequiresAction<br><br>(* High-risk needs 2 approvals *)<br>let goal_high_risk_needs_two_approvals (amt: int) (o: order) =<br>  inv_amounts o &amp;&amp;<br>  o.high_risk &amp;&amp;<br>  o.approval_count &lt; 2 &amp;&amp;<br>  o.status &lt;&gt; Captured ==&gt;<br>  (capture_payment amt o).status &lt;&gt; Captured<br><br>(* Combined: high-risk + 3DS requires BOTH conditions *)<br>let goal_high_risk_3ds_requires_both (amt: int) (o: order) =<br>  inv_amounts o &amp;&amp;<br>  o.high_risk &amp;&amp;<br>  o.three_ds_required &amp;&amp;<br>  (not o.three_ds_completed || o.approval_count &lt; 2) &amp;&amp;<br>  o.status &lt;&gt; Captured ==&gt;<br>  (capture_payment amt o).status &lt;&gt; Captured</pre><p>Note that these goals proved on the first attempt only <em>because</em> the refinement loop in Steps 2–3 had already taught us how to write precise preconditions. Without that loop, Claude would have written the same class of flawed properties again.</p><p><strong>All 14 verification goals proved</strong> — 7 amount preservation properties, 2 base safety goals, and 5 new feature goals. Every property holds for every possible input, not just the inputs Claude happened to think of.</p><h4>Step 5: 84 Regions, 84 Tests</h4><p>Here’s where CodeLogician’s region decomposition delivers something Claude simply cannot produce on its own.</p><p>Claude annotated the <em>`step`</em> function with <em>`[@@decomp top ()]`</em> and ran:</p><pre>codelogician eval check-decomp --check-all stripe_flow_updated.iml</pre><p>ImandraX partitioned the entire input space of <em>`step`</em> into <strong>84 disjoint regions</strong> — 84 distinct execution paths, each with unique constraints and a simplified output expression.</p><p><em>`capture_payment`</em> alone produced <strong>53 regions</strong>. 
Each represents a unique combination of:</p><ul><li><em>`high_risk`</em> (true/false)</li><li><em>`requires_review`</em> (true/false)</li><li><em>`three_ds_required`</em> (true/false)</li><li><em>`three_ds_completed`</em> (true/false)</li><li><em>`approval_count`</em> threshold (0, 1, or 2)</li><li>Amount validity (amt &lt;= 0, amt &gt; amount, valid)</li><li>Status (Authorized vs ApprovedForCapture vs other)</li></ul><p>Had Claude enumerated all capture scenarios itself, it would have produced maybe 8–10. The decomposition found 53 — and proved they are exhaustive and disjoint.</p><p>Claude then generated concrete Python test cases:</p><pre>codelogician eval gen-test stripe_flow_updated.iml -f step -l python</pre><p>This produced 84 test functions, each with a concrete input satisfying that region’s constraints and the expected output. For example, region 67 — high-risk capture with 2 approvals, no 3DS, valid amount:</p><pre>def test_67():<br>    &quot;&quot;&quot;High-risk capture succeeds with 2 approvals, no 3DS&quot;&quot;&quot;<br>    result = step(<br>        Action(ActionKind.CAPTURE, 39),<br>        Order(<br>            amount=39, amount_captured=1, amount_refunded=3,<br>            status=OrderStatus.AUTHORIZED,<br>            requires_review=False, high_risk=True,<br>            three_ds_required=False, three_ds_completed=False,<br>            approval_count=2,<br>            payment_intent_exists=False, latest_charge_exists=True,<br>        )<br>    )<br>    expected = Order(<br>        amount=39, amount_captured=39, amount_refunded=0,<br>        status=OrderStatus.CAPTURED,<br>        requires_review=False, high_risk=True,<br>        three_ds_required=False, three_ds_completed=False,<br>        approval_count=2,<br>        payment_intent_exists=False, latest_charge_exists=True,<br>    )<br>    assert result == expected</pre><p>After adapting the generated tests to import from the actual Python module: <strong>84/84 passed.</strong> The Python 
implementation matches the verified IML model across every execution path.</p><h4>What This Demonstrates</h4><p>The article <a href="https://codelogician.dev/docs/getting-started/tackle-complexity-with-math-and-logic/"><em>“Tackle Complexity with Math and Logic”</em></a><em> </em>frames this clearly: when software encodes decisions — rules, states, money, risk — complexity grows combinatorially, and statistical reasoning systematically misses the gaps that logical reasoning finds.</p><p>In this experiment:</p><p><strong>Claude wrote a real bug</strong> in <em>`capture_payment`</em> that it would never have caught through self-review or self-generated tests. The bug requires reasoning about field interactions across a specific multi-step sequence (capture, refund, re-capture) — exactly the kind of combinatorial path that AI coding agents don’t explore.</p><p><strong>Claude wrote logically flawed verification goals</strong> with full confidence. It confused informal intent with formal meaning. The verifier’s counterexamples were immediate and unambiguous — not “this might be wrong” but “here is the exact input that breaks your claim.”</p><p><strong>Claude would have written ~10 tests</strong>. The formal decomposition found 84 distinct execution regions. Not 84 random inputs — 84 mathematically guaranteed partitions covering every path through the state machine. 
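</p><p>The combinatorics are easy to feel with a brute-force sketch. The Python below is a rough stand-in (not CodeLogician’s algorithm — region decomposition works symbolically and comes with proofs) that enumerates a finite slice of the inputs to a gate shaped like <em>`may_capture`</em>; the `required_approvals` policy is an assumption:</p>

```python
from itertools import product

# Hypothetical Python mirror of the `may_capture` gate described above.
def may_capture(status, latest_charge, three_ds_required, three_ds_completed,
                approval_count, high_risk):
    required = 2 if high_risk else 1   # assumed required_approvals policy
    return (status in ("Authorized", "ApprovedForCapture")
            and latest_charge
            and ((not three_ds_required) or three_ds_completed)
            and approval_count >= required)

# Brute-force a finite slice of the input space.
space = list(product(("Authorized", "ApprovedForCapture", "Other"),
                     (False, True), (False, True), (False, True),
                     (0, 1, 2), (False, True)))
allowed = [x for x in space if may_capture(*x)]
print(len(space), len(allowed))   # 144 input combinations; only 18 permit capture
```

<p>Even this trimmed gate has 144 input combinations. The full <em>`step`</em> function multiplies in amounts, refund state, and ten statuses — which is how dozens of genuinely distinct behavioral regions arise, far more than anyone enumerates by hand.</p><p>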
The difference between “I tested the cases I thought of” and “I tested every case that exists.”</p><h4>The Workflow, Revisited</h4><p>Referring back to the process diagram at the top, the practical steps are:</p><ol><li><strong>Claude writes the IML model</strong> — translating Python logic to pure functional form</li><li><strong>Claude runs </strong><em>`codelogician eval check`</em> — validates the model is admitted by ImandraX</li><li><strong>Claude writes verification goals</strong> — it knows what <em>should</em> hold, even if it states it imprecisely</li><li><strong>Claude runs </strong><em>`codelogician eval check-vg`</em> — ImandraX proves or refutes each goal with counterexamples</li><li><strong>Claude fixes based on counterexamples</strong> — the refinement loop (Steps 3–5 repeat until all goals prove)</li><li><strong>Claude runs </strong><em>`codelogician eval check-decomp`</em> — ImandraX enumerates all execution regions</li><li><strong>Claude runs </strong><em>`codelogician eval gen-test`</em> — ImandraX generates concrete test cases from regions</li><li><strong>Claude adapts tests to production code</strong> — mechanical translation</li></ol><p>Claude orchestrates the entire pipeline. CodeLogician provides the formal reasoning that Claude cannot do on its own.</p><p>Claude is the author. ImandraX is the auditor. Neither works as well alone. The refinement loop — where the verifier returns concrete counterexamples and Claude decides whether to fix the model or fix the property — is where the real value is created. Each iteration produces either a bug fix or a more precise specification. Both are valuable.</p><h4>Conclusion</h4><p>AI-generated code that passes AI-generated tests is the model agreeing with itself. 
For business logic with real consequences — payment flows, approval policies, compliance rules — that’s not enough.</p><p><a href="https://www.codelogician.dev">CodeLogician</a>/<a href="https://www.imandrax.dev">ImandraX</a> doesn’t replace Claude. It breaks the self-validation loop by providing something an AI coding agent fundamentally cannot: exhaustive logical reasoning over every possible state. The result isn’t “code that looks correct” — it’s code with a proof.</p><p>The full source code (Python implementation, IML models, and generated tests) is available at <a href="https://github.com/imandra-ai/codelogician-demos">https://github.com/imandra-ai/codelogician-demos</a>.</p><hr><p><a href="https://medium.com/imandra/managing-complexity-with-math-and-logic-changing-stripe-payment-flow-with-claude-and-codelogician-cdd386001e40">Managing Complexity with Math and Logic: Changing Stripe Payment Flow with Claude and CodeLogician</a> was originally published in <a href="https://medium.com/imandra">Imandra Inc.</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Vibe Coding was phase 1. Logic-first AI is phase 2. It is here now.]]></title>
            <link>https://medium.com/imandra/vibe-coding-was-phase-1-logic-first-ai-is-phase-2-it-is-here-now-7523beff4188?source=rss----aa39f4f76a9f---4</link>
            <guid isPermaLink="false">https://medium.com/p/7523beff4188</guid>
            <category><![CDATA[llm]]></category>
            <category><![CDATA[vibe-coding]]></category>
            <category><![CDATA[software-engineering]]></category>
            <category><![CDATA[software-development]]></category>
            <category><![CDATA[llm-reasoning]]></category>
            <dc:creator><![CDATA[Denis Ignatovich]]></dc:creator>
            <pubDate>Sun, 15 Feb 2026 18:36:16 GMT</pubDate>
            <atom:updated>2026-02-15T18:36:15.808Z</atom:updated>
            <content:encoded><![CDATA[<p><em>Imandra CodeLogician (www.codelogician.dev) is an end-user tool that makes AI coding agents reason explicitly instead of guessing. It forces the LLM to structure its understanding of your code into logic, invokes a dedicated reasoning engine to explore the consequences, and returns concrete artifacts — edge cases, behavioral boundaries, and impact analyses — alongside every decision-critical result.</em></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*gQRkxEEK832jKNvZ4veTMg.png" /><figcaption>Claude Code working with CodeLogician (Full demo below)</figcaption></figure><h3><strong>AI Can Generate Code. You Still Have to Trust It.</strong></h3><p>Vibe coding is incredible right up to the moment you have to trust it. The agent generates a clean refactor, a new feature, or a “simple” change to some business logic — and now you’re rereading everything, mentally simulating edge cases, and writing tests just to figure out what it actually did. We didn’t remove the cost of being correct; we just moved it onto the human. CodeLogician flips that workflow. It forces the coding agent to model what the code does, runs automated reasoning on the decision-critical parts, and gives you artifacts that expose real behavior — edge cases, boundaries, and impact — before you merge. This is the shift from AI that writes code to AI that has to show its work.</p><h3>From Token-Space “Reasoning” to State-Space Exploration</h3><p><a href="https://www.codelogician.dev"><strong><em>CodeLogician</em></strong></a> is the developer-facing workflow, but the actual reasoning is performed by <a href="https://www.imandra.ai/core"><strong><em>ImandraX</em></strong></a>. When decision-critical logic appears, the LLM (or <strong><em>CodeLogician</em></strong> on its behalf) translates that logic into a structured model and sends it to the reasoning engine. 
From there the process is no longer “generate and hope.” <strong><em>ImandraX</em></strong> systematically explores the behavior of the system: it searches for boundary conditions, finds counterexamples to incorrect assumptions, partitions the input space into distinct behavioral regions, and proves when certain bad outcomes are impossible. The output is not more explanation — it’s concrete evidence about what the code can and cannot do.</p><p>This is fundamentally different from the kind of reasoning LLMs perform. When an LLM “reasons,” it produces a convincing chain of text by sampling from patterns in its training data. That statistical process is extremely good at generating typical cases and plausible logic, but it only sees what is likely. <strong><em>The bugs that ship to production live in what is possible but unlikely.</em></strong> An automated reasoning engine doesn’t sample; it constrains the system with logic and explores the full space of behaviors defined by that logic. It answers questions like: <em>Is there an input that breaks this invariant? Under exactly what conditions does this state transition occur? Which combinations of rules interact in unexpected ways?</em></p><p>To measure this precisely, we conducted a study (available at <a href="https://github.com/imandra-ai/code-logic-bench">https://github.com/imandra-ai/code-logic-bench</a>) that measured LLMs’ ability to reason about software logic compared with formally augmented analysis based on region decomposition and verification. The results demonstrated significant improvements across several key metrics. 
Sometimes improving performance by over 50%!</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*mz8pln_NJNTlhKdzXIDAqw.png" /></figure><p>The metrics assess complementary aspects of reasoning quality: <strong>State Space Estimation Accuracy</strong> measures quantification of behavioral regions; <strong>Outcome Precision</strong> evaluates correctness of exact constraints; <strong>Direction Accuracy</strong> checks whether conclusions about safety or correctness are right; <strong>Coverage Completeness</strong> measures how fully scenarios are enumerated; <strong>Control Flow Understanding</strong> evaluates grasp of branching and override logic; <strong>Edge Case Detection</strong> captures identification of rare or boundary behaviors; and <strong>Decision Boundary Clarity</strong> assesses extraction of precise Boolean conditions. See the paper link at <a href="https://www.codelogician.dev">www.codelogician.dev</a>.</p><p>With <strong><em>CodeLogician</em></strong> in the loop, the LLM can generate code, build a formal model of the behavior it just described, query the reasoning engine, and incorporate the returned artifacts into its response. That moves reasoning out of the developer’s head and into the system itself. LLMs explore token space. <strong><em>ImandraX</em></strong> explores state space. One produces plausible outputs; the other produces behavioral guarantees and counterexamples. 
CodeLogician is the bridge that lets them work together.</p><h3>How the process is changing: the before and after</h3><h4>Before — The Human Is the Reasoning Engine</h4><p>In a typical vibe-coding workflow, the loop looks fast — but the correctness work is still manual:</p><ul><li>You prompt the agent for a change.</li><li>It generates code that looks reasonable.</li><li>You reread the diff and try to simulate the behavior in your head.</li><li>You write tests to search for the edge cases you <em>think</em> might exist.</li><li>You merge with a bit of uncertainty — because you can’t exhaust the state space mentally.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/794/1*lQq-lZ_iClwOk5e7GUsWHA.png" /></figure><p>The LLM helped you write the code, but it didn’t actually <strong>analyze what the code can do</strong>.</p><p>So the responsibility for understanding behavior — all the weird inputs, boundary conditions, and cross-rule interactions — stays with you.</p><p>You’re not reviewing syntax.</p><p>You’re reverse-engineering semantics.</p><p>That’s why non-trivial AI-generated changes still feel slower than they should.</p><p>Generation is instant. 
Trust is expensive.</p><h4>After — The Agent Has to Show Its Work</h4><p>With CodeLogician in the loop, the workflow changes at the point where correctness matters:</p><ul><li>You prompt the agent.</li><li>Decision-critical logic is detected.</li><li>The agent structures its understanding of that logic into a model.</li><li>The reasoning engine explores the behavior of that model.</li></ul><p>The result comes back with artifacts:</p><ul><li><em>edge cases</em></li><li><em>behavioral boundaries</em></li><li><em>state-transition constraints</em></li><li><em>change impact</em></li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*wa4PyUDXFEfGl0cjPIIVoQ.png" /></figure><p>Now you review <strong>evidence about behavior</strong>, not just code that “looks right.”</p><p>The key shift is simple but profound:</p><p>Before, the LLM generated and the human reasoned.</p><p>After, the LLM generates and the system reasons.</p><p>You stop asking:</p><blockquote><em>“Did the model get this right?”</em></blockquote><p>and start seeing:</p><blockquote><em>“Here is the set of cases where this logic behaves differently.”</em></blockquote><h3>The Practical Effect</h3><p>This doesn’t slow down the flow — it moves the expensive part to where machines are better than humans:</p><ul><li><strong><em>Humans are good at intent and design.</em></strong></li><li><strong><em>LLMs are good at synthesis.</em></strong></li><li><strong><em>Reasoning engines are good at exhaustive behavioral exploration.</em></strong></li></ul><p>So instead of spending your time mentally executing code paths and inventing tests to catch surprises, you get the surprises surfaced automatically — before the merge, not after the incident.</p><p>Vibe coding made writing code cheap.</p><p>This makes <strong>understanding code</strong> cheap.</p><h3>Demo: A Complex Payment Gateway with Claude Code</h3><p>To get started (<em>and after installing CodeLogician and setting IMANDRA_UNI_KEY as per 
instructions on codelogician.dev</em>) I asked Claude to learn how to use CodeLogician:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*hV4o0x2Cykpp171yLBplyA.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*Yh9_VUzN_sDBkhE37sK3bg.png" /></figure><p>Then, I asked Claude Code to do something that looks straightforward on the surface:</p><blockquote><em>“Come up with a complex payments system example and use CodeLogician to make sure you understand all the edge cases.”</em></blockquote><p>Claude generated a full payment gateway model: payment methods, a transaction state machine, tiered fees, fraud checks, refunds, and merchant balance tracking. Then it ran the CodeLogician CLI to analyze it.</p><h4>Step 1: Generate + Type-check</h4><p>Claude wrote a model file and ran:</p><pre>codelogician eval check payment_gateway.iml</pre><p>The model type-checked cleanly. That’s the easy part.</p><h4>Step 2: Ask the System to Explore the Logic</h4><p>Next it ran the analysis pass:</p><ul><li><em>codelogician eval check-vg --check-all payment_gateway.iml</em></li><li><em>codelogician eval check-decomp --check-all payment_gateway.iml</em></li></ul><p>This is where the workflow becomes different from “prompt and hope.” The analysis doesn’t just rubber-stamp the code — it tries to break the assumptions.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*JQ2I0rEYkFdSUiku8Ow9yA.png" /></figure><h4>Step 3: The Engine Found a Real Edge Case</h4><p>One of the intended properties was:</p><blockquote><em>“Fees never exceed the transaction amount.”</em></blockquote><p>Sounds reasonable, right? 
Claude expected it to hold.</p><p>The reasoning engine produced a concrete counterexample:</p><ul><li><strong>BankTransfer</strong>, amount ≈ <strong>$1.51</strong></li><li>Fee = <strong>$1.50 fixed</strong> + <strong>0.8% × amount</strong></li><li>Result: the fee equals (or exceeds) the transaction amount.</li></ul><p>This is the exact kind of cross-cutting boundary condition that’s obvious in hindsight, but easy to miss when you design the components separately (“$1.50 fixed seems fine” + “0.8% seems fine”).</p><p>And it’s not a toy bug. In a real system, this means a merchant can accept a small bank transfer and <strong>lose money</strong>.</p><p>At that point, we had a choice:</p><ul><li>change the fee policy (e.g., minimum transaction amount)</li><li>or explicitly document the boundary where the rule holds</li></ul><p>Claude fixed the property by adding a minimum-amount constraint (e.g., “for amounts &gt; $2”), reran the analysis, and the updated logic passed.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*e79klwdkxQ7W4pIGfy68gw.png" /></figure><h4>Step 4: The Second Insight Was Even More Interesting</h4><p>The exploration also revealed a subtler weakness: some functions implicitly rely on callers maintaining invariants (e.g., never passing negative refunded amounts). 
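</p><p>The fee boundary from Step 3 is worth replaying numerically. Here is a hedged reconstruction in Python — working in integer cents, with the exact rounding of the demo model an assumption:</p>

```python
# Hypothetical reconstruction of the BankTransfer fee rule from the
# counterexample above: $1.50 fixed + 0.8% of the amount, in integer cents.
def bank_transfer_fee_cents(amount_cents: int) -> int:
    return 150 + (amount_cents * 8) // 1000

amount = 151                       # $1.51 -- the region the engine found
fee = bank_transfer_fee_cents(amount)
print(fee, fee >= amount)          # the fee swallows the whole transaction

# With the fix -- a minimum transaction amount (e.g. > $2) -- the property
# "fee < amount" holds again on the restricted domain.
assert all(bank_transfer_fee_cents(a) < a for a in range(201, 100_000))
```

<p>Each component looks harmless in isolation; the violation only appears where the fixed fee and the percentage meet a small enough amount.</p><p>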
The engine didn’t “invent” that issue — it made it visible by showing that the input space includes cases you might not defensively guard against.</p><p>This is a common failure mode in real systems:</p><ul><li>you <em>assume</em> upstream validation</li><li>then an integration path violates it</li><li>and you ship a bug that only appears in production</li></ul><p>CodeLogician doesn’t just check “happy path correctness.” It forces you to confront what your logic is actually permitting.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*O_rs0AL5Em271Wo9nA0yzg.png" /></figure><h3>The Takeaway: This Is Phase 2 of Vibe Coding</h3><p><strong><em>Vibe coding is about speed.</em></strong></p><p><strong><em>Logic-First AI is about turning that speed into something you can trust.</em></strong></p><p>The moment you stop treating the agent as a code generator and start treating it as a system that must <strong>model and explain behavior</strong>, you unlock a different level of reliability — without slowing down everything, and without trying to “prompt harder.” Check out <a href="https://www.codelogician.dev"><strong><em>www.codelogician.dev</em></strong></a></p><hr><p><a href="https://medium.com/imandra/vibe-coding-was-phase-1-logic-first-ai-is-phase-2-it-is-here-now-7523beff4188">Vibe Coding was phase 1. Logic-first AI is phase 2. It is here now.</a> was originally published in <a href="https://medium.com/imandra">Imandra Inc.</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[First steps with ImandraX]]></title>
            <link>https://medium.com/imandra/first-steps-with-imandrax-5c3e3a028857?source=rss----aa39f4f76a9f---4</link>
            <guid isPermaLink="false">https://medium.com/p/5c3e3a028857</guid>
            <category><![CDATA[reasoning]]></category>
            <category><![CDATA[neurosymbolic-ai]]></category>
            <category><![CDATA[mathematics]]></category>
            <category><![CDATA[automated-reasoning]]></category>
            <category><![CDATA[imandra]]></category>
            <dc:creator><![CDATA[Samer Abdallah]]></dc:creator>
            <pubDate>Thu, 08 May 2025 17:53:57 GMT</pubDate>
            <atom:updated>2025-05-12T15:21:07.471Z</atom:updated>
            <content:encoded><![CDATA[<p>This blog entry describes my experience of getting started with <strong>ImandraX</strong> using the VS Code<strong> ImandraX</strong> extension. I decided to take some functions and proofs I made when I was learning <strong>Agda</strong> a couple of years ago, and see<br>how they translate into <strong>ImandraX</strong>.</p><figure><img alt="First steps with ImandraX" src="https://cdn-images-1.medium.com/max/1024/1*quznXSr8FHh6dfr1z12q5w.png" /></figure><h3>Quick background on ImandraX</h3><p><strong>ImandraX</strong> supports many approaches to formal verification, including a seamless integration of bounded and unbounded verification. Every goal can be attacked first with bounded verification (via the `[@@upto n]` annotation), and then subjected to unbounded verification (e.g., proof by induction with `auto` or via more nuanced tactics) only once all counterexamples found have been eliminated.</p><p>We always suggest beginning with bounded verification. Why? (1) Most conjectures one has about code are actually false, and finding counterexamples quickly is much more useful than wasting time trying to prove false goals. In <strong>ImandraX</strong>, all counterexamples are computable objects which are directly available for inspection, computation and experimentation. (2) In many practical engineering applications, bounded verification is sufficient, and with <strong>ImandraX’s</strong> `[@@upto n]` these bounded results can be made “first class” verification results which form a key part of safety cases.</p><p>In this article, I will explore unbounded verification via `auto` and other nuanced tactics. 
Here’s <a href="https://gist.github.com/samer--/e11bded6f828d27668c1479c8ea6784e">the complete model</a> if you’d like to follow along.</p><h3>Prerequisites</h3><ol><li>It will help if you are familiar with the functional programming language OCaml, upon which <strong>Imandra Modelling Language (IML)</strong> is closely based.</li><li>Install <strong>ImandraX</strong> by running the install script in the GitHub repository. You can download and run it, or pipe it directly from the repo like this:<br><a href="https://raw.githubusercontent.com/imandra-ai/imandrax-api/refs/heads/main/scripts/install.sh">curl https://raw.githubusercontent.com/imandra-ai/imandrax-api/refs/heads/main/scripts/install.sh</a> | bash</li><li>Get an <strong>ImandraX</strong> API key, e.g. by signing up to Imandra Universe at <a href="https://universe.imandra.ai/">https://universe.imandra.ai/</a>. Once you have it, save it in a file called $HOME/.config/imandrax/api_key, where $HOME is your home directory.</li><li>Install Microsoft VS Code if you don’t have it already.</li><li>Install the VS Code <strong>ImandraX</strong> extension. When installed, it should look like this:</li></ol><figure><img alt="" src="https://cdn-images-1.medium.com/max/1000/1*VVPvYoJVoWj60pR2R_jyfA.png" /><figcaption>The ImandraX VS Code extension</figcaption></figure><h3>Some code to reason about</h3><p>First we’ll create ourselves a new file in VS Code and call it, for example, test.iml. Let’s then introduce a function to play with. We’re going to look at reversing lists. The first list reversing function, rev, is based on a simple structural recursion, where we pattern match on the two possible list constructors. 
It is sometimes called ‘naive’ reverse because it is (a) easily understood just by looking at it and (b) much slower: O(N²) in the length of the list, rather than the optimal O(N) method we will look at later.</p><pre>let rec rev = function<br> | [] -&gt; []<br> | x::xs -&gt; rev xs @ [x]</pre><h3>Proving some properties of reverse</h3><p>We would like to prove that reversing a list and then reversing the result reproduces the original list, that is, reverse is its own inverse. More formally, we want to prove that for any list x, rev (rev x) = x. From my experience with <strong>Agda</strong>, I suspected that it would be useful to have a couple of lemmas available which prove that concatenation (the @ operator in OCaml and IML) is associative and that reversal is ‘distributive’ in the sense that the reverse of the concatenation of two lists is the concatenation, in reverse order, of the reverses of the two lists. However, before introducing the lemmas and exploring how <strong>ImandraX</strong> can make use of them, let’s see if <strong>ImandraX</strong> can prove the self-inverse theorem on its own. We encode it like this:</p><pre>theorem rev_self_inverse x =<br>   rev (rev x) = x<br>[@@by auto]</pre><p>When entered into VS Code, the <strong>ImandraX</strong> extension should decorate the function with ‘Check’ and ‘Browse’ clickable links. Clicking on ‘Check’ will tell <strong>ImandraX</strong> to try to prove the theorem below it. Alternatively, from the menu, ‘File &gt; Save’ (or Cmd-S) will save the file and check all the lemmas and theorems it contains. The result should look like this:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*2a0yy1RTZlP_f43AqWenlw.png" /><figcaption>Successfully proven theorem in the VS Code editor</figcaption></figure><p>The little smiley face to the left indicates that the proof was successful. 
Clicking on the ‘Browse’ link should open another tab in VS Code containing a description of how <strong>ImandraX</strong> went about proving the theorem and how long it took.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*7OZ4jF-pQD5DvH2zFjggzA.png" /><figcaption>Report on the proof of `rev_self_inverse`</figcaption></figure><p>We will examine this proof (by clicking on the triangle next to ‘view’) later, but for now let’s go back and introduce the two lemmas we mentioned earlier, before the main theorem, again relying initially on the automatic proof method that is built in to Imandra.</p><pre>lemma append_assoc x y z =<br>   (x @ y) @ z = x @ (y @ z)<br>[@@by auto]<br><br>lemma rev_distrib x y =<br>   rev (x @ y) = rev y @ rev x<br>[@@by auto]<br><br>theorem rev_self_inverse x =<br>   rev (rev x) = x<br>[@@by auto]</pre><p>Let’s browse the report on the proof of rev_distrib. We won’t go through the description in detail, but notice how long it took <strong>ImandraX</strong> to prove rev_distrib using the auto method (or ‘tactic’, as it’s called in <strong>ImandraX</strong> — more on tactics later). When I ran it, it took about 1.5 seconds, which is quite slow in the scheme of things. We will see what we can do to speed that up.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*tUeVuRdtfqPURYL0D_9B1g.png" /><figcaption>Report on the proof of `rev_distrib`</figcaption></figure><h3>Speeding up the proof using lemmas</h3><p>If you expand the ‘view’ triangle in the ‘Report’ section of the proof browser for rev_distrib, which lets you view a report of the reasoning done by Imandra’s inductive waterfall (auto) tactic, you might notice something which looks like the associativity proposition for list concatenation (represented by the function List.append here — in ImandraX, the @ operator is an infix macro that expands to an application of the List.append function). 
Buried in the description there is this:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*9kCkgyY36LSytlg4AAc4ZA.png" /><figcaption>Part of the proof of `rev_distrib`</figcaption></figure><p>which is a particular case of associativity for the three lists gen_2 , gen_1 and [x1] . ImandraX then goes on to prove this without using the lemma we carefully prepared above. If we can get it to use the already-proven general form of the lemma, we might save some time, and moreover, effectively ‘teach’ ImandraX how to reason more efficiently in this domain.</p><p>The way to do this is to tag the lemma as a ‘rewrite’ rule, which makes it available for use in the auto tactic. We do this by adding [@@rw] to the definition, like this:</p><pre>lemma append_assoc x y z =<br>   (x @ y) @ z = x @ (y @ z)<br>[@@by auto] [@@rw]</pre><p>We can then click ‘Check’ to refresh the definition and then ‘Check’ and ‘Browse’ again on rev_distrib to see what the effect is. For me, the proof time was reduced to about 450 ms, a distinct improvement!</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*rrJhvjqsvXXS6-Ohqf6Myw.png" /><figcaption>Report on the sped-up proof of `rev_distrib` after making `append_assoc` a rewrite rule.</figcaption></figure><p>Examining the proof, we find this, showing that the automatic tactic did indeed find and use the lemma.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*gsnfZstgK-ZOtSaGA8q7ww.png" /><figcaption>Part of the proof of `rev_distrib`, showing application of `append_assoc` rewrite rule</figcaption></figure><p>Looking further up the proof, we can also find this:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*ewwZMA7xC7Qi2tcLZs_tZw.png" /><figcaption>Part of the proof of `rev_distrib`, showing use of built-in `List.append_to_nil` rule</figcaption></figure><p>It turns out that the proof was already using a lemma defined in the ImandraX List module. 
This lemma, List.append_to_nil, states that, for any list x, List.append x [] = x. The snippet says that, assuming list x is not of the form _ :: _ , i.e. it is the empty list [], we can conclude that the right hand side (RHS) of the goal reduces: rev y @ rev x = rev y @ [] = rev y. Since, on the LHS, x @ y = [] @ y = y by the definition of List.append, ImandraX is able to reduce the goal to rev y = rev y, which is trivially true.</p><p>Let’s have a look at the proof of rev_self_inverse — as we noticed before introducing the lemmas, it goes through in about 740 ms without using rev_distrib, not surprisingly, because we have not marked rev_distrib as available for rewriting. There is a point where rev_distrib <em>could</em> be applied, here:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*G3JCtfZ0VgpIWJfzpt1F4w.png" /><figcaption>Part of the proof of `rev_self_inverse` without using `rev_distrib` lemma</figcaption></figure><p>On the LHS of the goal, we have rev (List.append _ _). What happens if we make rev_distrib available for use? We add [@@rw] to the definition:</p><pre>lemma rev_distrib x y =<br>   rev (x @ y) = rev y @ rev x<br>[@@by auto] [@@rw]</pre><p>Saving and reviewing the proof of rev_self_inverse, we find:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*0SqrhXevzH7Wub5zVZroXg.png" /><figcaption>Report on sped-up proof of `rev_self_inverse` after enabling use of `rev_distrib` lemma</figcaption></figure><p>Success! 
The proof time is down to about 250 ms, and expanding the report, we find exactly what we were hoping for:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*S8T77Ym2ncssDSPqK21BEg.png" /><figcaption>Part of proof of `rev_self_inverse` showing use of `rev_distrib` rewrite rule</figcaption></figure><h3>Weaning off the ‘auto’ proof tactic</h3><p>When looking through the proof reports, you may get the impression (correctly) that <strong>ImandraX</strong> is somehow arriving at a long sequence of steps, often including nontrivial inductions and simplifications, each of which transforms the goals of the proof, until magically, at the end, the goal reduces to something which is trivially true and the proof is done. At each stage, there might be several transformations which could be applied, and ImandraX has to search through or use heuristics to find a sequence of steps which results in success. In this section, we will gradually reduce our reliance on the auto tactic, not because there’s anything wrong with using auto — it is an extremely powerful mechanism that is robust to semantics-preserving changes in the code. Rather, our purpose here is pedagogical: by learning how the various tactics work, we can use them judiciously to improve or direct harder proofs.</p><p>When building a proof manually, it’s very useful to be able to see the ‘state’ of a partial proof, i.e. the goals that remain to be proved. To do this, you can either click on the search box at the top of the VS Code window and click on ‘Show and Run Commands’:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*Fz_iKbYpHin6UGorWw52Ig.png" /></figure><p>or press Shift-Cmd-P. 
Then type ‘ImandraX’ and choose ‘Open goal state’.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*L4E2Joq9tB_q77cAGwD1Dw.png" /></figure><p>This should open a new panel called goal-state.md:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*a_9GYRrD-81LjvhYzsnuNQ.png" /></figure><p>Now let’s start a new lemma called rev_distrib_manual, but instead of using the auto tactic, we’ll use skip. <strong>Important</strong>: we must also remove the [@@rw] annotation from our previous version of rev_distrib to prevent ImandraX from using rev_distrib to prove rev_distrib_manual, as this would rather defeat the point of the exercise!</p><pre>lemma rev_distrib x y =<br>   rev (x @ y) = rev y @ rev x<br>[@@by auto] (* REMOVE [@@rw] *)<br><br>lemma rev_distrib_manual x y =<br>   rev (x @ y) = rev y @ rev x<br>[@@by skip]</pre><p>Since skip doesn’t advance the proof at all, the proof fails:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*DAW-vWYpuEZgg1O0ubnhBQ.png" /></figure><p>However, if we look at the goal state, we see:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*WxEeRlp0yg1B949484O2Yg.png" /></figure><p>We have a goal to prove below the line and no hypotheses above it. We can reproduce the first step of the automatic proof by using the induction tactic. This will produce two subgoals, one for each case of a structural induction on one of the list arguments. The induction tactic must be composed with two further tactics to handle each of the cases. 
We are free to use auto for both of these, which results in a successful proof:</p><pre>lemma rev_distrib_manual x y =<br>   rev (x @ y) = rev y @ rev x<br>[@@by induction ()<br>      @&gt;| [auto; auto]]</pre><p>For reference, this proof took about 425 ms for me.</p><p>When we invoke induction () on its own, ImandraX uses its induction heuristics (the same that are used by auto) to analyze the goal and synthesize an appropriate induction scheme. Unlike auto, however, induction () then simply returns the base and inductive cases as subgoals and lets us continue proving the goal manually. We can give induction instructions about which specific induction scheme we want, but in this case (and usually) we are happy to let it synthesize the appropriate induction scheme itself automatically.</p><p>By winding this back to just the induction step, we can get to see the two subgoals:</p><pre>lemma rev_distrib_manual x y =<br>   rev (x @ y) = rev y @ rev x<br>[@@by induction ()]</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*jbbd5vak3CxscNxNer36Tg.png" /></figure><p>The first subgoal, the base case, assumes the list x is empty, while the second assumes it’s not empty <em>and</em> adds an inductive hypothesis, which we can write more clearly without type decorations and using the @ operator instead of List.append:</p><pre>rev (List.tl x @ y) = rev y @ rev (List.tl x)</pre><p>Crucially, this is just the rev_distrib_manual lemma with x replaced by List.tl x. From now on I’ll illustrate the goal state not as a screenshot, but as text below the lemma code, without type decorations and using the @ operator.</p><p>The base case can be dealt with using the tactic simplify (). This can exploit function definitions and rewrite rules, but at this point, using it is no more enlightening than using auto, so let’s see what we can do at a lower level. The normalize tactic can expand and simplify a target term using function definitions and hypotheses. 
We apply it to rev x in the first subgoal: since x = [] and rev [] is <em>defined</em> as [], the term reduces to [].</p><pre>lemma rev_distrib_manual x y =<br>   rev (x @ y) = rev y @ rev x<br>[@@by induction ()<br>      @&gt;| [ [%normalize rev x]<br>          ; auto<br>          ]]<br><br> H0. x = []<br>|----------------------------------------------------------------------<br> rev (x @ y) = rev y @ []</pre><p>Similarly we can normalize x @ y using x = [] and the definition of List.append, to get x @ y = [] @ y = y. The two normalize steps are combined using the @&gt; operator.</p><pre>lemma rev_distrib_manual x y =<br>   rev (x @ y) = rev y @ rev x<br>[@@by induction ()<br>      @&gt;| [    [%normalize rev x]<br>            @&gt; [%normalize x @ y]<br>          ; auto<br>          ]]<br><br> H0. x = []<br>|----------------------------------------------------------------------<br> rev y = rev y @ []</pre><p>We are quite close now. The built-in List.append_to_nil lemma we found earlier can be applied with the use tactic to simplify the RHS:</p><pre>lemma rev_distrib_manual x y =<br>   rev (x @ y) = rev y @ rev x<br>[@@by induction ()<br>     @&gt;| [    [%normalize rev x]<br>           @&gt; [%normalize x @ y]<br>           @&gt; [%use List.append_to_nil (rev y)]<br>         ; auto<br>         ]]<br><br> H0. x = []<br> H1. rev y @ [] = rev y<br>|----------------------------------------------------------------------<br> rev y = rev y @ []</pre><p>Notice that this does not immediately simplify the goal — rather it inserts the hypothesis associated with the lemma above the line (<em>H1</em> above). 
We can use the replace tactic to simplify a term using the hypotheses:</p><pre>lemma rev_distrib_manual x y =<br>   rev (x @ y) = rev y @ rev x<br>[@@by induction ()<br>     @&gt;| [    [%normalize rev x]<br>           @&gt; [%normalize x @ y]<br>           @&gt; [%use List.append_to_nil (rev y)]<br>           @&gt; [%replace rev y @ []]<br>         ; auto<br>         ]]<br><br> H0. x = []<br>|----------------------------------------------------------------------<br> rev y = rev y</pre><p>Finally, the trivial tactic deals with the final remaining tautology in the goal. If we add trivial to the chain and replace auto with skip for the second subgoal, we’ll complete the first subgoal and see what needs doing for the second.</p><pre>lemma rev_distrib_manual x y =<br>   rev (x @ y) = rev y @ rev x<br>   [@@by induction ()<br>     @&gt;| [    [%normalize rev x]<br>           @&gt; [%normalize x @ y]<br>           @&gt; [%use List.append_to_nil (rev y)]<br>           @&gt; [%replace rev y @ []]<br>           @&gt; trivial<br>         ; skip<br>         ]]<br><br> H0. x &lt;&gt; []<br> H1. rev (List.tl x @ y) = rev y @ (rev (List.tl x))<br>|----------------------------------------------------------------------<br> rev (x @ y) = rev y @ rev x</pre><p>We’ll start by using normalize on the LHS of the goal. From here on, for brevity, I’ll omit the List. qualifier when referring to the list head and tail functions hd and tl respectively.</p><pre>lemma rev_distrib_manual x y =<br>    rev (x @ y) = rev y @ rev x<br>[@@by induction ()<br>    @&gt;| [    [%normalize rev x]<br>          @&gt; [%normalize x @ y]<br>          @&gt; [%use List.append_to_nil (rev y)]<br>          @&gt; [%replace (rev y) @ []]<br>          @&gt; trivial <br>        ;    [%normalize rev (x @ y)]<br>        ]]<br><br> H0. x &lt;&gt; []<br> H1. 
rev (tl x @ y) = rev y @ rev (tl x)<br>|----------------------------------------------------------------------<br> rev (tl x @ y) @ [hd x] = rev y @ rev x</pre><p>Next let’s normalize rev x in the RHS too:</p><pre>lemma rev_distrib_manual x y =<br>    rev (x @ y) = rev y @ rev x<br>[@@by induction ()<br>    @&gt;| [    [%normalize rev x]<br>          @&gt; [%normalize x @ y]<br>          @&gt; [%use List.append_to_nil (rev y)]<br>          @&gt; [%replace (rev y) @ []]<br>          @&gt; trivial <br>        ;    [%normalize rev (x @ y)]<br>          @&gt; [%normalize rev x]<br>        ]]<br><br> H0. x &lt;&gt; []<br> H1. rev (tl x @ y) = rev y @ rev (tl x)<br>|----------------------------------------------------------------------<br> rev (tl x @ y) @ [hd x] = rev y @ (rev (tl x) @ [hd x])</pre><p>Notice that we have a term rev (tl x @ y) in the LHS of the goal, and a hypothesis, the induction hypothesis in fact, about this term in <em>H1</em> above the line. We can use <em>H1</em> to rewrite the term in the goal using the replace tactic:</p><pre>lemma rev_distrib_manual x y =<br>    rev (x @ y) = rev y @ rev x<br>[@@by induction ()<br>    @&gt;| [    [%normalize rev x]<br>          @&gt; [%normalize x @ y]<br>          @&gt; [%use List.append_to_nil (rev y)]<br>          @&gt; [%replace (rev y) @ []]<br>          @&gt; trivial <br>        ;    [%normalize rev (x @ y)]<br>          @&gt; [%normalize rev x]<br>          @&gt; [%replace rev (List.tl x @ y)]<br>        ]]<br><br> H0. x &lt;&gt; []<br>|----------------------------------------------------------------------<br> (rev y @ rev (tl x)) @ [hd x] = rev y @ (rev (tl x) @ [hd x])</pre><p>Nearly there. 
Looks like all we need to do now is apply the append_assoc lemma to rewrite the LHS of the goal to match the RHS.</p><pre>lemma rev_distrib_manual x y =<br>    rev (x @ y) = rev y @ rev x<br>[@@by induction ()<br>    @&gt;| [    [%normalize rev x]<br>          @&gt; [%normalize x @ y]<br>          @&gt; [%use List.append_to_nil (rev y)]<br>          @&gt; [%replace (rev y) @ []]<br>          @&gt; trivial <br>        ;    [%normalize rev (x @ y)]<br>          @&gt; [%normalize rev x]<br>          @&gt; [%replace rev (List.tl x @ y)]<br>          @&gt; [%use append_assoc (rev y) (rev (List.tl x)) [List.hd x]]<br>        ]]<br><br> H0. x &lt;&gt; []<br> H1. (rev y @ rev (tl x)) @ [hd x] = rev y @ (rev (tl x) @ [hd x])<br>|----------------------------------------------------------------------<br> (rev y @ rev (tl x)) @ [hd x] = rev y @ (rev (tl x) @ [hd x])</pre><p>The goal is now the same as the hypothesis, so we can finish the proof by adding the trivial tactic:</p><pre>lemma rev_distrib_manual x y =<br>    rev (x @ y) = rev y @ rev x<br>[@@by induction ()<br>    @&gt;| [    [%normalize rev x]<br>          @&gt; [%normalize x @ y]<br>          @&gt; [%use List.append_to_nil (rev y)]<br>          @&gt; [%replace (rev y) @ []]<br>          @&gt; trivial <br>        ;    [%normalize rev (x @ y)]<br>          @&gt; [%normalize rev x]<br>          @&gt; [%replace rev (List.tl x @ y)]<br>          @&gt; [%use append_assoc (rev y) (rev (List.tl x)) [List.hd x]]<br>          @&gt; trivial<br>        ]]</pre><p>This time the proof succeeds, taking about 190 ms for me.</p><p>There are more things we can do to break down this proof into still smaller steps — for example, the normalize tactic is quite clever in how it exploits function definitions and hypotheses, and can be replaced with a more explicit sequence of operations — but we’ll leave that for now. 
It is also possible to explore how some of the proof steps can be reordered when they operate independently on different parts of the goal.</p><h3>A more efficient list reversal method</h3><p>In this section, we’ll introduce a new function rev_fast, which reverses lists more efficiently than rev. It is defined with the help of a tail-recursive function shunt with an accumulator argument that simultaneously tears down the input list while building up the reverse list, all in one pass through the elements of the list:</p><pre>let rec shunt tail = function<br>    | [] -&gt; tail<br>    | x::xs -&gt; shunt (x::tail) xs<br><br>let rev_fast x = shunt [] x</pre><p>Let’s see what we can prove about this function. Again, from my previous exploration of this topic in Agda, I knew that there is a useful lemma associated with shunt, which we can encode in ImandraX like this:</p><pre>lemma shunt_lemma acc items =<br>   shunt acc items = rev items @ acc<br>[@@by auto] [@@rw]</pre><p>It states that shunt reverses (according to our now verified and trusted definition of reverse, rev) the given items and appends the list in the accumulator. Clearly, this is very close to proving that rev_fast produces the same result as rev. As we’ve already enabled shunt_lemma as a rewrite rule, let’s immediately try to prove that rev_fast x = rev x for all lists:</p><pre>lemma rev_fast_eq_rev x =<br>   rev_fast x = rev x<br>[@@by auto]</pre><p>This proof succeeds in about 30 ms, making use of both shunt_lemma and List.append_to_nil. Intuitively, if rev_fast produces the same result as rev, and rev is self-inverse, then rev_fast must be self-inverse. ImandraX can prove this by itself, in about 700 ms if we <em>don’t</em> make rev_self_inverse and rev_fast_eq_rev available as rewrite rules, or about 40 ms if we do. 
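</p><p>For intuition, the two reversal strategies translate directly into ordinary executable code. Below is a Python rendering (not IML, purely for illustration), together with a small bounded check in the spirit of [@@upto] — exhaustive only up to the list lengths we enumerate:</p>

```python
# Sketch: Python analogues of the article's rev and rev_fast (via shunt).
def rev(xs):
    # 'naive' structural recursion; O(N^2) due to repeated concatenation
    return rev(xs[1:]) + [xs[0]] if xs else []

def shunt(tail, xs):
    # accumulator version: tears down xs while building up the reverse
    return shunt([xs[0]] + tail, xs[1:]) if xs else tail

def rev_fast(xs):
    return shunt([], xs)

# bounded 'verification' by exhaustive testing over small sizes
for n in range(8):
    xs = list(range(n))
    assert rev_fast(xs) == rev(xs) == xs[::-1]   # cf. rev_fast_eq_rev
    assert rev_fast(rev_fast(xs)) == xs          # cf. rev_fast_self_inverse
```

<p>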
Let’s instead prove it manually:</p><pre>lemma rev_fast_self_inverse x =<br>   rev_fast (rev_fast x) = x<br>[@@by skip]<br><br>|----------------------------------------------------------------------<br> rev_fast (rev_fast x) = x</pre><p>Instead of our previous approaches where we start by expanding terms using function definitions, let’s immediately use our rev_fast_eq_rev lemma:</p><pre>lemma rev_fast_self_inverse x =<br>   rev_fast (rev_fast x) = x<br>[@@by [%use rev_fast_eq_rev x]]<br><br> H0. rev_fast x = rev x<br>|----------------------------------------------------------------------<br> rev_fast (rev_fast x) = x</pre><p>Like when we used the List.append_to_nil lemma in an earlier proof, the goal is left unchanged but the lemma is inserted above the line. We can use replace to consume and apply it below the line:</p><pre>lemma rev_fast_self_inverse x =<br>   rev_fast (rev_fast x) = x<br>[@@by [%use rev_fast_eq_rev x]<br>   @&gt; [%replace rev_fast x]]<br> <br>|----------------------------------------------------------------------<br> rev_fast (rev x) = x</pre><p>We’ll repeat the process to rewrite rev_fast (rev x):</p><pre>lemma rev_fast_self_inverse x =<br>   rev_fast (rev_fast x) = x<br>[@@by [%use rev_fast_eq_rev x]<br>   @&gt; [%replace rev_fast x]<br>   @&gt; [%use rev_fast_eq_rev (rev x)]<br>   @&gt; [%replace rev_fast (rev x)]]<br><br>|----------------------------------------------------------------------<br> rev (rev x) = x</pre><p>The remaining goal is none other than the proposition of rev_self_inverse, so the proof is completed with a use and a trivial, and runs in under 1 ms:</p><pre>lemma rev_fast_self_inverse x =<br>   rev_fast (rev_fast x) = x<br>[@@by [%use rev_fast_eq_rev x]       @&gt; [%replace rev_fast x]<br>   @&gt; [%use rev_fast_eq_rev (rev x)] @&gt; [%replace rev_fast (rev x)]<br>   @&gt; [%use rev_self_inverse x]<br>   @&gt; trivial]</pre><p>Note that there are many other ways to do this — for example, if we’d wanted to 
let ImandraX’s rewriter apply rev_fast_eq_rev automatically, we could have used the simplify or normalize tactics. Nevertheless, it’s useful to see how to do these various steps manually!</p><h3>Going further</h3><p>As mentioned at the end of the section on doing manual proofs, there are many ways to break down proofs into more elementary steps, and there are many more tactics that can be used. The best place to find an authoritative list of available tactics is in the ImandraX source code: if you right click on any of the tactics in the code in VS Code, and click on ‘Go to Definition’, VS Code will bring up prelude.iml in a new panel.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*sDb9x0Dagr4zPmTNS5lDsg.png" /></figure><p>Finally, a summary of the code in this article is available in this <a href="https://gist.github.com/samer--/e11bded6f828d27668c1479c8ea6784e">Github gist</a>, and a short ‘cheat sheet’ of available tactics is available <a href="https://gist.github.com/samer--/25f6198ec190220e7bb74dd32113b73c">here</a>.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=5c3e3a028857" width="1" height="1" alt=""><hr><p><a href="https://medium.com/imandra/first-steps-with-imandrax-5c3e3a028857">First steps with ImandraX</a> was originally published in <a href="https://medium.com/imandra">Imandra Inc.</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Connecting Digital Twins and Systems]]></title>
            <link>https://medium.com/imandra/connecting-digital-twins-and-systems-170a3164c005?source=rss----aa39f4f76a9f---4</link>
            <guid isPermaLink="false">https://medium.com/p/170a3164c005</guid>
            <category><![CDATA[imandra]]></category>
            <category><![CDATA[financial-markets]]></category>
            <category><![CDATA[ai]]></category>
            <category><![CDATA[software-verification]]></category>
            <category><![CDATA[automated-reasoning]]></category>
            <dc:creator><![CDATA[Filippo Sestini]]></dc:creator>
            <pubDate>Mon, 14 Oct 2024 21:06:06 GMT</pubDate>
            <atom:updated>2024-10-14T21:06:06.461Z</atom:updated>
<content:encoded><![CDATA[<p><em>In the world of financial trading system APIs, precision and accuracy are paramount. Imandra Protocol Language (IPL) models serve as formal digital twin counterparts of real-world trading systems’ APIs (e.g. FIX). However, for these models to be truly useful and valuable, we need rigorous methods to ensure they are functionally equivalent to the actual systems.</em></p><p><em>This verification process becomes particularly challenging when: 1) the systems being modeled are not directly accessible (via source code, for example), and 2) the systems “downstream” implement incredibly complex logic (e.g. a stock exchange). This is where Imandra’s automated reasoning comes into play.</em></p><figure><img alt="Connection between FIX Gateway and its formal Digital Twin via Automated Reasoning" src="https://cdn-images-1.medium.com/max/1024/1*y88VV90pp1c53uYBFgFqGA.png" /></figure><h4>Modeling FIX Gateways as Digital Twins</h4><p>The description of how IPL models trading system APIs (e.g. FIX gateways) is beyond the scope of this text, but we urge the reader to view the IPL <a href="https://www.imandra.ai/imandra-protocol-language">product</a> and <a href="https://docs.imandra.ai/ipl/">documentation</a> pages. At a high level, the problem IPL solves is modeling, as formal digital twins, system interfaces that may be in a virtually infinite number of possible states. IPL has many powerful built-in techniques and features for doing so.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*TwThPX9O8SVNol__ODVrFw.png" /></figure><p>For example, a stock exchange may respond to a customer order with 0 fills (successful trades), 1 fill, or 100,000 fills. IPL tackles all such cases with a symbolic description of possible events conditioned on the current state of the session. With IPL, fill events are modeled as symbolic actions that are described by 1) the data they contain (e.g. 
fill price), 2) constraints describing when such actions might occur (e.g. if order is active) and 3) how the system reacts to such events. Reasoning about such actions and reconciling them with logs requires powerful reasoning tools like Imandra. Note that the underlying interaction of the order with other orders (e.g. exchange rule book) is beyond the scope of the model.</p><h4>Connecting Digital Twins to Systems</h4><p>Imandra’s <strong><em>Log Analysis</em></strong> tool is a high-level, behavior analysis tool that consumes trading logs produced by the trading system interface being modeled. It then checks these logs for coherence with the IPL model. In other words, it verifies whether the sequence of messages from the real-world logs fits a trading scenario described by the model/specification.</p><p>The performed tests range from basic checks of <em>well-formedness</em> of individual messages to complex verification of the entire log sequence against the model’s message flow and logic. This type of testing is crucial for two main reasons:</p><ol><li>It helps identify potential issues in the model itself (for example, when the model doesn’t implement a valid state transition).</li><li>It flags behaviors in the real system that deviate from the intended specification.</li></ol><p>In essence, these tests can be used to exercise both the model and the deployed system simultaneously, providing a comprehensive verification approach.</p><h4>Checking Log Consistency Against Fields/Messages</h4><p>The first stage of testing in our Log Analysis tool checks the <em>well-formedness</em> of messages (our examples use FIX protocol) contained in the input logs. 
This phase looks at aspects such as proper formatting, checksum validity, and correct encoding of fields with respect to their data types as specified in the IPL model.</p><p>For example, consider this (excerpt of a) NewOrderSingle message:</p><pre>8=FIX.4.4|9=615|35=D|38=SOMEQTY|...</pre><p>Running this FIX log through our tool produces an error message similar to the following:</p><pre>correct traces: 0% (0/1)<br>reason: cannot convert fix message to model message<br>attempting to match: 8=FIX.4.4|9=615|35=D|38=SOMEQTY|...<br>details:<br>Parsing app data : WrongValueFormat:38</pre><p>This error occurs because the OrderQty field (38) is declared as a numeric field, but the quantity in the FIX log contains a string payload.</p><h4>Conditional Validations</h4><p>IPL allows for the addition of validation statements to message declarations. For instance, we can specify that Limit orders must have a Price field defined:</p><pre>message NewOrderSingle {<br>  req ClOrdID<br>  req OrdType<br>  opt Price<br>  ...<br>  validate {this.OrdType==OrdType.Limit == present(this.Price)}<br>  ...<br>}</pre><p>Our Log Analysis tool uses these validation statements from the source IPL model to check the consistency of a message’s fields with each other.</p><p>Consider this FIX message as an example:</p><pre>8=FIX.4.1|9=103|35=D|34=3|49=SENDER|52=20121105-23:24:42|56=EXEC|11=1352157882577|21=1|38=10000|40=2|54=1|55=RECV|59=0|10=062</pre><p>Here we have a limit order (tag 40 = 2) that does not contain a price (field 44). 
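</p><p>The essence of this conditional check is easy to see outside the tool. Below is a toy sketch in Python (purely illustrative, not how the Log Analysis tool is implemented; it hard-codes the FIX tag numbers for OrdType and Price):</p>

```python
# Toy sketch of the conditional validation above: in FIX, tag 40 (OrdType)
# value "2" means Limit, and tag 44 is Price.
def parse_fix(raw: str) -> dict:
    """Parse a pipe-delimited FIX message into a tag -> value dict."""
    fields = {}
    for part in raw.strip("|").split("|"):
        tag, _, value = part.partition("=")
        fields[tag] = value
    return fields

def limit_order_has_price(msg: dict) -> bool:
    """Mirror of the IPL rule: OrdType == Limit  <=>  Price is present."""
    is_limit = msg.get("40") == "2"
    has_price = "44" in msg
    return is_limit == has_price

raw = ("8=FIX.4.1|9=103|35=D|34=3|49=SENDER|52=20121105-23:24:42|56=EXEC|"
       "11=1352157882577|21=1|38=10000|40=2|54=1|55=RECV|59=0|10=062")
print(limit_order_has_price(parse_fix(raw)))  # False: Limit order, no tag 44
```

<p>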
Predictably, this message fails our tool’s validation checks:</p><pre>correct traces: 0% (0/1)<br>reason: message failed model validations<br><br>details:<br>invalid message: 8=FIX.4.1|9=103|35=D|34=3|49=SENDER|52=20121105-23:24:42|56=EXEC|11=1352157882577|21=1|38=10000|40=2|54=1|55=RECV|59=0|10=062<br>model reject reason: Message NewOrderSingle violates the constraint<br>                                        &quot;this.OrdType==OrdType.Limit == present(this.Price)&quot;</pre><h4>Modeling Downstream Systems with Actions</h4><p>As already discussed, actions in IPL represent events that may occur asynchronously and affect the model’s state. These events can be triggered by things like external actors (e.g., an incoming message from another trader) or other components within the same system (e.g., an instrument being suspended by market operations).</p><p>A common example of an action in trading systems is a fill event. Fills are particularly suitable for modeling as actions because they occur when a counterparty sends a matching order, which is inherently asynchronous.</p><p>Here’s an example of how we might model a fill action in IPL:</p><pre>action fill {<br>  qty   : Qty<br>  price : Price<br>  <br>  validate {<br>    state.OrdStatus in [ OrdStatus.New, PartiallyFilled ]<br>  }<br>  validate {<br>    this.qty &gt; 0.0 &amp;&amp; this.qty &lt;= state.LeavesQty<br>  }<br>  validate {<br>    this.price &gt; 0.0<br>  }<br>  ...<br>}</pre><p>We use “receive” blocks to model how the system should react when an action occurs. In the case of a fill, this typically involves updating the model’s state according to the traded quantity, and issuing an execution report to acknowledge the trade.</p><pre>receive (f:fill) {<br>  state.LeavesQty = state.LeavesQty - f.qty<br>  state.CumQty = state.CumQty + f.qty<br>  ...<br>  send ExecutionReport {<br>    ...<br>  }<br>}</pre><p>Actions also allow us to define more complex message flows. 
For example, we can model a simple flow of a New Order Single followed by a fill Execution Report:</p><pre>messageFlows {<br>  newOrderFill {<br>    template [ NewOrderSingle, fill ]<br>  }<br>}</pre><h4>Symbolic Reasoning About Systems and Logs</h4><p>At a more advanced level, our tool performs symbolic reasoning to verify that sequences of logs are coherent with the message flows defined in the model. Consider a simple model including NewOrderSingle and ExecutionReport messages, as well as a fill action and a newOrderFill flow as defined above.</p><p>Now, let’s examine some FIX logs:</p><pre>8=FIX.4.1|9=103|35=D|34=3|49=SENDER|52=20121105-23:24:42|56=EXEC|11=1352157882577|21=1|38=10000|40=1|54=1|55=RECV|59=0|10=062<br>8=FIX.4.1|9=139|35=8|34=3|49=EXEC|52=20121105-23:24:42|56=SENDER|6=0|11=1352157882577|14=0|17=1|20=0|31=0|32=0|37=1|38=10000|39=0|54=1|55=RECV|150=2|151=0|10=059</pre><p>These logs show a NewOrderSingle (NOS) followed by an ExecutionReport (ER) with execution type “Fill” (field 150 = 2), acknowledging a fill involving the submitted order.</p><p>Our goal is to verify that this sequence of logs is coherent with the message flow defined in the model. To do so, our Log Analysis tool employs Imandra’s symbolic reasoning to devise a sequence of IPL state transitions that aligns with the logs. More specifically, it attempts to instantiate a fill action such that running the “receive” block on that action produces an ER message matching the one from the FIX logs. We call this operation “action synthesis”.</p><p>When we run our tool on the FIX logs above, we get a positive result, confirming that the logs are consistent with our model. 
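</p><p>Conceptually, action synthesis searches for concrete action parameters that satisfy the action’s validations and reproduce the logged message. The Python sketch below is a brute-force caricature of that idea (hypothetical field names; Imandra does this symbolically over unbounded domains rather than by enumeration):</p>

```python
# Brute-force caricature of "action synthesis": find a fill quantity such
# that applying the model's receive block reproduces the logged state.
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class State:
    ord_status: str
    leaves_qty: int
    cum_qty: int

def fill_valid(state: State, qty: int, price: float) -> bool:
    # The fill action's validate blocks from the model above.
    return (state.ord_status in ("New", "PartiallyFilled")
            and 0 < qty <= state.leaves_qty
            and price > 0.0)

def receive_fill(state: State, qty: int) -> State:
    # The receive block: update the quantities.
    return replace(state,
                   leaves_qty=state.leaves_qty - qty,
                   cum_qty=state.cum_qty + qty)

def synthesize_fill(state: State, logged_cum_qty: int, price: float):
    """Search for a fill qty that makes the model's output match the log."""
    for qty in range(1, state.leaves_qty + 1):
        if fill_valid(state, qty, price) and \
           receive_fill(state, qty).cum_qty == logged_cum_qty:
            return qty
    return None  # no action satisfies the constraints: log is incoherent

s = State("New", leaves_qty=10000, cum_qty=0)
print(synthesize_fill(s, logged_cum_qty=2500, price=101.5))  # 2500
```

<p>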
However, let’s see what happens if we modify the execution type of the ER to something other than 2:</p><pre>8=FIX.4.1|9=103|35=D|34=3|49=SENDER|52=20121105-23:24:42|56=EXEC|11=1352157882577|21=1|38=10000|40=1|54=1|55=RECV|59=0|10=062<br>8=FIX.4.1|9=139|35=8|34=3|49=EXEC|52=20121105-23:24:42|56=SENDER|6=0|11=1352157882577|14=0|17=1|20=0|31=0|32=0|37=1|38=10000|39=0|54=1|55=RECV|150=1|151=0|10=059</pre><p>Now we get different feedback:</p><pre>correct traces: 0% (0/1)<br>reason: attempting to match message with an action failed<br>attempting to match: 8=FIX.4.1|9=139|35=8|34=3|49=EXEC|52=20121105-23:24:42|56=SENDER|6=0|11=1352157882577|14=0|17=1|20=0|31=0|32=0|37=1|38=10000|39=0|54=1|55=RECV|150=1|151=0|10=059</pre><p>This output indicates that the tool encountered an inbound message (the ER) that should correspond to an asynchronous event (i.e., an action), but failed to find a suitable action in the IPL model. This outcome is expected, as the ER in the logs doesn’t fit the NOS+Fill scenario we defined in the model.</p><h3>Conclusion</h3><p>Verifying models against real-world data is a critical step in ensuring their accuracy and reliability. Imandra’s Log Analysis tool provides a powerful solution to this complex problem, bridging the gap between formal specifications and real-world behavior and offering multiple levels of verification, from basic syntax checking to reasoning about complex trading logic. By making use of powerful tools like IPL and Imandra, Imandra’s Log Analysis allows us to verify that observed system API behavior (as captured in FIX logs) aligns with the formal model specification. It also provides detailed feedback on why certain log sequences fail to match the model, facilitating debugging and refinement of both the model and the actual system. 
Learn more about this tool and many other features of our Imandra Connectivity product on our <a href="https://www.imandra.ai/connectivity">website</a>.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=170a3164c005" width="1" height="1" alt=""><hr><p><a href="https://medium.com/imandra/connecting-digital-twins-and-systems-170a3164c005">Connecting Digital Twins and Systems</a> was originally published in <a href="https://medium.com/imandra">Imandra Inc.</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Validating FIX digital twins with automated reasoning]]></title>
            <link>https://medium.com/imandra/validating-fix-digital-twins-with-automated-reasoning-d944ee3db294?source=rss----aa39f4f76a9f---4</link>
            <guid isPermaLink="false">https://medium.com/p/d944ee3db294</guid>
            <category><![CDATA[imandra]]></category>
            <category><![CDATA[ai]]></category>
            <category><![CDATA[automated-reasoning]]></category>
            <category><![CDATA[fix-protocol]]></category>
            <category><![CDATA[financial-markets]]></category>
            <dc:creator><![CDATA[Nicola Mometto]]></dc:creator>
            <pubDate>Tue, 01 Oct 2024 17:03:06 GMT</pubDate>
            <atom:updated>2024-10-01T17:03:06.828Z</atom:updated>
            <content:encoded><![CDATA[<p><em>Imandra Connectivity is a cloud-native AI-powered platform for designing, enabling, auditing and certifying financial APIs (e.g. FIX Protocol). One of its core features is Imandra Protocol Language — a formal programming language for modeling APIs in a way that makes them amenable to hallucination-free application of automated reasoning (symbolic AI). Most often (in fact, 100% in our experience!), API designs expressed via lengthy PDFs contain errors that are virtually impossible for humans to detect. Below we showcase a recently released feature (a tool) for detecting subtle inconsistencies in the design before they make their way into production environments and cause embarrassment and/or regulatory fines for the financial institutions providing these APIs to their customers.</em></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*tHrEkBRXquG1dhSR5Wm9Bg.png" /></figure><h4><strong>The need for digital twins</strong></h4><p>Typical APIs used in financial markets are very complex — they cover many messages, fields with validation logic, workflows, etc. Traditionally, such APIs were designed in and communicated via Word or PDF documents. In two prior posts listed below (and on our website) we describe the numerous problems this entails. 
To make API design precise and automate (with AI) many tasks involved in the lifecycle of the API, we created a domain-specific language, Imandra Protocol Language, that makes it easy to encode such models in a formal (mathematically precise) format. This lets us bring to bear the latest advances in AI to ensure their correctness, generate certification packs, test suites and documentation, rigorously integrate with LLMs, and much more.</p><p>For further information on the need for IPL, please check out these earlier articles:</p><ul><li><a href="https://medium.com/imandra/formalising-the-fix-protocol-in-imandra-c976ed390159"><strong><em>“Formalising the FIX protocol in Imandra</em></strong></a><strong><em>”</em></strong></li><li><a href="https://medium.com/imandra/machine-reasonable-apis-and-regulations-20f29e1bd4cf"><strong><em>“Machine Reasonable APIs and Regulations</em></strong></a><strong><em>”</em></strong></li><li><a href="https://www.imandra.ai/imandra-protocol-language">IPL website</a></li></ul><p>IPL models (programs) are functional digital twins of the API endpoint — they model the infinitely many ways a typical customer (e.g. a hedge fund) can use the system behind the API (e.g. an exchange order book). In the article below we demonstrate a new tool (a feature of our platform) we created to help our users ensure their designs are correct (before they’re implemented in the actual systems!).</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*B6I9UGdRfWO1rv05lLRBjQ.png" /><figcaption>Formal digital twins connect all of the stakeholders involved over a “golden source of truth” and enable application of neuro-symbolic AI</figcaption></figure><h4><strong>The problem: logical inconsistencies in IPL models</strong></h4><p>In formal specifications, particularly those modeled using Imandra’s IPL, ensuring logical consistency is critical. Our suite of AI tools analyzes IPL models, generates test cases, and can even aid in onboarding conformance. 
However, when logical inconsistencies exist in the model — whether due to coding errors or deeper logical issues in the specification itself — our tools may exclude certain behaviors from test case generation. Detecting and resolving these inconsistencies can be challenging, especially when the absence of a test case points to an underlying issue that is hard to identify.</p><p>To address this challenge, we developed a tool that shifts the detection of logical inconsistencies earlier in the process, allowing users to catch and resolve these issues before generating test cases. This saves significant time and effort, ensuring that models are thoroughly vetted for logical soundness and completeness.</p><h4><strong>Understanding message flows in IPL</strong></h4><p>In IPL, message flows are used to guide our tools when analyzing and decomposing models. A message flow represents a specific sequence of events or messages that the system processes. For example, a simple message flow might analyze the possible behaviors when a <em>NewOrderSingle </em>message is followed by a <em>fill </em>action.</p><p>A more advanced message flow might look like this:</p><pre>messageFlows {<br>  NewOrderNoStopLimit {<br>        template [<br>            NewOrderSingle when { it.OrdType != OrdType.StopLimit } expanding { OrdType } { TimeInForce }<br>        ]<br>    }<br>}</pre><p>This flow instructs the system to analyze the behaviors for a <em>NewOrderSingle </em>message where the <em>OrdType </em>is not <em>StopLimit</em>. It also tells our system to explicitly enumerate all valid combinations of <em>OrdType </em>and <em>TimeInForce </em>in separate test cases.</p><p>See <a href="https://docs.imandra.ai/ipl/messageFlows/">https://docs.imandra.ai/ipl/messageFlows/</a> for more info.</p><h4><strong>An example of a logical inconsistency</strong></h4><p>Consider the following IPL model, which simulates a simplified financial trading system. 
In this example, we define two key messages: <em>NewOrderSingle </em>and <em>OrderCancelReplaceRequest</em>, and several validation rules around order types and time conditions.</p><pre>import FIX_4_4<br><br>using OrdType { Limit Market StopLimit Stop }<br>      TimeInForce { Day GoodTillCancel }<br><br>messageFlows {<br>    Flow1 {<br>        template [<br>            NewOrderSingle expanding { OrdType, fun pricePresent } { TimeInForce },<br>            OrderCancelReplaceRequest<br>        ]<br>    }<br>}<br>function pricePresent (m : NewOrderSingle) : bool {<br>    return present (m.Price)<br>}<br>internal state {<br>    ClOrdID :? string<br>    is_live : bool = false;<br>}<br>message NewOrderSingle {<br>    req ClOrdID<br>    req OrderQtyData.OrderQty valid when it &gt; 0.0<br>    req OrdType<br>    opt Price valid when (case (it) { Some x : x &gt; 0.0 } {None: true })<br>    req TimeInForce<br>    validate {<br>        this.OrdType == OrdType.Limit ==&gt; present (this.Price)<br>    }<br>    validate {<br>        present(this.Price) &lt;==&gt; this.TimeInForce != TimeInForce.Day<br>    }<br>    validate {<br>        this.OrdType != OrdType.Market ==&gt; this.TimeInForce == TimeInForce.Day<br>    }<br>    validate {<br>        !state.is_live<br>  }<br>}<br>message OrderCancelReplaceRequest {<br>    req ClOrdID valid when (Some it) == state.ClOrdID<br>    validate { state.is_live }<br>}<br>receive (m : NewOrderSingle) {<br>    state.ClOrdID = Some m.ClOrdID<br>}</pre><p>There are two critical logical inconsistencies in this model:</p><ul><li><em>OrderCancelReplaceRequest </em>can never be valid<br>The <em>OrderCancelReplaceRequest </em>message requires <em>state.is_live </em>to be true for the message to be valid, but in the <em>NewOrderSingle </em>message’s receive block, <em>state.is_live </em>is never updated. 
As a result, the <em>OrderCancelReplaceRequest </em>can never be accepted, making the model logically inconsistent.</li><li>Contradictory validation rules for <strong><em>NewOrderSingle</em></strong><br>The validation rules for the <strong><em>NewOrderSingle</em></strong><em> </em>message introduce a conflict that prevents any <strong><em>Limit</em></strong><em> </em>order from being valid:<br>- Rule A: If <em>OrdType </em>is <em>Limit</em>, the <em>Price </em>must be present.<br>- Rule B: If the <strong><em>Price</em></strong><em> </em>is present, <strong><em>TimeInForce</em></strong><em> </em>must be something other than Day.<br>- Rule C: If the <strong><em>OrdType</em></strong><em> </em>is not <strong><em>Market</em></strong>, <strong><em>TimeInForce</em></strong><em> </em>must be <strong><em>Day</em></strong>.<br>These rules form a cycle of contradictions, making it impossible for a Limit order to satisfy all the conditions at once, rendering the model logically invalid.</li></ul><h4><strong>How our IPL Logic Validator tool solves this</strong></h4><p>Let’s assume we are analyzing the model under the <em>Flow1 </em>message flow. Our new <strong><em>IPL Logic Validator</em></strong> tool can automatically detect these inconsistencies. 
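</p><p>To see the second inconsistency concretely, a brute-force enumeration over the small domains involved is enough (a Python sketch for illustration only; the Logic Validator finds this symbolically, via unsat cores, rather than by enumeration):</p>

```python
# Enumerate (OrdType, Price present, TimeInForce) and check rules A-C:
# no assignment with OrdType = Limit can satisfy all three at once.
from itertools import product

TIFS = ["Day", "GoodTillCancel"]

def satisfies_all(ord_type: str, price_present: bool, tif: str) -> bool:
    rule_a = (ord_type != "Limit") or price_present       # Limit => Price
    rule_b = price_present == (tif != "Day")              # Price <=> not Day
    rule_c = (ord_type == "Market") or (tif == "Day")     # not Market => Day
    return rule_a and rule_b and rule_c

valid_limit = [c for c in product(["Limit"], [True, False], TIFS)
               if satisfies_all(*c)]
print(valid_limit)  # [] -- no valid Limit order exists
```

<p>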
For example, it might highlight that the sequence <strong><em>NewOrderSingle</em></strong><em> -&gt; </em><strong><em>OrderCancelReplaceRequest</em></strong><em> </em>is unreachable due to the failure to update <em>state.is_live</em>, as well as pointing out the specific contradictions in the validation rules for the <em>NewOrderSingle </em>message preventing a Limit order from ever being valid.</p><p>Here’s a simplified output from the tool showing the unsatisfiable conditions:</p><pre>For message flow `Flow1`, unsat cores:<br>  step 0: [OrdType(Flow1,NewOrderSingle) = Limit +<br>           this.OrdType == OrdType.Limit ==&gt; present (this.Price) +<br>           present(this.Price) &lt;==&gt; this.TimeInForce != TimeInForce.Day +<br>           this.OrdType != OrdType.Market ==&gt; this.TimeInForce == TimeInForce.Day];<br>  step 1: [state.is_live]</pre><p>This output shows exactly where the model’s logic breaks down. It highlights the specific combination of validation rules for <em>NewOrderSingle</em> that leads to a logical contradiction, and it also points out that <em>state.is_live</em> is never updated, preventing the <em>OrderCancelReplaceRequest </em>from being valid.</p><h4><strong>(Coming soon!) Seamless integration with IPL VS Code Plugin</strong></h4><p>To make this process even more seamless, our <strong><em>IPL Logic Validator </em></strong>tool<strong><em> </em></strong>will soon be fully integrated into our Visual Studio Code extension. When browsing an IPL model, users can simply right-click on the relevant section and select <strong><em>“Validate IPL Logic”</em></strong> from the contextual menu. This triggers an immediate analysis of the model’s logical consistency, pinpointing any unsatisfiable conditions directly within the VSCode environment. 
This integration allows for quick, in-editor feedback, helping developers identify and resolve issues faster without leaving their workflow.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*VHJV8ayEPH6aTTKBQfyZZg.png" /></figure><p>Download our IPL VS Code Plug-in in the <a href="https://marketplace.visualstudio.com/items?itemName=aestheticintegration.ipl-vscode">Visual Studio Marketplace</a>.</p><h4><strong>Conclusion</strong></h4><p>Our new tool streamlines the process of detecting and diagnosing logical inconsistencies in IPL models. By catching these issues early — before generating test cases or conducting onboarding sessions — it saves developers and spec designers from the frustrating task of manually hunting down the root causes of missing or unreachable behaviors.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=d944ee3db294" width="1" height="1" alt=""><hr><p><a href="https://medium.com/imandra/validating-fix-digital-twins-with-automated-reasoning-d944ee3db294">Validating FIX digital twins with automated reasoning</a> was originally published in <a href="https://medium.com/imandra">Imandra Inc.</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Automated Reasoning for SysML v2 Part 3]]></title>
            <link>https://medium.com/imandra/automated-reasoning-for-sysml-v2-part-3-cfc8fc60c8af?source=rss----aa39f4f76a9f---4</link>
            <guid isPermaLink="false">https://medium.com/p/cfc8fc60c8af</guid>
            <category><![CDATA[automated-reasoning]]></category>
            <category><![CDATA[safety-critical]]></category>
            <category><![CDATA[formal-verification]]></category>
            <category><![CDATA[imandra]]></category>
            <dc:creator><![CDATA[Jamie Smith]]></dc:creator>
            <pubDate>Tue, 20 Feb 2024 16:14:16 GMT</pubDate>
            <atom:updated>2024-04-25T21:04:54.932Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*0WPza523n9IeBx4oMeP47Q.png" /></figure><p>Using Imandra Region Decomposition to Gain Insights into SysML v2 Models.</p><p><em>The team at Imandra is continuing to develop techniques and explore how Imandra Automated Reasoning can provide insights and prove properties of SysML v2 models. This is the third article in a series on how to use Imandra with SysML v2. In the prior two articles, we covered how Imandra Automated Reasoning is used to verify and prove the correctness of systems. We will expand the Automated Reasoning discussion in this article by looking more deeply at proving determinism using formal methods.</em></p><p><em>Specifically, in the prior two articles, we covered formally verifying two different types of verification goals for a traffic light. In article one, we proved that the system properly handled errors and placed the system in an operational “stop_sign” mode if an error was detected. We proved that the system would remain fail-operational and notify the central office of the error via an SMS message. In the second article, we used Imandra Automated Reasoning to prove SysML v2 constraints and requirements are always met.</em></p><p><em>If you have not read those two articles, I recommend doing so; they are linked on medium.com at the bottom of this article. For those of you who have read them, thank you. I am very interested in your thoughts and feedback.</em></p><h3>Using Imandra Automated Reasoning with SysML v2</h3><p>SysML v2 is a general-purpose modeling language for complex system design. SysML v2 is an open standard developed by the Object Management Group (OMG). Our work at Imandra is based on the SysML v2 pilot implementation and beta release from July 2023. The OMG working groups continue progressing toward a final release, which should come in mid-2024. 
For more information on the SysML v2 release schedule, go to omg.org.</p><p>We have spent several months learning SysML v2 and mapping the semantics to Imandra Modeling Language (IML). The more we learn, develop, and experiment, the more excited we get about the potential. The SysML v2 formal semantics will allow us to use Imandra Automated Reasoning in a wide range of applications to help systems engineers design safer systems faster.</p><p>At Imandra, we are executing against a roadmap that includes the following:</p><ul><li>Formal verification of properties, requirements, and constraints</li><li>Conversion from SysML v1 to SysML v2</li><li>A SysML v2 to IML transpiler</li><li>Hallucination-free natural language interfaces (ChatBots) enabling all stakeholders to explore and reason about SysML v2 models</li><li>Integration with popular MBSE development tools</li></ul><p>In this article, we will continue to use the traffic light model discussed in the previous two articles. We will show how to use Imandra Automated Reasoning to prove the determinism of the state machine. We will also introduce you to region decomposition to explore all the behavioral regions of the SysML v2 model and to create a semantic diff.</p><h3>SysML v2 State Machines, States, and Transitions</h3><p>SysML v2 system behavioral flow is done primarily in two ways. Systems engineers use SysML v2 to model behavior with action nodes and control nodes, such as forks, joins, merges, and decisions. The other approach is state-based behavior using a state machine. Both approaches model complex behaviors, and of course, these two approaches can be combined side-by-side or in a hierarchy. This article will focus solely on state-based behavior and state machines.</p><p>SysML v2 state machines have three basic elements: states, transitions, and actions. States invoke actions, and transitions determine when you change states. 
SysML v2 states are a specialized type of action that calls up to three sub-actions: an Entry-action, a Do-action, and an Exit-action. Of course, actions can call additional actions, so you are not limited to only three actions, but there are three events when the actions are invoked. The Entry-action is executed when you enter the state, and then the Do-action is executed after the Entry-action is complete. The Exit-action is executed when the state transitions to another state.</p><p>The SysML v2 transitions determine when the state machine changes from one state to another. The SysML v2 transitions may have guards that are expressions that return a Boolean value. If the Boolean value is true, then the transition criteria are met, and the state changes to another state dictated by the transition. Here is a simple diagram of the order of execution of states and actions determined by SysML v2 transitions.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*UrOig6df7Ngr1YoIJXWIdw.png" /></figure><p>One case that can be confusing is what happens if a transition criterion is met before the Do-action is complete. If that occurs, the Do-action is terminated, and the Exit-action is executed.</p><p>To help us better understand SysML v2 state machines, I will go through the traffic light example in some detail. The traffic light state machine has 12 states and 18 transitions. The states are blocks in the state diagram below, and the transitions are arrows. While this state machine is not trivial, it is pretty simple. 
In the next section, let’s look at a few states and the transitions, first in SysML v2 textual notation and then in Imandra Modeling Language (IML).</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*P7-2cO7DbEAaIFZXJm-R_w.png" /><figcaption>Traffic Light State Diagram</figcaption></figure><h3>Explaining the Traffic Light State Machine</h3><p>Let’s look at a section of the state machine to help explain SysML v2 state machine concepts and lay the foundation for our discussion of determinism later in the article. We will detail a few states, transitions, and actions to help us understand SysML v2 state machines. This will lead to our explanation of how Imandra Automated Reasoning proves determinism and isolates design errors in the state machine.</p><p>The path from “traffic_flow” to “yellow_on” in the state machine has four states (traffic_flow, red_on, green_on, and yellow_on) and five transitions, plus two additional transitions between “traffic_flow” and “operation_selection.” We will start with the transitions to show how the state machine determines when to change from one state to another. Below is the SysML v2 textual notation for the transitions “traffic_flow_to_red_on” and “traffic_flow_to_operation_selection.”</p><pre>transition traffic_flow_to_red_on<br>  first traffic_flow<br>  accept when (traffic_light_controller.error == false)<br>  then red_on;<br><br>transition traffic_flow_to_operation_selection<br>  first traffic_flow<br>  accept when ((traffic_light_controller.message_queue != &quot;&quot;)<br>  or (traffic_light_controller.error == true))<br>  then operation_selection;</pre><p>These two transitions are simple and self-explanatory. 
If the state machine is in the “traffic_flow” state and an error has not been detected, the system will transition to “red_on.” If the state machine is in “traffic_flow” and there is an error, OR if there is an external message, the system will transition to “operation_selection.” In “operation_selection,” the system may change its operating mode or, if an error occurred, continue toward the “diagnostic” state.</p><p>Now let’s look at the “traffic_flow” state.</p><pre>state traffic_flow {<br>  entry traffic_light_controller.check_status;<br>  exit;<br>}</pre><p>The “traffic_flow” state has an entry function “check_status,” which updates both the “Error” Boolean and the “message_queue.” “Error” and “message_queue” are both used to evaluate the “traffic_flow” transitions. To summarize “traffic_flow,” we have an action that updates the attributes the transitions evaluate to trigger a change to one of two different states. This example is about as simple as it gets. Now let’s move on to the “red_on” state.</p><p>The “red_on” state also has two transitions. They are more complex but still a series of Boolean expressions. We will go over the two transitions one at a time. Below is the “red_on_to_green_on” transition.</p><pre>transition red_on_to_green_on<br>  first red_on<br>  accept when ((traffic_light_controller.wait_duration &gt; traffic_light_controller.red_on_time)<br>      and (vehicle_sensor.vehicle_at_light == true)<br>      and (traffic_light_controller.message_queue == &quot;Safe_all_red&quot;))<br>then green_on;</pre><p>This transition is easy to read from top to bottom. 
The state machine will change from the “red_on” state to the “green_on” state if the system has been red longer than the “red_on_time” AND there is a vehicle waiting at the red light, AND we have received the safety message that all the other lights at the intersection are red.</p><p>The other transition usage, red_on_to_traffic_flow, is below.</p><pre>transition red_on_to_traffic_flow<br>  first red_on<br>  accept when ((((traffic_light_controller.message_queue != &quot;&quot;)<br>      and (traffic_light_controller.message_queue != &quot;Safe_all_red&quot;))<br>      or (traffic_light_controller.error == true)))<br>then traffic_flow;</pre><p>The state machine transitions from the “red_on” state to the “traffic_flow” state when there is an external message that is not “Safe_all_red,” OR if an error has been detected. This is similar to the conditions that change the state machine from the “traffic_flow” state to the “operation_selection” state.</p><p>The “red_on” state has actions that impact the traffic light beyond the execution of the state machine. The “red_on” state has an entry function, “turn_red_on,” that handles turning lights off, resetting the wait clock, and changing the state of the red_light to “lit.”</p><p>[NOTE: The :&gt;&gt; is redefine syntax in SysML v2.]</p><pre>state red_on {<br>  entry traffic_light_controller.turn_red_on;<br>  do traffic_light_controller.check_status;<br>  exit;<br>}<br><br>action turn_red_on {<br>  perform action turn_lights_off;<br>  perform action reset_wait_clock;<br>  attribute :&gt;&gt; traffic_light_controller.red_light_state = Light_state::lit;<br>}</pre><p>The state and transition usages for the states “green_on” and “yellow_on” are similar to “red_on.” The SysML v2 textual representation is included below, but we will not walk through them. 
One difference is that the “green_on” and the “yellow_on” states have only one transition.</p><pre>transition green_on_to_yellow_on<br>  first green_on<br>  accept when (traffic_light_controller.wait_duration &gt; traffic_light_controller.green_on_time<br>  or (traffic_light_controller.error == true))<br>  then yellow_on;<br><br>transition yellow_on_to_red_on<br>  first yellow_on<br>  accept when ((traffic_light_controller.wait_duration &gt; traffic_light_controller.yellow_on_time))<br>  then red_on;<br><br>state green_on {<br>  entry traffic_light_controller.turn_green_on;<br>  do traffic_light_controller.check_status;<br>  exit;<br>  }<br><br>state yellow_on {<br>  entry traffic_light_controller.turn_yellow_on;<br>  do traffic_light_controller.check_status;<br>  exit;<br>  }</pre><p>For those of you newer to SysML v2, I hope you better understand SysML v2 state machines, including their states and transitions. In the next section, we will use Imandra Automated Reasoning to prove that the state machine is deterministic, which means that each state has either zero or one legal transition at any time.</p><h3>State Machine Determinism</h3><p>We will go over how Imandra Automated Reasoning is used to prove the determinism of a SysML v2 state machine. We can use similar techniques to prove other properties, including but not limited to “reachability” and “no-dead-lock.” Reachability is proving that each state in the state machine can be reached. “No-dead-lock” proves that the state machine will not become stuck in a state and unable to change states.</p><p>To use Imandra Automated Reasoning, we must convert the SysML v2 model to IML. We have been doing these conversions manually but are actively developing a transpiler to convert from SysML v2 to IML automatically.</p><p>The traffic light SysML v2 model has 12 states, 18 transition usages, and 45 attributes. 
Some of the attributes are discrete, like “error,” some are continuous, like “wait_duration,” and some are enumerated types, like the “external_messages.” To prove the determinism of the state machine, we will focus on the 18 transitions. Let’s take a look at two of the transitions converted to IML. The transitions “traffic_flow_to_red_on” and “red_on_to_green_on” in IML are similar to their SysML v2 textual representation.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/9e8544008418e66c9234af53fb78d3d7/href">https://medium.com/media/9e8544008418e66c9234af53fb78d3d7/href</a></iframe><p>IML is based on the open-source language OCaml and is nearly identical to it: IML is a slightly restricted subset of OCaml that ensures we can use all the Imandra Automated Reasoning tools. It is trivial for developers to include OCaml and OCaml libraries in Imandra if needed for visualization or data manipulation.</p><p>In IML, we represent the two transition usages as functions with one parameter, “sa.” “sa” is a variable containing the current SysML state and all 45 system attributes. When we convert the model to IML, we define a type “sysml_state_attributes” and then declare a variable “sa” of type “sysml_state_attributes.”</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/374510c54679f3e47fc9e8d41270e0d7/href">https://medium.com/media/374510c54679f3e47fc9e8d41270e0d7/href</a></iframe><p>When we evaluate a transition in IML, we pass in “sa” and resolve the Boolean logic. If the Boolean logic resolves as “true,” we change the SysML state; if not, we return “sa” unchanged.</p><p>We have converted all 18 SysML v2 transition usages to IML functions.
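To make the shape of these functions concrete, here is a small sketch in plain OCaml, which IML closely mirrors. The type, field names, and guard below are hypothetical simplifications for illustration, not the actual IML in the embedded code.

```ocaml
(* Hypothetical sketch only: field names and guard conditions are
   simplified stand-ins for the real 45-attribute model. *)

type sysml_state = Traffic_flow | Red_on | Green_on

type sysml_state_attributes = {
  state : sysml_state;
  error : bool;
  message_queue : string;
  wait_duration : int;
  red_on_time : int;
  vehicle_waiting : bool;
}

(* A transition usage as a pure function: if the guard holds, change
   the SysML state; otherwise return sa unchanged. *)
let red_on_to_green_on (sa : sysml_state_attributes) =
  if sa.state = Red_on
     && sa.wait_duration > sa.red_on_time
     && sa.vehicle_waiting
     && sa.message_queue = "Safe_all_red"
  then { sa with state = Green_on }
  else sa
```

Keeping each transition a pure function of “sa” is what lets Imandra reason about all of them symbolically rather than by testing individual inputs.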
To prove that the state machine is deterministic, we will create a helper function called “check_state_transition_functions” that takes a single parameter, “sa.” “check_state_transition_functions” checks each transition function, in turn, using the same “sa.” Every time a transition evaluates to true, we increment a variable “i” by 1. If no transition functions are satisfied, “check_state_transition_functions” will return “0”; if one and only one is satisfied, it will return “1”; and if more than one transition function is satisfied, it will return a value of “2” or greater. For a state machine to be deterministic, this function must return “0” or “1” for all SysML States and all attribute values.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/9541f19f355a2ca9df0eec81de9011af/href">https://medium.com/media/9541f19f355a2ca9df0eec81de9011af/href</a></iframe><p>We will use “check_state_transition_functions” in an Imandra verify function to prove that, for all values of “sa,” either 0 OR 1 transition usage is satisfied. Imandra Automated Reasoning will evaluate all 18 transition usages, all 12 SysML states, and all 45 attributes simultaneously and mathematically prove that “check_state_transition_functions” returns either “0” OR “1”.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/e8f83281c3a30f89c7ef5ecc664fb2b6/href">https://medium.com/media/e8f83281c3a30f89c7ef5ecc664fb2b6/href</a></iframe><p>Those who were paying close attention in the prior sections may have noticed that this verification cannot be proved due to a lack of constraints in the transitions. Two issues prevent the proof. When “verify” is called, Imandra returns Refuted and a counterexample showing an explicit “sa” that caused the proof to fail. I trimmed down each counterexample to the relevant attributes.
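In the same simplified OCaml style, a hypothetical version of the counting helper might look like the following; the two guards are invented stand-ins for the 18 real transition functions, and the commented verify line only gestures at the IML goal.

```ocaml
(* Hypothetical sketch: two invented guards stand in for the 18 real
   transition functions; the counting logic is the point. *)

type sysml_state = Traffic_flow | Red_on | Operation_selection

type sa = {
  state : sysml_state;
  error : bool;
  message_queue : string;
}

(* Each guard mirrors one transition usage's "accept when" condition. *)
let traffic_flow_to_red_on_fires sa =
  sa.state = Traffic_flow && not sa.error

let traffic_flow_to_operation_selection_fires sa =
  sa.state = Traffic_flow && (sa.message_queue <> "" || sa.error)

(* Count how many transition guards the same sa satisfies. *)
let check_state_transition_functions sa =
  List.length
    (List.filter
       (fun fires -> fires sa)
       [ traffic_flow_to_red_on_fires;
         traffic_flow_to_operation_selection_fires ])

(* The determinism goal would then read, roughly:
   verify (fun sa -> check_state_transition_functions sa <= 1) *)
```

Note that with these two guards, an “sa” in Traffic_flow with no error and a non-empty queue already satisfies both, which is exactly the kind of ambiguity a count of 2 exposes.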
I will go through them one at a time.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/57534a1ad4cc950360fb517fb4c938d3/href">https://medium.com/media/57534a1ad4cc950360fb517fb4c938d3/href</a></iframe><p>When the state machine is in the “Traffic_flow” state AND the “error = false” AND there is a message in the message queue, it is ambiguous whether the state machine should transition to the state “Operation_selection” or “Red_on.” We can correct this issue by modifying “traffic_flow_to_red_on” to add a specific check that no messages are in the message queue.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/3a1d2500a460eeaa48120915c8ed1cc8/href">https://medium.com/media/3a1d2500a460eeaa48120915c8ed1cc8/href</a></iframe><p>Now, let’s look at the second issue, which is a significant safety concern.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/a78c44f42b621bd24028ba6dbedea5b0/href">https://medium.com/media/a78c44f42b621bd24028ba6dbedea5b0/href</a></iframe><p>When the system is in the SysML state “Red_on,” and there is an error, and we have received the message “Safe_all_red,” indicating that all of the other lights at the intersection are red, it is unclear whether the system should change to the “Green_on” state or to the “Traffic_flow” state, as both transition functions are satisfied.
This is an issue and potentially a significant safety concern because we do not want to turn the light green if we detect an error in the system.</p><p>The ambiguity can quickly be addressed by adding “error = false” to the “red_on_to_green_on” transition usage.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/41f60fada88494f5f8b39fa651d871dc/href">https://medium.com/media/41f60fada88494f5f8b39fa651d871dc/href</a></iframe><p>Once we correct both issues in the transition functions, we can re-run the verify function, and it returns “Proved” and provides a formal proof for review and confirmation by third-party proof checkers.</p><p>If you already saw these issues when reading the previous section, good for you! The two bugs were in my SysML v2 model of the traffic light when I wrote it and converted it to IML. One of the reasons mistakes and bugs are so common is that systems are getting increasingly complex. Even this simple model of a traffic light has a surprising number of behavioral regions.</p><h3>Region Decomposition</h3><p>We can use Imandra Region Decomposition to create a map of any function in our IML model. The result is a collection of invariant regions that collectively represent the entire behavior of the function or system. IML has tools for executing the SysML v2 state machine, including a function for evaluating transitions, “transition sa.” “transition sa” is evaluated in every iteration of our state machine engine, checking whether any transition expression returns true and, if so, changing the SysML v2 state. Below, we use Imandra Region Decomposition to provide a complete map of the function “transition sa.”</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*MJgTOQHLtnm1nt2NJtqGGA.png" /></figure><p>For the traffic light state machine, the transitions represent 175 unique behavioral regions.
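For readers who want to see the shape of that dispatcher, here is a hypothetical plain-OCaml sketch of “transition sa”; the two transitions are invented stand-ins for the model’s 18, and the real IML differs in its details.

```ocaml
(* Hypothetical sketch: each transition returns Some sa' when its
   guard fires, None otherwise; the dispatcher takes the first hit. *)

type sysml_state = Off_state | Red_on | Green_on | Traffic_flow

type sa = { state : sysml_state; power : bool; error : bool }

let power_off sa =
  if not sa.power then Some { sa with state = Off_state } else None

let red_on_to_traffic_flow sa =
  if sa.state = Red_on && sa.power && sa.error
  then Some { sa with state = Traffic_flow }
  else None

let transitions = [ power_off; red_on_to_traffic_flow ]

(* "transition sa": the first guard that fires changes the SysML
   state; if none fire, sa is returned unchanged. *)
let transition sa =
  match List.find_map (fun t -> t sa) transitions with
  | Some sa' -> sa'
  | None -> sa
```

Because “transition” is itself a pure function of “sa,” Region Decomposition can enumerate every distinct guard combination it contains, which is where the 175 regions come from.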
Each one of these regions represents a collection of constraints that determine an invariant result. We represent the regions in an interactive Voronoi diagram to allow users to explore the state space. The diagram above highlights the region where the power is off, and the invariant result is that we are in the SysML Off_state. The other 174 regions are when the power is ON. Let’s look at two of them.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*rCFohlt9o_ExXT-xMsRWZQ.png" /></figure><p>This is one of the two regions we corrected. It is when the conditions for the transition “red_on_to_traffic_flow” are met. In the constraints, you will see that the power is “on” AND message is “Safe_all_red” AND there is an error. So, the SysML State is changed to “Traffic_flow.” It will then progress toward the “diagnostic” state.</p><p>Below is the other corrected transition, “traffic_flow_to_red_on,” where power is on AND no error AND no external message. In this region, the SysML State changes to “Red_on.”</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*CR5O3DHvGCt_YgYufqeibw.png" /></figure><h3>Constraining Region Decomposition and Semantic Diffs</h3><p>Region decomposition is a powerful tool, but sometimes, you want to reduce the number of results by a specific outcome or condition. We can add helper functions to set conditions for the region decomposition. 
Here are three sets of constraints and the corresponding results.</p><p>The first function we use to set a condition is “sysml_state_red_on sa.” It returns a Boolean true if the SysML v2 State starts in “Red_on” and transitions to any other SysML v2 State.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/89d37eea4bf15d51cc5e55abe5ff2680/href">https://medium.com/media/89d37eea4bf15d51cc5e55abe5ff2680/href</a></iframe><p>The “sysml_state_red_on sa” function is applied as a condition using the syntax ~assuming:”sysml_state_red_on.” The result is a collection of 14 regions, where one region is a transition to “Off_state,” another region is a transition to “Green_on,” and the other 12 are different sets of constraints that change from “Red_on” to “Traffic_flow.”</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*5s3jeCf6Xlx7QrVgHlWPDA.png" /></figure><p>Sometimes, you may want to see the path to a specific state, not the destinations from a starting state. This is called a preimage. We can easily generate a preimage using region decomposition. To do so, we will create another condition function. 
The “sysml_end_green_on sa” function returns a Boolean true if the system changes from any SysML v2 State to the SysML v2 State “Green_on.”</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/d398a04388cce03a8345d2ca178789e2/href">https://medium.com/media/d398a04388cce03a8345d2ca178789e2/href</a></iframe><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*l1RYplmHNNMLYE3aPpuStg.png" /></figure><p>This Region Decomposition returns only one region, which means that we have one, and only one, path that ends with the SysML v2 State “Green_on.” This result is exciting because it shows that no unexpected edge cases will lead to a “Green_on” state.</p><p>The final thing to look at is the Semantic Diff between the transitions before and after we corrected them. Here is the third and final function we will use as a condition for the Region Decomposition.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/f32d198bb2df70b650343cd1f6755200/href">https://medium.com/media/f32d198bb2df70b650343cd1f6755200/href</a></iframe><p>“diff_t” returns a Boolean true if the transition functions have different results for a given “sa.” The notations M2 and M3 refer to different modules, or namespaces, in IML. By declaring the function “transition” in different modules, the two versions can have different definitions and behavior but the same name. We then use Region Decomposition on the function “M3.transition” with the condition that “M2.transition &lt;&gt; M3.transition”. The “M2.transition” is the collection of transition usages before we corrected the mistakes we found in the earlier section, and the “M3.transition” is after we made the corrections.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*gp7PZ8lnBGXWdVLIn7lnfw.png" /></figure><p>This is a semantic diff between the two instances of the transition function.
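As a plain-OCaml illustration of this M2/M3 setup (with invented guards, not the model’s real ones), the same-named “transition” functions and “diff_t” might be sketched as:

```ocaml
(* Hypothetical sketch of the M2/M3 setup: same function name,
   different modules, simplified guards. *)

type sysml_state = Red_on | Green_on

type sa = { state : sysml_state; error : bool; safe_all_red : bool }

module M2 = struct
  (* Pre-fix version: ignores the error flag. *)
  let transition sa =
    if sa.state = Red_on && sa.safe_all_red
    then { sa with state = Green_on }
    else sa
end

module M3 = struct
  (* Post-fix version: the added "error = false" guard. *)
  let transition sa =
    if sa.state = Red_on && sa.safe_all_red && not sa.error
    then { sa with state = Green_on }
    else sa
end

(* True exactly where the two versions disagree; decomposing
   M3.transition under this condition maps the behavioral diff. *)
let diff_t sa = M2.transition sa <> M3.transition sa
```
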
We see only two regions; one reflects the changes we made in the transition usage “red_on_to_traffic_flow,” and the other reflects the changes we made to “traffic_flow_to_operation_selection.”</p><h3>Conclusion</h3><p>We continue to make progress and expose more ways to leverage Imandra to reason about SysML v2 models. Determinism, reachability, deadlock prevention, and semantic diffs are just examples of what is possible. I am excited to learn from you what types of analysis and reasoning would help your teams and customers be more successful systems engineers and stakeholders. We should have the first version of the SysML v2 to IML transpiler done in early 2024, and we would like access to your examples to test it. We also want to speak to SysML v1 end-users interested in automatic conversion to SysML v2. Please get in touch with me at <a href="mailto:Jamie@imandra.ai">Jamie@imandra.ai</a> or look for me at the next OMG members meeting.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=cfc8fc60c8af" width="1" height="1" alt=""><hr><p><a href="https://medium.com/imandra/automated-reasoning-for-sysml-v2-part-3-cfc8fc60c8af">Automated Reasoning for SysML v2 Part 3</a> was originally published in <a href="https://medium.com/imandra">Imandra Inc.</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[An Introduction to Imandra Markets]]></title>
            <link>https://medium.com/imandra/an-introduction-to-imandra-markets-d9b7419916f2?source=rss----aa39f4f76a9f---4</link>
            <guid isPermaLink="false">https://medium.com/p/d9b7419916f2</guid>
            <category><![CDATA[financial-markets]]></category>
            <category><![CDATA[imandra]]></category>
            <category><![CDATA[ai]]></category>
            <category><![CDATA[automated-reasoning]]></category>
            <dc:creator><![CDATA[Paul Brennan]]></dc:creator>
            <pubDate>Tue, 20 Feb 2024 16:11:33 GMT</pubDate>
            <atom:updated>2024-02-20T17:15:31.827Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*p1phgAwmsertZ7HUruYDYg.jpeg" /></figure><p><em>First published on </em><a href="https://tabbforum.com/opinions/ai-and-digital-twins-improving-operational-resiliency-in-financial-markets/"><em>Tabb Forum</em></a><em>.</em></p><h3>Can AI and digital twins be used to solve problems with operational resiliency and unlock innovation in financial markets?</h3><p>Global financial markets are under pressure to unlock new revenue streams through innovation and, as highlighted in a recent <a href="https://www.iosco.org/library/pubdocs/pdf/IOSCOPD751.pdf">IOSCO consultation</a> on market outages, improve operational stability. Active innovation trends include ESG, digital assets, new data products, and the use of AI. However, challenges with system complexity, automation and organisational inefficiencies can drag innovation and profitability.</p><p>Is there a way that financial markets can leverage innovative ideas from other industries to accelerate their growth? For example, a recent <a href="https://www.mckinsey.com/capabilities/operations/our-insights/digital-twins-the-next-frontier-of-factory-optimization?utm_medium=DSMN8&amp;utm_source=LinkedIn&amp;utm_user=14419233765393582">McKinsey article</a> highlighted that “digital twins are revolutionising how decisions are made within factories, and forward-thinking manufacturers are getting ahead of the technology curve to drive efficiency”.</p><p>We discuss how the digital twin approach combined with AI can unlock innovation and increase operational resiliency within financial markets.</p><h3>Industry challenges with operational resilience</h3><p>As seen over recent years, major exchange outages have a market-wide impact. Industry participants still ask,<em> “Why does this still happen</em>?”. 
Regulators and market participants demand a greater level of operational resiliency.</p><p>Despite industrywide initiatives to improve financial market operational resilience, including attempts to agree on alternative closing price procedures, many root causes persist. The problem is that there is no silver bullet solution.</p><p>A recent IOSCO survey shared in their “<a href="https://www.iosco.org/library/pubdocs/pdf/IOSCOPD751.pdf">Consultation Report Market Outages</a>” published in December 2023 shows that software-related issues caused most market outages. Examples include failed software releases and invalid instructions. Reading the paper highlights the breadth of causes in the chart below, which indicates the scale of the challenge that exchanges face.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*rq2LcPZkjf5j2Utocm3l2w.png" /><figcaption><em>Source: IOSCO consultation on market outages. “</em>Root causes of market outages on listing trading venues reported in the surveyed IOSCO jurisdictions between 2018 and 2022”.</figcaption></figure><h3>Innovation headwinds</h3><p>With operational resiliency in mind, one can imagine the challenges in delivering swift innovation. New products and features must be designed, specified, built and tested by different internal groups and communicated clearly to external stakeholders such as trading firms and regulators. 
The coordination required to align all interested parties is enormous, and the industry-wide expectation is to get it right the first time.</p><p>Consider the following examples of exchange upgrades:</p><ul><li>A new order type to accommodate a new asset class.</li><li>A system transformation project to replace a legacy platform.</li><li>A new fee type based on complex order interactions.</li><li>A matching logic change to introduce an anti-gaming mechanism.</li></ul><p>All market participants feel the impact of these upgrades, and many of them have to alter their trading systems to accommodate the change. The exchange has to change its public and private documentation and adjacent systems (post-trade, surveillance, regulatory reporting, etc.), coordinate data providers, and have trading firms run conformance tests; the list goes on. In essence, the entire ecosystem has to understand and implement the change precisely.</p><p>It prompts the following questions for the exchange:</p><ul><li>How much due diligence is required?</li><li>Is the design correct?</li><li>How can we predict the impact throughout the entire platform?</li><li>How can we ensure the correct understanding across all stakeholders?</li><li>How is this tested, and have we identified defects before launch?</li></ul><p>From the project’s inception to final deployment and customer adoption, one perceived conflict is present: <strong><em>What is the trade-off between risk management and cost?</em></strong></p><p>There are also non-functional exchange upgrades, which are mandatory for numerous reasons, including information security, hardware replacements, firmware upgrades, and system performance.
They each pose comparable challenges for the exchange.</p><h3>Roots of the problem</h3><p>Traditional product and software development practices can’t keep pace with the algorithmic complexity we see in modern trading and exchange systems.</p><p><strong>Knowledge</strong></p><p>The knowledge and understanding of the exchange system are distributed across multiple teams and specialists. Documentation such as business requirements, rulebooks, and technical user guides is written manually using prose, tables, and worked examples, often under-specified and quickly outdated, leading to the following challenges with the system build:</p><ul><li>There is no way of ensuring that new requirements are logically consistent across the ecosystem.</li><li>Developers hope the business requirements have been accurately understood and captured.</li><li>Test programs lack formal measures against which they can assess their success.</li><li>Accountable stakeholders lack transparency, which exposes them to unbounded risks.</li></ul><p>In other words, there is no formal synchronisation between stakeholder understanding &amp; documentation and the exchange system.</p><p><strong>Identification of defects</strong></p><p>We know from Boehm’s Law and real-world experience that fixing bugs and glitches in production is exponentially more expensive than in the design phase. Yet, there is a lack of scientific techniques used within financial markets to identify defects and design flaws. Given the complexity of modern financial systems, is it possible to verify the impact of the new or upgraded system design before it is even built? And if not, can we at least identify defects earlier in the development lifecycle?</p><p>An interesting analogy is when a new bridge is built. The architects and civil and structural engineers communicate over an exact blueprint that describes the details of the bridge.
Structural engineers apply laws of physics to the blueprint to see if it is structurally sound, even before the ground is broken. This up-front analysis at the design stage saves time, money and lives. This model-based design verification approach is commonplace in safety-critical industries and microprocessor design.</p><h3>The alternative solution — a digital twin</h3><p><strong>What is a digital twin?</strong></p><p><a href="https://www.ibm.com/topics/what-is-a-digital-twin">IBM</a> defines a digital twin as “<em>a virtual representation of an object or system that spans its lifecycle, is updated from real-time data, and uses simulation, machine learning and reasoning to help decision making</em>”.</p><p><strong>Digital twins for financial markets</strong></p><p>With so many stakeholders, the degree of complexity and the demand for rapid innovation, financial systems should transition from age-old, prose-based specifications to a precise design. Representing the financial system’s business logic in a digital executable model that one can interrogate allows exchanges to:</p><ul><li>Verify properties about the exchange system behaviour</li><li>Generate test cases to ensure the correctness of the exchange system</li><li>Run an audit against production trading data</li><li>Generate English-precise, prose documentation</li></ul><p>And in a more advanced state, machine learning and generative AI can be harnessed to:</p><ul><li>Create new revenue streams with innovative data products</li><li>Integrate large language models for ease of access</li></ul><p>Imandra is leading the way with this technology in financial markets. 
The following section describes how the “Imandra Markets” digital twin combined with Automated Reasoning tackles the abovementioned challenges by replacing age-old analogue processes with a digital approach.</p><h3>Imandra Markets — a digital twin powered by Automated Reasoning</h3><p><strong>The Imandra Markets Digital Twin</strong></p><p>Imandra’s patented technology and scientific advances of automated logical reasoning empower the creation of a fully functional and logically precise “digital twin” of any complex financial software system.</p><p>The Imandra Markets digital twin is positioned at the heart of the software and product development lifecycle. The golden source specification aligns and deepens stakeholders’ understanding digitally. It is dynamically connected to the test and production exchange environments through data synchronisation to ensure compliance and correct implementation. Imandra Markets offers unique “what-if” scenario analysis and new feature innovations, paving the way for predictive data products for trading firms’ systematic strategy calibration.</p><p>Trading venues benefit from the next generation of design verification, rigorous testing, resiliency, business intelligence and growth opportunities.</p><p><strong>Solving the knowledge gap</strong></p><p>Instead of relying on English-prose documents and long email chains, the exchange stakeholders can digitise the business logic design of the venue, allowing them to have one central and precise blueprint, just like the architects and structural engineers do when designing and building a new bridge.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*_wdntrW_28WnBCV6Zz1ZAg.jpeg" /><figcaption><em>The Imandra Effect.</em></figcaption></figure><p>The Imandra Markets digital twin is more than just code. It is a living digital specification that systematically aligns all stakeholders to one golden source of exact truth. 
No time is wasted in decision-making and figuring out how the exchange system works. Knowledge is preserved across the entire team, and no information is lost.</p><p>A core tenet of the product is to ensure that all stakeholders can use it, not just those who can read the code. Imandra Markets has tools catering to all users and their specific areas of interest. Here are two examples:</p><ul><li>A compliance officer can directly tie the precise prose description of a system feature and usage statistics to the underlying system via the digital twin. Imandra Markets can generate comprehensive governance reports, making regulatory compliance and approvals simple and quick.</li><li>A product manager and business analyst can take a cut of the model and ‘play’ by simulating the impact of new designs. Imandra Markets will systematically assess the impact of new designs, which are displayed through powerful side-by-side orderbook replay tools. This “<em>what-if</em>” scenario analysis and impact assessment accelerates the creative thought process and unlocks innovation.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*BiUKBOn86sDO-iGJ" /><figcaption><em>The Imandra System Auditor displays the precise prose description aligned with production usage statistics/test results (Functional Predicates) and proven properties of the design (Verification Goals).</em></figcaption></figure><p><strong>Identifying defects and improving operational resilience</strong></p><p>Financial market systems are so complex that humans struggle to understand the entire system’s behaviour, especially when relying on analogue specifications. There are simply too many edge cases to consider, keep track of and test.
Breakthroughs in AI and mathematics allow us to model exchange rules and regulations precisely and apply rigorous logical AI to automate regulatory analysis and testing, all while providing logical audit trails.</p><p>Based on the digital design of the exchange or venue, Imandra Markets uses highly automated logical reasoning AI to ensure its compliance and correct implementation. There are two critical pillars to this:</p><ol><li><strong>System Verification — A Verified Design. </strong>By using AI-powered logical reasoning to verify system behaviour, Imandra Markets will identify exchange design defects upfront before the software development process begins, which isn’t possible in the analogue world of prose-based requirements and specifications.</li><li><strong>Systematic Validation — Automated Test Plan. </strong>Imandra can systematically analyse the digital twin to identify all possible behaviours and edge cases. This analysis, “symbolic reasoning”, is a form of generative AI underpinned by logical reasoning, which yields accurate results.</li></ol><p>This leads to high-coverage automated test generation, used to test the exchange system and identify functional and non-functional issues.</p><p><strong>The result is that exchanges benefit from a design and system defect identification step-change. The perceived trade-off between risk management and cost is broken.</strong></p><p>Imandra Markets unlocks the pace of software development, gives financial institutions confidence in their operational resilience, and proves that their system design will behave as intended.</p><p><strong>Business intelligence</strong></p><p>Trading firms are well versed in using order, trade and market data to help inform order routing decision-making, for example, TCA. 
They aim to understand better the intersection between (i) the exchange features and (ii) the statistical liquidity profile to calibrate their strategy to meet their trading objectives.</p><p>Imandra Markets gives the exchange operators a unique tool to analyse their customers’ interaction with the trading system by running many ‘what-if’ scenarios to create an actionable intelligence data product to help customers exploit the full potential of the exchange system and its features.</p><p>Timing a racing car on a race track offers an interesting analogy. The crucial performance measurement visible to the outside world is the one against the stopwatch, the lap time. However, the car’s potential is only visible to the racing team. The team’s engineers can analyse the driver using internal telemetry to see if they fully exploit the car’s capability to make it go faster.</p><p>Imandra Markets gives exchanges the tools to help trading firms realise the unused potential of the exchange system.</p><h3>The takeaway</h3><p><strong><em>If there is one takeaway from reading this article, it is that a better way of managing complex financial systems exists.</em></strong></p><p>By taking inspiration from safety-critical industries, companies can move away from traditional analogue processes by using digital twins to manage their systems and reap the benefits.</p><p>The Imandra Markets digital twins offer a leap forward in exchange system resiliency, accelerate system development and open up avenues for business intelligence and new revenue streams.</p><p><strong>Exchanges have already begun using this approach.
Platforms in Europe collectively responsible for one-quarter of the equity market volume actively use this method.</strong></p><p>With industry-wide focus and mandates to explore the use of AI, Automated Reasoning is under the spotlight, and when combined with the digital twin approach, it can be game-changing.</p><p><strong>About the author</strong></p><p><em>Paul Brennan is the Chief Strategy Officer and Head of Growth for Financial Markets at Imandra, Inc.</em></p><p><em>Before working at Imandra, Paul spent 15 years at Goldman Sachs in numerous roles, most notably as the Chief Operating Officer of the European trading venue SIGMA X MTF.</em></p><p><em>Paul has deep expertise in managing regulated businesses based on complex mission-critical technology, including steering them through business and technology transformations.</em></p><p><em>As an early adopter, Paul became the first customer of Imandra and demonstrated the value of Automated Reasoning AI within Financial Markets.</em></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=d9b7419916f2" width="1" height="1" alt=""><hr><p><a href="https://medium.com/imandra/an-introduction-to-imandra-markets-d9b7419916f2">An Introduction to Imandra Markets</a> was originally published in <a href="https://medium.com/imandra">Imandra Inc.</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Automated Reasoning for SysML v2 Part 2]]></title>
            <link>https://medium.com/imandra/automated-reasoning-for-sysml-v2-part-2-7afd4fbd549c?source=rss----aa39f4f76a9f---4</link>
            <guid isPermaLink="false">https://medium.com/p/7afd4fbd549c</guid>
            <category><![CDATA[imandra]]></category>
            <category><![CDATA[automated-reasoning]]></category>
            <category><![CDATA[sysml]]></category>
            <category><![CDATA[formal-verification]]></category>
            <category><![CDATA[safety-critical]]></category>
            <dc:creator><![CDATA[Jamie Smith]]></dc:creator>
            <pubDate>Thu, 02 Nov 2023 21:51:19 GMT</pubDate>
            <atom:updated>2023-11-10T18:48:17.591Z</atom:updated>
<content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1000/0*rUX9uSi387856BDY.png" /></figure><h3>Formally Verifying SysML v2 Constraints with Imandra</h3><p><em>I previously published an article entitled </em><a href="https://medium.com/imandra/automated-reasoning-for-sysml-v2-ad7e87addba8"><em>“Automated Reasoning for SysML v2.”</em></a><em> The article outlined how Imandra Automated Reasoning enables complex reasoning about the design and formal verification of properties and behaviors. The example we shared in that article demonstrated formal verification of three verification goals for a traffic light designed using the pilot implementation of Systems Modeling Language version 2 (SysML v2):</em></p><p><em>1.</em> <em>The behavior is deterministic, regardless of state or parameter values.</em></p><p><em>2.</em> <em>Errors are handled properly, irrespective of state or parameter values.</em></p><p><em>3.</em> <em>The light only switches to green when it is safe (all other lights in the system are red), irrespective of state or parameter values.</em></p><p><em>The article focused on the second verification goal, “Errors are handled properly.” Several systems engineers and developers working on SysML v2 read the article or reviewed my example and provided encouragement and positive feedback. They also had some great ideas of what I should do next. More than one person recommended that we synthesize verification goals directly from SysML v2 requirements and constraints. Again, I am excited to share my progress over the past couple of weeks.</em></p><h3>Using Imandra Automated Reasoning with SysML v2</h3><p>SysML v2 is a general-purpose modeling language for complex system design developed by the Object Management Group (OMG) as an open standard. The OMG released the beta implementation in July of 2023, and I expect the final release to be next summer.
For more information on the SysML v2 release schedule, go to <a href="http://www.omg.org">omg.org</a>.</p><p>I am excited about SysML v2 for several reasons. After working with SysML v2 for the last few months, I have found it easy to learn. Unlike SysML v1, the language is consistent across all the modeling elements, such as parts, ports, connections, states, and so on. I am perhaps most excited about SysML v2’s formal semantics and the opportunities this creates. Formal semantics will ensure that there is a single representation of a SysML v2 model that contains all of its elements and relationships. The SysML v2 formal semantics will allow us to cleanly <em>transpile</em> SysML v2 models to the Imandra Modeling Language (IML). Once we have converted a SysML v2 design, we can formally verify the design against requirements, constraints, and other verification goals. I am confident that, with the approach we are taking, systems engineers will be able to define verification goals natively in SysML v2 and still take advantage of Imandra’s automated reasoning tools without having to learn a second modeling language.</p><p>To demonstrate how Imandra can verify a system designed using SysML v2, we modeled a traffic light using SysML v2 and then converted it to IML. This manual conversion has helped create a guide and roadmap for the automatic SysML v2 to IML conversion tool (a “<em>transpiler</em>”). The SysML v2 traffic light model is based on the standard green, yellow and red traffic light used in the United States. We chose the traffic light because it is complicated enough to demonstrate some key capabilities of both SysML v2 and Imandra, while not being so complex that it would be difficult to understand. In the last article, we focused on the behavior of the traffic light. This article will focus on two SysML v2 constraints. We will convert both constraints to formal verification goals in Imandra. 
Reviewing my first article, <a href="https://medium.com/imandra/automated-reasoning-for-sysml-v2-ad7e87addba8">“Automated Reasoning for SysML v2,”</a> may be useful before continuing.</p><h3>Constraints and Verification Goals</h3><p>SysML v2 models consist of elements and their relationships. One specific type of SysML v2 element is a constraint. A constraint has inputs and always returns a Boolean value. If the Boolean value returned is true, then the constraint is met; if it is false, then the constraint is not met. The input values to the constraint are not limited, except that they must be resolved through calculations and logic to a single Boolean result.</p><p>Formal requirements in SysML v2 are a special type of constraint, so for this article and example, you can think of constraints and requirements as interchangeable terms. The techniques and approaches we cover here will apply to both constraints and formal requirements.</p><p>Imandra Automated Reasoning has many capabilities, one of which is formal verification. With Imandra, we can mathematically prove the properties of a system using the Imandra verify function. The verify function leverages a variety of solvers and techniques behind the scenes to verify (or prove) that a given function or expression is true. If Imandra is unable to prove that an expression or function is true for all parameters and states of the system, Imandra will provide a counterexample illustrating a case where the expression or function does not hold.</p><p>Constraints in SysML v2 map directly to Imandra verification goals. Both SysML v2 constraints and Imandra verification goals are very general and are defined as a function or expression that takes any number of parameters as inputs and returns a Boolean result. I am going to share two examples of SysML v2 constraints mapped to Imandra verification goals. 
One is trivial, and the second is significantly more complex.</p><h3>Mapping Traffic Light Constraints to Verification Goals</h3><p>In my prior article, we verified three behavioral properties of the traffic light:</p><p>1. The behavior is deterministic, regardless of state or parameter values.</p><p>2. Errors are handled properly, irrespective of state or parameter values.</p><p>3. The light only switches to green when it is safe (all other lights in the system are red), irrespective of state or parameter values.</p><p>In this article, we are going to verify weight and power usage against a limit. Verifying weight for a single system design without variation parts or parts that have fluctuating weight is a trivial task, and there are tools and examples in the SysML v2 language showing how to create and resolve these types of constraints. Evaluating power consumption is significantly more complicated; automated reasoning is required to ensure that the power consumption requirement is met across all operational states and parameters.</p><p>Let us start with the weight constraint and verification goal. Below we have a SysML v2 constraint definition and constraint usage. The constraint definition for the traffic-light weight is as follows.</p><pre>   constraint def MassConstraint {<br>        in partMasses : MassValue[0..*];<br>        in massLimit : MassValue;<br>            sum(partMasses) &lt;= massLimit<br>            }</pre><p>This constraint definition takes two inputs: a list of MassValues and a massLimit. The constraint definition returns a Boolean result based on a predicate requiring the sum of all the MassValues to be less than or equal to the massLimit. 
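</p><p>Mapped into OCaml (the language family IML is based on), the constraint definition amounts to a simple predicate. This is only a hedged sketch with illustrative names, not the output of an actual SysML v2 translation:</p>

```ocaml
(* Hedged sketch, not the actual generated IML: the MassConstraint
   predicate as a plain OCaml function over float masses. *)
let sum_masses (part_masses : float list) : float =
  List.fold_left ( +. ) 0.0 part_masses

(* The constraint holds when the summed mass is within the limit. *)
let mass_constraint (part_masses : float list) (mass_limit : float) : bool =
  sum_masses part_masses <= mass_limit
```

<p>For example, mass_constraint [12.0; 4.0; 3.0; 2.0] 25.0 evaluates to true, because the total of 21 kg is within the 25 kg limit.</p><p>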
We then create an assert constraint usage based on this definition.</p><pre>assert constraint massConstraint : MassConstraint {        <br>        in partMasses = <br>            (light_housing.totalMass, <br>            tl_controller.mass,<br>            tl_counter.mass, <br>            tl_radio.mass);<br>         in massLimit = 25.0 [kg];<br>  }</pre><p>There are several other ways to define this list of masses. In the SysML v2 examples, there is a concept of a “mass rollup.” We used that approach here: the whole system mass includes the “<em>totalMass</em>” for the light_housing, which covers the mass of the three lights (green, yellow, red), the 3 inserts, the electronics housing, and the light housing itself. The light_housing.totalMass plus the mass for the controller, counter, and radio gives us the whole system mass tied to this constraint. The controller, counter, and radio are not in the <em>light_housing.totalMass</em> because they are not part of the light_housing part hierarchy. This example is an asserted constraint. In SysML v2 the difference between a constraint and an asserted constraint is that asserted constraints must always return true and often have a stakeholder (like the customer) tied to them.</p><p>Resolving this constraint is trivial. All that is needed is to sum the 4 mass values and compare the sum to the mass limit. We are going to use this simple example to illustrate how SysML v2 constraints map to Imandra’s verification goals. Here is the simple mass verification goal in Imandra.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/d87c84bc60450f59a699172bcf7cc9c8/href">https://medium.com/media/d87c84bc60450f59a699172bcf7cc9c8/href</a></iframe><p>We synthesized the part_masses function by crawling the part hierarchy and identifying all the massed parts. We include the part masses and the multiplicity. 
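</p><p>As a hedged sketch of what such a synthesized function could look like, each entry below pairs a part’s unit mass with its multiplicity. The part names mirror the model, but every mass value here is invented purely for illustration:</p>

```ocaml
(* Hedged sketch of a synthesized part_masses list. Each entry pairs a
   part's unit mass (kg) with its multiplicity; all mass values are
   invented for illustration, not taken from the actual model. *)
let part_masses : (float * int) list =
  [ (4.5, 3);   (* light_insert: multiplicity 3, one per light *)
    (6.0, 1);   (* light_housing *)
    (1.5, 1);   (* tl_controller *)
    (1.0, 1);   (* tl_counter *)
    (0.5, 1) ]  (* tl_radio *)

(* Total mass: each unit mass times its multiplicity, summed. *)
let total_mass : float =
  List.fold_left
    (fun acc (m, mult) -> acc +. (m *. float_of_int mult))
    0.0 part_masses

(* The verification goal compares the total against the 25 kg limit. *)
let mass_goal : bool = total_mass <= 25.0
```

<p>With these invented figures the total comes to 22.5 kg, so mass_goal holds.</p><p>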
The multiplicity is the number of parts of a specific kind; it is often 1, but not always. In the case of the “traffic_light” model, the light_insert has a multiplicity of 3, one light_insert for each light. We need to multiply the part mass by the multiplicity to get the total mass.</p><p>The “sa” variable includes all the attributes for the system and the SysML v2 state. The “sa” variable provides a complete view of the entire system at a point in time.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/59e53679b202002b32c44b91f10c4e01/href">https://medium.com/media/59e53679b202002b32c44b91f10c4e01/href</a></iframe><p>This trivial example shows the mapping from SysML v2 to IML. The verify function explores the entire state space of the system and proves that the goal is met for all unconstrained variables and values. This verification goal returns Proved.</p><h3>Formally Verifying a Power Consumption Constraint</h3><p>We will now apply these techniques to a much more complicated example: power consumption. Power consumption, or power usage, is more complex because it changes during operation. 
The power constraint seems simple enough and is very similar to the mass constraint definition.</p><pre>    constraint def PowerConstraint {<br>        in partPowers : PowerValue[0..*];<br>        in powerLimit : PowerValue;<br>            sum(partPowers) &lt;= powerLimit<br>            }<br><br>    assert constraint powerConstraint : PowerConstraint {<br>            in partPowers = (light_housing.light_inserts.green_light.power_usage,<br>                            light_housing.light_inserts.yellow_light.power_usage,<br>                            light_housing.light_inserts.red_light.power_usage,<br>                            light_housing.light_electronic_housing.power_usage,<br>                            traffic_light_controller.power_usage,<br>                            traffic_light_counter.power_usage,<br>                            traffic_light_radio.power_usage);                                       <br>            in powerLimit = 19.0 [W];<br>    }</pre><p>The lights and the radio have a variable power determined by their individual states. In SysML v2, we represented these variable power values as a calculation.</p><pre>    calc def Light_power { in light_state : Light_state;<br>           return : PowerValue = <br>                     if light_state == Light_state::off ? 0 [W]<br>                        else if light_state == Light_state::lit ? 12 [W]<br>                        else if light_state == Light_state::blinking ? 8 [W]<br>                        else 0 [W];<br>           }<br>           <br>    calc def Radio_power { in radio_on : Boolean;<br>           return : PowerValue = <br>                     if radio_on ? 4 [W]<br>                     else 1 [W];<br>           }</pre><p>When a light is Lit, it consumes 12 Watts; when it is Blinking, it consumes 8 Watts; and when it is Off, it consumes 0 Watts. The radio consumes 4 Watts when it is transmitting and 1 Watt when it is just listening. 
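</p><p>Transcribed into OCaml/IML style, the two calc defs above become simple total functions. The wattage values mirror the SysML v2 code; the type and function names are a hedged sketch, not the actual translation:</p>

```ocaml
(* Hedged OCaml sketch of the Light_power and Radio_power calc defs;
   the wattages follow the SysML v2 code, the names are illustrative. *)
type light_state = Off | Lit | Blinking

let light_power (s : light_state) : float =
  match s with
  | Off -> 0.0       (* light off: no draw *)
  | Lit -> 12.0      (* steady light: 12 W *)
  | Blinking -> 8.0  (* blinking light: 8 W *)

let radio_power (transmitting : bool) : float =
  if transmitting then 4.0 (* transmitting: 4 W *)
  else 1.0                 (* listening: 1 W *)
```

<p>Because light_state is a closed variant type, the match is exhaustive: the power of every possible light state is defined, which is exactly what lets a prover reason over all cases.</p><p>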
Summing up the power of all the components “on,” you would get ~43W, well above the constraint.</p><p>To verify PowerConstraint, we need to calculate power during operation across all states, all paths, and all possible situations. There are 12 SysML states for the traffic_light system and 45 attributes; you can see the state map below.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*en4EuFZUB6_PPm_rDb3ttA.png" /></figure><p>This is a more complex process, so we intend to automate most of the steps and provide high-level configuration tools, native to or similar in style and workflow to popular MBSE tools, so that systems engineers can efficiently perform these types of inductive formal proofs.</p><p>Here are the six steps we follow to formally prove that the power never exceeds 19W:</p><p>1. Create a definition for “Good States” for the system.</p><p>2. Create a lemma that verifies that if you start in a “Good State,” you will end up in a “Good State” after executing the state machine once.</p><p>3. Use the lemma from step 2 to create another lemma that verifies that if you start in a “Good State,” you will remain in a “Good State” after executing the state machine an arbitrary number of times using any parameters or values.</p><p>4. Define an initial state for the system and verify that it is a “Good State”. We used the “Off_state” just after the power was turned on. Now we know that if the system starts from the “Off_state” it remains in a “Good State” regardless of how long the state machine runs and over any values and parameters.</p><p>5. Create a lemma that proves the power is &lt;=19 Watts for all “Good States”.</p><p>6. Use lemmas created in steps 2–5 to verify that if we start from the “Off_state” the system will never exceed 19W regardless of how long it runs and over any values and parameters. 
Steps 2–6 will be completely automated, so the user will just need to define the “Good States” for the system, which we will make easy through tooling.</p><h3>Step 1. Define “Good States”</h3><p>The complete system state is determined by the “SysML_State” and a collection of SysML attributes. The SysML states are shown in the diagram above, and here is the list of the attributes.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/cfafe43c1facf66dc4b3904fec1cb240/href">https://medium.com/media/cfafe43c1facf66dc4b3904fec1cb240/href</a></iframe><p>The system has 45 attributes in all.</p><p>We combine the SysML state and the attributes into a single variable “sa” of type “sysml_state_attributes”. We then define 12 “Good States” by constraining 8 of these 45 attribute values.</p><p>In the “Good States” we confirm the following:</p><p>· Power is ON</p><p>· The lights are on and off during the appropriate SysML states. For example, the Green Light is “Lit” in the “Green_on” SysML_state and the other lights are “Off”</p><p>You will see below that each of the lights is represented twice, once in the controller and once as the light itself. We also define a “Good State” when every light is “Off”, but power is ON.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/b94976f69043ec6236834ce641094eea/href">https://medium.com/media/b94976f69043ec6236834ce641094eea/href</a></iframe><h3>Step 2. Prove that if you start in a “Good State,” you remain in a “Good State” after one step of the state machine.</h3><p>We used this function “good_tl_states” along with the function we developed to execute the state machine to create our first lemma. The “my_step” function executes the state machine once. 
It takes a “sa” and returns a new “sa” based on external messages and values, passed as the variable “ex_m”.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/8e954df9d980fcab1626f68451f3235e/href">https://medium.com/media/8e954df9d980fcab1626f68451f3235e/href</a></iframe><p>The external message “ex_m” is made up of these six variables.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/7502937d7567ba83e0bbde7e9a96ce24/href">https://medium.com/media/7502937d7567ba83e0bbde7e9a96ce24/href</a></iframe><p>The first lemma “step_good_invariant” proves that if we start in one of our 12 “Good States,” we will be in one of the 12 “Good States” after the state machine executes once. We will then use this lemma “step_good_invariant” in the creation of future lemmas.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/942/1*KfnJyjZMisKSkkXl1TbAoA.png" /><figcaption>lemma number 1</figcaption></figure><h3>Step 3. Prove that if you start in a “Good State,” the system remains in a “Good State” after n steps of the state machine.</h3><p>Now we create a second lemma that uses the first lemma to verify that if we start in one of our 12 “Good States,” we will be in one of the 12 “Good States” after executing an arbitrary list of “external_messages” of an arbitrary length. So no matter how long the state machine runs, we will be in a “Good State”.</p><p>The function below, “run_my_step,” is a recursive function that runs “my_step” once for every element in the “external_messages” list “ms”. 
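</p><p>The recursion just described can be sketched as follows. This is a toy stand-in so the snippet is self-contained; the real “sa” type, message type, and “my_step” come from the translated SysML v2 model:</p>

```ocaml
(* Hedged sketch of the run_my_step recursion. The state and message
   types, and my_step itself, are simplified stand-ins. *)
type sa = { sysml_state : string; power_on : bool }
type external_messages = { error : bool }

(* Stand-in for the real my_step: one execution of the state machine.
   (Here it is the identity, purely to keep the sketch runnable.) *)
let my_step (s : sa) (_m : external_messages) : sa = s

(* Apply my_step once per message, threading the state through. *)
let rec run_my_step (s : sa) (ms : external_messages list) : sa =
  match ms with
  | [] -> s
  | m :: rest -> run_my_step (my_step s m) rest
```

<p>A proof about run_my_step then proceeds by induction on the list “ms”: the empty-list case is immediate, and the cons case follows from the single-step lemma.</p><p>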
For the lemma “run_good_invariant,” we do not specify the values or the length of the list of “external_messages”; this instructs Imandra to prove that if we start in a “Good State,” we end in a “Good State” for an arbitrary list of “external_messages” of an arbitrary length.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/2eea91d97d7d524ca065d22dad018043/href">https://medium.com/media/2eea91d97d7d524ca065d22dad018043/href</a></iframe><p>This lemma “run_good_invariant” also comes back as Proved, and we can use it in future proofs.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/703/1*gDzODPWD3ObggXakdb2TxQ.png" /><figcaption>lemma number 2, built using lemma number 1</figcaption></figure><h3>Step 4. Define an initial state and prove the system remains in a “Good State” after n steps in the state machine.</h3><p>Next, we defined an initial system state in which the “SysML_state” is “Off_state,” all the lights are OFF, and the power is ON.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/937/1*FmZ4Xx8vW2naetT_Zr4U4Q.png" /><figcaption>Initial System State, SysML_state = Off_state, Power = true, and everything else is Off.</figcaption></figure><p>Then we create a third lemma that proves that if we start from the initial SysML state Off_state, we will remain in a “Good State” for an arbitrary list of “external_messages” of an arbitrary length.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/936/1*zAIiUWgq1QZgh4JyESunqg.png" /><figcaption>Lemma 3 proves that the system is always in a “Good State” from power on.</figcaption></figure><h3>Step 5. Prove the power constraint is met for all “Good States”</h3><p>Now we bring in the power calculations. We created a function based on the SysML v2 constraint and used it in our next lemma. 
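</p><p>As a hedged sketch, such a power-summing function could look like the following. The light and radio wattages follow the calc defs shown earlier; the fixed 2 W draw for the remaining electronics is invented purely for illustration:</p>

```ocaml
(* Hedged sketch of a power-summing function for one system snapshot.
   Light/radio wattages follow the SysML v2 calc defs; the 2 W fixed
   draw for controller, counter, and electronics is illustrative. *)
type light_state = Off | Lit | Blinking

let light_power = function Off -> 0.0 | Lit -> 12.0 | Blinking -> 8.0
let radio_power transmitting = if transmitting then 4.0 else 1.0

let total_power ~green ~yellow ~red ~radio_on : float =
  light_power green +. light_power yellow +. light_power red
  +. radio_power radio_on
  +. 2.0 (* illustrative fixed draw: controller, counter, electronics *)
```

<p>In a “Good State” at most one light is Lit, so for example total_power ~green:Lit ~yellow:Off ~red:Off ~radio_on:false is 15 W, under the 19 W limit.</p><p>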
The function is “sum_total_power_states” and can be seen below.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/d69c96070127a2517a92ea93abbbc018/href">https://medium.com/media/d69c96070127a2517a92ea93abbbc018/href</a></iframe><p>We use the above function to prove that the power is &lt;=19W for all “Good States”.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/f5c6b455e0fffc698696959381a569d4/href">https://medium.com/media/f5c6b455e0fffc698696959381a569d4/href</a></iframe><h3>Step 6. Prove the power constraint is always met.</h3><p>The lemma returns Proved, and now we are ready for the final step. We use all four of the previously proved lemmas to prove that the power consumption is &lt;=19W for an arbitrary list of “external_messages” of an arbitrary length starting from turning the power on.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/d1129059023900fd720567477d3b7867/href">https://medium.com/media/d1129059023900fd720567477d3b7867/href</a></iframe><figure><img alt="" src="https://cdn-images-1.medium.com/max/936/1*l9zLQJWxalQM3MB4yfcR2Q.png" /></figure><p>Theorem proved! The SysML v2 power constraint is true for every possible system state and attribute value.</p><h3>Conclusion</h3><p>I am excited about our progress, but we are just getting started. The approach outlined in this article can be applied manually today to verify your own SysML v2 models, but we are working to make that process much easier. 
Here is a short list of work underway:</p><p>· Creating a <em>transpiler</em> to use the SysML v2 API to export SysML v2 models as JSON and automatically convert them to IML.</p><p>· Developing tooling and automation to accelerate formal verification of SysML v2 constraints.</p><p>· Integrating Imandra Automated Reasoning Engine with Large Language Models to enable a hallucination-free natural language interface for reasoning about your SysML v2 models, creating verification goals, and synthesizing SysML v2.</p><p>To discuss this work, please contact me at <a href="mailto:Jamie@imandra.ai">Jamie@imandra.ai</a>, or look for me at the next OMG members meeting.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=7afd4fbd549c" width="1" height="1" alt=""><hr><p><a href="https://medium.com/imandra/automated-reasoning-for-sysml-v2-part-2-7afd4fbd549c">Automated Reasoning for SysML v2 Part 2</a> was originally published in <a href="https://medium.com/imandra">Imandra Inc.</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Automated Reasoning for SysML v2]]></title>
            <link>https://medium.com/imandra/automated-reasoning-for-sysml-v2-ad7e87addba8?source=rss----aa39f4f76a9f---4</link>
            <guid isPermaLink="false">https://medium.com/p/ad7e87addba8</guid>
            <category><![CDATA[formal-verification]]></category>
            <category><![CDATA[llm]]></category>
            <category><![CDATA[automated-reasoning]]></category>
            <category><![CDATA[sysml]]></category>
            <dc:creator><![CDATA[Jamie Smith]]></dc:creator>
            <pubDate>Mon, 07 Aug 2023 03:55:34 GMT</pubDate>
            <atom:updated>2024-09-30T18:54:33.750Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*0WPza523n9IeBx4oMeP47Q.png" /></figure><p><em>Version 2 of SysML represents a significant advance over version 1 and aims to address numerous shortcomings that limited version 1’s applicability to many types of modern systems engineering challenges (</em><a href="https://www.knowgravity.com/the-limits-of-sysml-v1-and-how-sysml-v2-addresses-them"><em>see this for example</em></a><em>). Unlike v1, which was based on UML, v2 is based on KerML, a modeling language with formal semantics. It is this foundation of v2 that paves the way for applying the latest advances in automated reasoning to empower SysML v2 users with rigorous analysis (including formal verification) of their models, AI-powered testing, integration with LLMs (e.g., ChatGPT), and much more.</em></p><p><em>I am excited to share some of the work we have been doing over the last several weeks on integrating our reasoning engine, Imandra, with the SysML v2 pilot implementation. Our initial focus is on creating an automated translation process between SysML v2 and Imandra Modeling Language (IML), and then using Imandra to reason about the resulting IML models.</em></p><p><em>In this article, we’ll cover an example of a traffic light model and verify some of its properties:</em></p><p><em>1. The behavior is deterministic, regardless of state or parameter values.</em></p><p><em>2. Errors are handled properly, irrespective of state or parameter values.</em></p><p><em>3. The light only switches to green when it is safe (all other lights in the system are red), irrespective of state or parameter values.</em></p><h3>SysML v2 will empower the next step in MBSE evolution</h3><p>Increasing system complexity is driving a fundamental shift from document-based systems engineering to model-based systems engineering (MBSE) and sophisticated model-based systems engineering tools. 
While MBSE has its roots in the aerospace and defense industries, we see MBSE gaining popularity across all sectors. The Systems Modeling Language, or SysML, has been closely linked with MBSE since the first version of the standard was published by the Object Management Group (OMG) in 2006. SysML provides a standard graphical format to model complex systems’ requirements, architecture, and design and has been adopted by most systems design tools. SysML has helped teams share and communicate their designs with collaborators and customers during the design process.</p><p>The OMG is taking MBSE to the next stage in its evolution. At the Q2 2023 OMG members’ meeting, the SysML steering committee shared their plans for SysML v2 during a hybrid webinar hosted in Orlando, FL. Check out their <a href="https://www.youtube.com/watch?v=A6pEpNj6qkg">video</a> for a great overview.</p><p>The team, composed of representatives from over 50 organizations, has leveraged 15 years of learning to define SysML v2. SysML v2 will enable increased precision, usability, and interoperability, and will enable the next generation of systems engineers to build the next generation of systems.</p><p>While SysML v1 was based on UML, SysML v2 was built on Kernel Modeling Language (KerML). KerML is designed to be extensible and to have formal semantics. Formal semantics brings mathematical precision to enable rigorous analysis and verification. We have begun leveraging the formalism of SysML v2 to analyze designs with Imandra. With Imandra, we can reason about the design to determine whether it will meet the desired requirements. We can also mathematically prove that its behavior is correct and safe by using formal verification techniques. To illustrate what we can accomplish, let’s review a system design we recently created for a simple traffic light.</p><p>We created this traffic light design with the SysML v2 pilot implementation in a JupyterLab development environment. 
The SysML v2 pilot implementation is freely available and can be found at <a href="https://github.com/Systems-Modeling/SysML-v2-Pilot-Implementation">https://github.com/Systems-Modeling/SysML-v2-Pilot-Implementation</a>.</p><h3>The SysML v2 Traffic Light Model</h3><p>The SysML v2 traffic light model is based on standard green, yellow, and red traffic lights used in the United States. We chose the traffic light because it is complicated enough to demonstrate some key capabilities of both SysML v2 and Imandra while not being so complex that it would be difficult to understand. The complete SysML v2 model is available at the end of this blog post in Appendix A.</p><p>The model is contained in one package and includes 7 parts, 5 ports, 2 interfaces, 16 actions, and 12 states. Review the SysML v2 overview video linked above if these terms are new to you. We have also created an equivalent version of this model in IML to allow us to analyze the system’s behavior and verify properties.</p><p>We use Imandra to verify the behavior of the traffic light and uncover errors in the states and state transitions. Below is the SysML v2 state diagram, which is a graphical representation of the behavior of the traffic light. Let’s walk through it.</p><p>The traffic light starts in the “off” state, and once it receives power, it will enter the “init” state. During the “init” state, a power-up test is run, and if that test returns an error, the traffic light will enter “diagnostic,” where additional tests will be run, and the error status will be communicated wirelessly to operators. Then the traffic light advances to “red_blinking” and operates as a stop sign until a technician can correct the error. 
If the power test returns no error, the light is ready to enter one of its three operational modes: “traffic_flow,” “caution,” or “stop_sign.” A collection of states, actions, and transitions dictates the behavior of the light.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*en4EuFZUB6_PPm_rDb3ttA.png" /></figure><p>Here are some examples of SysML v2 states and their equivalents in IML. Each state has a name and perhaps a series of actions. The actions are classified as entry, then, do, or exit. When the system enters a state, it performs the “entry action,” followed by the “then actions,” and then the “do action.” The “do action” is executed until the state is exited. When the system exits a state, it performs the “exit action.”</p><p>The code snippets below show that both the SysML v2 and the IML versions contain the same state names and corresponding actions.</p><p>SysML v2 states</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/8ff8e84b1e5fe8b35e2b8e9f891721ec/href">https://medium.com/media/8ff8e84b1e5fe8b35e2b8e9f891721ec/href</a></iframe><p>IML equivalent states</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/ec532687c0e51b31f0d85d919db63ab2/href">https://medium.com/media/ec532687c0e51b31f0d85d919db63ab2/href</a></iframe><p>We have created equivalent data types of all the elements in SysML v2 in IML. 
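</p><p>To give a flavor of such a data type, here is a hedged OCaml sketch of how a state and its classified actions might be represented. All names are illustrative, not the actual IML generated from the model:</p>

```ocaml
(* Hedged sketch of a state definition with entry/then/do/exit actions;
   the action variants and field names are illustrative only. *)
type action =
  | Turn_green_on
  | Turn_green_off
  | Read_sensors
  | No_action

type state_def = {
  name : string;
  entry : action;              (* performed when the state is entered *)
  then_actions : action list;  (* performed after the entry action *)
  do_action : action;          (* repeated until the state is exited *)
  exit : action;               (* performed when the state is exited *)
}

let green_on_state : state_def = {
  name = "green_on";
  entry = Turn_green_on;
  then_actions = [];
  do_action = Read_sensors;
  exit = Turn_green_off;
}
```

<p>Because every field is a closed, enumerable type, a prover can reason exhaustively about every state and every action the system can take.</p><p>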
Let’s take a look at transitions.</p><p>SysML v2 Transition Usages:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/d394b79ecead8c1e741b94325277a754/href">https://medium.com/media/d394b79ecead8c1e741b94325277a754/href</a></iframe><p>IML equivalent transitions:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/b34e592ae742a75da7a5e528ce42a152/href">https://medium.com/media/b34e592ae742a75da7a5e528ce42a152/href</a></iframe><p>Each transition has an initial state and a subsequent state, along with a Boolean expression to determine if the transition is valid. For example, look at the transition “init_to_operation_selection.”</p><p>First state = “init”</p><p>Boolean expression = “error = false”</p><p>Next state = “operation_selection”</p><p>So, if we are in the “init” state and the error is false, we move to the “operation_selection” state.</p><p>Finally, look at the actions in both SysML v2 and IML syntax.</p><p>SysML v2 actions</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/ff7fd827b24a1d8c62345df685eff441/href">https://medium.com/media/ff7fd827b24a1d8c62345df685eff441/href</a></iframe><p>IML equivalent actions</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/5c36245026bd614f29f4b64ba8bf5047/href">https://medium.com/media/5c36245026bd614f29f4b64ba8bf5047/href</a></iframe><p>These action functions include, but are not limited to, turning the lights on and off, managing errors, and reading sensors and clocks.</p><h3>Verification of the Traffic Light</h3><p>As I mentioned above, we used Imandra to verify three properties of the traffic light.</p><p>1. The behavior is deterministic, regardless of state or parameter values.</p><p>2. Errors are handled properly, irrespective of state or parameter values.</p><p>3. 
The light only switches to green when it is safe (all other lights in the system are red), regardless of state or parameter values.</p><p>We will take a deeper look at verification goal number two, verifying that errors are correctly handled. You can explore the notebook to learn more about the other two verification goals.</p><p>The collection of the state, actions, and transition functions dictates the behavior of the traffic light. We used the IML data types for each of those, along with attributes, to explore and verify the behavior of the traffic light. In practice, we created a data type that included the SysML_states AND the relevant attributes.</p><p>The attributes are listed here:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/f67b37cfc1632f2a95e87bd40fa66cf5/href">https://medium.com/media/f67b37cfc1632f2a95e87bd40fa66cf5/href</a></iframe><p>We combined those attributes with the SysML_state into a single type called “sysml_state_attributes”. The SysML_states are the 12 states shown in the state diagram above: off, init, diagnostic, operation_selection, stop_sign, caution, traffic_flow, red_blinking, yellow_blinking, green_on, yellow_on, and red_on.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/36d3efe8dd9d95ca9b4036bdd42081d8/href">https://medium.com/media/36d3efe8dd9d95ca9b4036bdd42081d8/href</a></iframe><p>In our IML model, we declared a variable called ‘sa’ of type <strong><em>sysml_state_attributes</em></strong> to represent the larger state of the traffic light. We use this data type along with the transition functions and actions to determine the behavior of the traffic light. We define a function called “my_step” that, when called, does five things.</p><p>1. Increments the system clock by 1 second.</p><p>2. Runs any “do action” for the current SysML_state.</p><p>3. 
Checks the transition functions to see if the SysML_state should change.</p><p>4. If the SysML_state changes: (a) runs the “exit action” of the current SysML_state, then (b) runs the “entry action,” “do action,” and “then actions” of the new SysML_state.</p><p>5. Returns a variable of type “sysml_state_attributes.”</p><p>Here is “my_step” written in IML.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/00185ad224500f3dc709cf5d0d1a6f1b/href">https://medium.com/media/00185ad224500f3dc709cf5d0d1a6f1b/href</a></iframe><p>We use the function “my_step” and the variable “sa” along with Imandra Automated Reasoning to verify the behavior of the traffic light. You may have noticed another parameter called “ex_m:external_messages.”</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/5b3f259fd51413e5230aa1cdda98591a/href">https://medium.com/media/5b3f259fd51413e5230aa1cdda98591a/href</a></iframe><p>External_messages include the incoming control messages and health information about the system. You can think of this as the information external to the traffic light controller that must be received, measured, and evaluated to determine SysML_state transitions.</p><p>Imandra has a built-in function called “verify” that evaluates a Boolean function and returns either <strong>Proved</strong>, confirming that the Boolean function is always true for all variables and parameters, or <strong>Refuted</strong>.</p><p>If “verify” returns Refuted, it includes a counter-example that provides a set of parameters resulting in the Boolean function being false. 
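To build intuition for Proved/Refuted and counter-examples, here is a small Python sketch. It is purely illustrative: Imandra reasons algebraically over unbounded domains, whereas this toy checker simply enumerates a tiny finite parameter space, and the step function and state names below are hypothetical stand-ins rather than the article's IML model.

```python
from itertools import product

def check(prop, domains):
    """Brute-force analogue of a 'verify' call: test a Boolean
    property over every combination of parameter values and return
    'Proved', or 'Refuted' with the first counterexample found.
    (Imandra proves this symbolically; enumeration only works for
    tiny finite domains and is shown purely for intuition.)"""
    names = list(domains)
    for values in product(*domains.values()):
        params = dict(zip(names, values))
        if not prop(params):
            return ("Refuted", params)  # counterexample
    return ("Proved", None)

# Hypothetical step function: the machine stays put when power is off.
def step(state, power, error):
    if not power:
        return state
    return "diagnostic" if error else state

# Toy property: "if error is set, the state always changes".
prop = lambda p: (not p["error"]) or step(p["state"], p["power"], p["error"]) != p["state"]
result = check(prop, {"state": ["off", "init"], "power": [False, True], "error": [False, True]})
print(result[0])  # Refuted: with the power off, no transition fires
```

The returned counterexample pinpoints the falsifying parameters, which is the same role Imandra's counter-example plays for the real model.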
This is incredibly powerful, and it saves hours or days of debugging spent trying to find the conditions that caused your verification goal to fail.</p><p>For our traffic light example, we designed our system such that if an error occurs, the system will transition from SysML_state to SysML_state until it reaches “diagnostic” and then “red_blinking.” We defined a verification goal to prove that the system exhibits this behavior, along with a couple of helper functions:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/bc80139ababa02786271d4c846d94bd2/href">https://medium.com/media/bc80139ababa02786271d4c846d94bd2/href</a></iframe><p>“get_sysml_state” returns the SysML_state from “sa.”</p><p>“state_changed” takes the variables “sa” and “ex_m” and determines whether the SysML_state will change if “my_step” is called. The &lt;&gt; symbol means “not equal” in OCaml and IML.</p><p>We use these functions to create our verification goal: verify that whenever we are in an Error_state, state_changed is true, meaning the SysML_state always changes.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/92a46b59d9f4d01f44616a06233f935f/href">https://medium.com/media/92a46b59d9f4d01f44616a06233f935f/href</a></iframe><p>The verify function evaluates all parameters and attributes across all the SysML_states algebraically, <strong>NOT</strong> statistically or through random sampling. If verify returns Proved, we know the Boolean function is true for all parameters and attributes.</p><p>Unfortunately, when we called this verify function, it returned Refuted with a counter-example. The counter-example pointed out that if the system is in the “off” SysML_state with the power off, it will not change SysML_states. 
Of course, that makes perfect sense, so we modified our verify function to ensure we only consider cases where the power is on.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/cf4b21debd3562a486d94a5612deecd6/href">https://medium.com/media/cf4b21debd3562a486d94a5612deecd6/href</a></iframe><p>So now we have two constraints: the power is on and we are in an Error_state. When we ran this function, we again received Refuted. Digging into the counter-example, we see that the SysML_state = “Yellow_on” AND the clock has stopped.</p><p>We can review the behavior of the “Yellow_on” SysML_state. “Yellow_on” only occurs after “Green_on,” and the system stays in the “Yellow_on” state for 2 seconds before transitioning to “Red_on.” If the clock has stopped, the system will never transition out of “Yellow_on” because, without a clock, the controller will never register 2 seconds elapsing. We also notice that in this state the system never checks its status, including checking for errors. To correct this issue, we must add a “do action,” as in most other states, to check for errors explicitly. 
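The repaired behavior can be summarized with a minimal Python sketch (hypothetical function and parameter names; the actual model is written in IML): with the added error check, the exit guard for “Yellow_on” fires when the timer expires or an error is detected, so a stopped clock can no longer trap the system.

```python
def yellow_on_next_state(wait_duration, yellow_on_time, error):
    """Sketch of the repaired 'yellow_on' guard: transition to
    'red_on' if the wait timer has exceeded the yellow-on time OR
    the added 'do action' has detected an error; otherwise remain
    in 'yellow_on'. Names and types are illustrative."""
    if wait_duration > yellow_on_time or error:
        return "red_on"
    return "yellow_on"

# Stopped clock (wait_duration frozen at 0) but an error present:
# the system now still escapes to 'red_on'.
print(yellow_on_next_state(0, 2, True))   # red_on
print(yellow_on_next_state(3, 2, False))  # red_on
print(yellow_on_next_state(1, 2, False))  # yellow_on
```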
Below you can see where we added the “do action” for the “Yellow_on” SysML_state, along with a comment.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/1388392e0e2e4defc453397682daa27f/href">https://medium.com/media/1388392e0e2e4defc453397682daa27f/href</a></iframe><p>By adding the “do action,” the “yellow_on_to_red_on” transition function will evaluate to true if an error is detected OR if 2 seconds elapse.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/44bbd9f38522e8c3f35a06743b4d286e/href">https://medium.com/media/44bbd9f38522e8c3f35a06743b4d286e/href</a></iframe><p>After identifying and correcting this design issue, we re-ran the verify function, and it returned <strong>Proved.</strong></p><p>This process helped us identify two significant issues with the design that could easily have made it into production code and appeared only intermittently, leaving some traffic lights with a perpetual “yellow_on” light. Not only would this be costly to maintain in the field, it would also be a significant safety concern, as a perpetual “yellow_on” is not compliant with US traffic laws, and motorists would handle this case in unpredictable ways.</p><h3>Conclusion</h3><p>I am excited about our progress, but we are just getting started. Today you can use the approach outlined in this blog post to verify your own SysML v2 models, but we are working to make that process much easier. We are building tooling that uses the SysML v2 API to export SysML v2 models as JSON and automatically convert them to IML. We also plan to leverage our integration with Large Language Models to enable a natural-language interface for creating verification goals and synthesizing correct SysML v2. 
To discuss this work, please get in touch with me at <a href="mailto:Jamie@imandra.ai">Jamie@imandra.ai</a>, or look for me at the next OMG members meeting.</p><h3>Appendix A: Traffic Light SysML v2</h3><pre>package Traffic_light {<br>    <br>    import ISQ::*;<br>    import ISQBase::*;<br>    import Time::*;<br>    import ScalarValues::*;<br>    import StringFunctions::*;<br>    import NumericalFunctions::*;<br>    import ScalarValues::*;<br>    import SI::*;<br>    <br>    port def Can_bus;<br>    port def Clock;<br>    port def Vehicle_sensor_port;<br>    port def Comm;<br>    <br>    enum def Light_state {<br>                lit;<br>                off;<br>                blinking;<br>                }<br>                      <br>    calc def Light_power { in light_state : Light_state;<br>           return : PowerValue = <br>                     if light_state == Light_state::off ? 0 [W]<br>                        else if light_state == Light_state::lit ? 12 [W]<br>                        else if light_state == Light_state::blinking ? 8 [W]<br>                        else 0 [W];<br>           }<br>           <br>    calc def Radio_power { in radio_on : Boolean;<br>           return : PowerValue = <br>                     if radio_on ? 
4 [W]<br>                     else 0 [W];<br>           }<br>    calc light_power : Light_power;<br>    calc radio_power : Radio_power;<br>    <br>    <br>    constraint def MassConstraint {<br>        in partMasses : MassValue[0..*];<br>        in massLimit : MassValue;<br>            sum(partMasses) &lt;= massLimit<br>            }<br>    assert constraint massConstraint : MassConstraint {<br>            in partMasses = (light_housing.mass, traffic_light_controller.mass, traffic_light_counter.mass, traffic_light_radio.mass);<br>            in massLimit = 25.0 [kg];<br>        }  <br>            <br>    constraint def PowerConstraint {<br>        in partPowers : PowerValue[0..*];<br>        in powerLimit : PowerValue;<br>            sum(partPowers) &lt;= powerLimit<br>            }<br><br>    assert constraint powerConstraint : PowerConstraint {<br>            in partPowers = (light_housing.light_inserts.green_light.power_usage,<br>                            light_housing.light_inserts.yellow_light.power_usage,<br>                            light_housing.light_inserts.red_light.power_usage,<br>                            light_housing.light_electronic_housing.power_usage,<br>                            traffic_light_controller.power_usage,<br>                            traffic_light_counter.power_usage,<br>                            traffic_light_radio.power_usage);                                       <br>            in powerLimit = 19.0 [W];<br>        }<br><br>    interface def vehicle_sensor_signal {<br>            end vehicle_sensor_out : Vehicle_sensor_port;<br>            end vehicle_sensor_in : ~Vehicle_sensor_port;<br>            }<br>    <br>    interface def clock_bus {<br>            end clock_out : Clock;<br>            end clock_in : ~Clock;<br>            }<br>    interface def comm_bus {<br>            end communications_out : Comm;<br>            end communications_in : ~Comm;<br>            }<br>    interface def light_control_bus {<br>      
      end light_control_out : Can_bus;<br>            end light_control_in : ~Can_bus;<br>            }  <br>            <br>    item def Power;<br>    item def state_message_item {<br>        attribute state_message : String;<br>    }<br>    port power_on;<br>    <br>    part def Light_housing {<br>        attribute mass : MassValue;<br>        part light_inserts [3]{<br>            attribute mass : MassValue;<br>        }<br>    }<br>    part def Light_electronics_housing {<br>        attribute mass : MassValue;<br>        attribute power_usage : PowerValue;<br>    }<br>    part def light {<br>        attribute color;<br>        attribute light_state : Light_state;<br>        attribute location : Integer;<br>        attribute mass : MassValue;<br>        attribute power_usage : PowerValue;<br>        port light_control_in : Can_bus;<br>        } <br>    part light_housing {   <br>        attribute mass :&gt;&gt; mass = 3.0 [kg];<br>        part light_electronic_housing : Light_electronics_housing {<br>            attribute mass :&gt;&gt; mass = 3.0 [kg];<br>            attribute power_usage :&gt;&gt; power_usage = 0.2 [W];<br>        }<br>        <br>    part def light_insert {<br>        attribute mass : MassValue;<br>        part red_light : light;<br>        part yellow_light : light;<br>        part green_light : light;<br>    }<br>        part light_inserts [3] : light_insert {<br>            attribute mass :&gt;&gt; mass = 3.0 [kg];<br>            part red_light :&gt;&gt; red_light {<br>                attribute red :&gt;&gt; color;<br>                attribute location :&gt;&gt; location =  0;<br>                attribute :&gt;&gt; light_state = Light_state::off;<br>                attribute mass :&gt;&gt; mass = 4.0 [kg];<br>                attribute power_usage :&gt;&gt; power_usage = light_power(light_housing.light_inserts.red_light.light_state);<br><br><br>            }<br>            part yellow_light :&gt;&gt; yellow_light {<br>                
attribute yellow :&gt; color;<br>                attribute location :&gt;&gt; location =  1;<br>                attribute :&gt;&gt; light_state = Light_state::off;<br>                attribute mass :&gt;&gt; mass = 4.0 [kg];<br>                attribute power_usage :&gt;&gt; power_usage = light_power(light_housing.light_inserts.yellow_light.light_state);<br><br>            }<br>            part green_light :&gt;&gt; green_light {<br>                attribute green :&gt; color;<br>                attribute location :&gt;&gt; location =  2;<br>                attribute :&gt;&gt; light_state = Light_state::off;<br>                attribute mass :&gt;&gt; mass = 4.0 [kg];<br>                attribute power_usage :&gt;&gt; power_usage = light_power(light_housing.light_inserts.green_light.light_state);<br><br>            }<br><br>        }<br>    }<br>    part def Controller {<br>        attribute power : Boolean;<br>        attribute error : Boolean;<br>        attribute error_code : Integer;<br>        attribute message_queue : String;<br>        attribute counter : Integer;<br>        attribute current_time : TimeInstantValue;<br>        attribute state_duration : DurationValue;<br>        attribute wait_duration : Natural;<br>        attribute yellow_on_time : Natural;<br>        attribute green_on_time : Natural;<br>        attribute red_on_time : Natural;<br>        attribute yellow_light_state : Light_state;<br>        attribute red_light_state : Light_state;<br>        attribute green_light_state : Light_state;<br>        port light_control_out : Can_bus;<br>        port communications_in : Comm;<br>        attribute mass : MassValue;<br>        attribute power_usage : PowerValue;<br>        <br>    }<br>    part def Vehicle_sensor{<br>        attribute vehicle_at_light : Boolean;<br>    }<br>    part def Counter {<br>        attribute register : Integer;<br>        attribute mass : MassValue;<br>        attribute power_usage : PowerValue;<br>    }<br>    part 
def Radio {<br>        attribute register : Integer;<br>        port communications_out : Comm;<br>        attribute trans : Boolean;<br>        attribute mass : MassValue;<br>        attribute power_usage : PowerValue;<br>    }<br>    part vehicle_sensor : Vehicle_sensor {<br>        port vehicle_sensor_out : Vehicle_sensor_port;<br>    }<br>    part traffic_light_controller : Controller{<br>            attribute yellow_on_time :&gt;&gt; yellow_on_time =  2 [s];<br>            attribute green_on_time :&gt;&gt; green_on_time =  30 [s];<br>            attribute red_on_time :&gt;&gt; red_on_time =  90 [s];<br>            attribute mass :&gt;&gt; mass = 0.8 [kg];<br>            attribute power :&gt;&gt; power = 2.0 [W];<br><br>            action power_up_test;<br>            action reset_wait_clk{<br>                      assign wait_duration := 0 [s];<br>                      }<br>            <br>            <br>            action turn_lights_off {<br>                      assign traffic_light_controller.green_light_state := Light_state::off;<br>                      assign light_housing.light_inserts.green_light.light_state := Light_state::off;<br>                      assign traffic_light_controller.yellow_light_state := Light_state::off;<br>                      assign light_housing.light_inserts.yellow_light.light_state := Light_state::off;<br>                      assign traffic_light_controller.red_light_state := Light_state::off;<br>                      assign light_housing.light_inserts.red_light.light_state := Light_state::off;<br>                      assign traffic_light_radio.trans := false;<br>            }            <br><br>            action turn_yellow_on {<br>                      perform action turn_lights_off;<br>                      perform action reset_wait_clock;<br>                      assign traffic_light_controller.yellow_light_state := Light_state::lit;<br>                      assign light_housing.light_inserts.yellow_light.light_state 
:= Light_state::lit;<br>                      }<br>            action turn_green_on {<br>                      perform action turn_lights_off;<br>                      perform action reset_wait_clock;<br>                      assign traffic_light_controller.green_light_state := Light_state::lit;<br>                      assign light_housing.light_inserts.green_light.light_state := Light_state::lit;<br>                      }<br>            action turn_red_on {<br>                      perform action turn_lights_off;<br>                      perform action reset_wait_clock;<br>                      assign traffic_light_controller.red_light_state := Light_state::lit;<br>                      assign light_housing.light_inserts.red_light.light_state := Light_state::lit;<br>                      }<br><br>            action reset_error;<br>            action reset_wait_clock;<br><br>            action run_diagnostic{<br>                action update_error_code;<br>                action update_error;<br>                action send_error_sms {<br>                        assign traffic_light_radio.trans := true;<br>                        }<br>                }<br>            action check_status{<br>                action check_error;<br>                action read_message_queue{<br>                    attribute message_from_queue : String;<br>                }<br>                }<br>            action blink_red_light {<br>                      perform action turn_lights_off;<br>                      assign traffic_light_controller.red_light_state := Light_state::blinking;<br>                      assign light_housing.light_inserts.red_light.light_state := Light_state::blinking;<br>                      }<br>            action blink_yellow_light {<br>                      assign traffic_light_controller.yellow_light_state := Light_state::blinking;<br>                      assign light_housing.light_inserts.yellow_light.light_state := Light_state::blinking;<br>             
         }<br>        <br>            port clock_in : Clock;<br>            port vehicle_sensor_in :Vehicle_sensor_port;<br>            }<br>       part traffic_light_counter : Counter{<br>            port clock_out : Clock;<br>            attribute mass :&gt;&gt; mass = 0.4 [kg];<br>            attribute power_usage :&gt;&gt; power_usage = 0.4 [W];<br>         }<br>       part traffic_light_radio : Radio{<br>            attribute mass :&gt;&gt; mass = 0.6 [kg];<br>            attribute trans :&gt;&gt; trans = false;<br>            attribute power_usage :&gt;&gt; power_usage = radio_power(traffic_light_radio.trans); <br>         }<br>   <br>state traffic_light{<br>            entry;<br>            succession begin first start<br>                then off;    <br>                <br>            transition off_to_init<br>                first off <br>                accept Power via power_on<br>                then init; <br>                <br>            transition init_to_diagnostic<br>                first init<br>                accept when (traffic_light_controller.error == true)<br>                then diagnostic;<br>                <br>            transition diagnostic_to_red_blinking<br>                first diagnostic<br>                then red_blinking;<br><br>            transition operation_selection_to_init<br>                first operation_selection<br>                accept when (traffic_light_controller.error == true)<br>                then init;<br><br>            transition init_to_operation_selection<br>                first init<br>                accept when (traffic_light_controller.error == false)<br>                then operation_selection;<br>            <br>            transition operation_selection_to_stop_sign<br>                first operation_selection<br>                accept when ((traffic_light_controller.message_queue == &quot;operating_mode.stop_sign&quot;)<br>                        and (traffic_light_controller.error == 
false))<br>                then stop_sign;  <br><br>            transition operation_selection_to_caution<br>                first operation_selection<br>                accept when ((traffic_light_controller.message_queue == &quot;operating_mode.caution&quot;) <br>                        and (traffic_light_controller.error == false))<br>                then caution;<br>                <br>            transition operation_selection_to_traffic_flow<br>                first operation_selection<br>                accept when ((traffic_light_controller.message_queue == &quot;operating_mode.traffic_flow&quot;) <br>                        and (traffic_light_controller.error == false))<br>                then traffic_flow;<br>            <br>            transition caution_to_yellow_blinking<br>                first caution<br>                then yellow_blinking;<br>                <br>            transition stop_sign_to_red_blinking<br>                first stop_sign<br>                then red_blinking;<br>                <br>            transition traffic_flow_to_red_on<br>                first traffic_flow<br>                accept when (traffic_light_controller.error == false)<br>                then red_on;<br><br>            transition red_on_to_green_on<br>                first red_on<br>                accept when (traffic_light_controller.wait_duration &gt; traffic_light_controller.red_on_time) <br>                        and (vehicle_sensor.vehicle_at_light == true)<br>                        and (traffic_light_controller.message_queue == &quot;Safe_all_red&quot;)<br>                        and (traffic_light_controller.error == false)<br>                then green_on;<br>                <br>            transition red_on_to_traffic_flow<br>                first red_on<br>                accept when ((((traffic_light_controller.message_queue != &quot;&quot;)<br>                        and (traffic_light_controller.message_queue != &quot;Safe_all_red&quot;))<br>  
                      or (traffic_light_controller.error == true)))<br>                then traffic_flow; <br><br>            transition green_on_to_yellow_on<br>                first green_on<br>                accept when  (traffic_light_controller.wait_duration &gt; traffic_light_controller.green_on_time<br>                        or (traffic_light_controller.error == true))<br>                then yellow_on;<br><br>            transition yellow_on_to_red_on<br>                first yellow_on<br>                accept when (traffic_light_controller.wait_duration &gt; traffic_light_controller.yellow_on_time)<br>                then red_on;<br><br>            transition traffic_flow_to_operation_selection<br>                first traffic_flow<br>                accept when ((traffic_light_controller.message_queue != &quot;&quot;)<br>                        or (traffic_light_controller.error == true))<br>                then operation_selection;    <br>                <br>            transition red_blinking_to_operation_selection<br>                first red_blinking<br>                accept when ((traffic_light_controller.message_queue != &quot;&quot;)<br>                        or (traffic_light_controller.error == true))<br>                then operation_selection;<br>                <br>            transition yellow_blinking_to_operation_selection<br>                first yellow_blinking<br>                accept when ((traffic_light_controller.message_queue != &quot;&quot;)<br>                        or (traffic_light_controller.error == true))<br>                then operation_selection;<br><br>    }      <br>             state off;<br>                <br>             state init{<br>                    entry traffic_light_controller.power_up_test;<br>                    do traffic_light_controller.check_status.check_error;<br>                    exit;<br>                    }<br>                    <br>             state diagnostic {<br>                    
entry traffic_light_controller.run_diagnostic;<br>                    exit traffic_light_controller.reset_error;<br>                    }<br>                <br>             state operation_selection{<br>                    entry traffic_light_controller.check_status.check_error;<br>                    do traffic_light_controller.check_status;<br>                    exit;<br>                    }<br>             state stop_sign;<br>             <br>             state red_blinking {<br>                    entry traffic_light_controller.blink_red_light;<br>                    do traffic_light_controller.check_status.check_error;<br>                    exit;<br>                    }<br>                <br>                <br>             state caution;<br>             <br>             state yellow_blinking {<br>                    entry traffic_light_controller.blink_yellow_light;<br>                    do traffic_light_controller.check_status; <br>                    exit;<br>                    }<br>                <br>             state traffic_flow {<br>                    entry traffic_light_controller.check_status;<br>                    exit;<br>                }<br>             <br>             state green_on {<br>                    entry traffic_light_controller.turn_green_on; <br>                    do traffic_light_controller.check_status;<br>                    exit;<br>                }<br>                <br>             state yellow_on {<br>                     entry traffic_light_controller.turn_yellow_on;<br>                     do traffic_light_controller.check_status.check_error;<br>                     exit;<br>                }<br>             <br>             state red_on {<br>                     entry traffic_light_controller.turn_red_on;<br>                     do traffic_light_controller.check_status;<br>                     exit;<br>                }<br>            <br>    interface : clock_bus connect<br>            
traffic_light_counter.clock_out to<br>            traffic_light_controller.clock_in;<br>    interface : vehicle_sensor_signal connect<br>            vehicle_sensor.vehicle_sensor_out to<br>            traffic_light_controller.vehicle_sensor_in;     <br>    interface : light_control_bus connect<br>            traffic_light_controller.light_control_out to<br>            light_housing.light_inserts.red_light.light_control_in;<br>    interface : light_control_bus connect<br>            traffic_light_controller.light_control_out to<br>            light_housing.light_inserts.yellow_light.light_control_in;<br>    interface : light_control_bus connect<br>            traffic_light_controller.light_control_out to<br>            light_housing.light_inserts.green_light.light_control_in;<br><br>}<br><br></pre><hr><p><a href="https://medium.com/imandra/automated-reasoning-for-sysml-v2-ad7e87addba8">Automated Reasoning for SysML v2</a> was originally published in <a href="https://medium.com/imandra">Imandra Inc.</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Analysing Machine Learning Models with Imandra]]></title>
            <link>https://medium.com/imandra/analysing-machine-learning-models-with-imandra-4510cf586927?source=rss----aa39f4f76a9f---4</link>
            <guid isPermaLink="false">https://medium.com/p/4510cf586927</guid>
            <category><![CDATA[random-forest]]></category>
            <category><![CDATA[formal-verification]]></category>
            <category><![CDATA[neural-networks]]></category>
            <category><![CDATA[machine-learning]]></category>
            <category><![CDATA[artificial-intelligence]]></category>
            <dc:creator><![CDATA[Lewis Hammond]]></dc:creator>
            <pubDate>Thu, 26 Sep 2019 21:44:23 GMT</pubDate>
            <atom:updated>2019-09-26T21:44:23.553Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*6W4rD4IcRBOriezsxZISWw.jpeg" /></figure><p><em>The vast majority of work within formal methods (the area of computer science that reasons about hardware and software as mathematical objects in order to prove they have certain properties) has involved analysing models that are fully specified by the user. More and more, however, critical parts of algorithmic pipelines are constituted by models that are instead learnt from data using artificial intelligence (AI). The task of analysing these kinds of models presents fresh challenges for the formal methods community and has seen exciting progress in recent years. While scalability is still an important, open research problem — with state-of-the-art machine learning (ML) models often having millions of parameters — in this post we give an introduction to the paradigm by analysing two simple yet powerful learnt models using </em><a href="https://www.imandra.ai/"><em>Imandra</em></a><em>, a cloud-native automated reasoning engine bringing formal methods to the masses!</em></p><h3>Introduction</h3><p>Verifying properties of learnt models is a difficult task, but is becoming increasingly important in order to make sure that the AI systems using such models are safe, robust, and explainable. ML, and in particular deep learning, is a powerful technique that has seen unprecedented successes recently in a wide variety of tasks, often producing super-human performance. However, understanding the resulting models and forming guarantees of performance with respect to certain input or output conditions is often difficult. 
Imandra is a general-purpose automated reasoning engine offering a suite of tools that can be used to provide exactly these kinds of insights and guarantees for a wide range of algorithms.</p><p>Supervised learning is a subfield of ML that involves training a model using data made up of input-output pairs so that given a new, unseen input the model is able to produce an appropriate output prediction. In other words, the aim is to learn a model that approximates a <em>function</em> mapping inputs to outputs. To interact with Imandra we use the Imandra Modelling Language (IML), which is based on an augmented subset of the functional programming language OCaml, and can therefore reason about such functions in a particularly natural way.</p><p>To illustrate this approach we’ll be looking at examples from two of the most common tasks within supervised learning (and ML more generally): classification and regression. In particular, we’ll show how two of the most common kinds of model used to perform these tasks, random forests and neural networks, can be analysed using Imandra. For each task we’ll use a real-world benchmark dataset from the <a href="https://archive.ics.uci.edu/ml/index.php">UCI Machine Learning Repository</a> and create our models using Python with some standard ML libraries. You can find all of our code for both learning and analysing our models on <a href="https://github.com/AestheticIntegration/imandra-stats-experiments/tree/master/supervised_learning">GitHub</a>, and there’s also a corresponding cloud-based <a href="https://docs.imandra.ai/imandra-docs/notebooks/supervised-learning/">Imandra Jupyter Notebook</a> that you can try out for yourself.</p><h3>Classification</h3><p>In a classification task we want to learn to predict the label of a datapoint based on previous labelled data. 
In the classic <a href="https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic)">Wisconsin Breast Cancer (Diagnostic) dataset</a> the aim is to decide whether a cancer is benign or malignant based on the features of a sample of cell nuclei. In the dataset we have the following variables:</p><pre>1. ID number<br>2. Diagnosis (malignant or benign)<br>3–32. Real values for the mean, standard error, and the ‘worst’ value for each cell nucleus’<br> a) Radius<br> b) Texture<br> c) Perimeter<br> d) Area<br> e) Smoothness<br> f) Compactness<br> g) Concavity<br> h) Concave points<br> i) Symmetry <br> j) Fractal dimension</pre><p>As is standard practice we pre-process the data before learning. First we standardise each variable to have zero mean and unit variance, then remove all but one from groups of variables that are highly correlated, along with those that have low mutual information with respect to the target variable. The data is split into training (80%) and test (20%) sets and we use Scikit-Learn to learn a random forest of 3 decision trees of maximum depth 3. As this is a relatively straightforward problem even this simple model achieves a fairly high accuracy. Using a short Python script each tree is then converted to IML and can be reasoned about using Imandra.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/d56a04285e65b4151b813f4d81ee4545/href">https://medium.com/media/d56a04285e65b4151b813f4d81ee4545/href</a></iframe><p>We can also create a custom input type in Imandra for our model, so that we can keep track of the different features of our data. However, remember that we processed our data before learning. To make things easier, we’ll add in a function applying these transformations to each input variable using some multiplicative and additive scaling values extracted during our data pre-processing stage. 
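As a rough sketch of that transformation function (the per-feature statistics below are illustrative placeholders, not the values extracted from the real dataset), the pre-processing amounts to standardizing each input feature with the same scaling used on the training data:

```python
# Per-feature (mean, std) recorded during pre-processing.
# These numbers are illustrative placeholders only.
STATS = {
    "radius_mean": (14.1, 3.5),
    "compactness_mean": (0.10, 0.05),
    "concavity_mean": (0.09, 0.08),
}

def preprocess(x):
    """Standardize an input dict feature-by-feature with
    (value - mean) / std, mirroring what was done to the
    training data before fitting the forest."""
    return {f: (v - STATS[f][0]) / STATS[f][1] for f, v in x.items()}

scaled = preprocess({"radius_mean": 17.99,
                     "compactness_mean": 0.2776,
                     "concavity_mean": 0.3001})
```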
After that we can define a full end-to-end model which combines these pre-processing steps and the random forest.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/0f96c99dea1269a5c8c0282aa98463ef/href">https://medium.com/media/0f96c99dea1269a5c8c0282aa98463ef/href</a></iframe><p>As our IML model is fully executable we can query it, find counterexamples, prove properties, apply logical side-conditions, decompose its behaviour into regions, and more. As a quick sanity check to make sure everything is working, we can run a datum (which in this case happens to have the malignant label) from our dataset through the model as follows:</p><pre># rf_model {radius_mean = 17.99; <br>            compactness_mean = 0.2776;<br>            concavity_mean = 0.3001;<br>            radius_se = 1.095; <br>            compactness_worst = 0.6656; <br>            concavity_worst = 0.7119;<br>            concave_points_worst = 0.7119};;</pre><pre>- : string = &quot;malignant&quot;</pre><p>Looking good! We can also use Imandra to generate instances and counterexamples for us, potentially given logical side conditions that can be specified as functions outputting a Boolean value. If we do this naively then Imandra returns instances that, while correct, might be wildly different from any reasonable values we’d expect to see in real life. In general we often only care about the performance of our models when some reasonable bounds are placed on the input (for example, the mean radius can’t be negative, and if the values for this variable in our dataset range between 6.98 and 28.11 we wouldn’t really expect any value greater than, say, 35).</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*fGYKW4OYqpRc895JzFwgzw.jpeg" /></figure><p>Using the description of each variable in the dataset information above we can form a condition describing valid and reasonable inputs to our model. 
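A validity condition of this kind is just a Boolean predicate bounding each feature. The sketch below is a Python analogue of the `is_valid_rf` condition; the exact bounds are illustrative assumptions loosely based on the dataset's observed ranges, not the post's values:

```python
# Per-feature (lower, upper) bounds on raw input values; illustrative only.
BOUNDS = {
    "radius_mean": (5.0, 35.0),
    "compactness_mean": (0.0, 0.5),
    "concavity_mean": (0.0, 0.6),
    "radius_se": (0.0, 3.5),
    "compactness_worst": (0.0, 1.2),
    "concavity_worst": (0.0, 1.5),
    "concave_points_worst": (0.0, 1.0),
}

def is_valid_rf(x):
    # an input is valid iff every feature lies inside its bounds
    return all(lo <= x[k] <= hi for k, (lo, hi) in BOUNDS.items())

ok = {"radius_mean": 17.99, "compactness_mean": 0.2776,
      "concavity_mean": 0.3001, "radius_se": 1.095,
      "compactness_worst": 0.6656, "concavity_worst": 0.7119,
      "concave_points_worst": 0.7119}
print(is_valid_rf(ok))                       # True
print(is_valid_rf(dict(ok, radius_mean=-1.0)))  # False: negative radius
```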
In ML as a whole, we are typically only interested in the performance and quality of a model over some particular distribution of data, about which we often have prior beliefs.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/1310efaf6a93ab5155a3387d6786f6ef/href">https://medium.com/media/1310efaf6a93ab5155a3387d6786f6ef/href</a></iframe><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/07b41cb18a4e52f0dd6a62dec3f2fb75/href">https://medium.com/media/07b41cb18a4e52f0dd6a62dec3f2fb75/href</a></iframe><p>This allows us to generate the following synthetic data point, which our model would classify as benign. We can then access and compute with our instance x using the CX (counterexample) module and display the results in decimal (as opposed to the standard fractional) notation using a custom pretty printer defined above:</p><pre># instance (fun x -&gt; rf_model x = &quot;benign&quot; &amp;&amp; is_valid_rf x);;</pre><pre>- : rf_input -&gt; bool = &lt;fun&gt;<br>Instance (after 0 steps, 0.021s):<br>  let (x : rf_input) =<br>    {radius_mean = (Real.mk_of_string<br>     &quot;28061993138323/5000000000000&quot;);<br>     compactness_mean = (Real.mk_of_string<br>     &quot;8487976201/5000000000000&quot;);<br>     concavity_mean = (Real.mk_of_string <br>     &quot;162320931909/2500000000000&quot;);<br>     radius_se = (Real.mk_of_string <br>     &quot;40971/20000&quot;);<br>     compactness_worst = (Real.mk_of_string <br>     &quot;321/5000&quot;);<br>     concavity_worst = (Real.mk_of_string<br>     &quot;415811319441/25000000000000&quot;);<br>     concave_points_worst = (Real.mk_of_string<br>     &quot;3877904791/781250000000&quot;)}<br><strong>[✔]</strong> Instance found.<br>module CX : sig val x : rf_input end</pre><pre># #install_printer pp_approx;;</pre><pre># CX.x;;</pre><pre>- : rf_input = {radius_mean = 5.61239862766; <br>                compactness_mean = 
0.0016975952402;<br>                concavity_mean = 0.0649283727636; <br>                radius_se = 2.04855;<br>                compactness_worst = 0.0642; <br>                concavity_worst = 0.0166324527776;<br>                concave_points_worst = 0.00496371813248}</pre><p>We can also reason about our model in more interesting ways, such as checking the validity of certain constraints we want our model to satisfy. For example, if the surface of a cell nucleus has many, large concave sections, this is a particularly negative sign indicating that the cancer is likely to be malignant. We can use Imandra to easily verify that our model always classifies a (valid) sample of highly concave cells as malignant:</p><pre># verify (fun x -&gt;<br>          is_valid_rf x &amp;&amp;<br>          x.concavity_mean &gt;=. 0.4 &amp;&amp;<br>          x.concavity_worst &gt;=. 1.0 &amp;&amp;<br>          x.concave_points_worst &gt;=. 0.25 ==&gt;<br>          rf_model x = &quot;malignant&quot;);;</pre><pre>- : rf_input -&gt; bool = &lt;fun&gt;<br><strong>[✓]</strong> Theorem proved.</pre><p>Excellent, Imandra has proved for us that this property always holds, for all possible input values. Although our model is correct here, by tweaking some of the numbers in the statement we want to verify it isn’t hard to come up with ostensibly reasonable properties that our model fails to satisfy (we’ll see an example of this in the next section). Not only will Imandra tell us when a verification command fails, it also provides a particular instance of an input that violates the property. The instance is then automatically loaded into the state space and thus can be further inspected and reasoned about. 
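It is worth contrasting this with what ordinary testing can offer. Imandra's `verify` is exhaustive over all real-valued inputs satisfying the precondition; random testing, sketched below, can only sample that region and so can never prove the property, only fail to refute it. The model here is a hypothetical stand-in, not the learned forest:

```python
import random

def rf_stub(x):
    # hypothetical stand-in model: flags highly concave samples as malignant
    if x["concavity_mean"] >= 0.4 and x["concave_points_worst"] >= 0.25:
        return "malignant"
    return "benign"

def check_property(trials=10_000, seed=0):
    """Randomly sample inputs satisfying the precondition and look for a
    violation of `output = malignant`. Returns a counterexample or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = {"concavity_mean": rng.uniform(0.4, 0.6),
             "concavity_worst": rng.uniform(1.0, 1.5),
             "concave_points_worst": rng.uniform(0.25, 1.0)}
        if rf_stub(x) != "malignant":
            return x          # counterexample found
    return None               # no violation observed -- NOT a proof

print(check_property())       # None
```

Ten thousand passing samples still say nothing about the uncountably many inputs in between; the `[✓] Theorem proved` above covers them all.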
This is one of the ways that Imandra can be used to help diagnose problems with models and to give insights into their behaviour.</p><p>The final feature we’ll look at in this section is region decomposition, a technique that Imandra uses to break up a potentially infinite input space into discrete regions over which the behaviour of the decomposed function is constant (for an introduction to this feature, take a look at <a href="https://medium.com/imandra/describing-algorithms-introduction-8d8224a0f920">one of our earlier posts</a>). The nested if ... then ... else statements that define the trees making up the random forest make them a prime candidate for this functionality. As well as a logical description of the regions, Imandra also produces an interactive Voronoi diagram to help visualise an algorithm’s behaviour. Have a look at the notebook to see them in action and simply click on a region to see its description!</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*SpSFs25CDSKsAx2cX9Xbqg.png" /><figcaption>Voronoi diagrams for our decomposition of the random forest ensemble (left) and the first tree (right)</figcaption></figure><p>We can also apply side conditions to the region decomposition of our model using the ~assuming: flag. One application here is in simulating partial observability. For example, perhaps we know most of the measurements for a particular set of cells and we’d like to see how the classification of the input depends on the remaining features. 
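For a single tree, the flavour of region decomposition can be hand-rolled: walk the nested if/then/else structure and collect, for each leaf, the path constraints under which the output is constant. The depth-2 tree below is an invented example, not one of the post's fitted trees (Imandra's decomposition is, of course, far more general and works on arbitrary IML functions):

```python
def decompose(tree, path=()):
    """tree is ('leaf', label) or ('split', feature, threshold, lo, hi).
    Returns a list of (constraints, label) pairs, one per region."""
    if tree[0] == "leaf":
        return [(list(path), tree[1])]
    _, feat, thr, lo, hi = tree
    return (decompose(lo, path + ((feat, "<=", thr),)) +
            decompose(hi, path + ((feat, ">", thr),)))

tree = ("split", "concavity_mean", 0.4,
        ("split", "radius_mean", 15.0,
         ("leaf", "benign"), ("leaf", "malignant")),
        ("leaf", "malignant"))

for constraints, label in decompose(tree):
    print(constraints, "->", label)   # 3 regions, one per leaf
```

Each printed line is a logical, human-readable description of one region, directly analogous to the regions Imandra reports.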
Let’s imagine that we only have the concavity measurements for a particular patient’s cell sample and we’d like to see how the output of our model depends on the values of the other features.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/0e09ce353cb066647a89d8393cb2d1d3/href">https://medium.com/media/0e09ce353cb066647a89d8393cb2d1d3/href</a></iframe><p>By specifying the known values in a side condition we can get an interactive visual tool (embedded in our Imandra Jupyter Notebook, or available via an automatically generated HTML link) that provides us with a finite set of possible outcomes and a logical, human-readable description of each one:</p><pre>#install_printer Imandra_voronoi.Voronoi.print;;</pre><pre>Decompose.top <br>  ~ctx_asm_simp:true <br>  ~assuming:&quot;partial_observation&quot; <br>  &quot;rf_model&quot; [@@program];;</pre><pre>9 regions computed.<br>- : Imandra_interactive.Decompose.t list = <br>Open:<br>file:////var/folders/l9/d5spnx9177v53nkbpjl4f0yr0000gs/T/voronoi_ee639a.html</pre><h3>Regression</h3><p>In a regression task we want to learn to predict the value(s) of some variable(s) based on previous data. In the commonly used <a href="https://archive.ics.uci.edu/ml/datasets/forest+fires">Forest Fires dataset</a> the aim is to predict the area burnt by forest fires, in the northeast region of Portugal, by using meteorological and other data. This is a fairly difficult task and while the neural network below doesn’t achieve state-of-the-art performance, it’s enough to demonstrate how we can analyse relatively simple models of this kind in Imandra. In the dataset we have the following variables:</p><pre>1. X-axis spatial coordinate (within the Montesinho park map)<br>2. Y-axis spatial coordinate (within the Montesinho park map)<br>3. Month<br>4. Day<br>5. FFMC index (from the FWI system)<br>6. DMC index (from the FWI system)<br>7. DC index (from the FWI system)<br>8. 
ISI index (from the FWI system)<br>9. Temperature<br>10. Relative percentage humidity<br>11. Wind speed<br>12. Rainfall<br>13. Area of forest burnt</pre><p>We again pre-process the data before learning by first transforming the month and day variables into a numerical value and applying a sine transformation (so that similar times are close in value), as well as removing outliers and applying an approximate logarithmic transformation to the area variable (as recommended in the dataset description). Each variable is scaled to lie between 0 and 1, and those with high correlations and/or low mutual information with respect to the target variable are removed. We then split the data into training (80%) and test (20%) sets and use Keras to learn a simple feed-forward neural network with one (6 neuron) hidden layer, ReLU activation functions, and stochastic gradient descent to optimise the mean squared error. After saving our model as a .h5 file we use a short Python script to extract the network into an IML file and reason about it using Imandra.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/bc97ca33aa4cd1a1f592549fb5cee33f/href">https://medium.com/media/bc97ca33aa4cd1a1f592549fb5cee33f/href</a></iframe><p>As in our first example we can also define some custom data types, functions that replicate our pre/post-processing of the data, and a condition that describes valid inputs to the model based on the dataset description (plus some reasonable assumptions about Portugal’s climate).</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/ab78050f45683ec88e0e83153e90462e/href">https://medium.com/media/ab78050f45683ec88e0e83153e90462e/href</a></iframe><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/f24543d35fb6b6a553c24e94c4a67b26/href">https://medium.com/media/f24543d35fb6b6a553c24e94c4a67b26/href</a></iframe><p>By 
generating instances using side conditions, such as the one below (where we require the output to be more than 20 hectares, the temperature to be 20 degrees celsius and the month to be May) we can again both query and compute with our model, reasoning about the results directly using the CX module. Note that here our previously installed pretty printer is converting values to decimal notation for us again:</p><pre># instance (fun x -&gt; <br>            nn_model x &gt;. 20.0 &amp;&amp; <br>            x.temp = 20.0 &amp;&amp; <br>            x.month = May &amp;&amp; <br>            is_valid_nn x);;</pre><pre>- : nn_input -&gt; bool = &lt;fun&gt;<br>Instance (after 0 steps, 0.103s):<br>  let (x : nn_input) = <br>    {month = May; <br>     day = Mon; <br>     dmc = 500.; <br>     temp = 20.; <br>     rh = 100.;<br>     rain = (Real.mk_of_string  <br>     &quot;24634017759143723418947963885307992/<br>     1878541866480547748364634187934375&quot;)}<br><strong>[✔]</strong> Instance found.<br>module CX : sig val x : nn_input end</pre><pre># CX.x;;</pre><pre>- : nn_input = {month = May; <br>                day = Mon; <br>                dmc = 500.; <br>                temp = 20.; <br>                rh = 100.;<br>                rain = 13.1133717053}</pre><pre># nn_model CX.x;;</pre><pre>- : real = 20.0272890315</pre><p>The kinds of analysis we can perform here are not dissimilar to what we looked at above with our classification task, so without wanting to repeat ourselves too much, we’ll conclude this section by illustrating Imandra’s ability to prove a couple of useful properties about our network. We can start by verifying a condition that we’d hope any reasonable predictive model for this task would obey, namely that the area outputted by the model is never negative:</p><pre># verify (fun x -&gt; is_valid_nn x ==&gt; nn_model x &gt;=. 
0.0);;</pre><pre>- : nn_input -&gt; bool = &lt;fun&gt;<br><strong>[✓]</strong> Theorem proved.</pre><p>Great, so we know that whatever happens, as long as our neural network receives a valid input it will <em>never</em> malfunction by outputting a negative number. Although this is a simple example, it’s exactly these kinds of guarantees that are needed for the safe deployment of such models in the real world, where a network’s output could form part of, say, a control sequence for operating a driverless car.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*dEdBoMEG6jMhZcuy0BM4Lg.jpeg" /></figure><p>Finally, we’ll try something slightly more ambitious and test a hypothesis. All other things remaining equal, we would expect that the higher the temperature, the larger the area of forest that would be burnt. Due to the imperfections in our model (because of limited data, stochasticity in training, the complicated patterns present in natural physical phenomena, and so on) this assertion is in fact easily falsifiable by Imandra, as can be seen here:</p><pre># verify (fun a b -&gt; <br>          is_valid_nn a &amp;&amp; <br>          is_valid_nn b &amp;&amp; <br>          a.month = b.month &amp;&amp; <br>          a.day = b.day &amp;&amp; <br>          a.dmc = b.dmc &amp;&amp; <br>          a.rh = b.rh &amp;&amp; <br>          a.rain = b.rain &amp;&amp; <br>          a.temp &gt;=. b.temp ==&gt; <br>          nn_model a &gt;=. 
nn_model b);;</pre><pre>- : nn_input -&gt; nn_input -&gt; bool = &lt;fun&gt;<br>Counterexample (after 0 steps, 0.116s):<br>  let (a : nn_input) =<br>    {month = Feb; <br>     day = Wed;<br>     dmc = (Real.mk_of_string <br>&quot;2001302860578649047203412202395247295896300726349164954426900657936422485111743/26290894775182656579389311741592195964262695657047583625044304683319346328960&quot;);<br>     temp = (Real.mk_of_string &quot;91160069288673997541144963139828470662342220288373496574482010785769071655/16907327829699457607324316232535174253545141901638317443758395294739129472&quot;);<br>     rh = (Real.mk_of_string &quot;310987560530841206048248341173125577069685813208286530077844293829277108589275/5258178955036531315877862348318439192852539131409516725008860936663869265792&quot;);<br>     rain = (Real.mk_of_string &quot;75815863864250199885180688130868702048660206232272093463832717880470859243/24896680658316909639573211876507761329794219372204151160079833980416047660&quot;)}<br>  let (b : nn_input) =<br>    {month = Feb; <br>     day = Wed;<br>     dmc = (Real.mk_of_string &quot;2001302860578649047203412202395247295896300726349164954426900657936422485111743/26290894775182656579389311741592195964262695657047583625044304683319346328960&quot;);<br>     temp = 0.;<br>     rh = (Real.mk_of_string &quot;310987560530841206048248341173125577069685813208286530077844293829277108589275/5258178955036531315877862348318439192852539131409516725008860936663869265792&quot;);<br>     rain = (Real.mk_of_string &quot;75815863864250199885180688130868702048660206232272093463832717880470859243/24896680658316909639573211876507761329794219372204151160079833980416047660&quot;)}<br><strong>[✗]</strong> Conjecture refuted.<br>module CX : sig val a : nn_input val b : nn_input end</pre><p>That’s quite a lot of information, but we can easily dive in and inspect the key parts of the counterexamples that Imandra has generated for us. 
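It may seem surprising that higher temperature can produce a lower predicted area, but a ReLU network need not be monotone in any input, even with all weights fixed. The toy one-hidden-layer net below (invented weights, not the post's trained model) decreases and then increases as its single input grows:

```python
def relu(z):
    # rectified linear unit: max(0, z)
    return max(0.0, z)

def tiny_net(t):
    # hidden layer: two ReLU units; output: their sum
    h1 = relu(1.0 * t)            # grows with t
    h2 = relu(-2.0 * t + 2.0)     # shrinks with t, hitting 0 at t = 1
    return h1 + h2

print(tiny_net(0.0), tiny_net(1.0), tiny_net(2.0))   # 2.0 1.0 2.0
```

The output dips from 2.0 down to 1.0 and back up to 2.0, so monotonicity fails even in this two-neuron case; counterexamples like the one Imandra found above are the same phenomenon at scale.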
In particular, we know that the inputs will be the same in all but the temperature feature, so let’s take a look at those values first and then run each counterexample through the model to show that it really does invalidate our hypothesis:</p><pre># CX.a.temp;;</pre><pre>- : real = 5.39174908104</pre><pre># CX.b.temp;;</pre><pre>- : real = 0.</pre><pre># nn_model CX.a;;</pre><pre>- : real = 1.47805400176</pre><pre># nn_model CX.b;;</pre><pre>- : real = 1.79452234896</pre><p>All is not lost, however! Although the network doesn’t satisfy our original verification statement we can restrict our setting in a sensible way in order to prove something slightly weaker:</p><ul><li>There is very little data from winter months, and so the model is unlikely to generalise well here, hence we’ll only consider non-winter months</li><li>We’ll increase the tolerance in temperature to 10 degrees celsius</li><li>We’ll increase the tolerance in area burnt to 25 hectares</li></ul><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/947d19b6db05f715b3db2bf6be84f4e6/href">https://medium.com/media/947d19b6db05f715b3db2bf6be84f4e6/href</a></iframe><p>With these allowances in place, Imandra gives us back a final proof:</p><pre># verify (fun a b -&gt; <br>          is_valid_nn a &amp;&amp; <br>          is_valid_nn b &amp;&amp; <br>          a.month = b.month &amp;&amp; <br>          not (winter a.month) &amp;&amp;<br>          a.day = b.day &amp;&amp; <br>          a.dmc = b.dmc &amp;&amp; <br>          a.rh = b.rh &amp;&amp; <br>          a.rain = b.rain &amp;&amp; <br>          (a.temp -. 10.0) &gt;=. b.temp ==&gt; <br>          (nn_model a +. 25.0) &gt;=. nn_model b);;</pre><pre>- : nn_input -&gt; nn_input -&gt; bool = &lt;fun&gt;<br><strong>[✓]</strong> Theorem proved.</pre><h3>Conclusion</h3><p>Reasoning about models that have been learnt from data is just one of many ways that formal methods can be brought to bear upon ML. 
While this problem is certainly a hard one — most state-of-the-art approaches rely on highly specialised software, techniques, and assumptions that Imandra doesn’t (yet!) support — it’s one we’re interested in pursuing further, particularly when it comes to scalability. However, there are also other exciting avenues in this area that we’re currently exploring, such as our <a href="https://medium.com/imandra/pyidf-diversity-of-experiences-in-reinforcement-learning-8d59f60f59ed">recent post</a> showing how region decomposition can be used to improve sample efficiency in deep reinforcement learning.</p><p>If you’re interested in our work, be sure to check out our <a href="https://try.imandra.ai/">other notebooks</a>, find out more and get email updates on our <a href="https://www.imandra.ai/">website</a>, join the discussion on our <a href="https://discord.gg/byQApJ3">Discord channel</a>, and of course subscribe to our <a href="https://medium.com/imandra">Medium publication</a>!</p><h3>References</h3><ol><li>Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, and Dan Mané. <a href="https://arxiv.org/pdf/1606.06565.pdf">Concrete Problems in AI Safety</a>. <em>arXiv preprint arXiv:1606.06565.</em> 2016.</li><li>Ian Goodfellow and Nicolas Papernot. <a href="http://www.cleverhans.io/security/privacy/ml/2017/06/14/verification.html">The Challenge of Verification and Testing of Machine Learning</a>. <em>CleverHans blog post (Accessed: 21.08.19)</em>. 2017.</li><li>Xiaowei Huang, Marta Kwiatkowska, Sen Wang, and Min Wu. <a href="http://qav.comlab.ox.ac.uk/papers/hkww17.pdf">Safety Verification of Deep Neural Networks</a>. In <em>International Conference on Computer Aided Verification</em>, pp. 3–29. 2017.</li><li>Pushmeet Kohli, Krishnamurthy Dvijotham, Jonathan Uesato, and Sven Gowal. <a href="https://deepmind.com/blog/article/robust-and-verified-ai">Identifying and Eliminating Bugs in Learned Predictive Models</a>. 
<em>DeepMind blog post (Accessed: 21.08.19)</em>. 2019.</li><li>Daniel Selsam, Percy Liang, and David L Dill. <a href="https://arxiv.org/pdf/1706.08605.pdf">Developing Bug-Free Machine Learning Systems with Formal Mathematics</a>. <em>arXiv preprint arXiv:1706.08605</em>. 2017.</li><li>Sanjit A Seshia, Dorsa Sadigh, and S Shankar Sastry. <a href="https://arxiv.org/pdf/1606.08514.pdf">Towards Verified Artificial Intelligence</a>. <em>arXiv preprint arXiv:1606.08514</em>. 2016.</li><li>John Törnblom and Simin Nadjm-Tehrani. <a href="https://www.ida.liu.se/labs/rtslab/publications/2018/John_FTSCS.pdf">Formal Verification of Random Forests in Safety-Critical Applications</a>. In <em>International Workshop on Formal Techniques for Safety-Critical Systems</em>, pp. 55–71. 2018.</li></ol><hr><p><a href="https://medium.com/imandra/analysing-machine-learning-models-with-imandra-4510cf586927">Analysing Machine Learning Models with Imandra</a> was originally published in <a href="https://medium.com/imandra">Imandra Inc.</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
    </channel>
</rss>