Suggestions for implementing a composition-based optimization (i.e. fractional portion of ingredients)

For context, my experience with Ax is limited to running the Loop tutorial once and reading parts of the documentation, such as the page on parameter types, so I'm fairly new to it. I do have some familiarity with Bayesian optimization.

The actual use-case is slightly different and more complicated, but I think the following is a suitable toy example. I go over the problem statement, some setup code, and possible solutions. Would love to hear some feedback.

## Problem Statement
Take a composite material with the following `class: ingredient` combinations:
- [Filler](https://www.totalboat.com/2021/02/11/when-and-how-to-use-an-epoxy-filler/): Colloidal Silica (`filler_A`)
- Filler: Milled Glass Fiber (`filler_B`)
- [Resin](https://www.thomasnet.com/articles/plastics-rubber/types-of-resins/): Polyurethane (`resin_A`)
- Resin: Silicone (`resin_B`)
- Resin: Epoxy (`resin_C`)

Suppose we have some toy training data: formulas made of various combinations of fillers and resins (with varying numbers of components), each listing the fractional prevalence of every component along with its measured objective. Suppose we also have a model that takes arbitrary input parameters and predicts the objective (strength), which we wish to maximize.

For constraints, I'm thinking:
- limit the total number of components in any given "formula" (e.g. max of 3 components)
- naturally, that the compositions sum to 1 (or that `abs(1-sum(composition)) <= tol`)
- there has to be at least one filler and at least one resin (if feasible)
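To make these constraints concrete, here is a minimal feasibility check for a single formula. It assumes the composition is stored as a dict of component name to fraction, and that each component's class can be read from its name prefix (`filler_`/`resin_`). Both the helper names and the prefix convention are hypothetical and not part of Ax.

```python
from typing import Dict


def component_class(name: str) -> str:
    """Hypothetical convention: the class is the name prefix, e.g. "filler"."""
    return name.split("_")[0]


def is_feasible(
    composition: Dict[str, float],
    max_components: int = 3,
    tol: float = 1e-6,
) -> bool:
    """Check the three constraints above for one formula.

    `composition` maps component name -> fraction. "dummy" entries and
    fractions at or below `tol` count as absent.
    """
    real = {k: v for k, v in composition.items() if k != "dummy"}
    present = [k for k, v in real.items() if v > tol]
    # 1) at most `max_components` ingredients actually used
    if len(present) > max_components:
        return False
    # 2) fractions sum to 1 within tolerance
    if abs(1.0 - sum(real.values())) > tol:
        return False
    # 3) at least one filler and at least one resin
    classes = {component_class(k) for k in present}
    return {"filler", "resin"} <= classes
```

The "(if feasible)" caveat is not modeled here: the check simply rejects formulas that lack either class.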

## Setup Code
To make it more concrete, it might look like the following:
```python
choices = ["filler_A", "filler_B", "resin_A", "resin_B", "resin_C", "dummy"]

# Each entry is [components, fractions]. "dummy" (fraction 0.0) pads
# formulas that have fewer than 3 components.
data = [
    [["filler_A", "filler_B", "resin_C"], [0.4, 0.4, 0.2]],
    [["filler_A", "resin_A", "resin_B"], [0.6, 0.2, 0.2]],
    [["filler_A", "filler_B", "resin_B"], [0.5, 0.3, 0.2]],
    [["filler_A", "resin_B", "dummy"], [0.5, 0.5, 0.0]],
    [["filler_B", "resin_C", "dummy"], [0.6, 0.4, 0.0]],
    [["filler_A", "filler_B", "resin_A"], [0.2, 0.2, 0.6]],
    [["filler_B", "resin_A", "resin_B"], [0.6, 0.2, 0.2]],
]  # made-up data


def predict(objects, composition):
    """Predict strength (the objective to maximize) for one formula."""
    obj = ...  # placeholder: call the actual strength model here
    return obj
```
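For reference, each `[components, fractions]` entry in `data` can be flattened into one fraction per ingredient, with absent ingredients (and `dummy`) set to 0.0. This is the one-hot-like encoding shown in the table below. A minimal sketch, using a hypothetical helper `to_fraction_row`:

```python
from typing import Dict, List

INGREDIENTS = ["filler_A", "filler_B", "resin_A", "resin_B", "resin_C"]


def to_fraction_row(objects: List[str], composition: List[float]) -> Dict[str, float]:
    """Flatten one [components, fractions] entry into one fraction per ingredient."""
    row = {name: 0.0 for name in INGREDIENTS}  # absent ingredients stay at 0.0
    for name, frac in zip(objects, composition):
        if name != "dummy":  # padding slot, not a real ingredient
            row[name] += frac  # repeated ingredients are summed
    return row


# Example: the fourth entry of `data`
example = to_fraction_row(["filler_A", "resin_B", "dummy"], [0.5, 0.5, 0.0])
```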

## Possible Solutions
I've considered two encodings: a one-hot-like prevalence encoding and a components/composition encoding.
### One-hot-like prevalence encoding
I've thought about trying to do a sort of "one-hot encoding" (assuming I'm using this term correctly), such that each component gets its own composition as a variable:

| filler_A | filler_B | resin_A | resin_B | resin_C |
|----------|----------|---------|---------|---------|
| 0.4      | 0.4      | --      | --      | 0.2     |
| 0.6      | 0.0      | 0.2     | 0.2     | --      |
| 0.5      | 0.3      | --      | 0.2     | --      |
| 0.5      | --       | --      | 0.5     | --      |
| --       | 0.6      | --      | --      | 0.4     |
| 0.2      | 0.2      | 0.6     | --      | --      |
| --       | 0.6      | 0.2     | 0.2     | --      |

which I think would look like the following:

```python
from ax.service.managed_loop import optimize

best_parameters, values, experiment, model = optimize(
    parameters=[
        {
            "name": "filler_A",
            "type": "range",
            "bounds": [0.0, 1.0],
        },
        {
            "name": "filler_B",
            "type": "range",
            "bounds": [0.0, 1.0],
        },
        {
            "name": "resin_A",
            "type": "range",
            "bounds": [0.0, 1.0],
        },
        {
            "name": "resin_B",
            "type": "range",
            "bounds": [0.0, 1.0],
        },
        {
            "name": "resin_C",
            "type": "range",
            "bounds": [0.0, 1.0],
        },
    ],
    experiment_name="composition_test",
    objective_name="strength",
    evaluation_function=predict,
    parameter_constraints=[
        "abs(1 - (filler_A + filler_B + resin_A + resin_B + resin_C)) <= 1e-6",  # not sure if I can use `abs` here
        "filler_A + filler_B > 0",
        "resin_A + resin_B + resin_C > 0",
    ],
    total_trials=30,
)
```
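On the `abs` question: as far as I can tell from the Ax docs, `parameter_constraints` only accepts linear inequalities written with `<=` or `>=` (e.g. `"x1 + x2 <= 1"`), so `abs(...)`, equality, and strict `>` constraints cannot be expressed directly. A common workaround for a sum-to-one constraint is to optimize over all but one ingredient and derive the last one as 1 minus the rest. The sketch below assumes `resin_C` is the derived ingredient and uses a hypothetical minimum total filler (and resin) fraction of 0.05. It only builds the inputs for `optimize` and does not call Ax itself.

```python
from typing import Any, Dict, List

# Free parameters: every ingredient except one, whose fraction is implied.
# Choosing "resin_C" as the implied ingredient is arbitrary.
FREE = ["filler_A", "filler_B", "resin_A", "resin_B"]
IMPLIED = "resin_C"

parameters: List[Dict[str, Any]] = [
    {"name": name, "type": "range", "bounds": [0.0, 1.0]} for name in FREE
]

# Linear inequalities only. The first keeps the implied fraction >= 0.
# Since total resin = 1 - total filler, the last two require at least
# 0.05 filler and at least 0.05 resin (0.05 is a hypothetical threshold).
parameter_constraints = [
    "filler_A + filler_B + resin_A + resin_B <= 1.0",
    "filler_A + filler_B >= 0.05",
    "filler_A + filler_B <= 0.95",
]


def full_composition(parameterization: Dict[str, float]) -> Dict[str, float]:
    """Recover all five fractions from the four free parameters."""
    comp = {name: parameterization[name] for name in FREE}
    # Clip tiny negative values caused by floating-point round-off.
    comp[IMPLIED] = max(0.0, 1.0 - sum(comp.values()))
    return comp
```

These would then be passed as `optimize(parameters=parameters, parameter_constraints=parameter_constraints, ...)`. As far as I understand the Loop API, the evaluation function receives a parameterization dict (optionally with a weight), not `(objects, composition)`, so `predict` would need a small wrapper that calls `full_composition` first. One side effect of this reparameterization is that the implied ingredient is treated differently from the others by the search space.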

However, this encoding could easily lead to compositions in which every component has a nonzero fraction. That can be problematic from an experimental perspective, and it would violate the limit on the number of components.
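One crude way to enforce the component limit with this encoding is to post-process each suggested composition: keep the largest filler and the largest resin, fill the remaining slots with the next-largest fractions, and renormalize. This is a heuristic repair step assumed here for illustration, not an Ax feature, and the helper name `limit_components` is hypothetical.

```python
from typing import Dict, List


def limit_components(
    composition: Dict[str, float], max_components: int = 3
) -> Dict[str, float]:
    """Heuristic repair: keep at most `max_components` ingredients, renormalized.

    Assumes max_components >= 2 and names prefixed "filler_"/"resin_".
    The largest filler and largest resin are always kept; remaining
    slots go to the next-largest fractions.
    """
    ranked = sorted(composition, key=composition.get, reverse=True)
    keep: List[str] = []
    for prefix in ("filler_", "resin_"):
        best = next((n for n in ranked if n.startswith(prefix)), None)
        if best is not None:
            keep.append(best)
    for name in ranked:
        if len(keep) >= max_components:
            break
        if name not in keep:
            keep.append(name)
    # Rescale the kept fractions so they sum to 1 again.
    total = sum(composition[n] for n in keep)
    return {n: composition[n] / total for n in keep}
```

Because the evaluated formula then differs from the one Ax suggested, the repaired composition (not the raw suggestion) is what should be recorded for that trial; otherwise the surrogate model is fit to points that were never actually evaluated.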

### Components/composition
As I mentioned in the constraints, I've also thought about setting an upper limit to the number of components in a formula, which I think might look something like the following:

```python
from ax.service.managed_loop import optimize

best_parameters, values, experiment, model = optimize(
    parameters=[
        {
            "name": "object1",
            "type": "choice",
            "values": choices,  # choice parameters take "values", not "bounds"
        },
        {
            "name": "object2",
            "type": "choice",
            "values": choices,
        },
        {
            "name": "object3",
            "type": "choice",
            "values": choices,
        },
        {
            "name": "composition1",
            "type": "range",
            "bounds": [0.0, 1.0],
        },
        {
            "name": "composition2",
            "type": "range",
            "bounds": [0.0, 1.0],
        },
        {
            "name": "composition3",
            "type": "range",
            "bounds": [0.0, 1.0],
        },
    ],
    experiment_name="composition_test",
    objective_name="strength",
    evaluation_function=predict,
    parameter_constraints=["abs(1 - (composition1 + composition2 + composition3)) <= 1e-6"],
    total_trials=30,
)
```
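With the components/composition encoding, a few details need handling before calling the model: two slots can pick the same ingredient, a `dummy` slot can receive a nonzero fraction, and the sum constraint has the same `abs` problem as in the first approach. A minimal sketch of one way around all three, assuming the `compositionN` values are treated as unnormalized weights (so no sum constraint is needed) and using a hypothetical `decode` helper:

```python
from typing import Dict


def decode(parameterization: Dict[str, object], n_slots: int = 3) -> Dict[str, float]:
    """Turn objectN/compositionN parameters into one fraction per ingredient.

    compositionN values are treated as unnormalized weights and divided by
    their total. Duplicate picks are merged and "dummy" slots are dropped.
    """
    weights: Dict[str, float] = {}
    for i in range(1, n_slots + 1):
        name = str(parameterization[f"object{i}"])
        if name == "dummy":
            continue  # padding slot: its weight is ignored
        weights[name] = weights.get(name, 0.0) + float(parameterization[f"composition{i}"])
    total = sum(weights.values())
    if total == 0.0:
        return {}  # degenerate suggestion: nothing to evaluate
    return {name: w / total for name, w in weights.items()}
```

One downside is that the encoding is redundant: scaling all weights by a constant, or permuting the slots, yields the same formula, which may make the surrogate model less sample-efficient. The at-least-one-filler/at-least-one-resin requirement also still has to be enforced separately.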

How would you suggest implementing this use-case in Ax? If it would help, I'd be happy to flesh this out into a full MWE or try out any suggestions. The real use-case involves ~100 different components across 4 different classes, and the idea is to (eventually) use this in an experimental adaptive design scheme.

(Tagging @ramz-i, who is in charge of this project in our research group; please post here if you have anything to add.)

#706


Suggestions for implementing a composition-based optimization (i.e. fractional portion of ingredients) #727
