Currently our NumPy workflow has a flaky test, test_creation_functions.py::test_linspace, so a given run sometimes reports XFAIL and sometimes XPASS. Because Hypothesis generates inputs pseudo-randomly, any bug the test suite identifies may only be triggered on some runs, so this can happen with other tests too. I therefore think that instead of xfailing tests for a workflow, we should skip them completely. We could also mix and match, but I think it's best to set a precedent that's simple for outsiders to understand and use.
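To illustrate the difference, here is a minimal sketch that simulates the two policies without depending on pytest. Everything in it is hypothetical: the function names, the seed, and the 50% failure rate are assumptions chosen for illustration, not measurements of the actual test_linspace flakiness.

```python
import random


def flaky_check(rng: random.Random) -> bool:
    """Hypothetical stand-in for a property-based test that only sometimes
    draws an input that triggers the bug. Returns True when the test passes."""
    # The 50% rate is arbitrary and only for illustration.
    return rng.random() < 0.5


def report(policy: str, rng: random.Random) -> str:
    """Return the outcome a pytest-style runner would report under a policy."""
    if policy == "skip":
        # The test body never runs, so the reported outcome cannot vary.
        return "SKIPPED"
    if policy == "xfail":
        # The test body runs, so the outcome depends on the inputs drawn.
        return "XPASS" if flaky_check(rng) else "XFAIL"
    raise ValueError(f"unknown policy: {policy!r}")


def outcomes(policy: str, runs: int = 200, seed: int = 0) -> set:
    """Collect the distinct outcomes seen over many simulated workflow runs."""
    rng = random.Random(seed)
    return {report(policy, rng) for _ in range(runs)}


if __name__ == "__main__":
    print("xfail:", sorted(outcomes("xfail")))
    print("skip: ", sorted(outcomes("skip")))
```

Under the `xfail` policy the simulated runs report both XFAIL and XPASS, while under `skip` every run reports SKIPPED, which is the stable behaviour I'm arguing for.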