Add foldlM'#543

noughtmare · 2023-10-30T21:04:22Z

I haven't added tests or benchmarks yet, because I'm not quite sure how to do it.

Open questions:

Can QuickCheck generate monadic functions?
And what would be a good benchmark?
Do we need a strict foldlM'?
Why are the Data.Text.Internal.Fusion.Common folds marked INLINE [0]?

For motivation, see this recent stackoverflow question. There the problem is to convert a text into a vector of characters, but that requires extraneous allocation if done naively with unfoldr Text.uncons. If a foldlM function were exposed like I implemented in this PR then you can write it efficiently like this:

import Data.Text as T
import qualified Data.Vector.Unboxed as VU
import qualified Data.Vector.Unboxed.Mutable as VUM
import Test.Tasty.Bench

createFoldMethod :: Text -> VU.Vector Char
createFoldMethod t = VU.create $ do
  m <- VUM.new (T.length t)
  _ <- T.foldlM (\i x -> VUM.write m i x *> pure (i + 1)) 0 t
  pure m

main = defaultMain
  [ bench (show (10 ^ i) ++ " characters") $ whnf createFoldMethod $ T.replicate (10 ^ i) (T.pack "0")
  | i <- [1..5]
  ]

Benchmark results:

  10 characters:     OK
    33.1 ns ± 2.7 ns, 111 B  allocated,   0 B  copied, 6.0 MB peak memory
  100 characters:    OK
    228  ns ±  21 ns, 471 B  allocated,   0 B  copied, 6.0 MB peak memory
  1000 characters:   OK
    2.16 μs ± 169 ns, 3.9 KB allocated,   0 B  copied,  10 MB peak memory
  10000 characters:  OK
    21.0 μs ± 1.7 μs,  39 KB allocated,   2 B  copied,  10 MB peak memory
  100000 characters: OK
    210  μs ±  15 μs, 386 KB allocated,  21 B  copied,  11 MB peak memory

This matches the performance of an implementation I wrote that loops over the internal stream representation manually.

Bodigrim · 2023-10-30T21:15:02Z

As suggested in comments (was it @thomasjm asking?), the simplest solution is encodeUtf32LE + unsafeFromForeignPtr indeed. There is nothing particularly unsafe here.

But yes, foldlM makes sense to have. We already got spanM, for instance.

noughtmare · 2023-10-30T21:17:08Z

I tried that:

encodeMethod :: Text -> VS.Vector Char
encodeMethod t = 
  case encodeUtf32LE t of
    BS ptr len -> 
      runST $ VS.unsafeFreeze $ 
        VSM.unsafeFromForeignPtr0 (castForeignPtr ptr) (len `quot` 4)))

But it requires reaching into internal parts of the bytestring library* and more importantly it wasn't efficient:

  10 characters:     OK
    281  ns ±  24 ns, 3.3 KB allocated,   0 B  copied, 6.0 MB peak memory
  100 characters:    OK
    2.32 μs ± 181 ns,  31 KB allocated,   2 B  copied, 6.0 MB peak memory
  1000 characters:   OK
    22.3 μs ± 1.8 μs, 302 KB allocated,  20 B  copied, 6.0 MB peak memory
  10000 characters:  OK
    221  μs ±  11 μs, 3.0 MB allocated, 212 B  copied, 6.0 MB peak memory
  100000 characters: OK
    2.31 ms ± 169 μs,  30 MB allocated,  85 KB copied, 7.0 MB peak memory

* I guess perhaps I could use unsafeUseAsCStringLen instead somehow?

Bodigrim · 2023-10-30T21:21:41Z

src/Data/Text.hs


+-- | /O(n)/ A monadic version of 'foldl'.
+foldlM :: Monad m => (a -> Char -> m a) -> a -> Text -> m a
+foldlM f z t = S.foldlM f z (stream t)


Does this implementation match Data.Foldable.foldlM, especially with regards to short-circuiting and the order of effects? Would be nice to have a test, at least for m ~ Maybe and m ~ Either.

Bodigrim · 2023-10-30T21:28:51Z

But it requires reaching into internal parts of the bytestring library* and more importantly it wasn't efficient

Sigh. encodeUtf32LE should be implemented as a loop in ST instead of relying on streaming. Thanks for checking.

thomasjm · 2023-10-30T22:09:43Z

Whoa, thanks for doing this @noughtmare! Yes, that was me asking on StackOverflow :)

noughtmare · 2023-10-31T10:43:52Z

I just found out that we can define foldlM efficiently using just the public API of text:

foldlM k z t = T.foldr (\x go s -> k s x >>= go) pure t z

This optimizes to a tight loop.

Lysxia · 2024-03-13T01:20:14Z

I added some tests and rebased. To answer some of the questions:

Can QuickCheck generate monadic functions?

We could but we don't need to. Because of foldlM's polymorphism, we can simplify tests by specializing foldlM' to (1) only the state monad (2) only one special function argument and initial accumulator. (So the only remaining thing for quickcheck to generate is the text.)

Specializing foldlM to the state monad still determines its behavior in all monads. The idea is to use the state monad to log all the function calls made by foldlM, and parametric polymorphism guarantees that the behavior of foldlM in any other monad follows that log.

And what would be a good benchmark?

Idk about an actual benchmark, but I added an inspection test that the simplified Core is the same as foldl' in the Identity case.

Do we need a strict foldlM'?

What you defined is actually already foldlM' because there is a bang on the accumulator. I've renamed the function accordingly.

Given that the lazy foldl/foldlM are known to be footguns and not that useful, I'm inclined to not include foldlM in the first place. Only add the strict foldlM'.

I have no clue about the INLINE [0] annotation.

foldlM k z t = T.foldr (\x go s -> k s x >>= go) pure t z

Both the direct implementation and the foldr implementation look fine to me. I leave it up to you.

Although it's a one-liner with foldr, it's not obvious for newcomers. foldlM seems a quite accessible primitive so +1 for adding it.

Bodigrim · 2024-03-13T19:53:13Z

@Lysxia looks good to me, thanks for wrapping it up. Could you please fix the CI job with -fdeveloper?

Lysxia · 2024-03-13T22:23:02Z

Fixed by inspection-testing S.foldlM' instead of T.foldlM' (which is a wrapper around it, and with the +developer flag there's a HasCallstack constraint that confuses inspection-testing.)

tests/Tests/Properties/Folds.hs

Bodigrim · 2024-03-13T22:37:12Z

Thanks all!

Bodigrim reviewed Oct 30, 2023

View reviewed changes

Lysxia force-pushed the master branch from 6916046 to 75c2b84 Compare March 13, 2024 00:34

Jaro Reinders and others added 3 commits March 13, 2024 22:54

Add foldlM

8e08757

Rename foldlM to foldlM'

bd31efe

Add tests for foldlM'

6f8c4ca

Lysxia force-pushed the master branch from 75c2b84 to 6f8c4ca Compare March 13, 2024 22:03

Bodigrim approved these changes Mar 13, 2024

View reviewed changes

tests/Tests/Properties/Folds.hs Show resolved Hide resolved

Bodigrim merged commit 66fa38f into haskell:master Mar 13, 2024

Lysxia changed the title ~~Add foldlM~~ Add foldlM' Oct 26, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add foldlM'#543

Add foldlM'#543
Bodigrim merged 3 commits intohaskell:masterfrom
noughtmare:master

noughtmare commented Oct 30, 2023 •

edited

Loading

Uh oh!

Bodigrim commented Oct 30, 2023

Uh oh!

noughtmare commented Oct 30, 2023 •

edited

Loading

Uh oh!

Bodigrim Oct 30, 2023

Uh oh!

Bodigrim commented Oct 30, 2023

Uh oh!

thomasjm commented Oct 30, 2023

Uh oh!

noughtmare commented Oct 31, 2023

Uh oh!

Lysxia commented Mar 13, 2024 •

edited

Loading

Uh oh!

Bodigrim commented Mar 13, 2024

Uh oh!

Lysxia commented Mar 13, 2024

Uh oh!

Uh oh!

Bodigrim commented Mar 13, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

noughtmare commented Oct 30, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Bodigrim commented Oct 30, 2023

Uh oh!

noughtmare commented Oct 30, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Bodigrim Oct 30, 2023

Choose a reason for hiding this comment

Uh oh!

Bodigrim commented Oct 30, 2023

Uh oh!

thomasjm commented Oct 30, 2023

Uh oh!

noughtmare commented Oct 31, 2023

Uh oh!

Lysxia commented Mar 13, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Bodigrim commented Mar 13, 2024

Uh oh!

Lysxia commented Mar 13, 2024

Uh oh!

Uh oh!

Bodigrim commented Mar 13, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

noughtmare commented Oct 30, 2023 •

edited

Loading

noughtmare commented Oct 30, 2023 •

edited

Loading

Lysxia commented Mar 13, 2024 •

edited

Loading