Implement script level cache #403

cosmo-kramer · 2016-06-16T01:46:59Z

The program flow is as follows:

The speedup comes from making processModule cache everything it processes at the script level (scripts, predefs, and imports, including $ivy and $file imports, etc.), in addition to the block-level caching it already does. The next time it is asked to process the same code, it tries to retrieve the result from the cache.

processModule checks whether the script is available in the cache folder, i.e. scriptCaches. If it is not found, processModule0 is called. This keeps the rest of the program flow the same as before this diff, but passes back the importHooks and other imports accumulated in the processCorrectScript function. This data is stored by classFilesListSave, which takes the list of pkgName concatenated with wrapperName and the hash values of the blocks, along with the imports, and stores them.
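The cache lookup is keyed on a hash of the script source; the diff computes it as `"cache" + Util.md5Hash(Iterator(code.getBytes)).map("%02x".format(_)).mkString`. A minimal standalone sketch of that key, assuming `java.security.MessageDigest` as a stand-in for `Util.md5Hash` and UTF-8 encoding (the diff uses the platform default charset via `code.getBytes`). The `CacheKey` object is hypothetical:

```scala
import java.security.MessageDigest

// Hypothetical helper approximating the cacheTag expression in the diff.
object CacheKey {
  // MD5 digest of the script source, as raw bytes (UTF-8 assumed here)
  def md5(code: String): Array[Byte] =
    MessageDigest.getInstance("MD5").digest(code.getBytes("UTF-8"))

  // Prefix with "cache" and hex-encode each byte as two lowercase digits;
  // java.util.Formatter treats a negative Byte as unsigned for %x
  def cacheTag(code: String): String =
    "cache" + md5(code).map("%02x".format(_)).mkString
}
```

Identical source always maps to the same tag, so a changed script simply misses the cache instead of reusing stale class files.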

If the script is found in the cache, classFilesListLoad reads the list of names and hash values of the blocks and loads those blocks from the cache folders made by the compileCacheLoad function. Loaded import hooks are resolved by resolveSingleImportHook. This data, along with the retrieved imports, is passed to eval.evalCacheClassFiles, which loads each class file and evals its main function, thus executing the code.

processModule returns the final imports whether the code was evaled or loaded from the cache. processModule is therefore opaque from the outside: on a cache HIT or MISS it behaves in exactly the same way, in both its return value and its side effects, except for saving to the cache.
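A minimal sketch of that HIT/MISS contract, assuming a simplified model in which `compile` stands in for processModule0 and "imports" are plain strings. `OpaqueCache`, `ScriptCache` and `compilations` are hypothetical names, not Ammonite code:

```scala
import scala.collection.mutable

// Hypothetical, simplified model of the HIT/MISS behaviour described above.
object OpaqueCache {
  type Imports = Seq[String]

  final class ScriptCache {
    // Keyed directly on the source text for brevity (the diff uses an MD5 tag)
    private val store = mutable.Map.empty[String, Imports]
    var compilations = 0 // number of cache misses, for illustration only

    private def compile(code: String): Imports = {
      compilations += 1
      // Pretend each `import ...` line in the script yields one import
      code.split("\n").iterator.map(_.trim).filter(_.startsWith("import ")).toList
    }

    // Same return value on a HIT or a MISS; the only extra side effect
    // of a MISS is saving the result to the cache
    def processModule(code: String): Imports =
      store.get(code) match {
        case Some(cached) => cached
        case None =>
          val imports = compile(code)
          store(code) = imports
          imports
      }
  }
}
```

Callers cannot tell a HIT from a MISS by the result; only the miss counter differs.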

cosmo-kramer · 2016-06-18T18:34:37Z

  }

+
+  def runScriptLevelCache(path: Path, args: Seq[String] = Vector.empty,


Starting point for running scripts.
Tries to load from the cache, else calls runScript.

lihaoyi · 2016-06-19T01:33:57Z

Great that you got tests passing!

Some high level feedback:

- Would it be possible to move all the logic into the processModule function, or some helper called by it? It seems you are duplicating a bunch of logic between runScriptLevelCache and cachedModule, as called through loadModule and load.module. Given that all these code paths end up going through processModule0 anyway, this would help keep the logic in one place.
- If we did that, we could probably get rid of runScriptLevelCache entirely, since runScript will eventually call processModule0, which will take care of the caching etc. After all, the "first" script you run is no different from any other script you load.module; they all go through processModule0.
- We could probably get rid of the withCompiler flag too; if all the script-caching logic is included as part of processModule0, checks like https://github.com/lihaoyi/Ammonite/pull/403/files#diff-bcdb6a0282e047eba770bc309743e114R126 become unnecessary, since that code calls processModule0, which should do the right thing by default.

That's the high-level review; your code looks great. Leaving some other feedback in-line. Now that you've got this working and tests passing, let's iterate on this diff until it's great!

lihaoyi · 2016-06-19T01:34:17Z

          |@
          |println(wd relativeTo x)""".stripMargin
-      )
+        )


Try not to re-format irrelevant things as part of this diff.

Things like this, and this (adding indentation to the whole for-comprehension) may or may not be the "right" formatting, but they have no place in an already-very-large diff like this one. Given that this diff is already large and hard to review for correctness, you should aim to avoid all this sort of minor/irrelevant changes so we can focus on the script-level-caching. If we care enough, we can put these changes into a separate diff later.

@CoderAbhishek bump

lihaoyi · 2016-06-22T13:04:52Z

+                    pkgName: Seq[Name]
+                   ): Res[Imports] = if(scriptCaching) {
+    val cacheTag = "cache" + Util.md5Hash(Iterator(code.getBytes)).map("%02x".format(_)).mkString
+    storage.asInstanceOf[Storage.Folder].classFilesListLoad(pkgName.map(_.backticked).mkString("."), wrapperName.backticked, cacheTag) match {


We shouldn't need to use asInstanceOf to check whether something is a Storage.Folder. Instead, Storage should have classFilesListLoad and classFilesListSave as part of its interface, and Storage.InMemory should just keep things in an in-memory Map or something when saved, and read from that Map on load.
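A minimal sketch of the suggested interface. The method names come from the comment above; the parameter and return types, `CachedScript`, and the key layout are assumptions for illustration, not Ammonite's actual Storage API:

```scala
import scala.collection.mutable

// Hypothetical interface; types are assumptions, not Ammonite's API.
object StorageSketch {
  // (class name, block hash) pairs plus the imports the script produced
  final case class CachedScript(blocks: Seq[(String, String)], imports: Seq[String])

  trait Storage {
    def classFilesListSave(pkg: String, wrapper: String, tag: String, data: CachedScript): Unit
    def classFilesListLoad(pkg: String, wrapper: String, tag: String): Option[CachedScript]
  }

  // In-memory variant: keeps saved entries in a Map, so callers can use any
  // Storage without asInstanceOf[Storage.Folder]
  final class InMemory extends Storage {
    private val saved = mutable.Map.empty[(String, String, String), CachedScript]

    def classFilesListSave(pkg: String, wrapper: String, tag: String, data: CachedScript): Unit =
      saved((pkg, wrapper, tag)) = data

    def classFilesListLoad(pkg: String, wrapper: String, tag: String): Option[CachedScript] =
      saved.get((pkg, wrapper, tag))
  }
}
```

A folder-backed implementation would satisfy the same trait by writing to and reading from disk, so processModule could call these methods on any Storage.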

lihaoyi · 2016-06-22T13:22:52Z

The unit tests aren't testing the right thing. Our goal isn't to get the compilationCount to zero, but to ensure that the compiler is never initialized.

Realistically, what you should do is:

- Introduce a new compilerInitialized boolean on the Interpreter that starts as false and gets set to true when/every time the compiler is initialized
- Move the tests from the integration tests into "normal" unit tests, in the repl/ project
- Extract the body of runScript into runScriptInternal, with runScriptInternal being private[ammonite], since it's only to be used for tests and is not part of the public API
- Make runScriptInternal return a Res[(Seq[ImportData], Boolean)], with the boolean coming from the compilerInitialized flag
- Make runScript call runScriptInternal and discard the boolean
- Make your unit tests instantiate the REPL and call scripts through the normal Main() call, except calling runScriptInternal rather than runScript, and validate that the second time (?) the same script is run, the compilerInitialized: Boolean that gets returned is false
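A minimal sketch of the suggested test hook, assuming two interpreter instances that share one cache storage (as two runs of the same script would). The `Interpreter` class, the string "imports" and the shared Map are simplified, hypothetical stand-ins, and the Res wrapper is omitted for brevity:

```scala
import scala.collection.mutable

// Hypothetical sketch; none of this is Ammonite's real code.
object CompilerFlagSketch {
  final class Interpreter(storage: mutable.Map[String, Seq[String]]) {
    // Starts as false and is set to true whenever the compiler is initialized
    var compilerInitialized = false

    private def initCompiler(): Unit = compilerInitialized = true

    // Cache hit: return stored imports; cache miss: initialize the compiler
    private def processModule(code: String): Seq[String] =
      storage.getOrElseUpdate(code, {
        initCompiler()
        Seq("compiled block of length " + code.length)
      })

    // In Ammonite this would be private[ammonite]; public here so the
    // example can call it directly
    def runScriptInternal(code: String): (Seq[String], Boolean) = {
      val imports = processModule(code)
      (imports, compilerInitialized)
    }

    // Public API: same behaviour, with the boolean discarded
    def runScript(code: String): Seq[String] = runScriptInternal(code)._1
  }
}
```

The test then asserts on the returned boolean rather than on a compilation count: the first run reports `true`, and a fresh interpreter running the same script against the same storage reports `false`.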

lihaoyi · 2016-06-22T13:24:54Z

+    // blockNumber keeps track of blockIndex
+    cachedData.foreach { d =>
+      for {
+        cls <- eval.loadClass(pkg + "." + wrapper + getBlockNumber, d._1)


What happens if loadClass or evalMain fails? You should probably use Res.map on this to propagate the Res[_] from each individual loadClass into a big Res[List[...]], and make evalCachedClassFiles return a Res[_] to represent the possibility of failure.
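A minimal sketch of propagating failures from each loaded block into one result. The `Res` type here is a hypothetical two-case stand-in (Ammonite's real Res has more cases and its own helpers), and `loadClass`/`evalMain` are passed in as plain functions rather than the real eval methods:

```scala
// Hypothetical minimal Res; not Ammonite's actual definition.
object ResSketch {
  sealed trait Res[+T] {
    def flatMap[U](f: T => Res[U]): Res[U] = this match {
      case Res.Success(v)       => f(v)
      case failure: Res.Failure => failure
    }
    def map[U](f: T => U): Res[U] = flatMap(v => Res.Success(f(v)))
  }

  object Res {
    final case class Success[+T](value: T) extends Res[T]
    final case class Failure(msg: String) extends Res[Nothing]
  }

  // Loads and evals each cached block in order; after the first failure,
  // flatMap skips the remaining loadClass/evalMain calls and the
  // failure is returned as the overall result
  def evalCachedClassFiles(
      blocks: Seq[String],
      loadClass: String => Res[String],
      evalMain: String => Res[Unit]
  ): Res[List[String]] =
    blocks.foldLeft[Res[List[String]]](Res.Success(Nil)) { (acc, name) =>
      for {
        done <- acc
        cls  <- loadClass(name)
        _    <- evalMain(cls)
      } yield cls :: done
    }.map(_.reverse)
}
```

Because the caller gets a single `Res[List[...]]`, a missing or corrupt class file surfaces as an ordinary failure instead of an exception thrown mid-loop.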

cosmo-kramer force-pushed the master branch 7 times, most recently from 14b2ed5 to 52514bf, June 18, 2016 18:33

cosmo-kramer reviewed Jun 18, 2016

lihaoyi reviewed Jun 19, 2016

lihaoyi mentioned this pull request Jun 19, 2016

Rebase onto master cosmo-kramer/Ammonite#1

Closed

lihaoyi reviewed Jun 22, 2016
