Skip to content

implementation of a generic iterator interface, api changes#238

Closed
rsoika wants to merge 10 commits into
tdf:masterfrom
rsoika:master
Closed

implementation of a generic iterator interface, api changes#238
rsoika wants to merge 10 commits into
tdf:masterfrom
rsoika:master

Conversation

@rsoika

@rsoika rsoika commented Jun 24, 2023

Copy link
Copy Markdown
Collaborator

I changed the API as suggested here.

In addition the Navigation interface now implements a Iterator interface in a generic way.

public abstract class Navigation<T extends Selection> implements Iterator<T> {
...
}

This makes the code in my opinion much more readable and the Class cast is no longer needed:

	nav = new TextNavigation(phrase, doc);
	while (nav.hasNext()) {
              TextSelection currentSelection = search.next();
              OdfElement currentElement = search.getElement();
	    ....
	}

@svanteschubert

Copy link
Copy Markdown
Contributor

Hello Ralph,

Your work does really improve the former implementation a lot!

I find the prior existing search functionalty quite difficult to understand, while the underlying search scenario seems relativ easy (looking for strings in certain elements within the document), its API improvement seems quite complex to me and I might need more time and/or discussions!

Only looked briefly into it today and realized, I need to sleep over it and look over it during next week as it is not easy to decide. :-)

Some questions on the prior & new API you might speed up my review:

For instance,

  1. I am puzzled why the method next() is returning the current selection result and looking in TextNavigation the method public TextSelection next() is not searching for the next selection, but always returning the current.
  2. On the other hand the method hasNext() is always searching for the next selection. Is hasNext() doing a state change?

To make things even more complex, I would like to point out that this functionality was added by IBM Beijing around 2009. StarOffice team was dismissed by Oracle in 2010. And I started to work for Open-XChange in 2011 on ODFDOM to be used as the backend of a WebOffice only dispatching editor changes instead of exchanging complete documents.
Document changes require to point at a document, Adding IDs to all possible referenceable items would blow up the size of the document, therefore by doing "convention over configuration" we simply count the "user known items" (UKI) in the document that are being changed. Multiple XML elements might be a single "user known item" (e.g. table) and every character is a user known item. Styles are just metadata (subfeatures) on these UKI.

Long story short, instead of refactoring this selection (I guess I simply have not thought at that time), I created a new one in parallel in changes package:

Fun fact, basically the Hyperlink is just a Selection of Text with the property of an URL! :-)

It is not part of this API change as it has no change functionality, but in theory these both parts belong closer together.

Have a nice Sunday, Ralph!

PS: Some other functionality as tables have overlapping in the change package, too...

@rsoika

rsoika commented Jul 2, 2023

Copy link
Copy Markdown
Collaborator Author

Hi Svante,

yes I agree with you that the methods next() and hasNext() are currently implemented in an untypical way.
The problem was, initially only the method hasNext() exits. I don't wanted to change to much in the API so more or less I have simply replaced currentElement() with next(). But I am also not happy.

Currently you can't call next() several times in series because nothing will change.

Should we change this in a first next step

  • hasNext() only verifies if a selection exits
  • next() should do the logic like the current hasNext() implementation.

@svanteschubert

Copy link
Copy Markdown
Contributor

Yes, I think this would be a good idea to do as you suggest!
And of course it is always good not to change much API logic - unless it is necessary or it makes a lot of sense ;-)

@rsoika

rsoika commented Jul 3, 2023

Copy link
Copy Markdown
Collaborator Author

I have now adjusted the API accordingly. In the process, I have also improved some tests.

Now to me this looks much better. I hope you are happy with it too?

@svanteschubert

Copy link
Copy Markdown
Contributor

Hello Ralph,

I do not want to be too strict as the original donated code was never that hard reviewed, let's see if we the same understanding or maye I am overseeing something. ;-)

We have this two methods:

  • next()
  • hasNext()

If hasNext() might be called in the very beginning, it certainly have to call the similar underlying functionality similar to the next() function to search if there is some string to be found in the ODF. The found string should be cached, as if hasNext() is being called just again this cached string should be returned (instead of looking for a new one), right?

if afterwards next() is called, it would first check if there is a cached string from hasNext(), if this is the case, this one will be returned and set to null, if null from the beginning the same search function as called by hasNext() is being triggered.

That is basically what I wanted to write on Sunday, but I was without coffein :-)
Does it make sense? Do I oversee something?

PS: We might as well have a good old voice call (or VOIP) to quickly discuss this and some context - as I now looked a bit deeper into this functionality! :-)
I would be glad to have a remote tea break- available on 4th July frolm 8:30 to 15:45 and again after 16:15 - otherwise send me your time possibilites or confirmation to svante.schubert@gmail.com

@rsoika

rsoika commented Jul 3, 2023

Copy link
Copy Markdown
Collaborator Author

I agree totally with you. And I have tried my very best - except the caching idea.

@rsoika

rsoika commented Jul 4, 2023

Copy link
Copy Markdown
Collaborator Author

Hi @svanteschubert,
I now started to implement a new version with a cleaner usage of the Iterator interface as you suggested.
So I also stumbled now over this static SelectionManager. And I removed it now in my first approach completely.

I archived my first goal and the next() and hasNext() methods now behave exactly like defined in the Iterator javadoc including throwing a NoSuchElementException if no more elements exits and the user calls next() again.

So this looks now really much better!

But now: The Selection class have a lot of such pasteAtFrontOf, pasteAtEndOf, cut,... methods. In our today's call I understood you that there are already a 'change'-API to manipulate nodes. If so, than this could be a good moment to remove duplicate code?
My problem is, that the implementation of these methods in the search.Selection Class expect a lot of the old behaviour and this would now take a lot of time to tweak these methods in a way to work correctly on the new Iterator Interface.

So what should I do? Should I go through all the junit tests and tweak this Selection methods? Or do you have a maybe better API I could use here? If so maybe we got much better junit test out of this work...

@svanteschubert

Copy link
Copy Markdown
Contributor

Hi Ralph,

I might look into this earliest in the late afternoon, therefore I will answer now briefly:

Hi @svanteschubert, I now started to implement a new version with a cleaner usage of the Iterator interface as you suggested. So I also stumbled now over this static SelectionManager. And I removed it now in my first approach completely.

I archived my first goal and the next() and hasNext() methods now behave exactly like defined in the Iterator javadoc including throwing a NoSuchElementException if no more elements exits and the user calls next() again.

So this looks now really much better!

These are indeed good news!!
I think the approach to cut out SelectionManager is an approach that I would have tried as well...

But now: The Selection class have a lot of such pasteAtFrontOf, pasteAtEndOf, cut,... methods. In our today's call I understood you that there are already a 'change'-API to manipulate nodes. If so, than this could be a good moment to remove duplicate code? My problem is, that the implementation of these methods in the search.Selection Class expect a lot of the old behaviour and this would now take a lot of time to tweak these methods in a way to work correctly on the new Iterator Interface.

So what should I do? Should I go through all the junit tests and tweak this Selection methods? Or do you have a maybe better API I could use here? If so maybe we got much better junit test out of this work...

The scenario of the existing change code is that a list of ODF changes (delivered in JSON) is being applied to the ODF document. The change position is defined by an integer list, similar to an XPath counting the user entities (but not the XML).
And this is exactly the problem, here in the concept of user-entitiy positions is unknown and just the element with the text being handled (usually text:p and text:h).
We might return the start and end position of the found text and apply text changes from the existing API.
Let me dive deeper into it (hopefully) tomorrow afternoon.

Thanks so far for your help, Ralph. Very much appreciated!
Good night, Svante

@rsoika

rsoika commented Jul 5, 2023

Copy link
Copy Markdown
Collaborator Author

Hi, @svanteschubert
I am still working on the new implementation, but I committed my draft just to give you an impression what I try to archive.
I have written a new Junit Test method just testing the behaviour of the Iterator interface in different scenarios

See here:

https://github.com/rsoika/odftoolkit/blob/master/odfdom/src/test/java/org/odftoolkit/odfdom/incubator/search/TextNavigationTest.java#L171

Please do not peek to much into my code now - it is work in progress.....

So now, only the junit test that are manipulating the elements are now broken. It would be great if we could switch to your change API.
Can you give me an simple example how you replace part of a text in an element with your API?

@rsoika

rsoika commented Jul 5, 2023

Copy link
Copy Markdown
Collaborator Author

Hi, @svanteschubert once again I want to share my thoughts (but please don't feel pressured by me now and read it when you find time):

I finished the TextNavigation implementation. The class now implements the Iterator interface in a elegant way and it works like a charm. I am very happy with this. You can search for regex-patterns and navigate through the document.

For my own project this is all what I expect from the navigator. The Selection class - which is the object class returned by the navigator - provides the Container Element and even the position within the text where the pattern was found.

So far everything is super fine.

BUT: The Selection object also provides a lot of methods to cut and paste text before and after the selection. All these methods (and this is a very lot of code) depend on the very strange behaviour of the initial code - including this static SelectionManager which was only necessary for all these modify-methods - not for the navigation.

I - and this is my personal opinion - would not expect that a Navigator API provides me methods to modifiy the document. I am sure it is easy enough with the existing API to change the document content when iterating through the document. At least I have done this in our own project by simply modify the content of an Container Element returned by the Navigator. (I did not even notice that the TextSelection provides a method replaceWith(String) )

A lot of the junit tests dealing with the modification of the current selection, are currently broken. This is because these tests depend on the modify-methods provided by the TextSeleciton object and the SelectionManager which I have disabled for now.

I think it should be not to difficulty to re-implement these tests with the core functionality and not using the replace/paste/delete/... methods on the TextSelection class.
Of course I think the Junit Tests are all fine, because they demonstrate how to use the navigation API to modify the content of a document. And I understood you in such a way that you also don't think that the modifiy-methods in the serach-package are ideal?

So my question is: Do you think it is worth to re-implement the junit tests and later remove the modify-methods from the Navigator API?

@rsoika

rsoika commented Jul 6, 2023

Copy link
Copy Markdown
Collaborator Author

Hi, @svanteschubert
I finished now things as far as I was able.

The Iterator Interface is now correctly implemented for both Navigation implementations. I cleaned up the code as good as I was able to.

I inserted the SelectionManager now as a member variable for the TextNavigation and TextStyleNavigation. (No more static class). The SelectionManager is only needed for the modify methods as I mentioned before. But it is very important.

After diving deeper and deeper into the code, I understood that the modify methods are needed inside the Navigation for use cases in which you whant to replace parts of a paragraph including the correct formatting. Most of junit tests are dealing with style sheets during modification. The method TextSelectionTest.testReplacewithAndBold() is a good example of what the idea behind is. The test not only replaces a text, but also applys a style to each new replacement. Internally this means new span-tags are created. Of course you know all this much much better than I do. And it takes me now hours and hours to get through all this.

Anyway - now all Unit Test are fine - except the tests dealing with the methods pasteAtFrontOf and pasteAtEndOf. These tests are disabled because the broke! I tried my very best but I cant fix this as the complexity it to height for me.

The test TextSelectionTest.testPasteAtFrontOfFirst() demonstrates what is expected. It selects the crazy formatted text pattern 'change' and insert it before the text pattern 'delete':

image

As you can see, this test is working fine. So things are not totally bad. But if the parent element hat more occurrences of the text pattern (like in the test method refreshAfterPasteAtFrontOf), than something goes wrong. The root of the problem seems to be the Method refreshAfterPasteAtFrontOf of the SelectionManager.

I was not able to fix the problem and have now given up a bit frustrated. It would be great if you would still accept my changes so far and we could track the PasteAtFront and PasteAtEnd methods as a separate issue.

What do you think?

@svanteschubert

Copy link
Copy Markdown
Contributor

@rsoika Thank you so far, I am curious on what you changed and on the problems you mentioned.
I have no problem to split off a certain problem - it will be far better than earlier.
Is it okay to postpone it to next week?
As I had and have no capacities this week to do the review, but quite certain that I can do it in the first 3 days next working week.
Perhaps I will try to reuse the existing functionality to change content from the ODF Element base class, see https://github.com/tdf/odftoolkit/blob/master/odfdom/src/main/java/org/odftoolkit/odfdom/pkg/OdfElement.java

@rsoika

rsoika commented Jul 7, 2023

Copy link
Copy Markdown
Collaborator Author

@svanteschubert no problem. When you find time next week that's fine. We can also do a short call next week to synchronize and sort out confusions ;-)

@svanteschubert

Copy link
Copy Markdown
Contributor

Hi Ralph, I reviewed and found the issue, had to create a new PR with my added commits on top:
#240
We continue in #240

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants