Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move from com.google.refine to org.openrefine, for #226 #335

Open
wants to merge 3 commits into
base: 4.0-point-1
Choose a base branch
from

Conversation

wetneb
Copy link
Owner

@wetneb wetneb commented Sep 24, 2023

This is the second PR in an attempt to break my work on 4.0 into reviewable chunks.

This PR moves from the com.google.refine to the org.openrefine namespace, for OpenRefine#2269.

Copy link

@tfmorris tfmorris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a cursory review and transferred over some of the comments from my PR. My biggest questions/concerns:

  • mapping of legacy package/class names (it seemed broken to me)
  • loading of legacy projects which have hardcoded classnames serialized
  • being careful not to change legacy test data with over aggressive global replace

Comment on lines 35 to 43
import org.openrefine.extension.database.model.DatabaseColumn;
import org.openrefine.extension.database.model.DatabaseQueryInfo;
import org.openrefine.extension.database.model.DatabaseRow;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import com.google.refine.extension.database.model.DatabaseColumn;
import com.google.refine.extension.database.model.DatabaseQueryInfo;
import com.google.refine.extension.database.model.DatabaseRow;
import com.google.refine.importers.TabularImportingParserBase.TableDataReader;
import com.google.refine.importing.ImportingJob;
import org.openrefine.importers.TabularImportingParserBase.TableDataReader;
import org.openrefine.importing.ImportingJob;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This import ordering seems rather ad hoc. I'd suggest, as I have in the past, agreeing on a target import ordering which can then be used as part of this refactoring.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be happy enforcing a certain import order, as I mentioned here: https://forum.openrefine.org/t/java-import-ordering/846
Do you know if our current linter can be configured for that, or if we should introduce a new one?
That being said, it's not clear to me at which point in the git history this linting should be done.

@@ -5,7 +5,7 @@ importPackage(org.openrefine.wikidata.commands);
* Function invoked to initialize the extension.
*/
function init() {
var RefineServlet = Packages.com.google.refine.RefineServlet;
var RefineServlet = Packages.org.openrefine.RefineServlet;
RefineServlet.registerClassMapping(
"org.openrefine.wikidata.operations.SaveWikibaseSchemaOperation$WikibaseSchemaChange",

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the purpose of this identity mapping? If it actually does something, an explanatory comment would be helpful, otherwise it should be removed.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I remember correctly it's a workaround for a rather obscure class loading problem. I'd be happy to add a comment at this stage already but given that Change classes are dropped later on, it's a problem that gets solved at this point.

@@ -48,14 +48,16 @@ SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.openrefine.ProjectManager;
import org.openrefine.RefineModel;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RefineModel seems to be from a separate refactoring. I can't find where the package namespace class remapping is done any more, but I'm guessing perhaps it moved there?

It didn't appear to work in my experimentation and I also discovered that there are hardcoded classnames (!!) embedded in the serialized workspace data which caused workspace corruption for me, so this is an area that I'd want to inspect very carefully to make sure we have a robust solution in place.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, RefineModel is introduced in the previous step (#334) to separate out some functionality (class loading, version information…) from the servlet. The idea being that OpenRefine's operations (and other core logic, such as facets, clustering, importers, exporters…) are exposed in an independent Maven module (or-workflow) which can be used outside of the context of a Butterfly module (which does the work of defining the web API to interact with this functionality).

Comment on lines 45 to 47
registerClassMapping("com.metaweb.refine.*", "org.openrefine.*");
registerClassMapping("com.google.gridworks.*", "org.openrefine.*");

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These can probably be removed.

registerClassMapping("com.google.gridworks.*", "com.google.refine.*");
registerClassMapping("com.metaweb.refine.*", "org.openrefine.*");
registerClassMapping("com.google.gridworks.*", "org.openrefine.*");
registerClassMapping("com.google.refine.*", "org.openrefine.*");

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this work? In my version it couldn't read serialized JSON data with embedded com.google.refine classnames embedded in it.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't looked into the issue closely given that I gave up on backwards compatibility with previous workspaces (see #335 (comment)). It's very well possible that somehow this class mapping mechanism has been broken by something else in the past and we just did not notice because it's not actually relied on unless someone is upgrading from a 10-year old version.
I could look into fixing it but I am not sure it's worth the effort, I'd rather make sure we don't use class names in workspace serialization in the new architecture (which should not be hard to reach).

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a commit which fixes this class name mapping functionality: ef5f47b


public class TopListTests {
@Test
public void serializeTopList() throws JsonParseException, JsonMappingException, IOException {
String json = "{"
+ "\"class\":\"com.google.refine.preference.TopList\","
+ "\"class\":\"org.openrefine.preference.TopList\","

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be left with the original name to test loading legacy data. For newly written projects we need to make sure they don't include hardcoded classnames.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated tests so that they point to an ostensibly old class name (com.google.gridworks) to make it clear that we're testing for backwards compatibility in ef5f47b

@wetneb
Copy link
Owner Author

wetneb commented Oct 11, 2023

Thanks for the review!

Indeed, I did not attempt to maintain compatibility with previous workspaces at all. Although that could indeed be done at this stage, I did not judge it being worth the effort given that the 4.0 version is meant to use a different workspace directory than the previous versions. That is because project serialization and architecture changes are too big to bridge in a reliable way.

I did implement later on an importer to read OpenRefine 3.x projects in 4.0, but it does so by discarding the history of the project (only importing its current state). By using different workspaces we are making sure that old projects don't get corrupted by the new version and can still be opened with the old one.

I agree that it's a bit unfortunate that class names are stored in the project serialization. The new architecture addresses this problem later on, when I removed the notion of Change entirely (whose functionality is served by Operation classes). But class names are still used in the preference store, that's something that could be worth cleaning up too. And one could want to import the preference store from the old workspace into the new one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants