Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC Fluent 4.0 (Silverstripe 4.0 version) #261

Closed
32 of 40 tasks
tractorcow opened this issue Jul 20, 2017 · 31 comments
Closed
32 of 40 tasks

RFC Fluent 4.0 (Silverstripe 4.0 version) #261

tractorcow opened this issue Jul 20, 2017 · 31 comments

Comments

@tractorcow
Copy link
Collaborator

tractorcow commented Jul 20, 2017

This RFC covers a new major version of Fluent with an updated data model, designed to address some of the weaknesses of the original fluent ORM.

Support

In order to reduce maintenance burden, this new version will be specific to framework 4.x, and will not be back-ported to framework 3.x.

History

This major version update should address the following issues:

ORM

The new ORM for fluent will move from column based localisation to table based localisations. In keeping with the initial principles of the original ORM, fluent can be safely added to any existing site, and likewise can be safely removed without any migration needed, as these extra tables will only contain extra information, and the base tables will contain all information for the default locale.

E.g. tables for SiteTree will look like the below. In this example SiteTree.translated config will be Title, Content

SiteTree / SiteTree_Live Unique index: ID

  • ID
  • ClassName
  • Title
  • Content
  • NonLocalisedField

SiteTree_Versions Unique index: RecordID, Version

  • ID
  • RecordID
  • Version
  • ClassName
  • Title
  • Content
  • NonLocalisedField

SiteTree_Localised / SiteTree_Localised_Live Unique index: RecordID, Locale

  • ID
  • RecordID
  • Locale
  • ClassName
  • Title
  • Content

SiteTree_Localised_Versions Unique index: RecordID, Locale, Version

  • ID
  • RecordID
  • Version
  • Locale
  • ClassName
  • Title
  • Content

Note that for SiteTree_Localised and SiteTree_Live the ID is not the record ID, but rather a unique index that's ignorable by the application. All joins will be on SiteTree.ID = SiteTree_Localised.RecordID AND SiteTree_Localised.Locale = 'the_locale'.

A second major change for this version will be the behaviour of coalescing of localisations over default locale. Instead of defaulting on a per-column basis, the join will switch on the joined SiteTree_Localised.ID column. If a row exists in the SiteTree_Localised on any given join then all localised fields will be selected from that table, even if null.

We may provide the option to default on a per-field or a per-row basis (e.g. check SiteTree_Localised.RecordID IS NULL on the join vs SiteTree_Localised.Title IS NULL).

Default values

Defaults can be configurable on a per-locale basis; E.g. if DE defaults to NL, we would join twice on SiteTree, SiteTree_Localised on Locale = DE and SiteTree_Localised on Locale = NL.

It will be possible to have locales which don't fail over to the main default locale. E.g. CN locale perhaps should not default to a western locale.

Instead, the query builder will have an injectable FluentDefaultProvider service, which given a class, field, and locale, will return the default value the query should return.

E.g.

class i18nDefaultProvider implements FluentDefaultProvider
{
	public function provideDefault($class, $field, $locale) {
		$default = i18n::get_locale();
		try {
			i18n::set_locale($locale);
			return _t($class.'.'.$field.'_DEFAULT', '');
		} finally {
			i18n::set_locale($default);
		}
	}
}

Locale management

UI components will remain much the same as in fluent 3.x, but with the move away from per-column localisation, we may include a "locale" config admin which will allow locales to be created via a gridfield. Locales will no longer be set via config.

Versioning

Initial implementation may not include support for historic versioning of localisations. (i.e. the SiteTree_Localised_Versions table), but if not this will be built at a later date.

Fluent will implement the VersionableExtension interface to work with versioned.

Existing features to port across

These features will be ported without much change:

  • Routing
  • Templates
  • SEO / sitemap.xml
  • frontend (css / js)

Other architecture

This new version of fluent will move from static class to a fully injectable service to allow for better dependency injection.

Tasks

  • ORM
    • Implement table building
    • Migrate augmentSQL()
    • Localise WHERE fragment
    • Migrate FluentFilteredExtension
    • Migrate FluentMenuExtension (if applicable)
    • Ensure DataQuery params are set properly
    • Migrate augmentWrite()
    • Default fallback / nullability behaviour
      • Implement per-locale/row default behaviour
      • Implement default fallback behaviour service
      • Implement cascading locale default (configurable via CMS)
    • Versioning support
      • Versioned tables are built
      • _Localised_Versioned records are populated when a version is built
      • All versioned reading models (archived, version, stage, etc) all work with localisation
      • Restore to an earlier version restores the correct version of localisations
    • Permissions
      • Allow locale-specific group permissions (similar to translatable)
  • CMS
    • Fluent ModelAdmin for domains / locales
    • Update fluent locale selector dropdown
    • Add "publish all locales" CMS action
    • Add "Publish just this locale" CMS action
    • Add fluent icon to localised fields
  • Routing
    • Convert FluentRequestFilter to HTTPMiddleware
    • Migrate Fluent::is_frontend()
    • Update Director.rules with fluent routes
    • Implement prefix-omitted option for default locale
    • Migrate Fluent.domains feature
    • Migrate Fluent::detect_browser_locale to pluggable home redirect service
    • Migrate FluentRootURLController
  • SEO
    • Migrate GoogleSiteMap integration
    • Migrate meta tags
  • Support full text search of localised content
  • Templates / Helpers
  • Unit tests
  • Documentation
@tractorcow tractorcow added this to the 4.0.0 milestone Jul 20, 2017
@tractorcow
Copy link
Collaborator Author

This RFC is a work in progress, so I'm going to open the floor to anyone who has suggestions on what they would like to see in a new version of fluent. :)

@robbieaverill
Copy link
Contributor

Related to #156, would you be open to the idea of addressing #238 with this change as well?


A second major change for this version will be the behaviour of coalescing of localisations over default locale. Instead of defaulting on a per-column basis, the join will switch on the joined SiteTree_Localised.ID column. If a row exists in the SiteTree_Localised on any given join then all localised fields will be selected from that table, even if null.

So we'd be leaving it to fluent to decide where to use null fields and where to use base locale values instead, rather than coalescing all the things in the DB query?

we may include a "locale" config admin which will allow locales to be created via a gridfield. Locales will no longer be set via config

I like this idea. It would be nice to support config as well, perhaps as a sort of "populate the DB with anything in config" method, but wouldn't be a high priority.


Can we include ClassName in the index specs? I know that this would be configurable later on by devs, but just trying to think of things that could be considered early on to help prevent performance issues when say a website with 10,000 pages across 20 locales ends up with a SiteTree_Localisations table with 200,000 rows in it and is being queried for e.g. "a Page record with ID 123 and locale ab_CD".

Of course, the versioned table would be worse, but less impact on users since it's not queried as much. We could consider including extra columns in the unique index for this table too ([Record]ID + ClassName + VersionID + Locale for example).

Again though - users could always tweak indexes if they need to later.

I assume the Locale column would actually be LocaleID as a foreign key to a FluentLocales table (or similar)?

@tractorcow
Copy link
Collaborator Author

So we'd be leaving it to fluent to decide where to use null fields and where to use base locale values instead, rather than coalescing all the things in the DB query?

I just discussed this; I think we need an injectable default provider to address this.

@tractorcow
Copy link
Collaborator Author

Can we include ClassName in the index specs? I know that this would be configurable later on by devs, but just trying to think of things that could be considered early on to help prevent performance issues when say a website with 10,000 pages across 20 locales ends up with a SiteTree_Localisations table with 200,000 rows in it and is being queried for e.g. "a Page record with ID 123 and locale ab_CD".

Anything that's already a non-PK index in the original table will automatically be an index in the localised tables too.

@tractorcow
Copy link
Collaborator Author

I assume the Locale column would actually be LocaleID as a foreign key to a FluentLocales table (or similar)?

I might leave it as a locale string, since it would support temporary locales (and restoration of such).

@robbieaverill
Copy link
Contributor

Defaults can be configurable on a per-locale basis; E.g. if DE defaults to NL, we would join twice on SiteTree, SiteTree_Localised on Locale = DE and SiteTree_Localised on Locale = NL.

Oh this is nice! Do we allow for recursion in the case where the default is also null?

@tractorcow
Copy link
Collaborator Author

That's precisely what the FluentDefaultProvider is for.

@mmitranic
Copy link

First of all, thanks for this awesome module. I gotta ask, what's the reason for moving from column-based localisations to table-based localisations? Wasn't this the main differentiating factor from other modules such as Translatable and Multilingual?

@phptek
Copy link
Collaborator

phptek commented Jul 23, 2017

I find myself curious in the same vain as @mmitranic. I realise of course that issues such as the infamous #145 are real problems. I very much liked the column-based approach and it is partly what drew me to the module - it was very simple to reason about on a table by table basis.

@tractorcow
Copy link
Collaborator Author

First of all, thanks for this awesome module. I gotta ask, what's the reason for moving from column-based localisations to table-based localisations? Wasn't this the main differentiating factor from other modules such as Translatable and Multilingual?

The issue with the approach taken by Translatable is that it localises data in multiple SiteTree records; This creates a segmented sitetree per locale, which contributes to a fragmented site structure.

Fluent addresses this issue by inlining all localisation into the same row. However, this seems to have several limitations which make this module impractical for large sites.

The approach I am looking at now is a lot harder to implement, which is still a "one record per sitetree" model, but with localisations shifted out to another table.

image

The cost to the proposed approach is that it will be a bit more code and hard-work to make it work, especially with versioned, but 4.x has improved versioning support, and we have all learned a lot since I first wrote fluent. :) What I found difficult then is probably possible now given what I have learned.

@madmatt
Copy link

madmatt commented Jul 27, 2017

Regarding default values being provided on a locale-by-locale basis, would that mean that something like this is possible:

en_CA -> en_US -> en_GB -> en_NZ

Where $Title will first be looked for in the en_CA locale, then fallback to en_US, then fallback to en_GB, then the default being en_NZ - and if en_NZ doesn't have it, then you use the FluentDefaultProvider as the last resort? Or is the idea to encode this type of multi-fallback logic in an implementation of FluentDefaultProvider?

Would you see this being on a per-field basis, or a per-locale basis (e.g. if en_CA doesn't provide a value for $Title but does for $Content, will it take the $Title value from en_US, and the $Content value from en_CA?)

@robbieaverill
Copy link
Contributor

Yeah it would be a cascading fallback system. I think by locale makes most sense, doing it by field would probably need to be done in config to avoid cluttering up the UI, and I'm not sure when it would be useful?

@madmatt
Copy link

madmatt commented Jul 27, 2017

As part of this it seems like it would be easy to change the frontend a little to show the difference between locales that exist and those that don't (assuming that a record is only created in the SiteTree_Localised table when the first piece of content is added to that locale).

It might be nice to show in the dropdown in the top-left of the CMS two lists - "Existing" and "Create New" - with the locales separated by whether they currently exist or not.

This would make it easier to tell whether content had already been translated or not for a given object.

@robbieaverill
Copy link
Contributor

It might be nice to show in the dropdown in the top-left of the CMS two lists - "Existing" and "Create New" - with the locales separated by whether they currently exist or not.

@tractorcow this might be a nicer way of implementing the current locale indicator and the locale in use message we spoke about earlier on Slack?

@madmatt
Copy link

madmatt commented Jul 27, 2017

Yeah it would be a cascading fallback system. I think by locale makes most sense, doing it by field would probably need to be done in config to avoid cluttering up the UI, and I'm not sure when it would be useful?

My thinking is that if you only want to change one small piece of content for a given locale (e.g. just change $Title in en_AU, but leave all other content the same as the fallback of en_US)... I wouldn't want to see a largely-blank page with just $Title showing because it was the only field defined in en_AU.

Might be difficult to do unless fields were nullable though - it'd make for some gnarly CASE statements hehe

@robbieaverill
Copy link
Contributor

Or is the idea to encode this type of multi-fallback logic in an implementation of FluentDefaultProvider?

I think that the default provider would be an injectable service that would provide a fallback recursively as required, probably configured dynamically. Each locale would be injected with an implementation of that interface and the top level called to get the value if the current DataObject doesn't have a value (so should inherit), then cascading down.

@robbieaverill
Copy link
Contributor

Yeah it would be a cascading fallback system. I think by locale makes most sense, doing it by field would probably need to be done in config to avoid cluttering up the UI, and I'm not sure when it would be useful?
My thinking is that if you only want to change one small piece of content for a given locale (e.g. just change $Title in en_AU, but leave all other content the same as the fallback of en_US)... I wouldn't want to see a largely-blank page with just $Title showing because it was the only field defined in en_AU.

The way it works currently is that it shows the defaulted value in the CMS with a visual indicator for whether it's a default value or is changed for the current locale.

@tractorcow
Copy link
Collaborator Author

Regarding default values being provided on a locale-by-locale basis, would that mean that something like this is possible:
en_CA -> en_US -> en_GB -> en_NZ

Yes.

Where $Title will first be looked for in the en_CA locale, then fallback to en_US, then fallback to en_GB, then the default being en_NZ - and if en_NZ doesn't have it, then you use the FluentDefaultProvider as the last resort?

Yes

Or is the idea to encode this type of multi-fallback logic in an implementation of FluentDefaultProvider?

No

Would you see this being on a per-field basis, or a per-locale basis (e.g. if en_CA doesn't provide a value for $Title but does for $Content, will it take the $Title value from en_US, and the $Content value from en_CA?)

Neither, per row. I've considered per-field, but it lacks the ability to safely determine if the field should fail over or not. Now that we have a table_Localised sub-table we can use the RecordID on the join to determine if a localisation exists or not for any locale-record combination.

Originally I'd tasked Implement per-field default behaviour, but I've removed this now for the time being in favour of the per-row default behaviour (with provider fallback).

@tractorcow
Copy link
Collaborator Author

tractorcow commented Jul 27, 2017

E.g. if your rule for falback is de > nl > en (default)

SELECT SiteTree.ID, SiteTree.ClassName, 
CASE
    WHEN SiteTree_Localised_de_DE.ID IS NOT NULL then SiteTree_Localised_de_DE.Title
    WHEN SiteTree_Localised_nl_NL.ID IS NOT NULL then SiteTree_Localised_nl_NL.Title
    ELSE SiteTree.Title END AS Title
FROM SiteTree
LEFT JOIN SiteTree_Localised AS SiteTree_Localised_de_DE ON SiteTree.ID = SiteTree_Localised_de_DE.RecordID AND SiteTree_Localised_de_DE.Locale = 'de_DE'
LEFT JOIN SiteTree_Localised AS SiteTree_Localised_nl_NL ON SiteTree.ID = SiteTree_Localised_nl_NL.RecordID AND SiteTree_Localised_nl_NL.Locale = 'nl_NL'

if your fallback is de > nl (not default) then the default provider would be used in place of the ELSE above.

The default provider would give us:

$titleFallback = Injector::inst()->get(FluentDefaultProvider::class)-> provideDefault(SiteTree.class, 'Title', 'de_DE');

@robbieaverill
Copy link
Contributor

Nice. Doing it in one query is much better than multiple, thanks for correcting.

@madmatt
Copy link

madmatt commented Jul 27, 2017

Right, gotcha. Could I register my interest for a hook into the SQL augmentation then? :D

Sounds like a great system though - let me know if/when I can help out!

@tractorcow
Copy link
Collaborator Author

tractorcow commented Jul 27, 2017

Right, gotcha. Could I register my interest for a hook into the SQL augmentation then? :D

You mean an extension point? I can sure. I could even make it a DI service if you want.

@madmatt
Copy link

madmatt commented Jul 30, 2017

Yep exactly - I'll likely need field-level translations for an upcoming project. I'm (perhaps naïvely) thinking I can amend the CASE statement to the below, and ensure that if a field is not translated, a NUL is inserted into the appropriate column:

CASE
    WHEN SiteTree_Localised_de_DE.Title IS NOT NULL then SiteTree_Localised_de_DE.Title
    WHEN SiteTree_Localised_nl_NL.Title IS NOT NULL then SiteTree_Localised_nl_NL.Title
    ELSE SiteTree.Title END AS Title

@tractorcow
Copy link
Collaborator Author

tractorcow commented Jul 31, 2017

The problem is that empty strings are sometimes converted to null (a config setting), so you can't easily distinguish between "not translated" and "empty".

Also some fields are empty by default, rather than null, so you'd have the opposite problem.

This was an issue with the original design of fluent, hence the shift in behaviour.

@madmatt
Copy link

madmatt commented Jul 31, 2017

So if we ensured that all values were NULL (meaning "no translation exists for this field"), then empty strings and actual strings could work as expected? This would also work for has_one relationships (e.g. if de_DE's ImageID value is NULL, then take it from en_NZ instead).

Just considering my options here - is it a new Config setting in SS4 that sets what happens with null vs. empty values?

I'm also presuming here that the frontend could be modified in such a way that indicating null or '' values can be obvious - even if it's a bit painful e.g. a checkbox-per-translatable-field that you can tick to 'Inherit default translation' which would ensure the value is NULL or something similar.

Anyway, this is getting a little off-topic - the concept and plan in general gets a big 👍 from me :)

@tractorcow
Copy link
Collaborator Author

I'm proposing that in this new version there's no more a concept of a per-field localised state; It will be the entire row based on a record in the translations table. You could do mixed localisation by copying content to this new row, or you could force fallback by deleting the entire row.

@tractorcow
Copy link
Collaborator Author

We could also have a "Publish all locales" or "Publish just this locale" as new CMS features. :)

@tractorcow
Copy link
Collaborator Author

@madmatt have a look at tractorcow@2ba71c4

@tractorcow
Copy link
Collaborator Author

A bit more work done, updated status on the above tasks. ;0

@tractorcow
Copy link
Collaborator Author

4.0.0-alpha1 is about to get tagged :)

@robbieaverill
Copy link
Contributor

Closing - anything that was missed from this RFC can be raised as features from here on if people need them =)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants