New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DT] A couple of fixes discussed earlier with @apipkin and @lsmith #946

Merged
merged 8 commits into from Sep 26, 2013

Conversation

Projects
None yet
5 participants
@Satyam
Copy link
Contributor

Satyam commented Jul 1, 2013

I delayed this fix because I was much into row editing but I had the TODO for them pending so I put them both together here.

See the individual commit messages below.

Satyam added some commits Jul 1, 2013

Fixes getColumn when column is named (but no key)
Added a check for column name besides column key.
Also, added an optimization so that if there is a `host` configured (as
it is usually the case in actual usage, not testing), it can get the
value from `_columnMap` which is much faster.
Fixed an error in deep cloning column info.
The arguments to the search into the array of `known` objects were
reversed.  Cloning of objects with circular references could go forever.
Also, used the newly added `getColumn` method in `body.js` to complete
the functionality of `getColumn` so that columns can be fetched by cell
reference.
@Satyam

This comment has been minimized.

Copy link
Contributor

Satyam commented Jul 2, 2013

I am afraid the situation is quite worst than I thought and will need further study.
There are too many separate properties in the internal copy of the columns definitions that are generated and then used inconsistently in many places.

A column can have a name, a key given by the user. The key is the field name within the record and is often used as the main identifier. Columns that don't have an associated piece of data don't need a key and can have a name, such as headers for nested columns (that have a children property with further columns definitions, or calculated columns, those that have a formatter that produces the cell contents from who knows where.

Since most columns have associated data, when the name is not available, the key is used instead. Thus, the expression var key = col.name || col.key is found quite often.

However, when the columns definitions are cloned to the internal copy, two further properties are added: One is the id which, if not provided by the developer, it will be generated via Y.stamp. The other is _id which is generated internally from the name or the field or the key or the id in that order of preference, but then it is sanitized by replacing any whitespace by hyphens and adding an ordinal number if the resulting identifier is duplicate.

This last _id (the one with the leading underscore, which is different from the one without) is used as the index into the _columnMap hash, along with the key unless they are the same. A single column might have two entries into _columnMap if it has a name or id besides a key. It can also have two entries if it has not name or id but its key happens to have whitespaces.

Since this _columnMap hash is the fastest way to get a column by key, it is used quite often but it will fail miserably for columns with a name that contains whitespace since the index is the _id, not the name.

Also, in several places, the cell is located using the CSS className of the cell which is derived from the column _id, but it is done using the name or the key and it usually works, but may fail if the name has whitespace or, if there is not name, the key has whitespace.

Besides this, there is always the conflict that the name property of one column cannot match the key property of another column which would also cause conflict.

I'm still to find out what the id property is used for. Besides being used to generate the _id if there is not name or key, I haven't seen it used elsewhere.

So, there are too many ways to refer to a colun and they are not used consistently all across and, being poorly documented, later developers, as @apipkin or myself, can (and do) get caught. The _id which is so important (being the index into the _columnMap hash and used to locate the cells by className) cannot be easily found from the name or the key which is often passed as an argument to so many methods. Thus, if either has spaces, many functions will fail.

I haven't looked into all of it, I might still find out what is id (the one without underscode) used for (besides adding further confusion). The expression col.name || col.key is not always there and, even if it were, it might not be enough. Duplicate names, keys or ids amongst different columns might produce _id values with an ordinal added to it and there is no easy way of reproducing it since the algorithm that produced it depends on the order in which the columns were traversed. Modules like body.js don't have the whole set of columns, only the ones that get data displayed (nested column headers are unknown to it) so it would not be able to reproduce it.

So, just so you know, it is quite a mess!

@Satyam Satyam closed this Jul 3, 2013

@Satyam Satyam reopened this Jul 3, 2013

@Satyam

This comment has been minimized.

Copy link
Contributor

Satyam commented Jul 3, 2013

Sorry, I meant to close a draft comment I was writing and closed the pull request instead. Sorry as well for the long comment above, but I wanted to put together all my thoughts.

This are some changes I suggest:

  • Stop using col._id
  • In _setColumns ** make sure name gets a value: col.name = col.name || col.key || col.id.
  • Check all the code for instances of key = col.name || col.key since now col.name will always have a value.
  • Use col.name without any sanitizing as the main key to all internal references since there will always be one. Basically the problem is that _id got sanitized while name would not. In sanitizing, duplicates could arise (a b and a-b resulted in the same _id) which would then have an ordinal added to it, which would then result in an unpredictable identifier hardly related to its source. With proper sanitation (dots, hash signs, underscores are not replaced) the odds for duplication would increase.
  • Add Y.log warning if any col.name matches any col.key
  • Generate CSS classNames out of a properly sanitized col.name when required. Do not add ordinals to avoid duplicates, it shouldn't affect the internals and if it does affect the styles, then it will be quite obvious. Take care of cleaning up all invalid characters.
  • start deprecation for field, id and _id, they are mostly potential sources of duplicate identifiers of questionable use.
  • Add a fake DataTable.Column class full of API doc comments, not a real class in code, just the API Docs and drop Appendix A from the user guide. It just feels worng not to have the Column documented along the rest of the API

I'll start with those changes and see how they work out in the tests, but I'll love to hear your comments while I do so.
Thanks

@Satyam

This comment has been minimized.

Copy link
Contributor

Satyam commented Jul 4, 2013

While doing the changes above I have found a bunch of inconsistencies in the use of all those identifiers which, no matter how to deal with the points above, will need to be addressed. Cells are located by CSS className selectors based on name, key or even id which only work basically because all the tests include key and hardly ever any of the other and because such keys don't have whitespace and thus the sanitized version is the same as the original. This happens most often in modules added or changed later which speaks of the lack of clarity in the code. (this certainly includes code I added which is the reason I started this revision). It is not so much as that they are not documented (columns don't have an API entry but are documented in an appendix in the user guide) but that the code itself is misleading by using and abusing the identifier key to refer to different things which, in some conditions might not have the same value. Sometimes it is truly the key, but sometimes it is the sanitized _id which is derived from the name, if one exists, and not from the key.

They can be fixed without going through deprecating some of all those sources for identifiers: field, name, key, id and, if all fails, a plain Y.stamp(), but it would certainly be easier and clearer for the future, if all those get reduced to something manageable.

As I went through all that code, I found myself wondering, whether the section views (head, body and foot) will ever be run independently. They all take a reference to a host (the DataTable instance itself) but don't count on it. Most unit tests are run on instances of these sections on their own, not as part of the DataTable. All of them contain extra code to cope with having no host. For example, all three have exact copies (fortunately they actually are indeed the same) of a getClassName method which call on the host getClassName method, if there is a host or manage on their own. There are many cases of code asking for host and doing their best to cope. This is not just wasted code (and duplicated like getClassName) but it has the funny effect that sometimes the unit tests don't test the actual working code (where there is a host) and instead test the branch that copes with the no-host situation, which is never actually used in real life.

Satyam added some commits Jul 6, 2013

Fixed url and returned false from nodeFormatter
A URL was missing the #
Also, the examples of `nodeFormatter` were not returning false
as the documentation insists they should.
Added documentation-only DataTable.Column class
In order to have some place to put column definitions,
I decided to create a fake class DataTable.Column which
doesn't really have any code but holds the definition
of the properties in the columns definitions.

The column properties have been spread over the files that
actually use them, though the ones in `core.js` should be
distributed into others (like `formatter` in BodyView or
`label` in HeaderView) but they are all listed as belonging
to the base in the user guide.
@Satyam

This comment has been minimized.

Copy link
Contributor

Satyam commented Jul 6, 2013

I scaled back on the fixes to code (which I'm still eager to discuss) and submitted only the suggested documentation changes.

I added a fake DataTable.Column class which doesn't actually have any code at all, only API doc entries to document the column configuration information.

I believe this lack of a single place to document such an important set of configuration information to be the cause of a lot of errors in the code due to misunderstanding about what is what. All these properties were documented, somehow, but in the user guide or in API doc entries of the methods using those properties, not in a centralized and cross-linked fashion.

With this added information, I feel like much of those long descriptions at the top of several of the classes can go away as much of it is in the User Guide and the Columns, which were perhaps the main missing part, now do have their entries.

Those long Class descriptions are a little annoying as you have to scroll down several pages to get to whatever it is you are looking for. I'll be happy to check if anything in these descriptions is missing from other places such as the Users Guide or API docs and reduce them to the bare minimum.

Satyam added some commits Jul 7, 2013

Added a couple of Y.log statements
Not a big deal since it won't show in the raw or minified version
but if there are duplicate keys, it will log a warning in the
-debug version
Changed `columns` into `displayCols`
To make it clearly different from the whole tree of column defs,
I called this flat array `displayCols` but only in the local variables,
the public interfaces are the same.
Fixed the API docs of `refreshRow` which incorrectly stated that the
third argument wasan array of columns while it really is an array of
names.  Also changed the way to look for the columns since the
className might not be built out of the `key`
@apipkin

This comment has been minimized.

Copy link
Contributor

apipkin commented Jul 8, 2013

@Satyam Yeah I'm totally feeling the notion of the headers being too large although I do not see those changes in this pull request. I have often felt DataTable was way too large for just one page in the user guides and forces the need to put extra documentation in the API docs.

I'll be checking this in today and testing it out and hopefully get it into dev-3.x.

@Satyam

This comment has been minimized.

Copy link
Contributor

Satyam commented Jul 8, 2013

@apipkin The headers are there indeed, I asked for comments on whether anybody else felt the headers could be stripped. Actually, many of the comments above are suggestions for comments. Very few of them did I translate into actual code since a few of them might risk breaking backward compatibility, though I can hardly imagine anyone using them.

One item that I would really like everyone's opinion on is whether to keep the separate section views as standalone, with conditionals asking whether there is a host or not. The tests run those section views without a host (and they are the only ones doing so) so that the tests do not test the branches that will actually work in real life. Plus lots of repetition. Three getClassName methods just to be able to run those views standalone are three too many.

@apipkin

This comment has been minimized.

Copy link
Contributor

apipkin commented Jul 8, 2013

I think there is a lot to be said for the idea of a view being independent of the host. However, there is a lot of code in the view that doesn't exist or work at all without one it seems. At the very least, it seems to add a lot of overhead and complexity. My opinion is that the views should be swappable, but the host should not be. I think if we remove this type of conditional contract in the views, the complexity of the views could go way down. My fear is that this will cause an issue with backwards compatibility.

@lsmith what is your thoughts on this? What was your plan with this design pattern originally?

@lsmith

This comment has been minimized.

Copy link
Contributor

lsmith commented Jul 8, 2013

I'm going to ramble a bit here, so bear with me.

I was just learning how to use Views when building it. I built it wrong. There should only be one view for the entire table, if any at all (versus having the rendering code directly in the widget). The view, if it exists, should be responsible for rendering the configuration state into the DOM and firing events in response to user interaction. The widget (base) should subscribe to those events and modify attribute state, which the view would propagate back to the DOM. The challenging part is justifying why there needs to two objects in the first place, since the table is essentially a view of a ModelList. Beyond the data ModelList, I can't think of any configuration that isn't view related. This opens the question: should it be just a widget, or just a view? The only reasons I can figure for the "just a view" argument are

  1. It provides a UI for a ModelList, and
  2. Widget was a rollup of the MVC pattern, which was later given proper separation in the App components, so Widget should be avoided for new components.

I don't know what the consensus is on the YUI team wrt #2, or if there are guidelines for the breakdown between widgets and views/models when it comes to building new components. This was an open issue when I left.

Ignoring any (possibly imagined) motivation to avoid widget in favor of model/view constructions, DataTable does seem a good candidate for Widget, since it uses the attributes that differentiate between Widget and Base.

What follows is a question of features. Should widget features be added via plugin, or is the Base.mix() approach I used in DT adequate or even preferable. I don't like subclassing because it doesn't allow a mix-and-match approach, which wi DT needs. I still like the Base.mix() approach, but it could well be that features coming in two parts (the view and the class extension that ties DT to the view) overcomplicates things. It is in keeping with the general pattern of Object-to-configure + object-to-render seen in DT proper. If that core distinction is ill conceived (there should be only a Widget or only a view), then it should follow that the features also don't split out views. However, if features are mixed in with Base.mix(), it seems odd to me that they would be mixed into a View rather than a Widget, but only because a feature like cell editing would need to also play the Controller role, which would be inappropriate in View logic. But where else would it go? And wouldn't the mixin and/or configuration/instantiation process then become burdensome?

I can see a world where a DT widget exists as well as a series of Node plugins to enable features for existing nodes, without the need for a complex table widget. I've often wondered how far I could get, starting from the node plugin angle instead. Mixing and matching features could quickly become a nightmare in this scenario, but maybe a compatibility matrix would be possible, with the DT widget allowing for full coexistence at the cost of complexity.

Now back to your question.

A standalone table View makes sense insofar as my above rant hasn't persuaded you that maybe there shouldn't be any Views in DT at all. I can't see a substantial value in having the thead and tbody Views standalone. The introduction of the table View was the first step in phasing them out. So I'm not terribly concerned about maintaining any perceived "architectural purity" when it comes to the sub-Views packaged with the core DT. Please feel free to improve the systems inside DT as you see fit.

@Satyam, sorry I haven't responded to your individual questions. There's a lot of code and architecture in DT to upload into my brain before giving a meaningful, and hopefully accurate response. I haven't done that because of other tasks that require my time and attention. And to be honest, at this point, you and @apipkin probably understand DT well enough that I'd be catching up with you to add my voice to the conversation, but little value.

@apipkin

This comment has been minimized.

Copy link
Contributor

apipkin commented Jul 8, 2013

@lsmith I think this is a very well thought post and I can say I agree with your position and as I feel Model + ModelList + View is the correct approach here as a base component. I think DT is a complex beast overall and there is a need for several positions around all of this. That being said, a view approach is clearly the best, especially when scrolling or resizing is added to it. I personally think there are some things that could be addressed a little better (clearer?) than the way it is currently set up.

For instance, right now there are three files that seems necessary for a single DT to be rendered: core, base, and table. I cannot for the life of me understand why this is in so many pieces and mixes in a View with a Widget which sounds like overkill to me.

There is also the idea of the templates living in separate files, which I addressed in the Paginator addition to DataTable. Paginator also took on the role of having a Model + View + Controller where the Controller was Base.mix()ed into DT. Once I got around the initial "curve" of this approach, it felt really natural to me and felt as though it was the appropriate way to build.

new topic piece?

I think there is a limitation to the current structure of DT as it is a very flat architecture living under the full umbrella of YUI. There should be a directory for templates, models, views, and controllers (similar to https://gist.github.com/apipkin/5953144?). They should be perceived as individual files and should be able to be swapped out or added to as needed. The only problem with this thought is a third introduction to a new architecture of DT and may be frowned upon or leave people with a very bad taste in their mouth. I think there is a lot of good in the way DT is currently built, but as we pile more and more on it, there seems to be some weakness in it. It seems as though any update or addition renders the topic of "this broke that, so we have to update that to make this work, but if we do, this other thing wont work any more" and maybe a compatibility matrix is the best solution.

Now bear in mind that I am by no means proposing a solution to the issue at hand, nor am I saying we start from scratch. But there is room for improvement, and there is possibly a way to reach the speed that I would like to see and promote as much flexibility as it currently has or even add more to it.

@triptych

This comment has been minimized.

Copy link
Contributor

triptych commented Aug 29, 2013

So @lsmith @apipkin @Satyam would you consider this PR still a work in progress, mainly a discussion PR, or is this something that's simply waiting for reviewer approval to get pulled in?

{ key: 'MfgvaPrtNum', label: 'Part Number' }
@property label
@@type String

This comment has been minimized.

@lsmith

lsmith Sep 22, 2013

Contributor

Typo @@, and should be @type {HTML}

/**
Assigns the value `<th abbr="HERE">`.
{

This comment has been minimized.

@lsmith

lsmith Sep 22, 2013

Contributor

Should these be in @example blocks or triple-backtick blocks instead or indented?

This comment has been minimized.

@rgrove

rgrove Sep 23, 2013

Contributor

@example probably (though it's not required), but triple backticks are only supported in GitHub-flavored Markdown. Indentation is the right thing to do for normal Markdown.

This comment has been minimized.

@lsmith

lsmith Sep 23, 2013

Contributor

Good to know, @rgrove. Thanks.

{ label: 'Name', children: [
{ key: 'firstName', label: 'First`},
{ key: 'lastName', label: 'Last`},

This comment has been minimized.

@lsmith

lsmith Sep 22, 2013

Contributor

trailing comma

}
@property headerTemplate
@type HTML template

This comment has been minimized.

@lsmith

lsmith Sep 22, 2013

Contributor

Can just be @type HTML

[Y.DataTable.BodyView.Formatters](DataTable.BodyView.Formatters.html).
If one such formatting function exists, it will be used.
If not named formatted function is found, it will be assumed to be a template

This comment has been minimized.

@lsmith

lsmith Sep 22, 2013

Contributor

If no such named formatter is found...

* __record__ The Model for this row.
* __column__ The column configuration object.
* __rowIndex__ The index of the current Model in the ModelList.
_Typically_ corelates to the row index as well.

This comment has been minimized.

@lsmith

lsmith Sep 22, 2013

Contributor

Typo correlates

}
@property emptyCellValue
@type String

This comment has been minimized.

@lsmith

lsmith Sep 22, 2013

Contributor

@type {String|HTML}, depending on the setting of allowHTML. Same with formatter results, and it's probably worth noting that there as well.

@for DataTable
*/
/**
Format specificacion for columns using the

This comment has been minimized.

@lsmith

lsmith Sep 22, 2013

Contributor

Typo specification

@for DataTable.Column
*/
/**
Format specificacion for columns using the

This comment has been minimized.

@lsmith

lsmith Sep 22, 2013

Contributor

Ditto typo

two record Models to compare, and the third being a boolean
`true` if the sort order should be descending.
The function should return `-1` to sort `a` above `b`, `-1`

This comment has been minimized.

@lsmith

lsmith Sep 22, 2013

Contributor

Typo, you say to return -1 for both.

attribute for that.
@property sortDir
@type Integer

This comment has been minimized.

@lsmith

lsmith Sep 22, 2013

Contributor

@type {Number}

@lsmith

This comment has been minimized.

Copy link
Contributor

lsmith commented Sep 22, 2013

@Satyam I love this PR. The added docs are fantastic and the minor code updates definitely help readability. A big 👍. @apipkin Can you merge this in? @Satyam, do you want to correct the few typos I nitpicked beforehand?

As to the code/architectural discussion above, let's move that to an issue rather than let it hold up this PR.

@Satyam

This comment has been minimized.

Copy link
Contributor

Satyam commented Sep 23, 2013

@lsmith Thanks for your comments. As mentioned elsewhere, I am not going to touch any of these many pending PRs.

@apipkin

This comment has been minimized.

Copy link
Contributor

apipkin commented Sep 25, 2013

@lsmith @Satyam I'll try to get this pulled down and updated and get it merged in. Thanks for the 👍 @lsmith

@lsmith

This comment has been minimized.

Copy link
Contributor

lsmith commented Sep 25, 2013

Awesome! Thanks @apipkin.

@apipkin apipkin merged commit a6016fb into yui:dev-3.x Sep 26, 2013

1 check passed

default The Travis CI build passed
Details

@ghost ghost assigned apipkin Oct 8, 2013

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment