App. H: <text address> Byte Order #67

Open
2 of 6 tasks
tajmone opened this issue Mar 1, 2020 · 9 comments
Labels:
  • 📋 Roody Notes: From @roodyyogurt annotated PDF of Hugo Book
  • 🕑 pending approval: Issue requires approval by Ken Tessman
  • 🕑 pending decision: Issue requires decisions by maintainers
  • 💀 code problems: Problems with code examples or syntax definitions.
  • 💀 text problems: Text problems, typos or obsolete contents.

@tajmone
Owner

tajmone commented Mar 1, 2020

  • Get @tessman approval for these changes.
  • Fix document source.
  • Document the changes in CHANGES.md.
  • Document in ChangeLog?
  • Update commented annotation in source file.

Regarding the following passage from App. H: Code Patterns, on the textdata# token ($47):

47 <text address>

Where <text address> is three bytes (in lowest-to-highest byte order) giving the address of the entry in the text bank.

@juhana commented in @roodyyogurt's annotated PDF:

Actually the byte order is 3-1-2 with the highest byte first and the lowest byte second, e.g. address 65 43 21 is given as 65 21 43.

  • Work out if and how the original text should be modified in view of the above comment.
@tajmone tajmone added 🕑 pending decision Issue requires decisions by maintainers 🕑 pending approval Issue requires approval by Ken Tessman 📋 Roody Notes From @roodyyogurt annotated PDF of Hugo Book labels Mar 1, 2020
@tajmone tajmone added this to the text fixes milestone Mar 1, 2020
@tajmone tajmone pinned this issue Mar 8, 2020
@tajmone
Owner Author

tajmone commented Mar 8, 2020

Actually the byte order is 3-1-2 with the highest byte first and the lowest byte second, e.g. address 65 43 21 is given as 65 21 43.

I'm not quite convinced about this argument.

Having briefly worked with the Hugo Engine sources, I remember quite clearly that all addresses are stored in the .HEX story file as two bytes (low byte first, which is how PeekWord() below reads them), so there's no third byte involved here.

Maybe the above comment regards how the Hugo Engine interprets the stored value to calculate the actual offset (or "real address") — a third value (address_scale) intervenes in memory, as an offset multiplier, and there are also the Table offsets (memory segments) to account for, in some contexts.

It's not clear to me which context the above is referring to when it mentions handling addresses via three bytes. The Hugo Engine uses various formulas to calculate addresses, depending on the context and whether the story file is loaded in a memory buffer or read from disk on demand.

For example, in the Hugo heheader.h source header (LL 1211), PeekWord() is defined as:

HUGO_INLINE unsigned int PeekWord(long a)
   { return (unsigned char)MEM(defseg*16L+a) + (unsigned char)MEM(defseg*16L+a+1)*256; }

So, the third byte mentioned in the comment might refer to the memory segment and its scale factor (MEM(defseg*16L+a), where 16L is the equivalent of the address_scale factor).
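
To make the byte order of PeekWord() concrete, here is a stand-alone sketch that mimics it over a plain buffer. The names `peek_word` and `mem`, and passing `defseg` as a parameter, are stand-ins chosen for this illustration; the engine's own MEM() macro and defseg global are not reproduced here.

```c
#include <stdint.h>

/* Hypothetical stand-in for PeekWord(): `mem` is a plain buffer holding
   the story data, `defseg` is the segment value that PeekWord()
   multiplies by 16, and `a` is the address within that segment. */
static unsigned int peek_word(const uint8_t *mem, long defseg, long a)
{
    long base = defseg * 16L + a;

    /* The byte at `base` is the low byte and the byte at `base + 1`
       is the high byte, exactly as in the PeekWord() expression. */
    return (unsigned int)mem[base] + (unsigned int)mem[base + 1] * 256u;
}
```

With the bytes 34 12 stored at position 18 of the buffer, `peek_word(mem, 1, 2)` yields 0x1234, so the word is read low byte first.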

In the Hugo he.c source file (LL 53) we read:

/* address_scale refers to the factor by which addresses are multiplied to
   get the "real" address.  In this way, a 16-bit integer can reference
   64K * 16 = 1024K of memory.
*/
int address_scale = 16;
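
The arithmetic in that comment can be sketched as follows. This is only an illustration of the scaling idea, assuming a stored 16-bit value is simply multiplied by the scale; `real_address` and `ADDRESS_SCALE` are hypothetical names, and the engine applies further context-dependent offsets that are not modelled here.

```c
/* Assumption: mirrors `int address_scale = 16;` from he.c. */
static const long ADDRESS_SCALE = 16;

/* Scale a stored 16-bit address to the "real" address it refers to. */
static long real_address(unsigned int stored)
{
    return (long)(stored & 0xFFFFu) * ADDRESS_SCALE;
}
```

The largest stored value, 0xFFFF, maps to 0xFFFF0, which is where the "64K * 16 = 1024K" figure comes from.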

Regarding table offsets, in footnote 64 of the Hugo Book we read:

64. Table offsets are equal to the offset of the beginning of the table from the start of data, divided by 16.

But the above passage from App. H seems to refer to how Hugo code is stored in the .HEX file, not to how it's interpreted by the Engine for practical purposes. Appendix H clearly states in the opening paragraph that:

What follows is a detailed breakdown of how the set of valid tokens in Hugo is encoded and read within compiled code.

So, in that context, the definition of <dictionary entry> seems correct to me — it doesn't need to speculate on how the memory address is converted to a real offset (especially since this might vary depending on whether the whole .HEX file is read into memory or from file).

Any further speculation on how "real addresses" are calculated might depend on the architecture on which the Engine is running (its endianness, pointer size, etc.).

@tessman, any thoughts on this?

@roodyyogurt: you mentioned in a private message that this note was pointed out to you by a third party. Any chance of asking him/her for clarification?

@juhana

juhana commented Mar 8, 2020

Hi, the note came from me originally. This issue is about token 47 (text), not token 46. The manual states:

47 <text address>

Where <text address> is three bytes (in lowest-to-highest byte order) giving the address of the entry in the text bank.

"Three bytes (in lowest-to-highest byte order)" suggests a little-endian three-byte value, but it's either a mixed-endian 3-1-2 three-byte value, or one offset byte + a two-byte value (which is the same thing in practice.) In either case the documentation could be improved in this part, I remember this being a source of confusion when I was implementing tools that parsed .hex files.

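The two readings described above give the same number, which a short C sketch can show. It assumes the bytes in the example (65 21 43) are hexadecimal and that the first byte carries the top eight bits; `decode_text_address` is a hypothetical name, not a function from the Hugo sources.

```c
#include <stdint.h>

/* Decode the three bytes that follow token $47.
   b[0] = highest byte (the "offset" byte)
   b[1] = lowest byte
   b[2] = middle byte
   i.e. one offset byte followed by a low-byte-first two-byte value. */
static uint32_t decode_text_address(const uint8_t b[3])
{
    uint32_t word = (uint32_t)b[1] | ((uint32_t)b[2] << 8);

    return ((uint32_t)b[0] << 16) | word;
}
```

Fed the bytes 65 21 43, this returns 0x654321, matching the example in the original note.
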
@tajmone tajmone changed the title App. H: <dictionary entry> Byte Order App. H: <text address> Byte Order Mar 8, 2020
@tajmone
Owner Author

tajmone commented Mar 8, 2020

@juhana:

This issue is about token 47 (text), not token 46.

Now it makes perfect sense (and makes me wonder why I didn't consider the surrounding tokens in the first place).

I've amended the Issue accordingly.

the note came from me originally.

On Roody Yogurt's advice, I've mentioned you and Nikos Chantziaras in the Acknowledgements section in relation to your contribution to his PDF notes.

the documentation could be improved in this part, I remember this being a source of confusion when I was implementing tools that parsed .hex files.

I totally agree. I'm also working on a proof-of-concept Hugo terp and am struggling with various offsets and segments, and have to resort to studying the C sources to get a clearer picture.

Addition of new contents is currently being discussed in Issue #65. As a general rule, the goal of this project is limited to preserving the original text, with the exception of minor corrections. But I'm strongly in favour of adding similar clarifications to the existing text (that is, with @tessman's approval) if these improve the usability of the original manual.

@tajmone
Owner Author

tajmone commented Mar 8, 2020

Token $47 in Book Examples

It's worth noting here that in §21.1. Before, After, and Other Complex Properties token $47 appears twice in the break-down analysis of the compiled example:

00004C: 47 00 00 00

The <textdata#> label is followed by three bytes giving the address in the text bank of the printed string "You pick up the object."

and

000059: 47 00 19 00 0D 00 00

The second line of text is printed here, followed by $0D to signal the end of this block of code and zero-padding to the next address boundary.
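
Applying the 3-1-2 byte order described earlier in this thread to the two lines quoted above gives text bank addresses 0 and $19 respectively. The sketch below reads a textdata# token from a code buffer under that assumption; `read_textdata` and `TOKEN_TEXTDATA` are hypothetical names used only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define TOKEN_TEXTDATA 0x47  /* token value from App. H; macro name is made up */

/* If code[pos] is the textdata# token and three more bytes are available,
   store the decoded text bank address in *addr and return the number of
   bytes consumed (4). Otherwise return 0 and leave *addr untouched. */
static size_t read_textdata(const uint8_t *code, size_t len, size_t pos,
                            uint32_t *addr)
{
    if (pos + 4 > len || code[pos] != TOKEN_TEXTDATA)
        return 0;

    *addr = ((uint32_t)code[pos + 1] << 16)   /* highest byte comes first */
          | ((uint32_t)code[pos + 3] << 8)    /* middle byte comes last   */
          |  (uint32_t)code[pos + 2];         /* lowest byte is second    */
    return 4;
}
```

For the line at 000059 this consumes the four bytes 47 00 19 00 and leaves the reader positioned at $0D, the end-of-block marker mentioned in the book's analysis.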

Possibly, the same clarifications might apply here (in the first occurrence), either via an admonition note (or tip) block, or via addition of a footnote (the former would make the text easier to understand for the reader, even though the central topic of this section is not about how text strings are handled).

Either way, it seems worth adding a cross reference to the textdata# token definition in App. H, as well as a cross reference to the above example in the definition itself (this requires adding two new custom anchor IDs to allow link targeting).

@tajmone tajmone added 💀 code problems Problems with code examples or syntax definitions. 💀 text problems Text problems, typos or obsolete contents. labels Mar 8, 2020
@juhana

juhana commented Mar 8, 2020

I suggest two options:

47 <offset> <text address>

Where <text address> is two bytes (in lowest-to-highest byte order) giving the address of the entry in the text bank at <offset> (one byte).

(Additions in bold text.) This would be a larger change but a more accurate description.

The other option is to remove the parenthetical:

47 <text address>

Where <text address> is three bytes giving the address of the entry in the text bank.

This would be a less disruptive change. It wouldn't tell how the address is constructed, but it also wouldn't give misleading information.

@tajmone
Owner Author

tajmone commented Mar 8, 2020

I like your first proposal, for it provides more insight into Hugo inner workings (since Book II is for advanced users, providing such info seems preferable to withholding it).

Also, I don't think it would be a disruptive change (not more than some text tweaks that made it through to v1.0.0), for it's still within the scope and intention of the original book. If I've understood correctly some sparse notes I've come across, at some point Kent suffered a hard disk crash which caused the loss of his working copy of the Hugo Book, which somehow affected the ongoing work on the last due release (i.e. track was lost of what was already edited and what was pending). Probably that was the reason for these discrepancies (which are few in number, since the book is very well polished in general).

I've personally experienced the loss of my working copy of documents in Word format, and know how hard it can be to resume work when you're not sure what survived and what was lost of your planned changes. Without version controlled sources, it's very hard to track changes between document versions when some files are lost for good.

Let's see what @tessman decides in this regard.

@tessman
Collaborator

tessman commented Mar 9, 2020

Sorry to be late to follow up on this. @juhana is right about the mixed byte-order. If I remember right, it was an extension to the original two-byte address, until a game exceeded that with a large amount of text, so the solution was to increase the total possible amount of text by adding another address component. Unfortunately this was neither documented nor commented particularly clearly.

(I do still have most/all material from Hugo's active development days; it's just been sort of piled into one corner to await my going through it.)

tajmone added a commit that referenced this issue Mar 9, 2020
Update the definition of the `textdata#` token ($47) to mirror current
Hugo behaviour (See #67).
@tajmone
Owner Author

tajmone commented Mar 9, 2020

(I do still have most/all material from Hugo's active development days; it's just been sort of piled into one corner to await my going through it.)

Good to know that it wasn't lost then. It would be nice to have access to it.

Definition Amended!

OK, I've now also amended the definition of the textdata# token to:

47 <offset> <text address>

Where <text address> is two bytes (in lowest-to-highest byte order) giving the address of the entry in the text bank at <offset> (one byte).
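
For tool authors going the other way (writing rather than parsing), the amended definition can be sketched as an encoder. This assumes that <offset> holds the bits of the address above the low 16; `encode_text_address` is a hypothetical helper, not part of the Hugo compiler sources.

```c
#include <stdint.h>

/* Split a text bank address into the three bytes that follow token $47:
   out[0] = <offset> (one byte),
   out[1], out[2] = <text address> (two bytes, lowest-to-highest order). */
static void encode_text_address(uint32_t addr, uint8_t out[3])
{
    out[0] = (uint8_t)((addr >> 16) & 0xFF);  /* <offset>                 */
    out[1] = (uint8_t)(addr & 0xFF);          /* low byte of the address  */
    out[2] = (uint8_t)((addr >> 8) & 0xFF);   /* high byte of the address */
}
```

Encoding the address 0x654321 produces the bytes 65 21 43, the same sequence given in the note that started this issue.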

I think that this type of text fix still qualifies as one of the "legitimate significant changes" that have so far been considered acceptable under the definition:

By "significant changes" we mean those changes which would be worth mentioning in an Errata.

This is exactly the type of fix that (in the case of a printed book) the author/editor would supply in an Errata document. It's significant because it affects what the reader learns, and it's worth fixing because with digital docs it's cheap to do so and it fulfils the original book's goals.

I've also documented all the changes so far, but marked them in the source comments as "pending approval".

@tajmone
Owner Author

tajmone commented Mar 9, 2020

Cross References Proposal

Regarding the above mentioned examples of Token 47 in the book, I propose the following:

  1. In the definition of textdata# in App. H, add immediately after its definition a TIP admonition cross referencing the example in §21.1. Before, After, and Other Complex Properties:

    For a practical example of textdata# in Hugo compiled code, see Sec. §21.1.

    pointing to a custom anchor ID in §21.1, where the example is found.

  2. In §21.1, where the following example is found:

    00004C: 47 00 00 00
    

    The <textdata#> label is followed by three bytes giving the address in the text bank of the printed string "You pick up the object."

    Make the text "<textdata#> label" into a link pointing to a custom anchor ID where the definition of textdata# in App. H is located.

IMO, these small additions could improve the reading/studying experience, saving the end user the trouble of having to sift through the whole book (which is rather big). Being just a link and an aside, these changes should qualify as minor changes in line with the original book goals.


3 participants