Support adding/replacing MERGEFIELDs #170

0xdevalias · 2018-03-29T01:22:57Z

I'm trying to understand if/how 'MERGEFIELDS' are supported within gooxml, or if it is the kind of thing I would need to drop into .X() to handle?

I did see that there are doc.FormFields(), r.AddField(), etc functions, but as best I could tell, these didn't seem to do what I want. I also came across the 'KnownFields', which seems to correlate with this, but couldn't tell if it was associated to some deeper support/code:

https://github.com/baliance/gooxml/blob/master/document/knownfields.go

Essentially, is there a way to create, read, edit/update, etc these elements in a gooxml native way currently? And if not, do you have any suggestions of the best way to interact with them?

Below is a snippet from a document that uses these fields:

<w:p w14:paraId="1566BC4D" w14:textId="3B6A9F12" w:rsidR="006D368D" w:rsidRPr="00497636" w:rsidRDefault="000E0283">
        <w:pPr>
            <w:rPr>
                <w:lang w:val="en-AU"/>
            </w:rPr>
        </w:pPr>
        <w:r>
            <w:rPr>
                <w:lang w:val="en-AU"/>
            </w:rPr>
            <w:t>Merge Field:</w:t>
        </w:r>
        <w:r w:rsidR="006D368D">
            <w:rPr>
                <w:lang w:val="en-AU"/>
            </w:rPr>
            <w:t xml:space="preserve">
            </w:t>
        </w:r>
        <w:r w:rsidRPr="00497636">
            <w:rPr>
                <w:lang w:val="en-AU"/>
            </w:rPr>
            <w:fldChar w:fldCharType="begin"/>
        </w:r>
        <w:r w:rsidRPr="00497636">
            <w:rPr>
                <w:lang w:val="en-AU"/>
            </w:rPr>
            <w:instrText xml:space="preserve"> MERGEFIELD  $Foo.Bar  \* MERGEFORMAT </w:instrText>
        </w:r>
        <w:r w:rsidRPr="00497636">
            <w:rPr>
                <w:lang w:val="en-AU"/>
            </w:rPr>
            <w:fldChar w:fldCharType="separate"/>
        </w:r>
        <w:r w:rsidRPr="00497636">
            <w:rPr>
                <w:lang w:val="en-AU"/>
            </w:rPr>
            <w:t>«$Foo.Bar»</w:t>
        </w:r>
        <w:r w:rsidRPr="00497636">
            <w:rPr>
                <w:lang w:val="en-AU"/>
            </w:rPr>
            <w:fldChar w:fldCharType="end"/>
        </w:r>
    </w:p>

Refs:


tbaliance · 2018-03-29T02:57:33Z

_examples/document/header-footer/main.go has example usage of AddField, which just calls:

func (r Run) AddFieldWithFormatting(code string, fmt string)

This may come close:

run.AddFieldWithFormatting("MERGEFIELD", `$Foo.Bar  \* MERGEFORMAT`)

but it won't emit the 'separate' fldChar. If you look at AddFieldWithFormatting, you can see how the field is built, so you can probably start from that function and knock this out in a few minutes.

If you get it working, paste your code back here and I'll clean it up and try to figure out a generic API around it to check in (or feel free to send a PR as well).

0xdevalias · 2018-03-29T02:59:57Z

Thanks for the pointers :) Shall have a bit of a play and hopefully have something to paste back here.

0xdevalias · 2018-04-03T06:15:51Z

I figure I'll add some context/PoC code here as I go, in case it helps others in future that want to explore adding something. To start off, I wanted to understand how to basically create the equivalent structure of the merge field run that my original document contained:

PoC Go Code

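import (
	"fmt"
	"log"

	"baliance.com/gooxml/document"
	"baliance.com/gooxml/schema/soo/wml"
)

func test_PoC_AppendMergeFieldRun() {
	outName := "PoC_AppendMergeFieldRun.docx"

	d := document.New()
	p := d.AddParagraph()
	PoC_AppendMergeFieldRun(&p, "$Foo.Bar")
	// SaveToFile returns an error; don't report success if the write failed.
	if err := d.SaveToFile(outName); err != nil {
		log.Fatalf("error saving %s: %v", outName, err)
	}

	log.Println("Written file to: ", outName)
}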

func PoC_AppendMergeFieldRun(p *document.Paragraph, fieldName string) *document.Paragraph {
	// Helpers
	fldCharBegin := &wml.CT_FldChar{FldCharTypeAttr: wml.ST_FldCharTypeBegin}
	fldCharSeparate := &wml.CT_FldChar{FldCharTypeAttr: wml.ST_FldCharTypeSeparate}
	fldCharEnd := &wml.CT_FldChar{FldCharTypeAttr: wml.ST_FldCharTypeEnd}

	preserve := "preserve"

	ricFldChar := func(fc *wml.CT_FldChar) *wml.EG_RunInnerContent {
		return &wml.EG_RunInnerContent{FldChar: fc}
	}

	mergeField := func(fieldName string) *wml.EG_RunInnerContent {
		instrText := wml.NewCT_Text()

		instrText.SpaceAttr = &preserve
		instrText.Content = fmt.Sprintf(` MERGEFIELD  %s  \* MERGEFORMAT `, fieldName)
		// TODO: This format can have different options aside from MERGEFORMAT..

		return &wml.EG_RunInnerContent{InstrText: instrText}
	}

	appendRunInnerContent := func(r *document.Run, c *wml.EG_RunInnerContent) {
		r.X().EG_RunInnerContent = append(r.X().EG_RunInnerContent, c)
	}

	// Start the run

	// <w:t>Merge Field:</w:t>
	r1 := p.AddRun()
	r1.AddText("Merge Field:")

	// <w:t xml:space="preserve">
	//            </w:t>
	r2 := p.AddRun()
	//r2.AddText("")
	ps := wml.NewCT_Text()
	ps.SpaceAttr = &preserve
	ps.Content = "\n"
	appendRunInnerContent(&r2, &wml.EG_RunInnerContent{T: ps})

	// <w:fldChar w:fldCharType="begin"/>
	r3 := p.AddRun()
	appendRunInnerContent(&r3, ricFldChar(fldCharBegin))

	// <w:instrText xml:space="preserve"> MERGEFIELD  $Foo.Bar  \* MERGEFORMAT </w:instrText>
	r4 := p.AddRun()
	appendRunInnerContent(&r4, mergeField(fieldName))

	// <w:fldChar w:fldCharType="separate"/>
	r5 := p.AddRun()
	appendRunInnerContent(&r5, ricFldChar(fldCharSeparate))

	// <w:t>«$Foo.Bar»</w:t>
	r6 := p.AddRun()
	r6.AddText(fmt.Sprintf("«%s»", fieldName))

	// <w:fldChar w:fldCharType="end"/>
	r7 := p.AddRun()
	appendRunInnerContent(&r7, ricFldChar(fldCharEnd))

	return p
}

This resulted in the following structure in my produced .docx:

Output Paragraph XML Structure

..snip..
<w:body>
    <w:p>
        <w:r>
            <w:t>Merge Field:</w:t>
        </w:r>
        <w:r>
            <w:t xml:space="preserve">
            </w:t>
        </w:r>
        <w:r>
            <w:fldChar w:fldCharType="begin"/>
        </w:r>
        <w:r>
            <w:instrText xml:space="preserve"> MERGEFIELD  $Foo.Bar  \* MERGEFORMAT </w:instrText>
        </w:r>
        <w:r>
            <w:fldChar w:fldCharType="separate"/>
        </w:r>
        <w:r>
            <w:t>«$Foo.Bar»</w:t>
        </w:r>
        <w:r>
            <w:fldChar w:fldCharType="end"/>
        </w:r>
    </w:p>
</w:body>
</w:document>

Obviously the full run as implemented here wouldn't be required for a 'create merge field' helper function, since it also includes the static text before the field. While the naive merge field would be easy to do, merge fields can support all sorts of weird and wonderful caveats, features, nesting, etc, so I'm not sure it would be worth the effort to try and figure out a powerful 'general' pattern. Though maybe we can just support the basic use case.

Now that I more or less understand how to put it in (minus a few bits that seemed probably irrelevant for my needs) my next step from here is to go backwards, and figure out how to parse this out of an existing document, so I can replace the MergeField with some static text. I think I understand the components, just a matter of playing around/implementing it.

So my basic approach will likely be:

- For a given paragraph
  - For each run
    - Get the EG_RunInnerContent slice
    - For each element
      - If element is a FldChar begin, then collect it in a 'field' slice and set 'insideBegin' flag
      - If we're not insideBegin, append the element to our 'normal' slice, go to next element
      - If we are insideBegin, append to our 'field' slice, go to next element
      - If we are insideBegin, and the element is an InstrText and a MERGEFIELD, take note of its fieldName
      - If we are insideBegin and the element is a FldChar end, decide if we are trying to replace the captured fieldName
        - Yes: clear the 'field' slice, add a new CT_Text with our replacement text to the 'normal' slice
        - No: append the collected 'field' slice elements to the 'normal' slice
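The steps above can be sketched as a small state machine. This is only an illustration under simplifying assumptions: `kind`, `element`, `mergeFieldName` and `replaceFields` are hypothetical stand-ins (not gooxml types or functions), the elements are assumed to be already flattened into one slice, and nested fields are not handled.

```go
package main

import (
	"fmt"
	"strings"
)

// kind and element are simplified, hypothetical stand-ins for the
// wml.EG_RunInnerContent variants discussed above; they are not gooxml types.
type kind int

const (
	kindText kind = iota
	kindFldBegin
	kindFldSeparate
	kindFldEnd
	kindInstrText
)

type element struct {
	kind    kind
	content string
}

// mergeFieldName extracts the name from an instruction such as
// " MERGEFIELD  $Foo.Bar  \* MERGEFORMAT ".
func mergeFieldName(instr string) (string, bool) {
	parts := strings.Fields(instr)
	if len(parts) >= 2 && parts[0] == "MERGEFIELD" {
		return parts[1], true
	}
	return "", false
}

// replaceFields copies elements outside a field straight to the 'normal'
// slice and buffers elements inside a field in the 'field' slice. At the end
// marker the buffer is either swapped for the replacement text or flushed
// unchanged.
func replaceFields(in []element, replacements map[string]string) []element {
	var normal, field []element
	inside := false
	name := ""
	for _, el := range in {
		if !inside {
			if el.kind == kindFldBegin {
				inside, name = true, ""
				field = []element{el}
			} else {
				normal = append(normal, el)
			}
			continue
		}
		field = append(field, el)
		switch el.kind {
		case kindInstrText:
			if n, ok := mergeFieldName(el.content); ok {
				name = n
			}
		case kindFldEnd:
			if r, ok := replacements[name]; ok && name != "" {
				normal = append(normal, element{kindText, r})
			} else {
				normal = append(normal, field...)
			}
			inside, field = false, nil
		}
	}
	// A field that never reached its end marker is kept untouched.
	return append(normal, field...)
}

func main() {
	in := []element{
		{kindText, "Merge Field:"},
		{kindFldBegin, ""},
		{kindInstrText, ` MERGEFIELD  $Foo.Bar  \* MERGEFORMAT `},
		{kindFldSeparate, ""},
		{kindText, "«$Foo.Bar»"},
		{kindFldEnd, ""},
	}
	for _, el := range replaceFields(in, map[string]string{"$Foo.Bar": "REPLACEMENT!"}) {
		fmt.Printf("%d %q\n", el.kind, el.content)
	}
}
```

Buffering the whole field until the end marker means an unknown field name leaves the original elements intact, at the cost of dropping the field's own run formatting when a replacement is made.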

That's the basic naive approach I'm thinking of. There are almost certainly some potential issues/caveats that will need to be addressed, such as:

- Ensuring any formatting/properties are appropriately collected/copied over somehow
- Potentially handling nested fields
- Etc
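For the nested-field caveat, one possible approach (an assumption on my part, not something gooxml provides) is to track nesting depth, so that an inner begin/end pair does not close the outer field. In the sketch below, `marker` and `topLevelFields` are hypothetical names, and the input is assumed to be a flattened sequence of field markers.

```go
package main

import "fmt"

// marker is a hypothetical, simplified view of one run-content element:
// either a field begin marker, a field end marker, or anything else.
type marker int

const (
	other marker = iota
	begin
	end
)

// topLevelFields returns the [start, end] index pairs of the outermost
// fields. A depth counter makes nested begin/end pairs part of the
// enclosing field instead of terminating it early.
func topLevelFields(ms []marker) [][2]int {
	var spans [][2]int
	depth, start := 0, 0
	for i, m := range ms {
		switch m {
		case begin:
			if depth == 0 {
				start = i
			}
			depth++
		case end:
			if depth == 0 {
				continue // stray end marker with no matching begin; ignore it
			}
			depth--
			if depth == 0 {
				spans = append(spans, [2]int{start, i})
			}
		}
	}
	return spans
}

func main() {
	// One outer field (indexes 1..7) containing a nested field (3..5).
	ms := []marker{other, begin, other, begin, other, end, other, end, other}
	fmt.Println(topLevelFields(ms))
}
```

Each returned span could then be handed to the replacement logic as a single unit, leaving the decision about what to do with the inner field to the caller.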

For reference, I'm sort of looking to support (or understand how hard it would be to implement) similar functionality to https://github.com/opensagres/xdocreport

Will likely keep looking into this tomorrow.

0xdevalias · 2018-04-03T23:51:26Z

Building on what we have above, here is some sample code that will display some of the basics of the relevant tags for a given paragraph:

func test_PoC_ExtractParagraphMergeFields() {
	d := document.New()
	p := d.AddParagraph()
	PoC_AppendMergeFieldRun(&p, "$Foo.Bar")
	PoC_ExtractParagraphMergeFields(&p)
}

func PoC_ExtractParagraphMergeFields(p *document.Paragraph) {
	for _, run := range p.Runs() {
		log.Println("Next run, innerContentLen: ", len(run.X().EG_RunInnerContent))
		for _, innerContent := range run.X().EG_RunInnerContent {
			switch {
			case innerContent.FldChar != nil && innerContent.FldChar.FldCharTypeAttr == wml.ST_FldCharTypeBegin:
				log.Println("Found FldChar Begin")
			case innerContent.FldChar != nil && innerContent.FldChar.FldCharTypeAttr == wml.ST_FldCharTypeEnd:
				log.Println("Found FldChar End")
			case innerContent.InstrText != nil && strings.Contains(innerContent.InstrText.Content, "MERGEFIELD"):
				log.Println("Found MERGEFIELD: ", innerContent.InstrText.Content)
			}
		}
	}
}

This produces the following output:

⇒  go run *.go
2018/04/04 09:51:13 Next run, innerContentLen:  1
2018/04/04 09:51:13 Next run, innerContentLen:  1
2018/04/04 09:51:13 Next run, innerContentLen:  1
2018/04/04 09:51:13 Found FldChar Begin
2018/04/04 09:51:13 Next run, innerContentLen:  1
2018/04/04 09:51:13 Found MERGEFIELD:   MERGEFIELD  $Foo.Bar  \* MERGEFORMAT
2018/04/04 09:51:13 Next run, innerContentLen:  1
2018/04/04 09:51:13 Next run, innerContentLen:  1
2018/04/04 09:51:13 Next run, innerContentLen:  1
2018/04/04 09:51:13 Found FldChar End

It looks like my original theory will have to be modified slightly, as the elements are all split over a number of runs, with a single element in each.

It's also worth noting that according to the 'Complex Fields' section of http://officeopenxml.com/WPfields.php:

Complex fields are used when multiple runs are necessary due to differences in formatting. They can span multiple paragraphs or runs.

0xdevalias · 2018-04-04T00:45:15Z

Ok, so this is a rather naive implementation, and may not account for all of the potential intricacies/edge cases.. but it works in this most basic of test cases:

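// Renamed so it doesn't clash with the earlier test_PoC_ExtractParagraphMergeFields.
func test_PoC_ReplaceParagraphMergeFields() {
	outName := "PoC_ExtractParagraphMergeFields.docx"

	replacements := map[string]string{
		"$foo.bar": "REPLACEMENT!",
	}

	d := document.New()
	p := d.AddParagraph()
	PoC_AppendMergeFieldRun(&p, "$foo.bar")
	PoC_ReplaceParagraphMergeFields(&p, replacements)
	// SaveToFile returns an error; don't report success if the write failed.
	if err := d.SaveToFile(outName); err != nil {
		log.Fatalf("error saving %s: %v", outName, err)
	}

	log.Println("Written file to: ", outName)
}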

func PoC_ReplaceParagraphMergeFields(p *document.Paragraph, replacements map[string]string) {
	var insideComplexField = false
	var hitSeparate = false
	var mergeFieldName string

	regexMergeFieldName := regexp.MustCompile(`(?:MERGEFIELD\s*?)([^\s]+)`)

	for _, run := range p.Runs() {
		log.Printf(
			"Next run, innerContentLen(%v) insideComplexField(%v) hitSeparate(%v) mergeFieldName(%v)\n",
			len(run.X().EG_RunInnerContent),
			insideComplexField,
			hitSeparate,
			mergeFieldName)

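		if len(run.X().EG_RunInnerContent) == 0 {
			continue // skip empty runs rather than panicking on index 0
		}
		innerContent := run.X().EG_RunInnerContent[0] // TODO: Be less hacky, these runs seem to only have 1 inner element.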
		//for _, innerContent := range run.X().EG_RunInnerContent {
		switch {
		case innerContent.FldChar != nil && innerContent.FldChar.FldCharTypeAttr == wml.ST_FldCharTypeBegin:
			log.Println("Found FldChar Begin")
			insideComplexField = true
			hitSeparate = false
			p.RemoveRun(run)
		case innerContent.FldChar != nil && innerContent.FldChar.FldCharTypeAttr == wml.ST_FldCharTypeSeparate:
			log.Println("Found FldChar Separate")
			hitSeparate = true
			p.RemoveRun(run)
		case innerContent.FldChar != nil && innerContent.FldChar.FldCharTypeAttr == wml.ST_FldCharTypeEnd:
			log.Println("Found FldChar End")
			insideComplexField = false
			p.RemoveRun(run)
		case innerContent.InstrText != nil && strings.Contains(innerContent.InstrText.Content, "MERGEFIELD"):
			log.Println("Found MERGEFIELD: ", innerContent.InstrText.Content)
			mergeFieldName = regexMergeFieldName.FindStringSubmatch(innerContent.InstrText.Content)[1]
			p.RemoveRun(run)
		case hitSeparate && innerContent.T != nil:
			if strings.Contains(innerContent.T.Content, mergeFieldName) { // TODO: Not sure it actually has to match this to be valid..?
				if replacement, ok := replacements[mergeFieldName]; ok {
					log.Printf("Replacing mergefield '%s' with content: %s\n", mergeFieldName, replacement)
					innerContent.T.Content = replacement
				} else {
					log.Println("Couldn't find a replacement for our mergefield.. skipping:", mergeFieldName)
				}
			} else {
				log.Println("Text doesn't seem to match our mergefield.. skipping:", innerContent.T.Content)
			}
		case insideComplexField:
			log.Printf("Inside Complex Field, Unhandled case, removing run.. %+v", innerContent)
			p.RemoveRun(run)
		}
	}
}

In my little test run, this maintains any formatting applied to the run, since we only update its 'inner text' rather than replacing the run entirely. This code also doesn't account for the nuances of 'MERGEFORMAT' and similar options; it always keeps the existing format.
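As a starting point for handling those options, the instruction text could be split into a field name and its switches before deciding what to do with it. The sketch below uses only the standard library; `mergeField`, `tokenize` and `parseInstr` are hypothetical names, and the quoting rule (double quotes group words into one token) is a simplifying assumption rather than a full field-code grammar.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// mergeField is a hypothetical holder for a parsed MERGEFIELD instruction.
type mergeField struct {
	Name     string
	Switches []string // raw tokens after the name, e.g. `\*`, "MERGEFORMAT"
}

// tokenize splits on whitespace, treating double-quoted text as one token
// (the quotes themselves are dropped).
func tokenize(s string) []string {
	var toks []string
	var cur strings.Builder
	inQuote, has := false, false
	for _, r := range s {
		switch {
		case r == '"':
			inQuote = !inQuote
			has = true // an empty quoted string still counts as a token
		case unicode.IsSpace(r) && !inQuote:
			if has {
				toks = append(toks, cur.String())
				cur.Reset()
				has = false
			}
		default:
			cur.WriteRune(r)
			has = true
		}
	}
	if has {
		toks = append(toks, cur.String())
	}
	return toks
}

// parseInstr reports ok=false for anything that is not a MERGEFIELD
// instruction with a name.
func parseInstr(instr string) (mergeField, bool) {
	toks := tokenize(instr)
	if len(toks) < 2 || toks[0] != "MERGEFIELD" {
		return mergeField{}, false
	}
	return mergeField{Name: toks[1], Switches: toks[2:]}, true
}

func main() {
	f, ok := parseInstr(` MERGEFIELD  $Foo.Bar  \* MERGEFORMAT `)
	fmt.Printf("%v %q %q\n", ok, f.Name, f.Switches)
}
```

Compared with indexing into a regex match directly, returning an `ok` flag avoids a panic when the instruction contains MERGEFIELD but no name.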

It would probably make more sense for replacements to be able to insert their own runs rather than just static text (possibly even 'pushed up another level' so they can insert their own paragraphs of runs to truly work 'properly'). I'd then imagine some helpers at the top level, so I can just say document.doMyMergeFields(replacements) and have the whole document cleanly handled, possibly in a similar way to how the current 'form fields' are handled?

At this stage I'm not sure I'll continue down this path (at least for the current project), as the overhead of implementing the full support is leaning me more towards the existing JVM-based solution. Though if this ends up landing in the main library in a nice-to-use way, I would definitely be interested in checking it out/seeing if it is fit for purpose.

0xdevalias · 2018-06-22T04:15:37Z

@tbaliance Curious if this is something you'd be interested in/have time to clean up/implement at all? It's probably the main/only blocker for me switching to this lib vs continuing with our legacy system built with opensagres/xdocreport (and all of its weird, strange intricacies).

tbaliance · 2018-06-22T11:43:09Z

@0xdevalias I'll take a look and see if I can come up with something.

tbaliance · 2018-07-01T00:40:25Z

@0xdevalias Can you attach a sample document to perform replacement on?


tbaliance · 2018-07-11T12:25:53Z

@0xdevalias Can you try out that branch and let me know if it works for you? It only supports replacing merge fields and doesn't handle everything, but it does handle things like \f, \b, \* Upper, etc.
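To make the switches mentioned above concrete, here is a rough sketch of how \b, \f and a \* Upper style format could be applied to a replacement value. It is not the code from the branch: `applySwitches` is a hypothetical helper, the switch tokens are assumed to be already split out of the instruction, and the semantics (the \b and \f text is only emitted when the field has a value) follow my understanding of Word's documented MERGEFIELD behaviour.

```go
package main

import (
	"fmt"
	"strings"
)

// applySwitches is a hypothetical helper. It expects tokens such as
// []string{`\b`, "Dear ", `\*`, "Upper"}, where each switch handled here is
// followed by its argument.
func applySwitches(value string, switches []string) string {
	before, after := "", ""
	for i := 0; i+1 < len(switches); i++ {
		arg := switches[i+1]
		switch switches[i] {
		case `\b`: // text to place before a non-blank value
			before = arg
			i++
		case `\f`: // text to place after a non-blank value
			after = arg
			i++
		case `\*`: // general format switch; only Upper/Lower are sketched
			switch strings.ToLower(arg) {
			case "upper":
				value = strings.ToUpper(value)
			case "lower":
				value = strings.ToLower(value)
			}
			i++
		}
	}
	if value == "" {
		return "" // a blank field gets neither the before nor the after text
	}
	return before + value + after
}

func main() {
	fmt.Println(applySwitches("world", []string{`\b`, "Hello ", `\f`, "!", `\*`, "Upper"}))
}
```

Unknown switches and `\* MERGEFORMAT` fall through unchanged here, which matches the "keep the existing format" behaviour of the earlier PoC.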

0xdevalias · 2018-07-14T00:36:03Z

@tbaliance Sorry for the slow replies.. been pretty busy of late. Added to my todo list to check out when I have a spare moment. Will let you know.

tbaliance · 2018-07-27T20:15:32Z

I'm going to merge the code in for now, feel free to open another issue if you run into problems with it.


0xdevalias · 2018-09-05T06:19:41Z

@tbaliance Thanks for that! I have finally got around to playing with this, sent an email (to info@) with some richer comments/feedback.

0xdevalias changed the title ~~Understanding MERGEFIELD support~~ Support adding/replacing MERGEFIELDs Apr 4, 2018

tbaliance added a commit that referenced this issue Jul 11, 2018

document: support for replacing mail merge fields

341f8e6

Fixes #170

tbaliance mentioned this issue Jul 27, 2018

document: support for replacing mail merge fields #193

Merged

tbaliance closed this as completed in #193 Jul 27, 2018

tbaliance added a commit that referenced this issue Jul 27, 2018

document: support for replacing mail merge fields

3e25a72

Fixes #170
