
New bug in ApplyTemplate method after V2.1.0 released #445

Open
podolsky-v opened this issue Sep 27, 2022 · 4 comments

Comments

@podolsky-v

podolsky-v commented Sep 27, 2022

Hello,
My existing code stopped working after I updated the NuGet package from V2.0.0 to V2.3.0. The cause is that the ApplyTemplate( Stream templateStream, bool includeContent ) method now throws a System.InvalidOperationException when includeContent is set to false, because of this code:

```csharp
if( !includeContent )
{
  foreach( Paragraph paragraph in this.Paragraphs )
  {
    paragraph.Remove( false );
  }
}
```

In this part of the ApplyTemplate method, paragraph.Remove( false ) modifies the Paragraphs collection while the foreach loop is still enumerating it, so the loop's enumerator throws on its next step. This happens because V2.1.0 (commit 9c431afb) changed the implementation of the Paragraphs collection: it is now backed by an underlying List<Paragraph> _editableParagraphsCollection, which is modified when paragraph.Remove( false ) calls RemoveParagraphFromCache( paragraph ).
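The failure can be reproduced without DocX at all. The sketch below uses a plain List<string> as a hypothetical stand-in for the list behind Paragraphs (it is not the DocX type) and removes items inside a foreach, as ApplyTemplate does. List<T> detects the modification on the enumerator's next MoveNext() and throws InvalidOperationException, even if the list held only a single item, which matches the one empty paragraph in an empty Word document.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the list behind Paragraphs (plain strings, not DocX paragraphs).
var paragraphs = new List<string> { "p1", "p2", "p3" };

// Mirrors the failing pattern in ApplyTemplate: removing from the list being enumerated.
bool RemoveAllWhileEnumerating(List<string> items)
{
    try
    {
        foreach (var item in items)
        {
            items.Remove(item); // modifies the list while its enumerator is active
        }
        return false;
    }
    catch (InvalidOperationException)
    {
        // "Collection was modified; enumeration operation may not execute."
        return true;
    }
}

bool threw = RemoveAllWhileEnumerating(paragraphs);
Console.WriteLine($"threw: {threw}, remaining: {paragraphs.Count}"); // threw: True, remaining: 2
```

Only the first item is removed before the exception, which is why the template's paragraphs are not cleared either.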

The code to reproduce the bug is as simple as this:

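```csharp
using System.IO;
using Xceed.Words.NET;

using (var document = DocX.Create("test.docx"))
{
  using (var templateStream = new FileStream("template.docx", FileMode.Open, FileAccess.Read))
  {
    // Throws System.InvalidOperationException since V2.1.0 when includeContent is false.
    document.ApplyTemplate(templateStream, false);
  }
  document.Save();
}
```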

where template.docx is any .docx created in MS Word: even an empty document contains one empty paragraph. I'm not sure what the best way to fix this issue is.

@podolsky-v
Author

Also, there is an older bug in this method: while the Paragraphs collection is cleared, Tables and Images are left intact, so some of the template's content can end up in the resulting document even though ApplyTemplate is called with includeContent set to false. I worked around this bug by manually clearing these collections before applying the template.

@XceedBoucherS
Collaborator

Hello,
Thank you for your feedback on those 2 issues.

  1. InvalidOperationException on Paragraph.Remove().

An easy fix could be to iterate over a copy of Paragraphs, by replacing the foreach loop header with:

```csharp
foreach( Paragraph paragraph in this.Paragraphs.ToList() )
```

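The suggested fix works because ToList() (from System.Linq) copies the items into a new list, so the loop enumerates the copy while Remove() mutates only the original. A minimal sketch, again with a plain List<string> as a hypothetical stand-in for Paragraphs:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the Paragraphs collection.
var paragraphs = new List<string> { "p1", "p2", "p3" };

// ToList() snapshots the items into a new list; the loop enumerates the snapshot,
// so removing from the original no longer invalidates the enumerator.
foreach (var paragraph in paragraphs.ToList())
{
    paragraphs.Remove(paragraph);
}

Console.WriteLine(paragraphs.Count); // 0
```

The copy costs one extra list of references, which is negligible for the number of paragraphs in a typical template.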
  2. Paragraph removal keeps Tables and Images

In Container.RemoveParagraph(), the call to "paragraph.Xml.Remove()" should remove any images contained in the paragraph. Tables, on the other hand, are currently not removed: in the OOXML a table is not a child of its associated paragraph but a sibling located just after it, so removing the paragraph's XML leaves the table in place.

Here's what you could do in Container.RemoveParagraph() to remove the paragraph's tables:
```csharp
if( paragraph.FollowingTables != null )
{
  // Iterate over a copy, because FollowingTables is modified inside the loop.
  foreach( var table in paragraph.FollowingTables.ToList() )
  {
    paragraph.FollowingTables.Remove( table );

    table.Remove();
  }
}
```
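To see why tables survive, here is a minimal System.Xml.Linq sketch over a hand-written WordprocessingML body fragment (not DocX code). Removing a w:p element removes only its own descendants, so the w:tbl siblings that follow it have to be collected first and removed separately, which is what the FollowingTables loop does. Treating "following tables" as the run of w:tbl elements directly after the paragraph is an assumption about how DocX defines FollowingTables.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

XNamespace w = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

// Hand-written body: a paragraph, a table right after it, then another paragraph.
var body = new XElement(w + "body",
    new XElement(w + "p", new XElement(w + "r", new XElement(w + "t", "Intro"))),
    new XElement(w + "tbl",
        new XElement(w + "tr", new XElement(w + "tc", new XElement(w + "p")))),
    new XElement(w + "p", new XElement(w + "r", new XElement(w + "t", "Outro"))));

XElement paragraph = body.Elements(w + "p").First();

// Collect the w:tbl siblings directly after the paragraph before removing anything;
// ToList() materializes the lazy query so later removals cannot disturb it.
var followingTables = paragraph.ElementsAfterSelf()
    .TakeWhile(e => e.Name == w + "tbl")
    .ToList();

paragraph.Remove(); // removes the w:p and its descendants only
int tablesAfterParagraphRemove = body.Elements(w + "tbl").Count(); // still 1

foreach (var table in followingTables)
{
    table.Remove(); // sibling tables have to be removed explicitly
}

Console.WriteLine(string.Join(", ", body.Elements().Select(e => e.Name.LocalName))); // p
```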

Those 2 issues will be fixed in v2.5.
Thank you

@podolsky-v
Author

Thanks for the feedback, @XceedBoucherS.
Yes, images from the template .docx are removed from the visible content when includeContent is false, but the resulting document's Images collection still contains the template's images, and the resulting .docx still contains the actual image files under word/media and the relationships referencing them in word/_rels/document.xml.rels. This can make the resulting file unexpectedly large (if the template has many images) and could even expose sensitive data from the template.
Sorry for not making it clear in the original comment.
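To check whether a generated file still carries template images, the package can be inspected directly: a .docx is a ZIP archive, so System.IO.Compression.ZipArchive can list the entries under word/media/. In this sketch the archive is built in memory only so the example runs on its own; ListMediaEntries is a hypothetical helper name, and with a real file you would pass File.OpenRead("result.docx") instead (result.docx is also just a placeholder).

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;

// Hypothetical helper: lists entries stored under word/media/ in a .docx (a ZIP archive).
string[] ListMediaEntries(Stream docx)
{
    using var archive = new ZipArchive(docx, ZipArchiveMode.Read, leaveOpen: true);
    return archive.Entries
        .Select(entry => entry.FullName)
        .Where(name => name.StartsWith("word/media/", StringComparison.OrdinalIgnoreCase))
        .ToArray();
}

// Demo only: an in-memory archive mimicking a result that still carries a template image.
var buffer = new MemoryStream();
using (var zip = new ZipArchive(buffer, ZipArchiveMode.Create, leaveOpen: true))
{
    zip.CreateEntry("word/document.xml");
    zip.CreateEntry("word/_rels/document.xml.rels");
    zip.CreateEntry("word/media/image1.png");
}
buffer.Position = 0;

var media = ListMediaEntries(buffer);
Console.WriteLine(media.Length == 0 ? "no embedded media" : string.Join(Environment.NewLine, media));
```

ZIP entry names use forward slashes, even though Windows Explorer shows the same paths with backslashes.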

@XceedBoucherS
Collaborator

Hi @podolsky-v ,
I see what you mean about images. You are right!

This will be fixed in v2.5.

In the meantime, you can add the following to Container.RemoveParagraph():

```csharp
foreach( var picture in paragraph.Pictures )
{
  picture.Remove();
}
```
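You can then replace the Picture.Remove() method with:

```csharp
public void Remove()
{
  // Remove the picture's XML first, so this picture is no longer found in the document...
  this.Xml.Remove();
  // ...then let the Image decide whether its media part can be deleted.
  _img.Remove();
}
```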
And then replace the Image.Remove() method with this:

```csharp
public void Remove()
{
  // Only delete the image when no remaining picture in the document still references it.
  if( !_document.Pictures.Any( picture => picture.Id == this.Id ) )
  {
    if( _pr.Package != null )
    {
      // Turn the relationship target (e.g. "media/image1.png") into a part name under /word/.
      var uriString = _pr.TargetUri.OriginalString;
      if( !uriString.StartsWith( "/" ) )
      {
        uriString = "/" + uriString;
      }
      if( !uriString.StartsWith( "/word/" ) )
      {
        uriString = "/word" + uriString;
      }

      var uri = new Uri( uriString, UriKind.Relative );

      _pr.Package.DeletePart( uri );
    }

    // Drop the document's relationship to the deleted image part.
    if( _document.PackagePart != null )
    {
      _document.PackagePart.DeleteRelationship( _id );
    }
  }
}
```
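The two decisions Image.Remove() makes can be tried in isolation. The sketch below repeats its string normalization of the relationship target into a part name under /word/, plus its "still referenced?" check, with a hypothetical list of image ids (remainingPictureIds) standing in for _document.Pictures. The normalization assumes targets of the form Word writes, such as media/image1.png; a target like ../media/image1.png would not be resolved by these string checks.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Same checks as the Image.Remove() snippet: turn a relationship target into a part name under /word/.
string ToPartName(string target)
{
    if (!target.StartsWith("/", StringComparison.Ordinal)) target = "/" + target;
    if (!target.StartsWith("/word/", StringComparison.Ordinal)) target = "/word" + target;
    return target;
}

// Hypothetical stand-in for _document.Pictures: ids of images still referenced by a picture.
var remainingPictureIds = new List<string> { "rId5" };

// Mirrors the guard: delete the media part only when no remaining picture uses this image.
bool CanDeleteImage(string imageId) => !remainingPictureIds.Any(id => id == imageId);

var partUri = new Uri(ToPartName("media/image1.png"), UriKind.Relative);
Console.WriteLine(partUri.OriginalString);               // /word/media/image1.png
Console.WriteLine(ToPartName("/word/media/image1.png")); // /word/media/image1.png
Console.WriteLine(CanDeleteImage("rId4"));               // True
Console.WriteLine(CanDeleteImage("rId5"));               // False
```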

Thank you for your feedback!
