[BUG] Accessibility tags missing on download #552

Open
vikraman-alea-bah opened this issue Apr 24, 2024 · 32 comments
Labels
enhancement New feature or request

Comments

@vikraman-alea-bah

Description

Accessibility tags give blind and low-vision screen reader users access to the information in PDFs. On download, the PDF needs to preserve the accessibility tags (and their order), the alternative text for images, and the title of the document. Without the tags, a blind screen reader user cannot access the information in a digital PDF in a human-readable way, if at all.

Bug:

A PDF with accessibility tags that is flattened and downloaded using unipdf loses some accessibility features, such as the accessibility tags and the alternative text for images.

Expected Behavior

Accessibility tags (in order) and alt text for images are preserved on download

How to test

You can test whether a PDF has accessibility tags in a few ways.

  1. Use the free tool PAC
  2. Use the free version of Adobe Acrobat Reader to see if the PDF is tagged (image attached)
  3. Use the paid version of Adobe Acrobat Pro and run the accessibility checker (image attached)
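
As a rough first check alongside the tools above, the sketch below scans a file's raw bytes for the /StructTreeRoot and /MarkInfo catalog keys that a tagged PDF normally carries. It is a heuristic under stated assumptions, not a validator: the names tagMarkers and scanTagMarkers are made up for this illustration, a PDF that stores its catalog in a compressed object stream would show up as a false negative, and finding the keys says nothing about whether the tags are complete or in order. PAC or Acrobat remain the real test.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
)

// tagMarkers records which tagged-PDF catalog keys were seen in the raw bytes.
// Hypothetical type, introduced only for this sketch.
type tagMarkers struct {
	StructTreeRoot bool
	MarkInfo       bool
}

// scanTagMarkers does a plain byte search for the two key names.
// Assumption: the catalog is stored uncompressed, so the names are visible.
func scanTagMarkers(data []byte) tagMarkers {
	return tagMarkers{
		StructTreeRoot: bytes.Contains(data, []byte("/StructTreeRoot")),
		MarkInfo:       bytes.Contains(data, []byte("/MarkInfo")),
	}
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: tagscan file.pdf [more.pdf ...]")
		return
	}
	for _, path := range os.Args[1:] {
		data, err := os.ReadFile(path)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			continue
		}
		m := scanTagMarkers(data)
		fmt.Printf("%s: StructTreeRoot=%t MarkInfo=%t\n", path, m.StructTreeRoot, m.MarkInfo)
	}
}
```

Running it on the input file and on the downloaded output side by side gives a quick hint about whether the tag tree survived processing.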

Attachments

Attached you'll find:

  • PDF with accessibility tags (accessibility-pdf.pdf)
  • the output of the file on download with the tags removed (output.pdf)
  • an image of where to find if the document is tagged with adobe acrobat free
  • an image of what tags look like in adobe acrobat pro

adobe acrobat pro

adobe-acrobat-pro

adobe acrobat free

adobe-acrobat-reader

pdf with accessibility tags

accessible-pdf.pdf

pdf downloaded using unipdf (with lost accessibility tags)

output.pdf

Code

pdfReader, err := pdf.NewPdfReader(bytes.NewReader(data))
if err != nil {
    return nil, err
}

acroForm := pdfReader.AcroForm
if acroForm == nil {
    return nil, errors.New("no form data present in pdf template")
}

w := &pdfFieldWriter{}
w.SetFields(acroForm.Fields)
w.LoadFieldOptions()
truPdf := pdfcore.PdfObjectBool(true)
acroForm.NeedAppearances = &truPdf

w.Write(FieldOptionTypes.ReservationNumber, issuance.ReservationNumber)
w.Write(FieldOptionTypes.Comment, issuance.Comment)

// this returns pdf with lost accessibility tags
return pdfReader.ToBytes(), nil

Welcome! Thanks for posting your first issue. The way things work here is that while customer issues are prioritized, other issues go into our backlog where they are assessed and fitted into the roadmap when suitable. If you need to get this done, consider buying a license which also enables you to use it in your commercial products. More information can be found on https://unidoc.io/

@vikraman-alea-bah
Author

We have a license

@ipod4g

ipod4g commented Apr 25, 2024

Dear @vikraman-alea-bah

Thank you for providing such a detailed report on the issue. We have investigated it thoroughly and will get back to you with a response as soon as possible.

Thank you for your patience and understanding.

@3ace

3ace commented May 9, 2024

Hi @vikraman-alea-bah

We have investigated this matter further and can confirm that UniPDF does not yet fully support accessibility features. However, we are committed to addressing this issue and have added it to our roadmap for future development. Our goal is to enhance the accessibility support for PDF documents.

Based on your attached document, we observed that some of the accessibility checks done in Adobe Acrobat fail while others pass. Nevertheless, we understand the importance of ensuring that every feature is properly implemented to generate documents with comprehensive accessibility support.

@vikraman-alea-bah
Author

Thanks for looking into it @3ace! The most important thing is the preservation of the accessibility tags in order. Without that, we can't say the PDF is accessible.

Do you have a public facing roadmap that I can share with my team? PDF accessibility has become a requirement for us and it would be great if we could follow it on the roadmap to get an idea of when to expect it.

@3ace

3ace commented May 21, 2024

@vikraman-alea-bah Continuing from our last update, we have been investigating what needs to be improved to support generating accessible documents.

We've started by fixing the form flattening process to preserve all accessibility information.

Currently, we are in the process of creating an internal roadmap and do not have a public-facing version available.

However, I can assure you that PDF accessibility is a priority for us.

@3ace

3ace commented Jun 29, 2024

Hi @vikraman-alea-bah, we have made some updates that should help preserve existing accessibility features when processing an existing PDF document. Here are the accessibility reports generated by Adobe Acrobat before and after the implementation.

The fix should be available in our next UniPDF release, version 3.61.0, which should be released next month.

@vikraman-alea-bah
Author

Thank you so much for this! I also checked it against adobe acrobat pro and all the accessibility tags were preserved and in order. Amazing work!!

@ipod4g added the enhancement (New feature or request) label Jul 10, 2024
@vikraman-alea-bah
Author

Hi @3ace I saw that the 3.61.0 got released, did these changes make it in? My team plans on integrating it soon.

@3ace

3ace commented Jul 30, 2024

@vikraman-alea-bah yes, the update should have been included with the release. Let us know if you have any further issues.

@3ace closed this as completed Aug 5, 2024
@vikraman-alea-bah
Author

@3ace my team has been testing with the latest release and we are still seeing the issue where the accessibility tags are removed on download. We have to flatten the files so they cannot be edited after download. Just wondering if you tested with anything like that and if it worked for you.

@3ace

3ace commented Aug 30, 2024

Hi @vikraman-alea-bah, thanks for reaching out. Is the document that has the issue the same document attached above? If it's happening with other files, it would be great if you could attach them here.

We tested the flattening process using this code example: https://github.com/unidoc/unipdf-examples/blob/master/forms/pdf_form_flatten.go

@3ace reopened this Aug 30, 2024
@vikraman-alea-bah
Author

@3ace thank you for giving us this example so quickly! We used your code example to help us narrow down where we think the issue is. We successfully got the accessibility tags to stay after flattening in most scenarios. However, there may be a bug in the AddPage method.

Here is a code example of the two scenarios we tried that did not keep the accessibility tags:

// Scenario 1: add the page through a Creator.
creator := creator.New()
err = creator.AddPage(page)
if err != nil {
    return err
}

// Scenario 2: add the page through a PdfWriter.
pdfWriter := model.NewPdfWriter()
err = pdfWriter.AddPage(page)
if err != nil {
    return err
}

@3ace

3ace commented Sep 2, 2024

@vikraman-alea-bah just to clarify the flow, are you adding the page after or before flattening the document?

@vikraman-alea-bah
Author

We add the pages after flattening. However, we tested both orders, and in neither case did the output include the accessibility tags.

@3ace

3ace commented Sep 3, 2024

@vikraman-alea-bah thanks for the confirmation. We'll take a look at it

@3ace

3ace commented Sep 5, 2024

@vikraman-alea-bah The issue you're experiencing is due to a current limitation in UniPDF. At the moment, UniPDF doesn't support adding new accessibility tags; it only supports existing ones.

This means that when a new page without accessibility tags is added to a PDF document that already has them, the final document will end up with incomplete accessibility tags.

However, if the new page you're adding already includes accessibility tags, the resulting PDF should have a complete set of tags.

We are actively working on adding support for new accessibility tags and will notify you once it's ready.

@vikraman-alea-bah
Author

Thank you for looking into that @3ace !

@Preziotti-Matthew-bah

Good morning @3ace. Sorry for the back and forth on this one. Just wanted to make sure that we aren't doing something wrong on our end with that last post that @vikraman-alea-bah made.

When we used the script you were testing with, https://github.com/unidoc/unipdf-examples/blob/master/forms/pdf_form_flatten.go, writing through pdfWriter, err := pdfReader.ToWriter(opt) worked perfectly and the accessibility tags persisted.

However, when we instead read a page with pdfReader (from a PDF that had the tags) and wrote it out through Creator or PdfWriter, the accessibility tags were removed.

Providing short example below.

page, err := pdfReader.GetPage(1)
if err != nil {
    return err
}

creator := creator.New()
err = creator.AddPage(page)
if err != nil {
    return err
}

err = creator.WriteToFile(outputPath)
if err != nil {
    return err
}

Could you confirm whether, in theory, the PDF output by this code should contain the original accessibility tags? Thanks!

@3ace

3ace commented Sep 5, 2024

@Preziotti-Matthew-bah @vikraman-alea-bah there's a difference between using pdfReader.ToWriter and creating a new Creator and then adding a page to it.

The ToWriter method works by copying the pages to a PdfWriter object, and it also copies other document-level data such as the PDF version, PDF info, catalog metadata, etc. This is one of the reasons it can keep some accessibility information intact.
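
The distinction can be pictured with a toy model. The sketch below is not UniPDF code: document, copyPagesOnly, copyWithCatalog and isTagged are hypothetical names, and a PDF is reduced to a list of pages plus a map of catalog entries. It only illustrates why a copy that carries over document-level entries keeps the tag tree while a page-by-page copy into a fresh document does not.

```go
package main

import "fmt"

// document is a deliberately simplified stand-in for a PDF: pages plus
// document-level catalog entries. It is not a UniPDF type.
type document struct {
	pages   []string
	catalog map[string]string
}

// copyPagesOnly mimics creating a fresh document and adding pages to it:
// the source catalog is never consulted.
func copyPagesOnly(src document) document {
	return document{
		pages:   append([]string(nil), src.pages...),
		catalog: map[string]string{},
	}
}

// copyWithCatalog mimics a copy that also carries over document-level entries.
func copyWithCatalog(src document) document {
	dst := copyPagesOnly(src)
	for k, v := range src.catalog {
		dst.catalog[k] = v
	}
	return dst
}

// isTagged reports whether the toy document still has a structure tree root.
func isTagged(d document) bool {
	_, ok := d.catalog["StructTreeRoot"]
	return ok
}

func main() {
	src := document{
		pages: []string{"page 1"},
		catalog: map[string]string{
			"StructTreeRoot": "tag tree",
			"MarkInfo":       "Marked=true",
			"Lang":           "en-US",
		},
	}
	fmt.Println("pages only tagged:", isTagged(copyPagesOnly(src)))
	fmt.Println("with catalog tagged:", isTagged(copyWithCatalog(src)))
}
```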

So if you would rather not use the ToWriter method, you'll need to do something like this, copying that information from the source file to the target yourself:

func process(inputPath, outputPath string) error {
	f, err := os.Open(inputPath)
	if err != nil {
		return err
	}
	defer f.Close()

	pdfReader, err := model.NewPdfReader(f)
	if err != nil {
		return err
	}

	page, err := pdfReader.GetPage(1)
	if err != nil {
		return err
	}

	pdfWriter := model.NewPdfWriter()

	err = pdfWriter.AddPage(page)
	if err != nil {
		return err
	}

	// Copy PDF version.
	version := pdfReader.PdfVersion()
	pdfWriter.SetVersion(version.Major, version.Minor)

	// Copy PDF info.
	info, err := pdfReader.GetPdfInfo()
	if err != nil {
		common.Log.Debug("ERROR: %v", err)
	} else {
		pdfWriter.SetDocInfo(info)
	}

	// Copy Catalog Metadata.
	if meta, ok := pdfReader.GetCatalogMetadata(); ok {
		if err := pdfWriter.SetCatalogMetadata(meta); err != nil {
			return err
		}
	}

	// Copy catalog mark information.
	if markInfo, ok := pdfReader.GetCatalogMarkInfo(); ok {
		if err := pdfWriter.SetCatalogMarkInfo(markInfo); err != nil {
			return err
		}
	}

	// Copy AcroForm.
	err = pdfWriter.SetForms(pdfReader.AcroForm)
	if err != nil {
		common.Log.Debug("ERROR: %v", err)
		return err
	}

	// Copy viewer preferences.
	if pref, ok := pdfReader.GetCatalogViewerPreferences(); ok {
		if err := pdfWriter.SetCatalogViewerPreferences(pref); err != nil {
			return err
		}
	}

	// Copy language preferences.
	if lang, ok := pdfReader.GetCatalogLanguage(); ok {
		if err := pdfWriter.SetCatalogLanguage(lang); err != nil {
			return err
		}
	}

	// Copy document outlines.
	pdfWriter.AddOutlineTree(pdfReader.GetOutlineTree())

	// Copy OC Properties.
	props, err := pdfReader.GetOCProperties()
	if err != nil {
		common.Log.Debug("ERROR: %v", err)
	} else {
		err = pdfWriter.SetOCProperties(props)
		if err != nil {
			common.Log.Debug("ERROR: %v", err)
		}
	}

	// Copy page labels.
	labelObj, err := pdfReader.GetPageLabels()
	if err != nil {
		common.Log.Debug("ERROR: %v", err)
	} else {
		err = pdfWriter.SetPageLabels(labelObj)
		if err != nil {
			common.Log.Debug("ERROR: %v", err)
		}
	}

	// Copy named destinations.
	namedDest, err := pdfReader.GetNamedDestinations()
	if err != nil {
		common.Log.Debug("ERROR: %v", err)
	} else {
		err = pdfWriter.SetNamedDestinations(namedDest)
		if err != nil {
			common.Log.Debug("ERROR: %v", err)
		}
	}

	// Copy name dictionary.
	nameDict, err := pdfReader.GetNameDictionary()
	if err != nil {
		common.Log.Debug("ERROR: %v", err)
	} else {
		err = pdfWriter.SetNameDictionary(nameDict)
		if err != nil {
			common.Log.Debug("ERROR: %v", err)
		}
	}

	// Copy the structure tree root, which holds the accessibility tags.
	structTreeRoot, found := pdfReader.GetCatalogStructTreeRoot()
	if found {
		err := pdfWriter.SetCatalogStructTreeRoot(structTreeRoot)
		if err != nil {
			common.Log.Debug("ERROR: %v", err)
		}
	}

	// Copy global page rotation.
	if pdfReader.Rotate != nil {
		if err := pdfWriter.SetRotation(*pdfReader.Rotate); err != nil {
			common.Log.Debug("ERROR: %v", err)
		}
	}

	err = pdfWriter.WriteToFile(outputPath)
	if err != nil {
		return err
	}

	return nil
}

I hope this helps

@Preziotti-Matthew-bah

@3ace Thank you for this, this is extremely insightful!

The main reason we are using the Creator is that we need to draw an image on the pdf (add a QR code). This code was written years ago and most of the examples I've seen use the Creator package. Would the best way forward be to continue using the Creator and then add the metadata the way you have done above? Or is there a better way to add an image using PdfWriter?

Thanks again for all the support.

@3ace

3ace commented Sep 6, 2024

@Preziotti-Matthew-bah You can keep using the Creator package, since most of our functionality (for example, adding an image to a PDF) is available through it. The PdfWriter package contains some other functionality, so you can use whichever fits your needs.

@Preziotti-Matthew-bah

@3ace we have made some good progress on our end, but still have a couple issues tripping us up.

  1. Is there a way to duplicate a page's metadata? One thing we do on our end is duplicate a template page n times. When we copy over the metadata, only the first page has the information (which makes sense). Is there a way we could copy this for all n pages?
  2. Do you have documentation for the metadata methods used above? It would be helpful during our debugging; currently we can only see the packaged source code.

Thanks again.

@3ace

3ace commented Sep 23, 2024

@Preziotti-Matthew-bah

  1. The metadata I mentioned above is document-wide, so it should not affect only the first page. Which data do you have an issue with?
  2. We do have some documentation related to metadata here: https://docs.unidoc.io/docs/unipdf/guides/metadata/overview/ but it might not cover all the functions used above.

@Preziotti-Matthew-bah

Preziotti-Matthew-bah commented Sep 23, 2024

Hey @3ace,

Sorry, I probably didn't phrase question 1 well. Here is an example situation:

  1. We read in a template page (just a one page document)
  2. We duplicate that page n times to produce a PDF with n pages. We want the metadata to be the same for all of these pages, based on the original template, but the content will be different for each. (We are generating tickets in this example.)
  3. Once the new PDF has its n pages, we write the metadata from the original one-page template and see that only the first page has the metadata. We are looking for a way to duplicate the template's metadata and apply it to every page we add.

Hopefully that explains it a bit better. Thanks as well for the link to the documentation.
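
One possible reading of this symptom, offered as an assumption rather than a confirmed diagnosis: structure elements in a tagged PDF refer to the page whose content they describe, so a tag tree copied from a one-page template only describes that one page. The toy model below is not UniPDF code, and structElem, cloneForPages and pagesCovered are hypothetical names. It sketches the idea of repeating the template's elements once per generated page; whether UniPDF offers a way to do this is not established in this thread.

```go
package main

import "fmt"

// structElem is a simplified stand-in for a PDF structure element: a tag
// type plus the index of the page whose content it describes.
type structElem struct {
	tag  string
	page int
}

// cloneForPages repeats the template's elements once per output page,
// pointing each copy at its own page.
func cloneForPages(template []structElem, n int) []structElem {
	out := make([]structElem, 0, len(template)*n)
	for p := 0; p < n; p++ {
		for _, e := range template {
			out = append(out, structElem{tag: e.tag, page: p})
		}
	}
	return out
}

// pagesCovered counts how many of the n pages have at least one element.
func pagesCovered(elems []structElem, n int) int {
	seen := make(map[int]bool)
	for _, e := range elems {
		if e.page >= 0 && e.page < n {
			seen[e.page] = true
		}
	}
	return len(seen)
}

func main() {
	template := []structElem{{"H1", 0}, {"P", 0}, {"Figure", 0}}
	const n = 3
	fmt.Println("tree copied as-is covers", pagesCovered(template, n), "of", n, "pages")
	fmt.Println("tree cloned per page covers", pagesCovered(cloneForPages(template, n), n), "of", n, "pages")
}
```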

@3ace

3ace commented Sep 23, 2024

Hi @Preziotti-Matthew-bah, before we continue: my colleague should have sent an invitation email to @vikraman-alea-bah so that you can post requests through our service desk board, where we can monitor them better and where you might feel more comfortable sharing some information.

@vikraman-alea-bah
Author

Thank you for all of your help @3ace, it has really helped us move forward while we try to solve this issue.

Can you also please send the service desk link to @Preziotti-Matthew-bah? He is the developer working on this, so he will be able to speak more technically about the issue.

@3ace

3ace commented Sep 25, 2024

@Preziotti-Matthew-bah could you provide me the email address for the invitation email?

@Preziotti-Matthew-bah

@3ace sure thing. That will go to Preziotti_Matthew@bah.com. Thanks!

@Preziotti-Matthew-bah

@3ace has this been sent over already? Just want to make sure I haven't missed it.

@3ace

3ace commented Sep 27, 2024

@Preziotti-Matthew-bah sorry, I only just saw this reply. I saw a ticket created by you, so I guess you've received the invitation, right?

@Preziotti-Matthew-bah

Yeah we got all squared away 👍 Thanks!
