This repository has been archived by the owner on Apr 26, 2022. It is now read-only.

Official community for EPUB? #103

Open
hmltn-0 opened this issue Jan 8, 2022 · 4 comments

Comments

@hmltn-0

hmltn-0 commented Jan 8, 2022

I’d like to write a script to render EPUB with Python.

Is there an official EPUB community mailing list where I can learn more about how EPUBs are rendered?

Thank you very much

@GeorgeKerscher

Hello, this issue was sent to the Publishing at W3C Community Group. I think all you need to do is join that community group, and we can then help you with your efforts.

Best
George

@dauwhe
Contributor

dauwhe commented Jan 10, 2022

Perhaps you could tell us more about what you want to do? EPUB consists largely of HTML, so rendering an EPUB would involve presenting HTML to the end user.
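To make "largely HTML" a bit more concrete: an EPUB file is a ZIP archive (the OCF container) whose entry named `mimetype` holds the text `application/epub+zip`, and whose content documents are usually `.xhtml` files. The sketch below uses only Python's standard `zipfile` module to read that entry and list the HTML-like files. The function name `inspect_epub` and the sample archive are hypothetical, and filtering by file extension is only a shortcut; the package file's manifest, not the extension, is what actually declares each file's type.

```python
import io
import zipfile


def inspect_epub(epub_bytes: bytes) -> tuple[str, list[str]]:
    """Return the EPUB's declared media type and its (X)HTML entries."""
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as zf:
        # OCF containers store the media type in a file named "mimetype".
        mimetype = zf.read("mimetype").decode("ascii").strip()
        # Extension filtering is a shortcut (assumption); archive order
        # is NOT reading order, which is defined in the package file.
        html_entries = [
            name for name in zf.namelist()
            if name.lower().endswith((".xhtml", ".html", ".htm"))
        ]
    return mimetype, html_entries


# Usage: build a tiny, hypothetical EPUB-like archive in memory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
    zf.writestr("OEBPS/ch1.xhtml", "<html><body><p>Hello</p></body></html>")
    zf.writestr("OEBPS/style.css", "p { margin: 0 }")

mimetype, pages = inspect_epub(buf.getvalue())
print(mimetype, pages)  # application/epub+zip ['OEBPS/ch1.xhtml']
```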

@hmltn-0
Author

hmltn-0 commented Jan 10, 2022

Thanks for your message.

I want to write my own Python script which extracts the text from an EPUB.

To do this I need to understand how the files are arranged. Are they in order in the directory?

Is it as simple as getting all “p” and “h” tags or are there certain kinds of tags that contain text and certain ones that do not?

Thank you

@dauwhe
Contributor

dauwhe commented Jan 10, 2022

> To do this I need to understand how the files are arranged. Are they in order in the directory?

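Not necessarily. The components of the EPUB are listed in the manifest of the XML package file (the .opf file, which META-INF/container.xml points to), and their reading order is given by that file's spine. A basic introduction to EPUB can be found in the overview.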
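Here is a minimal sketch of reading that order with Python's standard library, assuming a well-formed EPUB: `META-INF/container.xml` names the package document (the `.opf` file), its `<manifest>` maps item ids to file paths relative to the `.opf`, and its `<spine>` lists those ids in reading order. The function name `reading_order` and the sample archive are hypothetical. A real reader would also need to handle missing files, spine items marked `linear="no"`, and containers with more than one rootfile.

```python
import io
import posixpath
import zipfile
import xml.etree.ElementTree as ET
from urllib.parse import unquote

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def reading_order(epub_bytes: bytes) -> list[str]:
    """Return archive paths of the spine's content documents, in reading order."""
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as zf:
        # 1. META-INF/container.xml points at the package document (.opf).
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", CONTAINER_NS)
        opf_path = rootfile.get("full-path")
        package = ET.fromstring(zf.read(opf_path))

    # 2. The manifest maps item ids to hrefs, relative to the .opf file.
    opf_dir = posixpath.dirname(opf_path)
    paths = {
        item.get("id"): posixpath.normpath(posixpath.join(opf_dir, unquote(item.get("href"))))
        for item in package.findall("opf:manifest/opf:item", OPF_NS)
    }
    # 3. The spine lists itemrefs in reading order.
    return [paths[ref.get("idref")]
            for ref in package.findall("opf:spine/opf:itemref", OPF_NS)]


# Usage with a hypothetical in-memory EPUB whose archive and manifest
# order (ch2 before ch1) differs from its spine order.
CONTAINER_XML = """<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>"""
CONTENT_OPF = """<?xml version="1.0"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0">
  <manifest>
    <item id="c2" href="text/ch2.xhtml" media-type="application/xhtml+xml"/>
    <item id="c1" href="text/ch1.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="c1"/>
    <itemref idref="c2"/>
  </spine>
</package>"""

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("mimetype", "application/epub+zip")
    zf.writestr("META-INF/container.xml", CONTAINER_XML)
    zf.writestr("OEBPS/content.opf", CONTENT_OPF)
    zf.writestr("OEBPS/text/ch2.xhtml", "<html><body><p>Two</p></body></html>")
    zf.writestr("OEBPS/text/ch1.xhtml", "<html><body><p>One</p></body></html>")

order = reading_order(buf.getvalue())
print(order)  # ['OEBPS/text/ch1.xhtml', 'OEBPS/text/ch2.xhtml']
```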

The spec defining the file format is here.

> Is it as simple as getting all “p” and “h” tags or are there certain kinds of tags that contain text and certain ones that do not?

I would highly recommend using an HTML parsing library in Python. I have personal experience with Beautiful Soup.
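As a rough illustration of why `p` and `h1`–`h6` tags alone are not enough: text can also appear in list items, table cells, block quotes, and more, while `<head>`, `<style>`, and `<script>` contain text that is not book content. The sketch below uses Python's built-in `html.parser` module rather than Beautiful Soup so that it runs without extra packages. If you install Beautiful Soup, `BeautifulSoup(markup, "html.parser").get_text("\n", strip=True)` is a common starting point. The names `TextExtractor` and `extract_text` and the `BLOCK_TAGS`/`SKIP_TAGS` sets are illustrative assumptions, not an exhaustive list.

```python
from html.parser import HTMLParser

# Elements that start a new line of text (illustrative, not exhaustive).
BLOCK_TAGS = {"p", "div", "li", "td", "th", "blockquote", "pre", "br",
              "h1", "h2", "h3", "h4", "h5", "h6", "dt", "dd", "figcaption"}
# Elements whose text is not part of the reading content (assumption).
SKIP_TAGS = {"head", "script", "style"}


class TextExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.parts = []
        self.skip_depth = 0  # > 0 while inside a SKIP_TAGS element

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth:
            self.skip_depth -= 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def extract_text(xhtml: str) -> str:
    parser = TextExtractor()
    parser.feed(xhtml)
    parser.close()
    # Collapse whitespace inside each line and drop empty lines.
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)


sample = """<html xmlns="http://www.w3.org/1999/xhtml"><head><title>Ch 1</title>
<style>p { color: red }</style></head>
<body><h1>Chapter 1</h1><p>It was a <em>dark</em> night.</p>
<ul><li>Text in a list item</li></ul>
<table><tr><td>Text in a table cell</td></tr></table></body></html>"""
print(extract_text(sample))
```

Running this on each content document in the spine's order, one file after another, gives the book's text in reading order.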

Good luck!
