Support for xml based third party metadata formats #96

Open
NetherKing1357 opened this issue Oct 29, 2019 · 7 comments
Labels: enhancement (New feature or request), help wanted (Extra attention is needed)

@NetherKing1357

This issue is an offshoot of the discussion that began in the forum

Please provide support for reading metadata files written by other comic readers such as (but not limited to) ComicRack.

As far as ComicRack is concerned, metadata support would entail reading an .xml file: either one that the user can back up to any location, or one stored within the comic file. The former is stored by default in C:/Users/%user%/AppData/Roaming/cYo/ComicRack/ComicDb.xml, which matters if direct import is to be supported by YAC.
Relevant links:
http://comicrack.cyolito.com/software/windows/windows-documentation/7-meta-data-in-comic-files
http://comicrack.cyolito.com/forum/8-help/26757-where-is-the-metadata-stored

@NetherKing1357
Author

Some relevant comments on the forum:

[quote="matthew" post=2058]
Luis, here are the XML tags currently supported by ComicRack:

<ComicInfo xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
	<Title>Hope And Glory - Part II: Bitter Beginnings</Title>
	<Series>Ninjak</Series>
	<Number>3</Number>
	<Count>6</Count>
	<Volume>1994</Volume>
	<StoryArc>Arthur</StoryArc>
	<SeriesGroup>Islands</SeriesGroup>
	<Summary>The secret origin of Ninjak continues!</Summary>
	<Notes>Scraped metadata from ComicVine [CVDB141693].</Notes>
	<Year>1995</Year>
	<Month>6</Month>
	<Day>24</Day>
	<Writer>Mark Moretti</Writer>
	<Penciller>Bob McLeod, Mark Moretti</Penciller>
	<Inker>Bob McLeod, Dick Giordano</Inker>
	<Colorist>Kathryn Bolinger</Colorist>
	<Letterer>Bob McLeod, Dick Giordano</Letterer>
	<CoverArtist>Bob McLeod, Kathryn Bolinger, Mark Moretti</CoverArtist>
	<Editor>Bob Layton</Editor>
	<Publisher>Valiant</Publisher>
	<Imprint>Aircel Publishing</Imprint>
	<Genre>Action, Fantasy</Genre>
	<Web>http://www.comicvine.com/ninjak-00-hope-and-glory-part-ii-bitter-beginnings/4000-141693/</Web>
	<PageCount>35</PageCount>
	<LanguageISO>en</LanguageISO>
	<Format>Director's Cut</Format>
	<AgeRating>Mature 17+</AgeRating>
	<BlackAndWhite>No</BlackAndWhite>
	<Manga>No</Manga>
	<Characters>Crimson Dragon, Dr. Silk, Fitzhugh, Iwatsu, Michiko Okubo, Neville Alcott, Ninjak, Senator Yusaku Okubo</Characters>
	<Teams>X-Men</Teams>
	<Locations>California, England, Japan, London, Tokyo</Locations>
	<Pages>
		<Page Image="0" ImageSize="568730" ImageWidth="1280" ImageHeight="1977" Type="FrontCover" />
		<Page Image="1" ImageSize="709786" ImageWidth="1280" ImageHeight="1995" />
	</Pages>
</ComicInfo>

[/quote]

[quote="selmf" post=4883]
Since this is requested regularly I'd like to point out a few things that can be done to speed things up a little. If we want to implement metadata import, we roughly have this todo list:

[ol]
[li]Research the format specification for all metadata files we want to support[/li]
[li]Compare the available metadata entries with YACReader's available database entries[/li]
[li]Map foreign metadata to YACReader's metadata, decide what to do with edge cases[/li]
[li]Acquire a set of example files that are [b]fully tagged[/b] in [u]all[/u] metadata formats and are legal (not pirated!!!) comics[/li]
[li]Add metadata detection to our library and comic routines[/li]
[li]Run tests to make sure it is working correctly[/li]
[li]Write some basic import routines for the most important tags[/li]
[li]Add logic to handle edge cases like multiple metadata files present and other stuff[/li]
[li]Fine-tune our import dialog to make all options available[/li]
[/ol]

As you can see this is a feature that isn't implemented quickly. If you want to help out, you can create a bug on our Github page and start working on collecting the info that is needed to actually start the task.

[/quote]
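The mapping step in the list above could look roughly like the following sketch. The column names on the right-hand side are placeholders chosen for illustration, not YACReader's actual database schema, and `map_metadata` is a hypothetical helper. Tags without a mapping are returned separately so that edge cases can be reviewed instead of being dropped silently.

```python
from typing import Dict, Tuple

# Hypothetical mapping from ComicInfo.xml tags to database columns.
# The column names are placeholders, NOT the real YACReader schema.
FIELD_MAP = {
    "Title": "title",
    "Series": "series",
    "Number": "number",
    "Count": "count",
    "Volume": "volume",
    "StoryArc": "story_arc",
    "Writer": "writer",
    "Publisher": "publisher",
}


def map_metadata(foreign: Dict[str, str]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Split foreign metadata into (mapped columns, tags with no mapping)."""
    mapped: Dict[str, str] = {}
    unmapped: Dict[str, str] = {}
    for tag, value in foreign.items():
        column = FIELD_MAP.get(tag)
        if column is None:
            # Edge case: keep the tag so a human can decide what to do with it.
            unmapped[tag] = value
        elif value.strip():
            # Skip empty values so they never overwrite existing data.
            mapped[column] = value.strip()
    return mapped, unmapped
```
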

[quote="Luis Ángel" post=4884]
To that list I would add an option to re-scan the comics in a library for metadata (possibly with an option to do it for a folder or a specific file). Once this is implemented, people will want the metadata available for the comics already in the library.

Some help with this would be great, anyone?
[/quote]
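A re-scan that accepts either a folder or a single file could start from a helper like this one. It is only a sketch: `iter_comic_files` and the extension set are assumptions made for illustration, and the real implementation would live in YACReaderLibrary's own library routines.

```python
from pathlib import Path
from typing import Iterator

# Assumed set of comic archive extensions, for illustration only.
COMIC_EXTENSIONS = {".cbz", ".cb7", ".cbr"}


def iter_comic_files(target: Path) -> Iterator[Path]:
    """Yield comic archives for a single file or, recursively, for a folder."""
    if target.is_file():
        if target.suffix.lower() in COMIC_EXTENSIONS:
            yield target
        return
    for path in sorted(target.rglob("*")):
        if path.is_file() and path.suffix.lower() in COMIC_EXTENSIONS:
            yield path
```

Each yielded path can then be handed to whatever metadata reader ends up being written.
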

@selmf changed the title from "Metadata compatibility support" to "Support for xml based third party metadata formats" on Oct 29, 2019
@selmf added the enhancement and help wanted labels on Oct 29, 2019
@selmf
Member

selmf commented Oct 29, 2019

The first issue I see is that we manage libraries by placing our data in a hidden directory in the root directory of the collection in question. That does not align very well with the concept of a central xml file to "rule them all", so we will have to think about how to handle this, or whether we are going to handle it at all.
There is also no info on the structure of this database, other than "xml snippets" or "one huge xml file".

Another issue is that the way per-file metadata is stored is not consistent. Sometimes it is in the archives, sometimes not, it might even be "hidden" using special NTFS filesystem features. Supporting all of these variants probably doesn't make sense.

Metadata format seems to be roughly what ComicVine is giving us (@luisangelsm is that more or less correct?) so mapping should be possible.

We also still need some test files. If anyone is interested, Pepper and Carrot is a great open source web comic we have used for testing and showcase purposes in the past, so you could grab a cbz of it and tag it via ComicRack.

@NetherKing1357
Author

Based on your response in the forum, I guess we could begin by supporting the .xml files stored within CBZ and CB7 files.
I've attached a zip file with a CBZ within. This is a comic file with every entry in the CR metadata editor filled in.

peppercarrot_episode01.zip
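Since a CBZ is a plain zip archive, the embedded file can be pulled out with a few lines of code. The sketch below assumes the file is named ComicInfo.xml (the name ComicRack conventionally uses) and matches it case-insensitively. `read_comicinfo` is a hypothetical helper, and CB7 archives would need a 7z library that is not covered here.

```python
import zipfile
from typing import Optional


def read_comicinfo(cbz_path: str) -> Optional[bytes]:
    """Return the raw bytes of ComicInfo.xml inside a CBZ, or None if absent."""
    with zipfile.ZipFile(cbz_path) as archive:
        for name in archive.namelist():
            # Ignore any folder prefix and compare the file name case-insensitively.
            if name.rsplit("/", 1)[-1].lower() == "comicinfo.xml":
                return archive.read(name)
    return None
```
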

The following entries have no information stored in the .xml file:

  • Rating
  • Community Rating
  • Series Complete
  • Proposed Values
  • Tags
  • Review
  • Characters

This is the content of the .xml file:

<?xml version="1.0"?>
<ComicInfo xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <Title>Episode 1</Title>
  <Series>Pepper and Carrot</Series>
  <Number>1</Number>
  <Count>23</Count>
  <Volume>1</Volume>
  <AlternateSeries>Pepper and Carrot</AlternateSeries>
  <AlternateNumber>1</AlternateNumber>
  <StoryArc>None</StoryArc>
  <SeriesGroup>Pepper and Carrot</SeriesGroup>
  <AlternateCount>23</AlternateCount>
  <Summary>This is an open source comic. I have added this information to understand how ComicRack adds metadata to comic files.</Summary>
  <Notes>This is an open source comic. I have added this information to understand how ComicRack adds metadata to comic files.</Notes>
  <Year>2017</Year>
  <Month>3</Month>
  <Day>6</Day>
  <Writer>David Revoy</Writer>
  <Penciller>David Revoy</Penciller>
  <Inker>David Revoy</Inker>
  <Colorist>David Revoy</Colorist>
  <Letterer>David Revoy</Letterer>
  <CoverArtist>David Revoy</CoverArtist>
  <Editor>David Revoy</Editor>
  <Publisher>David Revoy</Publisher>
  <Imprint>David Revoy</Imprint>
  <Genre>Web Comic</Genre>
  <Web>https://archive.org/details/peppercarrot-en</Web>
  <PageCount>4</PageCount>
  <LanguageISO>en</LanguageISO>
  <Format>Web Comic</Format>
  <AgeRating>Everyone</AgeRating>
  <BlackAndWhite>No</BlackAndWhite>
  <Manga>No</Manga>
  <Characters>Pepper, Carrot</Characters>
  <Teams>Pepper and Carrot</Teams>
  <Locations>Carrotland</Locations>
  <ScanInformation>Internet Archive HTML5 Uploader 1.6.3</ScanInformation>
  <Pages>
    <Page Image="0" ImageSize="346512" ImageWidth="992" ImageHeight="1373" Type="FrontCover" />
    <Page Image="1" ImageSize="348534" ImageWidth="992" ImageHeight="1373" />
    <Page Image="2" ImageSize="244617" ImageWidth="992" ImageHeight="1373" />
    <Page Image="3" ImageSize="184320" ImageWidth="720" ImageHeight="177" />
  </Pages>
</ComicInfo>
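For anyone who wants to experiment with this file, here is a minimal parsing sketch using Python's standard library. The split into list-valued and integer-valued tags is my own reading of the sample above, not a documented specification. Splitting on commas will misbehave for names that contain a comma, and Number is deliberately kept as text because I am assuming issue numbers are not always plain integers.

```python
import xml.etree.ElementTree as ET

# Tags that appear to hold comma-separated lists in the samples (assumption).
LIST_TAGS = {"Writer", "Penciller", "Inker", "Colorist", "Letterer",
             "CoverArtist", "Genre", "Characters", "Teams", "Locations"}
# Tags that appear to hold plain integers in the samples (assumption).
INT_TAGS = {"Count", "Volume", "Year", "Month", "Day", "PageCount"}


def parse_comicinfo(xml_text: str) -> dict:
    """Parse a ComicInfo document into a dict of strings, ints and lists."""
    root = ET.fromstring(xml_text)
    info = {}
    for child in root:
        if child.tag == "Pages":
            # Keep each <Page> element's attributes as a plain dict.
            info["Pages"] = [dict(page.attrib) for page in child.findall("Page")]
            continue
        text = (child.text or "").strip()
        if not text:
            continue
        if child.tag in LIST_TAGS:
            info[child.tag] = [part.strip() for part in text.split(",") if part.strip()]
        elif child.tag in INT_TAGS and text.isdigit():
            info[child.tag] = int(text)
        else:
            info[child.tag] = text
    return info
```
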

Below are screenshots of the editor itself with all entries filled in. Only Web was filled in later, and it has an entry in the .xml file.

[Three screenshots of the ComicRack metadata editor: CopyQ vU5648, CopyQ ba5648, CopyQ Gy5648]

Every file scraped by cbanack's ComicVine scraper for ComicRack has the following information appended:

  • Web has a link to the ComicVine entry for that issue
  • Either Tags or Notes has this message: Scraped metadata from ComicVine [CVDBxxxxxx].

Example: If Immortal Hulk, issue 14 were scraped:

<Notes>Scraped metadata from ComicVine [CVDB702466].</Notes>
<Web>https://comicvine.gamespot.com/the-immortal-hulk-14-we-only-meet-at-funerals/4000-702466/</Web>

If all else fails, we can use this information to recursively run the YAC scraper for all the files.
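Pulling the ComicVine ID back out of those two fields could be sketched as follows. The `[CVDBxxxxxx]` pattern comes from the message quoted above. The `4000-<id>` URL suffix is only inferred from the two example links in this thread, so treat it as an assumption. `comicvine_issue_id` is a hypothetical helper name.

```python
import re
from typing import Optional

# "[CVDB702466]" style marker written into Notes or Tags by the scraper.
CVDB_PATTERN = re.compile(r"\[CVDB(\d+)\]")
# Trailing ".../4000-702466/" segment, inferred from the example URLs only.
WEB_PATTERN = re.compile(r"/4000-(\d+)/?$")


def comicvine_issue_id(notes: str = "", web: str = "") -> Optional[str]:
    """Return the ComicVine issue ID from the notes text or the Web URL."""
    match = CVDB_PATTERN.search(notes) or WEB_PATTERN.search(web.strip())
    return match.group(1) if match else None
```

The returned ID could then be fed to the YAC scraper for each file.
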

I would need some documentation on the way YACReader stores metadata info to compile a map of CR to YAC tags. Could anyone point me in that direction?

@selmf
Member

selmf commented Nov 1, 2019

YACReaderLibrary stores its metadata in a hidden directory called .yacreaderlibrary which contains a directory with covers and a database file called library.db.
You can use https://sqlitebrowser.org/ to open this file and inspect the entries. For any questions related to the format in general, you will need to ask @luisangelsm - the database is his ~~mess~~ speciality and I have successfully avoided working on it until now.
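If a script is more convenient than a GUI, the table and column names can also be listed with Python's built-in sqlite3 module. This sketch makes no assumption about what the schema contains and only reports what it finds. `describe_database` is a hypothetical helper name.

```python
import sqlite3
from typing import Dict, List


def describe_database(db_path: str) -> Dict[str, List[str]]:
    """Map every table in an SQLite file to its list of column names."""
    connection = sqlite3.connect(db_path)
    try:
        tables = [row[0] for row in connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
        schema: Dict[str, List[str]] = {}
        for table in tables:
            # PRAGMA statements cannot take bound parameters, so quote the name.
            quoted = '"' + table.replace('"', '""') + '"'
            columns = connection.execute(f"PRAGMA table_info({quoted})").fetchall()
            # Index 1 of each table_info row is the column name.
            schema[table] = [column[1] for column in columns]
        return schema
    finally:
        connection.close()
```

Calling `describe_database(".yacreaderlibrary/library.db")` from the root of a library should give the raw material for a tag mapping.
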

@NetherKing1357
Author

I've done a basic mapping. Please take a look and let me know if I've got anything wrong.

mapping.xlsx

@selmf
Member

selmf commented Nov 3, 2019

Thanks for taking the time to do this. This should be enough for me to write a first draft of an importer. I still need to investigate which technical option we should choose for XML support in general, and I will need to discuss that decision with @luisangelsm to get his input and OK on it.
We might also use this opportunity to take a closer look at our own library metadata and maybe do some improvements on it.

@luisangelsm
Member

@NetherKing1357 Thanks for all the resources and research, it has been really useful.

It still needs some work, but it is looking good so far.

image
