Support for xml based third party metadata formats #96

NetherKing1357 · 2019-10-29T18:23:07Z

This issue is an offshoot of a discussion that began in the forum.

Please provide support for reading metadata files written by other comic readers such as (but not limited to) ComicRack.

For ComicRack, metadata support would mean reading either an .xml file that the user can back up to any location, or an .xml file stored within the comic file itself. If direct import of the former is to be supported by YAC, note that it is stored by default in C:/Users/%user%/AppData/Roaming/cYo/ComicRack/ComicDb.xml.
Relevant links:
http://comicrack.cyolito.com/software/windows/windows-documentation/7-meta-data-in-comic-files
http://comicrack.cyolito.com/forum/8-help/26757-where-is-the-metadata-stored
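For the in-archive case, a CBZ is a ZIP archive, so Python's standard `zipfile` module is enough to locate the metadata entry. This is a minimal sketch under stated assumptions: the function name is hypothetical, the entry is matched by file name case-insensitively, and a copy at the archive root is preferred over one in a subfolder (illustrative choices, not documented ComicRack or YACReader behaviour). CB7 (7-Zip) and CBR (RAR) archives would need third-party libraries and are not covered.

```python
import zipfile
from typing import Optional


def read_comicinfo_from_cbz(source) -> Optional[bytes]:
    """Return the raw ComicInfo.xml bytes stored in a CBZ (ZIP) archive, or None.

    `source` may be a filesystem path or a binary file-like object.
    Assumptions: the entry is matched by file name case-insensitively, and a
    copy at the archive root wins over one inside a subfolder.
    """
    with zipfile.ZipFile(source) as archive:
        candidates = [
            name
            for name in archive.namelist()
            if name.rsplit("/", 1)[-1].lower() == "comicinfo.xml"
        ]
        if not candidates:
            return None
        # Shallowest entry first: fewer "/" separators means closer to the root.
        candidates.sort(key=lambda name: name.count("/"))
        return archive.read(candidates[0])
```

Returning raw bytes keeps archive access separate from XML parsing, so the same parser could later be reused for metadata found elsewhere.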


NetherKing1357 · 2019-10-29T18:30:03Z

Some relevant comments on the forum:

[quote="matthew" post=2058]
Luis, here are the XML tags currently supported by ComicRack:

<ComicInfo xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
	<Title>Hope And Glory - Part II: Bitter Beginnings</Title>
	<Series>Ninjak</Series>
	<Number>3</Number>
	<Count>6</Count>
	<Volume>1994</Volume>
	<StoryArc>Arthur</StoryArc>
	<SeriesGroup>Islands</SeriesGroup>
	<Summary>The secret origin of Ninjak continues!</Summary>
	<Notes>Scraped metadata from ComicVine [CVDB141693].</Notes>
	<Year>1995</Year>
	<Month>6</Month>
	<Day>24</Day>
	<Writer>Mark Moretti</Writer>
	<Penciller>Bob McLeod, Mark Moretti</Penciller>
	<Inker>Bob McLeod, Dick Giordano</Inker>
	<Colorist>Kathryn Bolinger</Colorist>
	<Letterer>Bob McLeod, Dick Giordano</Letterer>
	<CoverArtist>Bob McLeod, Kathryn Bolinger, Mark Moretti</CoverArtist>
	<Editor>Bob Layton</Editor>
	<Publisher>Valiant</Publisher>
	<Imprint>Aircel Publishing</Imprint>
	<Genre>Action, Fantasy</Genre>
	<Web>http://www.comicvine.com/ninjak-00-hope-and-glory-part-ii-bitter-beginnings/4000-141693/</Web>
	<PageCount>35</PageCount>
	<LanguageISO>en</LanguageISO>
	<Format>Director's Cut</Format>
	<AgeRating>Mature 17+</AgeRating>
	<BlackAndWhite>No</BlackAndWhite>
	<Manga>No</Manga>
	<Characters>Crimson Dragon, Dr. Silk, Fitzhugh, Iwatsu, Michiko Okubo, Neville Alcott, Ninjak, Senator Yusaku Okubo</Characters>
	<Teams>X-Men</Teams>
	<Locations>California, England, Japan, London, Tokyo</Locations>
	<Pages>
		<Page Image="0" ImageSize="568730" ImageWidth="1280" ImageHeight="1977" Type="FrontCover" />
		<Page Image="1" ImageSize="709786" ImageWidth="1280" ImageHeight="1995" />
	</Pages>
</ComicInfo>

[/quote]
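The sample above suggests a flat, element-per-field layout in which credits are comma-separated lists, counts and dates are integers, and flags are Yes/No. The following Python sketch builds a parser on that reading using `xml.etree.ElementTree`. The field groupings are inferred from this one sample, not from a published schema. Also, splitting on commas would break any name that itself contains a comma.

```python
import xml.etree.ElementTree as ET

# Field groupings inferred from the sample above (assumption, not a schema).
LIST_FIELDS = {
    "Writer", "Penciller", "Inker", "Colorist", "Letterer", "CoverArtist",
    "Editor", "Genre", "Characters", "Teams", "Locations",
}
INT_FIELDS = {"Count", "Volume", "Year", "Month", "Day", "PageCount"}
YES_NO_FIELDS = {"BlackAndWhite", "Manga"}


def parse_comicinfo(xml_bytes):
    """Turn a ComicInfo document into a dict of tag name -> typed value."""
    root = ET.fromstring(xml_bytes)
    info = {}
    for child in root:
        if child.tag == "Pages":
            continue  # per-page attributes are handled separately
        text = (child.text or "").strip()
        if not text:
            continue  # empty tags carry no information
        if child.tag in LIST_FIELDS:
            info[child.tag] = [item.strip() for item in text.split(",") if item.strip()]
        elif child.tag in INT_FIELDS:
            # Keep the raw string if a value is not a plain integer.
            info[child.tag] = int(text) if text.isdigit() else text
        elif child.tag in YES_NO_FIELDS:
            # "Yes"/"No" become booleans; any other value is kept verbatim.
            info[child.tag] = {"Yes": True, "No": False}.get(text, text)
        else:
            # Everything else, including Number, stays a string.
            info[child.tag] = text
    return info
```

`Number` is deliberately left as a string because issue numbers may not always be plain integers.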

[quote="selmf" post=4883]
Since this is requested regularly, I'd like to point out a few things that can be done to speed things up a little. If we want to implement metadata import, we roughly have this to-do list:

[ol]
[li]Research the format specification for all metadata files we want to support[/li]
[li]Compare the available metadata entries with YACReader's available database entries[/li]
[li]Map foreign metadata to YACReader's metadata, decide what to do with edge cases[/li]
[li]Acquire a set of example files: legal (not pirated!!!) comics that are [b]fully tagged[/b] in [u]all[/u] metadata formats[/li]
[li]Add metadata detection to our library and comic routines[/li]
[li]Run tests to make sure it is working correctly[/li]
[li]Write some basic import routines for the most important tags[/li]
[li]Add logic to handle edge cases like multiple metadata files present and other stuff[/li]
[li]Fine-tune our import dialog to make all options available[/li]
[/ol]

As you can see, this feature can't be implemented quickly. If you want to help out, you can open an issue on our GitHub page and start collecting the information needed to actually begin the task.

[/quote]

[quote="Luis Ángel" post=4884]
To that list I would add an option to re-scan the comics in a library for metadata (possibly with an option to do it for a folder or a specific file). Once this is implemented, people will want the metadata to be available for the comics already in the library.

Some help with this would be great, anyone?
[/quote]
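A re-scan could start by walking the library folder and picking out the archives that carry in-archive metadata. This minimal Python sketch makes three assumptions: it handles only CBZ files (CB7/CBR need third-party readers), it skips the hidden `.yacreaderlibrary` data folder, and `find_tagged_comics` is a hypothetical name, not an existing YACReader routine. Restricting it to a single folder or file would just mean passing a narrower starting path.

```python
import os
import zipfile


def find_tagged_comics(folder):
    """Yield paths of .cbz files under `folder` that contain a ComicInfo.xml entry."""
    for dirpath, dirnames, filenames in os.walk(folder):
        # Prune the library's own hidden data directory from the walk.
        dirnames[:] = [d for d in dirnames if d != ".yacreaderlibrary"]
        for filename in sorted(filenames):
            if not filename.lower().endswith(".cbz"):
                continue
            path = os.path.join(dirpath, filename)
            try:
                with zipfile.ZipFile(path) as archive:
                    names = archive.namelist()
            except zipfile.BadZipFile:
                continue  # corrupt or mislabelled archive; skip it
            if any(n.rsplit("/", 1)[-1].lower() == "comicinfo.xml" for n in names):
                yield path
```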

selmf · 2019-10-29T20:31:54Z

A first issue I see is that our library management places its data in a hidden directory at the root of each collection. That does not align well with the concept of a central xml file to "rule them all", so we will have to think about how to handle this, or whether to handle it at all.
There is also no information on the structure of this database beyond "xml snippets" or "one huge xml file".

Another issue is that per-file metadata is not stored consistently. Sometimes it is in the archive and sometimes it is not; it might even be "hidden" using special NTFS filesystem features. Supporting all of these variants probably doesn't make sense.

The metadata format seems to roughly match what ComicVine gives us (@luisangelsm, is that more or less correct?), so mapping should be possible.

We also still need some test files. If anyone is interested, Pepper and Carrot is a great open source web comic that we have used for testing and showcase purposes in the past, so you could grab a CBZ of it and tag it with ComicRack.

NetherKing1357 · 2019-11-01T11:36:14Z

Based on your response in the forum, I guess we could start by supporting the .xml files stored within CBZ and CB7 files.
I've attached a zip file with a CBZ within. This is a comic file with every entry in the CR metadata editor filled in.

peppercarrot_episode01.zip

The following entries have no information stored in the .xml file:

Rating
Community Rating
Series Complete
Proposed Values
Tags
Review
Characters

This is the content of the .xml file:

<?xml version="1.0"?>
<ComicInfo xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <Title>Episode 1</Title>
  <Series>Pepper and Carrot</Series>
  <Number>1</Number>
  <Count>23</Count>
  <Volume>1</Volume>
  <AlternateSeries>Pepper and Carrot</AlternateSeries>
  <AlternateNumber>1</AlternateNumber>
  <StoryArc>None</StoryArc>
  <SeriesGroup>Pepper and Carrot</SeriesGroup>
  <AlternateCount>23</AlternateCount>
  <Summary>This is an open source comic. I have added this information to understand how ComicRack adds metadata to comic files.</Summary>
  <Notes>This is an open source comic. I have added this information to understand how ComicRack adds metadata to comic files.</Notes>
  <Year>2017</Year>
  <Month>3</Month>
  <Day>6</Day>
  <Writer>David Revoy</Writer>
  <Penciller>David Revoy</Penciller>
  <Inker>David Revoy</Inker>
  <Colorist>David Revoy</Colorist>
  <Letterer>David Revoy</Letterer>
  <CoverArtist>David Revoy</CoverArtist>
  <Editor>David Revoy</Editor>
  <Publisher>David Revoy</Publisher>
  <Imprint>David Revoy</Imprint>
  <Genre>Web Comic</Genre>
  <Web>https://archive.org/details/peppercarrot-en</Web>
  <PageCount>4</PageCount>
  <LanguageISO>en</LanguageISO>
  <Format>Web Comic</Format>
  <AgeRating>Everyone</AgeRating>
  <BlackAndWhite>No</BlackAndWhite>
  <Manga>No</Manga>
  <Characters>Pepper, Carrot</Characters>
  <Teams>Pepper and Carrot</Teams>
  <Locations>Carrotland</Locations>
  <ScanInformation>Internet Archive HTML5 Uploader 1.6.3</ScanInformation>
  <Pages>
    <Page Image="0" ImageSize="346512" ImageWidth="992" ImageHeight="1373" Type="FrontCover" />
    <Page Image="1" ImageSize="348534" ImageWidth="992" ImageHeight="1373" />
    <Page Image="2" ImageSize="244617" ImageWidth="992" ImageHeight="1373" />
    <Page Image="3" ImageSize="184320" ImageWidth="720" ImageHeight="177" />
  </Pages>
</ComicInfo>
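ComicRack marks the cover explicitly with `Type="FrontCover"` and also records `<PageCount>` alongside the `<Pages>` list. The following small Python sketch assumes that an importer would want to honour the marked cover and sanity-check the declared page count against the listed pages. `page_summary` is a hypothetical helper name, and falling back to image 0 when no cover is marked is an assumption.

```python
import xml.etree.ElementTree as ET


def page_summary(xml_text):
    """Return (cover_index, declared_page_count, listed_page_count)."""
    root = ET.fromstring(xml_text)
    pages = root.findall("./Pages/Page")
    # Fall back to the first image when no page is marked as the front cover.
    cover = next(
        (int(page.get("Image", "0")) for page in pages if page.get("Type") == "FrontCover"),
        0,
    )
    declared = root.findtext("PageCount")
    declared = int(declared) if declared and declared.strip().isdigit() else None
    return cover, declared, len(pages)
```

For the Pepper and Carrot file above this would report cover 0, with 4 pages declared and 4 listed. A mismatch between the last two numbers could flag stale metadata.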

Below are screenshots of the editor itself with all entries filled in. Only Web was filled in afterwards, and it does have an entry in the .xml file.

Every file scraped by cbanack's Comic Vine Scraper for ComicRack gets the following information added:

Web has a link to the ComicVine entry for that issue
Either Tags or Notes has this message: Scraped metadata from ComicVine [CVDBxxxxxx].

For example, if issue 14 of The Immortal Hulk were scraped:

<Notes>Scraped metadata from ComicVine [CVDB702466].</Notes>
<Web>https://comicvine.gamespot.com/the-immortal-hulk-14-we-only-meet-at-funerals/4000-702466/</Web>

If all else fails, we can use this information to recursively run the YAC scraper for all the files.
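Both markers carry the same ComicVine issue ID, so either one could seed a re-scrape. Here is a minimal Python sketch using `re`. The `[CVDBxxxxxx]` pattern comes from the scraper message quoted above. The `/4000-<id>/` URL pattern is inferred only from the ComicVine URLs quoted in this thread and may not hold for every link. The function name is hypothetical.

```python
import re
from typing import Optional

CVDB_NOTE = re.compile(r"\[CVDB(\d+)\]")      # "Scraped metadata from ComicVine [CVDB702466]."
CV_URL = re.compile(r"/4000-(\d+)/?$")        # ".../4000-702466/" (inferred pattern)


def comicvine_issue_id(notes: Optional[str] = None,
                       tags: Optional[str] = None,
                       web: Optional[str] = None) -> Optional[str]:
    """Return the ComicVine issue ID found in Notes/Tags or the Web URL, else None."""
    # The scraper writes its message to either Notes or Tags, so check both.
    for text in (notes, tags):
        match = CVDB_NOTE.search(text or "")
        if match:
            return match.group(1)
    # Fall back to the trailing numeric part of the ComicVine URL.
    match = CV_URL.search((web or "").strip())
    return match.group(1) if match else None
```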

I would need some documentation on the way YACReader stores metadata info to compile a map of CR to YAC tags. Could anyone point me in that direction?

selmf · 2019-11-01T11:45:14Z

YACReaderLibrary stores its metadata in a hidden directory called .yacreaderlibrary which contains a directory with covers and a database file called library.db.
You can use https://sqlitebrowser.org/ to open this file and inspect the entries. For any questions related to the format in general, you will need to ask @luisangelsm - the database is his ~~mess~~ speciality and I have successfully avoided working on it until now.
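As an alternative to a GUI browser, Python's built-in `sqlite3` module can list the tables and columns directly. This sketch relies only on the `.yacreaderlibrary/library.db` location given above. It assumes nothing about the schema itself, reading table names from `sqlite_master` and columns from `PRAGMA table_info`. The function name is hypothetical. The database is opened read-only so that inspection cannot modify the library.

```python
import sqlite3
from pathlib import Path


def describe_library_db(library_root):
    """Map each table in <library_root>/.yacreaderlibrary/library.db to its column names."""
    db_path = Path(library_root, ".yacreaderlibrary", "library.db").resolve()
    # Open via a read-only URI so inspecting the library cannot modify it.
    conn = sqlite3.connect(db_path.as_uri() + "?mode=ro", uri=True)
    try:
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        return {
            table: [col[1] for col in conn.execute(f'PRAGMA table_info("{table}")')]
            for table in tables
        }
    finally:
        conn.close()
```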

NetherKing1357 · 2019-11-03T09:52:48Z

I've done a basic mapping. Please take a look and let me know if I've got anything wrong.

mapping.xlsx
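Once a mapping like this exists, the importer side can be a plain lookup table. In the hedged Python sketch below, the right-hand names are placeholders: the actual mapping lives in the attached spreadsheet, and YACReader's real column names are not given in this thread. The sketch reports unmapped tags (for example `ScanInformation` or `SeriesGroup` from the Pepper and Carrot sample) instead of silently dropping them, which gives a concrete list for the "decide what to do with edge cases" step.

```python
# Hypothetical target names: placeholders standing in for whatever the
# mapping spreadsheet assigns, NOT YACReader's actual database columns.
CR_TO_YAC = {
    "Title": "title",
    "Series": "series",
    "Number": "number",
    "Count": "count",
    "Volume": "volume",
    "StoryArc": "story_arc",
    "Summary": "synopsis",
    "Notes": "notes",
    "Writer": "writer",
    "Penciller": "penciller",
    "Inker": "inker",
    "Colorist": "colorist",
    "Letterer": "letterer",
    "CoverArtist": "cover_artist",
    "Publisher": "publisher",
    "Genre": "genre",
    "Characters": "characters",
}


def map_comicinfo(fields):
    """Split ComicInfo fields into (mapped, unmapped_tags).

    Tags without a counterpart are reported rather than dropped silently, so
    each one can be reviewed as an edge case.
    """
    mapped, unmapped = {}, []
    for tag, value in fields.items():
        target = CR_TO_YAC.get(tag)
        if target is None:
            unmapped.append(tag)
        else:
            mapped[target] = value
    return mapped, sorted(unmapped)
```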

selmf · 2019-11-03T11:41:15Z

Thanks for taking the time to do this. This should be enough for me to write a first draft of an importer. I still need to do some investigation of my own to decide which technical approach to XML support in general we should opt for, and I will need to discuss that decision with @luisangelsm to get his input and OK on it.
We might also use this opportunity to take a closer look at our own library metadata and maybe do some improvements on it.

luisangelsm · 2021-09-26T21:13:45Z

@NetherKing1357 Thanks for all the resources and research, it has been really useful.

It still needs some work, but it is looking good so far.

selmf changed the title ~~Metadata compatibility support~~ Support for xml based third party metadata formats Oct 29, 2019

selmf added the enhancement and help wanted labels Oct 29, 2019

selmf mentioned this issue Nov 6, 2019

Support for adding OPDS catalogs as libraries #98 (Open)

selmf mentioned this issue Jan 12, 2020

A small list of features I would like, with request of advice for what would be easiest to tackle for a newcomer #109 (Open)

selmf mentioned this issue Jun 14, 2020

YACReader EPUB support #140 (Open)

selmf modified the milestones: YACReader 10, YACReader 9.7 Jul 9, 2020

luisangelsm mentioned this issue Sep 27, 2021

Feature: support for third party xml info import #276 (Merged)
