Handling non-ascii charset for the response of api call #841

deskmonster · 2023-02-05T10:10:16Z

What do you want to happen?
I don't know whether I have just misconfigured something, but Hydra's API responses can't handle non-ASCII characters.
As a result, text in languages such as Japanese and Chinese is corrupted in the API response (at least for Torznab: http://localhost:443/nzbhydra/torznab/api?t=tvsearch&cat=5030,5040,5000&extended=1&apikey=(removed)&offset=0&limit=100&q=Three%20Body&season=1).

Sat, 21 Jan 2023 16:26:00 +0000 3516424851832493469 https://nyaa.si/view/1627605 Anime 461688000 [GM-Team][��][��][The Three-Body Problem][2022][08][AVC][GB][1080P] https://localhost:443/nzbhydra/gettorrent/api/xxxxx?apikey=xxxxx Sat, 21 Jan 2023 16:26:00 +0000 xxxxxhttps://nyaa.si/view/1627605 Anime 461688000 [GM-Team][��][��][The Three-Body Problem][2022][07][AVC][GB][1080P]

It would be great if Hydra could handle non-ASCII characters in API responses just as it does for internal searches.

If not clear, why do you want it?
To identify release names more accurately.
Do you think it's something only you need or something that might be popular?
Popular, since no one likes garbled characters.

theotherp · 2023-02-05T10:21:13Z

Please provide a way for me to reproduce this.

deskmonster · 2023-02-05T11:04:29Z

Please provide a way for me to reproduce this.

Add nyaa.si to Jackett (I chose nyaa because it is an anime tracker, so its results contain many non-ASCII characters).
Add Jackett as an indexer in Hydra, filling the host field with nyaa's Torznab feed.
Call the Torznab API endpoint. For example, the following request searches for a Chinese anime whose original title is "三体", and you will see corrupted characters in the result.

http://localhost:443/nzbhydra/torznab/api?t=tvsearch&cat=5030,5040,5000&extended=1&apikey=(removed)&offset=0&limit=100&q=Three%20Body&season=1
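
One way to check a response for this symptom without reading it by eye is to scan the returned XML for U+FFFD, the Unicode replacement character that renders as � in the output quoted earlier. A minimal Python sketch, assuming the response body has already been fetched and decoded to a string; the function name and the sample feed are made up for illustration and are not part of Hydra or Jackett:

```python
import xml.etree.ElementTree as ET

# U+FFFD is what decoders emit when they hit bytes they cannot interpret.
REPLACEMENT_CHAR = "\ufffd"


def garbled_titles(xml_text):
    """Return every <title> text that contains a replacement character."""
    root = ET.fromstring(xml_text)
    return [
        title.text
        for title in root.iter("title")
        if title.text and REPLACEMENT_CHAR in title.text
    ]


# Hypothetical two-item feed: one corrupted title, one intact title.
sample = (
    "<rss><channel>"
    "<item><title>[GM-Team][\ufffd\ufffd][The Three-Body Problem]</title></item>"
    "<item><title>[GM-Team][三体][The Three-Body Problem]</title></item>"
    "</channel></rss>"
)

print(garbled_titles(sample))
```

An empty list means no title in the feed contains a replacement character.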

theotherp · 2023-02-05T12:50:18Z

Works fine for me:

theotherp · 2023-02-05T12:51:24Z

Try making the request against the actual instance instead of the reverse proxy.

deskmonster · 2023-02-05T13:04:52Z

I used the IP address directly, but the result is still corrupted.
Tested in Firefox, Edge, Chrome, and Sonarr.

deskmonster · 2023-02-05T13:50:17Z

I'm currently using the docker image built by linuxserver.
I set up a fresh install from https://github.com/theotherp/nzbhydra2/releases/tag/v5.1.1 , tested it again, and got corrupted characters again. Both versions of nzbhydra are v5.1.1.

My system is Ubuntu 20.04.5 LTS amd64; I also tested on Ubuntu 20.04.5 LTS arm64, for both docker and the release.

theotherp · 2023-02-05T14:02:24Z

Please post your debug infos zip.

theotherp · 2023-02-05T14:11:52Z

Nevermind, it's an issue with the docker image.

deskmonster · 2023-02-05T14:22:09Z

nzbhydra.log
Here is the log, if it is still needed.

If it's an issue with the docker image, it's strange that the output is still corrupted when I use the binary directly.

theotherp · 2023-02-05T14:32:41Z

The problem is encoding related (obviously).
On my machine and on a self built docker the reported encoding is UTF-8. On the lsio container the reported encoding is ANSI_X3.4-1968.

That's the log, not the debug infos. You can create them in the system section. After you created them you should find an entry in the log saying "File encoding".
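
For context, ANSI_X3.4-1968 is another name for plain ASCII. The Python sketch below shows that alias and two general ways non-ASCII text gets damaged when ASCII is assumed. It only illustrates the failure mode; the thread does not establish which of these paths (if either) Hydra actually hit, and the variable names are illustrative.

```python
import codecs

# Python resolves the charset name from the debug infos to its ASCII codec.
reported = codecs.lookup("ANSI_X3.4-1968").name
print(reported)  # ascii

title = "三体"
utf8_bytes = title.encode("utf-8")  # 6 bytes, none of them valid ASCII

# Reading UTF-8 bytes as ASCII: each undecodable byte becomes one U+FFFD.
mangled = utf8_bytes.decode("ascii", errors="replace")
print(len(mangled))  # 6

# Writing text as ASCII: each unencodable character becomes one "?".
lossy = title.encode("ascii", errors="replace")
print(lossy)  # b'??'
```

Either way the original characters cannot be recovered from the output, which is why the fix has to happen where the data is produced rather than in the client.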

theotherp · 2023-02-05T14:38:42Z

See docker-nzbhydra2/issues/41

deskmonster · 2023-02-05T14:57:53Z

Yes, it's ANSI_X3.4-1968 in the debug infos for both docker and local. I'm going to look for the language encoding setting.

Thank you for your kind help and for opening the issue on lsio.
Have a nice day!

Fmajor · 2023-11-18T13:05:11Z

Same error using the latest docker image

# docker compose file
---
version: "2.1"
services:
  nzbhydra2:
    image: lscr.io/linuxserver/nzbhydra2:latest
    container_name: nzbhydra2
    environment:
      - PUID=297
      - PGID=297
      - TZ=Asian/Shanghai
    volumes:
      - ./config:/config
      - /backupfs/private/workspaces/nzb:/downloads
    ports:
      - 5076:5076
    restart: unless-stopped
# docker info
 Server Version: 23.0.3
 Kernel Version: 6.1.21-gentoo-x86_64
# docker compose
docker-compose version 1.29.2
# image info
26ccec62fff7   lscr.io/linuxserver/nzbhydra2:latest

I made test requests using Python:

url_jackett = "http://172.17.0.1/jackett/api/v2.0/indexers/simpleanime/results/torznab/api?apikey={removed}&t=search&extended=1&q=Kusuriya%20no%20Hitorigoto%2001&password=1&cat=5000&limit=1000"
url_nzb = "http://127.0.0.1:5076/nzbhydra2/torznab/api?t=tvsearch&cat=5030%2C5040%2C5000&apikey={removed}&offset=0&limit=100&q=Kusuriya+no+Hitorigoto+01"

When I query Jackett directly, I get

  <item>
   <title>
    [天月搬運組] 藥師少女的獨語  Kusuriya no Hitorigoto  01 (NetFlix 1920x1080 AVC AAC MKV)
   </title>

but the XML result from nzbhydra looks like

<item>
   <title>
    [���������������] ���������������������  Kusuriya no Hitorigoto  01 (NetFlix 1920x1080 AVC AAC MKV)
   </title>

If I repeat the search in the web frontend (by clicking "Repeat this search with all currently enabled indexers" in the search history list), I also get the correct title, [天月搬運組] 藥師少女的獨語 Kusuriya no Hitorigoto 01 (NetFlix 1920x1080 AVC AAC MKV). The bug only affects the API.

I tried to debug it by searching for "File encoding" in the nzbhydra log file config/logs/nzbhydra2.log, but found nothing.

How can I debug this encoding issue?
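
A possible first debugging step, sketched in Python since the test requests above use it, is to inspect the raw response bytes rather than the decoded text. If the body is valid UTF-8 but already contains the encoded replacement character, the damage happened before the response was sent; if the body is clean, the client's decoding is the more likely culprit. The function name and the three return labels are hypothetical, and the body is assumed to have been fetched separately (for example with urllib.request.urlopen(url).read()).

```python
def diagnose(body):
    """Classify raw response bytes by where the text was most likely damaged."""
    try:
        text = body.decode("utf-8")
    except UnicodeDecodeError:
        # The server is sending some charset other than UTF-8.
        return "not-utf8"
    if "\ufffd" in text:
        # Replacement characters are part of the payload itself.
        return "mangled-at-source"
    # Bytes are fine; a garbled display would point at the client.
    return "clean"


print(diagnose("三体".encode("utf-8")))          # clean
print(diagnose("[\ufffd\ufffd]".encode("utf-8")))  # mangled-at-source
print(diagnose(b"\xff\xfe"))                      # not-utf8
```

Running this against both the Jackett response and the nzbhydra response for the same query would show which hop introduces the replacement characters.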

Fmajor · 2023-11-18T14:26:07Z

Issue resolved by using a different docker image.
From my tests, these docker images have the file encoding problem:

    image: lscr.io/linuxserver/nzbhydra2:latest
    image: hotio/nzbhydra2:latest
    image: ghcr.io/hotio/nzbhydra2

All of them report File encoding: ANSI_X3.4-1968 (maybe they are built on the same base container).

and this image works fine

    image: binhex/arch-nzbhydra2

I am not familiar with Java, but from what I found, the file.encoding property has to be specified when the JVM starts up.

So can we add an extra -Dfile.encoding=UTF-8 option to force Java to use this file encoding, no matter which base OS is used?

theotherp · 2023-11-18T15:51:10Z

Thanks for the research. I'm not sure setting that property actually fixes anything. I added it to the wrapper in a container of lscr.io/linuxserver/nzbhydra2:latest and the API results still return mangled content. The UI, though, does show the correct results. The reported encoding in the log is misleading, I think.

Can you verify that the results are shown properly in the hydra UI?

theotherp · 2023-11-18T16:08:04Z

Nevermind, found the issue.

See #841

theotherp · 2023-11-19T08:22:55Z

@Fmajor Please check newest image.

Fmajor · 2023-11-20T11:17:05Z

@Fmajor Please check newest image.

The bug is still present in

917400ade716   lscr.io/linuxserver/nzbhydra2:latest

Am I using the right image?

theotherp · 2023-11-20T11:33:11Z

Sorry, had to pull that release, wait for next one.

theotherp · 2023-11-20T19:16:48Z

Should be fixed now.

Fmajor · 2023-11-21T04:10:46Z

bug fixed in container ee9bc2838785 lscr.io/linuxserver/nzbhydra2:latest

deskmonster added the enhancement label Feb 5, 2023

deskmonster closed this as completed Feb 5, 2023

theotherp added a commit that referenced this issue Nov 18, 2023

Write XMLs using UTF-8.

ed6f384

See #841
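
The commit message describes writing the XML with an explicit UTF-8 encoding instead of relying on whatever default the environment reports. Hydra is written in Java and the commit itself is not shown here, so the following is only a rough Python analogy of that idea, not the actual change; the element names and variable names are illustrative.

```python
import xml.etree.ElementTree as ET

title = "[天月搬運組] 藥師少女的獨語 Kusuriya no Hitorigoto 01"

item = ET.Element("item")
ET.SubElement(item, "title").text = title

# Name the encoding explicitly so the output does not depend on the
# locale or platform default of the machine doing the serialization.
xml_bytes = ET.tostring(item, encoding="utf-8", xml_declaration=True)

# Parsing the bytes back yields the original, unmangled title.
round_tripped = ET.fromstring(xml_bytes).findtext("title")
print(round_tripped == title)  # True
```

The point of the design is that the serializer and the declared charset agree regardless of the base image's locale settings.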

Fmajor mentioned this issue Dec 1, 2023

Handling non-ascii charset for Save torrent to black hole or send magnet link #904

Open
