snap() captures whole page instead of element - can't replicate in Mac/Ubuntu #513

Closed
yucongshub opened this issue Jan 11, 2024 · 13 comments

@yucongshub

Hi! I am using RPA to capture a verification code image. This has always worked on Windows. However, after migrating the code to Linux, the capture contains the entire page instead of the verification code image. In the test below, the XPath matches correctly: r.click() on it refreshes the verification code, but r.snap() on the same XPath screenshots the entire page rather than the element.
View image 111.png, it is the entire page
image

My rpa and python versions are:
`
r.__version__
'1.50.0'

[root@wc-wuh-13-1-25-new zfw]# python3 -V
Python 3.10.10
`

I tried Chrome versions 120, 108 and 73, but the problem persists. What could be the reason? How can I debug this?

@yucongshub
Author

yucongshub commented Jan 11, 2024

I tested the site "baidu.com" and reproduced the problem.

Linux system:
r.snap('//img[ @id="s_lg_img" ]','222.png')
the result is the entire web page

Windows system:
r.snap('//img[ @id="s_lg_img" ]','222.png')
it is the correct element

@kensoh
Member

kensoh commented Jan 14, 2024

Hi @yucongshub thanks for raising this! The backend API only returns what Chrome returns, so my initial hunch is that some change or bug in Chrome on Linux could be the root cause. This is the first report of such an issue.

I've tried replicating on Mac and it is also working fine -

RPA:Desktop kensoh$ python3
Python 3.7.3 (v3.7.3:ef4ec6ed12, Mar 25 2019, 16:52:21) 
[Clang 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import rpa as r
>>> r.init()
True
>>> r.url('www.baidu.com')
True
>>> r.snap('//img[ @id="s_lg_img" ]','222.png')
True

image

I have a Linux VPS but it is an old CentOS so I can't run Chrome there. Can you share the log file from your Linux machine after running the snap() step? I think you can attach the file here directly. To find it, run print(r.tagui_location()) and then look under that folder for .tagui/src/tagui_chrome.log

That is the transaction log between the TagUI engine and the Chrome browser over a websocket connection, so it contains the exact and complete low-level information returned from Chrome.

@kensoh
Member

kensoh commented Jan 14, 2024

Adding on, @yucongshub: I tried the Google Colab notebook example (which runs on Ubuntu) and it appears to work correctly. The notebook example is linked from the README.md page of this repo:

https://colab.research.google.com/drive/1or8DtXZP8ZxJYK52me0dA6O9A1dXKKOE?usp=sharing

image

Using the following lines, I downloaded the log file and am attaching it below for later comparison.

log_file = r.tagui_location() + '/.tagui/src/tagui_chrome.log'
from google.colab import files; files.download(log_file)

tagui_chrome.log

@kensoh kensoh changed the title snap() not working - captures whole page instead of web element snap() captures whole page instead of web element - can't replicate in Mac and Ubuntu Jan 14, 2024
@kensoh kensoh added the bug label Jan 14, 2024
@kensoh kensoh changed the title snap() captures whole page instead of web element - can't replicate in Mac and Ubuntu snap() captures whole page instead of element - can't replicate in Mac/Ubuntu Jan 14, 2024
@yucongshub
Author

Hi @kensoh, I tested again and am attaching the log files from the earlier run and from this run. Comparing them with the log file you provided, the difference seems to be that there are no pixel coordinates in Page.captureScreenshot?

tagui_chrome.log.2024.1.15.log
tagui_chrome.log.2024.1.11.log
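
One way to compare such logs is to list every Page.captureScreenshot request and check whether it carries a clip rectangle. The following is a minimal sketch, assuming each websocket message sits on one log line in a `[n] {json}` form; the helper name `screenshot_requests` is hypothetical and not part of rpa or TagUI.

```python
import json
import re

# Assumed log line shape: "[9] {...json...}", one websocket message per line
LINE_RE = re.compile(r'^\[(\d+)\]\s*(\{.*\})\s*$')

def screenshot_requests(lines):
    """Return (message id, clip dict or None) for each Page.captureScreenshot request."""
    found = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        try:
            message = json.loads(match.group(2))
        except ValueError:
            continue  # skip lines whose payload is not valid JSON
        if message.get('method') == 'Page.captureScreenshot':
            clip = message.get('params', {}).get('clip')
            found.append((message.get('id'), clip))
    return found
```

Passing an open log file, for example `screenshot_requests(open('tagui_chrome.log'))`, should list the requests. A request whose clip is None would be consistent with a full-page capture.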

@yucongshub
Author

@kensoh I did some tests. Using the coordinates returned in the TagUI log, I took the screenshot with the following code, and it works fine on both Windows and CentOS.

import rpa as r
import json
from websocket import create_connection
import base64
import requests

r.init()
r.url('https://baidu.com')
r.wait(2)

# Ask Chrome's remote debugging endpoint for its open targets
res = requests.get('http://localhost:9222/json').content

# Connect to the first target's DevTools websocket
wsconn = json.loads(res.decode('utf-8'))[0]['webSocketDebuggerUrl']
ws = create_connection(wsconn,suppress_origin=True)

# r.snap('//img[ @id=\"s_lg_img\" ]','1111.png')

# centos
# [12] {"id":12,"result":{"result":{"type":"object","value":{"top":44,"left":526,"width":270,"height":129}}}}

# windows
# [8] {"id":8,"result":{"result":{"type":"object","value":{"top":44,"left":540,"width":270,"height":129}}}}

# do snap
# [9] {"id":9,"method":"Page.captureScreenshot","params":{"format":"png","quality":80,"clip":{"x":548,"y":44,"width":270,"height":129,"scale":1},"fromSurface":true}}

# Capture only the element region, using the CentOS coordinates logged above
request = {}
request['id'] = 1
request['method'] = 'Page.captureScreenshot'
request['params'] = {"format":"png","quality":80,"clip":{"x":526,"y":44,"width":270,"height":129,"scale":1},"fromSurface":True}
ws.send(json.dumps(request))
result = ws.recv()
png = json.loads(result)['result']['data']  # base64-encoded image data
rpng = base64.b64decode(png)
with open('testbaidu.png','wb') as f:
    f.write(rpng)
ws.close()

r.wait(10)

How can I continue troubleshooting this problem?
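
One caveat with the script above: a DevTools websocket can deliver event messages in addition to replies, so a single ws.recv() is not guaranteed to be the reply to the request just sent. The following is a minimal sketch of waiting for the reply with the matching id, assuming `ws` is any object with send() and recv() methods, such as the websocket-client connection above. `call_cdp` and its `max_messages` limit are hypothetical names chosen for illustration.

```python
import json

def call_cdp(ws, request_id, method, params=None, max_messages=100):
    """Send one DevTools request and return the reply whose id matches."""
    ws.send(json.dumps({'id': request_id, 'method': method, 'params': params or {}}))
    for _ in range(max_messages):
        message = json.loads(ws.recv())
        if message.get('id') == request_id:
            return message
        # anything else is an event or a reply to another request; ignore it
    raise RuntimeError('no reply for request id %d' % request_id)
```

With this helper, the screenshot step could read `reply = call_cdp(ws, 1, 'Page.captureScreenshot', params)` followed by `reply['result']['data']`.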

@kensoh
Member

kensoh commented Feb 4, 2024

Hi @yucongshub, can you tell me more about what you mean? What is the difference between your Linux and the Google Colab Ubuntu example? Is it that your Linux doesn't return coordinates while the Google Colab Ubuntu does?

If no coordinates are returned on your Linux, then you can't use the workaround there to capture the screenshot, unless you first collect the coordinates on another system and hard-code them on your Linux system.

What is your Linux distribution and version? I suspect this is a Linux/Chrome edge case that is hard to replicate on other Linux/Chrome combinations. The solution might be the one you shared above: collect the coordinates on another machine and use them to take the screenshot on your Linux/Chrome.

@yucongshub
Author

Hi @kensoh, the Linux environment I use is CentOS 7.9. I tested on Windows, Google Colab, and CentOS 7.9. Windows and Google Colab returned the correct element screenshots, but CentOS 7.9 did not. On CentOS 7.9, the order in which the element coordinates are obtained and the screenshot is taken seems different from Windows and Google Colab. How can I debug that order? I want to compare how Windows and CentOS 7.9 run. I have attached the log files from all three so you can check them.
tagui_chrome.log.google-colab.log
tagui_chrome.log.windows.log
tagui_chrome.log.centos7.log

@yucongshub
Author

Hi @kensoh, I recently changed my laptop system to Ubuntu 24.04, tested this again, and the problem recurred there too. I checked the logs and found something strange: the base64 encoding of the image is output in the log first, and the coordinates are printed afterwards. This order is the opposite of the Windows and Google Colab logs I provided before, and the same as CentOS 7, which has the same problem. Is this the cause of the problem?

In addition, my CentOS 7 and Ubuntu 24 systems have very few programs installed and are relatively clean, so I don't think this is a coincidence or related to a specific third-party program.

tagui_chrome.log.ubuntu24.log
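
The ordering described above can be checked mechanically by scanning a log for where the bounding-rectangle reply sits relative to the Page.captureScreenshot request and the image reply. The following is a rough sketch, assuming one message per line in a `[n] {json}` form and assuming the rectangle reply is the one whose value has top, left, width and height keys. `event_order` is a hypothetical helper, not part of rpa or TagUI.

```python
import json
import re

LINE_RE = re.compile(r'^\[\d+\]\s*(\{.*\})\s*$')  # assumed "[n] {json}" log line shape
RECT_KEYS = {'top', 'left', 'width', 'height'}

def event_order(lines):
    """Return the labels 'rect', 'capture' and 'image' in the order they appear."""
    order = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        try:
            message = json.loads(match.group(1))
        except ValueError:
            continue  # skip lines whose payload is not valid JSON
        result = message.get('result')
        if not isinstance(result, dict):
            result = {}
        inner = result.get('result')
        value = inner.get('value') if isinstance(inner, dict) else None
        if message.get('method') == 'Page.captureScreenshot':
            order.append('capture')  # screenshot request sent to Chrome
        elif isinstance(value, dict) and RECT_KEYS <= set(value):
            order.append('rect')     # reply carrying the element rectangle
        elif 'data' in result:
            order.append('image')    # reply carrying base64 image data
    return order
```

Running it over each attached log and comparing the sequences would show whether 'image' really precedes 'rect' on the affected systems.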

@kensoh
Member

kensoh commented Aug 8, 2024

Hi @yucongshub I think you have narrowed down and found the issue. My best guess is that on certain Linux versions the websocket calls are somehow made out of order. The code below explicitly calls chrome.getRect(selector) to get the bounding rectangle before calling chrome_step('Page.captureScreenshot', ...) to read the image in base64. So it is very puzzling why the order of the calls is reversed in your logs for some OS versions.

Is this still an issue for you? I can't explain the above and can't replicate it on my macOS. I'm trying to see if there is another way to get to the outcome you want. This seems like a low-level issue at the OS or middleware level.

chrome.captureSelector = function(filename,selector) { // capture screenshot of selector to png/jpg/jpeg format
chrome.scrollIntoViewIfNeeded(selector); // adjust to work for new Chromium behaviour and with absolute xy
var selector_rect = chrome.getRect(selector); if (selector_rect.width > 0 && selector_rect.height > 0)
{var format = 'png'; var quality = 80; var fromSurface = true; var screenshot_data = ''; // options not implemented
if ((filename.substr(-3).toLowerCase() == 'jpg') || (filename.substr(-4).toLowerCase() == 'jpeg')) format = 'jpeg';
var clip = {x:selector_rect.left, y:selector_rect.top, width:selector_rect.width, height:selector_rect.height, scale:1};
var ws_message =
chrome_step('Page.captureScreenshot',{format: format, quality: quality, clip: clip, fromSurface: fromSurface});
try {var ws_json = JSON.parse(ws_message); screenshot_data = ws_json.result.data;} catch(e) {screenshot_data = '';}
var fs = require('fs'); fs.write(filename,chrome.decode(screenshot_data),'wb');}}
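
For the websocket workaround discussed in this thread, the parameter-building part of captureSelector can be mirrored in Python. This is a sketch of the same mapping only (rectangle to clip, file extension to format), not TagUI's code. `screenshot_params` is a hypothetical name, and the rectangle is assumed to be a dict with top, left, width and height keys as seen in the logs.

```python
def screenshot_params(filename, rect):
    """Build Page.captureScreenshot params for an element rectangle, or None if it is empty."""
    if rect['width'] <= 0 or rect['height'] <= 0:
        return None  # nothing to capture, as in the JavaScript guard
    lower = filename.lower()
    image_format = 'jpeg' if lower.endswith(('jpg', 'jpeg')) else 'png'
    clip = {'x': rect['left'], 'y': rect['top'],
            'width': rect['width'], 'height': rect['height'], 'scale': 1}
    return {'format': image_format, 'quality': 80, 'clip': clip, 'fromSurface': True}
```

The returned dict can be used as the `params` value of a Page.captureScreenshot request sent over the DevTools websocket.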

@kensoh
Member

kensoh commented Aug 8, 2024

If your goal is to capture the password, maybe you can use visual automation with r.hover() to move the mouse to the textbox on the left of the generated 4-digit number, then add an x,y offset and capture the spot containing the number using r.snap(), which can take x,y coordinates.

@yucongshub
Author

Hi @kensoh, thank you for your reply and solution. Because we plan to run in headless mode, we are currently using the method from January 15 as a temporary fix. However, we have run into other issues while implementing it:

  1. Sometimes HTML cannot be exported using js. In most scenarios, this operation is successful. The code is
    htmlstr = r.dom('return document.body.outerHTML')
    but in some cases, it returns a null value. I also tried to use this code
    htmlstr = r.dom('return document.querySelector(\'iframe\').contentDocument.body.outerHTML')
    It also returns a null value. I find it strange because in the tagui_chrome.log file, I have seen the HTML content, but I don't know why it is not assigned to my code.

  2. Another issue involves clicking. At times, clicking on certain elements does not produce any effect. Upon examining tagui_chrome.log, it was observed that the element coordinates are displayed in the log, and the click is executed. However, there might be an issue with the position of the coordinate point, as illustrated in the accompanying figure.
    This represents the default Chrome window size. It is presumed that tagui consistently targets the center point of the element. It is noticeable that the mouse fails to click on the element.
    image

Nevertheless, in Figure 2, when the mouse is positioned slightly to the left of the center point, between two lines of text, it can click successfully.
image

The method to adjust the coordinate point offset is unknown. As a temporary solution, I adjusted the Chrome startup resolution to 1920*1080, resulting in the text being displayed in a single line, as depicted in Figure 3.
image

However, it remains uncertain if longer text that spans two lines would be unclickable under the 1920*1080 resolution. Is there a more effective approach to address this issue?

@kensoh
Member

kensoh commented Aug 16, 2024

Hi @yucongshub,

  1. This is strange. Here are some ideas you can try. Can you add r.wait() before that step to see whether the cause is that the webpage has not finished loading? Or use r.hover('some web element') to ensure that the page has finished loading. Another way is to write a for loop or while loop that keeps running the dom() step until you get some content back. In the loop you can add a Python sleep() or r.wait(1) for some delay.

  2. I think this is an edge case. From the above, it sounds like the web element boundary is a rectangle, but because the text occupies only part of that rectangle, clicking on an empty spot without text does not work. The tool clicks at the centre of the bounding rectangle of the web element. I think your solution is a good one. Another way is to use dom() to change that text to something shorter before trying to click on it.
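
The retry loop suggested in point 1 could look like the following. This is a minimal sketch, assuming an empty or null result means the page was not ready yet; `retry_until_value` and its `attempts` and `delay` parameters are hypothetical names, not part of rpa.

```python
import time

def retry_until_value(fetch, attempts=10, delay=1.0):
    """Call fetch() until it returns a non-empty value or the attempts run out."""
    for attempt in range(attempts):
        value = fetch()
        if value:
            return value
        if attempt < attempts - 1:
            time.sleep(delay)  # give the page time to finish loading
    return None
```

It would be used as `htmlstr = retry_until_value(lambda: r.dom('return document.body.outerHTML'))`, which returns None if every attempt comes back empty.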

@kensoh
Member

kensoh commented Aug 21, 2024

(closing issue for now, but will look out for your reply if any)

@kensoh kensoh closed this as completed Aug 21, 2024