git-xet client does not support HTTP proxy #45

Open
MiguelRodo opened this issue Mar 26, 2024 · 13 comments

@MiguelRodo

MiguelRodo commented Mar 26, 2024

Describe the bug
I run git xet clone --lazy "xet://<owner>/<repo>", but get the following errors:

2024-03-26T09:19:28.778329Z ERROR /root/.cargo/git/checkouts/xet-core-d6815bd7e1c74654/d26d249/rust/gitxetcore/src/merkledb_shard_plumb.rs:399: Error attempting to download shard default-merkledb/b9e0ffaa7e6ada42bd2dab1c4f706e529d5f20306acebafc1ba5362e4f8be7ad: InternalError(Real call failed: TonicTransportError(tonic::transport::Error(Transport, hyper::Error(Connect, Custom { kind: TimedOut, error: Elapsed(()) }))))
2024-03-26T09:20:28.780921Z ERROR /root/.cargo/git/checkouts/xet-core-d6815bd7e1c74654/d26d249/rust/gitxetcore/src/merkledb_shard_plumb.rs:399: Error attempting to download shard default-merkledb/b2d0e1bb459b39a6ba23523b62fad21ef5bea769cd8d6f742607ac31f870b688: InternalError(Real call failed: TonicTransportError(tonic::transport::Error(Transport, hyper::Error(Connect, Custom { kind: TimedOut, error: Elapsed(()) }))))

If you wait, you get more of them, with different hashes after default-merkledb.

If you cancel the command, then you end up with an incomplete repo and the statement Clone succeeded, but checkout failed.

If you don't cancel the command, then you end up with an almost-entirely-empty repo and this:

warning: Clone succeeded, but checkout failed.
You can inspect what was checked out with 'git status'
and retry with 'git restore --source=HEAD :/'

Preparing to clone Xet repository.
Cloning into 'StoreACSCyTOFTCells'...
remote: Enumerating objects: 3, done.
remote: Counting objects: 100% (3/3), done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 3 (delta 0), reused 0 (delta 0), pack-reused 0
Receiving objects: 100% (3/3), done.

Frequency
Every time.

Steps to reproduce
Run the clone command above.

Expected behavior
To clone the repo.

Screenshots
If applicable, add screenshots to help explain your problem.

Environment

  • Internet connection: unsure, University of Cape Town HPC
  • Device: Linux server
    Architecture: x86_64
    CPU op-mode(s): 32-bit, 64-bit
    Address sizes: 45 bits physical, 48 bits virtual
    Byte Order: Little Endian
    CPU(s): 8
    On-line CPU(s) list: 0-7
    Vendor ID: GenuineIntel
    Model name: Intel(R) Xeon(R) Gold 6246R CPU @ 3.40GHz
    CPU family: 6
    Model: 85
    Thread(s) per core: 1
    Core(s) per socket: 8
    Socket(s): 1
    Stepping: 0
    BogoMIPS: 6784.06
  • OS: HPC OS is Rocky, but this is running in an apptainer container (with an apptainer runtime), which is running Ubuntu Jammy.
  • Browser: N/A
  • CLI version: 0.13.0

Additional context
I've run git-xet install, so that config is fine.

@hoytak

hoytak commented Mar 26, 2024

Hmmm.... It appears that there is something wrong with the network config or authentication. It's hard to tell what specifically, and one of the things we're working on is improving all the error messages.

If you don't mind, could you configure a logging dump and send it to us? To do this, run:

mkdir ~/xet_logs/
export XET_LOG_PATH="$HOME/xet_logs/xet_{timestamp}_{pid}.log"
export XET_LOG_LEVEL=debug

# Run all your commands

tar cjf xet_logs.tar.bz2 "$HOME/xet_logs/"

This creates a tarball with a log of your session. If you could email that to me at hoytak@xetdata.com, we will see what we can figure out.

@hoytak

hoytak commented Mar 28, 2024

Hello @MiguelRodo,

Thanks for sending me the logs. First, it appears that the version you're using is a bit older, and a fix released in the latest version may resolve your issue. Could you try again with the latest version? If that doesn't fix it, we have a few other possibilities to look into.

Thanks!
-- Hoyt

hoytak added a commit to xetdata/xet-core that referenced this issue Mar 29, 2024
Possible fix for xetdata/xet-tools#45

Co-authored-by: Hoyt Koepke <hoytak@xethub.com>
@MiguelRodo
Author

Thanks, will get to that Wednesday!

@MiguelRodo
Author

MiguelRodo commented Apr 3, 2024

Same error as before, unfortunately. However, using the same container image I get the error only on the new HPC (we're transitioning), but not on the old HPC. I created new log files on both HPCs, using the updated image with the latest git-xet release, and sent them to Hoyt via email.

I see, though, that the latest release doesn't include the PR referenced here. Should I build git-xet from source then?

@MiguelRodo
Author

I've run it using the newest version (built from source from xetdata/xet-core), but still get an error (at first glance it looks the same to me).

Here's the version: git-xet 0.13.3-9a9506a.

Sent the log files to hoyt via email.

In case there's some issue with how I built it, or if it's of interest to anyone, here's the script (built into an apptainer container with ubuntu:22.04 as the base image):

#!/usr/bin/env bash
# Build git-xet from the xet-core source and install it into /usr/local/bin.
set -e
pushd /tmp
# Start from a fresh checkout
if [ -d "xet-core" ]; then
    rm -rf xet-core
fi
git clone https://github.com/xetdata/xet-core
cd xet-core
export RUSTUP_INIT_SKIP_PATH_CHECK=yes
# Install build dependencies
apt-get install -y \
    curl \
    build-essential \
    pkg-config \
    libssl-dev \
    protobuf-compiler \
    clang \
    libclang-dev \
    cmake \
    libudev-dev \
    zlib1g-dev \
    libasound2-dev \
    libdbus-1-dev \
    libgtk-3-dev
# Install Rust via rustup
curl --proto '=https' --tlsv1.2 -sSf -o rustup-init.sh https://sh.rustup.rs
sh rustup-init.sh -y
rm rustup-init.sh
# Build the release binaries and copy the whole release directory
cd rust
"$HOME/.cargo/bin/cargo" build --release
mkdir -p /usr/local/bin
cp -r target/release/. /usr/local/bin/
git-xet install
popd
echo "xet-core installed"

I guess that I don't need to copy literally everything in the release folder, but I wasn't sure so I did anyway.

Btw, your README is incorrect - the binary is in target/release and not targets/release (extra s added to targets).

@MiguelRodo
Author

Note that I get (what looks like) the same error without the --lazy flag.

@hoytak

hoytak commented Apr 9, 2024

Ah, thanks for the corrections on the target release stuff. And thanks for working with us on this. There are several possible causes, but it does seem like an SSL configuration issue, as the system git process is able to connect and work but our HTTPS connection to the data servers isn't.

In building it from source, could you try building it with
$HOME/.cargo/bin/cargo build --release --features openssl_vendored

and see if that works?

Either way, thanks for working with us on this — we'll get to the bottom of this!

@MiguelRodo
Author

Thanks!

I did so now (built from source with openssl vendored), unfortunately got what looks like the same error. Log files sent across.

ylow changed the title from "Error attempting to download shard default-merkledb" to "git-xet client does not support HTTP proxy" on Apr 11, 2024

@hoytak

hoytak commented Apr 19, 2024

Hello @MiguelRodo,

Unfortunately, some of our code paths do not properly support a proxy such as the one in your configuration. We're working on this now.

In the meantime, one option that would work in your situation is to use our AWS S3 endpoint to access your repo off of xethub.com. This would be through the aws cli, which does have full proxy support. We'll be rolling documentation and full support for this out next week. I'll circle back when it's out and make sure that that unblocks you.

Thanks again for working with us on this!
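
For anyone hitting the same Connect/TimedOut errors, a rough way to check whether a proxy is involved is sketched below. This is only an illustrative diagnostic, not part of git-xet: the helper names show_proxy_env and probe are made up for this example, and it assumes curl is installed.

```shell
# List proxy-related environment variables (matched case-insensitively),
# e.g. http_proxy, HTTPS_PROXY, ALL_PROXY, no_proxy.
show_proxy_env() {
    env | grep -i -E '^(https?|all|no)_proxy=' || echo "no proxy variables set"
}

# Probe an HTTPS URL with curl, which picks up https_proxy/HTTPS_PROXY
# from the environment. Pass "direct" as the second argument to bypass
# any configured proxy.
probe() {
    url="$1"
    mode="${2:-proxied}"
    if [ "$mode" = "direct" ]; then
        curl -sS -o /dev/null --max-time 10 --noproxy '*' -w '%{http_code}\n' "$url"
    else
        curl -sS -o /dev/null --max-time 10 -w '%{http_code}\n' "$url"
    fi
}

show_proxy_env
```

If probe https://xethub.com direct times out while probe https://xethub.com prints a status code, outbound HTTPS on that machine probably only works through the proxy, which would be consistent with the errors reported above.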

@MiguelRodo
Author

Hi @hoytak, no problem, and thanks for the workaround! I'll keep my eyes peeled.

@hoytak

hoytak commented May 1, 2024

Hello @MiguelRodo, we now have the ability to interact with a repository as an S3 backend, which means you can use the aws s3 cli to load all your files for distributed compute. The instructions to set that up are at https://xethub.com/assets/docs/advanced/xs3, but it hopefully should be pretty painless. The aws cli has great support for proxies -- see https://docs.aws.amazon.com/cli/v1/userguide/cli-configure-proxy.html -- which hopefully can allow you to use XetHub until we roll out full proxy support. Let us know if you can get this working for your use case, and I'd be curious if you hit any problems on the way.
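
A minimal sketch of what the proxied aws cli setup might look like follows. The proxy address, endpoint URL, and bucket name are placeholders (assumptions, not values from this thread), and xet_s3 is a hypothetical wrapper name; the real endpoint and credentials come from the XetHub S3 instructions linked above. HTTP_PROXY, HTTPS_PROXY, and NO_PROXY are the environment variables described in the AWS proxy documentation.

```shell
# Hypothetical proxy address: replace with the proxy your site actually uses.
export HTTP_PROXY="http://proxy.example.com:3128"
export HTTPS_PROXY="http://proxy.example.com:3128"
# Hosts that should be reached directly rather than through the proxy.
export NO_PROXY="localhost,127.0.0.1"

# Placeholders: take the real endpoint URL and bucket name from the
# XetHub S3 instructions.
XET_S3_ENDPOINT="https://xet-s3-endpoint.example.com"
XET_S3_BUCKET="s3://example-bucket"

# Wrapper so every call goes to the same endpoint;
# --endpoint-url is a standard global aws cli option.
xet_s3() {
    aws --endpoint-url "$XET_S3_ENDPOINT" s3 "$@"
}
```

With credentials configured per the linked instructions, something like xet_s3 ls "$XET_S3_BUCKET/" would list the repository contents through the proxy, and xet_s3 cp would fetch individual files.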

@MiguelRodo
Author

Thanks, @hoytak, will try it and let you know if I run into issues!

@MiguelRodo
Author

Hi @hoytak. Sorry this took so long, but I can now confirm this works!

I see you are migrating to HuggingFace. Congratulations! Will these repos be closed, so we then bring up issues elsewhere?
