Update Julia mirror #311

Closed
Roger-luo opened this issue Jan 20, 2020 · 31 comments

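@Roger-luo

Starting with 1.4, Julia will use a new package server. Its companion Docker images are already fully packaged, including the complete server front end and so on: https://github.com/staticfloat/PkgServerS3Mirror. It should work once it is deployed.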

cc: @staticfloat

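@zhsj
Contributor

zhsj commented Jan 20, 2020

I'm not familiar with Julia, but the repo you linked contains nginx reverse-proxy configuration, which would not be used on the ustclug mirrors server. For reverse proxying we have a unified nginx configuration (see the "reverse proxy list" on the mirrors home page).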

@zhsj
Contributor

zhsj commented Jan 20, 2020

I took a quick look at the new https://github.com/JuliaPackaging/PkgServer.jl, and I don't think it is an architecture we would use. We prefer the approach of the original https://github.com/sunoru/julia-mirror.

@staticfloat

staticfloat commented Jan 20, 2020

My Chinese is not good enough to reply in Chinese, but I can explain a bit more in English what's going on here:

The most important thing here is PkgServer.jl; this will become the new default way that packages and artifacts are served to Julia. (It will be opt-in in Julia 1.4 and the default in Julia 1.5+.) It has many advantages over the old system, where packages are fetched from many different servers. At the moment, all it does is serve full versions of packages and artifacts, similar to how you'd get them from GitHub, but we have many enhancements in the works, such as intelligent diff bundles, that will minimize the amount of downloading clients need to do.

A PkgServer.jl deployment automatically caches objects locally, so it serves as a kind of edge-cache. We are planning on deploying them in multiple places worldwide, but since cloud hosting in China is more complex than in other parts of the world, we're not quite able to deploy there as we'd like to. Ideally, we will have a PkgServer in many different parts of the world, providing high-speed, cached package versions for all users, and the main pkg.julialang.org endpoint will forward users to more localized versions based on their source IP.

The nginx configuration in PkgServer.jl is simply an HTTPS terminator; you can ignore it. PkgServerS3Mirror allows mirroring of an S3 bucket, for downloading Julia itself (instead of packages or artifacts). Since you already have julia-mirror, that's not so necessary, but it might be simpler than the existing mirror_julia.py script (since it's all automatic, with no need for configuration or for running it ahead of time).

@zhsj
Contributor

zhsj commented Jan 21, 2020

> A PkgServer.jl deployment automatically caches objects locally, so it serves as a kind of edge-cache.

I understand the intention behind developing PkgServer.jl: it helps you start a standalone server quickly. But that's not how we operate mirrors.ustc, which is a shared server for many OSS projects. What we prefer (though it is not a hard limit) is a cron-job-like script that syncs from upstream frequently. We don't like running a separate daemon on our server, and we prefer not to run a cache/proxy server, since in our experience it doesn't work well. (We do have proxy services, though.)

> We are planning on deploying them in multiple places worldwide, but since cloud hosting in China is more complex than in other parts of the world, we're not quite able to deploy there as we'd like to. Ideally, we will have a PkgServer in many different parts of the world, providing high-speed, cached package versions for all users, and the main pkg.julialang.org endpoint will forward users to more localized versions based on their source IP.

If you find it hard to deploy in mainland China, I suggest deploying in Hong Kong. Many cloud providers have data centers in Hong Kong (for example Google Cloud's asia-east2 region), and the network speed to the HK PoP is sufficient for most users in China.

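@johnnychen94

If we are willing to give up PkgServer's features, a static server would also work. At the moment there is a fairly rough script that downloads all the required resources, and it can be run from cron. Once everything is downloaded and stored, the layout looks roughly like this: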

.
└── julia
    ├── artifact
    ├── package
    ├── registries
    ├── registry
    └── releases

Then, assuming it is served at http://mirrors.ustc.edu.cn/julia/, users can also use the mirror on the client side (for now this produces a warning, see JuliaLang/Pkg.jl#1671):

JULIA_PKG_SERVER=https://mirrors.ustc.edu.cn/julia/ julia
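
As a rough illustration of how a client would address such a static tree, the sketch below builds resource URLs under the mirror root. It assumes the resource paths follow the Pkg server protocol (/registries, /registry/<uuid>/<hash>, /package/<uuid>/<hash>, /artifact/<hash>), which lines up with the directory names above. The function names and the all-zero hash are hypothetical placeholders, not part of any existing script.

```python
# Assumption: the mirror root is the URL used in the command above.
MIRROR_ROOT = "https://mirrors.ustc.edu.cn/julia/"


def _join(root: str, path: str) -> str:
    """Join root and path with exactly one slash between them."""
    return root.rstrip("/") + "/" + path.lstrip("/")


def registries_url(root: str = MIRROR_ROOT) -> str:
    """URL of the index listing the registry snapshots on offer."""
    return _join(root, "registries")


def registry_url(uuid: str, tree_hash: str, root: str = MIRROR_ROOT) -> str:
    """URL of one registry snapshot, identified by UUID and tree hash."""
    return _join(root, f"registry/{uuid}/{tree_hash}")


def package_url(uuid: str, tree_hash: str, root: str = MIRROR_ROOT) -> str:
    """URL of one package version, identified by UUID and tree hash."""
    return _join(root, f"package/{uuid}/{tree_hash}")


def artifact_url(tree_hash: str, root: str = MIRROR_ROOT) -> str:
    """URL of one artifact, identified by its tree hash alone."""
    return _join(root, f"artifact/{tree_hash}")


if __name__ == "__main__":
    placeholder_hash = "0" * 40  # hypothetical, not a real tree hash
    print(registries_url())
    print(artifact_url(placeholder_hash))
```

Because every resource is addressed by a fixed path, a plain static file server (or a cron-synced directory) can answer these requests without running PkgServer itself.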

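The one concern is that this approach may balloon quickly later on, as PyPI did (there is already about 100 GB of data).

Is this approach acceptable? If so, I can spend some time polishing the script.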

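@zhsj
Contributor

zhsj commented Feb 12, 2020

> The one concern is that this approach may balloon quickly later on, as PyPI did (there is already about 100 GB of data).

I don't think 100 GB is large, compared with PyPI's several terabytes. If it keeps growing later, though, we may reconsider, as with PyPI.

A reverse proxy is not unacceptable. As I mentioned earlier, we do run reverse-proxy services, for example for Ubuntu PPA, npm, and cargo. Our reverse-proxy server is in Japan, and connection speeds from within China are sometimes poor (at least my home network often fails to connect). So it would not improve the experience much (at best it goes from unreachable to reachable).

> Is this approach acceptable? If so, I can spend some time polishing the script.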

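@zhsj
Contributor

zhsj commented Feb 12, 2020

From the Julia community's point of view, I think putting a Cloudflare CDN in front is the most convenient option (Cloudflare's speed within China is decent, much better than Fastly and the like). If a member in China can register a domain and complete the ICP filing, putting Baidu Cloud Acceleration (that is, Cloudflare's China nodes) in front would be even more convenient.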

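@johnnychen94

johnnychen94 commented Feb 12, 2020

One problem with julia-mirror is that its Julia versions have not been updated for a long time, because releaseinfo.json has to be updated manually. I have a somewhat more automated tool, jill.py. It is not a full replacement for julia-mirror; it only downloads all Julia versions from 1.0 onward. If possible, could you use it to update http://mirrors.ustc.edu.cn/julia/releases/? Adding it to the same cron job should be enough.

The default file layout matches that of julia-mirror: jill mirror <outpath>

It is currently 6.2 GB, and growth should be fairly slow.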

@Roger-luo
Author

We now have a domain with ICP filing, juliacn.com, but what does putting Baidu Cloud in front of it involve?

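@zhsj
Contributor

zhsj commented Feb 12, 2020

> We now have a domain with ICP filing, juliacn.com, but what does putting Baidu Cloud in front of it involve?

Try this? https://su.baidu.com/ (I haven't followed it for a long time and just found that the free tier of Baidu Cloud Acceleration is now limited to 50 GB of traffic per day. We used to use it to offload traffic from mirrors.ustc.)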

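@zhsj
Contributor

zhsj commented Feb 12, 2020

> One problem with julia-mirror is that its Julia versions have not been updated for a long time, because releaseinfo.json has to be updated manually

Do you mean that 1.3.1 and v1.4.0-rc1 are missing?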

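@johnnychen94

johnnychen94 commented Feb 12, 2020

> One problem with julia-mirror is that its Julia versions have not been updated for a long time, because releaseinfo.json has to be updated manually

> Do you mean that 1.3.1 and v1.4.0-rc1 are missing?

Yes. It can be done by manually updating releaseinfo.json, but that always feels a bit cumbersome... Also, the early versions have been deleted here, which is not very reliable for static storage... though it is not a big problem either...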

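@sunoru

sunoru commented Feb 12, 2020

Regarding the releaseinfo.json problem in julia-mirror: you can actually update it yourself with scripts/make_releaseinfo.py. I really should make it update automatically (or at least remind me to update it manually)...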

@johnnychen94

@staticfloat Out of curiosity, is it possible to expose rsync access to pkg.julialang.org to the public? That would significantly simplify the setup of mirror sites. Access to the S3 bucket, if possible, would be helpful too.

@staticfloat

I'm a little wary of allowing non-HTTP methods, as we have synchronization locks and whatnot within the HTTP server to ensure that, even while we're updating files, you never get a half-baked file. If we provided alternative methods (such as rsync) it's possible the rsync process can get a half-written file. That's solvable, but why do you want to use rsync? You may end up pulling files that are no longer reachable from the registry and whatnot that we want to keep around for paranoia's sake, but which are most likely not needed by your local cache. I would think it would be better for you to just walk the registry and download everything that is reachable (similar to how the gen_static.jl script works).
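
A minimal sketch of that kind of registry walk, assuming the registry's Registry.toml and each package's Versions.toml have already been parsed into dictionaries (for example with a TOML library). The function name and the sample package are hypothetical, this is not how gen_static.jl is actually written, and artifacts are not covered here.

```python
from typing import Dict, Iterator


def reachable_package_paths(registry: dict, versions: Dict[str, dict]) -> Iterator[str]:
    """Yield one /package/<uuid>/<tree-hash> path per registered version.

    `registry` is the parsed Registry.toml; `versions` maps each package's
    path inside the registry to its parsed Versions.toml.
    """
    for uuid, entry in sorted(registry["packages"].items()):
        package_versions = versions[entry["path"]]
        # Sorting is only for deterministic output; it is plain string order.
        for _version, info in sorted(package_versions.items()):
            yield f"/package/{uuid}/{info['git-tree-sha1']}"


if __name__ == "__main__":
    # Hypothetical sample data shaped like a registry with one package.
    sample_registry = {
        "packages": {
            "00000000-0000-0000-0000-000000000001": {
                "name": "Example",
                "path": "E/Example",
            }
        }
    }
    sample_versions = {
        "E/Example": {
            "0.1.0": {"git-tree-sha1": "a" * 40},
            "0.2.0": {"git-tree-sha1": "b" * 40},
        }
    }
    for path in reachable_package_paths(sample_registry, sample_versions):
        print(path)
```

A mirror script built this way only ever requests files that the registry still references, and it fetches them over HTTP, so it goes through the server's own locking.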

@johnnychen94

bump @zhsj

@johnnychen94

Update:

The BFSU mirror site, built on the StorageServer.jl mentioned above, is now up and running: https://mirrors.bfsu.edu.cn/help/julia/

@johnnychen94

johnnychen94 commented Aug 29, 2020

@zhsj Any plans to update this mirror?

With https://github.com/johnnychen94/StorageMirrorServer.jl this should be pretty easy to set up. The only issue is that network connection to upstream storage server might not be that stable and fast from mainland China.

Currently, BFSU, TUNA, and SJTUG mirrors are built with this tool.

FWIW, StorageMirrorServer does not provide the Julia binary releases (http://mirrors.ustc.edu.cn/julia/releases/); those could easily be set up with aws s3 sync.

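@taoky
Member

taoky commented Aug 30, 2020

> jill.py is a one-click installer for Julia. It also provides julia-mirror's ability to download the Julia binaries; the key point is that it discovers new versions automatically.....

> The default configuration matches the existing one, so jill mirror /path/to/mirrors/julia/releases is all that is needed

I just tested syncing the Julia releases locally with jill mirror <path> and found that the resulting directory structure differs considerably from the one at https://mirrors.bfsu.edu.cn/julia-releases/.

bash-4.4# tree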
.
└── releases
    ├── v0.6
    │   ├── julia-0.6.3-freebsd-x86_64.tar.gz
    │   └── julia-0.6.3-freebsd-x86_64.tar.gz.asc
    ├── v0.7
    │   ├── julia-0.7.0-freebsd-x86_64.tar.gz
    │   └── julia-0.7.0-freebsd-x86_64.tar.gz.asc
    ├── v1.0
    │   ├── julia-1.0.0-freebsd-x86_64.tar.gz
    │   ├── julia-1.0.0-freebsd-x86_64.tar.gz.asc
    │   ├── julia-1.0.1-freebsd-x86_64.tar.gz
    │   ├── julia-1.0.1-freebsd-x86_64.tar.gz.asc
    │   ├── julia-1.0.2-freebsd-x86_64.tar.gz
    │   ├── julia-1.0.2-freebsd-x86_64.tar.gz.asc
    │   ├── julia-1.0.3-freebsd-x86_64.tar.gz
    │   ├── julia-1.0.3-freebsd-x86_64.tar.gz.asc
    │   ├── julia-1.0.4-freebsd-x86_64.tar.gz
    │   ├── julia-1.0.4-freebsd-x86_64.tar.gz.asc
    │   ├── julia-1.0.5-freebsd-x86_64.tar.gz
    │   └── julia-1.0.5-freebsd-x86_64.tar.gz.asc
    ├── v1.1
    │   ├── julia-1.1.0-freebsd-x86_64.tar.gz
    │   ├── julia-1.1.0-freebsd-x86_64.tar.gz.asc
    │   ├── julia-1.1.1-freebsd-x86_64.tar.gz
    │   └── julia-1.1.1-freebsd-x86_64.tar.gz.asc
(remainder omitted)

Is this expected?
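
To make the layout in the listing above concrete, here is a small sketch that derives the relative paths for one release. It only covers the .tar.gz plus .asc pattern visible in the listing; the function name is hypothetical, it is not taken from jill.py, and other platforms may use different file names.

```python
from typing import List


def release_paths(version: str, system: str, arch: str) -> List[str]:
    """Return the tarball and signature paths for one Julia release.

    Mirrors the listing: releases/v<major>.<minor>/julia-<version>-<system>-<arch>.tar.gz
    """
    major, minor, _rest = version.split(".", 2)
    directory = f"releases/v{major}.{minor}"
    tarball = f"julia-{version}-{system}-{arch}.tar.gz"
    return [f"{directory}/{tarball}", f"{directory}/{tarball}.asc"]


if __name__ == "__main__":
    for path in release_paths("1.0.5", "freebsd", "x86_64"):
        print(path)
```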

@johnnychen94

johnnychen94 commented Aug 30, 2020

What jill mirror syncs matches the structure currently provided by PkgMirror: http://mirrors.ustc.edu.cn/julia/releases/

To match BFSU's julia-releases, you need to use aws s3 sync. I'm not entirely sure how to do it, but it is probably something like this:

aws s3 sync s3://julialang2 /mnt/mirrors/julia/julialang2

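This S3 bucket is in the us-east-1 region.

Please use aws s3 sync if at all possible. I may remove the jill mirror feature later (it is rather redundant... when I wrote it I didn't know about the aws s3 sync tool...).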
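
A hedged sketch of wrapping that command in a script: it only builds the argument list, using the bucket name and region mentioned in this comment. The function name and the destination path are illustrative, and whether the bucket allows unsigned (anonymous) requests is an assumption, so that flag is off by default.

```python
import shlex
from typing import List


def s3_sync_command(bucket: str, dest: str, region: str = "us-east-1",
                    anonymous: bool = False) -> List[str]:
    """Build the argv for `aws s3 sync` without running it."""
    cmd = ["aws", "s3", "sync", f"s3://{bucket}", dest, "--region", region]
    if anonymous:
        # Assumption: only useful if the bucket permits unsigned requests.
        cmd.append("--no-sign-request")
    return cmd


if __name__ == "__main__":
    cmd = s3_sync_command("julialang2", "/mnt/mirrors/julia/julialang2")
    print(shlex.join(cmd))
```

Passing the returned list to subprocess.run(cmd, check=True) would perform the sync, assuming the AWS CLI is installed; a cron job could call such a script on a fixed schedule.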

@johnnychen94

johnnychen94 commented Aug 30, 2020

The current sync setup for TUNA and SJTUG is:

@taoky
Member

taoky commented Oct 4, 2020

https://mirrors.ustc.edu.cn/julia/ is the mirror that uses StorageMirrorServer.jl (the initial sync is in progress and may take some more time before it is ready for real use)

https://mirrors.ustc.edu.cn/julia-legacy/ is the previous, old Julia mirror

https://mirrors.ustc.edu.cn/julia-releases/ is the releases directory (synced from s3://julialang2)

johnnychen94 added a commit to johnnychen94/jill.py that referenced this issue Oct 4, 2020
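@johnnychen94

The Chinese community currently has several pkgservers (cache servers) in China that have been handed over to the official Julia team for unified maintenance, i.e. https://pkg.julialang.org. I'd like to ask for your opinion here on whether USTC could also be added as an upstream.

Roughly, the situation is:

  • All Julia users in China who have not configured a mirror use this set of cache servers by default.
  • USTC does not need to do any extra maintenance work, nor provide any reliability guarantee. The only purpose is to speed up access and downloads for ordinary users in China.
  • A pkgserver effectively acts as a proxy, so the user statistics collected by the mirror site (if that is something you need) may decrease.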

CRef: tuna/issues#878

@johnnychen94

> https://mirrors.ustc.edu.cn/julia-legacy/ is the previous, old Julia mirror

PkgMirrors hard-codes the mirror URL, and PkgMirrors has presumably stopped being maintained, so it can probably just be deleted.

cc: @sunoru

@johnnychen94

From what I can see, it seems to sync once a day. Could the sync frequency for julia be raised a bit, say to every 2 to 4 hours?

@taoky
Member

taoky commented Oct 11, 2020

> From what I can see, it seems to sync once a day. Could the sync frequency for julia be raised a bit, say to every 2 to 4 hours?

It has been adjusted to sync every 4 hours.

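@taoky
Member

taoky commented Oct 11, 2020

> The Chinese community currently has several pkgservers (cache servers) in China that have been handed over to the official Julia team for unified maintenance, i.e. https://pkg.julialang.org. I'd like to ask for your opinion here on whether USTC could also be added as an upstream.
>
> Roughly, the situation is:
>
>   • All Julia users in China who have not configured a mirror use this set of cache servers by default.
>   • USTC does not need to do any extra maintenance work, nor provide any reliability guarantee. The only purpose is to speed up access and downloads for ordinary users in China.
>   • A pkgserver effectively acts as a proxy, so the user statistics collected by the mirror site (if that is something you need) may decrease.
>
> CRef: tuna/issues#878

OK, no problem.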

@johnnychen94

Apart from julia-legacy needing to be removed when the time comes, there should be nothing else left to do for this issue.

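@sunoru

sunoru commented Oct 29, 2020

Thanks for all the hard work.

(Sorry for the late reply.)

Now that there is an official pkgserver and a new package management/storage protocol, StorageMirrorServer.jl looks great, and PkgMirrors.jl can indeed stop being maintained.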

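@taoky
Member

taoky commented Apr 3, 2021

> Apart from julia-legacy needing to be removed when the time comes, there should be nothing else left to do for this issue.

Julia 1.6 LTS has been officially released, and the julia-legacy mirror has been deleted.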

taoky closed this as completed Apr 3, 2021